Supplementary
Many students have expressed particular interest in extracting tables from PDFs, so here are some tools for that specific use case. If your PDF is actually a set of images (as we discuss in this Unit), you might try ‘Nougat’ by Meta.
If interested, you can also view the following Colab notebook and video demonstration.
- Video Demonstration of Nougat
If your PDFs still have their PostScript content intact (that is, the text is embedded in the file rather than scanned as images), you might consider the simpler Camelot package.
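For the Camelot route, here is a minimal sketch of what table extraction might look like. It assumes Camelot is installed (typically via `pip install camelot-py`) and that the PDF is text-based. The file name `report.pdf`, the helper `extract_tables`, and the `min_accuracy` threshold of 80 are hypothetical choices for illustration, not part of the course materials.

```python
from pathlib import Path


def extract_tables(pdf_path, pages="1", flavor="lattice", min_accuracy=80.0):
    """Return one pandas DataFrame per table that Camelot parses well enough.

    flavor="lattice" suits tables drawn with ruling lines;
    flavor="stream" suits tables laid out with whitespace only.
    """
    import camelot  # third-party package, imported lazily

    tables = camelot.read_pdf(str(pdf_path), pages=pages, flavor=flavor)

    # Each table carries a parsing report with an "accuracy" score (0-100).
    # The cut-off used here is an arbitrary, illustrative value.
    return [
        table.df
        for table in tables
        if table.parsing_report["accuracy"] >= min_accuracy
    ]


if __name__ == "__main__":
    pdf = Path("report.pdf")  # hypothetical file name
    if pdf.exists():
        for i, df in enumerate(extract_tables(pdf)):
            df.to_csv(f"table_{i}.csv", index=False)
```

If no tables come back, trying `flavor="stream"` or a different `pages` value (for example `"1-3"` or `"all"`) is usually the first thing to check. Camelot does not perform OCR, so a scanned PDF will yield nothing here.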