Supplementary

Many students have expressed particular interest in extracting tables from PDFs, so I will present some tools for that specific use case here. If your PDF is actually a set of images (as we discuss in this Unit), you might try ‘Nougat’ by Meta.

If interested, you can also view the following Colab notebook and video demonstration.

Video Demonstration of Nougat

Example Notebook for Nougat

If your PDFs still contain their original text and vector content (that is, they are not scanned images), you might consider the relatively simpler Camelot package.
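
To give a feel for Camelot, here is a minimal sketch, not a definitive recipe. It assumes Camelot is installed (typically via `pip install camelot-py`) and that your PDF is text-based. The helper names `pages_arg` and `extract_tables`, and the file name `report.pdf` in the usage comment, are hypothetical and chosen only for illustration.

```python
from typing import Iterable


def pages_arg(page_numbers: Iterable[int]) -> str:
    """Build the comma-separated page string Camelot expects, e.g. '1,3,4'."""
    # Hypothetical helper: de-duplicate and sort so the string is predictable.
    return ",".join(str(n) for n in sorted(set(page_numbers)))


def extract_tables(pdf_path: str, page_numbers: Iterable[int] = (1,),
                   flavor: str = "lattice") -> list:
    """Return each table Camelot finds as a pandas DataFrame."""
    # Imported here so the helper above works even without Camelot installed.
    import camelot

    # flavor="lattice" suits tables drawn with ruled lines;
    # flavor="stream" suits tables whose columns are separated by whitespace.
    tables = camelot.read_pdf(pdf_path, pages=pages_arg(page_numbers),
                              flavor=flavor)
    return [table.df for table in tables]


# Example usage (assumes a text-based file named report.pdf exists):
# frames = extract_tables("report.pdf", page_numbers=[1, 2])
# frames[0].to_csv("table_1.csv", index=False)
```

If the default `lattice` mode finds nothing, trying `flavor="stream"` is a reasonable next step, since many tables have no visible cell borders.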