Supplementary

Many students have expressed particular interest in extracting tables from PDFs, so here are some tools for that specific use case. If your PDF is actually a set of images (as we discuss in this Unit), you might try ‘Nougat’ by Meta, an OCR model that converts scanned pages into Markdown-style text.

If interested, you can also view the following Colab notebook and video demonstration.
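As a rough sketch, Nougat can be driven from Python through its command-line tool, which its README invokes as `nougat path/to/file.pdf -o output_directory` after `pip install nougat-ocr`. The helper names `nougat_command` and `run_nougat` are hypothetical, and the assumption that the output lands in `<pdf name>.mmd` inside the output directory should be checked against the version you install.

```python
import subprocess
from pathlib import Path


def nougat_command(pdf_path, out_dir):
    """Hypothetical helper: build the Nougat CLI call (nougat <pdf> -o <dir>)."""
    return ["nougat", str(pdf_path), "-o", str(out_dir)]


def run_nougat(pdf_path, out_dir="nougat_out"):
    """Hypothetical helper: run Nougat on an image-only PDF.

    Returns the path where we ASSUME Nougat writes its output
    (<pdf stem>.mmd inside out_dir); verify this for your version.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)  # make sure the output folder exists
    subprocess.run(nougat_command(pdf_path, out_dir), check=True)  # raises if Nougat fails
    return Path(out_dir) / (Path(pdf_path).stem + ".mmd")
```

Any tables Nougat recognises appear inside that text output, so you still need to locate and parse them yourself afterwards.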

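If your PDFs are text-based (the text layer is intact, rather than each page being a scanned image), you might consider the simpler Camelot package. Camelot does not perform OCR, so it will not work on scanned documents.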
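A minimal sketch of the Camelot route, assuming `pip install camelot-py` and a text-based PDF. `camelot.read_pdf` returns a list of detected tables, each exposing a pandas DataFrame as `.df`; the wrapper name `extract_tables` is hypothetical. The `flavor` argument matters: `"lattice"` is meant for tables drawn with ruling lines, while `"stream"` is meant for tables whose columns are separated only by whitespace.

```python
def extract_tables(pdf_path, pages="1", flavor="lattice"):
    """Hypothetical wrapper: return every table Camelot finds as a pandas DataFrame.

    pages accepts Camelot page strings such as "1", "1,3-4" or "all".
    Only works on text-based PDFs; scanned pages need OCR (e.g. Nougat) instead.
    """
    # Imported inside the function so this file still loads if Camelot is not installed.
    import camelot

    tables = camelot.read_pdf(pdf_path, pages=pages, flavor=flavor)
    return [table.df for table in tables]
```

For example, `extract_tables("report.pdf", pages="all", flavor="stream")` (with `report.pdf` standing in for your own file) would give one DataFrame per detected table, which you can then save with `df.to_csv(...)`. If the results look wrong, switching between `"lattice"` and `"stream"` is usually the first thing to try.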