
Google indexes images within PDF files

If you publish a PDF containing images and text on the Internet, Google can recognize the text and offer it as a result in its search engine, and it can also find the document's images and display them in Google Images.

The OCR (text recognition system) that Google uses has been able to read the text of PDF files since 2008, but until now nothing had been said about what happens to the images in this type of document.

Google itself has not been the source of this news (it has not been announced as a new feature anywhere). Instead, it was spotted on googlesystem, which shows some examples of results that come from PDF files.

Technically this is not very difficult. The focus should now be on better classification of the images found (Google Photos has already shown that progress in this area is quite good) and on better recognition of handwritten characters (which is not at all simple).

As for the license of these images, it is even harder to determine. In theory the license must be the same as that of the PDF document, but many PDF files published on the Internet do not state the license under which they are shared, so be careful when copying these images for your own projects.