A PDF (Portable Document Format) file is as common in the office as a Word or Excel document. But did you know there are different types of PDFs? In fact, the type of PDF you have determines whether you can search for text within the document.
You might wonder, “Why does it matter if I can text-search within a document?” Think about how many times you have saved a document and later couldn’t remember where you saved it or what you titled it. If you can search within the text of the PDF, your chances of finding the document improve greatly.
Let’s take a look at four types of PDF files:
- Image-only PDF – When a paper document is scanned, the resulting PDF is just a picture of each page and contains no computer-readable text. If you run a text search for a document saved in this format, the results won’t include it, making this a document that can’t be “found.”
- Rendered PDF – This is a PDF created by a computer (e.g., a Word document converted to PDF). It contains computer-readable text by default, so it is fully text-searchable.
- Hybrid PDF – This is a PDF that contains both images and rendered content or annotations. For example, if you scan a document and then use Adobe's markup tools to annotate the image, the PDF will be a hybrid PDF.
- Image + text PDF – This is a PDF that is created when an OCR engine “reads” an image-only PDF and adds a layer of invisible, computer-readable text to the original image. These files retain the exact original image, but they also let you perform context-sensitive searches for text inside the PDF and copy text to the Windows clipboard.
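If it helps to see the four types side by side, the descriptions above can be read as a simple decision rule. The sketch below is only an illustration: the three yes/no inputs and the names `PdfType`, `classify_pdf`, and `TEXT_SEARCHABLE` are made up for this example, and the code does not inspect real PDF files.

```python
from enum import Enum


class PdfType(Enum):
    IMAGE_ONLY = "Image-only PDF"
    RENDERED = "Rendered PDF"
    HYBRID = "Hybrid PDF"
    IMAGE_PLUS_TEXT = "Image + text PDF"


def classify_pdf(has_scanned_image: bool,
                 has_rendered_content: bool,
                 has_hidden_text_layer: bool) -> PdfType:
    """Map three hypothetical yes/no facts about a PDF to one of the four types."""
    if not has_scanned_image and not has_rendered_content:
        raise ValueError("expected a scanned image, rendered content, or both")
    if has_scanned_image and has_hidden_text_layer:
        # OCR has added invisible text on top of the scanned image.
        return PdfType.IMAGE_PLUS_TEXT
    if has_scanned_image and has_rendered_content:
        # A scan plus computer-generated content such as annotations.
        return PdfType.HYBRID
    if has_scanned_image:
        return PdfType.IMAGE_ONLY
    return PdfType.RENDERED


# Searchability as described in the list above.
# None means the list does not say either way.
TEXT_SEARCHABLE = {
    PdfType.IMAGE_ONLY: False,
    PdfType.RENDERED: True,
    PdfType.HYBRID: None,
    PdfType.IMAGE_PLUS_TEXT: True,
}
```

For example, `classify_pdf(True, False, False)` returns `PdfType.IMAGE_ONLY`, the one type that a text search cannot find. Checking for the hidden text layer first is a design choice for this sketch; a scan that has been both annotated and OCRed could reasonably be labeled either way.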
The Image + text PDF type depends on OCR, but what is OCR software? OCR (optical character recognition) software converts a scanned PDF document into text-searchable data. If your firm does any scanning, it is critical to have an OCR solution that is seamless to use. Symphony OCR, Trumpet’s OCR engine, monitors your document repository for newly saved documents that need to be OCRed. When it finds one, it OCRs the document automatically, with no manual steps on your part. Once it is installed, “it just works.”
If you’re curious about how many documents in your repository can’t be “found,” request a free document analysis to find out how many image-only PDFs your firm has. It may be time to invest in an OCR solution.
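To make the “monitor the repository” idea concrete, here is a generic sketch of scanning a folder for PDFs that still need OCR. It is not how Symphony OCR is implemented. The function name `find_pdfs_needing_ocr` and the `needs_ocr` check are hypothetical, and you would supply your own test for whether a file is image-only.

```python
from pathlib import Path
from typing import Callable, List


def find_pdfs_needing_ocr(repository: Path,
                          needs_ocr: Callable[[Path], bool],
                          since: float = 0.0) -> List[Path]:
    """Return PDFs under `repository` modified after `since` that need OCR.

    `needs_ocr` is a caller-supplied (hypothetical) check, for example
    "this PDF is image-only". `since` is a Unix timestamp.
    """
    pending = []
    for path in sorted(repository.rglob("*")):
        # Skip folders and anything that is not a PDF.
        if not path.is_file() or path.suffix.lower() != ".pdf":
            continue
        # Skip files that have not changed since the last pass.
        if path.stat().st_mtime <= since:
            continue
        if needs_ocr(path):
            pending.append(path)
    return pending
```

A scheduled job could call this function every few minutes, pass in the time of the previous run as `since`, and hand each returned file to an OCR engine.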