Why It's Important to Use OCR to Make PDFs Work for YOU

A PDF (Portable Document Format) file is as common in the office as a Word or Excel document. But did you know there are different types of PDFs? In fact, the type of PDF you have determines whether you can search the text within the document.

You might wonder, “Why does it matter if I can text-search within a document?” But how many times have you saved a document and then forgotten where you saved it or what you titled it? If you can search within the text of the PDF, your chances of finding the document during a search greatly improve.

Let’s take a look at four types of PDF files:

  1. Image-only PDF – When a paper document is scanned, the result is an image of each page with no computer-readable text. If you search for text contained in a document saved in this format, the search results won’t include it, making this a document that can’t be “found.”
  2. Rendered PDF – This is a PDF created by a computer program (e.g., a Word document converted to PDF). It contains computer-readable text by default, so it is fully text-searchable.
  3. Hybrid PDF – This is a PDF that contains both images and rendered content or annotations. For example, if you scan a document and then use Adobe’s markup tools to annotate the image, the PDF will be a hybrid PDF.
  4. Image + text PDF – This is a PDF that is created when an OCR engine “reads” an image-only PDF and adds a layer of invisible, computer-readable text to the original image. These files retain the exact original image, but also let you search for text inside the PDF and copy text to the Windows clipboard.
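
For readers who like to see the logic spelled out, here is a minimal Python sketch of how the four types above could be told apart. It is an illustration only, not how any particular product classifies files. The `PageSummary` class and its three flags are hypothetical: a real tool would have to fill them in using a PDF library, and that part is not shown. The sketch also simplifies by treating any PDF without images as rendered.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PageSummary:
    """Hypothetical per-page facts that a PDF library would have to supply."""
    has_image: bool           # page contains a scanned or embedded image
    has_visible_text: bool    # rendered text or annotations drawn on the page
    has_invisible_text: bool  # hidden text layer, such as one added by OCR


def classify_pdf(pages: Iterable[PageSummary]) -> str:
    """Map page summaries to one of the four PDF types described above."""
    pages = list(pages)
    if not pages:
        raise ValueError("a PDF needs at least one page to classify")

    any_image = any(p.has_image for p in pages)
    any_visible = any(p.has_visible_text for p in pages)
    any_hidden = any(p.has_invisible_text for p in pages)

    if not any_image:
        # Simplifying assumption: no images means computer-generated content.
        return "rendered"
    if any_visible:
        # Images mixed with rendered content or annotations.
        return "hybrid"
    if any_hidden:
        # Original image plus an invisible, searchable text layer.
        return "image + text"
    # Images and nothing else: search cannot see inside this file.
    return "image-only"
```

For example, `classify_pdf([PageSummary(True, False, False)])` returns `"image-only"`, which is the type that an OCR pass would turn into an image + text PDF.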

In the Image + text PDF type we mention OCR, but what is OCR software? OCR (optical character recognition) software converts a scanned PDF document into text-searchable data. If your firm does any scanning, it is critical that you have an OCR solution that is seamless to use. Symphony OCR, Trumpet’s OCR engine, monitors your document repository for newly saved documents that need OCR. When it finds one, it OCRs the document automatically, with no manual steps on your part. Once Symphony OCR is installed, “it just works.”
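
To make the idea of monitoring a repository concrete, here is a rough Python sketch of one pass of such a monitor. It is not Symphony OCR’s implementation, which this post does not describe. The function names are made up for illustration, and `needs_ocr` and `run_ocr` are hypothetical callbacks that you would supply yourself, for instance a check for image-only PDFs and a call into an OCR engine.

```python
from pathlib import Path
from typing import Callable, List, Set


def find_new_pdfs(root: Path, seen: Set[Path]) -> List[Path]:
    """Return PDF files under root that were not seen before, and remember them.

    Note: the "*.pdf" pattern is case-sensitive on some systems, so files
    named "SCAN.PDF" may need a second pattern.
    """
    new = sorted(p for p in root.rglob("*.pdf") if p.is_file() and p not in seen)
    seen.update(new)
    return new


def process_new_documents(
    root: Path,
    seen: Set[Path],
    needs_ocr: Callable[[Path], bool],  # hypothetical: True for image-only PDFs
    run_ocr: Callable[[Path], None],    # hypothetical: calls your OCR engine
) -> List[Path]:
    """Run one monitoring pass and return the files that were sent to OCR."""
    processed = []
    for pdf in find_new_pdfs(root, seen):
        if needs_ocr(pdf):
            run_ocr(pdf)
            processed.append(pdf)
    return processed
```

Calling `process_new_documents` on a schedule with the same `seen` set means each file is examined once, and only the files that need OCR are handed to the engine.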

If you’re curious about how many documents in your repository can’t be “found,” request a free document analysis to find out how many image-only PDFs your firm has. It may be time to invest in an OCR solution.

Author

Liz Levenson

Before joining Trumpet, Liz was a co-owner of a small inbound marketing agency in Phoenix, Arizona. She loves creating marketing processes that allow marketers to take out the guesswork and focus on creative campaigns and a great customer experience. Her 4 years as a marketer have taught her that marketing is no longer a sales tool for the company: it is an education tool for customers. In her free time, Liz is either travelling or dreaming about travelling. She also enjoys cooking, playing board games with friends, playing her ukulele, spending time with her nieces, and lovingly harassing her cat, Holly.