As a plan reviewer, you may be able to work with several sets at a time — but you only have one pair of eyes to find the specific page you need. This makes the process challenging, no matter how efficient an organizational system you have. You're used to having paper plan sets and the storage space required to house them. It's not easy to keep track of it all, much less keep it sorted properly.
What if all of those plan sets were made digital? You wouldn't have to spend so much on storage space. You could sort and search the documents far more easily. And, perhaps best of all, you could collaborate on the same digital plan set in real time with other plan reviewers.
Optical character recognition (OCR) lets you digitize those paper plan sets and make their text searchable and editable. When you mark up a digital set, everyone can see your changes as you make them.
OCR is a technology that converts the text in a physical document or an image-based PDF file into digital, machine-readable text that you can edit in another program such as Microsoft Word or Google Docs. With OCR, you can scan a paper document or plan set and turn its printed text into text a computer can read.
In principle, an OCR system is simple. A scanner captures the physical page as a picture for the software to analyze. The software converts that picture to a black-and-white image, with the text in black on a white background, and then uses one of two methods, pattern recognition or feature detection, to look for individual letters, numbers and symbols. Finally, data extraction turns the recognized characters into a machine-readable digital document that can be edited.
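The black-and-white step can be sketched in a few lines. This is a minimal illustration, assuming the scan is already available as a grid of grayscale values from 0 (black) to 255 (white). The function name `binarize` and the fixed threshold of 128 are illustrative choices, not what any particular OCR product uses.

```python
def binarize(gray, threshold=128):
    """Turn a grayscale image into ink (1) and paper (0) pixels.

    `gray` is a list of rows, each a list of ints from 0 (black) to 255 (white).
    The fixed threshold is an assumption for illustration; real software
    often chooses it adaptively for each page.
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


# A tiny 3x4 "scan": dark ink strokes on light paper.
scan = [
    [250, 30, 245, 240],
    [248, 25, 20, 243],
    [251, 35, 247, 249],
]

for row in binarize(scan):
    print("".join("#" if p else "." for p in row))
```

Everything darker than the threshold is treated as ink, which gives the recognition step a clean shape to work with instead of shades of gray.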
The challenge an OCR program aims to overcome is reading many different fonts and styles of text while still scanning and processing them accurately. This is hard enough with printed text and harder still with handwriting, where the software has to analyze each writer's letterforms for recurring patterns before it can match them to characters.
Another issue is separating the text from noise. Physical documents are rarely perfect and can contain dust, creases or other imperfections. OCR software needs to differentiate between the text and other artifacts to produce an accurate document.
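One simple way to picture noise removal is to erase ink pixels that stand completely alone, since a real letter stroke is made of connected pixels. The sketch below assumes a black-and-white image stored as rows of 0 (paper) and 1 (ink); `remove_specks` is a hypothetical helper, and real OCR tools use more sophisticated filters.

```python
def remove_specks(image):
    """Clear ink pixels that have no ink neighbor in the 8 surrounding cells."""
    height, width = len(image), len(image[0])

    def has_ink_neighbor(r, c):
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if (dr, dc) == (0, 0):
                    continue  # skip the pixel itself
                rr, cc = r + dr, c + dc
                if 0 <= rr < height and 0 <= cc < width and image[rr][cc]:
                    return True
        return False

    return [
        [1 if image[r][c] and has_ink_neighbor(r, c) else 0 for c in range(width)]
        for r in range(height)
    ]


page = [
    [0, 0, 0, 0, 1],  # lone speck of dust in the corner
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],  # vertical stroke of a letter
    [0, 1, 0, 0, 0],
]
cleaned = remove_specks(page)
```

The speck in the corner disappears while the three-pixel stroke survives, because each pixel of the stroke touches another ink pixel.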
There are two types of OCR: pattern recognition and feature detection. Each works differently, but both have the same goal of making text editable.
Pattern recognition works by comparing each character on the page with characters stored in the software. The software keeps a library of letters, numbers and symbols it recognizes and looks for the closest match to each shape in the original document. Whenever it finds a match, it creates that character in editable form. The OCR software typically stores a wide range of fonts and formats so it can better match the text to its digital counterpart.
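As a rough sketch of this matching idea, the example below compares a scanned character with a tiny library of stored shapes and picks the one that agrees on the most pixels. The three 3x5 templates and the names `TEMPLATES`, `match_score` and `recognize` are made up for illustration; a real library would hold many fonts and sizes.

```python
# Hypothetical template library: each character is a 3x5 grid of ink (1) and paper (0).
TEMPLATES = {
    "I": ["111", "010", "010", "010", "111"],
    "L": ["100", "100", "100", "100", "111"],
    "T": ["111", "010", "010", "010", "010"],
}


def match_score(glyph, template):
    """Count the pixels on which the glyph and the template agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph):
    """Return the library character that agrees with the glyph on the most pixels."""
    return max(TEMPLATES, key=lambda ch: match_score(glyph, TEMPLATES[ch]))


# A scanned "T" with one pixel of damage in the bottom row.
scanned = ["111", "010", "010", "010", "000"]
print(recognize(scanned))  # prints "T"
```

The damaged character still matches "T" on 14 of 15 pixels, more than any other template, which is why this method tolerates small imperfections but struggles with fonts that are not in the library.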
Feature detection identifies text using rules about what makes up each character. For example, the software knows an A by its features: two slanted lines joined by a horizontal bar. It knows a C by its single open curve. This is a slightly smarter method of OCR because it does not depend on a particular font, and related ideas underpin many modern, deep-learning-based OCR systems.
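To make the idea of a "feature" concrete, here is a toy sketch that uses just one: the number of enclosed holes in a character. An A has one hole, a B has two, and a C has none. The glyphs, the `RULES` table and the function names are assumptions for illustration; a real system would combine many features such as strokes, curves and intersections.

```python
def count_holes(glyph):
    """Count enclosed paper regions in a glyph drawn with '#' (ink) and '.' (paper)."""
    # Pad with a paper border so all outside background forms one connected region.
    width = len(glyph[0]) + 2
    grid = [list("." * width)]
    grid += [list("." + row + ".") for row in glyph]
    grid += [list("." * width)]
    height = len(grid)

    def flood(r, c):
        """Mark every paper cell reachable from (r, c) by up/down/left/right moves."""
        stack = [(r, c)]
        while stack:
            y, x = stack.pop()
            if 0 <= y < height and 0 <= x < width and grid[y][x] == ".":
                grid[y][x] = "x"
                stack.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])

    flood(0, 0)  # mark the outside background first
    holes = 0
    for r in range(height):
        for c in range(width):
            if grid[r][c] == ".":  # unmarked paper must be enclosed by ink
                holes += 1
                flood(r, c)
    return holes


RULES = {0: "C", 1: "A", 2: "B"}  # toy rule set covering only these three letters


def classify(glyph):
    return RULES.get(count_holes(glyph), "?")


letter_a = [".##.", "#..#", "####", "#..#", "#..#"]
letter_b = ["###.", "#..#", "###.", "#..#", "###."]
letter_c = [".###", "#...", "#...", "#...", ".###"]
print(classify(letter_a), classify(letter_b), classify(letter_c))
```

Because the rule looks at the structure of the shape rather than its exact pixels, the same rule would still work if the letters were drawn larger or in a different typeface.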
Before OCR, the only option for turning text into digital form was to manually type it into a device. If you wanted to copy a newspaper article, for example, you would have to read the physical article and copy the text with a keyboard, similar to how scribes used to copy books by hand before the printing press was invented. Naturally, this was a painstaking process.
Some might argue that the first imagining of OCR was the Optophone, invented in 1913 by Dr. Edmund Fournier d'Albe. But Ray Kurzweil created the first modern OCR in 1974. Its original use was to let blind people read printed text through text-to-speech. Kurzweil's company, Kurzweil Computer Products, Inc., was sold to Xerox in 1980.
OCR took off as a means of digitizing newspapers in the early 1990s. Computer scientists have steadily improved its accuracy since then, leading to today's solutions.
Scene text recognition (STR) is a form of OCR that uses computer vision to read text in natural scenes rather than only black letters on a white background. It's a common technology in self-driving cars, for example, where it lets the car read road signs, logos and billboards, among other things.
Plan reviewers, doctors, lawyers, retail clerks, IT personnel — almost everyone can take advantage of OCR.
OCR is used for a massive range of applications. Here are some examples:
OCR, at its core, digitizes and centralizes data. Rather than having your information scattered across both digital and physical forms, you can make it all digital. This has several advantages:
To make a scanned PDF editable, you need an OCR tool. A scanned PDF is only a static image of each page; its text cannot be searched or edited until the file has been processed with OCR technology.
Most OCR solutions work behind the scenes. In e-PlanSoft™ goPost™, OCR happens automatically when the plan is formatted for e-PlanReview®.
Once project applications are submitted, they're tracked and managed within goPost™. PDF Scout™, an application in goPost™, scans the application for viability. This includes checking the resolution and PDF version and confirming that the file has no attachments and is in a proper state to be reviewed. If the plan set fails the test, the software tells you why so you can make the proper corrections. Any documents that don't pass PDF Scout's test cannot be submitted for review.
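To give a feel for what a pre-submission check of this kind might look like, here is a purely hypothetical sketch. It is not PDF Scout's implementation: the `PdfInfo` fields, the minimum resolution and the list of accepted PDF versions are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PdfInfo:
    """Hypothetical summary of an uploaded file; the field names are assumptions."""
    dpi: int
    pdf_version: str
    attachment_count: int


# Illustrative limits only; the real acceptance criteria are not stated here.
MIN_DPI = 150
ACCEPTED_VERSIONS = {"1.4", "1.5", "1.6", "1.7"}


def preflight(info):
    """Return a list of readable problems; an empty list means the file passes."""
    problems = []
    if info.dpi < MIN_DPI:
        problems.append(f"resolution {info.dpi} DPI is below {MIN_DPI} DPI")
    if info.pdf_version not in ACCEPTED_VERSIONS:
        problems.append(f"PDF version {info.pdf_version} is not accepted")
    if info.attachment_count > 0:
        problems.append(f"file contains {info.attachment_count} attachment(s)")
    return problems


for problem in preflight(PdfInfo(dpi=72, pdf_version="1.3", attachment_count=2)):
    print("Rejected:", problem)
```

Returning every problem at once, rather than stopping at the first, mirrors the idea of telling the applicant exactly what to correct before resubmitting.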
Using goPost™, OCR technology reads the sheet numbers on the plan set and sorts them automatically. This saves you from sorting through page after page of sheets by hand; the software puts them all in order for you.
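Sorting by sheet number is less trivial than it sounds, because plain alphabetical order would put "A-102" before "A-11". The sketch below shows one way such sorting could work once OCR has read the labels. It assumes sheet numbers follow a letters-then-digits pattern such as "A-101"; the pattern and the `sheet_key` helper are illustrative, not goPost's actual logic.

```python
import re


def sheet_key(sheet_number):
    """Build a sort key from a sheet number such as 'A-101' or 's2'."""
    match = re.fullmatch(r"([A-Za-z]+)[-.\s]?(\d+)", sheet_number.strip())
    if match is None:
        # Labels that do not fit the assumed pattern sort last, alphabetically.
        return (1, sheet_number, 0)
    prefix, number = match.groups()
    # Compare the letter prefix first, then the number as an integer.
    return (0, prefix.upper(), int(number))


ocr_results = ["S-201", "A-102", "A-10", "a101", "S-2"]
print(sorted(ocr_results, key=sheet_key))
# prints ['A-10', 'a101', 'A-102', 'S-2', 'S-201']
```

Treating the digits as a number rather than as text is what keeps sheet 10 ahead of sheet 101 and sheet 2 ahead of sheet 201.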
OCR is a popular technology with many different uses, and it has come a long way since its inception. Improvements in accuracy and advances in related technologies mean it has become much more reliable and easier to use over time. It can be invaluable in saving you time and accelerating your workflow.
With e-PlanSoft's line of products, you can ensure high-quality digital plan sets. Request a demo today to learn more and get started.