A Comprehensive Guide to OCR
Optical Character Recognition (OCR), sometimes expanded as Optical Character Reader, is one of the earliest computer vision tasks to be tackled, since some aspects of it do not require deep learning. OCR implementations therefore existed long before the deep learning boom of 2012, and some date back as far as 1914. This leads many people to think that OCR is a solved problem that no longer poses a challenge. Anyone who practices computer vision, or machine learning in general, knows that there is no such thing as a solved task, and OCR is no exception. On the contrary, OCR yields very good results only in very specific use cases; in general, it is still considered challenging. It is true that there are good solutions for certain OCR tasks that do not need deep learning. However, to really move towards better and more general solutions, deep learning is the go-to option.
Types of OCR
The term OCR covers a range of tasks. In its most general sense, it refers to extracting text from any possible image, be it a standard printed page from a book or a random image with graffiti in it (“in the wild”). In between, you’ll find many other tasks, such as reading license plates, CAPTCHAs, street signs, etc.
Although each of these tasks has its own difficulties, the “in the wild” task is clearly the hardest. From these examples we can draw out some attributes of OCR tasks:
Text density: on a printed/written page, the text is dense. However, given an image of a street with a single street sign, the text is sparse.
Structure of text: text on a page is structured, mostly in strict rows, while text in the wild may be scattered anywhere in the image, at various rotations.
Fonts: printed fonts are easier since they’re more structured than the noisy hand-written characters.
Character type: text may come in different languages, which can be very different from one another. Additionally, the structure of text may differ from that of numbers, such as house numbers.
Artifacts: outdoor pictures are much noisier than images produced in the controlled conditions of a scanner.
Location: Some tasks entail cropped/centered text, while in others, text may be located in random locations in the image.
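To make the first attribute concrete, here is a minimal sketch of one possible way to put a number on text density: the fraction of image pixels covered by at least one text bounding box. The function name text_density, the (x0, y0, x1, y1) box format, and the example boxes are assumptions made for illustration; they do not come from any particular OCR library.

```python
def text_density(image_width, image_height, boxes):
    """Return the fraction of pixels covered by at least one text box.

    Each box is (x0, y0, x1, y1) in pixel coordinates, with x1 and y1
    exclusive. This box format is an assumption for the example.
    """
    covered = set()
    for x0, y0, x1, y1 in boxes:
        # Clip the box to the image so stray coordinates do not inflate the score.
        x0, y0 = max(0, x0), max(0, y0)
        x1, y1 = min(image_width, x1), min(image_height, y1)
        for y in range(y0, y1):
            for x in range(x0, x1):
                covered.add((x, y))  # a set counts overlapping boxes only once
    return len(covered) / (image_width * image_height)


# Hypothetical printed page: eight text lines, each 80 px wide and 8 px tall.
page_boxes = [(10, 10 + 10 * i, 90, 18 + 10 * i) for i in range(8)]
# Hypothetical street scene: a single small sign.
sign_boxes = [(40, 40, 60, 50)]

page_density = text_density(100, 100, page_boxes)
sign_density = text_density(100, 100, sign_boxes)
print(page_density, sign_density)
```

On these made-up inputs the page scores roughly half of its area as text, while the street scene scores about two percent, which matches the dense versus sparse distinction above.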
Deep Learning and OCR
Deep learning approaches have improved over the past few years, reviving interest in the OCR problem, where neural networks can be used to combine the tasks of localizing text in an image and understanding what the text is. Deep convolutional architectures, attention mechanisms, and recurrent networks have gone a long way in this regard.
One of these deep learning approaches is the basis of Attention-OCR, the library we will be using to predict the text in number plate images.
Think of it like this. The overall pipeline for many OCR architectures is a convolutional network that extracts image features as encoded vectors, followed by a recurrent network that uses these encoded features to predict where each of the letters in the image text might be and what they are.
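The data flow of such a pipeline can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the Attention-OCR implementation: the weights are random and untrained, the attention mechanism is left out, the kernels span the full image height for simplicity, and all names (init_params, conv_encoder, rnn, predict, ALPHABET) are hypothetical.

```python
import math
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"  # assumed character set


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def init_params(image_height, kernel_width=3, n_features=8, n_hidden=16, seed=0):
    """Random, untrained weights; a real system would learn these."""
    rng = random.Random(seed)
    return {
        "kernel_width": kernel_width,
        "kernels": [rand_matrix(image_height, kernel_width, rng)
                    for _ in range(n_features)],
        "w_in": rand_matrix(n_hidden, n_features, rng),
        "w_rec": rand_matrix(n_hidden, n_hidden, rng),
        "w_out": rand_matrix(len(ALPHABET), n_hidden, rng),
    }


def conv_encoder(image, kernels, kernel_width):
    """Slide each kernel left to right: one feature vector per horizontal position."""
    height, width = len(image), len(image[0])
    features = []
    for x in range(width - kernel_width + 1):
        vec = []
        for kernel in kernels:
            s = sum(kernel[r][c] * image[r][x + c]
                    for r in range(height) for c in range(kernel_width))
            vec.append(max(0.0, s))  # ReLU non-linearity
        features.append(vec)
    return features


def rnn(features, w_in, w_rec):
    """Simple recurrent layer: each state depends on the feature and the previous state."""
    hidden = [0.0] * len(w_rec)
    states = []
    for f in features:
        pre = [a + b for a, b in zip(matvec(w_in, f), matvec(w_rec, hidden))]
        hidden = [math.tanh(v) for v in pre]
        states.append(hidden)
    return states


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(image, params):
    """Return (character, probability) for every horizontal position."""
    features = conv_encoder(image, params["kernels"], params["kernel_width"])
    states = rnn(features, params["w_in"], params["w_rec"])
    result = []
    for state in states:
        probs = softmax(matvec(params["w_out"], state))
        best = max(range(len(probs)), key=probs.__getitem__)
        result.append((ALPHABET[best], probs[best]))
    return result


# Hypothetical 8 x 20 grayscale image with random pixel values.
rng = random.Random(1)
image = [[rng.random() for _ in range(20)] for _ in range(8)]
steps = predict(image, init_params(image_height=8))
print(len(steps))  # 18 positions: 20 - 3 + 1
```

Because the weights are random, the predicted characters are meaningless; the point is the shape of the computation, where the position in the feature sequence stands in for "where" and the per-position distribution for "what". Trained systems also need a step that collapses per-position predictions into a final string, typically a CTC-style decoder or an attention-based decoder.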
If you want to build any OCR-based system, stay in touch with KritiKal Solutions. We are the industry leader in ideating and innovating world-class OCR solutions for our customers.
Visit https://kritikalsolutions.com/products/ocr/ or contact us at sales@kritikalsolutions.com for more details.