Definition of OCR

OCR is the acronym for Optical Character Recognition. The term is used in computing to name a procedure that converts text captured with a scanner into digital, machine-readable characters.

When a text is passed through a device such as a scanner, OCR allows the system to recognize its characters as members of an alphabet. As a result, the scanned document can be edited with a word processor, because it is stored as text rather than as an image.

In this way, OCR makes many people's work easier. If someone scans a book in order to summarize it, OCR lets them work with the scanned text in a program such as Microsoft Word, cutting, copying and pasting any word. Without this recognition step that would be impossible, since the computer cannot interpret the text contained in an image.

Advantages of OCR

Besides the obvious advantage of storing text as text rather than as an image, there is a considerable difference in file size: images can take up much more disk space than text, which must be taken into account if you want to scan entire books. That said, running OCR is not advisable in every case, especially when there is no intention of editing the content.
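
To give a rough sense of the difference in disk space, the following Python sketch estimates the uncompressed size of one scanned page and compares it with the size of the same page stored as plain text. All figures are assumptions chosen for illustration: a US Letter page (8.5 x 11 inches), a 300 dpi scan and about 3,000 characters of text per page.

```python
# Rough, uncompressed storage estimate for ONE scanned page.
# Assumptions (hypothetical, for illustration only):
#   - page size: US Letter, 8.5 x 11 inches
#   - scan resolution: 300 dpi
#   - about 3,000 characters of plain English text per page
DPI = 300
PAGE_WIDTH_IN, PAGE_HEIGHT_IN = 8.5, 11
CHARS_PER_PAGE = 3000

width_px = round(PAGE_WIDTH_IN * DPI)    # 2550 pixels
height_px = round(PAGE_HEIGHT_IN * DPI)  # 3300 pixels
pixels = width_px * height_px


def raw_image_bytes(pixel_count, bits_per_pixel):
    """Size of an uncompressed bitmap, in bytes."""
    return pixel_count * bits_per_pixel // 8


bw_bytes = raw_image_bytes(pixels, 1)    # black-and-white: 1 bit per pixel
gray_bytes = raw_image_bytes(pixels, 8)  # grayscale: 8 bits per pixel
text_bytes = CHARS_PER_PAGE              # plain ASCII text: 1 byte per character

print(f"black-and-white image: {bw_bytes:,} bytes")  # 1,051,875 bytes
print(f"grayscale image:       {gray_bytes:,} bytes")  # 8,415,000 bytes
print(f"recognized text:       {text_bytes:,} bytes")
print(f"the 1-bit image is about {bw_bytes / text_bytes:.0f} times larger")
```

Scanned files are usually saved in compressed formats, so real images are smaller than these raw figures, but under these assumptions the recognized text is still typically far lighter than the image of the same page.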

It is curious that a single application can change a computer's capabilities so drastically, but this holds in every case: although modern processors can be very powerful, especially when combined with the latest generation of memory and disks, they are of no use to us without the right programs. That is why the same machine can go from useless to extremely capable simply because of the software installed on it.

OCR is a special case, since it gives the computer a skill that is basic for most human beings: reading. It is worth noting that reading is not an easy task for either computers or people, although we usually learn it at a very young age and therefore become highly skilled at it, even when faced with handwriting that is hard to decipher.

Its drawbacks

Despite advances in technology, OCR still faces various problems. Getting a digital system to recognize handwritten text, for example, is quite difficult: the process often has trouble segmenting the text into its individual units, such as words and characters. The same thing happens when words appear very close together.
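
A very simplified way to see why spacing matters is the vertical projection profile, a classic segmentation technique: a line of text is split wherever a column contains no ink. The Python sketch below is a toy that assumes an already binarized line represented as strings of `#` (ink) and `.` (paper), a hypothetical format chosen for readability; real OCR engines use far more elaborate methods. When two words touch, no blank column separates them, so they come out as a single unit.

```python
def segment_columns(rows):
    """Split a binarized text line into runs of columns that contain ink.

    rows: equal-length strings in which '#' is ink and '.' is paper.
    Returns a list of (start, end) column ranges, end exclusive.
    """
    width = len(rows[0])
    # Vertical projection: does any row have ink in this column?
    has_ink = [any(row[c] == "#" for row in rows) for c in range(width)]
    segments, start = [], None
    for c, ink in enumerate(has_ink):
        if ink and start is None:
            start = c                    # a new run of ink begins
        elif not ink and start is not None:
            segments.append((start, c))  # a blank column ends the run
            start = None
    if start is not None:                # the run reaches the right edge
        segments.append((start, width))
    return segments


# Hypothetical toy "line" images: two blobs of ink standing in for two words.
spaced = [
    "###..###",
    "#.#..#.#",
    "###..###",
]
touching = [
    "######",
    "#.##.#",
    "######",
]

print(segment_columns(spaced))    # [(0, 3), (5, 8)]
print(segment_columns(touching))  # [(0, 6)]
```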

Other OCR failures can occur when there is not enough contrast between the words and the background. Suppose a text in black letters is printed on a gray sheet of paper: the OCR process may well be unable to make out the letters and words.
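
A common early step in OCR is binarization: each pixel is classified as ink or paper by comparing its gray level with a threshold. The Python sketch below uses made-up pixel values (dark letters at 30 on dark gray paper at 100) and a fixed threshold of 128, both hypothetical choices. With the fixed threshold every pixel is counted as ink, so letters and background become indistinguishable. It then applies Otsu's method, a standard technique that picks the threshold from the image's own histogram, which separates the two here; when letter and background values overlap, however, no single threshold can separate them.

```python
def binarize(pixels, threshold):
    """Map grayscale values (0 = black, 255 = white) to 1 for ink, 0 for paper."""
    return [1 if p <= threshold else 0 for p in pixels]


def otsu_threshold(pixels):
    """Choose the threshold that best separates the values into two classes (Otsu's method)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(value * count for value, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    count_dark, sum_dark = 0, 0
    for t in range(256):
        count_dark += hist[t]
        sum_dark += t * hist[t]
        count_light = total - count_dark
        if count_dark == 0 or count_light == 0:
            continue  # every pixel falls on one side: not a real split
        mean_dark = sum_dark / count_dark
        mean_light = (total_sum - sum_dark) / count_light
        # Between-class variance (up to a constant factor): larger means better separated.
        score = count_dark * count_light * (mean_dark - mean_light) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Hypothetical scan line: dark letters (30) on dark gray paper (100).
line = [100] * 6 + [30] * 2 + [100] * 6 + [30] * 2

print(binarize(line, 128))  # fixed threshold: every pixel counted as ink
t = otsu_threshold(line)
print(t, binarize(line, t))  # 30, with ink only where the letters are
```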

Let us not forget that, just as an apparently simple action such as walking down the street requires a series of complementary actions to avoid obstacles and keep ourselves safe, reading a printed text is the result of several simultaneous recognition tasks, which we carry out almost unconsciously but which nonetheless take effort.

When we face a text, our own internal OCR system locates and recognizes the title and identifies paragraphs, punctuation marks, spaces between words and abbreviations, among other elements. It also works to decipher fonts that are overly ornate or sloppy, and to fill in the information in areas that have suffered some kind of damage, such as an ink stain or a missing piece of paper.

Continue with Scanner →