Which font is difficult to OCR, but is human readable?
Obviously, if you go all out and make the text unrecognizable, you end up with a CAPTCHA. But it is also important that a whole page of such text can be read without straining. That is, gothic and "handwritten" fonts are out. What's left?
Well, from what I found back when I was cracking captchas: all kinds of noise and contrast tricks are removed easily (as written above), and varying the colors doesn't help either. The more or less effective options are:
- characters with gaps, like old stencils where letters and numbers were traced with a pen. The gaps must be comparable in size to the letters and the spaces between them: a couple of pixels in a 72pt font won't help at all, the gaps have to be proportional.
- letters overlapping one another, although readability starts to suffer.
- outlined characters, i.e. the inside of the character is filled with the background color and only a 1-pixel border is visible. If that border is also dotted, most OCR engines will discard these letters as noise.
- heavily distorted characters. Wave distortions can usually be undone, although each one needs an individual approach. Another distortion works better (I don't remember what it's called in Photoshop): put a rectangular polygon under the character and drag several of its vertices out of proportion, something like a pseudo-3D transformation. Once the letter is stretched disproportionately, OCR starts to fail.
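To make the stencil and outline ideas above concrete, here is a minimal sketch. It assumes glyphs are tiny bitmaps stored as lists of strings ('#' = ink, '.' = background) rather than real rendered images, and the function names are hypothetical; in practice you would apply the same operations to a rasterized font with an image library.

```python
# Minimal sketch of three "OCR-unfriendly" glyph transforms from the list above.
# Assumption: a glyph is a list of equal-length strings, '#' = ink, '.' = background.

def stencil(glyph: list[str], gap_fraction: float = 0.2) -> list[str]:
    """Erase a horizontal band across the middle; its height scales with the glyph height."""
    h = len(glyph)
    gap = max(1, round(h * gap_fraction))
    start = (h - gap) // 2
    return ["." * len(row) if start <= y < start + gap else row for y, row in enumerate(glyph)]


def outline(glyph: list[str]) -> list[str]:
    """Keep only ink pixels that touch the background (4-neighbourhood) or the image edge."""
    h, w = len(glyph), len(glyph[0])

    def ink(y: int, x: int) -> bool:
        # Pixels outside the bitmap count as background.
        return 0 <= y < h and 0 <= x < w and glyph[y][x] == "#"

    rows = []
    for y in range(h):
        row = []
        for x in range(w):
            interior = ink(y, x) and all(
                ink(y + dy, x + dx) for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
            )
            row.append("#" if ink(y, x) and not interior else ".")
        rows.append("".join(row))
    return rows


def dotted(glyph: list[str], period: int = 2) -> list[str]:
    """Turn solid strokes into dots by keeping only ink pixels where (x + y) % period == 0."""
    return [
        "".join(c if c != "#" or (x + y) % period == 0 else "." for x, c in enumerate(row))
        for y, row in enumerate(glyph)
    ]


if __name__ == "__main__":
    square = ["#####"] * 5
    print("\n".join(dotted(outline(square))))
```

Chaining `outline` and `dotted` gives the "1-pixel dotted frame" variant; `gap_fraction` expresses the gap as a share of the glyph height, which is one way to keep gaps proportional to the font size as the answer recommends.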
But all of this is only good for captchas.
Any font can be recognized if the letters are consistent and there is no interference (elements resembling parts of letters). In theory, recognition gets harder the more the letters resemble each other (for example, "O" and "P") or intersect. To make the task harder, you can distort the letters (rotation, non-linear compression, shifted positions, etc.) and add noise (for example, lines in the text color crossing the letters). But then, as you said, you end up with one big captcha, which the reader is unlikely to be happy with.
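The distortion-plus-noise approach described above can be sketched as follows. This is an illustration under assumptions: glyphs are again small lists of strings ('#' = ink, '.' = background), the sine-wave column shift stands in for "non-linear compression", and the function names are hypothetical.

```python
import math
import random

# Assumption: a glyph is a list of equal-length strings, '#' = ink, '.' = background.


def wave(glyph: list[str], amplitude: float = 1.0, wavelength: float = 4.0) -> list[str]:
    """Non-linear warp: shift each column vertically by amplitude * sin(2*pi*x / wavelength)."""
    h, w = len(glyph), len(glyph[0])
    out = [["."] * w for _ in range(h)]
    for x in range(w):
        shift = round(amplitude * math.sin(2 * math.pi * x / wavelength))
        for y in range(h):
            src = y - shift  # pixel that lands at (y, x) after the shift
            if 0 <= src < h:
                out[y][x] = glyph[src][x]
    return ["".join(r) for r in out]


def add_line_noise(glyph: list[str], rng: random.Random) -> list[str]:
    """Draw one straight ink-colored line from the left edge to the right edge at random heights."""
    h, w = len(glyph), len(glyph[0])
    y0, y1 = rng.randrange(h), rng.randrange(h)
    rows = [list(r) for r in glyph]
    for x in range(w):
        # Linear interpolation between the two endpoints, snapped to the pixel grid.
        y = round(y0 + (y1 - y0) * x / max(1, w - 1))
        rows[y][x] = "#"
    return ["".join(r) for r in rows]


if __name__ == "__main__":
    bar = ["....", "####", "...."]
    print("\n".join(add_line_noise(wave(bar), random.Random(0))))
```

Because the noise line uses the same color as the letters, a simple threshold cannot separate it from the strokes, which is exactly why it also hurts human readability.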