International Journal on Science and Technology

E-ISSN: 2229-7677     Impact Factor: 9.88

A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal

Optical Character Recognition Accuracy on Degraded Documents

Author(s) Mr. Marc Jordan Ceballos Saladaga, Florence Jean Talirongan
Country Philippines
Abstract This paper examines the OCR performance of EasyOCR on documents in four physical conditions: normal (undamaged), crumpled, dirty, and wet. The aim is to quantify how physical document degradation affects OCR performance and to help improve digitization workflows for academic and administrative documents. In the experiment, a reference text was printed, the printouts were physically degraded into each condition (with an undamaged copy kept as the baseline), and each was scanned. The scans were then processed and passed to the OCR engine. OCR accuracy was calculated as a rate at the character and word levels, and the collected data were summarized in tables using descriptive statistics.
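The abstract does not state the exact accuracy formula, so the following is only a minimal sketch of one common way to score OCR output at the character and word levels. It assumes accuracy is defined as one minus the Levenshtein edit distance divided by the reference length; the function names (`edit_distance`, `char_accuracy`, `word_accuracy`) are illustrative and not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def accuracy(ref, hyp):
    """Assumed definition: 1 - edit_distance / reference length, floored at 0."""
    if not ref:
        return 1.0 if not hyp else 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


def char_accuracy(reference, ocr_text):
    """Character-level accuracy of OCR output against the ground-truth text."""
    return accuracy(reference, ocr_text)


def word_accuracy(reference, ocr_text):
    """Word-level accuracy, comparing whitespace-separated tokens."""
    return accuracy(reference.split(), ocr_text.split())
```

With this definition, a single misread character lowers word-level accuracy more than character-level accuracy, which is one reason to report both.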
OCR performance was found to depend strongly on the document's physical condition. The normal document achieved an average accuracy of 94.7%; by comparison, the crumpled document averaged 86.9% and the dirty document 80.9%, indicating that wrinkles and smudges distort character shapes and reduce their recognizability. The wet documents had the lowest average accuracy, at 72.7%, suggesting that stroke blurring and faded ink are critical factors in OCR failure. OCR systems are therefore prone to failure on physically damaged documents, with water damage posing the greatest problem.
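To make the comparison with the normal baseline explicit, the reported averages can be turned into accuracy losses in percentage points. This is simple arithmetic on the four figures quoted above; the helper name `drop_from_baseline` is hypothetical.

```python
# Average accuracies (%) as reported in the abstract.
reported = {"normal": 94.7, "crumpled": 86.9, "dirty": 80.9, "wet": 72.7}


def drop_from_baseline(accuracies, baseline="normal"):
    """Accuracy loss in percentage points relative to the baseline condition."""
    base = accuracies[baseline]
    return {condition: round(base - value, 1)
            for condition, value in accuracies.items()
            if condition != baseline}


drops = drop_from_baseline(reported)
print(drops)  # {'crumpled': 7.8, 'dirty': 13.8, 'wet': 22.0}
```

On these figures, the loss for wet documents is nearly three times the loss for crumpled ones.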
Preprocessing methods such as noise reduction, image enhancement, and morphological operations are proposed to restore damaged text before recognition and improve OCR accuracy. An adaptive OCR pipeline with a restoration component and multilingual capability should be incorporated to make institutional digitization more efficient. By providing deeper insight into the susceptibility of OCR systems to document degradation, this work supports accurate automated text extraction for archival purposes, academic information resources, and research on the design of digitization techniques.
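The abstract names the proposed preprocessing steps but not how they would be implemented. The sketch below illustrates one possible reading in dependency-free Python: a 3x3 median filter for noise reduction, linear contrast stretching for enhancement, and a morphological closing to reconnect broken strokes. The representation (a grayscale image as a list of rows with values 0..255), the fixed threshold of 128, and all function names are assumptions made for illustration. A real pipeline would more likely use an image-processing library such as OpenCV.

```python
from statistics import median


def neighbors(img, y, x):
    """Values in the 3x3 window around (y, x), clipped at the image border."""
    h, w = len(img), len(img[0])
    return [img[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]


def apply_window(img, func):
    """Apply func to the 3x3 neighborhood of every pixel."""
    return [[func(neighbors(img, y, x)) for x in range(len(img[0]))]
            for y in range(len(img))]


def median_filter(img):
    """Noise reduction: replace each pixel by the median of its window."""
    return apply_window(img, lambda values: int(median(values)))


def stretch_contrast(img):
    """Enhancement: linearly rescale intensities to the full 0..255 range."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image, nothing to stretch
        return [row[:] for row in img]
    return [[(v - lo) * 255 // (hi - lo) for v in row] for row in img]


def binarize(img, threshold=128):
    """Mark dark pixels (ink) as 1 and light pixels (paper) as 0."""
    return [[1 if v < threshold else 0 for v in row] for row in img]


def close_gaps(binary):
    """Morphological closing: dilation followed by erosion (3x3 window)."""
    dilated = apply_window(binary, max)
    return apply_window(dilated, min)


def preprocess(img):
    """Denoise, enhance, binarize, then close small gaps in the strokes."""
    return close_gaps(binarize(stretch_contrast(median_filter(img))))
```

The order matters in this sketch: the median filter runs first so that isolated specks are removed before contrast stretching can amplify them, and closing runs last because it operates on the binary ink mask.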
Keywords Optical Character Recognition, EasyOCR, document degradation, accuracy analysis, digitization
Field Computer > Data / Information
Published In Volume 17, Issue 2, April-June 2026
Published On 2026-06-02
