International Journal on Science and Technology
E-ISSN: 2229-7677 • Impact Factor: 9.88
A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal
Optical Character Recognition Accuracy on Degraded Documents
| Author(s) | Mr. Marc Jordan Ceballos Saladaga, Florence Jean Talirongan |
|---|---|
| Country | Philippines |
| Abstract | This paper examines the OCR performance of EasyOCR on documents in four physical conditions: normal (undamaged), crumpled, wet, and dirty. It seeks to understand how physical degradation affects OCR performance and to help improve digitization workflows for academic and administrative documents. In the experiment, a prepared text was printed, the printed copies were physically degraded according to their assigned condition, and each copy was scanned. The scans were then processed and passed to the OCR engine. Accuracy rates were calculated at the character and word levels, and the results were summarized in tables using descriptive statistics. OCR performance was found to depend largely on the document's physical condition. The normal document achieved an average accuracy of 94.7%; by comparison, the crumpled document averaged 86.9% and the dirty document 80.9%, suggesting that wrinkles and smudges distort character shapes and reduce their recognizability. The wet documents had the lowest average accuracy, at 72.7%, indicating that stroke blurring and faded ink are critical factors in OCR failure. OCR systems are therefore prone to failure on physically damaged documents, with water damage posing the greatest problem. The authors propose applying preprocessing methods, such as noise reduction, image enhancement, and morphological operations, to restore damaged text and improve OCR accuracy. They further recommend an adaptive OCR pipeline with a restoration component and multilingual capability to make institutional digitization more efficient. By providing deeper insight into the susceptibility of OCR systems to document degradation, this work supports accurate automated text extraction, archival work, academic information resources, and research on the design of digitization techniques. |
| Keywords | Optical Character Recognition, EasyOCR, document degradation, accuracy analysis, digitization |
| Field | Computer > Data / Information |
| Published In | Volume 17, Issue 2, April-June 2026 |
| Published On | 2026-06-02 |
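
The abstract reports accuracy at the character and word levels but does not state the formula used. As a minimal sketch, assuming accuracy is defined as one minus the Levenshtein edit distance divided by the length of the reference text (a common convention, not confirmed by the paper), the two measures could be computed as follows. The function names `edit_distance`, `char_accuracy`, and `word_accuracy` are illustrative and do not come from the paper or from EasyOCR.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (characters or words)."""
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def _accuracy(ref: Sequence, hyp: Sequence) -> float:
    """Assumed definition: 1 - distance / len(ref), clamped at 0."""
    if not ref:
        return 1.0 if not hyp else 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


def char_accuracy(reference: str, ocr_output: str) -> float:
    """Character-level accuracy of OCR output against the ground truth."""
    return _accuracy(reference, ocr_output)


def word_accuracy(reference: str, ocr_output: str) -> float:
    """Word-level accuracy, with words split on whitespace."""
    return _accuracy(reference.split(), ocr_output.split())
```

Under this definition, a single misread character in a four-word reference lowers character accuracy only slightly but costs a full word at the word level, which is why the two measures are usually reported side by side.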
CrossRef DOI is assigned to each research paper published in our journal.
IJSAT DOI prefix is 10.71097/IJSAT.
All research papers published on this website are licensed under Creative Commons Attribution-ShareAlike 4.0 International License, and all rights belong to their respective authors/researchers.