Improving Document Digitization with Machine Learning-Based OCR

Sri Charitha Pagadala; Pulletikurthi Nithisha; Pallikonda Rahul; Musiboina Ram Mohan Rao; Dr. Akkineni. Haritha

doi:10.71097/IJSAT.v16.i1.1890

Author(s)	Sri Charitha Pagadala, Pulletikurthi Nithisha, Pallikonda Rahul, Musiboina Ram Mohan Rao, Dr. Akkineni. Haritha
Country	India
Abstract	In today’s digital era, extracting text from unstructured formats such as images, PDFs, and handwritten documents is critical for digitization and automation. Traditional methods often struggle with scalability, complex layouts, and multi-language support. This project addresses these challenges by combining Machine Learning, Optical Character Recognition (OCR), the AWS Textract model, and a microservices architecture to create a robust, scalable, and efficient text extraction system. The proposed solution integrates Java Spring Boot for backend development, PostgreSQL for secure data storage, and containerized microservices for enhanced modularity and scalability. The system performs preprocessing to improve image quality and employs deep learning algorithms for accurate text recognition. Parallel processing and task queuing ensure high throughput and low latency for real-time and bulk operations. By converting unstructured data into structured formats such as JSON or CSV, the system facilitates seamless integration into existing workflows. This study highlights the design, functionality, and benefits of this approach to text extraction, which improves efficiency in document management and automation.
Keywords	Text Extraction, Optical Character Recognition (OCR), Image Processing, AWS Textract, PostgreSQL
Field	Computer > Artificial Intelligence / Simulation / Virtual Reality
Published In	Volume 16, Issue 1, January-March 2025
Published On	2025-02-12
DOI	https://doi.org/10.71097/IJSAT.v16.i1.1890
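
The abstract mentions parallel processing of pages and conversion of the recognized text into JSON or CSV. The following is a minimal illustrative sketch of that flow, not the authors' implementation: it is written in Python for brevity (the paper's backend is Java Spring Boot), and the function names `recognize`, `extract_all`, `to_json`, and `to_csv`, the record fields, and the sample pages are all hypothetical. The `recognize` function is an offline stand-in for the OCR step and does not call AWS Textract.

```python
import csv
import io
import json
from concurrent.futures import ThreadPoolExecutor


def recognize(page):
    """Hypothetical stand-in for the OCR step; no real OCR service is called.

    Each "page" already carries its raw text so the sketch runs offline.
    """
    doc_id, page_no, raw_text = page
    # Collapse runs of whitespace as a trivial placeholder for post-processing.
    return {"document": doc_id, "page": page_no, "text": " ".join(raw_text.split())}


def extract_all(pages, workers=4):
    """Run the recognizer over all pages in parallel, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(recognize, pages))


def to_json(records):
    """Serialize the extracted records as a JSON array."""
    return json.dumps(records, ensure_ascii=False)


def to_csv(records):
    """Serialize the extracted records as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["document", "page", "text"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Assumed sample input: (document id, page number, raw recognized text).
pages = [
    ("invoice-1", 1, "Total   due:\n 42.00"),
    ("invoice-1", 2, "Thank  you"),
]
records = extract_all(pages)
print(to_json(records))
print(to_csv(records))
```

A thread pool is used here because OCR calls to a remote service would typically be I/O-bound; a production system of the kind described would more likely place jobs on a durable queue consumed by separate worker services.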

International Journal on Science and Technology

A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal
