International Journal on Science and Technology

E-ISSN: 2229-7677     Impact Factor: 9.88

A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal


Improving Document Digitization with Machine Learning-Based OCR

Author(s) Sri Charitha Pagadala, Pulletikurthi Nithisha, Pallikonda Rahul, Musiboina Ram Mohan Rao, Dr. Akkineni. Haritha
Country India
Abstract In today’s digital era, extracting text from unstructured formats such as images, PDFs, and handwritten documents is critical for digitization and automation. Traditional methods often struggle with scalability, complex layouts, and multi-language support. This project addresses these challenges by combining machine learning, Optical Character Recognition (OCR), AWS Textract, and a microservices architecture to create a robust, scalable, and efficient text extraction system.
The proposed solution integrates Java Spring Boot for backend development, PostgreSQL for secure data storage, and containerized microservices for modularity and scalability. The system preprocesses images to improve their quality and employs deep learning algorithms for accurate text recognition. Parallel processing and task queuing provide high throughput and low latency for both real-time and bulk operations.
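The parallel processing and task queuing mentioned above can be illustrated with a minimal Java sketch. It assumes a fixed-size worker pool fed by the executor's internal queue; the class `ExtractionQueue` and the method `extractText` are hypothetical names, and `extractText` is only a placeholder for the real OCR call, which the abstract does not describe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExtractionQueue {

    // Hypothetical placeholder for the real OCR step (for example, a call
    // to an OCR service). It only echoes the document name.
    static String extractText(String documentName) {
        return "text of " + documentName;
    }

    // Queues one task per document on a fixed-size pool and returns the
    // results in the same order as the input list.
    static List<String> extractAll(List<String> documentNames, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String name : documentNames) {
                // Tasks beyond the pool size wait in the executor's queue.
                pending.add(pool.submit(() -> extractText(name)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get()); // blocks until that task finishes
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while extracting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Extraction task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(extractAll(List.of("a.png", "b.pdf"), 2));
    }
}
```

Collecting the `Future` objects in submission order keeps the output aligned with the input even when tasks finish out of order. A production system would more likely place a durable message queue between services, which this sketch does not attempt.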
By converting unstructured data into structured formats such as JSON or CSV, the system integrates readily into existing workflows. This study presents the design, functionality, and benefits of this approach to text extraction, which improves efficiency in document management and automation.
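As a rough illustration of the structured output step, the following Java sketch serializes a set of extracted key-value fields to a JSON object and to a CSV row. The class `StructuredOutput` and the field names in the usage note are assumptions made for illustration; the abstract does not specify the output schema. The sketch uses only the standard library, although a real service would normally rely on a JSON library.

```java
import java.util.Map;
import java.util.stream.Collectors;

public class StructuredOutput {

    // Escapes backslashes and double quotes for a JSON string literal.
    // Control characters are not handled in this sketch.
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Renders the fields as a flat JSON object of string values,
    // in the iteration order of the map.
    static String toJson(Map<String, String> fields) {
        return fields.entrySet().stream()
            .map(e -> "\"" + jsonEscape(e.getKey()) + "\":\""
                    + jsonEscape(e.getValue()) + "\"")
            .collect(Collectors.joining(",", "{", "}"));
    }

    // Quotes a CSV value when it contains a comma, quote, or newline,
    // doubling any embedded quotes (RFC 4180 style).
    static String csvEscape(String s) {
        if (s.contains(",") || s.contains("\"") || s.contains("\n")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }

    // Renders the field values as one CSV row, in map iteration order.
    static String toCsvRow(Map<String, String> fields) {
        return fields.values().stream()
            .map(StructuredOutput::csvEscape)
            .collect(Collectors.joining(","));
    }
}
```

Passing a `LinkedHashMap` with, say, `invoice_no` and `vendor` entries (hypothetical field names) keeps the columns in insertion order, so the JSON keys and the CSV columns line up across documents.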
Keywords Text Extraction, Optical Character Recognition (OCR), Image Processing, AWS Textract, PostgreSQL.
Field Computer > Artificial Intelligence / Simulation / Virtual Reality
Published In Volume 16, Issue 1, January-March 2025
Published On 2025-02-12
Cite This Improving Document Digitization with Machine Learning-Based OCR - Sri Charitha Pagadala, Pulletikurthi Nithisha, Pallikonda Rahul, Musiboina Ram Mohan Rao, Dr. Akkineni. Haritha - IJSAT Volume 16, Issue 1, January-March 2025. DOI 10.71097/IJSAT.v16.i1.1890
DOI https://doi.org/10.71097/IJSAT.v16.i1.1890
Short DOI https://doi.org/g85dkq
