
International Journal on Science and Technology
E-ISSN: 2229-7677

Improving Document Digitization with Machine Learning-Based OCR
Author(s) | Sri Charitha Pagadala, Pulletikurthi Nithisha, Pallikonda Rahul, Musiboina Ram Mohan Rao, Dr. Akkineni. Haritha |
---|---|
Country | India |
Abstract | In today’s digital era, extracting text from unstructured formats such as images, PDFs, and handwritten documents is critical for digitization and automation. Traditional methods often struggle with scalability, complex layouts, and multi-language support. This project addresses these challenges by combining Machine Learning, Optical Character Recognition (OCR), the AWS Textract model, and a microservices architecture to create a robust, scalable, and efficient text extraction system. The proposed solution uses Java Spring Boot for backend development, PostgreSQL for secure data storage, and containerized microservices for enhanced modularity and scalability. The system preprocesses images to improve their quality and employs deep learning algorithms for accurate text recognition. Parallel processing and task queuing ensure high throughput and low latency for real-time and bulk operations. By converting unstructured data into structured formats such as JSON or CSV, the system facilitates seamless integration into existing workflows. This study highlights the design, functionality, and benefits of this approach to text extraction, driving efficiency in document management and automation. |
Keywords | Text Extraction, Optical Character Recognition (OCR), Image Processing, AWS Textract, PostgreSQL |
Field | Computer > Artificial Intelligence / Simulation / Virtual Reality |
Published In | Volume 16, Issue 1, January-March 2025 |
Published On | 2025-02-12 |
Cite This | Improving Document Digitization with Machine Learning-Based OCR - Sri Charitha Pagadala, Pulletikurthi Nithisha, Pallikonda Rahul, Musiboina Ram Mohan Rao, Dr. Akkineni. Haritha - IJSAT Volume 16, Issue 1, January-March 2025. DOI 10.71097/IJSAT.v16.i1.1890 |
DOI | https://doi.org/10.71097/IJSAT.v16.i1.1890 |
Short DOI | https://doi.org/g85dkq |
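
The abstract describes a flow of OCR, parallel processing, and conversion to JSON or CSV. The following is a minimal sketch of that flow, not the authors' implementation: it is written in Python for brevity, whereas the paper's backend is Java Spring Boot. The `fake_ocr` function is a hypothetical stand-in that mimics the `Blocks` list (with `BlockType`, `Text`, and `Confidence` entries) returned by Textract text detection, and the record fields `document`, `line`, `text`, and `confidence` are assumptions chosen for illustration.

```python
import csv
import io
import json
from concurrent.futures import ThreadPoolExecutor

FIELDS = ["document", "line", "text", "confidence"]  # assumed output schema


def fake_ocr(document_name):
    """Hypothetical stand-in for an OCR call; returns a Textract-shaped response."""
    return {
        "Blocks": [
            {"BlockType": "PAGE"},
            {"BlockType": "LINE", "Text": f"Invoice from {document_name}", "Confidence": 99.2},
            {"BlockType": "WORD", "Text": "Invoice", "Confidence": 99.5},
            {"BlockType": "LINE", "Text": "Total: 42.00", "Confidence": 97.8},
        ]
    }


def extract_lines(document_name, ocr=fake_ocr):
    """Keep only LINE blocks and flatten them into structured records."""
    response = ocr(document_name)
    line_blocks = [b for b in response["Blocks"] if b["BlockType"] == "LINE"]
    return [
        {"document": document_name, "line": i, "text": b["Text"], "confidence": b["Confidence"]}
        for i, b in enumerate(line_blocks, start=1)
    ]


def process_batch(document_names, max_workers=4):
    """Run OCR on several documents in parallel using a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input names.
        per_document = pool.map(extract_lines, document_names)
        return [record for records in per_document for record in records]


def to_json(records):
    """Serialize the records as a JSON array."""
    return json.dumps(records, indent=2)


def to_csv(records):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    records = process_batch(["scan-1.png", "scan-2.pdf"])
    print(to_json(records))
    print(to_csv(records))
```

In a deployed system the stub would be replaced by a real OCR client, and the thread pool by the task queue the abstract mentions; the conversion functions would remain largely the same.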


All research papers published on this website are licensed under Creative Commons Attribution-ShareAlike 4.0 International License, and all rights belong to their respective authors/researchers.
