Incorporate Structure and Content for Devanagari Table Extraction

Author(s)	Ms. Anuja Ramu Dumada, Prof. Sandeep G. Shah
Country	India
Abstract	Table extraction is a crucial component of document image analysis, enabling the transformation of unstructured document content into structured, machine-readable data. Although significant progress has been achieved for English and other Latin-script documents, table extraction from Devanagari-script documents remains challenging due to script complexity, limited annotated datasets, and poor-quality scans. This paper proposes a hybrid framework that combines structural analysis with content-based validation to improve table extraction accuracy for Devanagari documents. The framework integrates preprocessing, Devanagari-specific OCR post-correction, semantic consistency checks, and confidence-based feature fusion. Experimental results on more than 500 annotated Devanagari documents show improved performance over existing methods, with high structural precision, high OCR accuracy, and a high overall TEDS-S score.
Keywords	Table Extraction, Devanagari Script, Document Image Analysis, OCR, Hybrid Framework, TEDS-S
Field	Computer
Published In	Volume 17, Issue 1, January-March 2026
Published On	2026-02-06
DOI	https://doi.org/10.71097/IJSAT.v17.i1.10314

International Journal on Science and Technology