International Journal on Science and Technology

E-ISSN: 2229-7677     Impact Factor: 9.88

A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal


Self-Healing Architecture for Mission-Critical Healthcare Integration Engines: Autonomous Recovery Mechanisms for Channel Failures, Resource Exhaustion, and Connectivity Disruptions

Author(s) Sindhukumar Sundaram
Country United States
Abstract Healthcare integration engines are mission-critical infrastructure whose availability directly affects clinical care delivery. Channel-level failures, caused by downstream system unavailability, malformed messages, resource exhaustion, or transient network disruptions, require immediate remediation to prevent message loss, workflow interruption, and compromised patient safety. Despite this criticality, current integration engines rely predominantly on manual intervention for failure recovery: operational teams troubleshoot reactively through log analysis, channel restart procedures, and escalation workflows. This manual dependency adds unacceptable latency to recovery, particularly during off-hours when staffing is reduced. This paper designs, implements, and evaluates a Self-Healing Integration Architecture (SHIA) that embeds autonomous detection-diagnosis-recovery control loops within the integration engine layer. SHIA operates through three coordinated subsystems: a real-time health monitor that maintains continuous channel state models, a diagnostic classifier that maps failure signatures to known fault categories, and a recovery orchestrator that executes context-appropriate remediation actions. In a controlled environment processing 300 messages/second across 150 channels, with failures synthetically injected across five fault categories, SHIA recovers autonomously in 94.2% of scenarios with a mean time-to-recovery of 34 seconds, compared with 23 minutes for the manual recovery baseline. Zero-message-loss recovery is achieved in 88% of scenarios through pre-failure queue checkpointing. The false-intervention rate is held at 1.8% through multi-signal confirmation.
Keywords self-healing systems, autonomous recovery, healthcare integration, fault tolerance, resilience engineering, middleware reliability, channel management, mission-critical systems, HL7, FHIR
Field Engineering
Published In Volume 17, Issue 2, April-June 2026
Published On 2026-05-15
DOI https://doi.org/10.71097/IJSAT.v17.i2.11327
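
The detection-diagnosis-recovery loop summarized in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the signal names, thresholds, fault labels, and recovery action strings are all hypothetical. Only the four failure causes named in the abstract are modeled, since the abstract does not enumerate its five fault categories. The point shown is multi-signal confirmation, where the loop intervenes only when at least two independent signals are abnormal, which is one plausible way to keep false interventions low.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Fault(Enum):
    # Hypothetical labels for the four failure causes named in the abstract.
    DOWNSTREAM_UNAVAILABLE = "downstream_unavailable"
    MALFORMED_MESSAGE = "malformed_message"
    RESOURCE_EXHAUSTION = "resource_exhaustion"
    NETWORK_DISRUPTION = "network_disruption"


@dataclass
class ChannelSignals:
    # Hypothetical per-channel health signals sampled over a recent window.
    error_rate: float = 0.0      # fraction of failed deliveries
    queue_depth: int = 0         # messages waiting in the channel queue
    heap_used: float = 0.0       # fraction of available memory in use
    connect_failures: int = 0    # refused or failed connections
    parse_failures: int = 0      # messages rejected by the parser
    timeouts: int = 0            # sends that timed out mid-flight


# Hypothetical mapping from diagnosed fault to remediation action.
RECOVERY_ACTIONS = {
    Fault.DOWNSTREAM_UNAVAILABLE: "pause_channel_and_retry_with_backoff",
    Fault.MALFORMED_MESSAGE: "quarantine_message_and_resume",
    Fault.RESOURCE_EXHAUSTION: "checkpoint_queue_and_restart_channel",
    Fault.NETWORK_DISRUPTION: "reconnect_and_replay_from_checkpoint",
}


def abnormal_signals(s: ChannelSignals) -> dict:
    """Flag each signal that crosses its (assumed) threshold."""
    return {
        "error_rate": s.error_rate > 0.2,
        "queue_depth": s.queue_depth > 1000,
        "heap_used": s.heap_used > 0.9,
        "connect_failures": s.connect_failures >= 3,
        "parse_failures": s.parse_failures >= 3,
        "timeouts": s.timeouts >= 3,
    }


def detect(s: ChannelSignals, min_signals: int = 2) -> bool:
    """Multi-signal confirmation: require several abnormal signals at once."""
    return sum(abnormal_signals(s).values()) >= min_signals


def diagnose(s: ChannelSignals) -> Optional[Fault]:
    """Map the failure signature to a known fault, or None if unrecognized."""
    flags = abnormal_signals(s)
    if flags["heap_used"]:
        return Fault.RESOURCE_EXHAUSTION
    if flags["parse_failures"]:
        return Fault.MALFORMED_MESSAGE
    if flags["connect_failures"]:
        return Fault.DOWNSTREAM_UNAVAILABLE
    if flags["timeouts"]:
        return Fault.NETWORK_DISRUPTION
    return None


def control_step(s: ChannelSignals) -> Optional[str]:
    """One pass of the loop: returns an action name, or None if healthy."""
    if not detect(s):
        return None
    fault = diagnose(s)
    if fault is None:
        # Confirmed anomaly with no known signature: hand off to a human.
        return "escalate_to_operator"
    return RECOVERY_ACTIONS[fault]
```

In this sketch, a channel reporting only connection failures triggers no action, while connection failures combined with an elevated error rate select the backoff-and-retry action. A confirmed anomaly that matches no known signature is escalated rather than handled automatically.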
