KAIZEN: Governed Continual Improvement for LLM-Backed Enterprise Systems
Drift Detection, Intervention Selection, and Progressive Delivery for Multi-Artifact LLM Applications

Sandeep Nutakki

doi:10.71097/IJSAT.v17.i2.11319

Author(s)	Sandeep Nutakki
Country	United States
Abstract	Large language model (LLM) applications change after launch through prompts, retrieval indexes, tool schemas, routing rules, guardrails, and fine-tuned weights, yet many MLOps practices still monitor only checkpoints and aggregate accuracy. This paper presents KAIZEN, an architecture and controlled replay benchmark for governed continual improvement of LLM-backed enterprise systems. KAIZEN combines behavioral-semantic drift detection, budget-aware intervention selection, human-in-the-loop curation, and risk-tiered progressive delivery. We evaluate KAIZEN in a controlled 18-month replay spanning 43,200 synthetic enterprise requests, 36 injected drift events, 54 candidate releases, and 1,440 adjudicated evaluation cases. The study is not a production deployment; it is a reproducible replay under known ground truth. In the controlled replay, KAIZEN improved incident detection F1 from 0.58 to 0.82 under the stated workload and drift assumptions, and detected degradation a median of 5.7 days earlier (95% clustered bootstrap CI: 4.3-7.1, p < 0.001). Under the same assumptions, KAIZEN reduced unnecessary retraining actions by 38.9% relative to scheduled monthly retraining (95% CI: 29.4-47.6, p = 0.002) and lowered total simulated improvement cost by 46.2% (95% CI: 39.4-52.8, p < 0.001). Compared with direct rollout in the replay, progressive delivery reduced simulated user-visible regression exposure from 19.6% to 3.1% of affected traffic (95% CI for relative reduction: 79.1-88.4, p < 0.001). These results support KAIZEN as an architecture and benchmark design; they do not establish field performance.
Keywords	MLOps, LLMOps, model monitoring, concept drift, large language models, continual learning, progressive delivery, human-in-the-loop learning, model governance
Field	Engineering
Published In	Volume 17, Issue 2, April-June 2026
Published On	2026-06-04
DOI	https://doi.org/10.71097/IJSAT.v17.i2.11319
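
The abstract gives the exposure figures as raw percentages (19.6% and 3.1%) but gives the confidence interval for the relative reduction. The short sketch below makes that conversion explicit. It is only an arithmetic check on the numbers quoted in the abstract, not code from the paper, and the function name `relative_reduction` is a hypothetical helper introduced for illustration.

```python
def relative_reduction(baseline_pct: float, treated_pct: float) -> float:
    """Return the relative reduction from baseline to treated, in percent.

    Hypothetical helper for illustration; not part of KAIZEN.
    """
    if baseline_pct <= 0:
        raise ValueError("baseline_pct must be positive")
    return (baseline_pct - treated_pct) / baseline_pct * 100.0


# Figures quoted in the abstract: regression exposure under direct rollout
# versus progressive delivery, as a share of affected traffic.
direct_rollout_pct = 19.6
progressive_pct = 3.1

reduction = relative_reduction(direct_rollout_pct, progressive_pct)
print(f"relative reduction: {reduction:.1f}%")  # prints "relative reduction: 84.2%"

# The abstract reports a 95% CI of 79.1-88.4 for this quantity.
ci_low, ci_high = 79.1, 88.4
print("point estimate inside reported CI:", ci_low <= reduction <= ci_high)
```

The point estimate of about 84.2% falls inside the reported interval, so the raw exposure figures and the interval are mutually consistent.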

All research papers published on this website are licensed under Creative Commons Attribution-ShareAlike 4.0 International License, and all rights belong to their respective authors/researchers.

International Journal on Science and Technology

A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal
