Hospitals Buy Clinical AI Like Software. It Decays Like a Drug.

The Joint Commission started certifying hospitals for responsible AI in June 2026. The standard most will fail is the one that treats a clinical model as a perishable asset, not a one-time install.

The Joint Commission began certifying hospitals for responsible AI use in the first week of June 2026, and the standard most of them will struggle with is not the one about governance committees or patient disclosure. It is the requirement to keep watching the model after go-live [HCI Innovation Group, 2026]. Most health systems are not built to. They buy clinical AI the way they buy software: install it, validate it once, move on to the next contract. A clinical model does not behave like software. It behaves like a drug. It works on the population it was tested in, its effect shifts as that population shifts, and left unwatched it can start doing harm while everyone assumes it still works.

The Problem

Health systems are deploying clinical AI faster than anyone can monitor it, and at most hospitals nobody owns the monitoring.

Adoption stopped being the interesting question. Seventy-five percent of US health systems now run at least one AI application, up from 59% a year earlier, and half run three or more [Fierce Healthcare, 2026]. The FDA has authorized more than 1,400 AI-enabled devices since it started counting, 331 of them in 2025, the most in any single year [MedTech Dive, 2026]. The tools are in the building. What is missing is a named person whose job is to confirm they still work next quarter.

The evidence that they might not is already published. A March 2026 meta-analysis in PLOS Digital Health pooled 50 studies of predictive clinical AI across 17 specialties and found a median AUC of 0.652: moderate discrimination, well short of the figure on most sales decks [PLOS Digital Health, 2026]. The numbers underneath are worse than the average. Only 24% of those tools were ever tested in a live prospective deployment. Roughly 70% were never externally validated, meaning never tested on data beyond what they were built on. Algorithmic bias was formally assessed in 4% of the studies [PLOS Digital Health, 2026]. A model that looks strong in a vendor’s retrospective dataset and has never met your patients is a hypothesis, not an instrument.

What happens after go-live is where the drug comparison stops being a figure of speech. Clinical models drift. The inputs move under them: an EHR upgrade rewrites documentation patterns, a coding system updates, the patient mix changes, a new order set alters how data even gets entered. The model keeps scoring as if the ground never moved. A 2022 study in the International Journal of Medical Informatics simulated exactly this for sepsis prediction and found that when a population shock on the scale of COVID-19 hit, a stale model’s discrimination sank to an AUROC of 0.811 while a retrained version held 0.868 [Int J Med Inform, 2022]. The more unsettling pattern in that literature is that drift often shows up first in calibration, not discrimination: a model can hold its headline accuracy while its risk estimates quietly skew high, so the dashboard looks fine right up until the predictions are wrong [Int J Med Inform, 2022].
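
To make the calibration point concrete, here is a minimal sketch on synthetic data, not the cited study's method or results. It assumes a made-up cohort and a hypothetical drift in which every risk estimate inflates by half. Because the inflation preserves the ranking of patients, the AUROC does not move, while the gap between mean predicted risk and the observed event rate opens up. The function names and the cohort are illustrative assumptions.

```python
import random

def auroc(scores, labels):
    """Chance that a random positive case outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def calibration_gap(scores, labels):
    """Mean predicted risk minus observed event rate (calibration-in-the-large)."""
    return sum(scores) / len(scores) - sum(labels) / len(labels)

rng = random.Random(0)

# Synthetic cohort (assumption): each patient's true risk lies between 2% and 40%,
# and the outcome is drawn from that risk.
true_risk = [rng.uniform(0.02, 0.40) for _ in range(2000)]
labels = [1 if rng.random() < r else 0 for r in true_risk]

# A well-calibrated model reports the true risk.
calibrated = true_risk

# Hypothetical drift: every estimate is inflated by 50%. The ordering of
# patients is unchanged, so discrimination cannot change.
drifted = [min(1.0, 1.5 * r) for r in true_risk]

for name, scores in [("calibrated", calibrated), ("drifted", drifted)]:
    print(f"{name:>10}  AUROC={auroc(scores, labels):.3f}  "
          f"gap={calibration_gap(scores, labels):+.3f}")
```

A dashboard that tracks only AUROC would show two identical lines here. The second column is the one that moves, which is an argument for putting a calibration measure next to the discrimination figure on any monitoring report.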

The Insight

The fix is not smarter procurement. It is a monitoring discipline most hospitals already run for something else entirely: drugs.

Pharmacy has spent decades on pharmacovigilance: post-market surveillance that assumes a product’s real-world behavior can diverge from its trial results and builds standing systems to catch the divergence, namely continuous adverse-event reporting and the authority to pull a product that turns dangerous. Clinical AI needs the same posture and almost never gets it. A model is approved once, by a committee that reconvenes only when something breaks visibly, and visible is the trap. A drifting risk score does not crash or throw an error. It returns a number that looks exactly as authoritative as it did on launch day, and clinicians, already inclined to trust the machine, act on it. That trust is the multiplier on the risk, and deskilling makes it worse the longer a tool sits in the workflow: the more clinicians lean on the score, the less practiced they become at catching its errors.

“A drifting clinical model does not fail loudly. It keeps returning a confident number while quietly getting it wrong, and the clinician trusts it exactly as much as the day it was installed.”

Here is the part a cautious health system would rather not put in writing: the Joint Commission certification, voluntary today, is the early shape of a standard that liability will eventually make mandatory. The program organizes its requirements around five areas, and one is explicitly monitoring, evaluating, and validating safety and performance over time [HCI Innovation Group, 2026]. CHAI’s companion governance playbooks, released the same week, name “Responsible AI Lifecycle Management” as a core element rather than a one-time gate [HCI Innovation Group, 2026]. The guidance behind it is specific about what that means on the ground: name the party responsible for ongoing local monitoring, run regular revalidation against your own data, and keep a live channel to the vendor for performance updates [Fierce Healthcare, 2025]. Read that as a rehearsal for the question a plaintiff’s attorney asks after the first patient is harmed by a model nobody had checked in eighteen months.

Treating this as a compliance cost gets the strategy backwards. The system that stands up real AI monitoring now owns a capability its competitors will be scrambling to buy the moment an accreditor or a court makes it non-optional.

Real-World Application

You cannot revalidate every model every month, and the meta-analysis numbers say most systems are starting from roughly zero. So tier the surveillance by clinical risk, the way a P&T committee already tiers drug review.

Model class	What drift looks like	Monitoring cadence	Who owns it
High-acuity clinical (sepsis, deterioration, dosing)	Missed events or alert surges after an EHR or population change	Continuous metric tracking, formal revalidation quarterly	Clinical informatics with pharmacy, never the vendor alone
Medication-risk and interaction models	Flag rates drift as formulary and order sets change	Monthly flag-rate review, revalidate on any order-set change	Pharmacy informatics
Documentation and coding AI	Silent shifts in coding patterns, compliance exposure	Sampled audit each quarter	HIM and compliance
Administrative and operational	Accuracy slippage that costs money, not safety	Annual review	Operations owner

Two rules make the table real. First, every model carries a named human owner and a date of last validation, the AI version of a lot number and an expiration date. If you cannot answer “who checked this, and when,” the model is past expiry and you have no way to know it. Second, a rising alert-override rate is a surveillance signal, not background noise to be tuned away. An override spike is often the first sign a model has drifted, the same way a cluster of adverse-event reports is the first sign of a drug problem, and the instinct to suppress the alerts buries the evidence.
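
Both rules are simple enough to sketch in a few lines. What follows is an illustration under stated assumptions, not a reference implementation: the tier names, the day counts (a loose reading of the cadences in the table above), the eight-week baseline, and the 1.5x spike threshold are all hypothetical and would need to be set locally.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical revalidation windows in days, loosely following the table's
# cadences. Event-driven triggers (such as an order-set change) are not modeled.
REVALIDATION_DAYS = {
    "high_acuity": 90,       # formal revalidation quarterly
    "medication_risk": 30,   # monthly flag-rate review
    "documentation": 90,     # sampled audit each quarter
    "administrative": 365,   # annual review
}

@dataclass
class DeployedModel:
    name: str
    tier: str
    owner: Optional[str]            # a named human, not a vendor
    last_validated: Optional[date]  # the model's "expiration date" anchor

def is_past_expiry(model: DeployedModel, today: date) -> bool:
    """Rule one: no named owner or no validation date means expired by default."""
    if not model.owner or model.last_validated is None:
        return True
    window = timedelta(days=REVALIDATION_DAYS[model.tier])
    return today - model.last_validated > window

def override_spike(weekly_rates, baseline_weeks=8, ratio=1.5):
    """Rule two: flag the latest week if its override rate exceeds
    `ratio` times the mean of the preceding baseline weeks."""
    if len(weekly_rates) <= baseline_weeks:
        return False  # not enough history to form a baseline
    baseline = weekly_rates[-baseline_weeks - 1:-1]
    mean = sum(baseline) / len(baseline)
    return mean > 0 and weekly_rates[-1] > ratio * mean

if __name__ == "__main__":
    today = date(2026, 6, 1)
    catalog = [
        DeployedModel("sepsis-alert", "high_acuity", "J. Doe", date(2025, 12, 1)),
        DeployedModel("coding-assist", "documentation", None, date(2026, 5, 1)),
    ]
    for m in catalog:
        print(m.name, "PAST EXPIRY" if is_past_expiry(m, today) else "current")
    print("override spike:", override_spike([0.20] * 8 + [0.35]))
```

The design choice that matters is the default: a model with a blank owner or a blank date is reported as expired, not skipped. A spike flag is a prompt to revalidate, not a verdict that the model has drifted.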

Pharmacy is the natural owner of the medication-risk tier and the working template for the rest. Pharmacists already verify against risk, and they already run the rank-and-release logic good clinical AI demands, where the model flags and the pharmacist holds the authority. They think in product lifecycles rather than one-time approvals. The Epic Sepsis Model is the warning the rest of the catalog should heed. When University of Michigan researchers finally ran an independent external validation, the model that had shipped to hundreds of hospitals posted an AUC of 0.63, missed 1,709 of 2,552 sepsis cases (67% of them), and fired alerts on 18% of all hospitalizations, performance the authors called substantially worse than the developer had reported [JAMA Internal Medicine, 2021]. The tool was not broken in some exotic way. It had simply never been checked against the patients it was scoring.

The Bottom Line

The adoption numbers and the monitoring numbers describe two different organizations wearing the same logo. One runs clinical AI in 75% of health systems and can draw on more than 1,400 FDA-authorized devices [Fierce Healthcare, 2026; MedTech Dive, 2026]. The other externally validated roughly 30% of its models and prospectively tested 24% [PLOS Digital Health, 2026]. The space between those two organizations is the risk surface, and the Joint Commission just stapled a certification to closing it.

The systems exposed here did not buy the wrong tool. They bought a good tool and assumed good was permanent. A model validated in 2024 against 2024 patients, running untouched in 2026 through an EHR upgrade and two formulary changes, is not the instrument the committee signed off on. It is a confident number generator trading on a clinical reputation it may no longer earn, and the deeper clinicians trust it, the more a silent drift costs before anyone catches it. The right scorecard for clinical AI was never accuracy at launch. It is whether the number still holds.

Pharmacy and clinical informatics are positioned to own this, because the discipline already exists under an older name. So put the question to your own catalog, plainly: for your highest-risk deployed model, who owns it, and what date sits on its last revalidation? If the honest answer is a vendor’s name and a blank, the model is already past expiry. You just have not been billed for it yet.