Medical AI tools promise to improve patient diagnoses, reduce physician workload, and streamline hospital operations. But do these tools keep their promises? To answer that question, Stanford researchers developed an open-source framework that lets health systems determine whether an AI technology would bring more benefit than harm to their workflows and patient outcomes.
Often, healthcare providers deploying out-of-the-box AI tools have no effective process for monitoring their usefulness over time. This is where the Stanford framework comes in: the Fair, Useful, and Reliable AI Models (FURM) assessment, already in use at Stanford Health Care, is evaluating technologies ranging from an early detector of peripheral arterial disease to a risk-prediction model for cancer patients to a chest CT analysis model that could assess whether a person would benefit from a statin prescription.
“One of the key findings we have on our campus is that the benefits we get from any AI solution or model are inextricably linked to the workflow in which it operates and to whether we have the time and resources needed to actually use it in a busy health care setting,” said Dr. Nigam Shah, co-creator of the FURM framework, professor of medicine and of biomedical data science at Stanford, and chief data scientist for Stanford Health Care.
Other researchers and developers are drafting guidance to ensure AI is safe and equitable, Shah said, but a critical gap lies in assessing a technology’s usefulness and how realistically it can be implemented: what works for one health system may not work for another.
How FURM works
The FURM assessment has three stages:
- The what and why: Understand what problems the AI model would solve, how its results would be used, and the impact on patients and the healthcare system. This part of the process also projects financial viability and evaluates ethical considerations.
- The how: Determine whether it is realistic to deploy the model into healthcare system workflows as planned.
- The impact: Plan the initial verification of benefits, and determine how to monitor the model’s results and evaluate its performance once it is operational.
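To make the three stages concrete, here is a minimal sketch of how a health system might track a FURM-style review as a structured checklist. Every field name, stage question, and example value below is illustrative only; none of it is drawn from the published FURM materials or from Stanford Health Care’s actual deployments.

```python
from dataclasses import dataclass, field

@dataclass
class FURMAssessment:
    """Illustrative record of a FURM-style review for one candidate AI tool.

    The three groups of fields mirror the stages described above; the names
    are hypothetical examples, not part of the published framework.
    """
    tool_name: str
    # Stage 1 - the what and why
    problem_statement: str = ""
    intended_use_of_output: str = ""
    financially_viable: bool | None = None
    ethical_concerns: list[str] = field(default_factory=list)
    # Stage 2 - the how
    fits_existing_workflow: bool | None = None
    staffing_and_time_available: bool | None = None
    # Stage 3 - the impact
    benefit_verification_plan: str = ""
    monitoring_plan: str = ""

    def ready_to_deploy(self) -> bool:
        """A tool advances only if every stage has been answered affirmatively."""
        return bool(
            self.problem_statement
            and self.intended_use_of_output
            and self.financially_viable
            and self.fits_existing_workflow
            and self.staffing_and_time_available
            and self.benefit_verification_plan
            and self.monitoring_plan
        )

# Hypothetical example: a peripheral arterial disease detector under review
pad_detector = FURMAssessment(
    tool_name="PAD early-detection model",
    problem_statement="Identify undiagnosed peripheral arterial disease earlier",
    intended_use_of_output="Flag patients for vascular follow-up",
    financially_viable=True,
    ethical_concerns=["Disclosure to patients", "Performance across subgroups"],
    fits_existing_workflow=True,
    staffing_and_time_available=False,  # no clinic capacity for extra follow-ups yet
)
print(pad_detector.ready_to_deploy())  # False: workflow capacity still unresolved
```

The point of the structure is the one Shah makes: a model that looks promising on its own can still fail the review if the workflow, staffing, or monitoring questions remain unanswered.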
Just as at Stanford, Shah believes FURM could help health systems focus their time on technologies worth pursuing instead of experimenting with everything to see what sticks. “You might end up with what’s called ‘pilotitis,’ a ‘disease’ that afflicts organizations, where you find yourself stringing together pilot projects that go nowhere,” Shah said.
Additionally, Shah says it is important to consider the scale of the impact: a model may perform well but help only 50 patients.
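A back-of-the-envelope estimate shows why scale matters in this kind of review. The calculation below is a hypothetical sketch; every number is invented for illustration and is not drawn from the FURM framework or from any Stanford deployment.

```python
# Hypothetical scale-of-impact estimate; all quantities are made up.
patients_screened_per_year = 20_000
flag_rate = 0.005         # fraction of screened patients the model flags
follow_up_rate = 0.5      # fraction of flags the clinic has capacity to act on
benefit_per_flag = 1.0    # patients meaningfully helped per acted-on flag

patients_helped = (
    patients_screened_per_year * flag_rate * follow_up_rate * benefit_per_flag
)
print(f"Estimated patients helped per year: {patients_helped:.0f}")  # ~50
```

Even with a well-performing model, the combination of a low flag rate and limited follow-up capacity can leave the real-world benefit small relative to the cost of deployment and monitoring.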
Beyond return on investment
AI also has ethical implications that should not be ignored, stresses Michelle Mello, professor of health law and policy at Stanford. Mello and Danton Char, associate professor of anesthesiology, perioperative medicine, and pain medicine at Stanford and an empirical bioethics researcher, created the ethical review component of the FURM framework to help hospitals proactively anticipate potential ethical issues. For example, the ethics team recommends ways for implementers to build stronger processes for monitoring the safety of new tools, evaluates whether and how new tools should be disclosed to patients, and considers how the use of AI tools could widen or narrow health care disparities between patient subgroups.
Dr. Sneha Jain, clinical assistant professor of cardiovascular medicine at Stanford and co-creator of FURM, helped develop the methodology for prospectively evaluating AI tools once they are operational, as well as ways to make the FURM framework more accessible to systems outside Stanford. She is currently building the Stanford GUIDE-AI Lab, which stands for Guidance for the Use, Implementation, Development, and Evaluation of AI. The goal, Jain said, is twofold: to keep improving the team’s AI evaluation processes and to ensure that not only well-resourced health systems but also hospitals with smaller technology budgets can use AI tools responsibly. Mello and Char are continuing similar work on the ethics review process, with funding from the Patient-Centered Outcomes Research Institute and Stanford Impact Labs.
“AI tools are being rapidly deployed across health care systems with varying degrees of monitoring and evaluation,” Jain explained. “Our hope is that we can democratize robust but actionable assessment processes for these tools and their associated workflows to improve the care patients receive in the United States, and hopefully one day around the world.”
Moving forward, this interdisciplinary group of Stanford researchers wants to keep adapting the FURM framework to the needs of changing AI technologies, including rapidly evolving generative AI.
“If you develop standards or processes that aren’t achievable for people, they just won’t succeed,” Mello added. “A key part of our work is figuring out how to implement tools effectively, especially in an area where everyone is scrambling to move quickly.”