Imagine a world where life-saving drugs reach the people who need them faster than ever, thanks to the power of artificial intelligence. This is the future the U.S. Food and Drug Administration (FDA) envisioned when it launched Elsa, an AI tool designed to revolutionize the drug approval process. Officials painted a picture of streamlined workflows and accelerated timelines. But for the scientists on the front lines, the reality has been less of a revolution and more of a headache.
The Promise vs. The Problem
The FDA, a division of the Department of Health and Human Services (HHS), unveiled Elsa with great fanfare. The promise was simple: use AI to dramatically speed up the lengthy and complex process of approving new drugs and medical devices. HHS Secretary Robert F. Kennedy Jr. even declared, “The AI revolution has arrived,” touting the technology's ability to manage data and fast-track approvals.
Behind the scenes, however, a different story was unfolding. According to current and former FDA officials, Elsa has been a source of both frustration and alarm. While it can handle simple administrative tasks like drafting emails or summarizing meetings, it falls short where it matters most. The tool has been caught "hallucinating," the industry term for an AI confidently making up information. In this case, Elsa has invented nonexistent scientific studies and misrepresented actual research.
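The danger of confident fabrication is easiest to see in miniature. Here is a minimal Python sketch of the double-checking that staff describe, assuming a hypothetical local registry of verified publications in place of a real PubMed or internal-database lookup; nothing in it reflects Elsa's actual design.

```python
import re

# Hypothetical registry of verified publications, keyed by identifier.
# A real pipeline would query PubMed or an internal database instead.
VERIFIED_STUDIES = {
    "PMID:12345678": "Example Trial A",
    "PMID:23456789": "Example Trial B",
}

def extract_citations(text: str) -> list[str]:
    """Pull PubMed-style identifiers out of model output."""
    return re.findall(r"PMID:\d{8}", text)

def check_citations(model_output: str) -> tuple[list[str], list[str]]:
    """Split the model's citations into verified and unverifiable.

    Anything in the second list is a potential hallucination and must
    be confirmed by a human before the output is trusted.
    """
    cited = extract_citations(model_output)
    verified = [c for c in cited if c in VERIFIED_STUDIES]
    unverifiable = [c for c in cited if c not in VERIFIED_STUDIES]
    return verified, unverifiable

# A model answer citing one real and one invented study.
answer = "Efficacy was demonstrated in PMID:12345678 and PMID:99999999."
ok, suspect = check_citations(answer)
print("verified:", ok)           # ['PMID:12345678']
print("unverifiable:", suspect)  # ['PMID:99999999'] -> flag for human review
```

The sketch captures the efficiency complaint exactly: every citation the model produces triggers a lookup, so the tool adds a verification step instead of removing one.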
One FDA employee summed up the frustration, stating, “Anything that you don’t have time to double-check is unreliable. It hallucinates confidently.” Another noted the irony: “AI is supposed to save our time, but I guarantee you that I waste a lot of extra time just due to the heightened vigilance that I have to have.”
An Unreliable Assistant
The core of the problem is that Elsa currently cannot perform the critical review work it was intended for. It lacks access to crucial documents, such as proprietary industry submissions, preventing it from answering basic questions about a company's filing history or related products. When tested with questions about approved drugs, it has returned incorrect answers.
When confronted with its errors, Elsa is reportedly “apologetic,” reminding users to verify its work. This flaw fundamentally undermines its purpose. Instead of a powerful analytical partner, FDA scientists have an unreliable assistant that requires constant supervision.
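One common mitigation for this class of failure is retrieval grounding, in which a system may answer only from documents it has actually retrieved and must refuse otherwise. The sketch below illustrates that general pattern under stated assumptions; `retrieve` and `answer_with_grounding` are hypothetical stand-ins, not Elsa's API.

```python
from dataclasses import dataclass

@dataclass
class Document:
    doc_id: str
    text: str

def retrieve(question: str, corpus: list[Document]) -> list[Document]:
    """Naive keyword overlap; a real system would use a search index."""
    terms = set(question.lower().split())
    return [d for d in corpus if terms & set(d.text.lower().split())]

def answer_with_grounding(question: str, corpus: list[Document]) -> str:
    """Answer only from retrieved documents; otherwise defer to a human.

    This inverts the failure described above: without access to the
    relevant filings, the safe output is a refusal, not a confident guess.
    """
    sources = retrieve(question, corpus)
    if not sources:
        return "No supporting documents found; deferring to a human reviewer."
    cited = ", ".join(d.doc_id for d in sources)
    return f"Answer drawn only from retrieved filings [{cited}]."

corpus = [Document("submission-001", "phase 3 efficacy results, drug X trial data")]
print(answer_with_grounding("What were the phase 3 efficacy results of drug X?", corpus))
print(answer_with_grounding("Summarize company Y's filing history.", corpus))
```

The trade-off is deliberate: a grounded system answers fewer questions, but it does not invent filings it has never seen.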
FDA leadership has acknowledged some of these shortcomings. The agency's head of AI, Jeremy Walsh, admitted that Elsa, like other large language models, can hallucinate. FDA Commissioner Dr. Marty Makary noted that using the tool is optional for staff, stating, “They don’t have to use Elsa if they don’t find it to have value.” While updates are planned to improve its capabilities, the initial rollout has raised serious questions about the integrity of a tool meant to safeguard public health.
The Wild West of AI Regulation
Elsa's troubled debut highlights a much larger issue: the lack of federal oversight for AI in the United States. While the European Union has passed the comprehensive AI Act to regulate high-risk systems, the U.S. is lagging. Efforts to establish similar safeguards have stalled, leaving the development and deployment of powerful AI tools in a regulatory gray area often described as the “Wild West.”
As the U.S. government pushes to dominate the global AI race, the story of Elsa serves as a critical case study. It demonstrates the massive gap between the theoretical promise of AI and the practical challenges of implementing it safely and effectively, especially in high-stakes fields like medicine.
Key Takeaways
The journey of the FDA's AI tool is a cautionary tale about the rush to innovate without the necessary guardrails. Here are the key points to remember:
- High Hopes, Hard Reality: The FDA launched an AI tool named Elsa to accelerate drug approvals, but it has proven unreliable.
- AI Hallucinations: The tool has been found to invent nonexistent studies and misrepresent data, a critical flaw in a scientific context.
- Efficiency Paradox: Instead of saving time, Elsa's untrustworthiness forces staff to spend more time verifying its work.
- Regulatory Gap: The incident underscores the absence of comprehensive AI regulation in the U.S., particularly for high-risk applications.
- The Path Forward: Balancing the immense potential of AI with the need for rigorous safety standards remains a crucial challenge for developers and policymakers alike.