OpenAI unveils HealthBench to evaluate LLM safety in healthcare

OpenAI has announced the launch of HealthBench, a benchmark that evaluates AI models in healthcare on real-world applicability and physician judgment.

“The 5,000 conversations in HealthBench simulate interactions between AI models and individual users or clinicians. The task for a model is to provide the best possible response to the user’s last message,” the company said in a statement. 

OpenAI built the benchmark with 262 physicians in 60 countries, who are proficient in 49 languages and have training in 26 medical specialties. 

HealthBench includes 5,000 health conversations, each with a physician-created rubric to evaluate model responses. Together, the rubrics contain 48,562 unique criteria.

The company said the conversations were created through “synthetic generation and human adversarial testing,” are multilingual, and span various medical specialties and contexts.

“Every model response is graded against a set of physician-written rubric criteria specific to that conversation,” the company said. 

“Each criterion outlines what an ideal response should include or avoid (e.g., a specific fact to include or unnecessarily technical jargon to avoid). Each criterion has a corresponding point value, weighted to match the physician’s judgment of that criterion’s importance.” 

Model responses are evaluated using GPT-4.1, which determines whether each rubric criterion is met. Each response then receives an overall score based on the points for the criteria it meets, compared to the maximum possible score.
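The rubric scoring described above can be illustrated with a short Python sketch. This is not OpenAI's implementation: the names `Criterion` and `rubric_score` and the example criteria are hypothetical, and the grader's verdicts are passed in as plain booleans rather than obtained from a model. The sketch also assumes that criteria describing something to avoid carry negative points, that the maximum possible score is the sum of the positive points, and that the result is clipped to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rubric criterion (hypothetical structure for illustration)."""
    description: str
    points: int  # assumption: positive = should include, negative = should avoid


def rubric_score(criteria, met):
    """Return points earned for met criteria divided by the maximum possible.

    `met[i]` is True when the grader judged that `criteria[i]` applies to
    the response. The result is clipped to [0, 1] (an assumption).
    """
    if len(criteria) != len(met):
        raise ValueError("criteria and met must have the same length")
    # Maximum possible score: every positive criterion met, no negative ones.
    max_points = sum(c.points for c in criteria if c.points > 0)
    if max_points == 0:
        raise ValueError("rubric needs at least one positive criterion")
    # Met negative criteria subtract from the total.
    earned = sum(c.points for c, is_met in zip(criteria, met) if is_met)
    return max(0.0, min(1.0, earned / max_points))


# Made-up rubric for a single conversation, not taken from HealthBench.
example_criteria = [
    Criterion("Advises seeking emergency care", 10),
    Criterion("Asks how long the symptoms have lasted", 5),
    Criterion("Uses unnecessarily technical jargon", -3),
]
example_met = [True, False, True]
example_score = rubric_score(example_criteria, example_met)
```

In this made-up case the response earns 10 points for the emergency referral, loses 3 for jargon, and misses the 5-point context question, giving 7 of a possible 15 points.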

HealthBench is split into seven themes: expertise-tailored communication, response depth, emergency referrals, health data tasks, global health, responding under uncertainty and context seeking.

“Evaluations like HealthBench are part of our ongoing efforts to understand model behavior in high-impact settings and help ensure progress is directed toward real-world benefit,” the company said. 

“Our findings show that large language models have improved significantly over time and already outperform experts in writing responses to examples tested in our benchmark. Yet even the most advanced systems still have substantial room for improvement, particularly in seeking necessary context for underspecified queries and worst-case reliability. We look forward to sharing results for future models.”

The tools are publicly available on GitHub. 

THE LARGER TREND

OpenAI’s CEO, Sam Altman, took part in President Donald Trump’s press conference earlier this year announcing the launch of Project Stargate. The $500 billion project would focus on developing the physical and virtual infrastructure to power AI development, including AI to improve health outcomes.

The partners, which also included Oracle’s chief technology officer, Larry Ellison, and SoftBank’s CEO, Masayoshi Son, touted the project as a game changer for healthcare.

Altman said during the press conference that he is thrilled to be part of Stargate and anticipates that diseases will be cured at an unprecedented rate. 

Ellison added that a cancer vaccine is one of the “most exciting” things the group is working on, using the tools that Altman and Son are providing.

Earlier this month, the Financial Times reported that Project Stargate was considering international expansion, with its top country of choice being the UK. Germany and France are also attractive candidates. 

However, this week, Bloomberg reported that the project is facing delays due to the tariffs imposed by President Trump and economic uncertainty. 

Amid growing market volatility, banks and institutional investors are wary of investing in Stargate, especially as U.S. tariffs, particularly on chips, server racks and cooling systems, make data center build-out costs uncertain.

Additionally, SoftBank, which pledged an immediate $100 billion investment in the project with the goal of reaching $500 billion within the next four years, has yet to develop a financing template or start discussions with potential backers, according to Bloomberg.
