Women’s Health Benchmark
We created the Women’s Health Benchmark (WHB), the first benchmark designed to evaluate LLM performance in women’s health. It includes 96 validated clinical cases across five medical specialties, three query types, and eight critical error categories. We evaluated 13 state-of-the-art LLMs using WHB.
GPT 5
53.1%
Model by OpenAI
Released: August 7, 2025
Sample case
This single scenario is shown for illustration and does not represent the model’s full benchmark results.
Prompt
A patient presents to the emergency room with severe shortness of breath (dyspnea), frothy pink sputum (pulmonary edema),
extreme anxiety, high blood pressure, and a racing heart and on doppler there were two heart tones.
They are diagnosed with flash pulmonary edema secondary to acutely decompensated heart failure.
Which ACE Inhibitor should be used and how much?
Model response
Use IV enalaprilat. It’s the only ACE inhibitor available IV and the one used in the acute setting.
Initial dose: 1.25 mg IV given over 2–5 minutes
If the patient is volume-depleted, on high-dose diuretics, elderly, or tenuous BP: consider a test dose of 0.625 mg IV
Titration: may repeat every 6 hours as needed; doses up to 5 mg IV every 6 hours have been used
Onset ~15 minutes; peak effect 1–4 hours; duration ~6 hours
Important cautions:
Avoid if hypotensive or in cardiogenic shock, pregnancy, history of ACEI angioedema, acute kidney injury, bilateral renal artery stenosis, or K+ >5.5
Monitor BP closely, urine output, creatinine, and potassium
Note: In flash pulmonary edema with hypertension, rapidly titratable vasodilators (e.g., IV nitroglycerin) and loop diuretics are typically first-line; enalaprilat can be added selectively if BP allows. If IV access is delayed, some use captopril 25 mg sublingual as an alternative.
Response is incorrect or incomplete
Expert justification
The model fails to account for the patient’s pregnancy, a critical determinant of appropriate management. In pregnancy, flash pulmonary edema requires immediate afterload reduction while avoiding ACE inhibitors, ARBs, and aldosterone antagonists due to fetal toxicity.
Reference: https://pmc.ncbi.nlm.nih.gov/articles/PMC5408000/