MedATLAS-Bench
A Comprehensive and Diverse Multi-modal Medical Benchmark for Large Language Models
Xiaotian Ma*, Anand R. Mysorekar*, Xiaomin Liang, Yu-Chun Hsu
Saber Malekmohammadi, Xiaoqian Jiang, Shayan Shams
McWilliams School of Biomedical Informatics, UTHealth Houston
*Equal contribution.
MedATLAS-Bench evaluates multimodal large language models across diverse clinical inputs, including structured text, 2D images, 3D volumetric data, and video, enabling robust assessment in realistic medical scenarios.
The benchmark spans multiple clinical tasks, such as classification, generation, and localization, across varied datasets and conditions, and comprises 430 samples in total.
Overall Leaderboard
Models are ranked by their average score across all samples.