NeuralMedBench

Leaderboard

The Performance of Different VLMs on Neuro-Medical Tasks

Direct Diagnosis

Factual Medical Knowledge Application in Diagnostic Scenarios.

sample size

SOTA: 46.7%

Complex Diseases

Diagnostic Challenges in Complex and Rare Diseases.

sample size

SOTA: 40.0%

Multi-round Dialogue

Contextual Relationships in Multi-Round Dialogue.

Sample size: 100

SOTA: 18.5%

Ranking	Model	Model Size (B)	Accuracy (Pass@1)	Accuracy (Pass@5)	BERTScore
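The leaderboard reports accuracy at Pass 1 and Pass 5, but this page does not define either metric. The following is a minimal sketch under one common reading: a case counts as solved at Pass k if any of the first k sampled responses is judged correct. The function name `pass_at_k_accuracy` and the example flags are hypothetical and are not taken from the Neural-MedBench evaluation code.

```python
from typing import Sequence


def pass_at_k_accuracy(attempts: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of cases with at least one correct answer in the first k attempts.

    ASSUMPTION: this is one common reading of "Pass@k"; the benchmark's own
    definition may differ.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not attempts:
        raise ValueError("attempts must contain at least one case")
    # A case is solved if any of its first k responses is marked correct.
    solved = sum(1 for case in attempts if any(case[:k]))
    return solved / len(attempts)


# Hypothetical correctness flags: one row per case, one flag per sampled response.
example = [
    [True, False, False, False, False],
    [False, False, True, False, False],
    [False, False, False, False, False],
    [False, True, False, False, False],
]
pass1 = pass_at_k_accuracy(example, k=1)  # only the first case is solved
pass5 = pass_at_k_accuracy(example, k=5)  # three of the four cases are solved
```

Under this reading, Pass 5 accuracy can never be lower than Pass 1 accuracy for the same set of responses, which makes the gap between the two columns a rough indicator of how consistent a model's answers are.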

Overview of Neural-MedBench

Dataset

We offer carefully curated and annotated neuromedicine datasets designed to support the training and evaluation of VLMs.

Direct Diagnosis

Straightforward cases with clear diagnostic cues.

Quick Test, Prototypicality

Complex Diseases

Ambiguous presentations or rare neurological diseases requiring deeper inference.

Deep Inference, Rare Diseases

Multi-round Dialogue

Simulated consult-style dialogue requiring iterative reasoning and multimodal synthesis.

Iterative Reasoning, Consult-style

News

Latest updates related to Neural-MedBench

🎉 2026-01-25 Neural-MedBench has been formally accepted for publication at the 2026 International Conference on Learning Representations (ICLR 2026).

📃 2025-09-26 Neural-MedBench: Pioneering Research Now Live on arXiv

🚀 2025-09-12 The leaderboard is live! Check out the results.

🛠️ 2025-05-17 We released the evaluation code! Check out the Usage section.

📊 2025-05-16 We released the Neural-MedBench dataset.

✨ 2025-05-15 We introduce Neural-MedBench, a compact yet reasoning-intensive benchmark specifically designed to probe the limits of multimodal clinical reasoning in neurology.

About Us

Pioneering the exploration of large models in neuro-medicine, empowering AI-driven precision diagnosis and treatment of neurological disorders.

Hugging Face

https://huggingface.co

Citation

Miao, J. et al. (2025). Beyond classification accuracy: Neural-MedBench and the need for deeper reasoning benchmarks.
arXiv preprint arXiv:2509.22258 [cs.CV].

Disclaimer

Neural-MedBench is for research purposes only. Models evaluated on Neural-MedBench can produce unexpected results. We are not responsible for any damages caused by the use of Neural-MedBench, including, but not limited to, any loss of profit, data, or use of data.

License

This project is licensed under the MIT License.

Official Website

https://www.gdiist.cn/
