|
Analyzing Multimodal Machine Learning Model Performance and Evaluation Metrics for Medical Report Generation
Ankit Gupta,
Min Xu,
Martin Zhang,
Bryan Wilder
MSCS Thesis Defense
[paper]
[talk]
We compare several approaches to generating medical reports on a dataset of chest X-ray reports: a unimodal fine-tuned medical LLM, a multimodal model without symptom data, and a multimodal model with symptom data. We then introduce four new metrics for evaluating the similarity between generated and reference medical reports, which we term Word Pairs, Sentence Average, Sentence Pairs, and Sentence Pairs (Bio).
|