Keimyung University Medical Library Repository

Diagnostic Accuracy and Clinical Value of a Domain-specific Multimodal Generative AI Model for Chest Radiograph Report Generation

Author(s)
Eun Kyoung Hong; Jiyeon Ham; Byungseok Roh; Jawook Gu; Beomhee Park; Sunghun Kang; Kihyun You; Jihwan Eom; Byeonguk Bae; Jae-Bock Jo; Ok Kyu Song; Woong Bae; Ro Woon Lee; Chong Hyun Suh; Chan Ho Park; Seong Jun Choi; Jai Soung Park; Jae-Hyeong Park; Hyun Jeong Jeon; Jeong-Ho Hong; Dosang Cho; Han Seok Choi; Tae Hee Kim
Keimyung Author(s)
Hong, Jeong Ho
Department
Dept. of Neurology (신경과학)
Journal Title
Radiology
Issued Date
2025
Volume
314
Issue
3
Abstract
Background:
Generative artificial intelligence (AI) is anticipated to alter radiology workflows, making an assessment of its clinical value necessary for frequently performed tasks such as chest radiograph interpretation.

Purpose:
To develop and evaluate the diagnostic accuracy and clinical value of a domain-specific multimodal generative AI model for providing preliminary interpretations of chest radiographs.

Materials and Methods:
For training, consecutive radiograph-report pairs from frontal chest radiography were retrospectively collected from 42 hospitals (2005–2023). The trained domain-specific AI model generated radiology reports for the radiographs. The test set included public datasets (PadChest, Open-i, VinDr-CXR, and MIMIC-CXR-JPG) and radiographs excluded from training. The sensitivity and specificity of the model-generated reports for 13 radiographic findings, compared with radiologist annotations (reference standard), were calculated (with 95% CIs). Four radiologists evaluated the subjective quality of the reports in terms of acceptability, agreement score, quality score, and comparative ranking of reports from (a) the domain-specific AI model, (b) radiologists, and (c) a general-purpose large language model (GPT-4Vision). Acceptability was defined as whether the radiologist would endorse the report as their own without changes. Agreement scores from 1 (clinically significant discrepancy) to 5 (complete agreement) were assigned using RADPEER; quality scores were on a 5-point Likert scale from 1 (very poor) to 5 (excellent).
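As a rough illustration of the accuracy metrics described above, the sketch below computes per-finding sensitivity and specificity with 95% Wilson score confidence intervals from binary model predictions and radiologist reference annotations. All names are illustrative, and the Wilson interval is an assumption; the study's exact CI method is not specified in this record.

    # Illustrative sketch only (not the study's code): sensitivity and
    # specificity with 95% Wilson score CIs for a single radiographic finding.
    from math import sqrt

    def wilson_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
        """95% Wilson score interval for a proportion (assumed CI method)."""
        if total == 0:
            return (0.0, 0.0)
        p = successes / total
        denom = 1 + z**2 / total
        centre = (p + z**2 / (2 * total)) / denom
        half = (z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2))) / denom
        return (centre - half, centre + half)

    def sensitivity_specificity(pred: list[int], ref: list[int]):
        """pred/ref hold 1 (finding reported/present) or 0 (absent) per radiograph."""
        tp = sum(p == 1 and r == 1 for p, r in zip(pred, ref))
        fn = sum(p == 0 and r == 1 for p, r in zip(pred, ref))
        tn = sum(p == 0 and r == 0 for p, r in zip(pred, ref))
        fp = sum(p == 1 and r == 0 for p, r in zip(pred, ref))
        sens, sens_ci = tp / (tp + fn), wilson_ci(tp, tp + fn)
        spec, spec_ci = tn / (tn + fp), wilson_ci(tn, tn + fp)
        return sens, sens_ci, spec, spec_ci

For example, the pneumothorax figure reported in the Results corresponds to a sensitivity of 181/190 ≈ 95.3%.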

Results:
A total of 8,838,719 radiograph-report pairs (training) and 2145 radiographs (testing) were included (anonymized with respect to sex and gender). Reports generated by the domain-specific AI model demonstrated high sensitivity for detecting two critical radiographic findings: 95.3% (181 of 190) for pneumothorax and 92.6% (138 of 149) for subcutaneous emphysema. Acceptance rates, evaluated by four radiologists, were 70.5% (6047 of 8580), 73.3% (6288 of 8580), and 29.6% (2536 of 8580) for model-generated, radiologist, and GPT-4Vision reports, respectively. Agreement scores were highest for the model-generated reports (median = 4 [IQR, 3–5]) and lowest for GPT-4Vision reports (median = 1 [IQR, 1–3]; P < .001). Quality scores were also highest for the model-generated reports (median = 4 [IQR, 3–5]) and lowest for the GPT-4Vision reports (median = 2 [IQR, 1–3]; P < .001). In the ranking analysis, model-generated reports were most frequently ranked highest (60.0%; 5146 of 8580), and GPT-4Vision reports were most frequently ranked lowest (73.6%; 6312 of 8580).

Conclusion:
A domain-specific multimodal generative AI model demonstrated potential for high diagnostic accuracy and clinical value in providing preliminary interpretations of chest radiographs for radiologists.
Keimyung Author(s) (Korean)
홍정호
Publisher
School of Medicine (의과대학)
Type
Article
ISSN
1527-1315
Source
https://pubs.rsna.org/doi/10.1148/radiol.241476
DOI
10.1148/radiol.241476
URI
https://kumel.medlib.dsmc.or.kr/handle/2015.oak/46310
Appears in Collections:
1. School of Medicine (의과대학) > Dept. of Neurology (신경과학)
Access and License
  • Access: Open
  • Embargo: Forever
File List
  • No related files are available.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.