001     285039
005     20260209101110.0
024 7 _ |a 10.21037/qims-2025-1716
|2 doi
024 7 _ |a 2223-4292
|2 ISSN
024 7 _ |a 2223-4306
|2 ISSN
037 _ _ |a DZNE-2026-00163
082 _ _ |a 610
100 1 _ |a Ji, Jiang
|b 0
245 _ _ |a Comparison of online radiologists and large language model chatbots in responding to common radiology-related questions in Chinese: a cross-sectional comparative analysis
260 _ _ |a Hong Kong
|c 2026
|b AME Publ.
336 7 _ |a article
|2 DRIVER
336 7 _ |a Output Types/Journal article
|2 DataCite
336 7 _ |a Journal Article
|b journal
|m journal
|0 PUB:(DE-HGF)16
|s 1770628099_24273
|2 PUB:(DE-HGF)
336 7 _ |a ARTICLE
|2 BibTeX
336 7 _ |a JOURNAL_ARTICLE
|2 ORCID
336 7 _ |a Journal Article
|0 0
|2 EndNote
520 _ _ |a Background: Additional avenues for medical counseling are needed to better serve patients. In handling medical counseling, large language model chatbots (LLM-chatbots) have demonstrated near-physician expertise in comprehending enquiries and providing professional advice. However, their performance in addressing patients’ common radiology-related concerns has yet to be evaluated. This study therefore aimed to investigate the effectiveness and performance of LLM-chatbots (DeepSeek-R1 and ChatGPT-4o) in radiology-related medical consultation in the Chinese context through both subjective evaluations and objective metrics. Methods: In this cross-sectional study, common radiology-related questions were collected from the HaoDF online platform, one of the largest Chinese public healthcare service platforms. All questions were posed to the LLM-chatbots from February 24 to February 30, 2025. To facilitate comparison between LLM-chatbots and online radiologists, three senior radiologists from different medical centers were recruited as reviewers; they blindly scored the responses on a 5-point Likert scale across three subjective dimensions: quality, empathy, and potential harm. Objective metrics, including textual features (six metrics across three linguistic dimensions: lexical, syntactic, and semantic), response time, and self-improvement capacity, were calculated as additional measures of the two LLM-chatbots’ performance. Results: A total of 954 reviews were generated for 318 responses to 106 questions. The LLM-chatbots achieved superior scores for quality, empathy, and potential harm compared with the online radiologists (all P values <0.001). Among the LLM-chatbots, DeepSeek-R1 outperformed ChatGPT-4o in quality scores [DeepSeek-R1: mean 4.40, standard deviation (SD) 0.57; ChatGPT-4o: mean 3.73, SD 0.64; P<0.001]. Response times were significantly shorter for DeepSeek-R1 [median 56.00 s; interquartile range (IQR), 47–67 s] and ChatGPT-4o (median 12.17 s; IQR, 10.91–15.85 s) than for online radiologists (median 6,487.90 s; IQR, 3,530.50–29,061.70 s), and the LLM-chatbots generated greater textual complexity across all six linguistic metrics (all P values <0.001). Between the two chatbots, ChatGPT-4o generally produced linguistically simpler responses (all P values <0.001) with shorter response times than DeepSeek-R1 across the various topics (P<0.001). Additionally, both LLM-chatbots demonstrated a degree of self-improvement ability. Conclusions: These findings highlight the potential utility of LLM-chatbots in addressing common radiology-related inquiries initially posed by patients. However, further optimization and validation are required to establish this emerging technology as a productive and effective pathway in medical counseling.
536 _ _ |a 352 - Disease Mechanisms (POF4-352)
|0 G:(DE-HGF)POF4-352
|c POF4-352
|f POF IV
|x 0
588 _ _ |a Dataset connected to CrossRef, Journals: pub.dzne.de
700 1 _ |a Li, Chenguang
|b 1
700 1 _ |a Fu, Yibin
|b 2
700 1 _ |a Zhao, Zihao
|b 3
700 1 _ |a Wu, Yiyang
|0 P:(DE-2719)9002415
|b 4
|u dzne
700 1 _ |a Liang, Changhua
|b 5
700 1 _ |a Wu, Yue
|b 6
773 _ _ |a 10.21037/qims-2025-1716
|g Vol. 16, no. 2, p. 129
|0 PERI:(DE-600)2653586-5
|n 2
|p 129
|t Quantitative imaging in medicine and surgery
|v 16
|y 2026
|x 2223-4292
856 4 _ |u https://pub.dzne.de/record/285039/files/DZNE-2026-00163.pdf
|y Restricted
856 4 _ |u https://pub.dzne.de/record/285039/files/DZNE-2026-00163.pdf?subformat=pdfa
|x pdfa
|y Restricted
910 1 _ |a Deutsches Zentrum für Neurodegenerative Erkrankungen
|0 I:(DE-588)1065079516
|k DZNE
|b 4
|6 P:(DE-2719)9002415
913 1 _ |a DE-HGF
|b Gesundheit
|l Neurodegenerative Diseases
|1 G:(DE-HGF)POF4-350
|0 G:(DE-HGF)POF4-352
|3 G:(DE-HGF)POF4
|2 G:(DE-HGF)POF4-300
|4 G:(DE-HGF)POF
|v Disease Mechanisms
|x 0
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0200
|2 StatID
|b SCOPUS
|d 2024-12-18
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0300
|2 StatID
|b Medline
|d 2024-12-18
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0199
|2 StatID
|b Clarivate Analytics Master Journal List
|d 2024-12-18
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0160
|2 StatID
|b Essential Science Indicators
|d 2024-12-18
915 _ _ |a WoS
|0 StatID:(DE-HGF)0113
|2 StatID
|b Science Citation Index Expanded
|d 2024-12-18
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0150
|2 StatID
|b Web of Science Core Collection
|d 2024-12-18
915 _ _ |a JCR
|0 StatID:(DE-HGF)0100
|2 StatID
|b QUANT IMAG MED SURG : 2022
|d 2024-12-18
915 _ _ |a IF < 5
|0 StatID:(DE-HGF)9900
|2 StatID
|d 2024-12-18
920 1 _ |0 I:(DE-2719)1110001
|k AG Herms
|l Translational Brain Research
|x 0
980 _ _ |a journal
980 _ _ |a EDITORS
980 _ _ |a VDBINPRINT
980 _ _ |a I:(DE-2719)1110001
980 _ _ |a UNRESTRICTED

