000285039 001__ 285039
000285039 005__ 20260209101110.0
000285039 0247_ $$2doi$$a10.21037/qims-2025-1716
000285039 0247_ $$2ISSN$$a2223-4292
000285039 0247_ $$2ISSN$$a2223-4306
000285039 037__ $$aDZNE-2026-00163
000285039 082__ $$a610
000285039 1001_ $$aJi, Jiang$$b0
000285039 245__ $$aComparison of online radiologists and large language model chatbots in responding to common radiology-related questions in Chinese: a cross-sectional comparative analysis
000285039 260__ $$aHong Kong$$bAME Publ.$$c2026
000285039 3367_ $$2DRIVER$$aarticle
000285039 3367_ $$2DataCite$$aOutput Types/Journal article
000285039 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1770628099_24273
000285039 3367_ $$2BibTeX$$aARTICLE
000285039 3367_ $$2ORCID$$aJOURNAL_ARTICLE
000285039 3367_ $$00$$2EndNote$$aJournal Article
000285039 520__ $$aBackground: Additional avenues for medical counseling are needed to better serve patients. In medical counseling, large language model chatbots (LLM-chatbots) have demonstrated near-physician expertise in comprehending enquiries and providing professional advice. However, their performance in addressing patients’ common radiology-related concerns has yet to be evaluated. This study therefore aimed to investigate the effectiveness and model performance of LLM-chatbots (DeepSeek-R1 and ChatGPT-4o) in radiology-related medical consultation in the Chinese context through both subjective evaluations and objective metrics. Methods: In this cross-sectional study, common radiology-related questions were collected from the HaoDF online platform, one of the largest Chinese public healthcare service platforms. All questions were posed to the LLM-chatbots from February 24 to February 30, 2025. To enable comparison between the LLM-chatbots and online radiologists, three senior radiologists from different medical centers were recruited as reviewers and blindly scored the LLM-generated responses on a 5-point Likert scale across three subjective dimensions: quality, empathy, and potential harm. Objective metrics, including textual features (six metrics across three linguistic dimensions: lexical, syntactic, and semantic), response time, and self-improvement capacity, were calculated as additional measures of the performance of the two LLM-chatbots. Results: A total of 954 reviews were generated for 318 responses to 106 questions. The LLM-chatbots achieved superior scores for quality, empathy, and potential harm compared with the online radiologists (all P values <0.001). Between the LLM-chatbots, DeepSeek-R1 outperformed ChatGPT-4o in quality scores [DeepSeek-R1: mean 4.40, standard deviation (SD) 0.57; ChatGPT-4o: mean 3.73, SD 0.64; P<0.001]. Response times were significantly shorter for DeepSeek-R1 [median 56.00 s; interquartile range (IQR), 47–67 s] and ChatGPT-4o (median 12.17 s; IQR, 10.91–15.85 s) than for the online radiologists (median 6,487.90 s; IQR, 3,530.50–29,061.70 s), and the LLM-chatbots generated greater textual complexity across all six metrics in the three linguistic dimensions (all P values <0.001). Between the two chatbots, ChatGPT-4o generally produced linguistically simpler responses (all P values <0.001) and responded faster than DeepSeek-R1 across the various topics (P<0.001). Additionally, both LLM-chatbots demonstrated a degree of self-improvement ability. Conclusions: These findings highlight the potential utility of LLM-chatbots in addressing common radiology-related inquiries initially posed by patients. However, further optimization and validation are required to establish this emerging technology as a productive and effective pathway for medical counseling.
000285039 536__ $$0G:(DE-HGF)POF4-352$$a352 - Disease Mechanisms (POF4-352)$$cPOF4-352$$fPOF IV$$x0
000285039 588__ $$aDataset connected to CrossRef, Journals: pub.dzne.de
000285039 7001_ $$aLi, Chenguang$$b1
000285039 7001_ $$aFu, Yibin$$b2
000285039 7001_ $$aZhao, Zihao$$b3
000285039 7001_ $$0P:(DE-2719)9002415$$aWu, Yiyang$$b4$$udzne
000285039 7001_ $$aLiang, Changhua$$b5
000285039 7001_ $$aWu, Yue$$b6
000285039 773__ $$0PERI:(DE-600)2653586-5$$a10.21037/qims-2025-1716$$gVol. 16, no. 2, p. 129$$n2$$p129$$tQuantitative imaging in medicine and surgery$$v16$$x2223-4292$$y2026
000285039 8564_ $$uhttps://pub.dzne.de/record/285039/files/DZNE-2026-00163.pdf$$yRestricted
000285039 8564_ $$uhttps://pub.dzne.de/record/285039/files/DZNE-2026-00163.pdf?subformat=pdfa$$xpdfa$$yRestricted
000285039 9101_ $$0I:(DE-588)1065079516$$6P:(DE-2719)9002415$$aDeutsches Zentrum für Neurodegenerative Erkrankungen$$b4$$kDZNE
000285039 9131_ $$0G:(DE-HGF)POF4-352$$1G:(DE-HGF)POF4-350$$2G:(DE-HGF)POF4-300$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$aDE-HGF$$bGesundheit$$lNeurodegenerative Diseases$$vDisease Mechanisms$$x0
000285039 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS$$d2024-12-18
000285039 915__ $$0StatID:(DE-HGF)0300$$2StatID$$aDBCoverage$$bMedline$$d2024-12-18
000285039 915__ $$0StatID:(DE-HGF)0199$$2StatID$$aDBCoverage$$bClarivate Analytics Master Journal List$$d2024-12-18
000285039 915__ $$0StatID:(DE-HGF)0160$$2StatID$$aDBCoverage$$bEssential Science Indicators$$d2024-12-18
000285039 915__ $$0StatID:(DE-HGF)0113$$2StatID$$aWoS$$bScience Citation Index Expanded$$d2024-12-18
000285039 915__ $$0StatID:(DE-HGF)0150$$2StatID$$aDBCoverage$$bWeb of Science Core Collection$$d2024-12-18
000285039 915__ $$0StatID:(DE-HGF)0100$$2StatID$$aJCR$$bQUANT IMAG MED SURG : 2022$$d2024-12-18
000285039 915__ $$0StatID:(DE-HGF)9900$$2StatID$$aIF < 5$$d2024-12-18
000285039 9201_ $$0I:(DE-2719)1110001$$kAG Herms$$lTranslational Brain Research$$x0
000285039 980__ $$ajournal
000285039 980__ $$aEDITORS
000285039 980__ $$aVDBINPRINT
000285039 980__ $$aI:(DE-2719)1110001
000285039 980__ $$aUNRESTRICTED