Comparison of Post-hoc Calibration Methods for Neural Network Likelihood Scores

Handels, Heinz; Ramedani, Majid; Tolxdorff, Thomas; Strauß, Tobias; Martensen, Ole H.; Dyrba, Martin; Breininger, Katharina; Palm, Christoph; Maier, Andreas; Maier-Hein, Klaus; Deserno, Thomas

doi:10.1007/978-3-658-51100-5_87

Contribution to a conference proceedings/Contribution to a book

DZNE-2026-00661

Handels, H. (Editor) ; Breininger, K. (Editor) ; Deserno, T. (Editor) ; Maier, A. (Editor) ; Maier-Hein, K. (Editor) ; Palm, C. (Editor) ; Tolxdorff, T. (Editor) ; Martensen, O. H. ; Strauß, T. ; Ramedani, M. (DZNE*) ; Dyrba, M. (Last author) (DZNE*)

2026
Wiesbaden : Springer Fachmedien Wiesbaden
ISBN: 978-3-658-51099-2 (print), 978-3-658-51100-5 (electronic)

Bildverarbeitung für die Medizin 2026 / Handels, Heinz (Editor) [https://orcid.org/0000-0002-3499-4328] ; Wiesbaden : Springer Fachmedien Wiesbaden, 2026, Chapter 87 ; ISSN: 1431-472X=2628-8958 ; ISBN: 978-3-658-51099-2=978-3-658-51100-5 ; doi:10.1007/978-3-658-51100-5
Bildverarbeitung für die Medizin Workshop, BVM 2026, Lübeck, Germany, 15 Mar 2026 - 17 Mar 2026 Wiesbaden : Springer Fachmedien Wiesbaden, Informatik aktuell 443 - 449 (2026) [10.1007/978-3-658-51100-5_87]

Please use a persistent id in citations: doi:10.1007/978-3-658-51100-5_87

Abstract: This study investigates how to improve the reliability of probability estimates produced by deep learning models for the detection of Alzheimer’s disease from MRI data. Although convolutional neural networks (CNNs) can accurately classify neurodegenerative diseases, their softmax outputs often misrepresent the true classification probabilities. We evaluated four post-hoc calibration methods, two parametric (logistic and probit regression) and two nonparametric (isotonic regression and Bayesian binning into quantiles), on data from 474 participants. All four methods noticeably improved the CNN’s calibration without reducing accuracy. The nonparametric methods achieved the best calibration results (expected calibration error ≈ 0.014 and maximum calibration error ≈ 0.025). These findings suggest that nonparametric calibration provides more reliable and clinically useful probability estimates.
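
The two metrics named in the abstract, expected calibration error (ECE) and maximum calibration error (MCE), can be illustrated with a minimal sketch. It assumes a binary task, equal-width probability bins, and a comparison of the mean predicted positive-class probability against the observed positive fraction in each bin. The function name `calibration_errors`, the default of 10 bins, and the binning scheme are illustrative assumptions; the record does not state how the authors computed these values.

```python
from typing import List, Sequence, Tuple


def calibration_errors(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> Tuple[float, float]:
    """Return (ECE, MCE) for binary predictions using equal-width bins.

    probs  : predicted probability of the positive class, each in [0, 1]
    labels : true class labels, 0 or 1
    n_bins : number of equal-width bins (an assumed default, not from the paper)
    """
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and equally long")

    # Group (probability, label) pairs by bin index.
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[idx].append((p, y))

    n = len(probs)
    ece, mce = 0.0, 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_prob = sum(p for p, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        gap = abs(mean_prob - frac_pos)
        ece += (len(members) / n) * gap  # weighted average of per-bin gaps
        mce = max(mce, gap)              # worst per-bin gap
    return ece, mce
```

Under these assumptions, a post-hoc calibrator would be fitted on held-out scores and the function applied to the scores before and after calibration; lower ECE and MCE indicate probabilities that track observed frequencies more closely.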

Contributing Institute(s):

Clinical Dementia Research (Rostock /Greifswald) (AG Teipel)

Research Program(s):

353 - Clinical and Health Care Research (POF4-353)

The record appears in these collections:
Document types > Events > Contributions to a conference proceedings
Document types > Books > Contribution to a book
Institute Collections > ROS DZNE > ROS DZNE-AG Teipel
Documents in Process
Public records
In process

Record created 2026-06-23, last modified 2026-06-23
