Journal Article DZNE-2025-01210

Benchmarking large language models for personalized, biomarker-based health intervention recommendations.

2025
Macmillan Publishers Limited [Basingstoke]

npj digital medicine 8(1), 631 [10.1038/s41746-025-01996-2]

Please use a persistent id in citations: doi:

Abstract: The use of large language models (LLMs) in clinical diagnostics and intervention planning is expanding, yet their utility for personalized recommendations for longevity interventions remains opaque. We extended the BioChatter framework to benchmark LLMs' ability to generate personalized longevity intervention recommendations based on biomarker profiles while adhering to key medical validation requirements. Using 25 individual profiles across three different age groups, we generated 1000 diverse test cases covering interventions such as caloric restriction, fasting and supplements. Evaluating 56000 model responses via an LLM-as-a-Judge system with clinician-validated ground truths, we found that proprietary models outperformed open-source models, especially in comprehensiveness. However, even with Retrieval-Augmented Generation (RAG), all models exhibited limitations in addressing key medical validation requirements, prompt stability, and handling age-related biases. Our findings highlight the limited suitability of LLMs for unsupervised longevity intervention recommendations. Our open-source framework offers a foundation for advancing AI benchmarking in various medical contexts.

Contributing Institute(s):
  1. Translational Neurodegeneration (AG Hermann)
Research Program(s):
  1. 353 - Clinical and Health Care Research (POF4-353)

Appears in the scientific report 2025
Database coverage:
Medline ; Creative Commons Attribution CC BY 4.0 ; DOAJ ; OpenAccess ; Article Processing Charges ; Clarivate Analytics Master Journal List ; Current Contents - Clinical Medicine ; DOAJ Seal ; Essential Science Indicators ; Fees ; IF >= 15 ; JCR ; SCOPUS ; Science Citation Index Expanded ; Web of Science Core Collection

The record appears in these collections:
Document types > Articles > Journal Article
Institute Collections > ROS DZNE > ROS DZNE-AG Hermann
Full Text Collection
Public records
Publications Database

 Record created 2025-10-29, last modified 2025-11-13