TY - JOUR
AU - Young, Cameron C
AU - Eason, Katherine
AU - Manzano Garcia, Raquel
AU - Moulange, Richard
AU - Mukherjee, Sach
AU - Chin, Suet-Feung
AU - Caldas, Carlos
AU - Rueda, Oscar M
TI - Development and validation of a reliable DNA copy-number-based machine learning algorithm (CopyClust) for breast cancer integrative cluster classification
JO - Scientific Reports
VL - 14
IS - 1
SN - 2045-2322
CY - [London]
PB - Macmillan Publishers Limited, part of Springer Nature
M1 - DZNE-2024-00612
SP - 11861
PY - 2024
AB - The Integrative Cluster subtypes (IntClusts) provide a framework for the classification of breast cancer tumors into 10 distinct groups based on copy number and gene expression, each with unique biological drivers of disease and clinical prognoses. Gene expression data are often lacking, and accurate classification of samples into IntClusts with copy number data alone is essential. Current classification methods achieve low accuracy when gene expression data are absent, warranting the development of new approaches to IntClust classification. Copy number data from 1980 breast cancer samples from METABRIC were used to train multiclass XGBoost machine learning algorithms (CopyClust). A piecewise constant fit was applied to the average copy number profile of each IntClust, and unique breakpoints across the 10 profiles were identified and converted into 500 genomic regions used as features for CopyClust. The models followed two approaches: a 10-class model, with the final IntClust label predicted by a single multiclass model, and a 6-class model with binary reclassification, in which four pairs of IntClusts were combined for initial multiclass classification. Performance was validated on the TCGA dataset, with copy number data generated from both SNP arrays and WES platforms. CopyClust achieved 81
KW - Humans
KW - Breast Neoplasms: genetics
KW - Breast Neoplasms: classification
KW - Machine Learning
KW - Female
KW - DNA Copy Number Variations: genetics
KW - Algorithms
KW - Cluster Analysis
KW - Gene Expression Profiling: methods
LB - PUB:(DE-HGF)16
C6 - pmid:38789621
C2 - pmc:PMC11126405
DO - 10.1038/s41598-024-62724-6
UR - https://pub.dzne.de/record/269770
ER -