000272949 001__ 272949
000272949 005__ 20241128165020.0
000272949 0247_ $$2doi$$a10.1093/bioadv/vbae165
000272949 0247_ $$2pmid$$apmid:39544628
000272949 0247_ $$2pmc$$apmc:PMC11562964
000272949 037__ $$aDZNE-2024-01329
000272949 041__ $$aEnglish
000272949 082__ $$a004
000272949 1001_ $$0P:(DE-2719)9001161$$aBreimann, Stephan$$b0$$eFirst author
000272949 245__ $$aAAclust: k-optimized clustering for selecting redundancy-reduced sets of amino acid scales.
000272949 260__ $$aOxford$$bOxford University Press$$c2024
000272949 3367_ $$2DRIVER$$aarticle
000272949 3367_ $$2DataCite$$aOutput Types/Journal article
000272949 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1732782718_2418
000272949 3367_ $$2BibTeX$$aARTICLE
000272949 3367_ $$2ORCID$$aJOURNAL_ARTICLE
000272949 3367_ $$00$$2EndNote$$aJournal Article
000272949 520__ $$aAmino acid scales are crucial for sequence-based protein prediction tasks, yet no gold standard scale set or simple scale selection methods exist. We developed AAclust, a wrapper for clustering models that require a pre-defined number of clusters k, such as k-means. AAclust obtains redundancy-reduced scale sets by clustering and selecting one representative scale per cluster, where k can either be optimized by AAclust or defined by the user. The utility of AAclust scale selections was assessed by applying machine learning models to 24 protein benchmark datasets. We found that top-performing scale sets were different for each benchmark dataset and significantly outperformed scale sets used in previous studies. Noteworthy is the strong dependence of the model performance on the scale set size. AAclust enables a systematic optimization of scale-based feature engineering in machine learning applications. The AAclust algorithm is part of AAanalysis, a Python-based framework for interpretable sequence-based protein prediction, which is documented and accessible at https://aaanalysis.readthedocs.io/en/latest and https://github.com/breimanntools/aaanalysis.
000272949 536__ $$0G:(DE-HGF)POF4-352$$a352 - Disease Mechanisms (POF4-352)$$cPOF4-352$$fPOF IV$$x0
000272949 588__ $$aDataset connected to CrossRef, PubMed, Journals: pub.dzne.de
000272949 7001_ $$00000-0002-9006-4707$$aFrishman, Dmitrij$$b1
000272949 773__ $$0PERI:(DE-600)3076075-6$$a10.1093/bioadv/vbae165$$gVol. 4, no. 1, p. vbae165$$n1$$pvbae165$$tBioinformatics advances$$v4$$x2635-0041$$y2024
000272949 8564_ $$uhttps://pub.dzne.de/record/272949/files/DZNE-2024-01329%20SUP1.zip
000272949 8564_ $$uhttps://pub.dzne.de/record/272949/files/DZNE-2024-01329%20SUP2.pdf
000272949 8564_ $$uhttps://pub.dzne.de/record/272949/files/DZNE-2024-01329.pdf$$yOpenAccess
000272949 8564_ $$uhttps://pub.dzne.de/record/272949/files/DZNE-2024-01329%20SUP2.pdf?subformat=pdfa$$xpdfa
000272949 8564_ $$uhttps://pub.dzne.de/record/272949/files/DZNE-2024-01329.pdf?subformat=pdfa$$xpdfa$$yOpenAccess
000272949 909CO $$ooai:pub.dzne.de:272949$$popenaire$$popen_access$$pVDB$$pdriver$$pdnbdelivery
000272949 9101_ $$0I:(DE-588)1065079516$$6P:(DE-2719)9001161$$aDeutsches Zentrum für Neurodegenerative Erkrankungen$$b0$$kDZNE
000272949 9131_ $$0G:(DE-HGF)POF4-352$$1G:(DE-HGF)POF4-350$$2G:(DE-HGF)POF4-300$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$aDE-HGF$$bGesundheit$$lNeurodegenerative Diseases$$vDisease Mechanisms$$x0
000272949 9141_ $$y2024
000272949 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS$$d2023-08-30
000272949 915__ $$0LIC:(DE-HGF)CCBY4$$2HGFVOC$$aCreative Commons Attribution CC BY 4.0
000272949 915__ $$0StatID:(DE-HGF)0501$$2StatID$$aDBCoverage$$bDOAJ Seal$$d2021-11-16T17:08:20Z
000272949 915__ $$0StatID:(DE-HGF)0500$$2StatID$$aDBCoverage$$bDOAJ$$d2021-11-16T17:08:20Z
000272949 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000272949 915__ $$0StatID:(DE-HGF)0030$$2StatID$$aPeer Review$$bDOAJ : Peer review$$d2021-11-16T17:08:20Z
000272949 915__ $$0StatID:(DE-HGF)0561$$2StatID$$aArticle Processing Charges$$d2023-08-30
000272949 915__ $$0StatID:(DE-HGF)0300$$2StatID$$aDBCoverage$$bMedline$$d2023-08-30
000272949 915__ $$0StatID:(DE-HGF)0320$$2StatID$$aDBCoverage$$bPubMed Central$$d2023-08-30
000272949 915__ $$0StatID:(DE-HGF)0700$$2StatID$$aFees$$d2023-08-30
000272949 9201_ $$0I:(DE-2719)1110000-1$$kAG Steiner$$lBiochemistry of γ-Secretase$$x0
000272949 980__ $$ajournal
000272949 980__ $$aVDB
000272949 980__ $$aUNRESTRICTED
000272949 980__ $$aI:(DE-2719)1110000-1
000272949 9801_ $$aFullTexts