Contribution to a conference proceedings/Contribution to a book DZNE-2022-01054

Deep Learning and Random Forest-Based Augmentation of sRNA Expression Profiles


2019
Springer International Publishing Cham
ISBN: 978-3-030-20241-5 (print), 978-3-030-20242-2 (electronic)

Bioinformatics Research and Applications / Cai, Zhipeng (Editor) ; Cham : Springer International Publishing, 2019, Chapter 14 ; ISSN: 0302-9743=1611-3349 ; ISBN: 978-3-030-20241-5=978-3-030-20242-2 ; doi:10.1007/978-3-030-20242-2
International Symposium on Bioinformatics Research and Applications
Cham : Springer International Publishing, Lecture Notes in Computer Science 11490, 159 - 170 [10.1007/978-3-030-20242-2_14]


Please use a persistent id in citations: doi:10.1007/978-3-030-20242-2_14

Abstract: The lack of well-structured annotations in a growing amount of RNA expression data complicates data interoperability and reusability. Commonly used text mining methods extract annotations from existing unstructured data descriptions and often provide inaccurate output that requires manual curation. Automatic data-based augmentation (generation of annotations on the basis of expression data) can considerably improve annotation quality but has not been well studied. We formulate automatic augmentation of small RNA-seq expression data as a classification problem and investigate deep learning (DL) and random forest (RF) approaches to solve it. We generate tissue and sex annotations from small RNA-seq expression data for tissues and cell lines of Homo sapiens. We validate our approach on 4243 annotated small RNA-seq samples from the Small RNA Expression Atlas (SEA) database. The average prediction accuracy is 98% for tissue groups (DL), 96.5% for tissues (DL), and 77% for sex (DL). The “one dataset out” average accuracy for tissue group prediction is 83% (DL) and 59% (RF). On average, DL provides better results than RF and considerably improves classification performance for ‘unseen’ datasets.
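
The “one dataset out” evaluation mentioned in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the sample layout (`dataset`, `label`, `expr` keys), the toy values, the choice to average per-dataset accuracies, and the nearest-centroid classifier, which is only a simple stand-in for the DL and RF models.

```python
from collections import defaultdict
from statistics import mean


def leave_one_dataset_out(samples):
    """Yield (held_out_id, train, test): each dataset is held out once."""
    by_dataset = defaultdict(list)
    for s in samples:
        by_dataset[s["dataset"]].append(s)
    for held_out in sorted(by_dataset):
        train = [s for d, group in by_dataset.items() if d != held_out for s in group]
        yield held_out, train, by_dataset[held_out]


def fit_centroids(train):
    """Stand-in classifier (NOT the paper's DL/RF): mean expression vector per label."""
    grouped = defaultdict(list)
    for s in train:
        grouped[s["label"]].append(s["expr"])
    return {label: [mean(col) for col in zip(*rows)] for label, rows in grouped.items()}


def predict(centroids, expr):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(expr, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))


def one_dataset_out_accuracy(samples):
    """Average, over held-out datasets, of the accuracy on the unseen dataset."""
    accuracies = []
    for _, train, test in leave_one_dataset_out(samples):
        centroids = fit_centroids(train)
        correct = sum(predict(centroids, s["expr"]) == s["label"] for s in test)
        accuracies.append(correct / len(test))
    return mean(accuracies)


# Hypothetical toy data: two expression features, two tissue labels, three datasets.
samples = [
    {"dataset": "D1", "label": "brain", "expr": [9.0, 1.0]},
    {"dataset": "D1", "label": "blood", "expr": [1.0, 9.0]},
    {"dataset": "D2", "label": "brain", "expr": [8.0, 2.0]},
    {"dataset": "D2", "label": "blood", "expr": [2.0, 8.0]},
    {"dataset": "D3", "label": "brain", "expr": [7.5, 1.5]},
    {"dataset": "D3", "label": "blood", "expr": [1.5, 7.0]},
]
accuracy = one_dataset_out_accuracy(samples)
```

Holding out whole datasets rather than random samples means the test samples come from a source the model never saw during training, which is presumably what the abstract means by ‘unseen’ datasets.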


Contributing Institute(s):
  1. Genome Biology of Neurodegenerative Diseases (AG Heutink 1)
Research Program(s):
  1. 899 - without topic (POF4-899)

Appears in the scientific report 2019
Database coverage:
Nationallizenz ; SCOPUS

The record appears in these collections:
Document types > Events > Contributions to a conference proceedings
Document types > Books > Contribution to a book
Institute Collections > TÜ DZNE > TÜ DZNE-AG Heutink
Public records
Publications Database

 Record created 2022-06-01, last modified 2023-01-03


