Deep Learning and Random Forest-Based Augmentation of sRNA Expression Profiles

Fiosins, Maksims; Bonn, Stefan
% IMPORTANT: The following is UTF-8 encoded.  This means that in the presence
% of non-ASCII characters, it will not work with BibTeX 0.99 or older.
% Instead, you should use an up-to-date BibTeX implementation like “bibtex8” or
% “biber”.

@INPROCEEDINGS{Fiosins:145612,
      author       = {Fiosins, Maksims and Bonn, Stefan},
      title        = {{D}eep {L}earning and {R}andom {F}orest-{B}ased
                      {A}ugmentation of s{RNA} {E}xpression {P}rofiles},
      reportid     = {DZNE-2020-00942},
      year         = {2019},
      abstract     = {The lack of well-structured annotations in a growing amount
                      of RNA expression data complicates data interoperability and
                      reusability. Commonly used text mining methods extract
                      annotations from existing unstructured data descriptions and
                      often provide inaccurate output that requires manual
                      curation. Automatic data-based augmentation (generation of
                      annotations on the basis of expression data) can
                      considerably improve annotation quality but has not been
                      well studied. We formulate the automatic augmentation of
                      small RNA-seq expression data as a classification problem
                      and investigate deep learning (DL) and random forest (RF)
                      approaches to solve it. We generate tissue and sex
                      annotations from small RNA-seq expression data for tissues
                      and cell lines of Homo sapiens. We validate our approach on
                      4243 annotated small RNA-seq samples from the Small RNA
                      Expression Atlas (SEA) database. The average prediction
                      accuracy is $98\%$ for tissue groups, $96.5\%$ for
                      tissues, and $77\%$ for sex (all DL). The “one dataset
                      out” average accuracy for tissue group prediction is
                      $83\%$ (DL) and $59\%$ (RF). On average, DL provides better
                      results than RF and considerably improves classification
                      performance for ‘unseen’ datasets.},
      month        = {Jun},
      date         = {2019-06-03},
      organization = {15th International Symposium on
                      Bioinformatics Research and
                      Applications (ISBRA), Barcelona
                      (Spain), 3 Jun 2019 - 6 Jun 2019},
      subtyp       = {Other},
      cin          = {AG Bonn 1},
      cid          = {I:(DE-2719)1410003},
      pnm          = {342 - Disease Mechanisms and Model Systems (POF3-342)},
      pid          = {G:(DE-HGF)POF3-342},
      typ          = {PUB:(DE-HGF)6},
      url          = {https://pub.dzne.de/record/145612},
}