A Method to Facilitate Cancer Detection and Type Classification from Gene Expression Data using a Deep Autoencoder and Neural Network

Dec 20, 2018
Xi Chen, Jin Xie, Qingcong Yuan


 Click To Get Model/Code


With the increased affordability and availability of whole-genome sequencing, large-scale and high-throughput gene expression is widely used to characterize diseases, including cancers. However, establishing specificity in cancer diagnosis using gene expression data continues to pose challenges due to the high dimensionality and complexity of the data. Here we present models of deep learning (DL) and apply them to gene expression data for the diagnosis and categorization of cancer. In this study, we have developed two DL models using messenger ribonucleic acid (mRNA) datasets available from the Genomic Data Commons repository. Our models achieved 98% accuracy in cancer detection, with false negative and false positive rates below 1.7%. In our results, we demonstrated that 18 out of 32 cancer-typing classifications achieved more than 90% accuracy. Due to the limitation of a small sample size (less than 50 observations), certain cancers could not achieve a higher accuracy in typing classification, but still achieved high accuracy for the cancer detection task. To validate our models, we compared them with traditional statistical models. The main advantage of our models over traditional cancer detection is the ability to use data from various cancer types to automatically form features to enhance the detection and diagnosis of a specific cancer type.

* 6 pages 

 Click To Get Model/Code

   Click for Paper Source