Abstract: Single-cell RNA sequencing (scRNA-seq) promises a high-resolution view of cellular differences. However, analyzing scRNA-seq data remains a statistical and computational challenge because pervasive dropout events fill the high-dimensional data matrix with prevailing ‘false’ zero counts. Furthermore, differential expression analysis performed after clustering incurs the so-called “double use of data” problem, which compromises type I error control for standard statistical tests. In this talk, I will introduce model-based deep autoencoders to address these issues. The proposed approaches leverage recent developments in feature representation learning from deep learning and feature selection from statistical learning, as well as prior information from domain scientists. Extensive experiments on both simulated and real datasets demonstrate that the proposed methods can significantly boost clustering performance while effectively filtering out most irrelevant genes. Our methods can generate more biologically meaningful clusters with enhanced interpretability, as desired by biologists.
Dr. Zhi Wei