FastRNA: An efficient solution for PCA of single-cell RNA-sequencing data based on a batch-accounting count model

Cited 2 times in Web of Science; cited 2 times in Scopus
Authors

Lee, Hanbin; Han, Buhm

Issue Date
2022-11
Publisher
Cell Press
Citation
American Journal of Human Genetics, Vol.109 No.11, pp.1974-1985
Abstract
Almost always, the analysis of single-cell RNA-sequencing (scRNA-seq) data begins with the generation of the low dimensional embedding of the data by principal-component analysis (PCA). Because scRNA-seq data are count data, log transformation is routinely applied to correct skewness prior to PCA, which is often argued to have added bias to data. Alternatively, studies have proposed methods that directly assume a count model and use approximately normally distributed count residuals for PCA. Despite their theoretical advantage of directly modeling count data, these methods are extremely slow for large datasets. In fact, when the data size grows, even the standard log normalization becomes inefficient. Here, we present FastRNA, a highly efficient solution for PCA of scRNA-seq data based on a count model accounting for both batches and cell size factors. Although we assume the same general count model as previous methods, our method uses two orders of magnitude less time and memory than the other count-based methods and an order of magnitude less time and memory than the standard log normalization. This achievement results from our unique algebraic optimization that completely avoids the formation of the large dense residual matrix in memory. In addition, our method enjoys a benefit that the batch effects are eliminated from data prior to PCA. Generating a batch-accounted PC of an atlas-scale dataset with 2 million cells takes less than a minute and 1 GB memory with our method.
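The abstract's central idea, running PCA on count-model residuals without ever holding the dense residual matrix in memory, can be illustrated with a small sketch. This is not the FastRNA algorithm: it assumes a plain Poisson model with one batch, expected counts n_i * p_j (cell total times gene proportion), Pearson residuals, and no mean centering. The function name `residual_pca` and all variable names are hypothetical. Under those assumptions the residual matrix is R = A - sqrt(n) sqrt(p)^T, where A is a rescaled sparse count matrix, so the gene-by-gene Gram matrix reduces to R^T R = A^T A - N sqrt(p) sqrt(p)^T with N the total count.

```python
import numpy as np
import scipy.sparse as sp


def residual_pca(X, k):
    """Top-k components of Poisson Pearson residuals from sparse counts.

    Illustrative assumption: expected count mu_ij = n_i * p_j (single batch).
    The dense cells-by-genes residual matrix R is never materialized.
    """
    X = sp.csr_matrix(X, dtype=np.float64)
    n = np.asarray(X.sum(axis=1)).ravel()  # per-cell totals (size factors)
    g = np.asarray(X.sum(axis=0)).ravel()  # per-gene totals
    if np.any(n == 0) or np.any(g == 0):
        raise ValueError("remove all-zero cells and genes first")
    N = n.sum()
    c = np.sqrt(g / N)                     # sqrt of gene proportions p_j

    # A = D_n^{-1/2} X D_p^{-1/2} keeps the sparsity pattern of X.
    A = sp.diags(1.0 / np.sqrt(n)) @ X @ sp.diags(1.0 / c)

    # R = A - sqrt(n) c^T, hence R^T R = A^T A - N c c^T (genes x genes).
    G = (A.T @ A).toarray() - N * np.outer(c, c)

    # eigh returns eigenvalues in ascending order; keep the k largest.
    w, V = np.linalg.eigh(G)
    V = V[:, ::-1][:, :k]

    # Cell scores R V = A V - sqrt(n) (c^T V); only cells x k is dense.
    scores = A @ V - np.outer(np.sqrt(n), c @ V)
    return scores, V, G


# Small synthetic example (made-up data, for illustration only).
rng = np.random.default_rng(0)
counts = rng.poisson(0.5, size=(300, 40))
scores, V, G = residual_pca(sp.csr_matrix(counts), k=5)
print(scores.shape, V.shape, G.shape)
```

The only dense objects here are genes-by-genes and cells-by-k, which is what keeps memory low when the number of cells is large. Handling batches, as the abstract describes, would need a different null model than the one assumed in this sketch.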
ISSN
0002-9297
URI
https://hdl.handle.net/10371/191456
DOI
https://doi.org/10.1016/j.ajhg.2022.09.008
Related Researcher

  • College of Medicine
  • Department of Medicine
Research Area: Bioinformatics, Genomics, Statistical Genetics