| Title: | Construct and Compare scGRN from Single-Cell Transcriptomic Data |
|---|---|
| Description: | A workflow based on machine learning methods to construct and compare single-cell gene regulatory networks (scGRN) using single-cell RNA-seq (scRNA-seq) data collected from different conditions. Uses principal component regression, tensor decomposition, and manifold alignment, to accurately identify even subtly shifted gene expression programs. See <doi:10.1016/j.patter.2020.100139> for more details. |
| Authors: | Daniel Osorio [aut, cre] (ORCID: <https://orcid.org/0000-0003-4424-8422>), Yan Zhong [aut, ctb], Guanxun Li [aut, ctb], Jianhua Huang [aut, ctb], James Cai [aut, ctb, ths] (ORCID: <https://orcid.org/0000-0002-8081-6725>) |
| Maintainer: | Daniel Osorio <[email protected]> |
| License: | GPL (>=2) |
| Version: | 1.4 |
| Built: | 2026-05-29 19:29:58 UTC |
| Source: | https://github.com/cailab-tamu/sctenifoldnet |
Canonical Polyadic (CP) decomposition of a tensor, aka
CANDECOMP/PARAFAC. Approximate a K-Tensor using a sum of
num_components rank-1 K-Tensors. A rank-1 K-Tensor can be written
as an outer product of K vectors. There are a total of
num_components * tnsr$num_modes vectors in the output, stored in
tnsr$num_modes matrices, each with num_components columns.
This is an iterative algorithm, with two possible stopping conditions:
either relative error in Frobenius norm has gotten below tol, or
the max_iter number of iterations has been reached. For more details
on CP decomposition, consult Kolda and Bader (2009).
cpDecomposition(tnsr, num_components = NULL, max_iter = 25, tol = 1e-05)cpDecomposition(tnsr, num_components = NULL, max_iter = 25, tol = 1e-05)
tnsr |
Tensor with K modes. |
num_components |
The number of rank-1 K-Tensors to use in approximation. |
max_iter |
Maximum number of iterations if error stays above |
tol |
Relative Frobenius norm error tolerance. |
Uses the Alternating Least Squares (ALS) estimation procedure. A progress bar is included to help monitor operations on large tensors.
A list containing the following
lambdasA vector of normalizing constants, one for each component.
UA list of matrices, one for each mode, each with num_components columns.
convWhether or not resid < tol by the last iteration.
norm_percentThe percent of Frobenius norm explained by the approximation.
estEstimate of tnsr after compression.
fnorm_residThe Frobenius norm of the error.
all_residsVector containing the Frobenius norm of error for all iterations.
T. Kolda, B. Bader, "Tensor decomposition and applications". SIAM Applied Mathematics and Applications 2009.
This function normalizes the count data present in a given matrix using counts per million normalization (CPM). Each gene count for each cell is divided by the total counts for that cell and multiplied by 1e6. No log-transformation is applied.
cpmNormalization(X, label = NULL)cpmNormalization(X, label = NULL)
X |
Raw counts matrix with cells as columns and genes (symbols) as rows. |
label |
Optional character label (e.g. "X", "Y") prepended to progress messages when running inside a pipeline. |
A dgCMatrix object with the count per million (CPM) normalized values.
Vallejos, Catalina A., et al. "Normalizing single-cell RNA sequencing data: challenges and opportunities." Nature methods 14.6 (2017): 565.
library(scTenifoldNet) # Simulating of a dataset following a negative binomial distribution with high sparcity (~67%) nCells = 2000 nGenes = 100 set.seed(1) X <- rnbinom(n = nGenes * nCells, size = 20, prob = 0.98) X <- round(X) X <- matrix(X, ncol = nCells) rownames(X) <- c(paste0('ng', 1:90), paste0('mt-', 1:10)) # Performing Single cell quality control qcOutput <- scQC( X = X, minLibSize = 30, removeOutlierCells = TRUE, minPCT = 0.05, maxMTratio = 0.1 ) # Performing Counts per million Normalization (CPM) normalizationOutput <- cpmNormalization(qcOutput) # Visualizing the differences oldPar <- par(no.readonly = TRUE) par( mfrow = c(1, 2), mar = c(3, 3, 1, 1), mgp = c(1.5, 0.5, 0) ) plot( Matrix::colSums(qcOutput), ylab = 'Library Size', xlab = 'Cell', main = 'Before CPM Normalization' ) plot( Matrix::colSums(normalizationOutput), ylab = 'Library Size', xlab = 'Cell', main = 'After CPM Normalization' ) par(oldPar)library(scTenifoldNet) # Simulating of a dataset following a negative binomial distribution with high sparcity (~67%) nCells = 2000 nGenes = 100 set.seed(1) X <- rnbinom(n = nGenes * nCells, size = 20, prob = 0.98) X <- round(X) X <- matrix(X, ncol = nCells) rownames(X) <- c(paste0('ng', 1:90), paste0('mt-', 1:10)) # Performing Single cell quality control qcOutput <- scQC( X = X, minLibSize = 30, removeOutlierCells = TRUE, minPCT = 0.05, maxMTratio = 0.1 ) # Performing Counts per million Normalization (CPM) normalizationOutput <- cpmNormalization(qcOutput) # Visualizing the differences oldPar <- par(no.readonly = TRUE) par( mfrow = c(1, 2), mar = c(3, 3, 1, 1), mgp = c(1.5, 0.5, 0) ) plot( Matrix::colSums(qcOutput), ylab = 'Library Size', xlab = 'Cell', main = 'Before CPM Normalization' ) plot( Matrix::colSums(normalizationOutput), ylab = 'Library Size', xlab = 'Cell', main = 'After CPM Normalization' ) par(oldPar)
Using the output of the non-linear manifold alignment, this function computes the Euclidean distance between the coordinates for the same gene in both conditions. Calculated distances are then transformed using Box-Cox power transformation, and standardized to ensure normality. P-values are assigned following the chi-square distribution over the fold-change of the squared distance computed with respect to the expectation.
dRegulation(manifoldOutput)dRegulation(manifoldOutput)
manifoldOutput |
A matrix. The output of the non-linear manifold alignment, a labeled matrix with two times the number of shared genes as rows (X_ genes followed by Y_ genes in the same order) and |
A data frame with 6 columns as follows:
gene A character vector with the gene id identified from the manifoldAlignment output.
distance A numeric vector of the Euclidean distance computed between the coordinates of the same gene in both conditions.
Z A numeric vector of the Z-scores computed after Box-Cox power transformation.
FC A numeric vector of the FC computed with respect to the expectation.
p.value A numeric vector of the p-values associated to the fold-changes, probabilities are asigned as using the Chi-square distribution with one degree of freedom.
p.adj A numeric vector of adjusted p-values using Benjamini & Hochberg (1995) FDR correction.
Benjamini, Y., and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29, 1165-1188. doi: 10.1214/aos/1013699998.
library(scTenifoldNet) # Simulating of a dataset following a negative binomial distribution with high sparcity (~67%) nCells = 2000 nGenes = 100 set.seed(1) X <- rnbinom(n = nGenes * nCells, size = 20, prob = 0.98) X <- round(X) X <- matrix(X, ncol = nCells) rownames(X) <- c(paste0('ng', 1:90), paste0('mt-', 1:10)) # Performing Single cell quality control qcOutput <- scQC( X = X, minLibSize = 30, removeOutlierCells = TRUE, minPCT = 0.05, maxMTratio = 0.1 ) # Computing 3 single-cell gene regulatory networks each one from a subsample of 500 cells xNetworks <- makeNetworks(X = qcOutput, nNet = 3, nCells = 500, nComp = 3, scaleScores = TRUE, symmetric = FALSE, q = 0.95 ) # Computing a K = 3 CANDECOMP/PARAFAC (CP) Tensor Decomposition tdOutput <- tensorDecomposition(xNetworks, K = 3, maxError = 1e5, maxIter = 1e3) ## Not run: # Computing the alignment # For this example, we are using the same input, the match should be perfect. maOutput <- manifoldAlignment(tdOutput$X, tdOutput$X) # Evaluating the difference in regulation dcOutput <- dRegulation(maOutput, minFC = 0) head(dcOutput) # Plotting # If FDR < 0.05, the gene will be colored in red. geneColor <- ifelse(dcOutput$p.adj < 0.05, 'red', 'black') qqnorm(dcOutput$Z, main = 'Standardized Distance', pch = 16, col = geneColor) qqline(dcOutput$Z) ## End(Not run)library(scTenifoldNet) # Simulating of a dataset following a negative binomial distribution with high sparcity (~67%) nCells = 2000 nGenes = 100 set.seed(1) X <- rnbinom(n = nGenes * nCells, size = 20, prob = 0.98) X <- round(X) X <- matrix(X, ncol = nCells) rownames(X) <- c(paste0('ng', 1:90), paste0('mt-', 1:10)) # Performing Single cell quality control qcOutput <- scQC( X = X, minLibSize = 30, removeOutlierCells = TRUE, minPCT = 0.05, maxMTratio = 0.1 ) # Computing 3 single-cell gene regulatory networks each one from a subsample of 500 cells xNetworks <- makeNetworks(X = qcOutput, nNet = 3, nCells = 500, nComp = 3, scaleScores = TRUE, symmetric = FALSE, q = 0.95 ) # Computing a K = 3 CANDECOMP/PARAFAC (CP) Tensor Decomposition tdOutput <- tensorDecomposition(xNetworks, K = 3, maxError = 1e5, maxIter = 1e3) ## Not run: # Computing the alignment # For this example, we are using the same input, the match should be perfect. maOutput <- manifoldAlignment(tdOutput$X, tdOutput$X) # Evaluating the difference in regulation dcOutput <- dRegulation(maOutput, minFC = 0) head(dcOutput) # Plotting # If FDR < 0.05, the gene will be colored in red. geneColor <- ifelse(dcOutput$p.adj < 0.05, 'red', 'black') qqnorm(dcOutput$Z, main = 'Standardized Distance', pch = 16, col = geneColor) qqline(dcOutput$Z) ## End(Not run)
This function computes nNet gene regulatory networks for
a randomly selected subsample of nCells cells based on principal
component regression (PCR), a technique based on principal component
analysis. In PCR, the outcome variable is regressed over a nComp
number of principal components computed from a set of covariates to estimate
the unknown regression coefficients in the model. pcNet computes the
PCR coefficients for each gene one at a time using all the others as
covariates, to construct an all-by-all gene regulatory network.
makeNetworks( X, nNet = 10, nCells = 500, nComp = 3, scaleScores = TRUE, symmetric = FALSE, q = 0.95, nCores = parallel::detectCores(), label = NULL )makeNetworks( X, nNet = 10, nCells = 500, nComp = 3, scaleScores = TRUE, symmetric = FALSE, q = 0.95, nCores = parallel::detectCores(), label = NULL )
X |
A filtered and normlized gene expression matrix with cells as columns and genes as rows. |
nNet |
An integer value. The number of networks based on principal components regression to generate. |
nCells |
An integer value. The number of cells to subsample each time to generate a network. |
nComp |
An integer value. The number of principal components in PCA to generate the networks. Should be greater than 2 and lower than the total number of genes. |
scaleScores |
A boolean value ( |
symmetric |
A boolean value ( |
q |
A decimal value between 0 and 1. Represent the cut-off threshold of top q% relationships to be returned. |
nCores |
An integer value. Defines the number of cores to be used. |
label |
Optional character label (e.g. "X", "Y") prepended to progress messages when running inside a pipeline. |
Principal component regression may be broadly divided into three major steps:
Perform PCA on the observed covariates data matrix to obtain nComp number of the principal components.
Regress the observed vector of outcomes on the selected principal components as covariates, using ordinary least squares regression to get a vector of estimated regression coefficients
Transform this vector back to the scale of the actual covariates, using the eigenvectors corresponding to the selected principal components to get the final PCR estimator for estimating the regression coefficients characterizing the original model.
A list with nNet gene regulatory networks in dgCMatrix format. Each one computed from a randomly selected subsample of nCells cells.
Gill, Ryan, Somnath Datta, and Susmita Datta. "dna: An R package for differential network analysis." Bioinformation 10.4 (2014): 233.
Jolliffe, Ian T. "A note on the use of principal components in regression." Journal of the Royal Statistical Society: Series C (Applied Statistics) 31.3 (1982): 300-303.
Massy, William F. "Principal components regression in exploratory statistical research." Journal of the American Statistical Association 60.309 (1965): 234-256.
library(scTenifoldNet) # Simulating of a dataset following a negative binomial distribution with high sparcity (~67%) nCells = 2000 nGenes = 100 set.seed(1) X <- rnbinom(n = nGenes * nCells, size = 20, prob = 0.98) X <- round(X) X <- matrix(X, ncol = nCells) rownames(X) <- c(paste0('ng', 1:90), paste0('mt-', 1:10)) # Performing Single cell quality control qcOutput <- scQC( X = X, minLibSize = 30, removeOutlierCells = TRUE, minPCT = 0.05, maxMTratio = 0.1 ) # Computing 3 single-cell gene regulatory networks each one from a subsample of 500 cells mnOutput <- makeNetworks(X = X, nNet = 3, nCells = 500, nComp = 3, scaleScores = TRUE, symmetric = FALSE, q = 0.95 ) # Verifying the class class(mnOutput) # Verifying the number of networks length(mnOutput) # Veryfying the dimention of the networks lapply(mnOutput,dim) # Single-cell gene regulatory networks mnOutput[[1]][1:10,1:10] mnOutput[[2]][1:10,1:10] mnOutput[[3]][1:10,1:10]library(scTenifoldNet) # Simulating of a dataset following a negative binomial distribution with high sparcity (~67%) nCells = 2000 nGenes = 100 set.seed(1) X <- rnbinom(n = nGenes * nCells, size = 20, prob = 0.98) X <- round(X) X <- matrix(X, ncol = nCells) rownames(X) <- c(paste0('ng', 1:90), paste0('mt-', 1:10)) # Performing Single cell quality control qcOutput <- scQC( X = X, minLibSize = 30, removeOutlierCells = TRUE, minPCT = 0.05, maxMTratio = 0.1 ) # Computing 3 single-cell gene regulatory networks each one from a subsample of 500 cells mnOutput <- makeNetworks(X = X, nNet = 3, nCells = 500, nComp = 3, scaleScores = TRUE, symmetric = FALSE, q = 0.95 ) # Verifying the class class(mnOutput) # Verifying the number of networks length(mnOutput) # Veryfying the dimention of the networks lapply(mnOutput,dim) # Single-cell gene regulatory networks mnOutput[[1]][1:10,1:10] mnOutput[[2]][1:10,1:10] mnOutput[[3]][1:10,1:10]
Build comparable low-dimensional features for two weight-averaged
denoised single-cell gene regulatory networks. Using a non-linear network
embedding method manifoldAlignment aligns two gene regulatory networks
and finds the structural similarities between them. This function is a
wrapper of the Python code provided by Vu et al., (2012) at
https://github.com/all-umass/ManifoldWarping.
manifoldAlignment(X, Y, d = 30, nCores = parallel::detectCores())manifoldAlignment(X, Y, d = 30, nCores = parallel::detectCores())
X |
A gene regulatory network. |
Y |
A gene regulatory network. |
d |
The dimension of the low-dimensional feature space. |
nCores |
An integer value. Defines the number of cores to be used. |
Manifold alignment builds connections between two or more disparate data sets by aligning their underlying manifolds and provides knowledge transfer across the data sets. For further information please see: Wang et al., (2009)
A low-dimensional projection for two the two gene regulatory networks used as input. The output is a labeled matrix with two times the number of shared genes as rows ( X_ genes followed by Y_ genes in the same order) and d number of columns.
Vu, Hoa Trong, Clifton Carey, and Sridhar Mahadevan. "Manifold warping: Manifold alignment over time." Twenty-Sixth AAAI Conference on Artificial Intelligence. 2012.
Wang, Chang, and Sridhar Mahadevan. "A general framework for manifold alignment." 2009 AAAI Fall Symposium Series. 2009.
library(scTenifoldNet) # Simulating of a dataset following a negative binomial distribution with high sparcity (~67%) nCells = 2000 nGenes = 100 set.seed(1) X <- rnbinom(n = nGenes * nCells, size = 20, prob = 0.98) X <- round(X) X <- matrix(X, ncol = nCells) rownames(X) <- c(paste0('ng', 1:90), paste0('mt-', 1:10)) # Performing Single cell quality control qcOutput <- scQC( X = X, minLibSize = 30, removeOutlierCells = TRUE, minPCT = 0.05, maxMTratio = 0.1 ) # Computing 3 single-cell gene regulatory networks each one from a subsample of 500 cells xNetworks <- makeNetworks(X = X, nNet = 3, nCells = 500, nComp = 3, scaleScores = TRUE, symmetric = FALSE, q = 0.95 ) # Computing a K = 3 CANDECOMP/PARAFAC (CP) Tensor Decomposition tdOutput <- tensorDecomposition(xNetworks, K = 3, maxError = 1e5, maxIter = 1e3) ## Not run: # Computing the alignment # For this example, we are using the same input, the match should be perfect. maOutput <- manifoldAlignment(tdOutput$X, tdOutput$X) # Separating the coordinates for each gene X <- maOutput[grepl('X_', rownames(maOutput)),] Y <- maOutput[grepl('Y_', rownames(maOutput)),] # Plotting # X Points plot(X, pch = 16) # Y Points points(Y, col = 'red') # Legend legend('topright', legend = c('X', 'Y'), col = c('black', 'red'), bty = 'n', pch = c(16,1), cex = 0.7) ## End(Not run)library(scTenifoldNet) # Simulating of a dataset following a negative binomial distribution with high sparcity (~67%) nCells = 2000 nGenes = 100 set.seed(1) X <- rnbinom(n = nGenes * nCells, size = 20, prob = 0.98) X <- round(X) X <- matrix(X, ncol = nCells) rownames(X) <- c(paste0('ng', 1:90), paste0('mt-', 1:10)) # Performing Single cell quality control qcOutput <- scQC( X = X, minLibSize = 30, removeOutlierCells = TRUE, minPCT = 0.05, maxMTratio = 0.1 ) # Computing 3 single-cell gene regulatory networks each one from a subsample of 500 cells xNetworks <- makeNetworks(X = X, nNet = 3, nCells = 500, nComp = 3, scaleScores = TRUE, symmetric = FALSE, q = 0.95 ) # Computing a K = 3 CANDECOMP/PARAFAC (CP) Tensor Decomposition tdOutput <- tensorDecomposition(xNetworks, K = 3, maxError = 1e5, maxIter = 1e3) ## Not run: # Computing the alignment # For this example, we are using the same input, the match should be perfect. maOutput <- manifoldAlignment(tdOutput$X, tdOutput$X) # Separating the coordinates for each gene X <- maOutput[grepl('X_', rownames(maOutput)),] Y <- maOutput[grepl('Y_', rownames(maOutput)),] # Plotting # X Points plot(X, pch = 16) # Y Points points(Y, col = 'red') # Legend legend('topright', legend = c('X', 'Y'), col = c('black', 'red'), bty = 'n', pch = c(16,1), cex = 0.7) ## End(Not run)
Computes a gene co-expression network using Principal Component Regression. Each gene is regressed against principal components derived from all other genes.
pcNet( X, nComp = 3, scaleScores = TRUE, symmetric = FALSE, q = 0, verbose = FALSE, nCores = 1, useRcpp = TRUE )pcNet( X, nComp = 3, scaleScores = TRUE, symmetric = FALSE, q = 0, verbose = FALSE, nCores = 1, useRcpp = TRUE )
X |
A matrix with genes as rows and cells/samples as columns. Can be a regular matrix or dgCMatrix (sparse). Must have positive row sums (quality control applied). |
nComp |
Number of principal components to use for regression. Must be >= 2 and < number of genes. Default: 3. |
scaleScores |
If TRUE (default), scales output network by maximum absolute value to normalize edge weights to [-1, 1]. |
symmetric |
If TRUE, symmetrizes the network matrix as (A + t(A))/2. Default: FALSE. |
q |
Quantile threshold for filtering edges. Values below this quantile are set to zero. Range: [0, 1]. Default: 0 (no filtering). |
verbose |
If TRUE, prints progress updates. Default: FALSE. |
nCores |
Number of cores for parallelization. If > 1, uses future backend (multisession). Default: 1 (sequential). |
useRcpp |
If TRUE (default), uses compiled Rcpp backend for faster computation. Falls back to pure R if Rcpp backend unavailable. Default: TRUE. |
Algorithm: 1. Standardize input matrix (center and scale by columns) 2. For each gene K: a. Extract all other genes as design matrix (Xi) b. Compute truncated SVD of Xi to get nComp principal components c. Project gene K onto these components d. Perform OLS regression to get coefficients 3. Assemble coefficients into sparse network matrix 4. Apply optional scaling and filtering
A sparse matrix (Matrix::dgCMatrix) representing the gene network. Entry [i,j] contains the regression coefficient from gene i to gene j. Diagonal is always zero (no self-loops).
## Not run: # Create sample data X <- matrix(rpois(1000*50, 5), nrow=1000, ncol=50) rownames(X) <- paste0('Gene', 1:1000) # Compute network net <- pcNet(X, nComp=3, nCores=4) ## End(Not run)## Not run: # Create sample data X <- matrix(rpois(1000*50, 5), nrow=1000, ncol=50) rownames(X) <- paste0('Gene', 1:1000) # Compute network net <- pcNet(X, nComp=3, nCores=4) ## End(Not run)
This function performs quality control filters over the provided input matrix. It checks for minimum cell library size, mitochondrial ratio, outlier cells, and the fraction of cells where a gene is expressed.
scQC( X, minLibSize = 1000, removeOutlierCells = TRUE, minPCT = 0.05, maxMTratio = 0.1, label = NULL )scQC( X, minLibSize = 1000, removeOutlierCells = TRUE, minPCT = 0.05, maxMTratio = 0.1, label = NULL )
X |
Raw counts matrix with cells as columns and genes (symbols) as rows. |
minLibSize |
An integer value. Defines the minimum library size required for a cell to be included in the analysis. |
removeOutlierCells |
A boolean value ( |
minPCT |
A decimal value between 0 and 1. Defines the minimum fraction of cells where the gene needs to be expressed to be included in the analysis. |
maxMTratio |
A decimal value between 0 and 1. Defines the maximum ratio of mitochondrial reads (mitochondrial reads / library size) present in a cell to be included in the analysis. It's computed using the symbol genes starting with 'MT-' non-case sensitive. |
label |
Optional character label (e.g. "X", "Y") prepended to progress messages when running inside a pipeline. |
A dgCMatrix object with the cells and the genes that pass the quality control filters.
Ilicic, Tomislav, et al. "Classification of low quality cells from single-cell RNA-seq data." Genome biology 17.1 (2016): 29.
library(scTenifoldNet) # Simulating of a dataset following a negative binomial distribution with high sparcity (~67%) nCells = 2000 nGenes = 100 set.seed(1) X <- rnbinom(n = nGenes * nCells, size = 20, prob = 0.98) X <- round(X) X <- matrix(X, ncol = nCells) rownames(X) <- c(paste0('ng', 1:90), paste0('mt-', 1:10)) # Performing Single cell quality control qcOutput <- scQC( X = X, minLibSize = 30, removeOutlierCells = TRUE, minPCT = 0.05, maxMTratio = 0.1 ) # Visualizing the Differences oldPar <- par(no.readonly = TRUE) par( mfrow = c(2, 2), mar = c(3, 3, 1, 1), mgp = c(1.5, 0.5, 0) ) # Library Size plot( Matrix::colSums(X), ylim = c(20, 70), ylab = 'Library Size', xlab = 'Cell', main = 'Library Size - Before QC' ) abline(h = c(30, 58), lty = 2, col = 'red') plot( Matrix::colSums(qcOutput), ylim = c(20, 70), ylab = 'Library Size', xlab = 'Cell', main = 'Library Size - After QC' ) abline(h = c(30, 58), lty = 2, col = 'red') # Mitochondrial ratio mtGenes <- grepl('^mt-', rownames(X), ignore.case = TRUE) plot( Matrix::colSums(X[mtGenes,]) / Matrix::colSums(X), ylim = c(0, 0.3), ylab = 'Mitochondrial Ratio', xlab = 'Cell', main = 'Mitochondrial Ratio - Before QC' ) abline(h = c(0.1), lty = 2, col = 'red') plot( Matrix::colSums(qcOutput[mtGenes,]) / Matrix::colSums(qcOutput), ylim = c(0, 0.3), ylab = 'Mitochondrial Ratio', xlab = 'Cell', main = 'Mitochondrial Ratio - Before QC' ) abline(h = c(0.1), lty = 2, col = 'red') par(oldPar)library(scTenifoldNet) # Simulating of a dataset following a negative binomial distribution with high sparcity (~67%) nCells = 2000 nGenes = 100 set.seed(1) X <- rnbinom(n = nGenes * nCells, size = 20, prob = 0.98) X <- round(X) X <- matrix(X, ncol = nCells) rownames(X) <- c(paste0('ng', 1:90), paste0('mt-', 1:10)) # Performing Single cell quality control qcOutput <- scQC( X = X, minLibSize = 30, removeOutlierCells = TRUE, minPCT = 0.05, maxMTratio = 0.1 ) # Visualizing the Differences oldPar <- par(no.readonly = TRUE) par( mfrow = c(2, 2), mar = c(3, 3, 1, 1), mgp = c(1.5, 0.5, 0) ) # Library Size plot( Matrix::colSums(X), ylim = c(20, 70), ylab = 'Library Size', xlab = 'Cell', main = 'Library Size - Before QC' ) abline(h = c(30, 58), lty = 2, col = 'red') plot( Matrix::colSums(qcOutput), ylim = c(20, 70), ylab = 'Library Size', xlab = 'Cell', main = 'Library Size - After QC' ) abline(h = c(30, 58), lty = 2, col = 'red') # Mitochondrial ratio mtGenes <- grepl('^mt-', rownames(X), ignore.case = TRUE) plot( Matrix::colSums(X[mtGenes,]) / Matrix::colSums(X), ylim = c(0, 0.3), ylab = 'Mitochondrial Ratio', xlab = 'Cell', main = 'Mitochondrial Ratio - Before QC' ) abline(h = c(0.1), lty = 2, col = 'red') plot( Matrix::colSums(qcOutput[mtGenes,]) / Matrix::colSums(qcOutput), ylim = c(0, 0.3), ylab = 'Mitochondrial Ratio', xlab = 'Cell', main = 'Mitochondrial Ratio - Before QC' ) abline(h = c(0.1), lty = 2, col = 'red') par(oldPar)
Construct and compare single-cell gene regulatory networks (scGRNs) using single-cell RNA-seq (scRNA-seq) data sets collected from different conditions based on principal component regression, tensor decomposition, and manifold alignment.
scTenifoldNet( X, Y, qc = TRUE, qc_minLibSize = 1000, qc_removeOutlierCells = TRUE, qc_minPCT = 0.05, qc_maxMTratio = 0.1, nc_nNet = 10, nc_nCells = 500, nc_nComp = 3, nc_symmetric = FALSE, nc_scaleScores = TRUE, nc_q = 0.05, td_K = 3, td_nDecimal = 1, td_maxIter = 1000, td_maxError = 1e-05, ma_nDim = 30, nCores = parallel::detectCores() )scTenifoldNet( X, Y, qc = TRUE, qc_minLibSize = 1000, qc_removeOutlierCells = TRUE, qc_minPCT = 0.05, qc_maxMTratio = 0.1, nc_nNet = 10, nc_nCells = 500, nc_nComp = 3, nc_symmetric = FALSE, nc_scaleScores = TRUE, nc_q = 0.05, td_K = 3, td_nDecimal = 1, td_maxIter = 1000, td_maxError = 1e-05, ma_nDim = 30, nCores = parallel::detectCores() )
X |
Raw counts matrix with cells as columns and genes (symbols) as rows. |
Y |
Raw counts matrix with cells as columns and genes (symbols) as rows. |
qc |
A boolean value (TRUE/FALSE), if TRUE, a quality control is applied over the data. |
qc_minLibSize |
An integer value. Defines the minimum library size required for a cell to be included in the analysis. |
qc_removeOutlierCells |
A boolean value (TRUE/FALSE), if TRUE, the identified cells with library size greater than 1.58 IQR/sqrt(n) computed from the sample, are removed. For further details see: |
qc_minPCT |
A decimal value between 0 and 1. Defines the minimum fraction of cells where the gene needs to be expressed to be included in the analysis. |
qc_maxMTratio |
A decimal value between 0 and 1. Defines the maximum ratio of mitochondrial reads (mithocondrial reads / library size) present in a cell to be included in the analysis. It's computed using the symbol genes starting with 'MT-' non-case sensitive. |
nc_nNet |
An integer value. The number of networks based on principal components regression to generate. |
nc_nCells |
An integer value. The number of cells to subsample each time to generate a network. |
nc_nComp |
An integer value. The number of principal components in PCA to generate the networks. Should be greater than 2 and lower than the total number of genes. |
nc_symmetric |
A boolean value (TRUE/FALSE), if TRUE, the weights matrix returned will be symmetric. |
nc_scaleScores |
A boolean value (TRUE/FALSE), if TRUE, the weights will be normalized such that the maximum absolute value is 1. |
nc_q |
A decimal value between 0 and 1. Defines the cut-off threshold of top q% relationships to be returned. |
td_K |
An integer value. Defines the number of rank-one tensors used to approximate the data using CANDECOMP/PARAFAC (CP) Tensor Decomposition. |
td_nDecimal |
An integer value indicating the number of decimal places to be used. |
td_maxIter |
An integer value. Defines the maximum number of iterations if error stay above |
td_maxError |
A decimal value between 0 and 1. Defines the relative Frobenius norm error tolerance. |
ma_nDim |
An integer value. Defines the number of dimensions of the low-dimensional feature space to be returned from the non-linear manifold alignment. |
nCores |
An integer value. Defines the number of cores to be used. |
A list with 3 slots as follows:
tensorNetworks: The generated weight-averaged denoised gene regulatory networks using CANDECOMP/PARAFAC (CP) Tensor Decomposition.
manifoldAlignment: The generated low-dimensional features result of the non-linear manifold alignment.
diffRegulation The results of the differential regulation analysis.
library(scTenifoldNet) # Simulating of a dataset following a negative binomial distribution with high sparcity (~67%) nCells = 2000 nGenes = 100 set.seed(1) X <- rnbinom(n = nGenes * nCells, size = 20, prob = 0.98) X <- round(X) X <- matrix(X, ncol = nCells) rownames(X) <- c(paste0('ng', 1:90), paste0('mt-', 1:10)) # Generating a perturbed network modifying the expression of genes 10, 2 and 3 Y <- X Y[10,] <- Y[50,] Y[2,] <- Y[11,] Y[3,] <- Y[5,] ## Not run: # scTenifoldNet Output <- scTenifoldNet(X = X, Y = Y, nc_nNet = 10, nc_nCells = 500, td_K = 3, qc_minLibSize = 30) # Structure of the output str(Output) # Accessing the computed weight-averaged denoised gene regulatory networks # Network for sample X igraph::graph_from_adjacency_matrix(adjmatrix = Output$tensorNetworks$X, weighted = TRUE) # IGRAPH 15cbeea DNW- 100 2836 -- # + attr: name (v/c), weight (e/n) # + edges from 15cbeea (vertex names): # [1] ng6 ->ng1 ng12->ng1 ng14->ng1 ng24->ng1 ng28->ng1 # [6] ng31->ng1 ng42->ng1 ng44->ng1 ng49->ng1 ng55->ng1 # [11] ng56->ng1 ng59->ng1 ng62->ng1 ng63->ng1 ng72->ng1 # [16] ng73->ng1 ng74->ng1 ng77->ng1 ng80->ng1 ng82->ng1 # [21] ng83->ng1 ng87->ng1 ng89->ng1 mt-1->ng1 mt-5->ng1 # [26] mt-7->ng1 ng27->ng3 ng28->ng3 ng31->ng3 ng32->ng3 # [31] ng44->ng3 ng59->ng3 ng62->ng3 ng72->ng3 ng73->ng3 # [36] ng74->ng3 ng77->ng3 ng82->ng3 ng87->ng3 ng89->ng3 # + ... omitted several edges # Network for sample Y igraph::graph_from_adjacency_matrix(adjmatrix = Output$tensorNetworks$Y, weighted = TRUE) #IGRAPH 3ad1533 DNW- 100 725 -- # + attr: name (v/c), weight (e/n) # + edges from 3ad1533 (vertex names): # [1] ng2 ->ng2 ng3 ->ng2 ng5 ->ng2 ng6 ->ng2 # [5] ng7 ->ng2 ng8 ->ng2 ng9 ->ng2 ng10->ng2 # [9] ng11->ng2 ng12->ng2 ng13->ng2 ng15->ng2 # [13] ng16->ng2 ng17->ng2 ng18->ng2 ng20->ng2 # [17] ng21->ng2 ng22->ng2 ng23->ng2 ng24->ng2 # [21] ng25->ng2 ng26->ng2 ng28->ng2 ng29->ng2 # [25] ng30->ng2 ng31->ng2 ng33->ng2 ng34->ng2 # [29] ng35->ng2 ng36->ng2 ng38->ng2 ng39->ng2 # + ... omitted several edges # Accessing the manifold alignment result head(Output$manifoldAlignment) # NLMA 1 NLMA 2 NLMA 3 NLMA 4 NLMA 5 # X_ng1 0.0068499391 0.01096706 0.03077900 0.002655469 -0.0136455614 # X_ng2 0.3356288575 -0.03551752 -0.18463680 -0.193353751 0.3398606363 # X_ng3 -0.1285177133 -0.20064344 0.20926567 0.059542294 -0.0099528441 # X_ng4 0.0029881645 -0.01267593 0.01195683 0.007331123 0.0003031888 # X_ng5 -0.1192632208 -0.18475439 0.27616148 0.112944009 -0.0281827702 # X_ng6 0.0005911568 0.02557475 0.07527792 -0.191180647 -0.1165095115 # NLMA 6 NLMA 7 NLMA 8 NLMA 9 NLMA 10 # X_ng1 -0.029852128 0.007539925 0.009299591 -0.009813157 -0.01360414 # X_ng2 -0.313361443 0.146429589 0.006286777 0.162023788 -0.04307899 # X_ng3 -0.008733285 0.172084611 0.508056218 0.199322512 -0.07935797 # X_ng4 -0.004680652 0.005344541 0.002634755 -0.003376544 -0.01100757 # X_ng5 -0.126328797 0.190769152 -0.468107666 0.170278281 -0.06744795 # X_ng6 -0.051266264 0.063822269 0.011060924 -0.134880459 -0.02579998 # NLMA 11 NLMA 12 NLMA 13 NLMA 14 NLMA 15 # X_ng1 -0.0199528840 0.008035130 0.004631187 0.000807797 0.011960838 # X_ng2 -0.0138200390 -0.002847701 -0.004404942 0.008024704 0.006040799 # X_ng3 0.0232384468 -0.031398116 -0.007026934 0.028956700 -0.002112626 # X_ng4 0.0012864539 -0.018915289 0.003835404 0.004054159 -0.002546324 # X_ng5 0.0232899093 -0.040974531 -0.006759459 0.025415953 -0.007518957 # X_ng6 -0.0001650355 0.023277338 0.006646904 -0.002683418 -0.112688129 # NLMA 16 NLMA 17 NLMA 18 NLMA 19 NLMA 20 # X_ng1 -0.016962988 -0.016649748 0.01140020 -0.006632691 -0.0005015655 # X_ng2 0.007543775 -0.016188689 0.02517684 0.014814415 0.0162617154 # X_ng3 -0.005598267 -0.006975026 0.05218029 0.006731063 0.0183436415 # X_ng4 0.003207934 -0.001784120 0.01093237 -0.001192860 0.0028746990 # X_ng5 -0.009555879 -0.007429166 0.05206441 0.006534604 0.0170071357 # X_ng6 -0.065437425 0.110728870 -0.12746932 0.335610531 0.1341842827 # NLMA 21 NLMA 22 NLMA 23 NLMA 24 NLMA 25 # X_ng1 0.003113385 -0.023311350 -0.026415944 7.085995e-04 0.053898102 # X_ng2 0.001390569 0.001191301 -0.015621435 2.359703e-03 -0.013418093 # X_ng3 -0.007483171 0.011496519 0.004164546 2.764407e-02 -0.004527981 # X_ng4 0.020316634 -0.002796092 0.032119363 4.203867e-05 -0.002251366 # X_ng5 -0.004963436 0.016525449 0.009683698 2.564700e-02 0.002286340 # X_ng6 0.229199525 0.340639745 -0.041216345 3.599596e-03 0.008572652 # NLMA 26 NLMA 27 NLMA 28 NLMA 29 NLMA 30 # X_ng1 0.065832029 -0.0080248854 0.107300843 -0.02902323 -0.005337500 # X_ng2 -0.007982259 -0.0026295392 -0.001765851 0.01491257 -0.003546343 # X_ng3 0.009770602 0.0008819272 0.014564070 -0.01568192 -0.017450667 # X_ng4 0.015802609 0.0012975576 -0.003406675 -0.01774975 -0.003300053 # X_ng5 0.003401007 0.0001761177 0.013622016 -0.01224127 -0.013909178 # X_ng6 -0.089450710 -0.0763838722 -0.107751916 -0.05841353 -0.059217012 # Differential Regulation head(Output$diffRegulation,n = 10) # gene distance Z FC p.value p.adj # 2 ng2 0.023526702 2.762449 12.193413 0.0004795855 0.02414332 # 50 ng50 0.023514429 2.761550 12.180695 0.0004828665 0.02414332 # 11 ng11 0.022443941 2.681598 11.096894 0.0008647241 0.02882414 # 3 ng3 0.020263415 2.508478 9.045415 0.0026335445 0.06583861 # 10 ng10 0.019194561 2.417929 8.116328 0.0043868326 0.07711821 # 5 ng5 0.019079975 2.407977 8.019712 0.0046270923 0.07711821 # 31 ng31 0.013632541 1.865506 4.094085 0.0430335257 0.61476465 # 96 mt-6 0.011401177 1.589757 2.863536 0.0906081350 0.90977795 # 59 ng59 0.009835354 1.368238 2.130999 0.1443466682 0.90977795 # 62 ng62 0.007995812 1.067193 1.408408 0.2353209153 0.90977795 # Plotting # Genes with FDR < 0.1 are labeled as red set.seed(1) qChisq <- rchisq(100,1) geneColor <- rev(ifelse(Output$diffRegulation$p.adj < 0.1, 10,1)) qqplot(qChisq, Output$diffRegulation$FC, pch = 16, main = 'H0', col = geneColor, xlab = expression(X^2~Quantiles), ylab = 'FC', xlim=c(0,8), ylim=c(0,13)) qqline(qChisq) legend('bottomright', legend = c('FDR < 0.1'), pch = 16, col = 'red', bty='n', cex = 0.7) ## End(Not run)library(scTenifoldNet) # Simulating of a dataset following a negative binomial distribution with high sparcity (~67%) nCells = 2000 nGenes = 100 set.seed(1) X <- rnbinom(n = nGenes * nCells, size = 20, prob = 0.98) X <- round(X) X <- matrix(X, ncol = nCells) rownames(X) <- c(paste0('ng', 1:90), paste0('mt-', 1:10)) # Generating a perturbed network modifying the expression of genes 10, 2 and 3 Y <- X Y[10,] <- Y[50,] Y[2,] <- Y[11,] Y[3,] <- Y[5,] ## Not run: # scTenifoldNet Output <- scTenifoldNet(X = X, Y = Y, nc_nNet = 10, nc_nCells = 500, td_K = 3, qc_minLibSize = 30) # Structure of the output str(Output) # Accessing the computed weight-averaged denoised gene regulatory networks # Network for sample X igraph::graph_from_adjacency_matrix(adjmatrix = Output$tensorNetworks$X, weighted = TRUE) # IGRAPH 15cbeea DNW- 100 2836 -- # + attr: name (v/c), weight (e/n) # + edges from 15cbeea (vertex names): # [1] ng6 ->ng1 ng12->ng1 ng14->ng1 ng24->ng1 ng28->ng1 # [6] ng31->ng1 ng42->ng1 ng44->ng1 ng49->ng1 ng55->ng1 # [11] ng56->ng1 ng59->ng1 ng62->ng1 ng63->ng1 ng72->ng1 # [16] ng73->ng1 ng74->ng1 ng77->ng1 ng80->ng1 ng82->ng1 # [21] ng83->ng1 ng87->ng1 ng89->ng1 mt-1->ng1 mt-5->ng1 # [26] mt-7->ng1 ng27->ng3 ng28->ng3 ng31->ng3 ng32->ng3 # [31] ng44->ng3 ng59->ng3 ng62->ng3 ng72->ng3 ng73->ng3 # [36] ng74->ng3 ng77->ng3 ng82->ng3 ng87->ng3 ng89->ng3 # + ... omitted several edges # Network for sample Y igraph::graph_from_adjacency_matrix(adjmatrix = Output$tensorNetworks$Y, weighted = TRUE) #IGRAPH 3ad1533 DNW- 100 725 -- # + attr: name (v/c), weight (e/n) # + edges from 3ad1533 (vertex names): # [1] ng2 ->ng2 ng3 ->ng2 ng5 ->ng2 ng6 ->ng2 # [5] ng7 ->ng2 ng8 ->ng2 ng9 ->ng2 ng10->ng2 # [9] ng11->ng2 ng12->ng2 ng13->ng2 ng15->ng2 # [13] ng16->ng2 ng17->ng2 ng18->ng2 ng20->ng2 # [17] ng21->ng2 ng22->ng2 ng23->ng2 ng24->ng2 # [21] ng25->ng2 ng26->ng2 ng28->ng2 ng29->ng2 # [25] ng30->ng2 ng31->ng2 ng33->ng2 ng34->ng2 # [29] ng35->ng2 ng36->ng2 ng38->ng2 ng39->ng2 # + ... omitted several edges # Accessing the manifold alignment result head(Output$manifoldAlignment) # NLMA 1 NLMA 2 NLMA 3 NLMA 4 NLMA 5 # X_ng1 0.0068499391 0.01096706 0.03077900 0.002655469 -0.0136455614 # X_ng2 0.3356288575 -0.03551752 -0.18463680 -0.193353751 0.3398606363 # X_ng3 -0.1285177133 -0.20064344 0.20926567 0.059542294 -0.0099528441 # X_ng4 0.0029881645 -0.01267593 0.01195683 0.007331123 0.0003031888 # X_ng5 -0.1192632208 -0.18475439 0.27616148 0.112944009 -0.0281827702 # X_ng6 0.0005911568 0.02557475 0.07527792 -0.191180647 -0.1165095115 # NLMA 6 NLMA 7 NLMA 8 NLMA 9 NLMA 10 # X_ng1 -0.029852128 0.007539925 0.009299591 -0.009813157 -0.01360414 # X_ng2 -0.313361443 0.146429589 0.006286777 0.162023788 -0.04307899 # X_ng3 -0.008733285 0.172084611 0.508056218 0.199322512 -0.07935797 # X_ng4 -0.004680652 0.005344541 0.002634755 -0.003376544 -0.01100757 # X_ng5 -0.126328797 0.190769152 -0.468107666 0.170278281 -0.06744795 # X_ng6 -0.051266264 0.063822269 0.011060924 -0.134880459 -0.02579998 # NLMA 11 NLMA 12 NLMA 13 NLMA 14 NLMA 15 # X_ng1 -0.0199528840 0.008035130 0.004631187 0.000807797 0.011960838 # X_ng2 -0.0138200390 -0.002847701 -0.004404942 0.008024704 0.006040799 # X_ng3 0.0232384468 -0.031398116 -0.007026934 0.028956700 -0.002112626 # X_ng4 0.0012864539 -0.018915289 0.003835404 0.004054159 -0.002546324 # X_ng5 0.0232899093 -0.040974531 -0.006759459 0.025415953 -0.007518957 # X_ng6 -0.0001650355 0.023277338 0.006646904 -0.002683418 -0.112688129 # NLMA 16 NLMA 17 NLMA 18 NLMA 19 NLMA 20 # X_ng1 -0.016962988 -0.016649748 0.01140020 -0.006632691 -0.0005015655 # X_ng2 0.007543775 -0.016188689 0.02517684 0.014814415 0.0162617154 # X_ng3 -0.005598267 -0.006975026 0.05218029 0.006731063 0.0183436415 # X_ng4 0.003207934 -0.001784120 0.01093237 -0.001192860 0.0028746990 # X_ng5 -0.009555879 -0.007429166 0.05206441 0.006534604 0.0170071357 # X_ng6 -0.065437425 0.110728870 -0.12746932 0.335610531 0.1341842827 # NLMA 21 NLMA 22 NLMA 23 NLMA 24 NLMA 25 # X_ng1 0.003113385 -0.023311350 -0.026415944 7.085995e-04 0.053898102 # X_ng2 0.001390569 0.001191301 -0.015621435 2.359703e-03 -0.013418093 # X_ng3 -0.007483171 0.011496519 0.004164546 2.764407e-02 -0.004527981 # X_ng4 0.020316634 -0.002796092 0.032119363 4.203867e-05 -0.002251366 # X_ng5 -0.004963436 0.016525449 0.009683698 2.564700e-02 0.002286340 # X_ng6 0.229199525 0.340639745 -0.041216345 3.599596e-03 0.008572652 # NLMA 26 NLMA 27 NLMA 28 NLMA 29 NLMA 30 # X_ng1 0.065832029 -0.0080248854 0.107300843 -0.02902323 -0.005337500 # X_ng2 -0.007982259 -0.0026295392 -0.001765851 0.01491257 -0.003546343 # X_ng3 0.009770602 0.0008819272 0.014564070 -0.01568192 -0.017450667 # X_ng4 0.015802609 0.0012975576 -0.003406675 -0.01774975 -0.003300053 # X_ng5 0.003401007 0.0001761177 0.013622016 -0.01224127 -0.013909178 # X_ng6 -0.089450710 -0.0763838722 -0.107751916 -0.05841353 -0.059217012 # Differential Regulation head(Output$diffRegulation,n = 10) # gene distance Z FC p.value p.adj # 2 ng2 0.023526702 2.762449 12.193413 0.0004795855 0.02414332 # 50 ng50 0.023514429 2.761550 12.180695 0.0004828665 0.02414332 # 11 ng11 0.022443941 2.681598 11.096894 0.0008647241 0.02882414 # 3 ng3 0.020263415 2.508478 9.045415 0.0026335445 0.06583861 # 10 ng10 0.019194561 2.417929 8.116328 0.0043868326 0.07711821 # 5 ng5 0.019079975 2.407977 8.019712 0.0046270923 0.07711821 # 31 ng31 0.013632541 1.865506 4.094085 0.0430335257 0.61476465 # 96 mt-6 0.011401177 1.589757 2.863536 0.0906081350 0.90977795 # 59 ng59 0.009835354 1.368238 2.130999 0.1443466682 0.90977795 # 62 ng62 0.007995812 1.067193 1.408408 0.2353209153 0.90977795 # Plotting # Genes with FDR < 0.1 are labeled as red set.seed(1) qChisq <- rchisq(100,1) geneColor <- rev(ifelse(Output$diffRegulation$p.adj < 0.1, 10,1)) qqplot(qChisq, Output$diffRegulation$FC, pch = 16, main = 'H0', col = geneColor, xlab = expression(X^2~Quantiles), ylab = 'FC', xlim=c(0,8), ylim=c(0,13)) qqline(qChisq) legend('bottomright', legend = c('FDR < 0.1'), pch = 16, col = 'red', bty='n', cex = 0.7) ## End(Not run)
Generate weight-averaged denoised gene regulatory networks using CANDECOMP/PARAFAC (CP) Tensor Decomposition. This function takes one or two lists of gene regulatory matrices. If two lists are provided, the shared genes are selected and the CP tensor decomposition is performed independently for each list (3d-tensor). The tensor decomposed matrices are then averaged to generate weight-averaged denoised networks.
tensorDecomposition( xList, yList = NULL, nDecimal = 1, K = 5, maxError = 1e-05, maxIter = 1000 )tensorDecomposition( xList, yList = NULL, nDecimal = 1, K = 5, maxError = 1e-05, maxIter = 1000 )
xList |
A list of gene regulatory networks. |
yList |
Optional. A list of gene regulatory networks. |
nDecimal |
An integer value indicating the number of decimal places to be used. |
K |
The number of rank-one tensors used to approximate the data using CANDECOMP/PARAFAC (CP) Tensor Decomposition, |
maxError |
A decimal value between 0 and 1. Defines the relative Frobenius norm error tolerance |
maxIter |
An integer value. Defines the maximum number of iterations if error stay above |
CANDECOMP/PARAFRAC (CP) tensor decomposition approximate a K-Tensor using a sum of K rank-1 K-Tensors. A rank-1 K-Tensor can be written as an outer product of K vectors. This is an iterative algorithm, with two possible stopping conditions: either relative error in Frobenius norm has gotten below maxError, or the maxIter number of iterations has been reached. For more details on CP decomposition, consult Kolda and Bader (2009) and Morup (2011).
A list of weight-averaged denoised gene regulatory networks.
This is an adaptation of the code provided by Li, J., Bien, J., & Wells, M. T. (2018)
Li, J., Bien, J., & Wells, M. T. (2018). rTensor: An R Package for Multidimensional Array (Tensor) Unfolding, Multiplication, and Decomposition. Journal of Statistical Software, 87(10), 1-31.
Kolda, Tamara G., and Brett W. Bader. "Tensor decompositions and applications." SIAM review 51.3 (2009): 455-500.
Morup, Morten. "Applications of tensor (multiway array) factorizations and decompositions in data mining." Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 1.1 (2011): 24-40.
library(scTenifoldNet) # Simulating of a dataset following a negative binomial distribution with high sparcity (~67%) nCells = 2000 nGenes = 100 set.seed(1) X <- rnbinom(n = nGenes * nCells, size = 20, prob = 0.98) X <- round(X) X <- matrix(X, ncol = nCells) rownames(X) <- c(paste0('ng', 1:90), paste0('mt-', 1:10)) # Performing Single cell quality control qcOutput <- scQC( X = X, minLibSize = 30, removeOutlierCells = TRUE, minPCT = 0.05, maxMTratio = 0.1 ) # Computing 3 single-cell gene regulatory networks each one from a subsample of 500 cells mnOutput <- makeNetworks(X = X, nNet = 3, nCells = 500, nComp = 3, scaleScores = TRUE, symmetric = FALSE, q = 0.95 ) # Computing a K = 3 CANDECOMP/PARAFAC (CP) Tensor Decomposition tdOutput <- tensorDecomposition(mnOutput, K = 3, maxError = 1e5, maxIter = 1e3) # Verifying the number of networks length(tdOutput) # Veryfying the dimention of the networks lapply(tdOutput,dim) # Weight-averaged denoised single-cell gene regulatory networks tdOutput[[1]][1:10,1:10]library(scTenifoldNet) # Simulating of a dataset following a negative binomial distribution with high sparcity (~67%) nCells = 2000 nGenes = 100 set.seed(1) X <- rnbinom(n = nGenes * nCells, size = 20, prob = 0.98) X <- round(X) X <- matrix(X, ncol = nCells) rownames(X) <- c(paste0('ng', 1:90), paste0('mt-', 1:10)) # Performing Single cell quality control qcOutput <- scQC( X = X, minLibSize = 30, removeOutlierCells = TRUE, minPCT = 0.05, maxMTratio = 0.1 ) # Computing 3 single-cell gene regulatory networks each one from a subsample of 500 cells mnOutput <- makeNetworks(X = X, nNet = 3, nCells = 500, nComp = 3, scaleScores = TRUE, symmetric = FALSE, q = 0.95 ) # Computing a K = 3 CANDECOMP/PARAFAC (CP) Tensor Decomposition tdOutput <- tensorDecomposition(mnOutput, K = 3, maxError = 1e5, maxIter = 1e3) # Verifying the number of networks length(tdOutput) # Veryfying the dimention of the networks lapply(tdOutput,dim) # Weight-averaged denoised single-cell gene regulatory networks tdOutput[[1]][1:10,1:10]