TrajAtlas.TrajDiff.Tdiff

class TrajAtlas.TrajDiff.Tdiff

MuData class for differential pseudotime analysis.

Methods table

add_covariate_to_nhoods_var(mdata, ...[, ...])

Add covariate from cell-level obs to sample-level obs.

add_nhood_expression(mdata[, layer, feature_key])

Calculates the mean expression in neighbourhoods of each feature.

annotate_nhoods(mdata, anno_col[, feature_key])

Assigns a categorical label to neighbourhoods, based on the most frequent label among cells in each neighbourhood.

annotate_nhoods_continuous(mdata, anno_col)

Assigns a continuous value to neighbourhoods, based on mean cell level covariate stored in adata.obs.

build_nhood_graph(mdata[, basis, feature_key])

Build graph of neighbourhoods used for visualization of DA results.

count_nhoods(data, sample_col[, feature_key])

Builds a sample-level AnnData object storing the matrix of cell counts per sample per neighbourhood.

da(mdata, design[, time_col, ...])

Differential abundance pipeline.

da_expression(mdata, design[, ...])

Performs differential expression testing on neighbourhoods using the QLF test as implemented in edgeR.

da_expression_overall(mdata, design[, ...])

Performs differential expression testing on neighbourhoods using the QLF test as implemented in edgeR.

de(mdata, design[, time_col, ...])

Differential expression pipeline.

load(input[, feature_key])

Prepare a MuData object for subsequent processing.

make_nhoods(data[, neighbors_key, ...])

Randomly sample vertices on a KNN graph to define neighbourhoods of cells.

make_pseudobulk_parallel(mdata[, min_cell, ...])

Make pseudobulk within each neighbourhood.

make_single_cpm(mdata[, feature_key, ...])

Permute expression matrix.

make_whole_cpm(mdata[, fix_libsize, ...])

Perform CPM on all samples.

permute_point_cpm(mdata[, n])

permute_point_cpm_parallel(mdata[, mode, n, ...])

plotDAheatmap(mdata[, vmax, vmin, ...])

Plot differential abundance heatmap.

plotDE(mdata[, genes, row_cluster, ...])

Plot heatmap to display differential gene expression between two groups.

Methods

add_covariate_to_nhoods_var

Tdiff.add_covariate_to_nhoods_var(mdata, new_covariates, feature_key='rna')

Add covariate from cell-level obs to sample-level obs. These should be covariates for which a single value can be assigned to each sample.

Parameters:
  • mdata – MuData object

  • new_covariates – columns in tdata[feature_key].obs to add to tdata[‘tdiff’].obs.

  • feature_key – If input data is MuData, specify key to cell-level AnnData object. Defaults to ‘rna’.

Returns:

None. Adds columns to tdata[‘tdiff’] in place.
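The requirement that each covariate has a single value per sample can be illustrated with a small plain-Python sketch. This is not TrajAtlas code: the helper `sample_level_covariate` and the toy data are hypothetical, and the sketch only shows the idea of collapsing a cell-level column to one value per sample and rejecting columns that vary within a sample.

```python
def sample_level_covariate(cell_samples, cell_values):
    """Collapse a cell-level covariate to one value per sample.

    Raises ValueError if a sample carries more than one distinct value.
    """
    per_sample = {}
    for sample, value in zip(cell_samples, cell_values):
        if sample in per_sample and per_sample[sample] != value:
            raise ValueError(f"covariate is not unique within sample {sample!r}")
        per_sample[sample] = value
    return per_sample


# Hypothetical toy data: one entry per cell.
cell_samples = ["s1", "s1", "s2", "s2"]
cell_age = ["young", "young", "old", "old"]
sample_age = sample_level_covariate(cell_samples, cell_age)
```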

add_nhood_expression

Tdiff.add_nhood_expression(mdata, layer=None, feature_key='rna')

Calculates the mean expression in neighbourhoods of each feature.

Parameters:
  • mdata (MuData) – MuData object

  • layer (str | None) – If provided, use tdata[feature_key][layer] as expression matrix instead of tdata[feature_key].X. Defaults to None.

  • feature_key (str | None) – If input data is MuData, specify key to cell-level AnnData object. Defaults to ‘rna’.

Returns:

Updates adata in place to store the matrix of average expression in each neighbourhood in tdata[‘tdiff’].varm[‘expr’].
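As a rough illustration of what "mean expression in neighbourhoods" means, the following sketch averages a toy expression matrix over the cells of each neighbourhood. It uses plain Python lists rather than the sparse matrices TrajAtlas works with, and the function name and data are hypothetical.

```python
# Hypothetical toy data: 4 cells x 2 genes.
expression = [
    [1.0, 0.0],  # cell 0
    [3.0, 2.0],  # cell 1
    [5.0, 4.0],  # cell 2
    [7.0, 6.0],  # cell 3
]
# nhoods[i][j] == 1 when cell i belongs to neighbourhood j.
nhoods = [
    [1, 0],
    [1, 1],
    [0, 1],
    [0, 1],
]


def nhood_mean_expression(expression, nhoods):
    """Return a neighbourhoods x genes matrix of mean expression."""
    n_genes = len(expression[0])
    n_nhoods = len(nhoods[0])
    means = []
    for j in range(n_nhoods):
        # Cells assigned to neighbourhood j (assumed non-empty here).
        members = [i for i, row in enumerate(nhoods) if row[j]]
        means.append(
            [sum(expression[i][g] for i in members) / len(members)
             for g in range(n_genes)]
        )
    return means


nhood_expr = nhood_mean_expression(expression, nhoods)
```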

annotate_nhoods

Tdiff.annotate_nhoods(mdata, anno_col, feature_key='rna')

Assigns a categorical label to neighbourhoods, based on the most frequent label among cells in each neighbourhood. This can be useful to stratify DA testing results by cell types or samples.

Parameters:
  • mdata (MuData) – MuData object

  • anno_col (str) – Column in adata.obs containing the cell annotations to use for nhood labelling

  • feature_key (str | None) – If input data is MuData, specify key to cell-level AnnData object. Defaults to ‘rna’.

Returns:

None. Adds in place:
  • tdata[‘tdiff’].var[“nhood_annotation”]: assigns a label to each nhood

  • tdata[‘tdiff’].var[“nhood_annotation_frac”]: stores the fraction of cells in the neighbourhood with the assigned label

  • tdata[‘tdiff’].varm[‘frac_annotation’]: stores the fraction of cells from each label in each nhood

  • tdata[‘tdiff’].uns[“annotation_labels”]: stores the column names for tdata[‘tdiff’].varm[‘frac_annotation’]
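The majority-label rule described above can be sketched in a few lines. This is an illustration in plain Python, not the library's implementation; `majority_label` and the toy labels are hypothetical names.

```python
from collections import Counter

# Hypothetical toy data: one label per cell, and cell indices per neighbourhood.
cell_labels = ["osteoblast", "osteoblast", "chondrocyte", "chondrocyte", "chondrocyte"]
nhood_members = {"nhood_0": [0, 1, 2], "nhood_1": [2, 3, 4]}


def majority_label(members, labels):
    """Return the most frequent label among members and its fraction."""
    counts = Counter(labels[i] for i in members)
    label, n = counts.most_common(1)[0]
    return label, n / len(members)


annotation = {
    name: majority_label(members, cell_labels)
    for name, members in nhood_members.items()
}
```

Each entry pairs the assigned label with the fraction of cells carrying it, which mirrors the roles of nhood_annotation and nhood_annotation_frac.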

annotate_nhoods_continuous

Tdiff.annotate_nhoods_continuous(mdata, anno_col, feature_key='rna')

Assigns a continuous value to neighbourhoods, based on the mean cell-level covariate stored in adata.obs. This can be useful to correlate DA log fold changes with continuous covariates such as pseudotime or gene expression scores.

Parameters:
  • mdata (MuData) – MuData object

  • anno_col (str) – Column in adata.obs containing the cell annotations to use for nhood labelling

  • feature_key (str | None) – If input data is MuData, specify key to cell-level AnnData object. Defaults to ‘rna’.

Returns:

None. Adds in place:
  • tdata[‘tdiff’].var[“nhood_{anno_col}”]: assigns a continuous value to each nhood
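A minimal sketch of the averaging step, assuming a hypothetical pseudotime value per cell and a hypothetical mapping from neighbourhoods to member cells:

```python
from statistics import mean

# Hypothetical toy data: pseudotime per cell and cell indices per neighbourhood.
pseudotime = [0.1, 0.2, 0.6, 0.8, 1.0]
nhood_members = {"nhood_0": [0, 1, 2], "nhood_1": [2, 3, 4]}

# Mean covariate value over the cells of each neighbourhood.
nhood_pseudotime = {
    name: mean(pseudotime[i] for i in members)
    for name, members in nhood_members.items()
}
```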

build_nhood_graph

Tdiff.build_nhood_graph(mdata, basis='X_umap', feature_key='rna')

Build graph of neighbourhoods used for visualization of DA results.

Parameters:
  • mdata (MuData) – MuData object

  • basis (str) – Name of the obsm basis to use for layout of neighbourhoods (key in adata.obsm). Defaults to “X_umap”.

  • feature_key (str | None) – If input data is MuData, specify key to cell-level AnnData object. Defaults to ‘rna’.

Returns:
  • tdata[‘tdiff’].varp[‘nhood_connectivities’]: graph of overlap between neighbourhoods (i.e. number of shared cells)

  • tdata[‘tdiff’].var[“Nhood_size”]: number of cells in neighbourhoods
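The two outputs can be pictured with sets of cell indices: connectivity is the number of cells two neighbourhoods share, and size is the number of cells in each. The sketch below is illustrative only; the toy neighbourhoods are hypothetical, and setting the diagonal to zero is an assumption rather than documented behaviour.

```python
# Hypothetical toy data: cell indices per neighbourhood.
nhood_members = {
    "nhood_0": {0, 1, 2},
    "nhood_1": {2, 3, 4},
    "nhood_2": {5, 6},
}
names = list(nhood_members)

# Pairwise overlap (number of shared cells); diagonal set to 0 by assumption.
connectivities = [
    [0 if a == b else len(nhood_members[a] & nhood_members[b]) for b in names]
    for a in names
]

# Number of cells in each neighbourhood.
nhood_size = {name: len(members) for name, members in nhood_members.items()}
```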

count_nhoods

Tdiff.count_nhoods(data, sample_col, feature_key='rna')

Builds a sample-level AnnData object storing the matrix of cell counts per sample per neighbourhood.

Parameters:
  • data (AnnData | MuData) – AnnData object with neighbourhoods defined in obsm[‘nhoods’] or MuData object with a modality with neighbourhoods defined in obsm[‘nhoods’]

  • sample_col (str) – Key in obs that stores sample information.

  • feature_key (str | None) – If input data is MuData, specify key to cell-level AnnData object. (default: ‘rna’)

Returns:

MuData object storing the original (i.e. rna) AnnData in mudata[feature_key] and the compositional anndata storing the neighbourhood cell counts in mudata[‘tdiff’]. Here:
  • mudata[‘tdiff’].obs_names are samples (defined from adata.obs[‘sample_col’])

  • mudata[‘tdiff’].var_names are neighbourhoods

  • mudata[‘tdiff’].X is the matrix counting the number of cells from each sample in each neighbourhood
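The samples x neighbourhoods count matrix can be sketched as follows. The toy membership matrix and sample labels are hypothetical, and the code is a plain-Python illustration rather than the sparse-matrix computation a real implementation would use.

```python
# Hypothetical toy data: sample label per cell.
cell_samples = ["s1", "s1", "s2", "s2", "s2"]
# nhoods[i][j] == 1 when cell i belongs to neighbourhood j.
nhoods = [
    [1, 0],
    [1, 1],
    [0, 1],
    [1, 1],
    [0, 1],
]

n_nhoods = len(nhoods[0])
# One row of counts per sample, one column per neighbourhood.
counts = {sample: [0] * n_nhoods for sample in sorted(set(cell_samples))}
for sample, row in zip(cell_samples, nhoods):
    for j, member in enumerate(row):
        counts[sample][j] += member
```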

da

Tdiff.da(mdata, design, time_col='pseduoPred', model_contrasts=None, subset_samples=None, add_intercept=True, feature_key='rna', shuffle_times=20, FDR_threshold=0.05)

Differential abundance pipeline.

Parameters:
  • mdata – AnnData object with neighbourhoods defined in obsm[‘nhoods’] or MuData object with a modality with neighbourhoods defined in obsm[‘nhoods’]

  • design (str) – Formula for the test, following glm syntax from R (e.g. ‘~ condition’). Terms should be columns in tdiff[feature_key].obs.

  • model_contrasts (str | None) – A string vector that defines the contrasts used to perform DA testing, following glm syntax from R (e.g. “conditionDisease - conditionControl”). If no contrast is specified (default), then the last categorical level in condition of interest is used as the test group. Defaults to None.

  • subset_samples (list[str] | None) – subset of samples (obs in tdata[‘tdiff’]) to use for the test. Defaults to None.

  • add_intercept (bool) – whether to include an intercept in the model. If False, this is equivalent to adding + 0 in the design formula. When model_contrasts is specified, this is set to False by default. Defaults to True.

  • feature_key (str | None) – If input data is MuData, specify key to cell-level AnnData object. (default: ‘rna’)

  • shuffle_times (int | None) – Number of times to randomly shuffle samples between the two groups to get lambda in the binomial distribution.

  • FDR_threshold (int) – False discovery rate threshold used to identify significant genes.

  • time_col (str | None) –

Returns:

MuData object storing the differential test statistics.
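The shuffle_times parameter refers to randomly reassigning samples between the two groups. How da uses those shuffles to obtain lambda is not described here, so the sketch below covers only the label-shuffling step. The function `shuffle_groups`, its `seed` argument, and the sample names are hypothetical.

```python
import random


def shuffle_groups(sample_groups, shuffle_times=20, seed=0):
    """Return shuffle_times random reassignments of group labels to samples.

    Group sizes are preserved because the existing labels are permuted.
    """
    rng = random.Random(seed)
    samples = list(sample_groups)
    labels = list(sample_groups.values())
    permutations = []
    for _ in range(shuffle_times):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        permutations.append(dict(zip(samples, shuffled)))
    return permutations


# Hypothetical toy design: two samples per group.
sample_groups = {"s1": "Control", "s2": "Control", "s3": "Disease", "s4": "Disease"}
null_assignments = shuffle_groups(sample_groups, shuffle_times=20)
```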

da_expression

Tdiff.da_expression(mdata, design, model_contrasts=None, subset_samples=None, add_intercept=True, feature_key='rna', shuffle=False, fix_libsize=False, njob=-1)

Performs differential expression testing on neighbourhoods using the QLF test as implemented in edgeR.

Parameters:
  • mdata (MuData) – MuData object

  • design (str) –

    Formula for the test, following glm syntax from R (e.g. ‘~ condition’).

    Terms should be columns in tdata[feature_key].obs.

  • model_contrasts (str | None) –

    A string vector that defines the contrasts used to perform DA testing, following glm syntax from R (e.g. “conditionDisease - conditionControl”).

    If no contrast is specified (default), then the last categorical level in condition of interest is used as the test group. Defaults to None.

  • subset_samples (list[str] | None) – subset of samples (obs in tdata[‘tdiff’]) to use for the test. Defaults to None.

  • add_intercept (bool) – whether to include an intercept in the model. If False, this is equivalent to adding + 0 in the design formula. When model_contrasts is specified, this is set to False by default. Defaults to True.

  • feature_key (str | None) – If input data is MuData, specify key to cell-level AnnData object. Defaults to ‘rna’.

  • shuffle (bool) –

  • njob (int) –

Returns:

None, modifies tdata[‘tdiff’] in place, adding the results of the DA test to .var:
  • logFC stores the log fold change in cell abundance (coefficient from the GLM)

  • PValue stores the p-value for the QLF test before multiple testing correction

  • SpatialFDR stores the p-value adjusted for multiple testing to limit the false discovery rate, calculated with the weighted Benjamini-Hochberg procedure
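For readers unfamiliar with the weighted Benjamini-Hochberg adjustment behind SpatialFDR, here is a minimal sketch of the general procedure. It is not taken from TrajAtlas: the function name is hypothetical, and the weights are a plain input because the exact weighting the library uses is not stated in this section. With equal weights it reduces to the standard Benjamini-Hochberg adjustment.

```python
def weighted_bh(pvalues, weights):
    """Weighted Benjamini-Hochberg adjusted p-values (illustrative sketch)."""
    # Indices of the tests sorted by increasing p-value.
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    total_weight = sum(weights)

    # Raw adjustment: p * total weight / cumulative weight up to this rank.
    raw = []
    cumulative = 0.0
    for i in order:
        cumulative += weights[i]
        raw.append(total_weight * pvalues[i] / cumulative)

    # Enforce monotonicity from the largest p-value down, capped at 1.
    adjusted = [0.0] * len(pvalues)
    running_min = 1.0
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, raw[rank])
        adjusted[order[rank]] = running_min
    return adjusted


# Hypothetical p-values; equal weights give the standard BH result.
pvalues = [0.01, 0.04, 0.03, 0.005]
fdr_equal = weighted_bh(pvalues, [1.0, 1.0, 1.0, 1.0])
```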

da_expression_overall

Tdiff.da_expression_overall(mdata, design, model_contrasts=None, subset_samples=None, add_intercept=True, feature_key='rna', fix_libsize=False)

Performs differential expression testing on neighbourhoods using the QLF test as implemented in edgeR.

Parameters:
  • mdata (MuData) – MuData object

  • design (str) –

    Formula for the test, following glm syntax from R (e.g. ‘~ condition’).

    Terms should be columns in tdata[feature_key].obs.

  • model_contrasts (str | None) –

    A string vector that defines the contrasts used to perform DA testing, following glm syntax from R (e.g. “conditionDisease - conditionControl”).

    If no contrast is specified (default), then the last categorical level in condition of interest is used as the test group. Defaults to None.

  • subset_samples (list[str] | None) – subset of samples (obs in tdata[‘tdiff’]) to use for the test. Defaults to None.

  • add_intercept (bool) – whether to include an intercept in the model. If False, this is equivalent to adding + 0 in the design formula. When model_contrasts is specified, this is set to False by default. Defaults to True.

  • feature_key (str | None) – If input data is MuData, specify key to cell-level AnnData object. Defaults to ‘rna’.

  • fix_libsize (bool) –

Returns:

None, modifies tdata[‘tdiff’] in place, adding the results of the DA test to .var:
  • logFC stores the log fold change in cell abundance (coefficient from the GLM)

  • PValue stores the p-value for the QLF test before multiple testing correction

  • SpatialFDR stores the p-value adjusted for multiple testing to limit the false discovery rate, calculated with the weighted Benjamini-Hochberg procedure

de

Tdiff.de(mdata, design, time_col='pseduoPred', model_contrasts=None, subset_samples=None, add_intercept=True, feature_key='rna', shuffle_times=20, FDR=0.05)

Differential expression pipeline.

See also

differential genes.

Parameters:
  • mdata – MuData object with tdiff modality and pseudobulk modality.

  • design (str) – Formula for the test, following glm syntax from R (e.g. ‘~ condition’). Terms should be columns in tdiff[feature_key].obs.

  • model_contrasts (str | None) – A string vector that defines the contrasts used to perform DA testing, following glm syntax from R (e.g. “conditionDisease - conditionControl”). If no contrast is specified (default), then the last categorical level in condition of interest is used as the test group. Defaults to None.

  • subset_samples (list[str] | None) – subset of samples (obs in tdata[‘tdiff’]) to use for the test. Defaults to None.

  • add_intercept (bool) – whether to include an intercept in the model. If False, this is equivalent to adding + 0 in the design formula. When model_contrasts is specified, this is set to False by default. Defaults to True.

  • feature_key (str | None) – If input data is MuData, specify key to cell-level AnnData object. (default: ‘rna’)

  • shuffle_times (int | None) – Number of times to randomly shuffle samples between the two groups to get lambda in the binomial distribution.

  • FDR (int) – False discovery rate threshold used to identify significant genes.

  • time_col (str | None) –

Returns:

MuData object storing the differential test statistics.

load

Tdiff.load(input, feature_key='rna')

Prepare a MuData object for subsequent processing.

Parameters:
  • input (AnnData) – AnnData object.

  • feature_key (str | None) – Key to store the cell-level AnnData object in the MuData object

Return type:

MuData

Returns:

MuData object with original AnnData (default is mudata[feature_key]).

make_nhoods

Tdiff.make_nhoods(data, neighbors_key=None, feature_key='rna', prop=0.1, seed=0, copy=False)

Randomly sample vertices on a KNN graph to define neighbourhoods of cells.

The set of neighbourhoods is refined by computing the median profile for each neighbourhood in reduced dimensional space and selecting the nearest vertex to this position. Thus, multiple neighbourhoods may be collapsed to prevent over-sampling the graph space.

Parameters:
  • data (AnnData | MuData) – AnnData object with KNN graph defined in obsp or MuData object with a modality with KNN graph defined in obsp

  • neighbors_key (str | None) – The key in adata.obsp or mdata[feature_key].obsp to use as KNN graph. If not specified, make_nhoods looks at .obsp[‘connectivities’] for connectivities (the default storage place for scanpy.pp.neighbors). If specified, it looks at .obsp[.uns[neighbors_key][‘connectivities_key’]] for connectivities. (default: None)

  • feature_key (str | None) – If input data is MuData, specify key to cell-level AnnData object. (default: ‘rna’)

  • prop (float) – Fraction of cells to sample for neighbourhood index search. (default: 0.1)

  • seed (int) – Random seed for cell sampling. (default: 0)

  • copy (bool) – Determines whether a copy of the adata is returned. (default: False)

Returns:

If copy=True, returns the copy of adata with the result in .obs, .obsm, and .uns. Otherwise:
  • nhoods: scipy.sparse._csr.csr_matrix in adata.obsm[‘nhoods’]. A binary matrix of cell to neighbourhood assignments. Neighbourhoods in the columns are ordered by the order of the index cell in adata.obs_names

  • nhood_ixs_refined: pandas.Series in adata.obs[‘nhood_ixs_refined’]. A boolean indicating whether a cell is an index for a neighbourhood

  • nhood_kth_distance: pandas.Series in adata.obs[‘nhood_kth_distance’]. The distance to the kth nearest neighbour for each index cell (used for SpatialFDR correction)

  • nhood_neighbors_key: adata.uns[“nhood_neighbors_key”]. KNN graph key, used for neighbourhood construction
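The sampling step and the binary membership matrix can be pictured with a toy KNN graph. The sketch below is a simplified illustration, not the library's algorithm: it omits the refinement step described above, and the graph, the function `sample_nhoods`, and the rounding of `prop` are assumptions made for the example.

```python
import random

# Hypothetical KNN graph: each cell maps to its two nearest neighbours.
knn = {
    0: [1, 2], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5],
    5: [4, 3], 6: [5, 4], 7: [6, 5], 8: [7, 6], 9: [8, 7],
}


def sample_nhoods(knn, prop=0.1, seed=0):
    """Sample index cells and build a cells x neighbourhoods binary matrix."""
    rng = random.Random(seed)
    cells = sorted(knn)
    n_index = max(1, round(prop * len(cells)))
    # Columns follow the order of the index cells among all cells.
    index_cells = sorted(rng.sample(cells, n_index))
    # A cell belongs to a neighbourhood if it is the index cell or one of
    # the index cell's neighbours.
    nhoods = [
        [int(cell == idx or cell in knn[idx]) for idx in index_cells]
        for cell in cells
    ]
    return index_cells, nhoods


index_cells, nhoods = sample_nhoods(knn, prop=0.2, seed=0)
```

Fixing `seed` makes the sampled index cells reproducible, which is the role of the seed parameter above.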

make_pseudobulk_parallel

Tdiff.make_pseudobulk_parallel(mdata, min_cell=3, feature_key='rna', sample_col='Sample', group_col='Group', time_col='Time', other_col=[], njob=-1)

Make pseudobulk within each neighbourhood.

See also

differential genes.

Parameters:
  • mdata (MuData) – MuData object with a modality with neighbourhoods defined in obsm[‘nhoods’]

  • feature_key (str | None) – If input data is MuData, specify key to cell-level AnnData object. (default: ‘rna’)

  • min_cell (int) – Minimum number of cells a sample must have within a neighbourhood to be kept. (default: 3)

  • sample_col (str | None) – Key in obs that stores sample information. (default: “Sample”)

  • group_col (str | None) – Key in obs that stores group information. (default: “Group”)

  • time_col (str | None) – Key in obs that stores pseudotime information. See “Projecting Osteogenic Datasets onto Differentiation Atlas and OPCST Model Using TrajAtlas” for how to predict pseudotime in osteogenesis datasets. (default: “Time”)

  • other_col – Keys in obs that you want to keep in pseudobulk.

  • njob (int) – Number of parallel jobs to use. (default: -1)

Returns:

MuData object storing pseudobulk in mudata[‘pseudobulk’]. Here:
  • mudata[‘tdiff’].obs_names are samples (defined from adata.obs[‘sample_col’])

  • mudata[‘tdiff’].var_names are features

  • mudata[‘tdiff’].X is the matrix of pseudobulk from each sample in each neighbourhood
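A rough sketch of pseudobulking within a single neighbourhood, under two assumptions that this section does not confirm: counts are summed per sample, and samples with fewer than min_cell cells in the neighbourhood are dropped. The function name and toy data are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: 6 cells x 2 genes of raw counts, one sample per cell.
counts = [[1, 0], [2, 1], [0, 3], [4, 1], [1, 1], [2, 2]]
cell_samples = ["s1", "s1", "s1", "s2", "s2", "s3"]
members = [0, 1, 2, 3, 4, 5]  # cells belonging to one neighbourhood


def pseudobulk_nhood(members, counts, cell_samples, min_cell=3):
    """Sum counts per sample within a neighbourhood, skipping small samples."""
    by_sample = defaultdict(list)
    for i in members:
        by_sample[cell_samples[i]].append(i)
    bulk = {}
    for sample, cells in by_sample.items():
        if len(cells) < min_cell:  # assumption: strict "fewer than" cut-off
            continue
        # Column-wise sum over the sample's cells gives one value per gene.
        bulk[sample] = [sum(col) for col in zip(*(counts[i] for i in cells))]
    return bulk


bulk = pseudobulk_nhood(members, counts, cell_samples, min_cell=2)
```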

make_single_cpm

Tdiff.make_single_cpm(mdata, feature_key='rna', fix_libsize=False, njob=-1)

Permute expression matrix.

Parameters:
  • mdata (MuData) – MuData object

  • feature_key (str | None) – If input data is MuData, specify key to cell-level AnnData object. Defaults to ‘rna’.

  • fix_libsize (bool) – Whether to fix library size in edgeR.

  • njob (int) – Number of parallel jobs to use.

Returns:

None, modifies tdata[‘tdiff’] in place, adding the results of the DA test to .var:
  • logFC stores the log fold change in cell abundance (coefficient from the GLM)

  • PValue stores the p-value for the QLF test before multiple testing correction

  • SpatialFDR stores the p-value adjusted for multiple testing to limit the false discovery rate, calculated with the weighted Benjamini-Hochberg procedure

make_whole_cpm

Tdiff.make_whole_cpm(mdata, fix_libsize=False, sample_column=None, njobs=-1)

Perform CPM on all samples.

Parameters:
  • mdata (MuData) – MuData object

  • design

    Formula for the test, following glm syntax from R (e.g. ‘~ condition’).

    Terms should be columns in tdata[feature_key].obs.

  • model_contrasts

    A string vector that defines the contrasts used to perform DA testing, following glm syntax from R (e.g. “conditionDisease - conditionControl”).

    If no contrast is specified (default), then the last categorical level in condition of interest is used as the test group. Defaults to None.

  • feature_key – If input data is MuData, specify key to cell-level AnnData object. Defaults to ‘rna’.

  • sample_column (str | None) –

  • njobs (int) –

Returns:

None, modifies tdata[‘tdiff’] in place, adding the results of the DA test to .var:
  • logFC stores the log fold change in cell abundance (coefficient from the GLM)

  • PValue stores the p-value for the QLF test before multiple testing correction

  • SpatialFDR stores the p-value adjusted for multiple testing to limit the false discovery rate, calculated with the weighted Benjamini-Hochberg procedure
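For reference, counts per million (CPM) in its simplest form divides each count by the sample's library size and multiplies by one million. The sketch below shows only that basic definition; it does not reproduce edgeR's normalisation factors or the effect of fix_libsize, and the function name and data are hypothetical.

```python
def cpm(counts_per_sample):
    """Scale each sample's counts to counts per million."""
    result = {}
    for sample, counts in counts_per_sample.items():
        library_size = sum(counts)  # total counts in this sample
        result[sample] = [c / library_size * 1_000_000 for c in counts]
    return result


# Hypothetical pseudobulk counts: two genes per sample.
cpm_values = cpm({"s1": [3, 1], "s2": [5, 5]})
```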

permute_point_cpm

Tdiff.permute_point_cpm(mdata, n=100)
Parameters:
  • mdata (MuData) –

  • n (int) –

permute_point_cpm_parallel

Tdiff.permute_point_cpm_parallel(mdata, mode='DE', n=100, njobs=-1)

plotDAheatmap

Tdiff.plotDAheatmap(mdata, vmax=3, vmin=-3, n_interval=100, col_cluster=False, cmap='RdBu_r', **kwarg)

Plot differential abundance heatmap.

Parameters:
  • mdata (MuData) – MuData object previously processed with the tdiff.da pipeline.

  • vmax (int) – Maximum threshold of the heatmap. (default: 3)

  • vmin (int) – Minimum threshold of the heatmap. (default: -3)

  • n_interval (int) – Number of intervals into which the pseudotime axis is split. (default: 100)

  • col_cluster (bool) – Whether to cluster columns (pseudotime axis). (default: False)

  • cmap (str) – Colour map of the heatmap. (default: RdBu_r)

Returns:

None. Plots the differential abundance heatmap.
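To give a feel for n_interval, vmin and vmax, the sketch below splits a pseudotime axis into equal-width intervals, averages a value within each interval, and clips the result to the colour-scale limits. Averaging within intervals is an assumption made for the example, and the function name and data are hypothetical; this is not the plotting code itself.

```python
def bin_and_clip(pseudotime, values, n_interval=100, vmin=-3, vmax=3):
    """Average values in equal-width pseudotime bins and clip to [vmin, vmax].

    Assumes pseudotime spans a non-zero range. Empty bins are returned as None.
    """
    lo, hi = min(pseudotime), max(pseudotime)
    width = (hi - lo) / n_interval
    sums = [0.0] * n_interval
    sizes = [0] * n_interval
    for t, v in zip(pseudotime, values):
        # The maximum pseudotime falls into the last bin.
        k = min(int((t - lo) / width), n_interval - 1)
        sums[k] += v
        sizes[k] += 1
    return [
        None if n == 0 else max(vmin, min(vmax, s / n))
        for s, n in zip(sums, sizes)
    ]


# Hypothetical neighbourhood pseudotime and log fold changes.
row = bin_and_clip(
    [0.0, 0.1, 0.5, 0.9, 1.0], [5.0, 1.0, -0.5, -4.0, -6.0], n_interval=2
)
```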

plotDE

Tdiff.plotDE(mdata, genes=None, row_cluster=False, show_rownames=False, show_colnames=False, row_split_gap=1, pseudotime_cmap='jet', row_split=None, **kwarg)

Plot heatmap to display differential gene expression between two groups. The heatmap is generated with pyComplexHeatmap.

See also

differential genes.

Parameters:
  • mdata (MuData) – MuData object that has been processed using the ‘tdiff.de’ pipeline.

  • genes (list | None) – A list of genes to plot. If not specified, all significant genes are plotted. (default: None)

  • row_cluster (bool) – Whether to cluster rows in the heatmap. (default: False)

  • show_rownames (bool) – Whether to display gene names on the side. (default: False)

  • show_colnames (bool) – Whether to display pseudotime values at the bottom. (default: False)

  • feature_key – If input data is MuData, specify key to cell-level AnnData object. (default: ‘rna’)

  • row_split_gap (int | None) – Gap between row splits. (default: 1)

  • row_split – Gene categories as a pd.Series or pd.DataFrame, used to split rows into subplots. We recommend using the split_gene function to split genes based on expression profile or stage.

  • pseudotime_cmap (str | None) –

Returns:

None. Plots a four-panel heatmap.