Differential Abundance with TrajAtlas#

Setup#

import numpy as np
import scanpy as sc
import pandas as pd
import TrajAtlas as tja
sc.settings.verbosity = 3
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (7, 7)

About#

To learn about TrajDiff, please read TrajDiff

Datasets#

In this tutorial, we continue with the adata we created in OPCST projecting.

adata=sc.read("../data/3.19_adata_immediate_step1.h5ad")
adata
AnnData object with n_obs × n_vars = 15079 × 22076
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'percent_mito', 'RNA_snn_res.0.5', 'seurat_clusters', 'group', 'sample', 'pseduoPred', 'pred_level1_anno', 'pred_level2_anno', 'pred_level3_anno', 'pred_level4_anno', 'pred_level5_anno', 'pred_level6_anno', 'pred_level7_anno', 'pred_lineage_fibro', 'pred_lineage_lepr', 'pred_lineage_msc', 'pred_lineage_chondro', 'lineageSum', 'Cell'
    var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable'
    uns: 'group_colors', 'neighbors', 'pred_level2_anno_colors', 'pred_level3_anno_colors', 'pred_lineage_chondro_colors', 'pred_lineage_fibro_colors', 'pred_lineage_lepr_colors', 'pred_lineage_msc_colors', 'umap'
    obsm: 'X_umap', 'scANVI'
    layers: 'counts'
    obsp: 'connectivities', 'distances'
sc.pl.umap(adata, color=["group","pred_level3_anno"], wspace=0.3, size=8)
/home/gilberthan/anaconda3/envs/scarches/lib/python3.8/site-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  cax = scatter(
/home/gilberthan/anaconda3/envs/scarches/lib/python3.8/site-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  cax = scatter(

png

adata.obs[["sample","group"]].drop_duplicates()
sample group
AAACCCATCCAATCCC-1_1 BmscAging_Young_MA1 MA
AAACCCAAGCTAAACA-1_2 BmscAging_Young_MA2 MA
AAACCCAGTAACATAG-1_3 BmscAging_Young_MA3 MA
AAACCCACAGCGTACC-1_4 BmscAging_Young_MA4 MA
AAACCCACAAAGGATT-1_5 BmscAging_Young_MA5 MA
AAACCCAAGCGTTAGG-1_10 BmscAging_Young_Young1 Young
AAACCCAGTTAAGAAC-1_11 BmscAging_Young_Young2 Young
AAACGAAGTACCATAC-1_12 BmscAging_Young_Young3 Young
AAACCCACACGAAAGC-1_13 BmscAging_Young_Young4 Young
AAACGAACACCTCGTT-1_14 BmscAging_Young_Young5 Young
tdiff=tja.diff.Tdiff()

Building KNN graph (standard milo pipeline)#

We followed the standard milo pipeline. The n_neighbors parameter represents the smallest possible size of the neighborhood in which we will quantify differential abundance (i.e., with k=50, the smallest neighborhood will have 50 cells). The value of k should increase with the sample count.

sc.pp.neighbors(tdata['rna'], use_rep='scANVI', n_neighbors=200, n_pcs=15)
tdiff.make_nhoods(tdata['rna'], prop=0.08)
tdata =  tdiff.count_nhoods(tdata, sample_col="sample")
computing neighbors
    finished: added to `.uns['neighbors']`
    `.obsp['distances']`, distances for each pair of neighbors
    `.obsp['connectivities']`, weighted adjacency matrix (0:00:38)


/home/gilberthan/anaconda3/envs/scarches/lib/python3.8/site-packages/anndata/_core/anndata.py:117: ImplicitModificationWarning: Transforming to str index.
  warnings.warn("Transforming to str index.", ImplicitModificationWarning)

Differential abundance testing#

Here, we calculated the differential abundance for each neighborhood using a Generalized Linear Model (GLM). Differing from the standard milo pipeline, we projected the results of the differential abundance onto the pseudotime axis. We divided the pseudotime axis into n intervals and applied a binomial model to each interval. Next, we shuffled the group labels to perform permutation null hypothesis testing to obtain lambda. Through this process, we are able to identify the differentiation stage with differential abundance.

tdiff.da(tdata, design='~group',model_contrasts='groupYoung-groupMA',time_col="pseduoPred",shuffle_times=20,FDR=0.05)
Permutation null hypothesis testing.....
Running differential abundance.....
Projecting neighborhoods to pseudotime axis.....
Done!
tdata
MuData object with n_obs × n_vars = 15079 × 22076
  3 modalities
    rna:	15079 x 22076
      obs:	'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'percent_mito', 'RNA_snn_res.0.5', 'seurat_clusters', 'group', 'sample', 'pseduoPred', 'pred_level1_anno', 'pred_level2_anno', 'pred_level3_anno', 'pred_level4_anno', 'pred_level5_anno', 'pred_level6_anno', 'pred_level7_anno', 'pred_lineage_fibro', 'pred_lineage_lepr', 'pred_lineage_msc', 'pred_lineage_chondro', 'lineageSum', 'Cell', 'nhood_ixs_random', 'nhood_ixs_refined', 'nhood_kth_distance'
      var:	'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable'
      uns:	'group_colors', 'neighbors', 'pred_level2_anno_colors', 'pred_level3_anno_colors', 'pred_lineage_chondro_colors', 'pred_lineage_fibro_colors', 'pred_lineage_lepr_colors', 'pred_lineage_msc_colors', 'umap', 'nhood_neighbors_key'
      obsm:	'X_umap', 'scANVI', 'nhoods'
      layers:	'counts'
      obsp:	'connectivities', 'distances'
    tdiff:	10 x 707
      obs:	'group', 'sample'
      var:	'index_cell', 'kth_distance', 'null', 'group1_cpm', 'group2_cpm', 'logFC', 'logCPM', 'F', 'PValue', 'FDR', 'SpatialFDR', 'time', 'range_down', 'range_up', 'Accept', 'logChange'
      uns:	'sample_col'
      varm:	'whole_cpm'
    pseudobulk:	0 x 0

Visualization#

We utilized PyComplexHeatmap to visualize the cell abundance of every sample and the statistics. The TrajDiff generated four statistics. Group1 and Group2 represent the mean abundance of two groups (Young and MA) in every pseudotime interval. Diff_abundance indicates the cell abundance difference between Group1 (Young) and Group2 (MA). FDR indicates the significance of the difference.

meta=adata.obs
sampleDf=meta[["sample","group"]].drop_duplicates()
sampleDf.index=sampleDf["sample"]
import PyComplexHeatmap as pch
row_ha = pch.HeatmapAnnotation(group=pch.anno_simple(sampleDf.group,cmap='Set1',
                                           add_text=True,text_kws={'color':'white','rotation':-90,'fontweight':'bold','fontsize':10,},
                                          legend=True),
                           axis=0,verbose=0,label_kws={'rotation':90,'horizontalalignment':'left'})
tdiff.plotDAheatmap(tdata,row_split=sampleDf.group,left_annotation=row_ha)
Starting plotting..
Starting calculating row orders..
Reordering rows..
Starting calculating col orders..
Reordering cols..
Plotting matrix..
Collecting legends..
Plotting legends..
Estimated legend width: 17.767277777777778 mm

png

In the results above, we observed that the differences in cell abundance are concentrated at the early and middle stages of differentiation. Specifically, at the early stage, MA has a higher cell abundance, whereas at the middle stage, Young has a higher cell abundance. We did not detect any differences at the late stage, which may be due to the low cell abundance of both samples. Overall, these results are aligned with our study, which detected cell abundance differences in LepR+ BMSC OPCST between young and adult individuals.