01_Install

Installation instructions and a tiny demo are available in the GRASP repository README: https://github.com/TingtingLiGroup/GRASP?tab=readme-ov-file

Requirements

  • Python: 3.9+

  • Training (optional): build-train-pkl and train-moco require PyTorch (torch) and PyTorch Geometric (torch-geometric). These two packages are NOT installed by default when you run pip install grasp-tool, so install them separately before training.
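Before running the training steps, it can help to confirm that the interpreter and the optional packages listed above are present. The following is a minimal sketch, not part of GRASP itself: the function name check_grasp_environment is hypothetical, and it only assumes that PyTorch and PyTorch Geometric are importable as torch and torch_geometric.

```python
import importlib.util
import sys


def check_grasp_environment(min_python=(3, 9),
                            training_modules=("torch", "torch_geometric")):
    """Hypothetical helper (not a GRASP function).

    Reports whether the Python version meets the stated minimum and whether
    the optional training dependencies can be found on the import path.
    """
    report = {"python_ok": sys.version_info >= min_python}
    for name in training_modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_grasp_environment().items():
        print(f"{item}: {'found' if ok else 'MISSING'}")
```

If torch or torch_geometric is reported as missing, only the training steps (build-train-pkl and train-moco) are affected.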

Concrete workflow for biologists

We provide a concise, end-to-end, protocol-style workflow to enable non-computational users to apply GRASP to imaging-based spatial transcriptomics data.

Inputs → QC → TSGs → GRASP embeddings → downstream analyses. Starting from standard inputs—(i) cell/nucleus segmentation masks and (ii) a transcript coordinate table (x, y, gene ID, cell ID)—we first perform basic QC to exclude (a) low-quality/aberrant cells and (b) low-support gene–cell instances (insufficient molecules). For each retained gene in each retained cell, we construct a Transcript Spot Graph (TSG) by mapping transcripts into radial–angular subregions (e.g., 30×15 = 450 bins) and connecting locally adjacent bins, with node features encoding transcript density. GRASP then trains an unsupervised, contrastive GAT-based encoder and generates a low-dimensional TSG embedding for each gene–cell instance, yielding a unified representation of subcellular RNA organization while preserving cell-to-cell heterogeneity.
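The QC and TSG-construction steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the GRASP implementation: the function names are hypothetical, the min_count threshold is an arbitrary placeholder, radii are normalised by the farthest transcript rather than by the segmentation boundary, the 30×15 grid is read as 30 angular sectors by 15 radial rings, and density is taken as the fraction of the gene's transcripts in each bin.

```python
import math
from collections import Counter


def filter_low_support(transcripts, min_count=10):
    """Keep only gene-cell instances with at least min_count molecules.

    transcripts: list of (x, y, gene_id, cell_id) tuples.
    min_count is a placeholder; the source does not state a threshold.
    """
    counts = Counter((gene, cell) for _x, _y, gene, cell in transcripts)
    return [t for t in transcripts if counts[(t[2], t[3])] >= min_count]


def build_tsg(points, center, n_radial=15, n_angular=30):
    """Bin one gene's transcripts in one cell into a radial-angular grid.

    Returns (density, edges): one density value per bin (node feature) and
    undirected edges between adjacent bins. Assumes points is non-empty.
    """
    cx, cy = center
    polar = [(math.hypot(x - cx, y - cy),
              math.atan2(y - cy, x - cx) % (2 * math.pi))
             for x, y in points]
    # Assumption: normalise radius by the farthest transcript from the center.
    r_max = max(r for r, _ in polar) or 1.0
    density = [0.0] * (n_radial * n_angular)
    for r, theta in polar:
        i = min(int(r / r_max * n_radial), n_radial - 1)               # ring
        j = min(int(theta / (2 * math.pi) * n_angular), n_angular - 1)  # sector
        density[i * n_angular + j] += 1.0 / len(polar)
    edges = []
    for i in range(n_radial):
        for j in range(n_angular):
            node = i * n_angular + j
            # Neighbouring sector in the same ring (wraps around the circle).
            edges.append((node, i * n_angular + (j + 1) % n_angular))
            # Same sector in the next ring outward.
            if i + 1 < n_radial:
                edges.append((node, (i + 1) * n_angular + j))
    return density, edges
```

With the default grid this yields 450 nodes per TSG. The resulting node features and edge list are the kind of input a graph encoder consumes; the contrastive GAT training itself is not shown here.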

Downstream example I — Pattern discovery → gene enrichment (identify genes with clear subcellular patterns). TSG embeddings → unsupervised clustering (e.g., GMM; K chosen by elbow) → recurring localization motif clusters → visualize representative cells/TSGs per cluster to assign interpretable motif names → for each motif cluster, compute a gene-by-motif composition (fraction of a gene’s TSGs assigned to the motif) → generate gene-enrichment heatmaps and shortlist genes enriched in motifs of interest → visual confirmation in representative single-cell spot maps. Outputs: motif dictionary (cluster exemplars + labels), gene–motif enrichment table/heatmap, prioritized genes with representative examples.
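The gene-by-motif composition step can be made concrete with a short sketch. It assumes the clustering has already been run and that each TSG carries a gene ID and a motif (cluster) label; the function name gene_motif_composition is hypothetical and not a GRASP API.

```python
from collections import Counter, defaultdict


def gene_motif_composition(genes, motif_labels):
    """Fraction of each gene's TSGs assigned to each motif cluster.

    genes and motif_labels are parallel sequences with one entry per TSG.
    Returns {gene: {motif: fraction}}; fractions for a gene sum to 1.
    """
    counts = defaultdict(Counter)
    for gene, motif in zip(genes, motif_labels):
        counts[gene][motif] += 1
    return {
        gene: {motif: n / sum(c.values()) for motif, n in c.items()}
        for gene, c in counts.items()
    }


# Toy example with made-up gene names and cluster labels.
table = gene_motif_composition(
    ["A", "A", "A", "A", "B", "B"],
    [0, 0, 0, 1, 1, 1],
)
print(table)
```

Each row of the resulting table corresponds to one row of a gene-enrichment heatmap; genes with a high fraction in a motif of interest are candidates for the shortlist.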

Downstream example II — Pattern heterogeneity → association with biological covariates. Per-TSG motif labels (e.g., polar vs non-polar) → summarize per cell (or per zone) as pattern prevalence → compare prevalence across conditions/covariates (e.g., villus Bottom/Mid/Top; cycling vs non-cycling) → quantify effects using logistic regression with cell-clustered standard errors to account for repeated gene–cell measurements within cells. Outputs: effect sizes and significance for covariates, plots of prevalence across conditions, examples illustrating structured heterogeneity.
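The summarisation step (per-TSG labels to per-cell prevalence to a per-condition comparison) might look like the sketch below. It covers only the descriptive summary; the logistic regression with cell-clustered standard errors requires a statistics package and is not shown. The function name and the record layout (cell ID, condition, motif label) are assumptions made for illustration.

```python
from collections import defaultdict


def prevalence_by_condition(records, pattern="polar"):
    """Mean per-cell prevalence of a pattern within each condition.

    records: iterable of (cell_id, condition, motif_label), one per TSG.
    Prevalence is first computed per cell (fraction of that cell's TSGs
    labelled with `pattern`), then averaged over cells in each condition.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    condition_of = {}
    for cell_id, condition, label in records:
        totals[cell_id] += 1
        hits[cell_id] += int(label == pattern)
        condition_of[cell_id] = condition
    per_condition = defaultdict(list)
    for cell_id, n in totals.items():
        per_condition[condition_of[cell_id]].append(hits[cell_id] / n)
    return {cond: sum(v) / len(v) for cond, v in per_condition.items()}


# Toy records with made-up cell IDs.
records = [
    ("c1", "Bottom", "polar"), ("c1", "Bottom", "non-polar"),
    ("c2", "Bottom", "polar"), ("c2", "Bottom", "polar"),
    ("c3", "Top", "non-polar"), ("c3", "Top", "non-polar"),
]
print(prevalence_by_condition(records))
```

Averaging within cells first keeps cells with many detected genes from dominating the summary, which is the same concern the cell-clustered standard errors address in the regression.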

Downstream example III — Cell-level subcellular representation → refined clustering and niche/interface discovery. TSG embeddings → aggregate to cell-level subcellular signatures (pool across a cell’s gene–cell instances) → optionally concatenate with expression and/or neighborhood context → refined cell-state clustering (subcluster within a transcriptional group) → integrate with local neighborhood graphs (kNN) → identify niches/domains/interfaces supported by neighborhood enrichment (e.g., immune-neighbor fraction, adjacency probability). Outputs: refined subclusters, subcellular-augmented cell embeddings, niche/domain/interface labels with neighborhood enrichment evidence.
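Two of the steps above, pooling TSG embeddings per cell and computing a neighborhood statistic over a kNN graph, can be sketched as follows. The text does not specify the pooling operation, so mean pooling is an assumption here; the brute-force neighbor search and both function names are also illustrative, not GRASP functions.

```python
import math


def pool_cell_embeddings(cell_ids, embeddings):
    """Mean-pool TSG embeddings into one vector per cell (assumed pooling).

    cell_ids and embeddings are parallel sequences, one entry per TSG.
    """
    sums, counts = {}, {}
    for cell, vec in zip(cell_ids, embeddings):
        if cell not in sums:
            sums[cell] = [0.0] * len(vec)
            counts[cell] = 0
        sums[cell] = [s + v for s, v in zip(sums[cell], vec)]
        counts[cell] += 1
    return {cell: [s / counts[cell] for s in total]
            for cell, total in sums.items()}


def neighbor_fraction(coords, labels, target, k=3):
    """For each cell, the fraction of its k nearest neighbors labelled `target`.

    Uses a brute-force search, which is fine for small examples only.
    """
    fractions = []
    for i, p in enumerate(coords):
        others = sorted((j for j in range(len(coords)) if j != i),
                        key=lambda j: math.dist(p, coords[j]))[:k]
        if not others:
            fractions.append(0.0)
            continue
        fractions.append(sum(labels[j] == target for j in others) / len(others))
    return fractions
```

The pooled vectors can then be concatenated with expression features before clustering, and a statistic such as the immune-neighbor fraction can serve as enrichment evidence for a candidate niche or interface.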

[Figure: workflow]