# 01_Install

Installation instructions and a tiny demo are available in the GitHub repository: https://github.com/TingtingLiGroup/GRASP?tab=readme-ov-file

### Requirements
- Python: 3.9+
- Training (optional): the build-train-pkl and train-moco commands require PyTorch (torch) and PyTorch Geometric (torch-geometric). These two packages are NOT installed by default when you run pip install grasp-tool.

### Installation (recommended)

#### (1) Create a conda environment
```
conda create -n grasp python=3.9 -y
conda activate grasp
```
Alternatively, use the provided environment file, which creates the grasp environment and installs grasp-tool via pip:
```
conda env create -f envs/grasp-base.yml
conda activate grasp
```

#### (2) Install GRASP from PyPI
```
pip install grasp-tool
```
Smoke checks (should work without training deps):
```
grasp-tool --help
grasp-tool train-moco --help
```
If you plan to run training commands (build-train-pkl, train-moco), install the training stack first:

- PyTorch install selector: https://pytorch.org/get-started/locally/
- PyG install guide: https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html
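
Before running the training commands, it can help to confirm that the optional packages are importable in the active environment. The following is a minimal sketch using only the Python standard library; the helper name `training_deps_status` is hypothetical and is not part of grasp-tool.

```python
import importlib.util


def training_deps_status():
    """Report whether the optional training packages can be imported.

    Hypothetical helper, not part of grasp-tool. The pip package
    torch-geometric is imported under the module name torch_geometric.
    """
    modules = ("torch", "torch_geometric")
    return {name: importlib.util.find_spec(name) is not None for name in modules}


status = training_deps_status()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

If either package is reported as missing, follow the install guides linked above before running build-train-pkl or train-moco.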

If you are working from a checkout of this repository, you can also run a small demo:
```
grasp-tool register \
  --pkl_file example_pkl/simulated1_data_dict.pkl \
  --output_pkl outputs/simulated1_registered.pkl
```

### Concrete workflow for biologists 

We provide a concise, end-to-end, **protocol-style workflow** to enable non-computational users to apply GRASP to imaging-based spatial transcriptomics data.

**Inputs → QC → TSGs → GRASP embeddings → downstream analyses.**
 Starting from standard inputs—(i) **cell/nucleus segmentation masks** and (ii) a **transcript coordinate table** (x, y, gene ID, cell ID)—we first perform basic QC to exclude (a) low-quality/aberrant cells and (b) low-support **gene–cell instances** (insufficient molecules). For each retained gene in each retained cell, we construct a **Transcript Spot Graph (TSG)** by mapping transcripts into **radial–angular subregions** (e.g., 30×15 = 450 bins) and connecting locally adjacent bins, with node features encoding transcript density. GRASP then trains an unsupervised, contrastive **GAT-based encoder** and generates a low-dimensional **TSG embedding** for each gene–cell instance, yielding a unified representation of subcellular RNA organization while preserving cell-to-cell heterogeneity.
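
To make the radial–angular binning step concrete, the sketch below maps the transcripts of one gene–cell instance into a grid of bins and lists edges between adjacent bins. It is an illustration under stated assumptions, not GRASP's actual implementation: the function names are hypothetical, the cell centre and maximum radius `r_max` are assumed to be supplied by the user (for example from the segmentation mask), the split into 15 radial rings and 30 angular sectors is one possible reading of "30×15", and the node feature is taken to be the fraction of transcripts per bin.

```python
import numpy as np


def radial_angular_counts(xy, center, r_max, n_radial=15, n_angular=30):
    """Return an (n_radial, n_angular) array of transcript fractions per bin.

    Assumption: radius is normalised by r_max and angles are measured
    from the positive x axis around the given centre.
    """
    xy = np.asarray(xy, dtype=float)
    d = xy - np.asarray(center, dtype=float)
    r = np.hypot(d[:, 0], d[:, 1]) / r_max               # normalised radius
    theta = np.arctan2(d[:, 1], d[:, 0]) % (2 * np.pi)   # angle in [0, 2*pi]
    # Clip so points on or beyond the outer boundary fall into the last bin.
    r_idx = np.clip((r * n_radial).astype(int), 0, n_radial - 1)
    a_idx = np.clip((theta / (2 * np.pi) * n_angular).astype(int), 0, n_angular - 1)
    counts = np.zeros((n_radial, n_angular))
    np.add.at(counts, (r_idx, a_idx), 1)                  # accumulate repeated bins
    return counts / max(len(xy), 1)


def grid_edges(n_radial=15, n_angular=30):
    """Edges between adjacent rings and adjacent sectors (sectors wrap around)."""
    idx = np.arange(n_radial * n_angular).reshape(n_radial, n_angular)
    radial = np.stack([idx[:-1].ravel(), idx[1:].ravel()])
    angular = np.stack([idx.ravel(), np.roll(idx, -1, axis=1).ravel()])
    return np.concatenate([radial, angular], axis=1)


# Toy example: three transcripts of one gene in one cell.
xy = np.array([[1.0, 0.0], [0.0, 1.0], [-1.5, 0.0]])
features = radial_angular_counts(xy, center=(0.0, 0.0), r_max=2.0)
edges = grid_edges()
print(features.shape, edges.shape)
```

The flattened `features` array gives one value per node, and `edges` is a 2×E array of node index pairs, a layout that graph libraries commonly accept as an edge list.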

**Downstream example I — Pattern discovery → gene enrichment (identify genes with clear subcellular patterns).**
 TSG embeddings → **unsupervised clustering** (e.g., GMM; K chosen by elbow) → recurring localization **motif clusters** → visualize representative cells/TSGs per cluster to assign interpretable motif names → for each motif cluster, compute a **gene-by-motif composition** (fraction of a gene’s TSGs assigned to the motif) → generate **gene-enrichment heatmaps** and shortlist genes enriched in motifs of interest → **visual confirmation** in representative single-cell spot maps.
 **Outputs:** motif dictionary (cluster exemplars + labels), gene–motif enrichment table/heatmap, prioritized genes with representative examples.
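
The gene-by-motif composition can be sketched as follows, assuming each TSG already carries a gene name and a motif cluster label from the clustering step. The function name and the toy labels are hypothetical and serve only to show the calculation (fraction of a gene's TSGs assigned to each motif).

```python
import numpy as np


def gene_motif_composition(genes, motifs):
    """Return gene names, motif labels, and a genes x motifs fraction matrix.

    Each row sums to 1: entry (g, m) is the fraction of gene g's TSGs
    that were assigned to motif m.
    """
    gene_names, g = np.unique(np.asarray(genes), return_inverse=True)
    motif_names, m = np.unique(np.asarray(motifs), return_inverse=True)
    counts = np.zeros((len(gene_names), len(motif_names)))
    np.add.at(counts, (g, m), 1)                 # count TSGs per (gene, motif)
    frac = counts / counts.sum(axis=1, keepdims=True)
    return gene_names, motif_names, frac


# Toy example: four TSGs from two genes, two motif clusters.
genes = ["A", "A", "A", "B"]
motifs = [0, 1, 1, 0]
gene_names, motif_names, frac = gene_motif_composition(genes, motifs)
print(gene_names, motif_names)
print(frac)
```

The resulting matrix is the quantity a gene-enrichment heatmap would display, with genes as rows and motifs as columns.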

**Downstream example II — Pattern heterogeneity → association with biological covariates.**
 Per-TSG motif labels (e.g., polar vs non-polar) → summarize per cell (or per zone) as **pattern prevalence** → compare prevalence across conditions/covariates (e.g., villus Bottom/Mid/Top; cycling vs non-cycling) → quantify effects using **logistic regression with cell-clustered standard errors** to account for repeated gene–cell measurements within cells.
 **Outputs:** effect sizes and significance for covariates, plots of prevalence across conditions, examples illustrating structured heterogeneity.
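
As an illustration of the statistical step, the sketch below fits a logistic regression by Newton's method and computes cluster-robust (sandwich) standard errors with cells as clusters. It is a hand-written NumPy example on synthetic data, not the analysis code used by GRASP; the function name is hypothetical, no small-sample correction is applied, and in practice a statistics package such as statsmodels would normally be used.

```python
import numpy as np


def logit_cluster_se(X, y, groups, n_iter=50, tol=1e-10):
    """Logistic regression with cluster-robust standard errors.

    X: (n, k) design matrix including an intercept column.
    y: (n,) binary outcome (e.g. 1 = TSG labelled polar).
    groups: (n,) cluster id per row (e.g. the cell each TSG belongs to).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):                      # Newton-Raphson updates
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = X.T @ (X * (p * (1 - p))[:, None])
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    hessian = X.T @ (X * (p * (1 - p))[:, None])
    # Sum per-observation score vectors within each cluster.
    _, g = np.unique(np.asarray(groups), return_inverse=True)
    cluster_scores = np.zeros((g.max() + 1, X.shape[1]))
    np.add.at(cluster_scores, g, X * (y - p)[:, None])
    bread = np.linalg.inv(hessian)
    cov = bread @ (cluster_scores.T @ cluster_scores) @ bread
    return beta, np.sqrt(np.diag(cov))


# Synthetic toy data: 200 cells x 10 genes, one binary cell-level covariate.
rng = np.random.default_rng(0)
n_cells, n_genes = 200, 10
cell_ids = np.repeat(np.arange(n_cells), n_genes)
cycling = np.repeat(rng.integers(0, 2, n_cells), n_genes)
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * cycling)))
is_polar = rng.random(cell_ids.size) < p_true

X = np.column_stack([np.ones(cell_ids.size), cycling])
beta, se = logit_cluster_se(X, is_polar, cell_ids)
print("coefficients:", beta)
print("cluster-robust SE:", se)
```

Clustering on the cell accounts for the fact that multiple gene–cell instances from the same cell are not independent observations.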

**Downstream example III — Cell-level subcellular representation → refined clustering and niche/interface discovery.**
 TSG embeddings → **aggregate to cell-level subcellular signatures** (pool across a cell’s gene–cell instances) → optionally concatenate with expression and/or neighborhood context → refined cell-state clustering (subcluster within a transcriptional group) → integrate with local neighborhood graphs (kNN) → identify **niches/domains/interfaces** supported by **neighborhood enrichment** (e.g., immune-neighbor fraction, adjacency probability).
 **Outputs:** refined subclusters, subcellular-augmented cell embeddings, niche/domain/interface labels with neighborhood enrichment evidence.
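
The pooling and neighborhood steps can be sketched as below, assuming mean pooling over a cell's TSG embeddings and a brute-force k-nearest-neighbor search on cell coordinates. Both function names are hypothetical, and mean pooling is only one possible aggregation; the source does not specify which one GRASP uses.

```python
import numpy as np


def pool_by_cell(embeddings, cell_ids):
    """Average the TSG embeddings of each cell into one cell-level signature."""
    embeddings = np.asarray(embeddings, dtype=float)
    cells, c = np.unique(np.asarray(cell_ids), return_inverse=True)
    sums = np.zeros((len(cells), embeddings.shape[1]))
    np.add.at(sums, c, embeddings)
    counts = np.bincount(c, minlength=len(cells))
    return cells, sums / counts[:, None]


def neighbor_fraction(coords, is_target, k=5):
    """Fraction of each cell's k nearest neighbors that carry a target label."""
    coords = np.asarray(coords, dtype=float)
    is_target = np.asarray(is_target, dtype=bool)
    dist = np.linalg.norm(coords[:, None] - coords[None], axis=-1)
    np.fill_diagonal(dist, np.inf)               # a cell is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    return is_target[nearest].mean(axis=1)


# Toy example: three TSG embeddings from two cells.
cells, signatures = pool_by_cell([[1, 0], [3, 0], [0, 2]], ["c1", "c1", "c2"])
print(cells, signatures)

# Toy example: four cells on a line; only the first is labelled as a target type.
coords = [[0, 0], [1, 0], [3, 0], [10, 0]]
frac = neighbor_fraction(coords, [True, False, False, False], k=1)
print(frac)
```

The pairwise distance matrix grows quadratically with the number of cells, so a spatial index would be preferable for large datasets; the brute-force version is used here only to keep the example dependency-free.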


<p align="center">
  <img src="_static/workflow.png" alt="workflow" style="width:100%; height:auto;"/>
</p>