# 01_Install The installation process and Tiny demo can be found on the website: https://github.com/TingtingLiGroup/GRASP?tab=readme-ov-file ### Requirements - Python: 3.9+ - Training (optional): build-train-pkl and train-moco require PyTorch (torch) and PyTorch Geometric (torch-geometric). They are NOT installed by default via pip install grasp-tool. ### Installation (recommended) #### (1) Create a conda environment ``` conda create -n grasp python=3.9 -y conda activate grasp ``` Or use the provided environment file (creates env grasp and installs grasp-tool via pip): ``` conda env create -f envs/grasp-base.yml conda activate grasp ``` #### (2) Install GRASP from PyPI ``` pip install grasp-tool ``` Smoke checks (should work without training deps): ``` grasp-tool --help grasp-tool train-moco --help ``` If you plan to run training commands (build-train-pkl, train-moco), install the training stack first: - PyTorch install selector: https://pytorch.org/get-started/locally/ - PyG install guide: https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html If you're running from this repo checkout, you can also run a small demo: ``` grasp-tool register \ --pkl_file example_pkl/simulated1_data_dict.pkl \ --output_pkl outputs/simulated1_registered.pkl ``` ### Concrete workflow for biologists We provide a concise, end-to-end, **protocol-style workflow** to enable non-computational users to apply GRASP to imaging-based spatial transcriptomics data. **Inputs → QC → TSGs → GRASP embeddings → downstream analyses.** Starting from standard inputs—(i) **cell/nucleus segmentation masks** and (ii) a **transcript coordinate table** (x, y, gene ID, cell ID)—we first perform basic QC to exclude (a) low-quality/aberrant cells and (b) low-support **gene–cell instances** (insufficient molecules). For each retained gene in each retained cell, we construct a **Transcript Spot Graph (TSG)** by mapping transcripts into **radial–angular subregions** (e.g., 30×15 = 450 bins) and connecting locally adjacent bins, with node features encoding transcript density. GRASP then trains an unsupervised, contrastive **GAT-based encoder** and generates a low-dimensional **TSG embedding** for each gene–cell instance, yielding a unified representation of subcellular RNA organization while preserving cell-to-cell heterogeneity. **Downstream example I — Pattern discovery → gene enrichment (identify genes with clear subcellular patterns).** TSG embeddings → **unsupervised clustering** (e.g., GMM; K chosen by elbow) → recurring localization **motif clusters** → visualize representative cells/TSGs per cluster to assign interpretable motif names → for each motif cluster, compute a **gene-by-motif composition** (fraction of a gene’s TSGs assigned to the motif) → generate **gene-enrichment heatmaps** and shortlist genes enriched in motifs of interest → **visual confirmation** in representative single-cell spot maps. **Outputs:** motif dictionary (cluster exemplars + labels), gene–motif enrichment table/heatmap, prioritized genes with representative examples. **Downstream example II — Pattern heterogeneity → association with biological covariates.** Per-TSG motif labels (e.g., polar vs non-polar) → summarize per cell (or per zone) as **pattern prevalence** → compare prevalence across conditions/covariates (e.g., villus Bottom/Mid/Top; cycling vs non-cycling) → quantify effects using **logistic regression with cell-clustered standard errors** to account for repeated gene–cell measurements within cells. **Outputs:** effect sizes and significance for covariates, plots of prevalence across conditions, examples illustrating structured heterogeneity. **Downstream example III — Cell-level subcellular representation → refined clustering and niche/interface discovery.** TSG embeddings → **aggregate to cell-level subcellular signatures** (pool across a cell’s gene–cell instances) → optionally concatenate with expression and/or neighborhood context → refined cell-state clustering (subcluster within a transcriptional group) → integrate with local neighborhood graphs (kNN) → identify **niches/domains/interfaces** supported by **neighborhood enrichment** (e.g., immune-neighbor fraction, adjacency probability). **Outputs:** refined subclusters, subcellular-augmented cell embeddings, niche/domain/interface labels with neighborhood enrichment evidence.