Tutorial 1: Denoising on single-slice DLPFC data from 10x Visium

In this tutorial, we demonstrate the denoising capabilities of STADIM using single-slice data. We use the human Dorsolateral Prefrontal Cortex (DLPFC) slice #151673 generated by 10x Visium platform as a representative example, which is annotated by Maynard et al. into white matter (WM) and cortical layers (L1-L6) based on marker genes and cellular structure.
We download the data via http://spatial.libd.org/spatialLIBD/.

Quick view data

[1]:
import anndata as ad

raw_data = ad.read_h5ad('/data2/xiaost/SODA/Data/DLPFC/dlpfc_151673.h5ad')

print(raw_data)
AnnData object with n_obs × n_vars = 3611 × 33538
    obs: 'in_tissue', 'array_row', 'array_col', 'sample', 'Label'
    var: 'gene_ids', 'feature_types', 'genome'
    uns: 'spatial'
    obsm: 'spatial'
[2]:
print(raw_data.obs.head(5))
                    in_tissue  array_row  array_col  sample Label
AACGTGATGAAGGACA-1          1          9         83  151673    L3
TGATTCCCGGTTACCT-1          1         64         50  151673    L6
CTTTGGCGCTTTATAC-1          1         32         18  151673    L4
ACGCTGTGAGGCGTAG-1          1         39         13  151673    L5
AGGTTACACCATGCCG-1          1         43        115  151673    L2
[3]:
import scanpy as sc
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(3, 3))
sc.pl.spatial(raw_data, color='Label', ax=ax, show=False)
plt.show()
_images/Tutorial1_Single_DLPFC_5_0.png

Run STADIM

You can call the STADIM command directly in your terminal:

nohup stadim --input "./dlpfc_151673.h5ad" --save_dir ./dlpfc_151673_test --device "cuda:0" --monitor --seed 2026 --min_genes 0 --min_cells 10 --knn_c 6 --knn_e 10 > ./dlpfc_151673_test.log 2>&1 &

and load the results into your Python environment for downstream analysis. The denoised expression matrix is conveniently stored within the layers attribute:

import anndata as ad

adata = ad.read_h5ad('/dlpfc_151673_test/sdata.h5ad')

## The denoised gene expressionis is stored in adata.layers['STADIM']
print(adata)

or you can call STADIM within a Jupyter Notebook as follows:

[4]:
import stadim

adata = stadim.run_STADIM(['/data2/xiaost/SODA/Data/DLPFC/dlpfc_151673.h5ad'], save_preprocessed_h5ad=None, save_dir=None,
                          sample_names=None, batch_key='sample', device='cuda:1', seed=2026,
                          min_genes=0, min_cells=10, nhvgs=2000, dim=50, knn_c=6, knn_e=10, mnn_n=25,
                          batch_size=256, triplets_update_ratio=0.8, hard_triplets_ratio=0.7,
                          epochs=100, lr=1e-3, loss_mode='nb', beta_trip=0.1,
                          encoder_layers=[512, 256, 64], decoder_layers=[1000])
Results will be stored in adata.layers['STADIM']

=== 1. Begin read_data!
Processing single slice: Using user-defined column 'sample'
Processing single slice: Keeping existing sample label 151673

Merging data...

Raw Merged Data: 3611 spots × 33538 genes
Unified Filtering (min_genes=0, min_cells=10)...
  → After Filtering: 3611 spots × 16567 genes
  ✓ Complete: Samples included: ['151673']
  ✓ Complete: AnnData object with n_obs × n_vars = 3611 × 16567
    obs: 'in_tissue', 'array_row', 'array_col', 'sample', 'Label'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells'
    uns: 'spatial'
    obsm: 'spatial'
    layers: 'X_raw'

=== 2. Begin MY_preprocess!
== Selected 2000 HVGs across 1 slices

=== 3. Begin all_ap find neighbors!

=== 4. Begin pre_dataset!
{0: '151673'}
Preparing triplet: 100%|██████████| 3611/3611 [00:00<00:00, 6602.14it/s]

=== 5. Begin IterableTripletDataset!
Initializing triplets...
IterableTripletDataset Initialized: 3611 anchors, batch_size=256

=== 6. Begin calculate_recommended_margin!
Calculating recommended margin (averaging over 5 runs)...

========================================
Margins from 5 runs: [15.0, 15.0, 15.0, 15.0, 15.0]
Final Recommended Margin: 15.0000

=== 7. Starting training...
Begin training: device=cuda:1
Training: 100%|██████████| 100/100 [01:27<00:00,  1.14it/s, recon=0.366, triplet=3.448, total=0.711]

=== 8. Generating denoised expression...
   Data size (5.98e+07) is within limits. Allocating memory for denoised data.

All processes finished! Total time: 1.73 mins.
[5]:
print(adata)
AnnData object with n_obs × n_vars = 3611 × 16567
    obs: 'in_tissue', 'array_row', 'array_col', 'sample', 'Label', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'size_factor'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'highly_variable'
    uns: 'spatial', 'log1p', 'top_hvgs'
    obsm: 'spatial', 'S_scale', 'X_hvg_scale', 'X_pca', 'cell_names', 'STADIM'
    layers: 'X_raw', 'X_norm', 'X_log', 'STADIM', 'STADIM_withbatch'

Analysis

[6]:
import anndata as ad
import pandas as pd
import scanpy as sc
import numpy as np
import os
import warnings
warnings.filterwarnings('ignore')

import seaborn as sns
from matplotlib.colors import LinearSegmentedColormap
gene_exp_colors = sns.blend_palette(["#C9E2FF", "#eae6cc", '#e31a1c'], n_colors=100)
gene_exp_palette = LinearSegmentedColormap.from_list("gene_exp", gene_exp_colors)
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import matplotlib.patches as mpatches
plt.rcParams.update({
    'font.size': 16,
    'axes.labelsize': 18,
    'axes.titlesize': 18,
    'xtick.labelsize': 16,
    'ytick.labelsize': 16,
    'legend.fontsize': 16,
})

gene_list = ['FABP7', 'RELN', 'HPCAL1', 'ADCYAP1', 'FREM3', 'CARTPT', 'PVALB', 'TRABD2A',
             'PCP4', 'KRT17', 'NR4A2', 'NTNG2', 'MOBP', 'NDRG1', 'MBP']

gene_region = {
    'FABP7': 'L1', 'RELN': 'L1',
    'HPCAL1': 'L2',
    'ADCYAP1': 'L3', 'FREM3': 'L3', 'CARTPT': 'L3',
    'PVALB': 'L4',
    'TRABD2A': 'L5', 'PCP4': 'L5',
    'KRT17': 'L6', 'NR4A2': 'L6', 'NTNG2': 'L6',
    'MOBP': 'WM', 'NDRG1': 'WM', 'MBP': 'WM'
}

Gene expression visualization

[7]:
fig, axes = plt.subplots(2, 8, figsize=(24, 6))
axes = axes.flatten()

for i, gene in enumerate(gene_list):
    sc.pl.spatial(adata, layer='X_log', color=gene, cmap=gene_exp_palette, spot_size=150, ax=axes[i], show=False, img_key='hires',
                  vmax="p99", title=f"$\mathit{{{gene}}}$ ({gene_region[gene]})")
    axes[i].set_xlabel('')
    axes[i].set_ylabel('')
axes[-1].axis('off')

plt.tight_layout(pad=0.1)
plt.show()
_images/Tutorial1_Single_DLPFC_13_0.png
[8]:
adata.X = adata.layers['STADIM'].copy()
sc.pp.log1p(adata)
adata.layers['STADIM_log'] = adata.X.copy()

fig, axes = plt.subplots(2, 8, figsize=(24, 6))
axes = axes.flatten()

for i, gene in enumerate(gene_list):
    sc.pl.spatial(adata, layer='STADIM_log', color=gene, cmap=gene_exp_palette, spot_size=150, ax=axes[i], show=False, img_key='hires',
                  vmax="p99", title=f"$\mathit{{{gene}}}$ ({gene_region[gene]})")
    axes[i].set_xlabel('')
    axes[i].set_ylabel('')
axes[-1].axis('off')

plt.tight_layout(pad=0.1)
plt.show()
WARNING: adata.X seems to be already log-transformed.
_images/Tutorial1_Single_DLPFC_14_1.png

Evaluation metrics

[9]:
import os
import sys
from contextlib import redirect_stdout

results = []

for layer in ['X_norm', 'STADIM']:
    for gene, region in gene_region.items():
        expr = adata.layers[layer][:, adata.var_names.get_loc(gene)]
        expr = expr.toarray().flatten() if hasattr(expr, 'toarray') else np.array(expr).flatten()

        mask = adata.obs['Label'] == region
        target_val = expr[mask]
        bg_val = expr[~mask]

        with open(os.devnull, 'w') as fnull:
            with redirect_stdout(fnull):
                moran = stadim.calculate_new_morani(adata, [layer], {gene: region})

        results.append({
            'Method': layer,
            'Gene': gene,
            'Region': region,
            'Tau': stadim.calculate_tau_index(expr, adata.obs['Label']),
            'FC': stadim.calculate_fold_change(target_val, bg_val),
            'Cohens_D': stadim.calculate_cohens_d(target_val, bg_val),
            'Global_Moran': moran['Global_Moran'].values[0],
            'Background_Moran': moran['Background_Moran'].values[0],
            'newI': moran['newI'].values[0]
        })

df_res = pd.DataFrame(results)
[10]:
for layer in ['X_norm', 'STADIM']:
    print(f"\n=== Detailed Metrics for {layer} ===")
    layer_df = df_res[df_res['Method'] == layer]
    display(layer_df)

print("\n=== Final Summary (Averaged across all genes) ===")
summary = df_res.groupby('Method')[['Tau', 'FC', 'Cohens_D', 'Global_Moran', 'Background_Moran', 'newI']].mean()
display(summary.sort_values('newI', ascending=False))

=== Detailed Metrics for X_norm ===
Method Gene Region Tau FC Cohens_D Global_Moran Background_Moran newI
0 X_norm FABP7 L1 0.679730 3.176126 0.721098 0.089367 0.036577 0.526395
1 X_norm RELN L1 0.833288 7.612881 0.775069 0.109547 0.045728 0.531910
2 X_norm HPCAL1 L2 0.699730 3.068554 1.360526 0.221055 0.135886 0.542585
3 X_norm ADCYAP1 L3 0.561619 2.554207 0.353631 0.051692 0.009240 0.521226
4 X_norm FREM3 L3 0.714852 3.003668 0.259095 0.047120 0.017907 0.514607
5 X_norm CARTPT L3 0.594013 3.195088 0.478894 0.124311 0.063235 0.530538
6 X_norm PVALB L4 0.596678 2.239379 0.565294 0.074833 0.057134 0.508850
7 X_norm TRABD2A L5 0.841705 7.867309 0.623517 0.071052 -0.006673 0.532189
8 X_norm PCP4 L5 0.697033 3.536765 1.128401 0.253894 0.066206 0.593844
9 X_norm KRT17 L6 0.758341 3.998996 0.796877 0.125162 0.045310 0.539926
10 X_norm NR4A2 L6 0.719760 3.072878 0.330337 0.053665 0.049078 0.502294
11 X_norm NTNG2 L6 0.455522 1.800602 0.241853 0.024883 0.000786 0.512048
12 X_norm MOBP WM 0.919297 11.077320 3.416460 0.710332 0.331313 0.689509
13 X_norm NDRG1 WM 0.805246 4.729894 1.521295 0.268364 0.076004 0.596180
14 X_norm MBP WM 0.892194 8.341205 4.951767 0.909925 0.705406 0.602260

=== Detailed Metrics for STADIM ===
Method Gene Region Tau FC Cohens_D Global_Moran Background_Moran newI
15 STADIM FABP7 L1 0.581135 2.441201 3.132529 0.818420 0.757170 0.530625
16 STADIM RELN L1 0.717490 4.364239 2.716194 0.814868 0.769424 0.522722
17 STADIM HPCAL1 L2 0.662147 2.710248 3.049344 0.857274 0.783988 0.536643
18 STADIM ADCYAP1 L3 0.531298 2.405351 2.059148 0.814738 0.609914 0.602412
19 STADIM FREM3 L3 0.667070 2.597443 1.546577 0.877676 0.704484 0.586596
20 STADIM CARTPT L3 0.568594 3.103972 2.004034 0.847647 0.755061 0.546293
21 STADIM PVALB L4 0.525031 1.971619 1.827773 0.845831 0.791697 0.527067
22 STADIM TRABD2A L5 0.802766 5.740999 2.960346 0.812605 0.380765 0.715920
23 STADIM PCP4 L5 0.561332 2.447884 2.470404 0.835366 0.781112 0.527127
24 STADIM KRT17 L6 0.741785 3.666232 2.546959 0.807889 0.594334 0.606777
25 STADIM NR4A2 L6 0.738118 3.418727 1.538140 0.814291 0.634330 0.589981
26 STADIM NTNG2 L6 0.434813 1.625638 1.125350 0.823101 0.784449 0.519326
27 STADIM MOBP WM 0.914177 10.399591 6.246632 0.957841 0.758200 0.599821
28 STADIM NDRG1 WM 0.779575 4.257239 5.381348 0.958056 0.792414 0.582821
29 STADIM MBP WM 0.889637 7.987041 5.747448 0.952741 0.770116 0.591312

=== Final Summary (Averaged across all genes) ===
Tau FC Cohens_D Global_Moran Background_Moran newI
Method
STADIM 0.674331 3.942495 2.956815 0.855890 0.711164 0.572363
X_norm 0.717934 4.618325 1.168274 0.209013 0.108876 0.549624

Label specific markers

[11]:
def get_markers(adata, layer, n_top=100):
    key = f'rank_{layer}'
    sc.tl.rank_genes_groups(adata, 'Label', method='wilcoxon', layer=layer, key_added=key)
    df = sc.get.rank_genes_groups_df(adata, group=None, key=key)
    # Add rank and filter top N
    df['Rank'] = df.groupby('group').cumcount() + 1
    return df[df['Rank'] <= n_top][['names', 'group', 'Rank']].rename(columns={'names':'Gene', 'group':'Label'})

# 1. Get Top 100 markers for both layers
df_norm = get_markers(adata, 'X_log')
df_stadim = get_markers(adata, 'STADIM_log')

# 2. Merge to find common and unique pairs
merged = pd.merge(df_norm, df_stadim, on=['Gene', 'Label'], how='outer', suffixes=('_norm', '_stadim'), indicator=True)

common = merged[merged['_merge'] == 'both']
unique_norm = merged[merged['_merge'] == 'left_only']
unique_stadim = merged[merged['_merge'] == 'right_only']

# 3. Quick Stats Report
print(f"--- Marker Stats (Gene-Label Pairs) ---")
print(f"Common: {len(common)} | Unique Norm: {len(unique_norm)} | Unique STADIM: {len(unique_stadim)}")

# 4. Specificity Analysis: Multi-label genes (Noise Reduction)
multi_norm = df_norm.groupby('Gene')['Label'].nunique()
multi_stadim = df_stadim.groupby('Gene')['Label'].nunique()

# Genes that are multi-label in Norm but single-label (specific) in STADIM
denoised_genes = list(set(multi_norm[multi_norm > 1].index) & set(multi_stadim[multi_stadim == 1].index))

print(f"\n--- Specificity & Noise Reduction ---")
print(f"Multi-label genes in Norm: {sum(multi_norm > 1)}")
print(f"Multi-label genes in STADIM: {sum(multi_stadim > 1)}")
print(f"Denoised genes (Multi-label -> Single-label): {len(denoised_genes)}")

if denoised_genes:
    print(f"Denoised gene examples: {denoised_genes[:10]}")
--- Marker Stats (Gene-Label Pairs) ---
Common: 154 | Unique Norm: 546 | Unique STADIM: 546

--- Specificity & Noise Reduction ---
Multi-label genes in Norm: 162
Multi-label genes in STADIM: 0
Denoised genes (Multi-label -> Single-label): 34
Denoised gene examples: ['CRYAB', 'DIRAS2', 'UBE2QL1', 'HS3ST2', 'NEFL', 'MAG', 'CAMK2N1', 'ENC1', 'CLDND1', 'ATP5F1B']
[12]:
merged[merged['Gene'] == 'CAMK2N1']
[12]:
Gene Label Rank_norm Rank_stadim _merge
192 CAMK2N1 L1 31.0 NaN left_only
193 CAMK2N1 L2 1.0 8.0 both
194 CAMK2N1 L3 28.0 NaN left_only
[13]:
gene = 'CAMK2N1'
labels_gene = ['L1/2/3', 'L2']
layers = ['X_log', 'STADIM_log']

fig, axes = plt.subplots(1, 2, figsize=(8, 4))

for i, layer in enumerate(layers):
    sc.pl.spatial(adata, layer=layer, color=gene, cmap=gene_exp_palette, spot_size=150, vmax='p99',
                  ax=axes[i], show=False, title=f"$\mathit{{{gene}}}$ ({labels_gene[i]})\n{layer}")

plt.tight_layout()
plt.show()
_images/Tutorial1_Single_DLPFC_21_0.png
[14]:
gene = 'CAMK2N1'
labels_gene = ['L1/2/3', 'L2']
layers = ['X_norm', 'STADIM']

fig, axes = plt.subplots(1, 2, figsize=(8, 4))

for i, layer in enumerate(layers):
    sc.pl.spatial(adata, layer=layer, color=gene, cmap=gene_exp_palette, spot_size=150, vmax='p99',
                  ax=axes[i], show=False, title=f"$\mathit{{{gene}}}$ ({labels_gene[i]})\n{layer}")

plt.tight_layout()
plt.show()
_images/Tutorial1_Single_DLPFC_22_0.png
[ ]:

[ ]: