Tutorial 1: Denoising on single-slice DLPFC data from 10x Visium

In this tutorial, we demonstrate the denoising capabilities of STADIM using single-slice data. We use the human Dorsolateral Prefrontal Cortex (DLPFC) slice #151673 generated by 10x Visium platform as a representative example, which is annotated by Maynard et al. into white matter (WM) and cortical layers (L1-L6) based on marker genes and cellular structure.

We download the data via http://spatial.libd.org/spatialLIBD/.

Quick view data

[1]:

import anndata as ad

raw_data = ad.read_h5ad('/data2/xiaost/SODA/Data/DLPFC/dlpfc_151673.h5ad')

print(raw_data)

AnnData object with n_obs × n_vars = 3611 × 33538
    obs: 'in_tissue', 'array_row', 'array_col', 'sample', 'Label'
    var: 'gene_ids', 'feature_types', 'genome'
    uns: 'spatial'
    obsm: 'spatial'

[2]:

print(raw_data.obs.head(5))

                    in_tissue  array_row  array_col  sample Label
AACGTGATGAAGGACA-1          1          9         83  151673    L3
TGATTCCCGGTTACCT-1          1         64         50  151673    L6
CTTTGGCGCTTTATAC-1          1         32         18  151673    L4
ACGCTGTGAGGCGTAG-1          1         39         13  151673    L5
AGGTTACACCATGCCG-1          1         43        115  151673    L2

[3]:

import scanpy as sc
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(3, 3))
sc.pl.spatial(raw_data, color='Label', ax=ax, show=False)
plt.show()

Run STADIM

You can call the STADIM command directly in your terminal:

nohup stadim --input "./dlpfc_151673.h5ad" --save_dir ./dlpfc_151673_test --device "cuda:0" --monitor --seed 2026 --min_genes 0 --min_cells 10 --knn_c 6 --knn_e 10 > ./dlpfc_151673_test.log 2>&1 &

and load the results into your Python environment for downstream analysis. The denoised expression matrix is conveniently stored within the layers attribute:

import anndata as ad

adata = ad.read_h5ad('/dlpfc_151673_test/sdata.h5ad')

## The denoised gene expressionis is stored in adata.layers['STADIM']
print(adata)

or you can call STADIM within a Jupyter Notebook as follows:

[4]:

import stadim

adata = stadim.run_STADIM(['/data2/xiaost/SODA/Data/DLPFC/dlpfc_151673.h5ad'], save_preprocessed_h5ad=None, save_dir=None,
                          sample_names=None, batch_key='sample', device='cuda:1', seed=2026,
                          min_genes=0, min_cells=10, nhvgs=2000, dim=50, knn_c=6, knn_e=10, mnn_n=25,
                          batch_size=256, triplets_update_ratio=0.8, hard_triplets_ratio=0.7,
                          epochs=100, lr=1e-3, loss_mode='nb', beta_trip=0.1,
                          encoder_layers=[512, 256, 64], decoder_layers=[1000])

Results will be stored in adata.layers['STADIM']

=== 1. Begin read_data!
Processing single slice: Using user-defined column 'sample'
Processing single slice: Keeping existing sample label 151673

Merging data...

Raw Merged Data: 3611 spots × 33538 genes
Unified Filtering (min_genes=0, min_cells=10)...
  → After Filtering: 3611 spots × 16567 genes
  ✓ Complete: Samples included: ['151673']
  ✓ Complete: AnnData object with n_obs × n_vars = 3611 × 16567
    obs: 'in_tissue', 'array_row', 'array_col', 'sample', 'Label'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells'
    uns: 'spatial'
    obsm: 'spatial'
    layers: 'X_raw'

=== 2. Begin MY_preprocess!
== Selected 2000 HVGs across 1 slices

=== 3. Begin all_ap find neighbors!

=== 4. Begin pre_dataset!
{0: '151673'}

Preparing triplet: 100%|██████████| 3611/3611 [00:00<00:00, 6602.14it/s]


=== 5. Begin IterableTripletDataset!
Initializing triplets...
IterableTripletDataset Initialized: 3611 anchors, batch_size=256

=== 6. Begin calculate_recommended_margin!
Calculating recommended margin (averaging over 5 runs)...

========================================
Margins from 5 runs: [15.0, 15.0, 15.0, 15.0, 15.0]
Final Recommended Margin: 15.0000

=== 7. Starting training...
Begin training: device=cuda:1

Training: 100%|██████████| 100/100 [01:27<00:00,  1.14it/s, recon=0.366, triplet=3.448, total=0.711]


=== 8. Generating denoised expression...
   Data size (5.98e+07) is within limits. Allocating memory for denoised data.

All processes finished! Total time: 1.73 mins.

[5]:

print(adata)

AnnData object with n_obs × n_vars = 3611 × 16567
    obs: 'in_tissue', 'array_row', 'array_col', 'sample', 'Label', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'size_factor'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'highly_variable'
    uns: 'spatial', 'log1p', 'top_hvgs'
    obsm: 'spatial', 'S_scale', 'X_hvg_scale', 'X_pca', 'cell_names', 'STADIM'
    layers: 'X_raw', 'X_norm', 'X_log', 'STADIM', 'STADIM_withbatch'

Analysis

[6]:

import anndata as ad
import pandas as pd
import scanpy as sc
import numpy as np
import os
import warnings
warnings.filterwarnings('ignore')

import seaborn as sns
from matplotlib.colors import LinearSegmentedColormap
gene_exp_colors = sns.blend_palette(["#C9E2FF", "#eae6cc", '#e31a1c'], n_colors=100)
gene_exp_palette = LinearSegmentedColormap.from_list("gene_exp", gene_exp_colors)
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import matplotlib.patches as mpatches
plt.rcParams.update({
    'font.size': 16,
    'axes.labelsize': 18,
    'axes.titlesize': 18,
    'xtick.labelsize': 16,
    'ytick.labelsize': 16,
    'legend.fontsize': 16,
})

gene_list = ['FABP7', 'RELN', 'HPCAL1', 'ADCYAP1', 'FREM3', 'CARTPT', 'PVALB', 'TRABD2A',
             'PCP4', 'KRT17', 'NR4A2', 'NTNG2', 'MOBP', 'NDRG1', 'MBP']

gene_region = {
    'FABP7': 'L1', 'RELN': 'L1',
    'HPCAL1': 'L2',
    'ADCYAP1': 'L3', 'FREM3': 'L3', 'CARTPT': 'L3',
    'PVALB': 'L4',
    'TRABD2A': 'L5', 'PCP4': 'L5',
    'KRT17': 'L6', 'NR4A2': 'L6', 'NTNG2': 'L6',
    'MOBP': 'WM', 'NDRG1': 'WM', 'MBP': 'WM'
}

Gene expression visualization

[7]:

fig, axes = plt.subplots(2, 8, figsize=(24, 6))
axes = axes.flatten()

for i, gene in enumerate(gene_list):
    sc.pl.spatial(adata, layer='X_log', color=gene, cmap=gene_exp_palette, spot_size=150, ax=axes[i], show=False, img_key='hires',
                  vmax="p99", title=f"$\mathit{{{gene}}}$ ({gene_region[gene]})")
    axes[i].set_xlabel('')
    axes[i].set_ylabel('')
axes[-1].axis('off')

plt.tight_layout(pad=0.1)
plt.show()

[8]:

adata.X = adata.layers['STADIM'].copy()
sc.pp.log1p(adata)
adata.layers['STADIM_log'] = adata.X.copy()

fig, axes = plt.subplots(2, 8, figsize=(24, 6))
axes = axes.flatten()

for i, gene in enumerate(gene_list):
    sc.pl.spatial(adata, layer='STADIM_log', color=gene, cmap=gene_exp_palette, spot_size=150, ax=axes[i], show=False, img_key='hires',
                  vmax="p99", title=f"$\mathit{{{gene}}}$ ({gene_region[gene]})")
    axes[i].set_xlabel('')
    axes[i].set_ylabel('')
axes[-1].axis('off')

plt.tight_layout(pad=0.1)
plt.show()

WARNING: adata.X seems to be already log-transformed.

Evaluation metrics

[9]:

import os
import sys
from contextlib import redirect_stdout

results = []

for layer in ['X_norm', 'STADIM']:
    for gene, region in gene_region.items():
        expr = adata.layers[layer][:, adata.var_names.get_loc(gene)]
        expr = expr.toarray().flatten() if hasattr(expr, 'toarray') else np.array(expr).flatten()

        mask = adata.obs['Label'] == region
        target_val = expr[mask]
        bg_val = expr[~mask]

        with open(os.devnull, 'w') as fnull:
            with redirect_stdout(fnull):
                moran = stadim.calculate_new_morani(adata, [layer], {gene: region})

        results.append({
            'Method': layer,
            'Gene': gene,
            'Region': region,
            'Tau': stadim.calculate_tau_index(expr, adata.obs['Label']),
            'FC': stadim.calculate_fold_change(target_val, bg_val),
            'Cohens_D': stadim.calculate_cohens_d(target_val, bg_val),
            'Global_Moran': moran['Global_Moran'].values[0],
            'Background_Moran': moran['Background_Moran'].values[0],
            'newI': moran['newI'].values[0]
        })

df_res = pd.DataFrame(results)

[10]:

for layer in ['X_norm', 'STADIM']:
    print(f"\n=== Detailed Metrics for {layer} ===")
    layer_df = df_res[df_res['Method'] == layer]
    display(layer_df)

print("\n=== Final Summary (Averaged across all genes) ===")
summary = df_res.groupby('Method')[['Tau', 'FC', 'Cohens_D', 'Global_Moran', 'Background_Moran', 'newI']].mean()
display(summary.sort_values('newI', ascending=False))


=== Detailed Metrics for X_norm ===

	Method	Gene	Region	Tau	FC	Cohens_D	Global_Moran	Background_Moran	newI
0	X_norm	FABP7	L1	0.679730	3.176126	0.721098	0.089367	0.036577	0.526395
1	X_norm	RELN	L1	0.833288	7.612881	0.775069	0.109547	0.045728	0.531910
2	X_norm	HPCAL1	L2	0.699730	3.068554	1.360526	0.221055	0.135886	0.542585
3	X_norm	ADCYAP1	L3	0.561619	2.554207	0.353631	0.051692	0.009240	0.521226
4	X_norm	FREM3	L3	0.714852	3.003668	0.259095	0.047120	0.017907	0.514607
5	X_norm	CARTPT	L3	0.594013	3.195088	0.478894	0.124311	0.063235	0.530538
6	X_norm	PVALB	L4	0.596678	2.239379	0.565294	0.074833	0.057134	0.508850
7	X_norm	TRABD2A	L5	0.841705	7.867309	0.623517	0.071052	-0.006673	0.532189
8	X_norm	PCP4	L5	0.697033	3.536765	1.128401	0.253894	0.066206	0.593844
9	X_norm	KRT17	L6	0.758341	3.998996	0.796877	0.125162	0.045310	0.539926
10	X_norm	NR4A2	L6	0.719760	3.072878	0.330337	0.053665	0.049078	0.502294
11	X_norm	NTNG2	L6	0.455522	1.800602	0.241853	0.024883	0.000786	0.512048
12	X_norm	MOBP	WM	0.919297	11.077320	3.416460	0.710332	0.331313	0.689509
13	X_norm	NDRG1	WM	0.805246	4.729894	1.521295	0.268364	0.076004	0.596180
14	X_norm	MBP	WM	0.892194	8.341205	4.951767	0.909925	0.705406	0.602260


=== Detailed Metrics for STADIM ===

	Method	Gene	Region	Tau	FC	Cohens_D	Global_Moran	Background_Moran	newI
15	STADIM	FABP7	L1	0.581135	2.441201	3.132529	0.818420	0.757170	0.530625
16	STADIM	RELN	L1	0.717490	4.364239	2.716194	0.814868	0.769424	0.522722
17	STADIM	HPCAL1	L2	0.662147	2.710248	3.049344	0.857274	0.783988	0.536643
18	STADIM	ADCYAP1	L3	0.531298	2.405351	2.059148	0.814738	0.609914	0.602412
19	STADIM	FREM3	L3	0.667070	2.597443	1.546577	0.877676	0.704484	0.586596
20	STADIM	CARTPT	L3	0.568594	3.103972	2.004034	0.847647	0.755061	0.546293
21	STADIM	PVALB	L4	0.525031	1.971619	1.827773	0.845831	0.791697	0.527067
22	STADIM	TRABD2A	L5	0.802766	5.740999	2.960346	0.812605	0.380765	0.715920
23	STADIM	PCP4	L5	0.561332	2.447884	2.470404	0.835366	0.781112	0.527127
24	STADIM	KRT17	L6	0.741785	3.666232	2.546959	0.807889	0.594334	0.606777
25	STADIM	NR4A2	L6	0.738118	3.418727	1.538140	0.814291	0.634330	0.589981
26	STADIM	NTNG2	L6	0.434813	1.625638	1.125350	0.823101	0.784449	0.519326
27	STADIM	MOBP	WM	0.914177	10.399591	6.246632	0.957841	0.758200	0.599821
28	STADIM	NDRG1	WM	0.779575	4.257239	5.381348	0.958056	0.792414	0.582821
29	STADIM	MBP	WM	0.889637	7.987041	5.747448	0.952741	0.770116	0.591312


=== Final Summary (Averaged across all genes) ===

	Tau	FC	Cohens_D	Global_Moran	Background_Moran	newI
Method
STADIM	0.674331	3.942495	2.956815	0.855890	0.711164	0.572363
X_norm	0.717934	4.618325	1.168274	0.209013	0.108876	0.549624

Label specific markers

[11]:

def get_markers(adata, layer, n_top=100):
    key = f'rank_{layer}'
    sc.tl.rank_genes_groups(adata, 'Label', method='wilcoxon', layer=layer, key_added=key)
    df = sc.get.rank_genes_groups_df(adata, group=None, key=key)
    # Add rank and filter top N
    df['Rank'] = df.groupby('group').cumcount() + 1
    return df[df['Rank'] <= n_top][['names', 'group', 'Rank']].rename(columns={'names':'Gene', 'group':'Label'})

# 1. Get Top 100 markers for both layers
df_norm = get_markers(adata, 'X_log')
df_stadim = get_markers(adata, 'STADIM_log')

# 2. Merge to find common and unique pairs
merged = pd.merge(df_norm, df_stadim, on=['Gene', 'Label'], how='outer', suffixes=('_norm', '_stadim'), indicator=True)

common = merged[merged['_merge'] == 'both']
unique_norm = merged[merged['_merge'] == 'left_only']
unique_stadim = merged[merged['_merge'] == 'right_only']

# 3. Quick Stats Report
print(f"--- Marker Stats (Gene-Label Pairs) ---")
print(f"Common: {len(common)} | Unique Norm: {len(unique_norm)} | Unique STADIM: {len(unique_stadim)}")

# 4. Specificity Analysis: Multi-label genes (Noise Reduction)
multi_norm = df_norm.groupby('Gene')['Label'].nunique()
multi_stadim = df_stadim.groupby('Gene')['Label'].nunique()

# Genes that are multi-label in Norm but single-label (specific) in STADIM
denoised_genes = list(set(multi_norm[multi_norm > 1].index) & set(multi_stadim[multi_stadim == 1].index))

print(f"\n--- Specificity & Noise Reduction ---")
print(f"Multi-label genes in Norm: {sum(multi_norm > 1)}")
print(f"Multi-label genes in STADIM: {sum(multi_stadim > 1)}")
print(f"Denoised genes (Multi-label -> Single-label): {len(denoised_genes)}")

if denoised_genes:
    print(f"Denoised gene examples: {denoised_genes[:10]}")

--- Marker Stats (Gene-Label Pairs) ---
Common: 154 | Unique Norm: 546 | Unique STADIM: 546

--- Specificity & Noise Reduction ---
Multi-label genes in Norm: 162
Multi-label genes in STADIM: 0
Denoised genes (Multi-label -> Single-label): 34
Denoised gene examples: ['CRYAB', 'DIRAS2', 'UBE2QL1', 'HS3ST2', 'NEFL', 'MAG', 'CAMK2N1', 'ENC1', 'CLDND1', 'ATP5F1B']

[12]:

merged[merged['Gene'] == 'CAMK2N1']

[12]:

	Gene	Label	Rank_norm	Rank_stadim	_merge
192	CAMK2N1	L1	31.0	NaN	left_only
193	CAMK2N1	L2	1.0	8.0	both
194	CAMK2N1	L3	28.0	NaN	left_only

[13]:

gene = 'CAMK2N1'
labels_gene = ['L1/2/3', 'L2']
layers = ['X_log', 'STADIM_log']

fig, axes = plt.subplots(1, 2, figsize=(8, 4))

for i, layer in enumerate(layers):
    sc.pl.spatial(adata, layer=layer, color=gene, cmap=gene_exp_palette, spot_size=150, vmax='p99',
                  ax=axes[i], show=False, title=f"$\mathit{{{gene}}}$ ({labels_gene[i]})\n{layer}")

plt.tight_layout()
plt.show()

[14]:

gene = 'CAMK2N1'
labels_gene = ['L1/2/3', 'L2']
layers = ['X_norm', 'STADIM']

fig, axes = plt.subplots(1, 2, figsize=(8, 4))

for i, layer in enumerate(layers):
    sc.pl.spatial(adata, layer=layer, color=gene, cmap=gene_exp_palette, spot_size=150, vmax='p99',
                  ax=axes[i], show=False, title=f"$\mathit{{{gene}}}$ ({labels_gene[i]})\n{layer}")

plt.tight_layout()
plt.show()

[ ]:

[ ]: