SMODER Tutorial 01: Mouse Brain H3K27ac Quick Start
This tutorial demonstrates how to run the current validated SMODER workflow on a mouse brain RNA + peak dataset.
This notebook is a runnable quick-start example for the smoder package. It walks through the full workflow:
configuration
input checking
data loading
preprocessing
feature engineering
graph construction
model training
result saving
output inspection
This tutorial is intended as a development-stage example and may continue to evolve as the package structure is refined.
Before you begin
This tutorial assumes that:
the smoder package has already been installed
a compatible Python environment is available
the required input datasets have already been prepared
the notebook is run in an environment where the smoder package can read the data files
At the current stage, this tutorial is based on the validated mouse brain H3K27ac example workflow.
[ ]:
import os
import sys
from pprint import pprint
import pandas as pd
import smoder
from smoder.config.defaults import (
    get_mousebrain_h3k27ac_base_config,
    get_mousebrain_h3k27ac_params,
)
from smoder.pipelines.mousebrain_h3k27ac import (
    load_and_show_data,
    init_model_and_preprocess,
    feature_engineering_and_graph_build,
    train_model,
    save_results,
)
print("Python executable:")
print(sys.executable)
print("\nSMODER package location:")
print(smoder.__file__)
Step 1. Define tutorial settings
We first define a tutorial-specific run name and choose whether to perform full training.
If RUN_TRAINING = False, this notebook can still be used for configuration checking and data loading without performing the full training step.
[ ]:
RUN_NAME = "tutorial_01_mousebrain_h3k27ac_v2"
RUN_TRAINING = True
base_config = get_mousebrain_h3k27ac_base_config(run_name=RUN_NAME)
params = get_mousebrain_h3k27ac_params()
# Keep the tutorial lightweight
params["epochs"] = 10
params["model_save"] = False
Step 2. Review configuration
The configuration contains:
input data paths
output directory
model save directory
preprocessing parameters
training parameters
You can optionally modify the paths below if needed for your own environment.
[ ]:
print("Base configuration:")
pprint(base_config)
print("\nParameter configuration:")
pprint(params)
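If the default paths do not match your environment, one option is to override the relevant entries before running the later steps. The sketch below assumes the base configuration behaves like a plain dictionary (the tutorial indexes it with keys such as "sc_rna_path"). The helper name with_overrides, the stand-in dictionary, and all example paths are hypothetical and are not part of the smoder package.

```python
from copy import deepcopy


def with_overrides(config, **overrides):
    """Return a copy of config with selected keys replaced.

    Unknown keys raise KeyError so that a typo does not silently
    create a new, unused entry.
    """
    unknown = sorted(set(overrides) - set(config))
    if unknown:
        raise KeyError(f"Unknown configuration keys: {unknown}")
    updated = deepcopy(config)
    updated.update(overrides)
    return updated


# Stand-in for the dictionary returned by the tutorial's config helper;
# the paths are placeholders, not the package defaults.
example_config = {
    "sc_rna_path": "data/sc_rna.h5ad",
    "st_rna_path": "data/st_rna.h5ad",
    "st_adt_path": "data/st_peak.h5ad",
    "output_dir": "outputs/tutorial_01",
}

custom_config = with_overrides(
    example_config,
    st_rna_path="/my/data/st_rna.h5ad",
    output_dir="/my/results/tutorial_01",
)
print(custom_config["st_rna_path"])
print(example_config["st_rna_path"])  # the original is left unchanged
```

Working on a copy keeps the defaults available for comparison, and rejecting unknown keys catches misspelled path names early.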
Step 3. Check required input files
The current validated workflow expects three input files:
single-cell reference RNA
spatial RNA
spatial peak data
We verify that all required input files are available before proceeding.
[ ]:
required_files = {
    "sc_rna_path": base_config["sc_rna_path"],
    "st_rna_path": base_config["st_rna_path"],
    "st_adt_path": base_config["st_adt_path"],
}
missing_files = []
for key, path in required_files.items():
    exists = os.path.exists(path)
    print(f"{key}: {path}")
    print(f" -> {'FOUND' if exists else 'MISSING'}")
    if not exists:
        missing_files.append(path)
if missing_files:
    # Name the missing paths so the error is actionable on its own.
    raise FileNotFoundError(
        "Some required input files are missing: "
        + ", ".join(map(str, missing_files))
    )
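The same check can be wrapped in a small reusable function, which may be convenient when several datasets are validated in one session. This is a minimal sketch using only the standard library. find_missing_files is a hypothetical helper, not a smoder function, and the demonstration uses temporary placeholder files rather than real datasets.

```python
import tempfile
from pathlib import Path


def find_missing_files(required):
    """Return {key: path} for entries whose path is not an existing file."""
    return {
        key: str(path)
        for key, path in required.items()
        if not Path(path).is_file()
    }


# Demonstration with a temporary directory instead of real datasets.
with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "present.dat"
    present.write_text("placeholder")
    demo_required = {
        "sc_rna_path": present,
        "st_rna_path": Path(tmp) / "absent.dat",
    }
    missing = find_missing_files(demo_required)

print(sorted(missing))
```

Note that is_file() rejects directories, whereas os.path.exists accepts them. If one of your inputs is a directory, Path(path).exists() would be the closer equivalent of the check used in this tutorial.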
Step 4. Load input data
This step loads the validated mouse brain example data and prints basic information such as:
dataset shapes
modality types
number of cell types
selected categories
[ ]:
ref_dict, smo_dict, params = load_and_show_data(base_config, params)
Step 5. Initialize the model and preprocess the data
This step initializes the SMODER model and performs built-in preprocessing, including:
reference RNA preprocessing
spatial RNA preprocessing
second-modality preprocessing
spot alignment
informative gene selection
[ ]:
model = init_model_and_preprocess(ref_dict, smo_dict, params)
Step 6. Perform feature engineering and graph construction
In this step, the pipeline performs:
RNA feature engineering
second-modality feature engineering
spatial graph construction
feature graph construction
[ ]:
model, dim_rna, dim_modal2 = feature_engineering_and_graph_build(model, params)
print("RNA feature dimension:", dim_rna)
print("Second modality feature dimension:", dim_modal2)
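For readers new to spatial graphs, the general idea can be illustrated with a k-nearest-neighbour graph over spot coordinates. This is a generic sketch and does not reproduce the graph construction used inside SMODER. The function name knn_graph, the choice of k, and the toy coordinates are all assumptions made for illustration.

```python
import math


def knn_graph(coords, k):
    """Return an undirected k-nearest-neighbour graph as {index: set_of_neighbours}."""
    n = len(coords)
    graph = {i: set() for i in range(n)}
    for i in range(n):
        # Sort all other points by Euclidean distance to point i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(coords[i], coords[j]),
        )
        for j in others[:k]:
            # Add the edge in both directions so the graph is undirected.
            graph[i].add(j)
            graph[j].add(i)
    return graph


# Toy 2 x 3 grid of spot coordinates.
toy_coords = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
toy_graph = knn_graph(toy_coords, k=2)
for spot, neighbours in toy_graph.items():
    print(spot, sorted(neighbours))
```

Because edges are added in both directions, a spot can end up with more than k neighbours. A feature graph can be built in the same way by replacing the coordinates with feature vectors.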
Step 7. Optional model training
The next step trains the model and produces the main deconvolution outputs.
For this tutorial, we keep the number of epochs intentionally small so that the workflow remains suitable for quick validation.
If RUN_TRAINING = False, the training and saving steps will be skipped.
[ ]:
adata_result = None
cross_fusion = None
if RUN_TRAINING:
    adata_result, cross_fusion = train_model(
        model=model,
        params=params,
        dim_rna=dim_rna,
        dim_modal2=dim_modal2,
        base_config=base_config,
    )
else:
    print("Training step skipped because RUN_TRAINING = False")
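When choosing an epoch count for quick validation, it can help to know how long a short run takes. The sketch below shows one generic way to time a block of code with the standard library. The timed helper is hypothetical, and the workload is a stand-in for the training call rather than anything from smoder.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Print the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.2f} s")


# Stand-in workload; in the notebook, the training call would go here.
with timed("toy workload"):
    total = sum(i * i for i in range(100_000))
```

The try/finally ensures the elapsed time is reported even if the wrapped code raises an exception.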
Step 8. Save results
If training has been performed, the outputs are saved to the tutorial-specific output directory.
[ ]:
if RUN_TRAINING:
    save_results(
        adata_result=adata_result,
        model=model,
        base_config=base_config,
        params=params,
    )
else:
    print("Save step skipped because training was not run.")
Step 9. Inspect output directory
We now inspect the output directory and list the generated files.
[ ]:
output_dir = base_config["output_dir"]
print("Output directory:")
print(output_dir)
print()
if os.path.exists(output_dir):
    for name in sorted(os.listdir(output_dir)):
        full_path = os.path.join(output_dir, name)
        if os.path.isdir(full_path):
            print(f"[DIR] {name}")
        else:
            size_mb = os.path.getsize(full_path) / (1024 * 1024)
            print(f"[FILE] {name} ({size_mb:.2f} MB)")
else:
    print("Output directory does not exist.")
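The listing above covers only the top level of the output directory. If a run writes files into subfolders, a recursive summary may be more informative. The following is a minimal sketch using os.walk. summarize_directory is a hypothetical helper, and the demonstration runs on a temporary directory with placeholder files.

```python
import os
import tempfile


def summarize_directory(root):
    """Return (relative_path, size_in_bytes) for every file under root, sorted by path."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(full_path, root)
            entries.append((rel_path, os.path.getsize(full_path)))
    return sorted(entries)


# Demonstration on a temporary directory with one nested file.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "figures"))
    with open(os.path.join(tmp, "summary.txt"), "w") as handle:
        handle.write("12345")
    with open(os.path.join(tmp, "figures", "plot.txt"), "w") as handle:
        handle.write("abc")
    listing = summarize_directory(tmp)

for rel_path, size in listing:
    print(f"{rel_path}: {size} bytes")
```

In the notebook, the same function could be called as summarize_directory(output_dir) once the directory exists.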
Step 10. Preview the cell type proportions
If training has been run successfully, the output directory should contain a cell_type_proportions.csv file. We preview the first few rows below.
[ ]:
cell_type_csv = os.path.join(output_dir, "cell_type_proportions.csv")
if os.path.exists(cell_type_csv):
    df_props = pd.read_csv(cell_type_csv)
    print("cell_type_proportions.csv shape:", df_props.shape)
    display(df_props.head())
else:
    print("cell_type_proportions.csv not found.")
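Beyond previewing the first rows, two quick sanity checks are often useful for a proportions table: whether each row sums to about 1, and which cell type dominates each spot. The sketch below runs on a synthetic table. The layout (spots as rows, one column per cell type) and the cell type names are assumptions, since the tutorial does not specify the structure of the real file.

```python
import pandas as pd

# Synthetic stand-in for the proportions table; the layout and the
# cell type names are assumptions, not taken from the real output file.
toy_props = pd.DataFrame(
    {
        "Astrocyte": [0.70, 0.10, 0.25],
        "Neuron": [0.20, 0.80, 0.25],
        "Oligodendrocyte": [0.10, 0.10, 0.50],
    },
    index=["spot_1", "spot_2", "spot_3"],
)

# Each row should sum to roughly 1 if the values are proportions.
row_sums = toy_props.sum(axis=1)
rows_ok = ((row_sums - 1.0).abs() < 1e-6).all()
print("Rows sum to 1:", rows_ok)

# Most abundant cell type per spot, and how often each type dominates.
dominant = toy_props.idxmax(axis=1)
print(dominant.value_counts())
```

With the real file, the first column may hold spot identifiers, in which case reading it with pd.read_csv(cell_type_csv, index_col=0) would keep only the numeric columns. Inspect the file before relying on that assumption.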
Step 11. Inspect result object metadata
If training has been executed, we can also inspect some basic information stored in the result object.
[ ]:
if adata_result is not None:
    print("Result AnnData shape:")
    print(adata_result.shape)
    print("\nAvailable obsm keys:")
    print(list(adata_result.obsm.keys()))
    print("\nAvailable uns keys:")
    print(list(adata_result.uns.keys())[:20])
else:
    print("No result object is available because training was skipped.")
Summary
In this tutorial, we completed a full SMODER quick-start workflow for the current validated mouse brain RNA + peak example.
This notebook covered:
configuration preparation
input file checking
data loading
model initialization and preprocessing
feature engineering and graph construction
optional model training
result saving
output inspection
result preview
This notebook is intended to serve as the first practical tutorial for the current SMODER package and can be further refined as the package, documentation, and public release workflow continue to improve.
Next steps
For downstream visualization and representative result figures, please see:
Tutorial 02: Mouse Brain H3K27ac Result Visualization
Tutorial 03: Simulated Human Melanoma Result Visualization
Tutorial 04: HBC Result Visualization
These tutorials describe how to generate representative SMODER result figures for each dataset.