`main.py` runs the full embedding → aggregation → ROI-level classification pipeline for multiplexed immunofluorescence (mxIF) datasets.
It loads cell tables and ROI labels, builds per-ROI graphs, computes node embeddings, aggregates them into ROI-level embeddings, and trains/evaluates ROI classifiers.
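The aggregation step collapses the node embeddings of each ROI into a single vector. The sketch below is a minimal NumPy illustration under stated assumptions, not the pipeline's own code. `aggregate_roi` is a hypothetical helper. `"max"` is an assumed reading of the config's unspecified "pooling" option. The attention variant uses a fixed scoring vector `attn_w`, whereas a trained model would learn it.

```python
from typing import Optional, Tuple

import numpy as np


def aggregate_roi(
    node_emb: np.ndarray,
    roi_ids: np.ndarray,
    method: str = "mean",
    attn_w: Optional[np.ndarray] = None,
) -> Tuple[np.ndarray, np.ndarray]:
    """Collapse per-cell embeddings (n_cells x d) into one vector per ROI.

    Returns the sorted unique ROI ids and an (n_rois x d) embedding matrix.
    """
    rois = np.unique(roi_ids)
    out = np.zeros((len(rois), node_emb.shape[1]))
    for i, roi in enumerate(rois):
        h = node_emb[roi_ids == roi]  # embeddings of the cells in this ROI
        if method == "mean":
            out[i] = h.mean(axis=0)
        elif method == "max":
            # Element-wise max pooling; an assumed reading of "pooling".
            out[i] = h.max(axis=0)
        elif method == "attention":
            if attn_w is None:
                raise ValueError("attention pooling needs a scoring vector attn_w")
            # Hypothetical fixed scoring vector; a trained model would learn it.
            scores = h @ attn_w
            a = np.exp(scores - scores.max())  # numerically stable softmax
            a /= a.sum()
            out[i] = a @ h  # attention-weighted sum of cell embeddings
        else:
            raise ValueError(f"unknown aggregation method: {method}")
    return rois, out
```

For example, two cells in ROI `A` with embeddings `[1, 0]` and `[3, 2]` give a mean ROI vector of `[2, 1]`. With an all-zero `attn_w`, the attention weights are uniform and the result equals the mean.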
The workflow is driven by a YAML config file. Main input components include:
- Cell table: an `.rds` or `.csv` file with per-cell `x`, `y`, `phenotype`, `roi_id`, and `cell_id` columns. Column names are mapped through `cell_columns`.
- ROI labels: a CSV mapping `roi_id` → `roi_label`. Optionally supports `patient_id` and subject-level labels.
- Graph construction: type `knn` or `radius`, with parameters such as `knn_k` or `radius`.
- Embedding parameter search space
- Aggregation method (mean / attention / pooling)
- ROI classifier and CV settings.
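The load-and-validate step can be sketched in pandas. This sketch assumes the cell table and the ROI label CSV have already been read into DataFrames. For a `.csv` file, `pd.read_csv` does this. `.rds` input would need an R-format reader and is not covered here. The helper name `prepare_cells` and the direction of the `cell_columns` mapping (canonical name → raw column name) are assumptions.

```python
import pandas as pd

# Canonical per-cell columns listed in the document.
REQUIRED = ["x", "y", "phenotype", "roi_id", "cell_id"]


def prepare_cells(cells: pd.DataFrame, labels: pd.DataFrame, cell_columns: dict) -> pd.DataFrame:
    """Rename raw columns, check required fields, and attach ROI labels.

    cell_columns is assumed to map canonical name -> raw column name.
    labels must contain roi_id and roi_label; extra columns such as
    patient_id are carried through the merge unchanged.
    """
    cells = cells.rename(columns={raw: canon for canon, raw in cell_columns.items()})
    missing = [c for c in REQUIRED if c not in cells.columns]
    if missing:
        raise ValueError(f"cell table is missing columns: {missing}")
    # many_to_one: each roi_id may appear only once in the label table.
    df = cells.merge(labels, on="roi_id", how="left", validate="many_to_one")
    unlabeled = df.loc[df["roi_label"].isna(), "roi_id"].unique()
    if len(unlabeled) > 0:
        raise ValueError(f"ROIs without a label: {list(unlabeled)}")
    return df
```

A left merge keeps every cell. The explicit check then turns silently missing labels into an error instead of `NaN` rows.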
All results are written under the directory specified by `--outdir`. Key outputs:
- `dataframes/df.csv`: merged and validated cell table with ROI/subject labels.
- `graphs/graph_dict_<type>.pkl`: per-ROI graphs.
- `graphs/G_all_<type>.pkl`: disconnected union graph of all ROIs.
- Cached node embeddings.
- ROI embeddings, stored under `evaluate/roi_supervised_best/<metric>/roi_embedding.mat`.
- Best hyperparameters (`best_roi_supervision.yaml`).
- Trained classifier objects.
- `logs/run_*.log`: timestamped logs.
- `config/resolved_config.yaml`: the exact config used in the run.
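When scripting against a finished run, it can help to collect the documented locations in one place. `output_paths` is a hypothetical helper. The timestamp format in the log name is an assumption, since the document only says logs are timestamped. Outputs without a documented path (cached embeddings, classifier objects, `best_roi_supervision.yaml`) are left out.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def output_paths(outdir: str, graph_type: str, metric: str,
                 stamp: Optional[str] = None) -> dict:
    """Map each documented output to its location under --outdir."""
    root = Path(outdir)
    # Assumed timestamp format; the real run may name its log differently.
    stamp = stamp or datetime.now().strftime("%Y%m%d_%H%M%S")
    return {
        "cells": root / "dataframes" / "df.csv",
        "graph_dict": root / "graphs" / f"graph_dict_{graph_type}.pkl",
        "union_graph": root / "graphs" / f"G_all_{graph_type}.pkl",
        "roi_embedding": root / "evaluate" / "roi_supervised_best" / metric / "roi_embedding.mat",
        "log": root / "logs" / f"run_{stamp}.log",
        "resolved_config": root / "config" / "resolved_config.yaml",
    }
```

Here `graph_type` is `knn` or `radius`, and `metric` is whatever selection metric the run used.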
Pipeline steps:
- Load and validate the cell table and ROI/subject labels
- Build per-ROI graphs
- Construct global union graph
- Run supervised search over embedding + aggregation hyperparameters
- Compute node embeddings with the best hyperparameters
- Aggregate node embeddings into ROI-level vectors
- Train and export ROI-level classifiers
- Save embeddings, parameters, and diagnostics
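The two graph-construction steps can be sketched with NumPy alone. `roi_graph` builds an undirected kNN or radius graph from cell coordinates. `union_graph` offsets node indices so that the per-ROI graphs form one disconnected graph. Both helpers, their placeholder default values, and the edge-list representation are assumptions. The pipeline stores its graphs as pickles whose internal format is not described here.

```python
from typing import Dict, Tuple

import numpy as np


def roi_graph(xy: np.ndarray, graph_type: str = "knn", knn_k: int = 5,
              radius: float = 30.0) -> np.ndarray:
    """Undirected edge list (m x 2, each row i < j) for the cells of one ROI.

    Default knn_k and radius values are placeholders, not the pipeline's.
    """
    n = len(xy)
    # Dense pairwise Euclidean distances; O(n^2), fine for a sketch.
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # exclude self-loops
    if graph_type == "knn":
        k = max(0, min(knn_k, n - 1))
        nbrs = np.argsort(d, axis=1)[:, :k]  # k nearest cells of each cell
        adj = np.zeros((n, n), dtype=bool)
        adj[np.repeat(np.arange(n), k), nbrs.ravel()] = True
        adj = adj | adj.T  # keep an edge if either cell picked the other
    elif graph_type == "radius":
        adj = d <= radius  # connect every pair within the radius
    else:
        raise ValueError(f"unknown graph type: {graph_type}")
    i, j = np.nonzero(np.triu(adj, k=1))  # upper triangle: one row per edge
    return np.stack([i, j], axis=1)


def union_graph(graph_dict: Dict[str, Tuple[int, np.ndarray]]) -> Tuple[np.ndarray, Dict[str, int]]:
    """Merge per-ROI graphs into one disconnected graph.

    graph_dict maps roi_id -> (n_cells, edges). Node ids of each ROI are shifted
    by the number of cells in the ROIs before it, so no edge crosses ROIs.
    """
    edges, offsets, start = [], {}, 0
    for roi, (n_cells, e) in graph_dict.items():
        offsets[roi] = start
        edges.append(e + start)
        start += n_cells
    all_edges = np.concatenate(edges) if edges else np.empty((0, 2), dtype=int)
    return all_edges, offsets
```

For ROIs with many cells, a spatial index such as `scipy.spatial.cKDTree` would replace the dense distance matrix. `networkx.disjoint_union_all` performs the same relabel-and-merge on NetworkX graphs.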