Official repository for: Kim et al., "Ranking-Aware Multiple Instance Learning for Histopathology Slide Classification" presented at the Medical Imaging with Deep Learning (MIDL) 2025 conference.
Rank Induction is a training strategy for Multiple Instance Learning (MIL) that leverages expert annotations using a ranking constraint—rather than exact attention matching—to guide the model’s focus toward diagnostically meaningful areas. By ranking annotated lesion patches higher than non-lesion patches, our method guides the model to focus on diagnostically meaningful regions without over-constraining attention distribution.
⚠️ Most MIL methods either ignore expert annotations or enforce overly strict attention constraints.
✅ Our method strikes a balance by ranking annotated lesion patches higher than non-lesion ones—offering better interpretability and performance, especially in low-data regimes.
Figure 1. Model performance under data scarcity and interpretation. Left: Performance at varying sampling ratios. Middle: WSI thumbnail. Right: Corresponding attention heatmap.
As shown above, our method maintains stable performance across varying sampling ratios (left) and produces more accurate, interpretable attention maps (right). Red boxes highlight cancerous areas, while blue boxes indicate non-lesion regions.
Most classic MIL methods only assume that the presence of at least one positive patch makes a slide positive, with no guidance on how important each patch is. In contrast, Rank Induction uses expert annotations (patches that are known to be lesion or non-lesion) to induce a pairwise ranking preference:
- Lesion patches should have higher attention scores than
- Non-lesion patches.
Formally, for lesion patches ( s_i ) and non-lesion patches ( s_j ), we want:
To implement this, we convert the score difference into a pairwise probability ( P_{i,j} ):
where
-
$s_i$ and$s_j$ are the raw attention score (before softmax), -
$\sigma$ is a scaling factor, -
$m$ is a margin to encourage a significant gap between lesion and non-lesion attention scores.
We define the Rank Loss $\mathcal{L}\mathrm{rank}$ by comparing the predicted pairwise probability ($P{i,j}$) against the ground-truth preference (
where (
Finally, the slide-level classification objective (e.g., binary cross-entropy) and the rank loss are jointly optimized.
| Method | AUROC | AUPRC |
|---|---|---|
| AB-MIL | 0.740 ± 0.146 | 0.730 ± 0.183 |
| Attention Induction | 0.743 ± 0.142 | 0.727 ± 0.179 |
| Rank Induction | 0.836 ± 0.044 | 0.851 ± 0.036 |
| Method | AUROC | AUPRC |
|---|---|---|
| AB-MIL | 0.992 ± 0.002 | 0.989 ± 0.004 |
| Attention Induction | 0.993 ± 0.002 | 0.990 ± 0.003 |
| Rank Induction | 0.995 ± 0.001 | 0.993 ± 0.002 |
🧪 Reproducibility
To reproduce the above results, please refer to the Jupyter notebook provided in:
result/
├── 0_loss.ipynb
├── 1_performance.ipynb
├── 2_posthoc.ipynb
├── 3_data_scarcity.ipynb
├── 4_granularity.ipynb
└── 5_digestpath.ipynb
| Name | ORCID | Affiliation | Notes | |
|---|---|---|---|---|
| Ho Heon Kim | 0000-0001-7260-7504 | hoheon0509@mf.seegene.com |
|
Contributed equally |
| Gisu Hwang | 0000-0003-1046-9286 | gshwang@mf.seegene.com |
|
Contributed equally |
| Won Chang Jeong | 0009-0008-1931-5957 | jeongwonchan53@mf.seegene.com |
|
|
| YoungSin Ko | 0000-0003-1319-4847 | noteasy@mf.seegene.com |
|
Corresponding author |
-
$^{1}$ AI Research Center, Seegene Medical Foundation, 320 Cheonho-daero, Seoul, South Korea -
$^{2}$ Pathology Center, Seegene Medical Foundation, 320 Cheonho-daero, Seoul, South Korea
