feat(pipeline): Support resuming training runs #10

@hummat

Description

Problem

Currently there's no way to resume an interrupted training run:

  • Without --overwrite: Existing output files trigger the skip logic, so the train step is skipped entirely
  • With --overwrite: The existing checkpoint directory is deleted, losing all training progress

Motivation

Long training runs (hours) can be interrupted by:

  • GPU crashes / OOM
  • System reboots
  • User mistakes

Losing progress is frustrating and wastes compute.

Proposed Solution

Add a --resume flag (or similar) that:

  1. Does not skip the train step when output exists
  2. Does not delete existing checkpoints
  3. Passes --trainer.load-dir pointing to the existing run's checkpoint directory (see the sketch below)
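
A rough sketch of how this could look in run.sh. Everything beyond the --resume idea, the existing --overwrite flag, and --trainer.load-dir (variable names, checkpoint layout, the skip/overwrite structure) is an assumption about the script's internals:

```bash
#!/usr/bin/env bash
# Sketch only: RESUME, OVERWRITE, RUN_DIR, and TRAIN_ARGS are assumed names for
# run.sh internals; only --resume, --overwrite, and --trainer.load-dir come from this issue.

RESUME=0
OVERWRITE=0
TRAIN_ARGS=()
RUN_DIR="${RUN_DIR:-outputs/my-run}"   # hypothetical run directory

for arg in "$@"; do
  case "$arg" in
    --resume)    RESUME=1 ;;
    --overwrite) OVERWRITE=1 ;;
  esac
done

CKPT_DIR="$RUN_DIR/checkpoints"        # assumed checkpoint location inside the run directory

if [[ -d "$CKPT_DIR" && "$RESUME" -eq 1 ]]; then
  # Neither skip nor delete: continue training from the existing checkpoints.
  TRAIN_ARGS+=(--trainer.load-dir "$CKPT_DIR")
elif [[ -d "$CKPT_DIR" && "$OVERWRITE" -eq 1 ]]; then
  rm -rf "$CKPT_DIR"                   # current --overwrite behavior: start from scratch
elif [[ -d "$CKPT_DIR" ]]; then
  echo "Output exists, skipping train step (pass --resume or --overwrite)"
  exit 0
fi

sdf-train "${TRAIN_ARGS[@]}"
```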

Alternatively, detect whether a partial training run exists (checkpoints present but training incomplete) and resume automatically; a sketch of such detection follows.
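
One possible way to auto-detect the latest checkpoint step, assuming checkpoints are named step-<N>.ckpt (the actual naming scheme written by sdf-train would need to be checked):

```bash
# Sketch of auto-detecting the latest checkpoint step. The filename pattern
# step-<N>.ckpt is an assumption; adjust it to whatever sdf-train actually writes.
latest_step() {
  local ckpt_dir="$1"
  ls "$ckpt_dir"/step-*.ckpt 2>/dev/null \
    | sed -E 's/.*step-([0-9]+)\.ckpt/\1/' \
    | sort -n \
    | tail -n 1
}

STEP="$(latest_step "$CKPT_DIR")"
if [[ -n "$STEP" ]]; then
  TRAIN_ARGS+=(--trainer.load-dir "$CKPT_DIR" --trainer.load-step "$STEP")
fi
```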

Expected Behavior

Users should be able to continue training from a checkpoint without losing progress. The underlying trainer (sdf-train) already supports this via:

  • --trainer.load-dir PATH — directory containing checkpoints
  • --trainer.load-step INT — specific step to resume from
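
For illustration, resuming manually might look like this; apart from the two --trainer.* flags listed above, the directory and step values are made up:

```bash
# Hypothetical paths/values; only --trainer.load-dir and --trainer.load-step
# are documented flags of sdf-train.
sdf-train \
  --trainer.load-dir outputs/my-run/checkpoints \
  --trainer.load-step 15000
```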

Alternatives Considered

Current workaround: Users must manually invoke sdf-train with --trainer.load-dir to resume. This works but requires knowing the internal command structure.

Tasks

  • Add --resume flag to run.sh
  • Modify skip logic to allow resume when flag is set
  • Pass --trainer.load-dir to sdf-train when resuming
  • Optionally auto-detect latest checkpoint step
  • Update README with resume documentation
  • Add tests for resume behavior
