Question about the choice of the baseline model #4

Hi Jianyi @IceClear, thank you for your excellent work on SeedVR! I've been studying your project and have a question regarding the model's initialization. I noticed that SeedVR initializes its parameters from Stable Diffusion 3 Medium. This made me curious about the design choice. For a task like low-level video enhancement, my initial thought would be to use a pre-trained video latent diffusion model as the baseline. 

I have a hypothesis and would love to know if I'm on the right track:

Is the choice of an image model deliberate because low-level enhancement requires a more deterministic mapping from the low-quality input to the high-quality output? Perhaps the strong image priors of SD3-M, which help restore textures and details within each frame, matter more than the temporal generation capabilities of a native video model. In that case, the randomness and complex temporal dynamics inherent in video generation models may not be the primary focus for an enhancer, and could even be undesirable.

Thank you for your time and for sharing this fantastic project with the community!
