Implementing StableFlow in other DiT architectures

I am trying to apply your method to the Cosmos Predict2 DiT architecture, but the edit seems to have no effect on my image.
I identified Cosmos's vital layers using a set of 64 prompts covering diverse objects, with random seeds.
I am running image-to-image generation with the initial prompt "A dog at the beach" and the edited prompt "A cat at the beach". The generated image shows no change, as shown below.

### Input image
<img width="629" height="353" alt="Image" src="https://github.com/user-attachments/assets/4624b5bf-342b-43e2-ba5b-82c7cb59ae19" />

### Edited image
<img width="627" height="356" alt="Image" src="https://github.com/user-attachments/assets/620bd668-6f56-410b-9fe8-cd5f46d3e495" />

### Question(s)
For the self-attention injection, the paper states that **parallel generation is used to selectively replace the keys and values**. In the code, however, everything is done in a single run, and the keys and values are copied directly from the first index. Is there an explanation for this difference?

<img width="1379" height="686" alt="Image" src="https://github.com/user-attachments/assets/85cad790-7740-42c3-b145-4f0eea6fe11f" />

<img width="589" height="121" alt="Image" src="https://github.com/user-attachments/assets/3cbe1caa-74e4-4d0d-8933-f67b2d5b63f9" />
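
To make the question concrete, here is a minimal NumPy sketch of how I currently read the two approaches. It is my own toy illustration, not your code: the function names (`attention`, `attention_with_injection`), the tensor shapes, and the assumption that index 0 of the batch holds the reference image are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Plain scaled dot-product attention; q, k, v have shape (batch, tokens, dim).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def attention_with_injection(q, k, v, inject):
    # Single batched run. ASSUMPTION: index 0 is the reference sample.
    if inject:
        # Every sample attends to the reference keys/values (copied from index 0).
        k = np.broadcast_to(k[:1], k.shape)
        v = np.broadcast_to(v[:1], v.shape)
    return attention(q, k, v)


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((2, 4, 8)) for _ in range(3))

# Variant A: one batched run, K/V copied from index 0 (my reading of the code).
batched = attention_with_injection(q, k, v, inject=True)

# Variant B: two separate "parallel" calls, with the reference K/V
# handed to the edited sample (my reading of the paper).
ref_out = attention(q[:1], k[:1], v[:1])
edit_out = attention(q[1:], k[:1], v[:1])

# In this toy setting both variants give the same outputs.
same_ref = np.allclose(batched[0], ref_out[0])
same_edit = np.allclose(batched[1], edit_out[0])
print(same_ref, same_edit)
```

If this reading is right, the single batched run and the parallel generation would be equivalent whenever the reference sits at index 0 of the batch, but I would like to confirm that this is the intent.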

I really appreciate your work and look forward to your reply!
