
the shape of q,k,v #3


Description

@zziC7

Hello, I noticed that in your code, the q, k, v projections are defined as:
self.W_q = nn.Linear(d_model, 2 * self.d_head * num_heads, bias=False)

However, in another repository I found that q, k, v are computed as:
self.q_proj = nn.Linear(embed_dim, embed_dim, bias=False)
(code from this link)

This shape difference leads to differences in the subsequent differential attention computation. So I wonder which version matches the method in the paper, or whether the two are just different ways of writing the same thing; a sketch of why they may coincide is below.
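To make the question concrete, here is a minimal sketch (my own, not taken from either repository) assuming d_head = d_model // (2 * num_heads), which I believe is the head-dimension convention in the paper; under that assumption the two layers have identical weight shapes:

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only.
d_model = 512                        # == embed_dim in the other repository
num_heads = 8
d_head = d_model // num_heads // 2   # 32, assuming the paper's convention

# Parametrization 1 (this repo): output dim spelled out per head.
w_q_a = nn.Linear(d_model, 2 * d_head * num_heads, bias=False)

# Parametrization 2 (other repo): output dim written as embed_dim.
w_q_b = nn.Linear(d_model, d_model, bias=False)

# With d_head = d_model // (2 * num_heads), both weights are (512, 512),
# so the two snippets describe the same projection.
assert w_q_a.weight.shape == w_q_b.weight.shape

x = torch.randn(2, 16, d_model)      # (batch, seq, d_model)
q = w_q_a(x)                         # (2, 16, 2 * d_head * num_heads)
# Split into 2 * num_heads "half" heads for the differential attention pair:
q_heads = q.view(2, 16, 2 * num_heads, d_head)
print(q_heads.shape)                 # torch.Size([2, 16, 16, 32])
```

If that equality is indeed what the paper intends, the two snippets are just two spellings of the same projection; if d_head is defined differently, the shapes would genuinely diverge.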

Thanks.
