
Possible bug in subtraction dimension? #57


Description

@veritas9872

```python
linear.weight.data = W_ - W_.mean(dim=-2, keepdim=True)
```

Hello. I have a question about how the mean subtraction from LayerNorm is baked into the Linear layer.

I have worked out by hand that the mean subtraction from the LayerNorm can be merged into the Linear layer by subtracting, from each column of the weight matrix (in the `x @ W` convention), that column's mean over the input dimension: for output `j`, `sum_i (x_i - x_mean) * W_ij = sum_i x_i * (W_ij - mean_over_i(W_ij))`, because `x_mean * sum_i W_ij` is exactly what the column-mean term removes. However, since the `nn.Linear` class stores its weight transposed, with shape `(out_features, in_features)`, for memory contiguity, I think one should do

`W_ - W_.mean(dim=-1, keepdim=True)`

instead of

`W_ - W_.mean(dim=-2, keepdim=True)`

so that the mean runs over the input dimension of the stored weight.

To summarize: since `nn.Linear` computes `x @ self.weight.T`, I think the dimensions should be flipped.
Please correct me if I am wrong.
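Here is a minimal numerical check (the variable names and sizes are just for illustration), folding only the mean-centering part of LayerNorm and ignoring its variance normalization and affine parameters:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_in, d_out = 8, 4

linear = nn.Linear(d_in, d_out, bias=False)
x = torch.randn(2, d_in)

# Reference: subtract the mean from the input (the centering step of
# LayerNorm), then apply the original linear layer.
ref = linear(x - x.mean(dim=-1, keepdim=True))

# Folded: nn.Linear.weight has shape (out_features, in_features), so the
# input dimension of the stored weight is dim=-1, not dim=-2.
folded = nn.Linear(d_in, d_out, bias=False)
W_ = linear.weight.data
folded.weight.data = W_ - W_.mean(dim=-1, keepdim=True)

print(torch.allclose(ref, folded(x), atol=1e-6))  # True
```

Note that with `dim=-2` the mean has shape `(1, in_features)` and still broadcasts, so the assignment runs silently, but the check above fails because the mean is then taken over the output features rather than the input features.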
