Hi, I'm curious about the efficiency of applying online Hadamard operations to the attention outputs, notated as 'hadamard heads' in Figure 6. I believe the purpose of this operation is to match the rotation size to the quantization granularity, since that is the most accurate in most cases. However, I find that the online operation you mention doesn't seem to improve model accuracy compared to skipping it and instead applying the same Hadamard matrix to both Wv and Wout offline. I mean, the Wout input already seems to have a quantization-friendly distribution, so what was the reason for applying the online Hadamard operation after the softmax(QK^T)V operation?
| Perplexity | W8A8 | W4A8 | W4A4 |
|---|---|---|---|
| QuaRot | 5.481 | 6.701 | 8.097 |
| Single Block | 5.475 | 6.757 | 7.953 |
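
To make the comparison concrete, here is a minimal sketch (not the repository's code; the shapes and names are illustrative) of the equivalence I have in mind: because softmax(QK^T) mixes only the sequence dimension, rotating the attention output online with a per-head Hadamard matrix H feeds the same tensor into Wout as folding H into Wv offline.

```python
import torch

def hadamard(n: int) -> torch.Tensor:
    # Sylvester construction; n must be a power of two
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], dim=1),
                       torch.cat([H, -H], dim=1)], dim=0)
    return H / n ** 0.5  # normalized so that H @ H.T == I

torch.manual_seed(0)
seq_len, head_dim = 16, 64          # illustrative sizes for a single head
H = hadamard(head_dim)

A = torch.softmax(torch.randn(seq_len, seq_len), dim=-1)  # softmax(QK^T)
V = torch.randn(seq_len, head_dim)                        # value activations

online  = (A @ V) @ H   # online Hadamard on the attention output
offline = A @ (V @ H)   # the same H folded into Wv (V rotated ahead of time)

print(torch.allclose(online, offline, atol=1e-5))  # True: identical input to Wout
```

In full precision the two placements are identical, so I assume any difference would only appear once the intermediate tensors are quantized.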
May I ask whether you found accuracy improvements from applying the online operation?