Hello, thank you for this amazing work! I have a couple of questions regarding the use and role of calibration data in Wanda:
- In the paper, calibration data is used to estimate the metric defined in Equation (4), which is then used to rank each weight entry and determine the pruning mask. Does this mean the mask inherently depends on the specific calibration dataset? In other words, would different calibration datasets produce different masks and potentially different downstream performance?
- When evaluating a Wanda-pruned model in a zero-shot setting, can Wanda produce an effective mask using only the data from the zero-shot task itself, following Algorithm 1? More generally, could Wanda be extended to work as an online pruning method?
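
To make the first question concrete, here is a minimal NumPy sketch of how I understand the metric: each weight is scored as its magnitude times the L2 norm of the matching input feature over the calibration tokens, and the lowest-scoring weights in each output row are pruned. The function name `wanda_mask`, the toy shapes, and the per-row comparison are my own assumptions for illustration, not code from the repository.

```python
import numpy as np

def wanda_mask(weight, calib_inputs, sparsity=0.5):
    """Return a boolean keep-mask for `weight` (hypothetical helper).

    weight:       (out_features, in_features) layer weights
    calib_inputs: (num_tokens, in_features) activations from calibration data
    """
    # L2 norm of each input feature across all calibration tokens.
    feat_norm = np.linalg.norm(calib_inputs, axis=0)   # (in_features,)
    # Assumed score: |W_ij| * ||X_j||_2, broadcast over output rows.
    score = np.abs(weight) * feat_norm
    # Prune the lowest-scoring fraction within each output row.
    n_prune = int(weight.shape[1] * sparsity)
    prune_idx = np.argsort(score, axis=1)[:, :n_prune]
    mask = np.ones_like(weight, dtype=bool)
    np.put_along_axis(mask, prune_idx, False, axis=1)
    return mask

# Same weights, two toy "calibration sets" with different feature scales.
W = np.ones((1, 4))
X_a = np.array([[1.0, 2.0, 3.0, 4.0]])   # later features are larger
X_b = np.array([[4.0, 3.0, 2.0, 1.0]])   # earlier features are larger

mask_a = wanda_mask(W, X_a)
mask_b = wanda_mask(W, X_b)
print(mask_a, mask_b)
```

In this toy case the two calibration sets keep opposite halves of the row, which is the kind of dataset dependence I am asking about.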
Thanks again for your work and for any insights you can share!