
GPTQ per-channel quantization configuration #442

@laobadao

Description

```yaml
weight:
    bit: 8
    symmetric: True
    granularity: per_channel
    group_size: -1
    calib_algo: mse
act:
    bit: 8
    symmetric: True
    granularity: per_token
    calib_algo: minmax
special:
    actorder: True
    static_groups: False
    percdamp: 0.01
    blocksize: 128
    chunk_num: 4
    true_sequential: True
    online_rotate: False
    fp32_had: True
```

With GPTQ configured like this in YAML, `granularity: per_channel` and `group_size: -1`, but `blocksize: 128`. Does the optimization actually run in blocks of 128 columns? Or would it be more reasonable to also set `blocksize` to -1 and adjust the corresponding code, for example:

```python
def weight_transform(self, W, Hinv, Losses, tmp):
    # NOTE: fall back to a single block spanning all columns when
    # blocksize is non-positive, i.e. plain per-channel GPTQ.
    curr_blocksize = self.blocksize
    if curr_blocksize <= 0:
        logger.info('Blocksize <= 0, using per-channel GPTQ')
        logger.debug(f'W.shape: {W.shape}')
        curr_blocksize = W.shape[1]

    for i1 in range(0, self.n_nonout, curr_blocksize):
        # previously: for i1 in range(0, self.n_nonout, self.blocksize):
        i2 = min(i1 + curr_blocksize, self.n_nonout)
        count = i2 - i1
        W1, Hinv1 = W[:, i1:i2].clone(), Hinv[i1:i2, i1:i2]
        tmp1, Err1, Losses1 = (
            torch.zeros_like(W1),
            torch.zeros_like(W1),
            torch.zeros_like(W1),
        )

        for i in range(count):
            w, d = W1[:, i], Hinv1[i, i]
            if self.wquantizer.granularity == 'per_group':
                idx = i1 + i
                if not self.static_groups:
                    if (i1 + i) % self.wquantizer.group_size == 0:
                        column_tensors = W[
                            :,
                            (i1 + i): min(
                                (i1 + i + self.wquantizer.group_size),
                                (self.columns - self.n_out),
                            ),
                        ]
                        self.search_column_qparams(column_tensors, idx)
                else:
                    if self.actorder:
                        idx = self.perm[idx]
                    self.qparams = self.groups[idx // self.wquantizer.group_size]

            q = self.wquantizer.quant_dequant(
                w.unsqueeze(1),
                self.qparams['scale'],
                self.qparams['zero'],
                self.qparams['qmax'],
                self.qparams['qmin'],
            ).squeeze(1)

            tmp1[:, i] = w
            Losses1[:, i] = ((w - q) ** 2) / (2 * d**2)
            err1 = (w - q) / d
            W1[:, i:] -= err1.unsqueeze(1).matmul(Hinv1[i, i:].unsqueeze(0))
            Err1[:, i] = err1

        tmp[:, i1:i2], Losses[:, i1:i2] = tmp1, Losses1
        W[:, i2:] -= Err1.matmul(Hinv[i1:i2, i2:])
```
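For context on the question: in GPTQ, `blocksize` is orthogonal to the quantization granularity. It only controls how lazily the Hessian error updates are batched (cross-block updates are deferred to a single matmul at the end of each block), so a `blocksize` of 128 with `per_channel` granularity should produce the same quantized weights as processing all columns in one block. A minimal NumPy sketch of this, using a hypothetical `gptq_quantize` helper with a simple round-to-nearest quantizer (not the library's actual API), to illustrate the claim under those assumptions:

```python
import numpy as np

def gptq_quantize(W, Hinv, blocksize):
    """GPTQ column loop: quantize each column, propagate the
    scaled error to later columns. blocksize only batches the
    cross-block updates; it does not change the result."""
    W = W.copy()
    rows, cols = W.shape
    Q = np.zeros_like(W)
    for i1 in range(0, cols, blocksize):
        i2 = min(i1 + blocksize, cols)
        W1 = W[:, i1:i2].copy()
        Hinv1 = Hinv[i1:i2, i1:i2]
        Err1 = np.zeros_like(W1)
        for i in range(i2 - i1):
            w, d = W1[:, i], Hinv1[i, i]
            q = np.round(w)  # stand-in for the real quantizer
            Q[:, i1 + i] = q
            err = (w - q) / d
            # immediate update of the remaining columns inside the block
            W1[:, i:] -= np.outer(err, Hinv1[i, i:])
            Err1[:, i] = err
        # deferred, batched update of all columns after the block
        W[:, i2:] -= Err1 @ Hinv[i1:i2, i2:]
    return Q

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
X = rng.normal(size=(8, 32))
H = X @ X.T + 0.01 * np.eye(8)          # damped Hessian proxy
Hinv = np.linalg.cholesky(np.linalg.inv(H)).T  # upper Cholesky factor, as in GPTQ

Q_full = gptq_quantize(W, Hinv, blocksize=8)   # one block over all columns
Q_small = gptq_quantize(W, Hinv, blocksize=2)  # four blocks
assert np.allclose(Q_full, Q_small)
```

So setting `blocksize` to -1 (one block per matrix) is mainly a memory/speed trade-off rather than a correctness change; the fallback in the patch above mostly avoids the per-block bookkeeping.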
