Problem with the value range used for valenc in get_c4() #59

When I tried to run GPTQ with the C4 dataset, I got `ValueError("empty range for randrange() (%d, %d, %d)" % (istart, istop, width))`. While debugging, I found what looks like a logic error in the code below. When `tmp.input_ids.shape[1]` is equal to `seqlen`, the `>= seqlen` check accepts the sample, and the next line then calls `random.randint(0, -1)`, which raises this error.

```python
    import random
    random.seed(0)
    valenc = []
    for _ in range(256):
        while True:
            i = random.randint(0, len(valdata) - 1)
            tmp = tokenizer(valdata[i]['text'], return_tensors='pt')
            # This check also accepts shape[1] == seqlen.
            if tmp.input_ids.shape[1] >= seqlen:
                break
        # With shape[1] == seqlen this becomes random.randint(0, -1) -> ValueError.
        i = random.randint(0, tmp.input_ids.shape[1] - seqlen - 1)
        j = i + seqlen
        valenc.append(tmp.input_ids[:, i:j])
```
Please find attached my debugging record.
![debug for gptq](https://github.com/user-attachments/assets/239d776e-f63a-4a0d-b66a-4c04be7ed031)
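
One possible fix, as a minimal sketch rather than the maintainers' intended change: since `random.randint` includes both endpoints, the upper bound could be `shape[1] - seqlen` instead of `shape[1] - seqlen - 1`. The helper name `sample_window` and its `num_tokens` parameter are hypothetical and stand in for `tmp.input_ids.shape[1]`; the snippet uses plain integers so it runs without the tokenizer or dataset.

```python
import random


def sample_window(num_tokens, seqlen, rng=random):
    """Pick (i, j) so that a slice [i:j] covers exactly seqlen tokens.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    if num_tokens < seqlen:
        raise ValueError("document is shorter than seqlen")
    # randint is inclusive on both ends, so the largest valid start index
    # is num_tokens - seqlen. For num_tokens == seqlen this is randint(0, 0).
    i = rng.randint(0, num_tokens - seqlen)
    return i, i + seqlen


seqlen = 2048

# Reproduce the reported failure: shape[1] == seqlen gives randint(0, -1).
try:
    random.randint(0, seqlen - seqlen - 1)
    original_failed = False
except ValueError:
    original_failed = True

# The adjusted bound handles the boundary case.
i, j = sample_window(seqlen, seqlen)
```

An alternative would be to keep the original `randint` call and make the loop condition strict (`> seqlen`), which skips documents of exactly `seqlen` tokens instead of using them.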

