Open
Description
Dataset
https://huggingface.co/datasets/EleutherAI/the_pile_deduplicated/tree/main/data
Problem
The dataset above has 1659 files, but the downloader fetches only 1000 of them. This is not a bug in the downloader itself; it comes from how the file list is requested. A single HTTP GET on the URL above returns only 1000 entries, and the last entry in the response is file 00999:
curl https://huggingface.co/datasets/EleutherAI/the_pile_deduplicated/tree/main/data -o a.out
cat a.out
...
"2b6a58077011c0cdaf57675ab5d3f3cc64f1b36b","size":285632877,"lfs":{"oid":"d2115061684c0cd7b286c04f6d1a644490bbe8a91d7822480b9f8edbfd659c7e","size":285632877,"pointerSize":134},"path":"data/train-00999-of-01650-c966fff517a32923.parquet"}]
...
I have been investigating, but I have not found a clear solution yet. Can you provide any information on this?
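One direction that may be worth checking is whether the listing is paginated. The sketch below assumes, without having confirmed it for this dataset, that the Hub exposes a JSON tree endpoint under `/api/datasets/...` and that it advertises further pages through an HTTP `Link` header with `rel="next"`. The names `TREE_API_URL`, `parse_next_link`, `fetch_page` and `list_all_entries` are hypothetical helpers for illustration, not part of the downloader.

```python
import json
import re
import urllib.request

# Assumption: JSON tree endpoint for the dataset in this issue (not the HTML page).
TREE_API_URL = (
    "https://huggingface.co/api/datasets/"
    "EleutherAI/the_pile_deduplicated/tree/main/data"
)


def parse_next_link(link_header):
    """Return the URL marked rel="next" in an HTTP Link header, or None."""
    if not link_header:
        return None
    for url, rel in re.findall(r'<([^>]+)>\s*;\s*rel="?([\w-]+)"?', link_header):
        if rel == "next":
            return url
    return None


def fetch_page(url):
    """Fetch one page of the listing and return (entries, next_url)."""
    with urllib.request.urlopen(url) as response:
        entries = json.load(response)
        # Assumption: the server signals more results via the Link header.
        next_url = parse_next_link(response.headers.get("Link"))
    return entries, next_url


def list_all_entries(first_url, fetch=fetch_page):
    """Follow next-page links until the listing is exhausted."""
    entries = []
    url = first_url
    while url:
        page, url = fetch(url)
        entries.extend(page)
    return entries
```

Calling `list_all_entries(TREE_API_URL)` and checking `len(...)` against the expected 1659 files would show whether pagination explains the missing files. The `fetch` parameter exists only so the loop can be exercised without network access.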