Putting this here as a note because I think it is funny.
I was loading all of the datasets and noticed that it was taking an unreasonably long time.
When I checked their sizes, it turned out the Ruby dataset is 7.1 GB. In comparison, the Python dataset isn't even a quarter of a GB, and the assembly dataset is 800 KB.
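
For anyone who wants to run the same check, here is a rough sketch of how the sizes could be compared before loading anything. It is not the exact method I used. The helper names (`path_size_bytes`, `human_size`, `report_sizes`) and the `data/...` paths are made up for illustration, and it assumes each dataset is a file or a directory on local disk.

```python
import os


def path_size_bytes(path):
    """Return the size in bytes of a file, or of all regular files under a directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so linked files are not counted twice.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def human_size(num_bytes):
    """Format a byte count with decimal units (1 GB = 1000**3 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1000:
            return f"{size:.1f} {unit}"
        size /= 1000
    return f"{size:.1f} TB"


def report_sizes(paths):
    """Print existing paths from largest to smallest and return their sizes."""
    sizes = {p: path_size_bytes(p) for p in paths if os.path.exists(p)}
    for p, s in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{p}: {human_size(s)}")
    return sizes


if __name__ == "__main__":
    # Hypothetical dataset locations; replace with the real ones.
    report_sizes(["data/ruby", "data/python", "data/assembly"])
```

Sorting largest first makes an outlier like the Ruby dataset show up on the first line of output.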