This repository provides a structured approach for loading, processing, and saving datasets in a binary format using Python. It is designed to work with popular datasets (such as MNIST) stored in binary formats and allows for easy serialization with pickle. The code processes images and labels into structured data, which can be loaded into memory as needed.
This repo is mainly for internal use, but it also provides permanent links to popular datasets that have already been preprocessed and saved as pickle files.
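
The processing step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code. It assumes the raw inputs are uncompressed MNIST-style IDX buffers and that each image is stored as a flat list of pixel values. The function names (`parse_idx_images`, `parse_idx_labels`, `build_dataset`, `save_dataset`) are hypothetical.

```python
import pickle
import struct


def parse_idx_images(raw: bytes) -> list[list[int]]:
    """Parse an IDX image buffer into one flat list of pixel values per image."""
    # Header: four big-endian unsigned 32-bit ints (magic, count, rows, cols).
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:
        raise ValueError(f"unexpected image magic number: {magic}")
    size = rows * cols
    return [list(raw[16 + i * size : 16 + (i + 1) * size]) for i in range(count)]


def parse_idx_labels(raw: bytes) -> list[int]:
    """Parse an IDX label buffer into a list of integer labels."""
    # Header: two big-endian unsigned 32-bit ints (magic, count).
    magic, count = struct.unpack(">II", raw[:8])
    if magic != 2049:
        raise ValueError(f"unexpected label magic number: {magic}")
    return list(raw[8 : 8 + count])


def build_dataset(train_images: bytes, train_labels: bytes,
                  test_images: bytes, test_labels: bytes) -> dict:
    """Assemble the four-key dictionary that the pickle files contain."""
    return {
        "X_train": parse_idx_images(train_images),
        "X_test": parse_idx_images(test_images),
        "y_train": parse_idx_labels(train_labels),
        "y_test": parse_idx_labels(test_labels),
    }


def save_dataset(dataset: dict, path: str) -> None:
    """Serialize the dataset dictionary to disk with pickle."""
    with open(path, "wb") as f:
        pickle.dump(dataset, f)
```

Keeping parsing and saving in separate functions means the same dictionary layout can be reused for other datasets by swapping out only the parsers.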
Each data file is named `ncxlib.<dataset>.data` (for example, `ncxlib.mnist.data`) and is stored inside the data// folder. Once loaded, every pickle file contains data in the following structure:

    {
        "X_train": list[],
        "X_test": list[],
        "y_train": list[],
        "y_test": list[],
    }

You can download a dataset directly using curl:

    curl -o ncxlib.mnist.data <perma-link>

| Dataset | Description | Permanent Link |
|---|---|---|
| MNIST | Handwritten digit images with labels, derived from NIST data. | Link |
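
Once a file has been downloaded, it can be loaded with the standard `pickle` module. The sketch below assumes the file sits in the current directory under the name used in the curl command. `load_dataset` and `EXPECTED_KEYS` are hypothetical names introduced for illustration and are not part of this repository.

```python
import pickle
from pathlib import Path

# The four keys every data file is documented to contain.
EXPECTED_KEYS = ("X_train", "X_test", "y_train", "y_test")


def load_dataset(path) -> dict:
    """Load a pickled dataset file and check that it has the documented keys."""
    with Path(path).open("rb") as f:
        data = pickle.load(f)
    missing = [key for key in EXPECTED_KEYS if key not in data]
    if missing:
        raise KeyError(f"dataset file is missing keys: {missing}")
    return data


if __name__ == "__main__":
    # Assumes ncxlib.mnist.data was downloaded beforehand.
    data = load_dataset("ncxlib.mnist.data")
    print(len(data["X_train"]), len(data["y_train"]))
```

Because `pickle.load` can execute arbitrary code during deserialization, only load data files obtained from a source you trust.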