
ncxlib/dataset-parser


ncxlib Dataset storage and loader

This repository provides a structured approach for loading, processing, and saving datasets in a binary format using Python. It is designed to work with popular datasets (such as MNIST) stored in binary formats and allows for easy serialization with pickle. The code processes images and labels into structured data, which can be loaded into memory as needed.

This repo is mainly for internal use, but it also provides permanent links to popular datasets that have already been preprocessed and saved as pickle files.

Storage Format

Each data file is named ncxlib..data (for example, ncxlib.mnist.data for MNIST) and lives inside the data// folder. Once loaded, every pickle file yields a dictionary with the following structure:

    {
        "X_train": list[],
        "X_test": list[],
        "y_train": list[],
        "y_test": list[],
    }
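As a minimal sketch of this layout, the snippet below writes a tiny dictionary with the four keys to a pickle file and reads it back. The helper names `save_dataset` and `load_dataset`, the file name `ncxlib.example.data`, and the toy values are illustrative assumptions and are not part of this repository.

```python
import os
import pickle
import tempfile

# Keys every data file is expected to contain, per the structure above.
EXPECTED_KEYS = ("X_train", "X_test", "y_train", "y_test")


def save_dataset(path, dataset):
    """Serialize a dataset dictionary to `path` with pickle (hypothetical helper)."""
    missing = [key for key in EXPECTED_KEYS if key not in dataset]
    if missing:
        raise KeyError(f"dataset is missing keys: {missing}")
    with open(path, "wb") as f:
        pickle.dump(dataset, f)


def load_dataset(path):
    """Load a dataset dictionary from `path` and check its keys (hypothetical helper)."""
    with open(path, "rb") as f:
        dataset = pickle.load(f)
    missing = [key for key in EXPECTED_KEYS if key not in dataset]
    if missing:
        raise KeyError(f"{path} is missing keys: {missing}")
    return dataset


# Toy values standing in for real images and labels (assumed shapes).
toy = {
    "X_train": [[0, 255], [128, 64]],
    "X_test": [[32, 16]],
    "y_train": [3, 7],
    "y_test": [1],
}

path = os.path.join(tempfile.mkdtemp(), "ncxlib.example.data")
save_dataset(path, toy)
restored = load_dataset(path)
print(sorted(restored))
```

Checking the keys on both save and load is a design choice of this sketch: it fails early if a file does not follow the structure, instead of failing later during training.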

Getting started

You can download a dataset directly with curl:

    curl -o ncxlib.mnist.data <perma-link>
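Once the file is downloaded, it can be opened with Python's standard `pickle` module. The sketch below assumes the file was saved as `ncxlib.mnist.data` in the current directory, as in the curl command; the `summarize` helper is hypothetical. Note that `pickle.load` can execute arbitrary code, so only load files from sources you trust.

```python
import os
import pickle


def summarize(path):
    """Load a pickled dataset and return the number of items per split (hypothetical helper)."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    # Works for lists and for any other sized container.
    return {key: len(value) for key, value in data.items()}


# Assumed location of the file written by the curl command.
DATA_PATH = "ncxlib.mnist.data"

if os.path.exists(DATA_PATH):
    print(summarize(DATA_PATH))
else:
    print(f"{DATA_PATH} not found; download it first")
```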

Datasets

| Dataset | Description | Permanent Link |
|---------|-------------|----------------|
| MNIST | A dataset of handwritten digit images and labels, derived from NIST data. | Link |

About

A storage/parser to load new public datasets into pickle files
