dataserious is a Python package that enhances dataclasses with type validation, serialization, and search space generation. It builds on top of the standard dataclasses module to provide additional functionality for configuration management, it only has a single dependency on pyyaml for YAML support.
- Type Validation: Ensures that the attributes of the dataclass instances match their type annotations.
- Serialization: Supports serialization to and from JSON and YAML formats.
- Search Space Generation: Generates search trees for hyperparameter tuning in grid and random search.
You can install dataserious with YAML support using pip.
pip install "dataserious[yaml] @ git+https://github.com/Noza23/dataserious.git"To define a configuration class, inherit from BaseConfig and use ConfigField for fields that require additional metadata.
from dataserious import BaseConfig, ConfigField
class ModelConfig(BaseConfig):
name: str
"""Name of the model."""
n_layers: int = ConfigField(searchable=True, description="Number of layers in the model.")
n_heads: int = ConfigField(searchable=True)
"""Number of heads in the model."""ModelConfig.to_schema()
ModelConfig.schema_to_yaml("schema.yaml")name: 'str: Name of the model.'
n_layers: 'int: Number of layers in the model.'
n_heads: 'int: Number of heads in the model.'You can load and save configurations in JSON and YAML formats.
config = ModelConfig(name="GPT", n_layers=12, n_heads=12)
config.to_yaml("config.yaml")
loaded_config = ModelConfig.from_yaml("config.yaml")Generate search trees for hyperparameter tuning.
print(config.to_search_tree())
config.search_tree_to_yaml("search_tree.yaml")n_layers:
- int
n_heads:
- int-
Filled out search tree test.yaml:
n_layers: - 12 - 24 - 36 n_heads: - 12 - 24 - 36
-
Loading the search tree and generating search spaces:
configs = config.get_configs_grid_from_path("test.yaml") configs_random = config.get_configs_random_from_path("test.yaml", n=2, seed=42)
-
Resulting Configs:
[print(json.dumps(config.to_dict(), indent=4)) for config in configs] print("\nRandom configs:") [print(json.dumps(config.to_dict(), indent=4)) for config in configs_random]
Ensure that the configurations match the expected types.
try:
config = ModelConfig(name="GPT", n_layers="twelve", n_heads=12)
except TypeError as e:
print(e)Contributions are welcome!