fix(KDP): Preserve original dtype for PASSTHROUGH features #30
This pull request introduces a new `PreserveDtypeLayer` to handle passthrough features while preserving their original data types or casting them to a specified type. It also updates the preprocessing pipeline to use this layer and adds tests for it. Below is a summary of the most important changes:

Feature Addition and Integration:
- Added `PreserveDtypeLayer` in `kdp/layers/preserve_dtype.py` for preserving or casting input tensor data types. The layer supports serialization and deserialization for integration into Keras models.
- Integrated `PreserveDtypeLayer` into the layer factory by adding a `preserve_dtype_layer` method in `kdp/layers_factory.py`. This allows dynamic creation of the layer.
- Updated `kdp/processor.py` to use `PreserveDtypeLayer` for passthrough features, replacing the previous approach that cast all features to `float32`.

Testing Enhancements:
- Added unit tests for `PreserveDtypeLayer` in a new file, `test/layers/test_preserve_dtype_layer.py`, covering preservation of original data types, casting to target data types, batch processing, serialization, and integration into Keras models.
- Updated `test/layers/test_layer_factory.py` to include cases for `PreserveDtypeLayer`, ensuring its compatibility with the layer factory.
- Extended `test/test_processor.py` to validate the behavior of passthrough features with various data types (e.g., string, integer, float) and their preservation in the preprocessing pipeline.

Configuration and Test Suite Updates:
- Added a `micro` marker in `pytest.ini` for categorizing the fastest tests, including those for `PreserveDtypeLayer`.
- Updated test files (`test/layers/test_layer_factory.py` and `test/test_processor.py`) to include the `micro` marker and additional pytest markers for better test categorization.

With these changes, the preprocessing pipeline can pass features through with their original data types intact, or cast them to a specified type, instead of forcing every passthrough feature to `float32`.
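The preserve-or-cast behavior described above can be illustrated with a minimal Keras sketch. This is not the code from `kdp/layers/preserve_dtype.py`: the class name `PreserveDtypeSketch` and the `target_dtype` argument are hypothetical, chosen only to show how a layer might return inputs unchanged, optionally cast them, and round-trip through `get_config`/`from_config`. It assumes TensorFlow with `tf.keras` is installed.

```python
import tensorflow as tf


class PreserveDtypeSketch(tf.keras.layers.Layer):
    """Illustrative stand-in for a dtype-preserving passthrough layer.

    NOTE: hypothetical sketch, not the PR's PreserveDtypeLayer. The
    `target_dtype` argument name is an assumption made for illustration.
    """

    def __init__(self, target_dtype=None, **kwargs):
        # autocast=False stops Keras from casting floating-point inputs
        # to the layer's compute dtype before call() runs.
        super().__init__(autocast=False, **kwargs)
        # Store the dtype as a plain string so the config stays serializable.
        if target_dtype is None:
            self.target_dtype = None
        else:
            self.target_dtype = tf.as_dtype(target_dtype).name

    def call(self, inputs):
        if self.target_dtype is None:
            # Passthrough: keep whatever dtype the feature arrived with.
            return inputs
        # Explicit cast requested by the caller.
        return tf.cast(inputs, self.target_dtype)

    def get_config(self):
        # Include target_dtype so the layer can be rebuilt from its config.
        config = super().get_config()
        config.update({"target_dtype": self.target_dtype})
        return config


# Usage: integers and strings keep their dtype unless a cast is requested.
preserve = PreserveDtypeSketch()
ints = preserve(tf.constant([1, 2, 3], dtype=tf.int32))
strings = preserve(tf.constant(["a", "b"]))

to_float = PreserveDtypeSketch(target_dtype="float32")
floats = to_float(tf.constant([1, 2, 3], dtype=tf.int32))

# Round-trip through the config, as Keras does when saving and loading.
restored = PreserveDtypeSketch.from_config(to_float.get_config())
```

Storing the dtype as a string rather than a `tf.DType` object is one way to keep the config JSON-friendly; the actual layer may handle this differently.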