@piotrlaczkowski piotrlaczkowski commented Jul 30, 2025

This pull request introduces a new PreserveDtypeLayer to handle passthrough features while preserving their original data types or casting them to a specified type. It also updates the preprocessing pipeline to utilize this layer and adds extensive testing to ensure its functionality. Below is a summary of the most important changes:

Feature Addition and Integration:

  • Added a new PreserveDtypeLayer in kdp/layers/preserve_dtype.py for preserving or casting input tensor data types. The layer supports serialization and deserialization for integration into Keras models.
  • Integrated PreserveDtypeLayer into the layer factory by adding a preserve_dtype_layer method in kdp/layers_factory.py. This allows dynamic creation of the layer.
  • Updated the preprocessing pipeline in kdp/processor.py to use PreserveDtypeLayer for passthrough features, replacing the previous approach that cast all features to float32.
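
The preserve-or-cast behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in kdp/layers/preserve_dtype.py: the class name `PreserveDtypeSketch` and the `target_dtype` argument are assumptions made for the example. Passing `autocast=False` is also an assumption about how to stop Keras from casting floating-point inputs to the layer's compute dtype.

```python
import tensorflow as tf


class PreserveDtypeSketch(tf.keras.layers.Layer):
    """Hypothetical stand-in for a dtype-preserving passthrough layer."""

    def __init__(self, target_dtype=None, **kwargs):
        # Assumption: turn off Keras input autocasting so that, for example,
        # float64 inputs are not silently converted to float32.
        kwargs.setdefault("autocast", False)
        super().__init__(**kwargs)
        # Store the dtype by name so the config stays JSON-serializable.
        self.target_dtype = (
            None if target_dtype is None else tf.as_dtype(target_dtype).name
        )

    def call(self, inputs):
        # No target dtype: return the tensor untouched, keeping its dtype.
        if self.target_dtype is None:
            return inputs
        # Otherwise cast to the requested dtype.
        return tf.cast(inputs, self.target_dtype)

    def get_config(self):
        # Include the extra argument so from_config can rebuild the layer.
        config = super().get_config()
        config["target_dtype"] = self.target_dtype
        return config


ints = tf.constant([1, 2, 3], dtype=tf.int32)
kept = PreserveDtypeSketch()(ints)                          # dtype stays int32
casted = PreserveDtypeSketch(target_dtype="float32")(ints)  # dtype becomes float32
```

Storing the dtype name rather than a `tf.DType` object is one way to keep `get_config` and `from_config` symmetric; the actual layer may handle this differently.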

Testing Enhancements:

  • Added unit tests for PreserveDtypeLayer in a new file, test/layers/test_preserve_dtype_layer.py, covering scenarios like preserving original data types, casting to target data types, batch processing, serialization, and integration into Keras models.
  • Enhanced existing tests in test/layers/test_layer_factory.py to include cases for PreserveDtypeLayer, ensuring its compatibility with the layer factory.
  • Extended tests in test/test_processor.py to validate the behavior of passthrough features with various data types (e.g., string, integer, float) and their preservation in the preprocessing pipeline.

Configuration and Test Suite Updates:

  • Added a new micro marker in pytest.ini for categorizing the fastest tests, including those for PreserveDtypeLayer.
  • Updated test files (test/layers/test_layer_factory.py and test/test_processor.py) to include the micro marker and additional pytest markers for better test categorization.
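
As a rough illustration of how such a marker is typically applied, the sketch below tags a test with `micro`. The test function name and the marker description in the comment are assumptions for the example, not the contents of this repository's pytest.ini or test files.

```python
import pytest

# Assumed shape of the marker registration in pytest.ini:
#
#   [pytest]
#   markers =
#       micro: fastest tests
#
# Registering the marker avoids pytest's unknown-marker warning.


@pytest.mark.micro
def test_passthrough_values_unchanged():
    # Hypothetical micro test: trivial and fast by design.
    values = [1, 2, 3]
    assert list(values) == values
```

Tests tagged this way can be selected with `pytest -m micro`, which runs only the marked subset.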

Together, these changes let passthrough features keep their original data type, or be cast to an explicitly specified one, instead of always being cast to float32.

Co-authored-by: piotr.laczkowski <piotr.laczkowski@gmail.com>
@piotrlaczkowski piotrlaczkowski marked this pull request as ready for review July 30, 2025 13:05
@piotrlaczkowski piotrlaczkowski changed the title from "Preserve original dtype for PASSTHROUGH features" to "fix(KDP): Preserve original dtype for PASSTHROUGH features" Jul 30, 2025
@piotrlaczkowski piotrlaczkowski merged commit 82b6d7e into main Jul 30, 2025
15 checks passed
github-actions bot pushed a commit that referenced this pull request Jul 30, 2025
## <small>1.11.1 (2025-07-30)</small>

* fix(KDP): fixing tests ([6326dbf](6326dbf))
* fix(KDP): formatting issues fixes ([6c60aed](6c60aed))
* fix(KDP): increasing package version ([ce0dbf3](ce0dbf3))
* fix(KDP): Preserve original dtype for PASSTHROUGH features (#30) ([82b6d7e](82b6d7e)), closes [#30](#30)
* fix(KDP): update upload-artifact action to v4 in GitHub workflow ([68ee7c5](68ee7c5))
* Add preserve dtype layer and update passthrough feature handling ([a700bad](a700bad))
* Add pytest markers and improve test categorization for GitHub workflow ([06b5112](06b5112))
* Checkpoint before follow-up message ([47ec0ef](47ec0ef))
* chore: save last release version for recovery [skip ci] ([84e0b1f](84e0b1f))
* refactor(KDP): improving tests execution ([b9d237e](b9d237e))

3 participants