feat(dataset-api): extend upload_dataset and add upload_dataset_parts #64
Conversation
Pull request overview
This pull request extends the Dataset API functionality by adding support for tags and additional metadata to dataset uploads, and introduces a new upload_dataset_parts method for adding parts to existing datasets. The PR also includes comprehensive code formatting improvements, primarily reorganizing imports to follow PEP 8 standards (stdlib, third-party, local) and removing trailing whitespace.
Key Changes
- Extended `upload_dataset` to accept optional `tags` and `additional_data` parameters
- Added a new `upload_dataset_parts` method to upload parts to existing datasets, with optional replacement of existing parts (see the usage sketch below)
- Reorganized imports across the codebase to follow PEP 8 conventions (stdlib → third-party → local)
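For orientation, a minimal usage sketch of the extended surface is shown below. The method names come from this PR, but the wrapper object, the `get_dataset_api` helper, and parameter names such as `dataset_files`, `part_files`, and `replace_existing` are illustrative assumptions rather than the confirmed signatures.

```python
# Hedged sketch only: the wrapper object, helper, and parameter names below
# are assumptions for illustration, not the confirmed coal API.
from pathlib import Path

dataset_api = get_dataset_api()  # hypothetical helper returning the API wrapper

# upload_dataset now takes optional tags and additional metadata (per this PR).
dataset_id = dataset_api.upload_dataset(
    dataset_name="customers",
    dataset_files=[Path("data/customers.csv")],
    tags=["demo"],                      # new optional parameter
    additional_data={"source": "etl"},  # new optional parameter
)

# upload_dataset_parts adds parts to an existing dataset, optionally
# replacing parts that already exist (per this PR).
dataset_api.upload_dataset_parts(
    dataset_id=dataset_id,
    part_files=[Path("data/customers_extra.csv")],
    replace_existing=True,  # assumed name for the replacement option
)
```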
Reviewed changes
Copilot reviewed 65 out of 72 changed files in this pull request and generated 17 comments.
| File | Description |
|---|---|
| cosmotech/coal/cosmotech_api/apis/dataset.py | Extended upload_dataset with tags/additional_data parameters; added upload_dataset_parts method |
| tests/unit/coal/test_cosmotech_api/test_apis/test_dataset.py | Added comprehensive test coverage for new dataset API features |
| cosmotech/translation/coal/en-US/coal/services/dataset.yml | Added translation strings for new dataset parts operations |
| cosmotech/coal/store/__init__.py | Reorganized imports (with duplicate import issue) |
| Multiple tutorial files | Reorganized imports to follow PEP 8 conventions |
| Multiple test files | Reorganized imports, removed unused imports and variables |
| cosmotech/coal/azure/adx/ingestion.py | Consolidated typing imports into single line |
| cosmotech/coal/azure/adx/__init__.py | Reorganized exports alphabetically |
| generate_test_files.py | Removed redundant pass statement after comment |
| Documentation files | Removed trailing whitespace |
| Configuration files | Minor formatting fixes (trailing whitespace, newlines) |
```python
self.create_dataset_part(
    organization_id=self.configuration.cosmotech.organization_id,
    workspace_id=self.configuration.cosmotech.workspace_id,
    dataset_id=dataset_id,
    dataset_part_create_request=part_request,
    file=(_p_name, _p_path.open("rb").read()),
)
```
The file handle opened by `_p_path.open("rb")` is not properly closed. This can lead to resource leaks, especially when uploading multiple files. Consider using a context manager to ensure proper file closure.
Suggested change:

```diff
-self.create_dataset_part(
-    organization_id=self.configuration.cosmotech.organization_id,
-    workspace_id=self.configuration.cosmotech.workspace_id,
-    dataset_id=dataset_id,
-    dataset_part_create_request=part_request,
-    file=(_p_name, _p_path.open("rb").read()),
-)
+with _p_path.open("rb") as _f:
+    self.create_dataset_part(
+        organization_id=self.configuration.cosmotech.organization_id,
+        workspace_id=self.configuration.cosmotech.workspace_id,
+        dataset_id=dataset_id,
+        dataset_part_create_request=part_request,
+        file=(_p_name, _f.read()),
+    )
```
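Note that the suggested change still passes the file content as bytes (`_f.read()`) rather than the open handle, so the request payload is fully materialized inside the `with` block and the handle is guaranteed to be closed once `create_dataset_part` returns, even if the call raises.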
```python
from cosmotech.coal.store.pandas import (
    store_dataframe,
    convert_store_table_to_dataframe as convert_store_table_to_pandas_dataframe,
)

# Re-export functions from the pyarrow module (if available)

from cosmotech.coal.store.pandas import (
    store_dataframe,
)
from cosmotech.coal.store.pyarrow import (
    store_table,
    convert_store_table_to_dataframe as convert_store_table_to_pyarrow_table,
)
from cosmotech.coal.store.pyarrow import (
    store_table,
)

# Re-export the Store class
from cosmotech.coal.store.store import Store

# Re-export functions from the pandas module (if available)


# Re-export functions from the pyarrow module (if available)
```
The imports from pandas and pyarrow modules are duplicated. Lines 26-31 import from pandas, and lines 32-37 import from pyarrow, but then lines 42-45 have redundant comments suggesting these imports should go there. The duplicate import blocks (lines 26-31 and 29-31 for pandas; lines 32-34 and 35-37 for pyarrow) should be consolidated into single import blocks.
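For comparison, a consolidated layout might look like the sketch below. The import targets and aliases are taken from the quoted snippet; the `try`/`except ImportError` guards around the optional pandas and pyarrow dependencies are an assumption suggested by the "(if available)" comments, not something visible in the diff.

```python
# Re-export the Store class
from cosmotech.coal.store.store import Store

# Re-export functions from the pandas module (if available)
try:
    from cosmotech.coal.store.pandas import (
        store_dataframe,
        convert_store_table_to_dataframe as convert_store_table_to_pandas_dataframe,
    )
except ImportError:  # assumed guard: pandas is an optional dependency
    pass

# Re-export functions from the pyarrow module (if available)
try:
    from cosmotech.coal.store.pyarrow import (
        store_table,
        convert_store_table_to_dataframe as convert_store_table_to_pyarrow_table,
    )
except ImportError:  # assumed guard: pyarrow is an optional dependency
    pass
```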
```python
import pandas as pd
import pyarrow as pa
```
Import of 'pa' is not used.
```python
import pandas as pd
import pyarrow as pa
import pytest
```
Import of 'pytest' is not used.
```python
from unittest.mock import MagicMock, patch

import pyarrow as pa
import pytest
```
Import of 'pytest' is not used.
```python
from unittest.mock import MagicMock

import pyarrow as pa
import pytest
```
Import of 'pytest' is not used.
Force-pushed from f07a0e7 to a7b6126.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
No description provided.