-
Notifications
You must be signed in to change notification settings - Fork 0
Add missing Many-to-Many relations for Datasets and assets #226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Changed Files
|
|
Something to consider: How much data will need to be migrated and is this better to do as a manual command run or django migration. Testing the command for performance will probably be easier. This is also not something that could be reversed/rolled back really, so a migration might be finnicky. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds many-to-many relationships between Files/Captures and Datasets, replacing the previous foreign key relationships that only allowed a single dataset association. The old foreign key fields are renamed to *_deprecated while new ManyToManyFields are introduced, enabling assets to belong to multiple datasets.
Key Changes
- Added
datasetsandcapturesManyToManyFields to File and Capture models - Updated access control logic to check all associated datasets/captures instead of just one
- Modified capture ingestion to use ManyToMany
.add()and.remove()operations
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| gateway/sds_gateway/api_methods/models.py | Adds new ManyToMany fields (datasets, captures) and renames existing ForeignKey fields to *_deprecated |
| gateway/sds_gateway/api_methods/migrations/0018_capture_datasets_file_captures_file_datasets_and_more.py | Django migration that creates the new ManyToMany fields and updates related_name on deprecated fields |
| gateway/sds_gateway/api_methods/migrations/max_migration.txt | Updates migration tracker to reference the new migration |
| gateway/sds_gateway/api_methods/utils/asset_access_control.py | Updates access control functions to iterate through all associated datasets/captures using the new ManyToMany fields |
| gateway/sds_gateway/api_methods/views/capture_endpoints.py | Updates file connection/disconnection logic to use ManyToMany .add() and .remove() methods |
| gateway/sds_gateway/users/mixins.py | Changes file type detection to use .exists() check on the new captures ManyToMany field |
| gateway/sds_gateway/static/js/components.js | Adds header comment explaining deprecation and TODO for refactoring to handle multiple associations |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
are you planning to migrate data (fill many-to-many based on the current foreign key fields) in a future PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's leave this out of the PR until it's time to rebase, so we preserve linear migrations
| # Check if file is part of a capture that is shared with the user | ||
| if file.capture: | ||
| if user_has_access_to_capture(user, file.capture): | ||
| return True | ||
| if file.captures.exists(): | ||
| for capture in file.captures.all(): | ||
| if user_has_access_to_capture(user, capture): | ||
| return True | ||
|
|
||
| # Check if file is part of a dataset that is shared with them | ||
| if file.dataset: | ||
| if user_has_access_to_item(user, file.dataset.uuid, ItemType.DATASET): | ||
| return True | ||
| if file.datasets.exists(): | ||
| for dataset in file.datasets.all(): | ||
| if user_has_access_to_item(user, dataset.uuid, ItemType.DATASET): | ||
| return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this assumes the fields are already populated, meaning we need the data migration before merging this, or we break functionality.
The same applies anywhere in this PR where we're accessing the plural names alone (.captures or .datasets)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I separate the model changes and migrations into a separate PR and merge that first and update THEN merge this? Or have some check for which field is populated (which would probably be removed/redundant later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
think of it as the expand and contract pattern (parallel change)
https://martinfowler.com/bliki/ParallelChange.html
- Expand: introduce the many-to-many (M2M) relationships alongside the old, writing to both
- Migrate: update consumers to use the new M2M
- Contract: remove the old foreign key relationships after they're confirmed to be unused
then the data migration needs to happen before or along with step 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, yes, I've done this before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this could be 2 PRs one with the migration steps and tracking for foreign key and many2many (additive changes). then a PR to remove the foreign key relations (subtractive changes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, or even 3, up to you.
Also, there's no rush to ship the contract step: we can leave it in a draft PR for a few weeks. In the future when other sites deploy SDS we'll want to leave a larger window like that to make sure all production sites migrate without problems before removing their old relationships.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise_if_file_deletion_is_blocked needs to be updated to raise for M2M fields too (data integrity)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as above comment.
|
Just adding a note: https://crc-dev.atlassian.net/browse/SFDS-286 Quick add capture is blocked by this PR. |
94cfd43 to
08cc392
Compare
[Issue] The captures and files have foreign key relationships to datasets, when they should be many-to-many. Meaning when assets already on a dataset are added to another, the relation is replaced.
[Solution] Add the many-to-many relations and update references to use them.
[Future TODO] Need to migrate existing relations to many-to-many field (suggest edits here) and clean up the old fields in another DB migration.