Skip to content

Conversation

@klpoland
Copy link
Collaborator

@klpoland klpoland commented Dec 4, 2025

[Issue] The captures and files have foreign key relationships to datasets, when they should be many-to-many. Meaning when assets already on a dataset are added to another, the relation is replaced.

[Solution] Add the many-to-many relations and update references to use them.

[Future TODO] Need to migrate existing relations to many-to-many field (suggest edits here) and clean up the old fields in another DB migration.

@klpoland klpoland requested a review from lucaspar December 4, 2025 21:01
@klpoland klpoland self-assigned this Dec 4, 2025
@klpoland klpoland added bug Something isn't working gateway Gateway component labels Dec 4, 2025
@semanticdiff-com
Copy link

semanticdiff-com bot commented Dec 4, 2025

@klpoland
Copy link
Collaborator Author

klpoland commented Dec 5, 2025

Something to consider:

How much data will need to be migrated and is this better to do as a manual command run or django migration. Testing the command for performance will probably be easier. This is also not something that could be reversed/rolled back really, so a migration might be finnicky.

@klpoland klpoland added the migrations Code changes that require data or schema migrations in the database. label Dec 5, 2025
@lucaspar lucaspar requested review from Copilot and srucker01 December 5, 2025 23:53
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds many-to-many relationships between Files/Captures and Datasets, replacing the previous foreign key relationships that only allowed a single dataset association. The old foreign key fields are renamed to *_deprecated while new ManyToManyFields are introduced, enabling assets to belong to multiple datasets.

Key Changes

  • Added datasets and captures ManyToManyFields to File and Capture models
  • Updated access control logic to check all associated datasets/captures instead of just one
  • Modified capture ingestion to use ManyToMany .add() and .remove() operations

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
gateway/sds_gateway/api_methods/models.py Adds new ManyToMany fields (datasets, captures) and renames existing ForeignKey fields to *_deprecated
gateway/sds_gateway/api_methods/migrations/0018_capture_datasets_file_captures_file_datasets_and_more.py Django migration that creates the new ManyToMany fields and updates related_name on deprecated fields
gateway/sds_gateway/api_methods/migrations/max_migration.txt Updates migration tracker to reference the new migration
gateway/sds_gateway/api_methods/utils/asset_access_control.py Updates access control functions to iterate through all associated datasets/captures using the new ManyToMany fields
gateway/sds_gateway/api_methods/views/capture_endpoints.py Updates file connection/disconnection logic to use ManyToMany .add() and .remove() methods
gateway/sds_gateway/users/mixins.py Changes file type detection to use .exists() check on the new captures ManyToMany field
gateway/sds_gateway/static/js/components.js Adds header comment explaining deprecation and TODO for refactoring to handle multiple associations

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@lucaspar
Copy link
Member

are you planning to migrate data (fill many-to-many based on the current foreign key fields) in a future PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's leave this out of the PR until it's time to rebase, so we preserve linear migrations

Comment on lines 66 to 76
# Check if file is part of a capture that is shared with the user
if file.capture:
if user_has_access_to_capture(user, file.capture):
return True
if file.captures.exists():
for capture in file.captures.all():
if user_has_access_to_capture(user, capture):
return True

# Check if file is part of a dataset that is shared with them
if file.dataset:
if user_has_access_to_item(user, file.dataset.uuid, ItemType.DATASET):
return True
if file.datasets.exists():
for dataset in file.datasets.all():
if user_has_access_to_item(user, dataset.uuid, ItemType.DATASET):
return True
Copy link
Member

@lucaspar lucaspar Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this assumes the fields are already populated, meaning we need the data migration before merging this, or we break functionality.

The same applies anywhere in this PR where we're accessing the plural names alone (.captures or .datasets)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I separate the model changes and migrations into a separate PR and merge that first and update THEN merge this? Or have some check for which field is populated (which would probably be removed/redundant later.

Copy link
Member

@lucaspar lucaspar Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think of it as the expand and contract pattern (parallel change)

https://martinfowler.com/bliki/ParallelChange.html

  1. Expand: introduce the many-to-many (M2M) relationships alongside the old, writing to both
  2. Migrate: update consumers to use the new M2M
  3. Contract: remove the old foreign key relationships after they're confirmed to be unused

then the data migration needs to happen before or along with step 2

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, yes, I've done this before

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this could be 2 PRs one with the migration steps and tracking for foreign key and many2many (additive changes). then a PR to remove the foreign key relations (subtractive changes).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, or even 3, up to you.

Also, there's no rush to ship the contract step: we can leave it in a draft PR for a few weeks. In the future when other sites deploy SDS we'll want to leave a larger window like that to make sure all production sites migrate without problems before removing their old relationships.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raise_if_file_deletion_is_blocked needs to be updated to raise for M2M fields too (data integrity)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as above comment.

@LLKruczek
Copy link
Collaborator

Just adding a note: https://crc-dev.atlassian.net/browse/SFDS-286 Quick add capture is blocked by this PR.

@klpoland klpoland force-pushed the fix-kpoland-dataset-asset-relations branch from 94cfd43 to 08cc392 Compare December 18, 2025 20:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working gateway Gateway component migrations Code changes that require data or schema migrations in the database.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants