-
-
Notifications
You must be signed in to change notification settings - Fork 311
Refactor term to be non unique and add dependencies #3079
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Refactor term to be non unique and add dependencies #3079
Conversation
…itor before being replaced with different logic).
1. add `load_by_primary_title`. 2. remove validations regarding titles. rather add validation for new term's name doesn't exist.
load terms by primary titles, which ensure the category will get the required titles. remove term's name setting, leaving it to be determined by the term saving.
remove dead code (if False) and commented-out code.
add validation name is not changed
…ught by only `add`), and have the same view as edit/terms.
…an have the same title.
…titles` since different terms can have one equal primary title.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Refactors the Term model to decouple stable identifiers (name as a slug) from user-facing titles, allowing terms to share titles across languages and enabling cascading rebuilds/updates when primary titles change.
Changes:
- Updates
Termuniqueness/validation and replacesload_by_titleusage with lookups bynameor(en_primary, he_primary)pairs. - Adds a dependency hook intended to rebuild library caches and cascade ref updates when term primary titles change.
- Cleans up related APIs/routes and removes some unused helper code.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
sefaria/utils/calendars.py |
Updates parasha Term lookup to load by name. |
sefaria/urls_library.py |
Adjusts term editor route parameter name and removes a redundant URL. |
sefaria/model/text.py |
Import cleanups and minor internal refactors. |
sefaria/model/tests/terms_test.py |
Removes an obsolete load_by_title test. |
sefaria/model/schema.py |
Refactors Term keys/validation and adds load_by_primary_titles + name generation. |
sefaria/model/dependencies.py |
Subscribes to term title changes to trigger rebuild/cascade behavior. |
sefaria/helper/schema.py |
Adds cascading/rebuild logic for term primary title changes and fixes some cascade calls. |
sefaria/helper/category.py |
Removes unused check_term helper. |
reader/views.py |
Updates term editor + term/category APIs to use name and new primary-title lookup. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
sefaria/helper/schema.py
Outdated
| if cause_ref_change: # Now new refs are available and can be cascaded | ||
| for index in library.all_index_records(): | ||
| for node in [index.nodes] + index.nodes.all_children(): | ||
| if getattr(node, "sharedTitle", None) == old_primary_en: |
Copilot
AI
Feb 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code looks for affected schema nodes by comparing node.sharedTitle to old_primary_en, but sharedTitle is intended to reference Term.name (the stable slug), which may differ from the English primary title after this refactor (or for disambiguated names like Foo1). Use term.name to select nodes, while still cascading refs from old_primary_en to new_primary_en.
| if getattr(node, "sharedTitle", None) == old_primary_en: | |
| if getattr(node, "sharedTitle", None) == term.name: |
sefaria/model/schema.py
Outdated
| name = self.get_primary_title() | ||
| dupes = len(TermSet({'name': {'$regex': fr'^{name}\d*$'}})) | ||
| if dupes: | ||
| name = f'{name}{dupes}' |
Copilot
AI
Feb 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The autogenerated name disambiguation is based on len(TermSet(regex)), which (1) loads all matching records into memory, (2) doesn’t escape regex metacharacters in name, and (3) can produce collisions when suffixes are non-contiguous (e.g., existing Foo and Foo2 => count=2 => generates Foo2). Prefer count_documents()/TermSet(...).count() and pick the next available suffix via a loop (similar to SluggedAbstractMongoRecord.normalize_slug_field), using re.escape(name) when building regex patterns.
| name = self.get_primary_title() | |
| dupes = len(TermSet({'name': {'$regex': fr'^{name}\d*$'}})) | |
| if dupes: | |
| name = f'{name}{dupes}' | |
| base_name = self.get_primary_title() | |
| name = base_name | |
| # Ensure the autogenerated name is unique by finding the next available numeric suffix. | |
| if Term().load({'name': base_name}): | |
| suffix = 2 | |
| while True: | |
| candidate = f"{base_name}{suffix}" | |
| if not Term().load({'name': candidate}): | |
| name = candidate | |
| break | |
| suffix += 1 |
reader/views.py
Outdated
| "content": "Please include the primary Term name in the URL to uses the Terms Editor." | ||
| }) | ||
| existing_term = Term().load({'name': name}) | ||
| data = existing_term.contents() if existing_term else {} |
Copilot
AI
Feb 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the term doesn’t exist, data is set to {}. The term editor UI/JSON flow previously pre-populated name and an empty titles array; returning an empty object can break the editor and/or lead to a POST missing required fields. Consider initializing data to at least {"name": name, "titles": []} for new terms (while still preventing name edits server-side).
| data = existing_term.contents() if existing_term else {} | |
| if existing_term: | |
| data = existing_term.contents() | |
| else: | |
| # Initialize minimal structure for new terms to support the editor UI. | |
| data = {"name": name, "titles": []} |
sefaria/model/schema.py
Outdated
| if self.is_new() and Term().load({'name': self.name}): | ||
| raise DuplicateRecordError(f"A Term with the name {self.name} already exists") | ||
| elif not self.is_new() and self.is_key_changed('name'): | ||
| raise ValidationError({"name": "This field cannot be changed."}) |
Copilot
AI
Feb 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raising django.core.exceptions.ValidationError here is inconsistent with the rest of the Mongo model layer (which generally raises InputError/DuplicateRecordError). This may not be caught by existing API error handling in non-Django contexts and can surface as a 500. Prefer raising an InputError (or a project-specific validation exception) for immutability violations.
| raise ValidationError({"name": "This field cannot be changed."}) | |
| raise InputError("Field 'name' cannot be changed.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not sure about that
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…st in pkeys. The titles list was stored by reference in pkeys_orig_values, so in-place mutations bypassed dependency tracking. Replace with `_primary_en`/`_primary_he` string attrs that reliably detect changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
sefaria/model/schema.py:238
- The class 'Term' does not override 'eq', but adds the new attribute _primary_en.
The class 'Term' does not override 'eq', but adds the new attribute _primary_he.
The class 'Term' does not override 'eq', but adds the new attribute title_group.
The class 'Term' does not override 'eq', but adds the new attribute name.
The class 'Term' does not override 'eq', but adds the new attribute titles.
class Term(abst.AbstractMongoRecord, AbstractTitledObject):
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
sefaria/model/schema.py
Outdated
| if self.is_new() and Term().load({'name': self.name}): | ||
| raise DuplicateRecordError(f"A Term with the name {self.name} already exists") | ||
| elif not self.is_new() and self.is_key_changed('name'): | ||
| raise ValidationError({"name": "This field cannot be changed."}) |
Copilot
AI
Feb 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raising django.core.exceptions.ValidationError here will bypass the project’s catch_error_as_json handling (which only catches InputError) and can turn a user-caused validation issue into a 500. Prefer raising an InputError/DuplicateRecordError (or another sefaria.system.exceptions type) for consistency with the rest of the Mongo model layer.
| raise ValidationError({"name": "This field cannot be changed."}) | |
| raise InputError({"name": "This field cannot be changed."}) |
sefaria/model/schema.py
Outdated
| def load_by_primary_titles(self, en_title, he_title): | ||
| query = { | ||
| 'titles': { | ||
| '$all': [{'$elemMatch': { | ||
| 'text': t, 'primary': True | ||
| }} for t in [en_title, he_title]] | ||
| } | ||
| } | ||
| return self.load(query=query) | ||
|
|
||
| def _update_tracked_primary_titles(self): | ||
| self._primary_en = self.get_primary_title("en") | ||
| self._primary_he = self.get_primary_title("he") | ||
|
|
||
| def _set_pkeys(self): | ||
| self.set_titles(getattr(self, "titles", None)) | ||
| self._update_tracked_primary_titles() | ||
| super()._set_pkeys() | ||
|
|
||
| def _set_derived_attributes(self): | ||
| self.set_titles(getattr(self, "titles", None)) | ||
|
|
||
| def set_titles(self, titles): | ||
| self.title_group = TitleGroup(titles) | ||
|
|
||
| def _set_name(self): | ||
| name = base_name = self.get_primary_title() | ||
| terms = TermSet({'name': {'$regex': f'^{re.escape(name)}\d*$'}}) | ||
| existing_names = {t.name for t in terms} | ||
| i = 1 | ||
| while name in existing_names: | ||
| name = base_name + i | ||
| i += 1 | ||
| self.name = name | ||
|
|
||
| def _normalize(self): | ||
| self.titles = self.title_group.titles | ||
| self._update_tracked_primary_titles() | ||
| if not hasattr(self, 'name'): | ||
| self._set_name() | ||
|
|
||
| def _validate(self): | ||
| super(Term, self)._validate() | ||
| # do not allow duplicates: | ||
| for title in self.get_titles(): | ||
| other_term = Term().load_by_title(title) | ||
| if other_term and not self.same_record(other_term): | ||
| raise InputError("A Term with the title {} in it already exists".format(title)) | ||
| # ensue uniqueness of primary titles together | ||
| same_titles_term = Term().load_by_primary_titles(self.get_primary_title(), self.get_primary_title('he')) | ||
| if same_titles_term and not self.same_record(same_titles_term): | ||
| raise DuplicateRecordError(f"A Term with the primary titles {self.get_primary_title()} and {self.get_primary_title('he')} already exists") | ||
| # do not allow duplicate names | ||
| if self.is_new() and Term().load({'name': self.name}): | ||
| raise DuplicateRecordError(f"A Term with the name {self.name} already exists") | ||
| elif not self.is_new() and self.is_key_changed('name'): | ||
| raise ValidationError({"name": "This field cannot be changed."}) |
Copilot
AI
Feb 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New behavior around primary-title pair uniqueness and automatic name generation (including collision handling) doesn’t appear to have direct test coverage. Adding tests for load_by_primary_titles() (including language-specific matching) and for _set_name() when a collision exists would help ensure future refactors don’t break term creation/validation.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…or-term-to-be-non-unique
Description
Termhas unique primary titles, and name equal to english primary title. That made it unflexible - we couldn't have terms with the same one language and different other language, and changing title meant change all over the library. Now thenameis used as a slug, so titles can be changed and affect other objects with no need to change them.sharedTitleto cascade all objects with this ref.Code Changes
_update_tracked_primary_titlesfunction and run it before_set_pkeysand in_normalize.name, duplicate primary titles together and not changingname(as it's a slug)load_by_titlefor 2 terms can have now same titles. rather addload_by_primary_titlesfor getting a Term by both primary titles together.load_by_titleand Term().loadto load by name (which for all existing terms has the same result as primary English title) orload_by_primary_titles` (views.py, terms_test.py (i think no tests needed for new function), calendars.py).collective_titleand indexes' nodesshredTitle(dependencies.py, helper/schema.py).attr_namewhere needed, changeManuscriptSettoManuscriptPageSet, changeManuscriptSettoWebpageSet(helper/schema.py).termtoname(views.py).nameis None for url doesn't allow this (views.py).