Refactor term to be non unique and add dependencies #3079

YishaiGlasner · 2026-02-08T12:07:33Z

Description

Until today Term has unique primary titles, and name equal to english primary title. That made it unflexible - we couldn't have terms with the same one language and different other language, and changing title meant change all over the library. Now the name is used as a slug, so titles can be changed and affect other objects with no need to change them.
That makes adding dependinncies for Term easier (though not easy) - we have to rebuild library (caching of terms, categories and indexes), and then, if the Term was used as a sharedTitle to cascade all objects with this ref.

Code Changes

Term (model/shema/py):

add titles to pkeys as internal attributes (not saved in db). add _update_tracked_primary_titles function and run it before _set_pkeys and in _normalize.
change validation, for not having duplicate name, duplicate primary titles together and not changing name (as it's a slug)
generate a name (like a slug) as part of the validate function
remove load_by_title for 2 terms can have now same titles. rather add load_by_primary_titles for getting a Term by both primary titles together.

Change uses of load_by_title and Term().loadto load by name (which for all existing terms has the same result as primary English title) orload_by_primary_titles` (views.py, terms_test.py (i think no tests needed for new function), calendars.py).
Change tests to allow duplicate titles, but not of primary titles together.
add dependenciesw for changing primary titles (cascade categories, indexes' collective_title and indexes' nodes shredTitle (dependencies.py, helper/schema.py).
fix bugs in cascade function (add attr_name where needed, change ManuscriptSet to ManuscriptPageSet, change ManuscriptSet to WebpageSet (helper/schema.py).
terms api (wasn't un use AFAIK):

remove redundant url (which is also is not caught because a former url) (urls_library.py).
rename param term to name (views.py).
remove condition if name is None for url doesn't allow this (views.py).

category api - change check of existing term (views.py).
imports clean (text.py)
Remove of unused code - commented ou, conditioned by False or functions with no uses (model/shema/py, category.py)

…name.

…ence).

…itor before being replaced with different logic).

1. add `load_by_primary_title`. 2. remove validations regarding titles. rather add validation for new term's name doesn't exist.

load terms by primary titles, which ensure the category will get the required titles. remove term's name setting, leaving it to be determined by the term saving.

remove dead code (if False) and commented-out code.

add validation name is not changed

…d by url.

…ught by only `add`), and have the same view as edit/terms.

…an have the same title.

…titles` since different terms can have one equal primary title.

…ogether.

Copilot

Pull request overview

Refactors the Term model to decouple stable identifiers (name as a slug) from user-facing titles, allowing terms to share titles across languages and enabling cascading rebuilds/updates when primary titles change.

Changes:

Updates Term uniqueness/validation and replaces load_by_title usage with lookups by name or (en_primary, he_primary) pairs.
Adds a dependency hook intended to rebuild library caches and cascade ref updates when term primary titles change.
Cleans up related APIs/routes and removes some unused helper code.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 10 comments.

Show a summary per file

File	Description
`sefaria/utils/calendars.py`	Updates parasha Term lookup to load by `name`.
`sefaria/urls_library.py`	Adjusts term editor route parameter name and removes a redundant URL.
`sefaria/model/text.py`	Import cleanups and minor internal refactors.
`sefaria/model/tests/terms_test.py`	Removes an obsolete `load_by_title` test.
`sefaria/model/schema.py`	Refactors `Term` keys/validation and adds `load_by_primary_titles` + name generation.
`sefaria/model/dependencies.py`	Subscribes to term title changes to trigger rebuild/cascade behavior.
`sefaria/helper/schema.py`	Adds cascading/rebuild logic for term primary title changes and fixes some cascade calls.
`sefaria/helper/category.py`	Removes unused `check_term` helper.
`reader/views.py`	Updates term editor + term/category APIs to use `name` and new primary-title lookup.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

sefaria/helper/schema.py

Copilot · 2026-02-08T12:14:26Z

sefaria/helper/schema.py

+    if cause_ref_change:  # Now new refs are available and can be cascaded
+        for index in library.all_index_records():
+            for node in [index.nodes] + index.nodes.all_children():
+                if getattr(node, "sharedTitle", None) == old_primary_en:


This code looks for affected schema nodes by comparing node.sharedTitle to old_primary_en, but sharedTitle is intended to reference Term.name (the stable slug), which may differ from the English primary title after this refactor (or for disambiguated names like Foo1). Use term.name to select nodes, while still cascading refs from old_primary_en to new_primary_en.

Suggested change

if getattr(node, "sharedTitle", None) == old_primary_en:

if getattr(node, "sharedTitle", None) == term.name:

Copilot · 2026-02-08T12:14:26Z

sefaria/model/schema.py

+            name = self.get_primary_title()
+            dupes = len(TermSet({'name': {'$regex': fr'^{name}\d*$'}}))
+            if dupes:
+                name = f'{name}{dupes}'


The autogenerated name disambiguation is based on len(TermSet(regex)), which (1) loads all matching records into memory, (2) doesn’t escape regex metacharacters in name, and (3) can produce collisions when suffixes are non-contiguous (e.g., existing Foo and Foo2 => count=2 => generates Foo2). Prefer count_documents()/TermSet(...).count() and pick the next available suffix via a loop (similar to SluggedAbstractMongoRecord.normalize_slug_field), using re.escape(name) when building regex patterns.

Suggested change

name = self.get_primary_title()

dupes = len(TermSet({'name': {'$regex': fr'^{name}\d*$'}}))

if dupes:

name = f'{name}{dupes}'

base_name = self.get_primary_title()

name = base_name

# Ensure the autogenerated name is unique by finding the next available numeric suffix.

if Term().load({'name': base_name}):

suffix = 2

while True:

candidate = f"{base_name}{suffix}"

if not Term().load({'name': candidate}):

name = candidate

break

suffix += 1

sefaria/model/schema.py

sefaria/model/dependencies.py

Copilot · 2026-02-08T12:14:27Z

reader/views.py

-            "content": "Please include the primary Term name in the URL to uses the Terms Editor."
-        })
+    existing_term = Term().load({'name': name})
+    data = existing_term.contents() if existing_term else {}


When the term doesn’t exist, data is set to {}. The term editor UI/JSON flow previously pre-populated name and an empty titles array; returning an empty object can break the editor and/or lead to a POST missing required fields. Consider initializing data to at least {"name": name, "titles": []} for new terms (while still preventing name edits server-side).

Suggested change

data = existing_term.contents() if existing_term else {}

if existing_term:

data = existing_term.contents()

else:

# Initialize minimal structure for new terms to support the editor UI.

data = {"name": name, "titles": []}

sefaria/model/schema.py

Copilot · 2026-02-08T12:14:27Z

sefaria/model/schema.py

+        if self.is_new() and Term().load({'name': self.name}):
+            raise DuplicateRecordError(f"A Term with the name {self.name} already exists")
+        elif not self.is_new() and self.is_key_changed('name'):
+            raise ValidationError({"name": "This field cannot be changed."})


Raising django.core.exceptions.ValidationError here is inconsistent with the rest of the Mongo model layer (which generally raises InputError/DuplicateRecordError). This may not be caught by existing API error handling in non-Django contexts and can surface as a 500. Prefer raising an InputError (or a project-specific validation exception) for immutability violations.

Suggested change

raise ValidationError({"name": "This field cannot be changed."})

raise InputError("Field 'name' cannot be changed.")

i'm not sure about that

sefaria/model/tests/terms_test.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

mergify · 2026-02-08T12:59:00Z

🧪 CI Insights

Here's what we observed from your CI run for b111d42.

❌ Job Failures

Pipeline	Job	Health on `master`	Retries	🔍 CI Insights	📄 Logs
`Continuous`	`Continuous Testing: PyTest`		`0`	View	View

…r old term.

…st in pkeys. The titles list was stored by reference in pkeys_orig_values, so in-place mutations bypassed dependency tracking. Replace with `_primary_en`/`_primary_he` string attrs that reliably detect changes.

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

sefaria/model/schema.py:238

The class 'Term' does not override 'eq', but adds the new attribute _primary_en.
The class 'Term' does not override 'eq', but adds the new attribute _primary_he.
The class 'Term' does not override 'eq', but adds the new attribute title_group.
The class 'Term' does not override 'eq', but adds the new attribute name.
The class 'Term' does not override 'eq', but adds the new attribute titles.

class Term(abst.AbstractMongoRecord, AbstractTitledObject):

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

sefaria/model/schema.py

Copilot · 2026-02-08T16:58:22Z

sefaria/model/schema.py

+        if self.is_new() and Term().load({'name': self.name}):
+            raise DuplicateRecordError(f"A Term with the name {self.name} already exists")
+        elif not self.is_new() and self.is_key_changed('name'):
+            raise ValidationError({"name": "This field cannot be changed."})


Raising django.core.exceptions.ValidationError here will bypass the project’s catch_error_as_json handling (which only catches InputError) and can turn a user-caused validation issue into a 500. Prefer raising an InputError/DuplicateRecordError (or another sefaria.system.exceptions type) for consistency with the rest of the Mongo model layer.

Suggested change

raise ValidationError({"name": "This field cannot be changed."})

raise InputError({"name": "This field cannot be changed."})

Copilot · 2026-02-08T16:58:22Z

sefaria/model/schema.py

+    def load_by_primary_titles(self, en_title, he_title):
+        query = {
+            'titles': {
+                '$all': [{'$elemMatch': {
+                    'text': t, 'primary': True
+                }} for t in [en_title, he_title]]
+            }
+        }
        return self.load(query=query)

+    def _update_tracked_primary_titles(self):
+        self._primary_en = self.get_primary_title("en")
+        self._primary_he = self.get_primary_title("he")
+
+    def _set_pkeys(self):
+        self.set_titles(getattr(self, "titles", None))
+        self._update_tracked_primary_titles()
+        super()._set_pkeys()
+
    def _set_derived_attributes(self):
        self.set_titles(getattr(self, "titles", None))

    def set_titles(self, titles):
        self.title_group = TitleGroup(titles)

+    def _set_name(self):
+        name = base_name = self.get_primary_title()
+        terms = TermSet({'name': {'$regex': f'^{re.escape(name)}\d*$'}})
+        existing_names = {t.name for t in terms}
+        i = 1
+        while name in existing_names:
+            name = base_name + i
+            i += 1
+        self.name = name
+
    def _normalize(self):
        self.titles = self.title_group.titles
+        self._update_tracked_primary_titles()
+        if not hasattr(self, 'name'):
+            self._set_name()

    def _validate(self):
        super(Term, self)._validate()
-        # do not allow duplicates:
-        for title in self.get_titles():
-            other_term = Term().load_by_title(title)
-            if other_term and not self.same_record(other_term):
-                raise InputError("A Term with the title {} in it already exists".format(title))
+        # ensue uniqueness of primary titles together
+        same_titles_term = Term().load_by_primary_titles(self.get_primary_title(), self.get_primary_title('he'))
+        if same_titles_term and not self.same_record(same_titles_term):
+            raise DuplicateRecordError(f"A Term with the primary titles {self.get_primary_title()} and {self.get_primary_title('he')} already exists")
+        # do not allow duplicate names
+        if self.is_new() and Term().load({'name': self.name}):
+            raise DuplicateRecordError(f"A Term with the name {self.name} already exists")
+        elif not self.is_new() and self.is_key_changed('name'):
+            raise ValidationError({"name": "This field cannot be changed."})


New behavior around primary-title pair uniqueness and automatic name generation (including collision handling) doesn’t appear to have direct test coverage. Adding tests for load_by_primary_titles() (including language-specific matching) and for _set_name() when a collision exists would help ensure future refactors don’t break term creation/validation.

…les together.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

…or-term-to-be-non-unique

YishaiGlasner added 18 commits February 4, 2026 15:22

chore: remove load_by_title from terms_api, as it should be based on …

782815a

…name.

chore: change term load in hok_leyisrael to rely on term name.

f2ef5e5

chore: change term normalize to rely on name (checked the only occurr…

467ac49

…ence).

chore: remove unused check_term function (was used in the category ed…

2213e8a

…itor before being replaced with different logic).

fix: check if 'path' is a key in j BEFORE using it.

eceed09

feat(Term):

b033a07

1. add `load_by_primary_title`. 2. remove validations regarding titles. rather add validation for new term's name doesn't exist.

refactor(category api):

98ca067

load terms by primary titles, which ensure the category will get the required titles. remove term's name setting, leaving it to be determined by the term saving.

chore(TitledTreeNode):

ef5804b

remove dead code (if False) and commented-out code.

refactor(Term):

6aeba1c

add validation name is not changed

refactor(terms api): change term into name.

c4dc101

refactor(terms api): remove checking if name exists as it's guarantee…

275f71d

…d by url.

refactor(terms api): remove add/terms endpoint - it's unreachable (ca…

aa182b4

…ught by only `add`), and have the same view as edit/terms.

refactor(Term): remove load_by_title since now more than one term c…

4ad5dcf

…an have the same title.

refactor(Term): refactor load_by_primary_title to `load_by_primary_…

aea84ca

…titles` since different terms can have one equal primary title.

chore(Term): newlines and indentation.

df3eb12

refactor(Term): add validation ensuing uniqueness of primary titles t…

3198f75

…ogether.

chore(text.py): remove unused imports

abe7562

feat(Term): dependencies for term changing.

28d87ea

YishaiGlasner requested review from Copilot, nsantacruz, stevekaplan123 and yitzhakc February 8, 2026 12:07

Copilot started reviewing on behalf of YishaiGlasner February 8, 2026 12:08 View session

Copilot AI reviewed Feb 8, 2026

View reviewed changes

YishaiGlasner and others added 3 commits February 8, 2026 14:23

fix: import form settings

c275e51

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

fix(term api): add empty list titles to data when term doesnt exist.

c5b045a

fix(term tests): remove test for name doesn't equal primary en.

ff5dec9

YishaiGlasner added 2 commits February 8, 2026 16:32

refactor(term): change setting name logic not to count docs.

4a62c22

refactor(term): validate terms with same titles doesn't exist also fo…

ecdae5c

…r old term.

refactor(term): track primary titles as scalars instead of mutable li…

7f58384

…st in pkeys. The titles list was stored by reference in pkeys_orig_values, so in-place mutations bypassed dependency tracking. Replace with `_primary_en`/`_primary_he` string attrs that reliably detect changes.

YishaiGlasner requested a review from Copilot February 8, 2026 16:52

Copilot started reviewing on behalf of YishaiGlasner February 8, 2026 16:53 View session

Copilot AI reviewed Feb 8, 2026

View reviewed changes

YishaiGlasner and others added 8 commits February 8, 2026 19:02

refactor(term): use InputError rather than ValidationError.

d375f94

test(term): fix tests to allow dupliacte title but not of primary tit…

b183ad6

…les together.

test(mongo indexes): remove unique from term's titles index.

3692fe5

fix(term): use f string rather than plus to concaterate int to str

4c4f352

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

chore: update gunicorn version to 25.0.3

51ca59a

chore: downgrade gunicorn version to 23.0.0

0e9b37b

chore: downgrade gunicorn version to 23.0.0

cfd4bab

Merge branch 'fix-setup-tools-missing-bug' into chore/sc-41467/refact…

b111d42

…or-term-to-be-non-unique

	if getattr(node, "sharedTitle", None) == old_primary_en:
	if getattr(node, "sharedTitle", None) == term.name:

-            name = self.get_primary_title()
-            dupes = len(TermSet({'name': {'$regex': fr'^{name}\d*$'}}))
-            if dupes:
-                name = f'{name}{dupes}'
+            base_name = self.get_primary_title()
+            name = base_name
+            # Ensure the autogenerated name is unique by finding the next available numeric suffix.
+            if Term().load({'name': base_name}):
+                suffix = 2
+                while True:
+                    candidate = f"{base_name}{suffix}"
+                    if not Term().load({'name': candidate}):
+                        name = candidate
+                        break
+                    suffix += 1

-    data = existing_term.contents() if existing_term else {}
+    if existing_term:
+        data = existing_term.contents()
+    else:
+        # Initialize minimal structure for new terms to support the editor UI.
+        data = {"name": name, "titles": []}

	raise ValidationError({"name": "This field cannot be changed."})
	raise InputError("Field 'name' cannot be changed.")

	raise ValidationError({"name": "This field cannot be changed."})
	raise InputError({"name": "This field cannot be changed."})

Uh oh!

Refactor term to be non unique and add dependencies #3079

Are you sure you want to change the base?

Refactor term to be non unique and add dependencies #3079

Conversation

YishaiGlasner commented Feb 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Code Changes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

Uh oh!

YishaiGlasner Feb 8, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mergify bot commented Feb 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🧪 CI Insights

❌ Job Failures

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

YishaiGlasner commented Feb 8, 2026 •

edited

Loading

mergify bot commented Feb 8, 2026 •

edited

Loading