Skip to content

Conversation

@Enkidu93
Copy link
Collaborator

@Enkidu93 Enkidu93 commented Dec 1, 2025

Fixes #159


This change is Reviewable

@Enkidu93 Enkidu93 requested a review from ddaspit December 1, 2025 20:38
@Enkidu93
Copy link
Collaborator Author

Enkidu93 commented Dec 1, 2025

This took me much longer than it should have. I'm sorry for the delay. Glad to get it done though!

@codecov-commenter
Copy link

codecov-commenter commented Dec 1, 2025

Codecov Report

❌ Patch coverage is 97.37762% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.77%. Comparing base (68cb79d) to head (1c055b1).

Files with missing lines Patch % Lines
machine/corpora/n_parallel_text_row.py 77.14% 8 Missing ⚠️
machine/corpora/n_parallel_text_corpus.py 97.81% 6 Missing ⚠️
machine/corpora/standard_parallel_text_corpus.py 94.73% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #247      +/-   ##
==========================================
+ Coverage   90.61%   90.77%   +0.15%     
==========================================
  Files         347      352       +5     
  Lines       21981    22290     +309     
==========================================
+ Hits        19919    20234     +315     
+ Misses       2062     2056       -6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ddaspit reviewed 10 of 10 files at r1, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @Enkidu93)


machine/corpora/n_parallel_text_corpus.py line 84 at r1 (raw file):

        self._corpora = corpora
        self._row_ref_comparer = row_ref_comparer if row_ref_comparer is not None else default_row_ref_comparer
        self.all_rows = [False for _ in range(len(corpora))]

The type should be Sequence[bool].

Also, I know that you are porting this as-is from the C#, but I don't like that this property is mutable.


machine/corpora/n_parallel_text_corpus.py line 94 at r1 (raw file):

    @property
    def corpora(self) -> List[TextCorpus]:

The type should be Sequence[TextCorpus].


machine/corpora/n_parallel_text_corpus.py line 380 at r1 (raw file):

def default_row_ref_comparer(x: object, y: object) -> int:

I think the type for x and y should be Any.


machine/corpora/text_corpus.py line 31 at r1 (raw file):

    @property
    @abstractmethod
    def versification(self) -> Versification: ...

Please change the type here to Optional[Versification], since it is now clear that it can be None.


machine/corpora/n_parallel_text_corpus_base.py line 20 at r1 (raw file):

    @abstractmethod
    def get_rows(self, text_ids: List[str]) -> Sequence[NParallelTextRow]: ...

The type for text_ids should be Iterable[str].


machine/corpora/text_corpus_enumerator.py line 9 at r1 (raw file):

class _TextCorpusEnumerator(ContextManager["_TextCorpusEnumerator"], Generator[TextRow, None, None]):

This should be named TextCorpusEnumerator, since it is used outside of this module.

Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @ddaspit)


machine/corpora/n_parallel_text_corpus.py line 84 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

The type should be Sequence[bool].

Also, I know that you are porting this as-is from the C#, but I don't like that this property is mutable.

Done.

I've made it a tuple, but I'm not sure what solution you have in mind. These are just default values here anyways. I could allow all_rows to be passed through the constructor and then make this property 'private' entirely.


machine/corpora/n_parallel_text_corpus.py line 94 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

The type should be Sequence[TextCorpus].

Done.


machine/corpora/n_parallel_text_corpus.py line 380 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I think the type for x and y should be Any.

Done.


machine/corpora/n_parallel_text_corpus_base.py line 20 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

The type for text_ids should be Iterable[str].

Done.


machine/corpora/text_corpus.py line 31 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Please change the type here to Optional[Versification], since it is now clear that it can be None.

Done. And elsewhere.


machine/corpora/text_corpus_enumerator.py line 9 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

This should be named TextCorpusEnumerator, since it is used outside of this module.

Done.

@Enkidu93 Enkidu93 requested a review from ddaspit December 3, 2025 18:25
Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ddaspit reviewed 4 of 4 files at r2, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93)


machine/corpora/n_parallel_text_corpus.py line 84 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Done.

I've made it a tuple, but I'm not sure what solution you have in mind. These are just default values here anyways. I could allow all_rows to be passed through the constructor and then make this property 'private' entirely.

What I don't like is that anything can just set all_rows to a new value, even a value with the wrong length. all_rows should be read-only, but boolean values inside can still be mutable. Does that make sense?


machine/corpora/n_parallel_text_row.py line 21 at r2 (raw file):

    @property
    def ref(self) -> object:

You should use Any here. That is the type that ParallelTextRow and TextRow use for refs. I would just use Any as the type for refs everywhere to be consistent across the codebase.

Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 6 of 10 files reviewed, 2 unresolved discussions (waiting on @ddaspit)


machine/corpora/n_parallel_text_corpus.py line 84 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

What I don't like is that anything can just set all_rows to a new value, even a value with the wrong length. all_rows should be read-only, but boolean values inside can still be mutable. Does that make sense?

OK, so you do want it to be mutable, just read-only. Makes sense. Done. Does this have the effect you're looking for? Is it OK to leave as-is in the tests or should I set them index-by-index there as well?


machine/corpora/n_parallel_text_row.py line 21 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

You should use Any here. That is the type that ParallelTextRow and TextRow use for refs. I would just use Any as the type for refs everywhere to be consistent across the codebase.

Done.

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ddaspit reviewed 4 of 4 files at r3, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93)


machine/corpora/n_parallel_text_corpus.py line 84 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

OK, so you do want it to be mutable, just read-only. Makes sense. Done. Does this have the effect you're looking for? Is it OK to leave as-is in the tests or should I set them index-by-index there as well?

I would prefer that you don't set the underlying _all_rows field. I think you can set all of the values in one call, like this: all_rows[:] = [true, true].

Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 8 of 10 files reviewed, 1 unresolved discussion (waiting on @ddaspit)


machine/corpora/n_parallel_text_corpus.py line 84 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I would prefer that you don't set the underlying _all_rows field. I think you can set all of the values in one call, like this: all_rows[:] = [true, true].

Done.

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@ddaspit reviewed 2 of 2 files at r4, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @Enkidu93)

@Enkidu93 Enkidu93 merged commit 423f81d into main Dec 5, 2025
14 checks passed
@Enkidu93 Enkidu93 deleted the port_n_parallel_text_corpus branch December 5, 2025 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement NParallelTextCorpus

4 participants