-
-
Notifications
You must be signed in to change notification settings - Fork 3
Port NParallelText corpus
#247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This took me much longer than it should have. I'm sorry for the delay. Glad to get it done though! |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #247 +/- ##
==========================================
+ Coverage 90.61% 90.77% +0.15%
==========================================
Files 347 352 +5
Lines 21981 22290 +309
==========================================
+ Hits 19919 20234 +315
+ Misses 2062 2056 -6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
ddaspit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ddaspit reviewed 10 of 10 files at r1, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @Enkidu93)
machine/corpora/n_parallel_text_corpus.py line 84 at r1 (raw file):
self._corpora = corpora self._row_ref_comparer = row_ref_comparer if row_ref_comparer is not None else default_row_ref_comparer self.all_rows = [False for _ in range(len(corpora))]
The type should be Sequence[bool].
Also, I know that you are porting this as-is from the C#, but I don't like that this property is mutable.
machine/corpora/n_parallel_text_corpus.py line 94 at r1 (raw file):
@property def corpora(self) -> List[TextCorpus]:
The type should be Sequence[TextCorpus].
machine/corpora/n_parallel_text_corpus.py line 380 at r1 (raw file):
def default_row_ref_comparer(x: object, y: object) -> int:
I think the type for x and y should be Any.
machine/corpora/text_corpus.py line 31 at r1 (raw file):
@property @abstractmethod def versification(self) -> Versification: ...
Please change the type here to Optional[Versification], since it is now clear that it can be None.
machine/corpora/n_parallel_text_corpus_base.py line 20 at r1 (raw file):
@abstractmethod def get_rows(self, text_ids: List[str]) -> Sequence[NParallelTextRow]: ...
The type for text_ids should be Iterable[str].
machine/corpora/text_corpus_enumerator.py line 9 at r1 (raw file):
class _TextCorpusEnumerator(ContextManager["_TextCorpusEnumerator"], Generator[TextRow, None, None]):
This should be named TextCorpusEnumerator, since it is used outside of this module.
Enkidu93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @ddaspit)
machine/corpora/n_parallel_text_corpus.py line 84 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
The type should be
Sequence[bool].Also, I know that you are porting this as-is from the C#, but I don't like that this property is mutable.
Done.
I've made it a tuple, but I'm not sure what solution you have in mind. These are just default values here anyways. I could allow all_rows to be passed through the constructor and then make this property 'private' entirely.
machine/corpora/n_parallel_text_corpus.py line 94 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
The type should be
Sequence[TextCorpus].
Done.
machine/corpora/n_parallel_text_corpus.py line 380 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
I think the type for
xandyshould beAny.
Done.
machine/corpora/n_parallel_text_corpus_base.py line 20 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
The type for
text_idsshould beIterable[str].
Done.
machine/corpora/text_corpus.py line 31 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
Please change the type here to
Optional[Versification], since it is now clear that it can beNone.
Done. And elsewhere.
machine/corpora/text_corpus_enumerator.py line 9 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
This should be named
TextCorpusEnumerator, since it is used outside of this module.
Done.
ddaspit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ddaspit reviewed 4 of 4 files at r2, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93)
machine/corpora/n_parallel_text_corpus.py line 84 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Done.
I've made it a tuple, but I'm not sure what solution you have in mind. These are just default values here anyways. I could allow
all_rowsto be passed through the constructor and then make this property 'private' entirely.
What I don't like is that anything can just set all_rows to a new value, even a value with the wrong length. all_rows should be read-only, but boolean values inside can still be mutable. Does that make sense?
machine/corpora/n_parallel_text_row.py line 21 at r2 (raw file):
@property def ref(self) -> object:
You should use Any here. That is the type that ParallelTextRow and TextRow use for refs. I would just use Any as the type for refs everywhere to be consistent across the codebase.
Enkidu93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 10 files reviewed, 2 unresolved discussions (waiting on @ddaspit)
machine/corpora/n_parallel_text_corpus.py line 84 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
What I don't like is that anything can just set
all_rowsto a new value, even a value with the wrong length.all_rowsshould be read-only, but boolean values inside can still be mutable. Does that make sense?
OK, so you do want it to be mutable, just read-only. Makes sense. Done. Does this have the effect you're looking for? Is it OK to leave as-is in the tests or should I set them index-by-index there as well?
machine/corpora/n_parallel_text_row.py line 21 at r2 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
You should use
Anyhere. That is the type thatParallelTextRowandTextRowuse for refs. I would just useAnyas the type for refs everywhere to be consistent across the codebase.
Done.
ddaspit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ddaspit reviewed 4 of 4 files at r3, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93)
machine/corpora/n_parallel_text_corpus.py line 84 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
OK, so you do want it to be mutable, just read-only. Makes sense. Done. Does this have the effect you're looking for? Is it OK to leave as-is in the tests or should I set them index-by-index there as well?
I would prefer that you don't set the underlying _all_rows field. I think you can set all of the values in one call, like this: all_rows[:] = [true, true].
Enkidu93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 8 of 10 files reviewed, 1 unresolved discussion (waiting on @ddaspit)
machine/corpora/n_parallel_text_corpus.py line 84 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
I would prefer that you don't set the underlying
_all_rowsfield. I think you can set all of the values in one call, like this:all_rows[:] = [true, true].
Done.
ddaspit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ddaspit reviewed 2 of 2 files at r4, all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @Enkidu93)
Fixes #159
This change is