fix(citation): Ensure full_span is aligned for parallel citations, fix full_span_end#288
branliu0 wants to merge 5 commits into freelawproject:main
Okay, I found the root cause. In POST_FULL_CITATION_REGEX, the "court" group is matching a lot of text, perhaps because it tries to match a court + month/year pattern before trying to match just a date. I fixed this by ensuring that the "court" group can't include a closing parenthesis. Another option might be to limit the length of the court group to something like 20 or 30 characters. Here's the part of the regex in question:
I recently edited this regex (#243) and I think I introduced this error. Excluding closing parentheses from the court group makes sense. Indeed, it looks like something like that was previously there and I foolishly removed it, probably because I didn't understand what it was doing (https://github.com/freelawproject/eyecite/pull/243/files#diff-cfcb2df6a1c6cb15160f5093212c9443c150c91c6c4062c0a4ca162553b2e2d4L298). So I would just suggest adding a brief comment to the court group line noting that closing parentheses are being intentionally excluded. Thanks!
Great! Done!
Luis-manzur left a comment:
Hi! Sorry for the wait.
Please update the branch, resolve the merge conflicts, and let's discuss the requested changes.
  (?:
-     (?P<court>.*?) # treat anything before date as court
+     (?P<court>[^)\]]*?) # treat anything before date as court
Excluding only the closing parenthesis ) is sufficient. I don't think the square bracket ] is needed in any current scenario.
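To make the effect of the court group change concrete, here is a minimal sketch using simplified stand-in patterns. These are not eyecite's actual POST_FULL_CITATION_REGEX; the names LOOSE, STRICT, and BOUNDED are hypothetical, and the sample text is shortened from the example in the PR description. The sketch assumes the court group is an optional prefix that the regex engine tries before falling back to a bare year.

```python
import re

# Simplified stand-ins for the parenthetical part of the post-citation regex.
# NOT eyecite's real patterns; only the court group differs between them.
LOOSE = re.compile(r"\((?:(?P<court>.*?)\s)?(?P<year>\d{4})\)")
STRICT = re.compile(r"\((?:(?P<court>[^)]*?)\s)?(?P<year>\d{4})\)")
BOUNDED = re.compile(r"\((?:(?P<court>.{0,30}?)\s)?(?P<year>\d{4})\)")

# Text that follows "20 L.Ed.2d 835 " (shortened for illustration).
remainder = (
    "(1968). We have previously held otherwise. "
    "See Zicherman v. Korean Air Lines Co., Ltd., "
    "516 F.3d 1237, 1254 (11th Cir. 2008)"
)

loose = LOOSE.match(remainder)
strict = STRICT.match(remainder)
bounded = BOUNDED.match(remainder)

# The lazy ".*?" keeps expanding until "<space><year>)" is found, so the
# match runs through the closing parenthesis of the *next* citation.
print(loose.end() == len(remainder))

# Excluding ")" stops the court group at the first closing parenthesis,
# so the engine falls back to the bare-year branch.
print(strict.group(0))

# Capping the court group length also stops the overrun in this sample.
print(bounded.group(0))

# A real court + year parenthetical is still captured by the strict pattern.
print(STRICT.match("(11th Cir. 2008)").group("court"))
```

In this toy case excluding only `)` is enough to stop the overrun. The length cap also works here, but it depends on how far away the next citation happens to be, which may be why the character-class approach was preferred.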
It seems that the full_span_end can sometimes differ for a parallel citation due to the way POST_FULL_CITATION_REGEX is defined. Under certain conditions, it can end up matching to the next citation as opposed to the end of the current citation. However, we can trust that the post citation matching worked correctly for the first of the parallel citations.
The example I came across is "Kaiser Steel Corp. v. W.S. Ranch Co., 391 U.S. 593, 598, 88 S. Ct. 1753, 20 L.Ed.2d 835 (1968). We have previously held that the automatic stay provisions of the Bankruptcy Code may toll the statute of limitations under the Warsaw Convention, which is the precursor to the Montreal Convention. See Zicherman v. Korean Air Lines Co., Ltd., 516 F.3d 1237, 1254 (11th Cir. 2008)"
The third of the three parallel citations has a full_span_end that goes all the way to the end of the text. I think it's because POST_FULL_CITATION_REGEX ends up matching with the citation at the end of the text.
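The alignment idea described above (trust the first parallel citation's post-citation match and reuse it for the others) could be sketched roughly as follows. This is an assumption about the approach, not the code in this PR: `Cite` is a hypothetical stand-in rather than eyecite's citation class, `align_full_span_end` is an invented helper name, and the offsets are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cite:
    """Hypothetical stand-in for a full case citation (not eyecite's class)."""
    text: str
    full_span_end: Optional[int] = None


def align_full_span_end(parallel_group: List[Cite]) -> None:
    """Copy the first citation's full_span_end to the rest of a parallel group.

    Assumes the post-citation match for the first citation is the trusted one.
    """
    if not parallel_group:
        return
    trusted_end = parallel_group[0].full_span_end
    for cite in parallel_group[1:]:
        cite.full_span_end = trusted_end


# Made-up offsets: the third citation's span overran to the end of the text.
group = [
    Cite("391 U.S. 593", full_span_end=100),
    Cite("88 S. Ct. 1753", full_span_end=100),
    Cite("20 L.Ed.2d 835", full_span_end=300),
]
align_full_span_end(group)
print([c.full_span_end for c in group])
```

A helper like this only masks the symptom for citations after the first one; tightening the court group in the regex addresses the cause, so the two changes are complementary.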