Description
Hi team,
While testing advanced Boolean search behaviour, I encountered a series of consistent parser failures that don’t appear to be documented. I’d like to confirm whether these are known or intended constraints, or if they represent bugs in the current implementation.
Key Findings
1. Proximity + OR-lists with multi-word phrases
Queries using NEAR/# fail when either side includes an OR-list with multi-word items, even when written as ADJ chains.
Examples:
(("Direct Action Everywhere" OR PETA) NEAR/20 milk) → returns 0 results
(((Direct ADJ Action ADJ Everywhere) OR PETA) NEAR/20 milk) → also fails
(PETA) NEAR/20 (milk OR dairy) → works
2. Quoted tokens break proximity
Any quoted token on either side of NEAR/# breaks the query, even if it is a single word.
"cow" NEAR/20 abuse → fails
cow NEAR/20 abuse → works
3. OR-list size limits (based on quoting)
- Quoted tokens: fails when the list exceeds 30 items
- Unquoted tokens: works up to around 70 items
("term1" OR "term2" ... "term31") → fails
(term1 OR term2 ... term50) → works
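For reproducibility, here is a minimal sketch of how OR-lists like the ones above can be generated at any size. The helper name `build_or_list` and the `termN` placeholders are illustrative only; the sketch builds query strings and does not call any govinfo endpoint.

```python
def build_or_list(n, quoted):
    """Return a parenthesised OR-list of n placeholder terms.

    `termN` names are placeholders, not real search terms.
    """
    terms = [f"term{i}" for i in range(1, n + 1)]
    if quoted:
        # Wrap each term in double quotes to exercise the quoted-token path.
        terms = [f'"{t}"' for t in terms]
    return "(" + " OR ".join(terms) + ")"


# A 31-item quoted list (the size reported as failing above) and a
# 50-item unquoted list (the size reported as working above).
quoted_31 = build_or_list(31, quoted=True)
unquoted_50 = build_or_list(50, quoted=False)
```

Pasting the generated strings into the search box at increasing sizes is one way to locate the threshold for each quoting style.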
4. Query complexity limit appears to be token-based
There appears to be a hard cap on the number of single-word tokens a query can include, regardless of character count. Using a repeated structure like:
((cow OR calf) NEAR/20 (welfare OR abuse)) OR ...
- Works with 69 tokens (≈ 923 characters)
- Fails at 70 tokens
- Using ADJ (e.g. cow ADJ calf) lowers the cap to 66 tokens
This suggests a token or operator count limit rather than a character limit.
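A rough sketch of how the cap could be probed programmatically is below. It rests on assumptions: `count_word_tokens` treats OR, AND, NOT, ADJ and NEAR/n as operators and everything else as a token, which may not match how the parser counts, and `succeeds` is a hypothetical caller-supplied check (for example, a manual or scripted search that reports whether a query of a given size returned results). No govinfo API is assumed.

```python
import re

# Assumed operator set; the real parser may recognise others.
OPERATOR = re.compile(r"^(OR|AND|NOT|ADJ|NEAR/\d+)$")


def count_word_tokens(query):
    """Count search terms, ignoring parentheses, quotes, and operators."""
    words = re.findall(r'[^\s()"]+', query)
    return sum(1 for w in words if not OPERATOR.match(w))


def build_query(clauses):
    """Repeat the proximity clause from the report, joined with OR."""
    clause = "((cow OR calf) NEAR/20 (welfare OR abuse))"
    return " OR ".join([clause] * clauses)


def largest_passing(succeeds, low, high):
    """Binary-search the largest n in [low, high] where succeeds(n) is True.

    Assumes succeeds(low) is True and that once a size fails,
    every larger size fails too.
    """
    while low < high:
        mid = (low + high + 1) // 2
        if succeeds(mid):
            low = mid
        else:
            high = mid - 1
    return low
```

With a stand-in predicate such as `lambda n: n <= 69`, `largest_passing(..., 1, 200)` converges on 69 in a handful of probes, which keeps the number of manual searches small.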
5. Wildcards inside quotes fail
"cow*" → fails
"animal welf*" → fails
cow* → works
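Until the behaviour is confirmed, a small pre-check can flag the patterns from findings 2 and 5 before a query is submitted. This is only a sketch: `lint_query` is a hypothetical helper, and its NEAR check is deliberately coarse (it flags any quoted token anywhere in a query that contains NEAR/#, not just tokens directly beside the operator).

```python
import re

QUOTED = re.compile(r'"([^"]*)"')      # contents of each double-quoted span
NEAR = re.compile(r"\bNEAR/\d+\b")     # proximity operator such as NEAR/20


def lint_query(query):
    """Return warnings for patterns observed to fail in this report."""
    warnings = []
    phrases = QUOTED.findall(query)
    if any("*" in p for p in phrases):
        warnings.append("wildcard inside quotes")
    if phrases and NEAR.search(query):
        warnings.append("quoted token in a NEAR/# query")
    return warnings
```

For instance, `lint_query('"cow" NEAR/20 abuse')` returns one warning, while `lint_query('cow NEAR/20 abuse')` returns an empty list.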
Request
Can you confirm whether these are known limitations of the current govinfo.gov search parser? If they are intended, clarification around the following would be appreciated:
- Allowed structure and token count for proximity clauses
- Safe limits for OR-lists (quoted vs unquoted)
- Whether multi-token phrases can be safely used near NEAR/# or ADJ
- Whether wildcard handling inside quotes is expected to fail
Happy to share a full reproducible test suite if that’s useful.
Thanks!