Undocumented query syntax limits and parsing failures in govinfo.gov search #181

Hi team,

While testing advanced Boolean search behaviour, I encountered a series of consistent parser failures that don’t appear to be documented. I’d like to confirm whether these are known or intended constraints, or if they represent bugs in the current implementation.

---

### Key Findings

#### 1. Proximity + OR-lists with multi-word phrases

Queries using `NEAR/#` fail when *either* side includes an OR-list with **multi-word items**, even when written as ADJ chains.

Examples:

```plaintext
(("Direct Action Everywhere" OR PETA) NEAR/20 milk)          → returns 0 results
(((Direct ADJ Action ADJ Everywhere) OR PETA) NEAR/20 milk)  → also fails
(PETA) NEAR/20 (milk OR dairy)                               → works
```

#### 2. Quoted tokens break proximity

Any quoted token on either side of `NEAR/#` breaks the query, even if it is a single word.

```plaintext
"cow" NEAR/20 abuse   → fails
cow NEAR/20 abuse     → works
```
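As a possible stopgap, single-word quoted tokens could be unquoted before the query is submitted. The sketch below is only an illustration: `unquote_single_words` is a hypothetical helper, and it assumes that a quoted single word and its unquoted form are treated as equivalent by the search engine, which has not been confirmed.

```python
import re


def unquote_single_words(query: str) -> str:
    """Strip quotes around single-word tokens; leave multi-word phrases quoted.

    Assumption: "cow" and cow mean the same thing to the parser.
    """
    # Match a quote, one or more non-space/non-quote characters, then a quote.
    return re.sub(r'"([^\s"]+)"', r"\1", query)


print(unquote_single_words('"cow" NEAR/20 abuse'))
print(unquote_single_words('"animal welfare" OR "cow"'))
```

Multi-word phrases are left untouched, so this does not help with finding 1.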

#### 3. OR-list size limits (based on quoting)

* Quoted tokens: fails when the list exceeds 30 items
* Unquoted tokens: works up to around 70 items

```plaintext
("term1" OR "term2" OR ... OR "term31")   → fails
(term1 OR term2 OR ... OR term50)         → works
```
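For anyone who wants to reproduce the list-size boundary, a minimal sketch of a generator for these OR-lists follows. The function name `or_list` and the `term1 ... termN` placeholders are illustrative only; substitute real search terms when testing.

```python
def or_list(n: int, quoted: bool = False, prefix: str = "term") -> str:
    """Build a parenthesised OR-list of n placeholder terms."""
    if n < 1:
        raise ValueError("n must be at least 1")
    terms = [f"{prefix}{i}" for i in range(1, n + 1)]
    if quoted:
        # Wrap each term in double quotes, as in the failing example.
        terms = [f'"{t}"' for t in terms]
    return "(" + " OR ".join(terms) + ")"


# Queries on either side of the reported quoted-list boundary.
print(or_list(30, quoted=True))
print(or_list(31, quoted=True))
# An unquoted list of the size that was observed to work.
print(or_list(50))
```

Stepping `n` up one at a time and pasting each result into the search box should show where the quoted and unquoted lists start to fail.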

#### 4. Query complexity limit appears to be token-based

There appears to be a hard cap on the number of single-word tokens a query can include, regardless of character count. Testing with a repeated structure like:

```plaintext
((cow OR calf) NEAR/20 (welfare OR abuse)) OR ...
```

* Works with **69** tokens (≈ 923 characters)
* Fails at **70** tokens
* Using `ADJ` (e.g. `cow ADJ calf`) lowers the cap to **66** tokens

This suggests a token or operator count limit rather than a character limit.
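The counts above can be reproduced with a small script. This is a rough sketch under stated assumptions: "token" is taken to mean a search term (operators, `NEAR/n` and parentheses are not counted), the operator set is a guess at what the parser recognises, and `count_terms` and `repeated_clauses` are hypothetical helper names.

```python
import re

# Assumed operator keywords; NEAR/n is matched separately below.
OPERATORS = {"OR", "AND", "NOT", "ADJ"}


def count_terms(query: str) -> int:
    """Count single-word search terms, skipping operators and parentheses.

    Each word of a quoted multi-word phrase is counted as its own term.
    """
    words = re.findall(r"[^\s()]+", query)
    return sum(
        1
        for w in words
        if w not in OPERATORS and not re.fullmatch(r"NEAR/\d+", w)
    )


def repeated_clauses(n_clauses: int, distance: int = 20) -> str:
    """Join n copies of the four-term proximity clause with OR."""
    clause = f"((cow OR calf) NEAR/{distance} (welfare OR abuse))"
    return " OR ".join([clause] * n_clauses)


query = repeated_clauses(17)
print(count_terms(query), "terms,", len(query), "characters")
```

Because each clause adds four terms, hitting an exact count such as 69 or 70 requires appending a partial clause or a bare term; the character count will also differ from the figure above if the real terms are longer than these placeholders.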

#### 5. Wildcards inside quotes fail

```plaintext
"cow*"                 → fails
"animal welf*"         → fails
cow*                   → works
```

---

### Request

Can you confirm whether these are known limitations of the current govinfo.gov search parser? If they are intended, clarification around the following would be appreciated:

* Allowed structure and token count for proximity clauses
* Safe limits for OR-lists (quoted vs unquoted)
* Whether multi-token phrases can safely be used as operands of `NEAR/#` or `ADJ`
* Whether wildcard handling inside quotes is expected to fail

Happy to share a full reproducible test suite if that’s useful.

Thanks!
