@manikyarathore
Contributor

This change fixes an issue where setting preferenceDepthHops = 0 (intended to enable depth-first crawling) caused Heritrix to reject seed URLs immediately with fetch status -50.

The problem was in TooManyHopsDecideRule: a negative maxHops value (used internally for DFS) caused every URI, including seeds with hop count 0, to be evaluated as exceeding the hop limit. As a result, no crawling occurred.

The fix updates the hop limit evaluation logic to:

- Treat negative maxHops values as "no hop limit"
- Always allow seed URLs (hopCount == 0) to pass the rule

This restores the documented DFS behavior while preserving existing behavior for normal breadth-first crawls.
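
For illustration, the intended check can be sketched roughly as follows. This is a minimal standalone sketch of the two rules listed above, not the actual TooManyHopsDecideRule source: the class name HopLimitSketch, the method exceedsHopLimit, and the plain int parameters are hypothetical and chosen only to show the logic.

```java
/**
 * Illustrative sketch only. The names here are hypothetical and are not
 * taken from the Heritrix code base.
 */
final class HopLimitSketch {
    private HopLimitSketch() {}

    /**
     * Returns true when a URI with the given hop count should be treated
     * as exceeding the hop limit.
     */
    static boolean exceedsHopLimit(int hopCount, int maxHops) {
        // Seeds (hop count 0) always pass the rule.
        if (hopCount == 0) {
            return false;
        }
        // A negative limit is treated as "no hop limit".
        if (maxHops < 0) {
            return false;
        }
        // Otherwise apply the usual comparison.
        return hopCount > maxHops;
    }
}
```

With a plain hopCount > maxHops comparison, a seed (hop count 0) checked against maxHops = -1 would be rejected; in this sketch it passes, while non-negative limits behave as before.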

@ato
Collaborator

ato commented Dec 21, 2025

I tested this change but still see status -50. Here's what I tried:

  1. New test job with default config.
  2. Set operator contact URL.
  3. Add seed: http://example.com/
  4. Set preferenceDepthHops = 0
  5. Launch and unpause.

The seed immediately failed with status -50:

2025-12-21T00:56:28.083Z   -50          - http://example.com/ - - unknown #007 - - - 30t

After that Heritrix kept issuing DNS requests without making progress:

2025-12-21T00:56:28.642Z     1         89 dns:example.com P http://example.com/ text/dns #007 20251221005628084+38 sha1:AMSS4LNS6TWYXNFYW4R7XGKJXKYXKOJY - -
2025-12-21T00:56:31.702Z     1         89 dns:example.com P http://example.com/ text/dns #007 20251221005631701+0 sha1:TTE4YPYC72NGJLRXP7QQE3TBXL2PSP4N - -
2025-12-21T00:56:34.754Z     1         89 dns:example.com P http://example.com/ text/dns #007 20251221005634753+0 sha1:OX44Z2HCQXQSC23I5QCFKB53ZMNEQV3E - -
2025-12-21T00:56:37.792Z     1         89 dns:example.com P http://example.com/ text/dns #007 20251221005637789+1 sha1:7KGPM4NIAU7KD5PFX3ZCS6DY75LX3QXR - -
2025-12-21T00:56:40.852Z     1         89 dns:example.com P http://example.com/ text/dns #007 20251221005640851+0 sha1:47DAR5REXOEHH4AXGK3EEAGGME5UK5Y2 - -

I also set a breakpoint in TooManyHopsDecideRule.evaluate() and it was never hit, which makes me think this rule is probably not involved and the root cause of #181 is likely elsewhere.

@ato ato closed this Jan 16, 2026