fix: HTML parser panic protection with multiple fallback #2330
…tegies
- Add ultra-aggressive HTML sanitization to reduce nesting depth
- Implement size limiting (1MB) to prevent processing huge documents
- Add plain text extraction fallback for complex HTML structures
- Enhance panic recovery with comprehensive error handling
- Remove deeply nestable elements (div, span, ul, ol, li) from sanitizer
- Add comprehensive test coverage for edge cases

Resolves HTML parser panic: 'html: open stack of elements exceeds 512 nodes' that occurred after switching to html-to-markdown/v2 library in PR #2255
Walkthrough

This change modifies the HTML-to-text conversion logic in the page type classifier. It introduces an ultra-aggressive sanitization policy that permits only basic text and formatting elements, enforces a 1MB input size limit, implements fallback mechanisms for conversion failures, and adds a helper function for plain-text extraction from sanitized HTML, with corresponding test coverage updates.
Estimated code review effort: 3 (Moderate), ~25 minutes
Pre-merge checks: 1 failed (warning), 2 passed
Actionable comments posted: 0
🧹 Nitpick comments (5)
common/pagetypeclassifier/pagetypeclassifier.go (3)
44-55: Sanitizer policy change aligns with goal; consider documenting text-retention behavior

The ultra-aggressive policy (only headings, `p`, `br`, and basic formatting tags) is consistent with minimizing nesting and keeping the parser safe. Since `bluemonday.NewPolicy` drops disallowed elements but keeps their inner text, you still preserve most page content for classification, which is what you want here.

It may be worth adding a short note in the comment that disallowed elements’ content is retained (not discarded), to reassure future maintainers that narrowing the tag set does not wipe out text signal for the classifier.
60-102: `htmlToText` fallbacks are good; consider enhancing panic recovery and rune-safety

The multi-step strategy (1MB cap → sanitize → html-to-markdown → plain-text fallback) looks solid and should prevent the previous parser panics while still extracting useful text.

Two possible improvements:

- Use `extractPlainText` in the panic path as well.
  In the `defer` recovery you currently return an error and empty text, which forces `Classify` to fall back to `"other"`. If you want “multiple fallbacks” to apply even on parser panics, you could salvage text via `extractPlainText` instead of returning empty:

       func htmlToText(html string) (text string, err error) {
           defer func() {
               if r := recover(); r != nil {
      -            err = fmt.Errorf("html parser panic: %v", r)
      -            text = ""
      +            // Recover from parser panic and fall back to plain text extraction.
      +            text = extractPlainText(html)
      +            err = nil
               }
           }()

- Rune-safe truncation (optional).
  `html = html[:maxHTMLSize]` truncates by bytes, which can split multi-byte runes in UTF‑8 content. If you expect a lot of non-ASCII text and want to avoid malformed characters, consider truncating by runes (e.g., via `[]rune(html)` or a streaming limiter) at some cost to performance.

Neither is a blocker, but they would make the behavior more robust and more in line with the documented “fallback” intent.
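Both suggestions can be sketched together. This is a minimal illustration, not the PR's code: `convertWithFallback` and `truncateRuneSafe` are hypothetical names, and the converter and extractor are passed in as plain functions so the example runs without the real libraries. Falling back on a returned error or empty text, in addition to a panic, is an assumption of this sketch. Only `maxHTMLSize` and the 1MB cap come from the change itself.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// maxHTMLSize mirrors the 1MB cap described in the PR.
const maxHTMLSize = 1 << 20

// truncateRuneSafe (hypothetical) cuts s to at most max bytes without
// splitting a multi-byte UTF-8 rune. It backs up at most UTFMax-1 bytes,
// so invalid input cannot make it discard more than three bytes.
func truncateRuneSafe(s string, max int) string {
	if max <= 0 {
		return ""
	}
	if len(s) <= max {
		return s
	}
	cut := max
	for i := 0; i < utf8.UTFMax-1 && cut > 0 && !utf8.RuneStart(s[cut]); i++ {
		cut--
	}
	return s[:cut]
}

// convertWithFallback (hypothetical) runs convert and returns fallback(input)
// if convert panics, returns an error, or yields empty text. The named
// results let the deferred function overwrite them after a panic.
func convertWithFallback(
	input string,
	convert func(string) (string, error),
	fallback func(string) string,
) (text string, err error) {
	defer func() {
		if r := recover(); r != nil {
			text = fallback(input)
			err = nil
		}
	}()
	text, err = convert(input)
	if err != nil || text == "" {
		return fallback(input), nil
	}
	return text, nil
}

func main() {
	// Stand-in for a parser that hits the nesting limit.
	panicky := func(string) (string, error) {
		panic(errors.New("html: open stack of elements exceeds 512 nodes"))
	}
	plain := func(s string) string { return "fallback:" + s }

	input := truncateRuneSafe("<p>hi</p>", maxHTMLSize)
	text, err := convertWithFallback(input, panicky, plain)
	fmt.Println(text, err)

	// "é" is two bytes, so a 2-byte cap must not keep half of it.
	fmt.Println(truncateRuneSafe("héllo", 2))
}
```

Backing up to a rune boundary avoids the allocation of a full `[]rune` conversion, which matters on inputs near the size cap.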
104-155: Plain-text fallback works but has sharp edges and could be more efficient

`extractPlainText` does the job for a last-resort fallback, but a few details are worth noting:

- The comment says “regex-based” but the implementation now uses index/search loops only; updating the comment would avoid confusion.
- In both script/style loops, if a closing tag is missing, you drop everything from the opening tag to the end of the document (`text = text[:start]`). That’s acceptable for a best-effort fallback, but it can erase a lot of legitimate text on malformed HTML; a safer option is to just break and keep the remainder.
- The tag-stripping loop builds `result` by repeatedly concatenating strings in a `for` over runes, which is O(n²) on large inputs. Using a `strings.Builder` (or `bytes.Buffer`) would keep it linear and cheaper on the worst‑case 1MB inputs.
- Script/style removal and tag matching are strictly lowercase (`"<script"`, `"<style"`, `"</script>"`, `"</style>"`); if you care about robustness, you might want case-insensitive handling.

All of these are non-critical since this is a fallback path, but tightening them up would make the behavior more predictable and scalable.
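The points above could be addressed roughly as follows. This is a sketch under stated assumptions, not the PR's `extractPlainText`: the function names are hypothetical, tag matching is ASCII case-insensitive only, and an unclosed `<script>` or `<style>` block is left in place so its remaining text survives.

```go
package main

import (
	"fmt"
	"strings"
)

// asciiLower lowercases A-Z byte by byte. Unlike strings.ToLower it never
// changes the byte length, so indexes found in the result are valid in s.
func asciiLower(s string) string {
	b := []byte(s)
	for i, c := range b {
		if 'A' <= c && c <= 'Z' {
			b[i] = c + ('a' - 'A')
		}
	}
	return string(b)
}

// removeBlocks deletes every <tag ...>...</tag> region, ignoring ASCII case.
// If a closing tag is missing, it stops and keeps the remainder.
func removeBlocks(s, tag string) string {
	lower := asciiLower(s)
	open, closeTag := "<"+tag, "</"+tag+">"
	var b strings.Builder
	pos := 0
	for {
		start := strings.Index(lower[pos:], open)
		if start < 0 {
			break
		}
		start += pos
		end := strings.Index(lower[start:], closeTag)
		if end < 0 {
			break // unclosed block: keep the rest rather than dropping it
		}
		b.WriteString(s[pos:start])
		pos = start + end + len(closeTag)
	}
	b.WriteString(s[pos:])
	return b.String()
}

// stripTags replaces each <...> tag with a space in a single linear pass.
func stripTags(s string) string {
	var b strings.Builder
	b.Grow(len(s))
	inTag := false
	for _, r := range s {
		switch {
		case r == '<':
			inTag = true
		case r == '>' && inTag:
			inTag = false
			b.WriteByte(' ')
		case !inTag:
			b.WriteRune(r)
		}
	}
	return b.String()
}

// extractPlainTextSketch (hypothetical) chains the steps and collapses
// whitespace runs into single spaces.
func extractPlainTextSketch(html string) string {
	text := removeBlocks(html, "script")
	text = removeBlocks(text, "style")
	text = stripTags(text)
	return strings.Join(strings.Fields(text), " ")
}

func main() {
	html := `<h1>Title</h1><SCRIPT>alert(1)</SCRIPT>` +
		`<style>p{color: red}</style><div><span>Nested content</span></div>`
	fmt.Println(extractPlainTextSketch(html)) // prints "Title Nested content"
}
```

The byte-wise lowercase copy costs one extra allocation per call, but it keeps offsets aligned with the original string, which `strings.ToLower` does not guarantee for all Unicode input.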
common/pagetypeclassifier/pagetypeclassifier_test.go (2)
60-82: Deeply nested HTML tests are helpful; tighten assertions to validate behavior

The new subtests around deeply nested HTML are valuable and directly exercise the panic-protection and fallback logic.

A couple of tweaks could make them more meaningful:

- In `test resilience with deeply nested HTML`, `require.NotEmpty` and `require.NotEqual(t, "", result)` are redundant, and since `Classify` never returns an empty string, this test will always pass regardless of whether real text was extracted or if it just fell back to `"other"`. Consider asserting on something stronger (e.g., `require.NotEqual(t, "other", result)` or checking via `htmlToText` as you do below) if you want to ensure the classifier is using extracted content.
- In `test htmlToText with deeply nested HTML`, the `require.Contains(t, result, "Some text content")` is exactly the kind of assertion that proves your sanitization/fallback pipeline now returns usable text; this is good.

No blockers here; these are just suggestions to better lock in the intended behavior.

Also applies to: 84-101
121-142: `extractPlainText` fallback test gives good coverage; consider one more positive assertion

This test does a nice job validating that `extractPlainText`:

- Produces non-empty output.
- Retains visible content such as `"Title"`, `"important"`, and `"content"`.
- Excludes script/style content like `"alert"` and `"color: red"`.

As a small enhancement, you might also assert that “Nested content” (inside `<div><span>…</span></div>`) survives, to confirm nested non-allowed container tags are handled as expected by the fallback too.

Otherwise, this test accurately captures the intended behavior.
📒 Files selected for processing (2)
- common/pagetypeclassifier/pagetypeclassifier.go (3 hunks)
- common/pagetypeclassifier/pagetypeclassifier_test.go (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
common/pagetypeclassifier/pagetypeclassifier_test.go (2)
- common/pagetypeclassifier/pagetypeclassifier.go (1): `New` (21-27)
- runner/runner.go (1): `New` (117-406)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Functional Test (macOS-latest)
- GitHub Check: Functional Test (windows-latest)
- GitHub Check: Functional Test (ubuntu-latest)
- GitHub Check: Analyze (go)
- GitHub Check: release-test
🔇 Additional comments (1)
common/pagetypeclassifier/pagetypeclassifier_test.go (1)
3-5: Large HTML test and strings usage look appropriate

Importing `strings` and using `strings.Repeat` to generate a >1MB HTML document is a clean way to exercise the new size-limiting behavior in `htmlToText`. The test asserts that:
- No error is returned.
- The result is non-empty.
This is a sensible regression test for the previous panic issue on very large inputs. If you ever want to be stricter, you could additionally assert that processing time stays reasonable via a benchmark, but that’s not necessary for unit tests.
Also applies to: 110-119
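For reference, inputs of this kind can be generated with `strings.Repeat` alone. The helpers below are hypothetical and not taken from the PR's test file; they only illustrate how an oversized document and a deeply nested one (past the 512-node limit named in the panic message) might be built.

```go
package main

import (
	"fmt"
	"strings"
)

// nestedHTML (hypothetical) wraps text in depth levels of <div>, e.g. to
// exceed the parser's 512-element open stack.
func nestedHTML(depth int, text string) string {
	return strings.Repeat("<div>", depth) + text + strings.Repeat("</div>", depth)
}

// largeHTML (hypothetical) returns a flat document strictly larger than
// minBytes, built from repeated paragraphs.
func largeHTML(minBytes int) string {
	const chunk = "<p>Some text content</p>"
	n := minBytes/len(chunk) + 1
	return "<html><body>" + strings.Repeat(chunk, n) + "</body></html>"
}

func main() {
	deep := nestedHTML(600, "Some text content")
	big := largeHTML(1 << 20)
	fmt.Println(strings.Count(deep, "<div>"), len(big) > 1<<20) // prints "600 true"
}
```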
* Improve error handling in htmlToText function
Enhance htmlToText function to handle panics and errors safely.
panic: html: open stack of elements exceeds 512 nodes
goroutine 5523922 [running]:
github.com/projectdiscovery/httpx/common/pagetypeclassifier.htmlToText(...)
/home/runner/work/httpx/httpx/common/pagetypeclassifier/pagetypeclassifier.go:36
github.com/projectdiscovery/httpx/common/pagetypeclassifier.(*PageTypeClassifier).Classify(0xc0005164d8, {0xc0ba03a000?, 0xd?})
/home/runner/work/httpx/httpx/common/pagetypeclassifier/pagetypeclassifier.go:26 +0x6f
github.com/projectdiscovery/httpx/runner.(*Runner).analyze(_, _, {_, _}, {{0xc00470c450, 0xb}, {0x0, 0x0}, {0x0, 0x0}}, ...)
/home/runner/work/httpx/httpx/runner/runner.go:2349 +0x7555
github.com/projectdiscovery/httpx/runner.(*Runner).process.func1({{0xc00470c450, 0xb}, {0x0, 0x0}, {0x0, 0x0}}, {0x1686161?, 0x10?}, {0x16ace2d, 0xa})
/home/runner/work/httpx/httpx/runner/runner.go:1444 +0x125
created by github.com/projectdiscovery/httpx/runner.(*Runner).process in goroutine 1
/home/runner/work/httpx/httpx/runner/runner.go:1442 +0x8a6
* chore(deps): bump golang.org/x/text from 0.30.0 to 0.31.0
Bumps [golang.org/x/text](https://github.com/golang/text) from 0.30.0 to 0.31.0.
- [Release notes](https://github.com/golang/text/releases)
- [Commits](golang/text@v0.30.0...v0.31.0)
---
updated-dependencies:
- dependency-name: golang.org/x/text
dependency-version: 0.31.0
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
* chore(deps): bump golang.org/x/net from 0.46.0 to 0.47.0
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.46.0 to 0.47.0.
- [Commits](golang/net@v0.46.0...v0.47.0)
---
updated-dependencies:
- dependency-name: golang.org/x/net
dependency-version: 0.47.0
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
* chore(deps): bump github.com/PuerkitoBio/goquery from 1.10.3 to 1.11.0
Bumps [github.com/PuerkitoBio/goquery](https://github.com/PuerkitoBio/goquery) from 1.10.3 to 1.11.0.
- [Release notes](https://github.com/PuerkitoBio/goquery/releases)
- [Commits](PuerkitoBio/goquery@v1.10.3...v1.11.0)
---
updated-dependencies:
- dependency-name: github.com/PuerkitoBio/goquery
dependency-version: 1.11.0
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
* chore(deps): bump the modules group with 5 updates
Bumps the modules group with 5 updates:
| Package | From | To |
| --- | --- | --- |
| [github.com/projectdiscovery/cdncheck](https://github.com/projectdiscovery/cdncheck) | `1.2.9` | `1.2.10` |
| [github.com/projectdiscovery/gologger](https://github.com/projectdiscovery/gologger) | `1.1.59` | `1.1.60` |
| [github.com/projectdiscovery/networkpolicy](https://github.com/projectdiscovery/networkpolicy) | `0.1.27` | `0.1.28` |
| [github.com/projectdiscovery/utils](https://github.com/projectdiscovery/utils) | `0.6.1-0.20251030144701-ce5c4b44e1e6` | `0.6.1` |
| [github.com/projectdiscovery/wappalyzergo](https://github.com/projectdiscovery/wappalyzergo) | `0.2.54` | `0.2.55` |
Updates `github.com/projectdiscovery/cdncheck` from 1.2.9 to 1.2.10
- [Release notes](https://github.com/projectdiscovery/cdncheck/releases)
- [Changelog](https://github.com/projectdiscovery/cdncheck/blob/main/.goreleaser.yaml)
- [Commits](projectdiscovery/cdncheck@v1.2.9...v1.2.10)
Updates `github.com/projectdiscovery/gologger` from 1.1.59 to 1.1.60
- [Release notes](https://github.com/projectdiscovery/gologger/releases)
- [Commits](projectdiscovery/gologger@v1.1.59...v1.1.60)
Updates `github.com/projectdiscovery/networkpolicy` from 0.1.27 to 0.1.28
- [Release notes](https://github.com/projectdiscovery/networkpolicy/releases)
- [Commits](projectdiscovery/networkpolicy@v0.1.27...v0.1.28)
Updates `github.com/projectdiscovery/utils` from 0.6.1-0.20251030144701-ce5c4b44e1e6 to 0.6.1
- [Release notes](https://github.com/projectdiscovery/utils/releases)
- [Changelog](https://github.com/projectdiscovery/utils/blob/main/CHANGELOG.md)
- [Commits](https://github.com/projectdiscovery/utils/commits/v0.6.1)
Updates `github.com/projectdiscovery/wappalyzergo` from 0.2.54 to 0.2.55
- [Release notes](https://github.com/projectdiscovery/wappalyzergo/releases)
- [Commits](projectdiscovery/wappalyzergo@v0.2.54...v0.2.55)
---
updated-dependencies:
- dependency-name: github.com/projectdiscovery/cdncheck
dependency-version: 1.2.10
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: modules
- dependency-name: github.com/projectdiscovery/gologger
dependency-version: 1.1.60
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: modules
- dependency-name: github.com/projectdiscovery/networkpolicy
dependency-version: 0.1.28
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: modules
- dependency-name: github.com/projectdiscovery/utils
dependency-version: 0.6.1
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: modules
- dependency-name: github.com/projectdiscovery/wappalyzergo
dependency-version: 0.2.55
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: modules
...
Signed-off-by: dependabot[bot] <support@github.com>
* better error handling
* chore(deps): bump golang.org/x/crypto from 0.44.0 to 0.45.0
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.44.0 to 0.45.0.
- [Commits](golang/crypto@v0.44.0...v0.45.0)
---
updated-dependencies:
- dependency-name: golang.org/x/crypto
dependency-version: 0.45.0
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
* adding panic guard + tests
* lint
* chore(deps): bump github.com/weppos/publicsuffix-go
Bumps [github.com/weppos/publicsuffix-go](https://github.com/weppos/publicsuffix-go) from 0.50.0 to 0.50.1.
- [Changelog](https://github.com/weppos/publicsuffix-go/blob/main/CHANGELOG.md)
- [Commits](weppos/publicsuffix-go@v0.50.0...v0.50.1)
---
updated-dependencies:
- dependency-name: github.com/weppos/publicsuffix-go
dependency-version: 0.50.1
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
* chore(deps): bump the modules group with 11 updates
Bumps the modules group with 11 updates:
| Package | From | To |
| --- | --- | --- |
| [github.com/projectdiscovery/cdncheck](https://github.com/projectdiscovery/cdncheck) | `1.2.10` | `1.2.11` |
| [github.com/projectdiscovery/dsl](https://github.com/projectdiscovery/dsl) | `0.8.4` | `0.8.5` |
| [github.com/projectdiscovery/fastdialer](https://github.com/projectdiscovery/fastdialer) | `0.4.15` | `0.4.17` |
| [github.com/projectdiscovery/gologger](https://github.com/projectdiscovery/gologger) | `1.1.60` | `1.1.61` |
| [github.com/projectdiscovery/hmap](https://github.com/projectdiscovery/hmap) | `0.0.95` | `0.0.96` |
| [github.com/projectdiscovery/networkpolicy](https://github.com/projectdiscovery/networkpolicy) | `0.1.28` | `0.1.29` |
| [github.com/projectdiscovery/retryablehttp-go](https://github.com/projectdiscovery/retryablehttp-go) | `1.0.131` | `1.0.132` |
| [github.com/projectdiscovery/tlsx](https://github.com/projectdiscovery/tlsx) | `1.2.1` | `1.2.2` |
| [github.com/projectdiscovery/useragent](https://github.com/projectdiscovery/useragent) | `0.0.102` | `0.0.103` |
| [github.com/projectdiscovery/utils](https://github.com/projectdiscovery/utils) | `0.6.1` | `0.7.1` |
| [github.com/projectdiscovery/wappalyzergo](https://github.com/projectdiscovery/wappalyzergo) | `0.2.55` | `0.2.56` |
Updates `github.com/projectdiscovery/cdncheck` from 1.2.10 to 1.2.11
- [Release notes](https://github.com/projectdiscovery/cdncheck/releases)
- [Changelog](https://github.com/projectdiscovery/cdncheck/blob/main/.goreleaser.yaml)
- [Commits](projectdiscovery/cdncheck@v1.2.10...v1.2.11)
Updates `github.com/projectdiscovery/dsl` from 0.8.4 to 0.8.5
- [Release notes](https://github.com/projectdiscovery/dsl/releases)
- [Commits](projectdiscovery/dsl@v0.8.4...v0.8.5)
Updates `github.com/projectdiscovery/fastdialer` from 0.4.15 to 0.4.17
- [Release notes](https://github.com/projectdiscovery/fastdialer/releases)
- [Commits](projectdiscovery/fastdialer@v0.4.15...v0.4.17)
Updates `github.com/projectdiscovery/gologger` from 1.1.60 to 1.1.61
- [Release notes](https://github.com/projectdiscovery/gologger/releases)
- [Commits](projectdiscovery/gologger@v1.1.60...v1.1.61)
Updates `github.com/projectdiscovery/hmap` from 0.0.95 to 0.0.96
- [Release notes](https://github.com/projectdiscovery/hmap/releases)
- [Commits](projectdiscovery/hmap@v0.0.95...v0.0.96)
Updates `github.com/projectdiscovery/networkpolicy` from 0.1.28 to 0.1.29
- [Release notes](https://github.com/projectdiscovery/networkpolicy/releases)
- [Commits](projectdiscovery/networkpolicy@v0.1.28...v0.1.29)
Updates `github.com/projectdiscovery/retryablehttp-go` from 1.0.131 to 1.0.132
- [Release notes](https://github.com/projectdiscovery/retryablehttp-go/releases)
- [Commits](projectdiscovery/retryablehttp-go@v1.0.131...v1.0.132)
Updates `github.com/projectdiscovery/tlsx` from 1.2.1 to 1.2.2
- [Release notes](https://github.com/projectdiscovery/tlsx/releases)
- [Changelog](https://github.com/projectdiscovery/tlsx/blob/main/.goreleaser.yml)
- [Commits](projectdiscovery/tlsx@v1.2.1...v1.2.2)
Updates `github.com/projectdiscovery/useragent` from 0.0.102 to 0.0.103
- [Release notes](https://github.com/projectdiscovery/useragent/releases)
- [Commits](projectdiscovery/useragent@v0.0.102...v0.0.103)
Updates `github.com/projectdiscovery/utils` from 0.6.1 to 0.7.1
- [Release notes](https://github.com/projectdiscovery/utils/releases)
- [Changelog](https://github.com/projectdiscovery/utils/blob/main/CHANGELOG.md)
- [Commits](projectdiscovery/utils@v0.6.1...v0.7.1)
Updates `github.com/projectdiscovery/wappalyzergo` from 0.2.55 to 0.2.56
- [Release notes](https://github.com/projectdiscovery/wappalyzergo/releases)
- [Commits](projectdiscovery/wappalyzergo@v0.2.55...v0.2.56)
---
updated-dependencies:
- dependency-name: github.com/projectdiscovery/cdncheck
dependency-version: 1.2.11
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: modules
- dependency-name: github.com/projectdiscovery/dsl
dependency-version: 0.8.5
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: modules
- dependency-name: github.com/projectdiscovery/fastdialer
dependency-version: 0.4.17
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: modules
- dependency-name: github.com/projectdiscovery/gologger
dependency-version: 1.1.61
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: modules
- dependency-name: github.com/projectdiscovery/hmap
dependency-version: 0.0.96
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: modules
- dependency-name: github.com/projectdiscovery/networkpolicy
dependency-version: 0.1.29
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: modules
- dependency-name: github.com/projectdiscovery/retryablehttp-go
dependency-version: 1.0.132
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: modules
- dependency-name: github.com/projectdiscovery/tlsx
dependency-version: 1.2.2
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: modules
- dependency-name: github.com/projectdiscovery/useragent
dependency-version: 0.0.103
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: modules
- dependency-name: github.com/projectdiscovery/utils
dependency-version: 0.7.1
dependency-type: direct:production
update-type: version-update:semver-minor
dependency-group: modules
- dependency-name: github.com/projectdiscovery/wappalyzergo
dependency-version: 0.2.56
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: modules
...
Signed-off-by: dependabot[bot] <support@github.com>
* fix test
* chore(deps): bump github.com/JohannesKaufmann/html-to-markdown/v2
Bumps [github.com/JohannesKaufmann/html-to-markdown/v2](https://github.com/JohannesKaufmann/html-to-markdown) from 2.4.0 to 2.5.0.
- [Release notes](https://github.com/JohannesKaufmann/html-to-markdown/releases)
- [Commits](JohannesKaufmann/html-to-markdown@v2.4.0...v2.5.0)
---
updated-dependencies:
- dependency-name: github.com/JohannesKaufmann/html-to-markdown/v2
dependency-version: 2.5.0
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
* chore(deps): bump the modules group with 10 updates
Bumps the modules group with 10 updates:
| Package | From | To |
| --- | --- | --- |
| [github.com/projectdiscovery/cdncheck](https://github.com/projectdiscovery/cdncheck) | `1.2.11` | `1.2.12` |
| [github.com/projectdiscovery/dsl](https://github.com/projectdiscovery/dsl) | `0.8.5` | `0.8.6` |
| [github.com/projectdiscovery/fastdialer](https://github.com/projectdiscovery/fastdialer) | `0.4.17` | `0.4.18` |
| [github.com/projectdiscovery/gologger](https://github.com/projectdiscovery/gologger) | `1.1.61` | `1.1.62` |
| [github.com/projectdiscovery/hmap](https://github.com/projectdiscovery/hmap) | `0.0.96` | `0.0.97` |
| [github.com/projectdiscovery/networkpolicy](https://github.com/projectdiscovery/networkpolicy) | `0.1.29` | `0.1.30` |
| [github.com/projectdiscovery/retryablehttp-go](https://github.com/projectdiscovery/retryablehttp-go) | `1.0.132` | `1.0.133` |
| [github.com/projectdiscovery/useragent](https://github.com/projectdiscovery/useragent) | `0.0.103` | `0.0.104` |
| [github.com/projectdiscovery/utils](https://github.com/projectdiscovery/utils) | `0.7.1` | `0.7.3` |
| [github.com/projectdiscovery/wappalyzergo](https://github.com/projectdiscovery/wappalyzergo) | `0.2.56` | `0.2.57` |
Updates `github.com/projectdiscovery/cdncheck` from 1.2.11 to 1.2.12
- [Release notes](https://github.com/projectdiscovery/cdncheck/releases)
- [Commits](projectdiscovery/cdncheck@v1.2.11...v1.2.12)
Updates `github.com/projectdiscovery/dsl` from 0.8.5 to 0.8.6
- [Release notes](https://github.com/projectdiscovery/dsl/releases)
- [Commits](projectdiscovery/dsl@v0.8.5...v0.8.6)
Updates `github.com/projectdiscovery/fastdialer` from 0.4.17 to 0.4.18
- [Release notes](https://github.com/projectdiscovery/fastdialer/releases)
- [Commits](projectdiscovery/fastdialer@v0.4.17...v0.4.18)
Updates `github.com/projectdiscovery/gologger` from 1.1.61 to 1.1.62
- [Release notes](https://github.com/projectdiscovery/gologger/releases)
- [Commits](projectdiscovery/gologger@v1.1.61...v1.1.62)
Updates `github.com/projectdiscovery/hmap` from 0.0.96 to 0.0.97
- [Release notes](https://github.com/projectdiscovery/hmap/releases)
- [Commits](projectdiscovery/hmap@v0.0.96...v0.0.97)
Updates `github.com/projectdiscovery/networkpolicy` from 0.1.29 to 0.1.30
- [Release notes](https://github.com/projectdiscovery/networkpolicy/releases)
- [Commits](projectdiscovery/networkpolicy@v0.1.29...v0.1.30)
Updates `github.com/projectdiscovery/retryablehttp-go` from 1.0.132 to 1.0.133
- [Release notes](https://github.com/projectdiscovery/retryablehttp-go/releases)
- [Commits](projectdiscovery/retryablehttp-go@v1.0.132...v1.0.133)
Updates `github.com/projectdiscovery/useragent` from 0.0.103 to 0.0.104
- [Release notes](https://github.com/projectdiscovery/useragent/releases)
- [Commits](projectdiscovery/useragent@v0.0.103...v0.0.104)
Updates `github.com/projectdiscovery/utils` from 0.7.1 to 0.7.3
- [Release notes](https://github.com/projectdiscovery/utils/releases)
- [Changelog](https://github.com/projectdiscovery/utils/blob/main/CHANGELOG.md)
- [Commits](projectdiscovery/utils@v0.7.1...v0.7.3)
Updates `github.com/projectdiscovery/wappalyzergo` from 0.2.56 to 0.2.57
- [Release notes](https://github.com/projectdiscovery/wappalyzergo/releases)
- [Commits](projectdiscovery/wappalyzergo@v0.2.56...v0.2.57)
---
updated-dependencies:
- dependency-name: github.com/projectdiscovery/cdncheck
dependency-version: 1.2.12
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: modules
- dependency-name: github.com/projectdiscovery/dsl
dependency-version: 0.8.6
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: modules
- dependency-name: github.com/projectdiscovery/fastdialer
dependency-version: 0.4.18
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: modules
- dependency-name: github.com/projectdiscovery/gologger
dependency-version: 1.1.62
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: modules
- dependency-name: github.com/projectdiscovery/hmap
dependency-version: 0.0.97
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: modules
- dependency-name: github.com/projectdiscovery/networkpolicy
dependency-version: 0.1.30
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: modules
- dependency-name: github.com/projectdiscovery/retryablehttp-go
dependency-version: 1.0.133
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: modules
- dependency-name: github.com/projectdiscovery/useragent
dependency-version: 0.0.104
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: modules
- dependency-name: github.com/projectdiscovery/utils
dependency-version: 0.7.3
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: modules
- dependency-name: github.com/projectdiscovery/wappalyzergo
dependency-version: 0.2.57
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: modules
...
Signed-off-by: dependabot[bot] <support@github.com>
* feat: update `-ldp` option to show default ports in CLI output (#2331)
feat: update -ldp option to show default ports in CLI output
- Modified URL formatting in runner.go to respect LeaveDefaultPorts option
- Fixed AddURLDefaultPort function to actually add default ports (80/443)
- When -ldp is used, CLI output now shows https://example.com:443 instead of https://example.com
- Maintains backward compatibility - default behavior unchanged
Fixes CLI output inconsistency where -ldp flag only affected Host headers
but not the displayed URL in console output.
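
The URL-formatting behavior described above could look roughly like this. It is an illustrative sketch using only the standard library; `withDefaultPort` is a hypothetical name and is not the project's `AddURLDefaultPort` implementation.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// withDefaultPort (hypothetical) appends :80 or :443 to an http/https URL
// that has no explicit port. Any other URL is returned unchanged.
func withDefaultPort(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Port() != "" {
		return raw, nil
	}
	var port string
	switch u.Scheme {
	case "http":
		port = "80"
	case "https":
		port = "443"
	default:
		return raw, nil
	}
	// JoinHostPort re-adds brackets around IPv6 literals.
	u.Host = net.JoinHostPort(u.Hostname(), port)
	return u.String(), nil
}

func main() {
	out, err := withDefaultPort("https://example.com")
	fmt.Println(out, err) // prints "https://example.com:443 <nil>"
}
```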
* fix: HTML parser panic protection with multiple fallback (#2330)
fix: enhance HTML parser panic protection with multiple fallback strategies
- Add ultra-aggressive HTML sanitization to reduce nesting depth
- Implement size limiting (1MB) to prevent processing huge documents
- Add plain text extraction fallback for complex HTML structures
- Enhance panic recovery with comprehensive error handling
- Remove deeply nestable elements (div, span, ul, ol, li) from sanitizer
- Add comprehensive test coverage for edge cases
Resolves HTML parser panic: 'html: open stack of elements exceeds 512 nodes'
that occurred after switching to html-to-markdown/v2 library in PR #2255
* fix: host JSON field now returns hostname instead of IP (#2333)
- Changed 'host' field to return actual hostname (e.g., example.com)
- Added new 'host_ip' field for the resolved/dialed IP address
- Fixes semantic issue where 'host' was incorrectly returning IP
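
As an illustration of the resulting JSON shape, assuming a trimmed-down record: the `result` struct below is hypothetical and carries only the two field names mentioned in this change (`host` and `host_ip`); the real output struct has many more fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// result is a hypothetical, minimal output record. Only the JSON keys
// "host" and "host_ip" come from the change description.
type result struct {
	Host   string `json:"host"`    // hostname as requested, e.g. example.com
	HostIP string `json:"host_ip"` // resolved/dialed IP address
}

// encode marshals r to JSON, returning an empty string on failure.
func encode(r result) string {
	b, err := json.Marshal(r)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// 192.0.2.10 is a documentation-only address (TEST-NET-1).
	fmt.Println(encode(result{Host: "example.com", HostIP: "192.0.2.10"}))
}
```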
* version update
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: @GDATTACKER <37478652+GDATTACKER-RESEARCHER@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mzack9999 <mzack9999@protonmail.com>