@ehsandeep ehsandeep commented Dec 6, 2025

  • Add aggressive HTML sanitization to reduce nesting depth
  • Implement size limiting (1MB) to prevent processing huge documents
  • Add plain text extraction fallback for complex HTML structures
  • Enhance panic recovery with comprehensive error handling
  • Remove deeply nestable elements (div, span, ul, ol, li) from sanitizer
  • Add comprehensive test coverage for edge cases

Resolves the HTML parser panic 'html: open stack of elements exceeds 512 nodes', which occurred after switching to the html-to-markdown/v2 library in PR #2255

$ ./httpx  -title -tech-detect -status-code -location -content-length -cname -web-server -follow-redirects -websocket -u https://unpkg.com/three@0.150.0/build/three.min.js

    __    __  __       _  __
   / /_  / /_/ /_____ | |/ /
  / __ \/ __/ __/ __ \|   /
 / / / / /_/ /_/ /_/ /   |
/_/ /_/\__/\__/ .___/_/|_|
             /_/

		projectdiscovery.io

[INF] Current httpx version v1.7.2 (latest)
[WRN] UI Dashboard is disabled, Use -dashboard option to enable
https://unpkg.com/three@0.150.0/build/three.min.js [200] [] [613667] [cloudflare] [Cloudflare,Fly.io,HSTS,HTTP/3]

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved HTML content processing with stricter sanitization policies, retaining only essential text and formatting elements
    • Added handling for large HTML inputs by enforcing a 1MB size limit
    • Enhanced fallback mechanisms for robust content extraction when HTML conversion encounters issues
  • Tests

    • Expanded coverage for large input handling and edge case scenarios


@auto-assign auto-assign bot requested a review from Mzack9999 December 6, 2025 13:57
coderabbitai bot commented Dec 6, 2025

Walkthrough

This change modifies the HTML-to-text conversion logic in the page type classifier. It introduces an ultra-aggressive sanitization policy that permits only basic text and formatting elements, enforces a 1MB input size limit, falls back to plain-text extraction when conversion fails, and adds a helper function for plain-text extraction from sanitized HTML, with corresponding test coverage updates.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| HTML Sanitization & Text Extraction Logic / `common/pagetypeclassifier/pagetypeclassifier.go` | Updated sanitizer policy from aggressive to ultra-aggressive (allows only p, br, h1-h6, strong, em, b, i tags with no attributes); added 1MB input size limit; implemented fallback to plain-text extraction when sanitized HTML is empty or markdown conversion fails; added new extractPlainText() helper function to strip scripts, styles, and HTML tags. |
| Test Coverage Updates / `common/pagetypeclassifier/pagetypeclassifier_test.go` | Renamed and adjusted panic recovery tests to expect successful extraction; added assertions for non-empty results; introduced new subtests for large input handling (extremely large HTML) and plain-text extraction fallback validation (script/style exclusion). |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Areas requiring extra attention:
    • Verify the ultra-aggressive sanitization policy doesn't inadvertently strip important content for the use case
    • Confirm the 1MB size limit is appropriate and gracefully handled in error scenarios
    • Review the fallback logic flow to ensure all edge cases (empty sanitized HTML, markdown conversion errors, panics) are correctly handled
    • Validate that extractPlainText() correctly removes script/style elements while preserving visible text content

Poem

🐰 With whiskers twitching, I sanitize the stream,
Ultra-aggressive now—a stricter, cleaner dream!
No sprawling divs or scripts to cause a fright,
Just plaintext fallbacks when markdown goes not-right.
A megabyte's enough; the HTML's made light! 🎉

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: implementing panic protection with fallback mechanisms for the HTML parser to address deeply nested HTML issues. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (5)
common/pagetypeclassifier/pagetypeclassifier.go (3)

44-55: Sanitizer policy change aligns with goal; consider documenting text-retention behavior

The ultra-aggressive policy (only headings, p, br, and basic formatting tags) is consistent with minimizing nesting and keeping the parser safe. Since bluemonday.NewPolicy drops disallowed elements but keeps their inner text, you still preserve most page content for classification, which is what you want here.

It may be worth adding a short note in the comment that disallowed elements’ content is retained (not discarded), to reassure future maintainers that narrowing the tag set does not wipe out text signal for the classifier.


60-102: htmlToText fallbacks are good; consider enhancing panic recovery and rune-safety

The multi-step strategy (1MB cap → sanitize → html-to-markdown → plain-text fallback) looks solid and should prevent the previous parser panics while still extracting useful text.
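As a rough illustration of that chain, here is a minimal sketch that uses only the standard library. The `convert` and `fallback` parameters are hypothetical stand-ins for the real sanitize-plus-html-to-markdown step and for extractPlainText, and the `toText` name and the `maxHTMLSize` value are assumptions. Unlike the PR as described in this review, the sketch also falls back on a recovered panic instead of returning an error.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// maxHTMLSize mirrors the 1MB cap described in the review (assumed value and name).
const maxHTMLSize = 1 << 20

// toText runs convert on a size-capped input and uses fallback when convert
// panics, returns an error, or yields only whitespace.
// convert and fallback are hypothetical stand-ins, not the PR's functions.
func toText(html string, convert func(string) (string, error), fallback func(string) string) (text string) {
	if len(html) > maxHTMLSize {
		html = html[:maxHTMLSize] // byte-based cap; may split a multi-byte rune
	}
	defer func() {
		if r := recover(); r != nil {
			// A parser panic must not escape; salvage what we can.
			text = fallback(html)
		}
	}()
	out, err := convert(html)
	if err != nil || strings.TrimSpace(out) == "" {
		return fallback(html)
	}
	return out
}

func main() {
	panicky := func(string) (string, error) { panic("html: open stack of elements exceeds 512 nodes") }
	failing := func(string) (string, error) { return "", errors.New("conversion failed") }
	plain := func(s string) string { return "fallback:" + s }

	fmt.Println(toText("<p>hi</p>", panicky, plain)) // fallback:<p>hi</p>
	fmt.Println(toText("<p>hi</p>", failing, plain)) // fallback:<p>hi</p>
}
```

The named return value is what lets the deferred function overwrite the result after a panic has been recovered.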

Two possible improvements:

  1. Use extractPlainText in the panic path as well.
    In the defer recovery you currently return an error and empty text, which forces Classify to fall back to "other". If you want “multiple fallbacks” to apply even on parser panics, you could salvage text via extractPlainText instead of returning empty:
func htmlToText(html string) (text string, err error) {
    defer func() {
        if r := recover(); r != nil {
-           err = fmt.Errorf("html parser panic: %v", r)
-           text = ""
+           // Recover from parser panic and fall back to plain text extraction.
+           text = extractPlainText(html)
+           err = nil
        }
    }()
  2. Rune-safe truncation (optional).
    html = html[:maxHTMLSize] truncates by bytes, which can split multi-byte runes in UTF‑8 content. If you expect a lot of non-ASCII text and want to avoid malformed characters, consider truncating by runes (e.g., via []rune(html) or a streaming limiter) at some cost to performance.

Neither is a blocker, but they would make the behavior more robust and more in line with the documented “fallback” intent.
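For the rune-safety point, one option that avoids allocating a []rune copy is to keep the byte cap but step the cut point back to a rune boundary. The sketch below is an alternative to the []rune approach mentioned above, not code from the PR; `truncateUTF8` is a hypothetical name.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateUTF8 returns at most max bytes of s, moving the cut point back to
// the nearest rune boundary so a multi-byte character is never split.
// Hypothetical helper; not part of the PR.
func truncateUTF8(s string, max int) string {
	if max <= 0 {
		return ""
	}
	if len(s) <= max {
		return s
	}
	// A UTF-8 rune is at most utf8.UTFMax bytes, so at most UTFMax-1 steps
	// back are needed for valid input; the bound also limits work on invalid input.
	for i := 0; i < utf8.UTFMax-1 && max > 0 && !utf8.RuneStart(s[max]); i++ {
		max--
	}
	return s[:max]
}

func main() {
	s := "héllo" // 'é' occupies bytes 1 and 2
	fmt.Printf("%q\n", truncateUTF8(s, 2)) // "h"
	fmt.Printf("%q\n", truncateUTF8(s, 3)) // "hé"
}
```

The result can be slightly shorter than the cap, which seems acceptable for a size limit whose purpose is only to bound parser work.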


104-155: Plain-text fallback works but has sharp edges and could be more efficient

extractPlainText does the job for a last-resort fallback, but a few details are worth noting:

  • The comment says “regex-based” but the implementation now uses index/search loops only; updating the comment would avoid confusion.
  • In both script/style loops, if a closing tag is missing, you drop everything from the opening tag to the end of the document (text = text[:start]). That’s acceptable for a best-effort fallback, but it can erase a lot of legitimate text on malformed HTML; a safer option is to just break and keep the remainder.
  • The tag-stripping loop builds result by repeatedly concatenating strings in a for over runes, which is O(n²) on large inputs. Using a strings.Builder (or bytes.Buffer) would keep it linear and cheaper on the worst‑case 1MB inputs.
  • Script/style removal and tag matching are strictly lowercase ("<script", "<style", "</script>", "</style>"); if you care about robustness, you might want case-insensitive handling.

All of these are non-critical since this is a fallback path, but tightening them up would make the behavior more predictable and scalable.
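A sketch of how those points could be combined is shown here, under stated assumptions: `stripTags` and its helpers are hypothetical names, the code is not the PR's extractPlainText, and it is a best-effort scanner rather than an HTML parser. It writes through a strings.Builder, matches script/style tags case-insensitively, and keeps the remainder of the document when a closing tag is missing.

```go
package main

import (
	"fmt"
	"strings"
)

// indexFold returns the index of the first case-insensitive match of sub
// (which must start with '<') in s, or -1 if there is none.
func indexFold(s, sub string) int {
	for i := 0; i+len(sub) <= len(s); i++ {
		if s[i] == '<' && strings.EqualFold(s[i:i+len(sub)], sub) {
			return i
		}
	}
	return -1
}

// opensElement reports whether s begins with an opening tag for name,
// e.g. "<script>" or "<SCRIPT type=x>", but not "<scripted>".
func opensElement(s, name string) bool {
	n := len(name) + 1
	if len(s) < n || !strings.EqualFold(s[:n], "<"+name) {
		return false
	}
	if len(s) == n {
		return true
	}
	switch s[n] {
	case '>', '/', ' ', '\t', '\n', '\r':
		return true
	}
	return false
}

// stripTags drops script/style elements and all tags, keeping visible text.
func stripTags(html string) string {
	var b strings.Builder
	b.Grow(len(html))
	unclosed := map[string]bool{} // elements known to have no closing tag ahead
	noGT := false                 // true once no '>' remains in the input
	i := 0
scan:
	for i < len(html) {
		if html[i] != '<' {
			b.WriteByte(html[i])
			i++
			continue
		}
		rest := html[i:]
		for _, name := range []string{"script", "style"} {
			if unclosed[name] || !opensElement(rest, name) {
				continue
			}
			closing := "</" + name + ">"
			if end := indexFold(rest, closing); end >= 0 {
				i += end + len(closing) // drop the element and its body
				b.WriteByte(' ')
				continue scan
			}
			unclosed[name] = true // no closing tag: keep the remainder as text
		}
		end := -1
		if !noGT {
			end = strings.IndexByte(rest, '>')
		}
		if end < 0 {
			noGT = true
			b.WriteByte('<') // stray '<': keep it as literal text
			i++
			continue
		}
		i += end + 1 // drop the tag itself, keep surrounding text
		b.WriteByte(' ')
	}
	return strings.Join(strings.Fields(b.String()), " ")
}

func main() {
	in := "<H1>Title</H1><SCRIPT>alert(1)</SCRIPT><div><span>Nested content</span></div><style>p{color: red}"
	fmt.Println(stripTags(in)) // Title Nested content p{color: red}
}
```

Note the trade-off in the last case: because the `<style>` element is never closed, its body is kept as text rather than discarding everything after it. Caching the "no closing tag" and "no '>' left" results keeps repeated unclosed tags from triggering repeated full scans.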

common/pagetypeclassifier/pagetypeclassifier_test.go (2)

60-82: Deeply nested HTML tests are helpful; tighten assertions to validate behavior

The new subtests around deeply nested HTML are valuable and directly exercise the panic-protection and fallback logic.

A couple of tweaks could make them more meaningful:

  • In test resilience with deeply nested HTML, require.NotEmpty and require.NotEqual(t, "", result) are redundant, and since Classify never returns an empty string, this test will always pass regardless of whether real text was extracted or if it just fell back to "other". Consider asserting on something stronger (e.g., require.NotEqual(t, "other", result) or checking via htmlToText as you do below) if you want to ensure the classifier is using extracted content.
  • In test htmlToText with deeply nested HTML, the require.Contains(t, result, "Some text content") is exactly the kind of assertion that proves your sanitization/fallback pipeline now returns usable text; this is good.

No blockers here; these are just suggestions to better lock in the intended behavior.

Also applies to: 84-101


121-142: extractPlainText fallback test gives good coverage; consider one more positive assertion

This test does a nice job validating that extractPlainText:

  • Produces non-empty output.
  • Retains visible content such as "Title", "important", and "content".
  • Excludes script/style content like "alert" and "color: red".

As a small enhancement, you might also assert that “Nested content” (inside <div><span>…</span></div>) survives, to confirm nested non-allowed container tags are handled as expected by the fallback too.

Otherwise, this test accurately captures the intended behavior.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8b04cd9 and a260b32.

📒 Files selected for processing (2)
  • common/pagetypeclassifier/pagetypeclassifier.go (3 hunks)
  • common/pagetypeclassifier/pagetypeclassifier_test.go (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
common/pagetypeclassifier/pagetypeclassifier_test.go (2)
common/pagetypeclassifier/pagetypeclassifier.go (1)
  • New (21-27)
runner/runner.go (1)
  • New (117-406)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Functional Test (macOS-latest)
  • GitHub Check: Functional Test (windows-latest)
  • GitHub Check: Functional Test (ubuntu-latest)
  • GitHub Check: Analyze (go)
  • GitHub Check: release-test
🔇 Additional comments (1)
common/pagetypeclassifier/pagetypeclassifier_test.go (1)

3-5: Large HTML test and strings usage look appropriate

Importing strings and using strings.Repeat to generate a >1MB HTML document is a clean way to exercise the new size-limiting behavior in htmlToText. The test asserts that:

  • No error is returned.
  • The result is non-empty.

This is a sensible regression test for the previous panic issue on very large inputs. If you ever want to be stricter, you could additionally assert that processing time stays reasonable via a benchmark, but that’s not necessary for unit tests.

Also applies to: 110-119
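For reference, test inputs of this kind can be built with strings.Repeat alone. The sketch below is illustrative only; `nestedHTML` and `largeHTML` are hypothetical helper names, and the paragraph text simply reuses the "Some text content" string mentioned earlier in this review.

```go
package main

import (
	"fmt"
	"strings"
)

// nestedHTML wraps text in depth nested <div> elements.
func nestedHTML(depth int, text string) string {
	return strings.Repeat("<div>", depth) + text + strings.Repeat("</div>", depth)
}

// largeHTML returns a document strictly larger than minBytes by repeating
// a short paragraph.
func largeHTML(minBytes int) string {
	const para = "<p>Some text content</p>"
	n := minBytes/len(para) + 1
	return "<html><body>" + strings.Repeat(para, n) + "</body></html>"
}

func main() {
	deep := nestedHTML(600, "Some text content") // deeper than the 512-node limit in the panic message
	big := largeHTML(1 << 20)                    // just over 1MB
	fmt.Println(len(deep), len(big) > 1<<20)
}
```

Inputs like these could be passed to htmlToText in a test to check that no error is returned and the result is non-empty.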

@ehsandeep ehsandeep merged commit 599441e into dev Dec 6, 2025
15 checks passed
@ehsandeep ehsandeep deleted the fix/html-parser-panic-protection branch December 6, 2025 18:09
@ehsandeep ehsandeep removed the request for review from Mzack9999 December 6, 2025 18:09
ehsandeep added a commit that referenced this pull request Dec 6, 2025
* Improve error handling in htmlToText function

Enhance htmlToText function to handle panics and errors safely.

panic: html: open stack of elements exceeds 512 nodes

goroutine 5523922 [running]:
github.com/projectdiscovery/httpx/common/pagetypeclassifier.htmlToText(...)
	/home/runner/work/httpx/httpx/common/pagetypeclassifier/pagetypeclassifier.go:36
github.com/projectdiscovery/httpx/common/pagetypeclassifier.(*PageTypeClassifier).Classify(0xc0005164d8, {0xc0ba03a000?, 0xd?})
	/home/runner/work/httpx/httpx/common/pagetypeclassifier/pagetypeclassifier.go:26 +0x6f
github.com/projectdiscovery/httpx/runner.(*Runner).analyze(_, _, {_, _}, {{0xc00470c450, 0xb}, {0x0, 0x0}, {0x0, 0x0}}, ...)
	/home/runner/work/httpx/httpx/runner/runner.go:2349 +0x7555
github.com/projectdiscovery/httpx/runner.(*Runner).process.func1({{0xc00470c450, 0xb}, {0x0, 0x0}, {0x0, 0x0}}, {0x1686161?, 0x10?}, {0x16ace2d, 0xa})
	/home/runner/work/httpx/httpx/runner/runner.go:1444 +0x125
created by github.com/projectdiscovery/httpx/runner.(*Runner).process in goroutine 1
	/home/runner/work/httpx/httpx/runner/runner.go:1442 +0x8a6

* chore(deps): bump golang.org/x/text from 0.30.0 to 0.31.0

Bumps [golang.org/x/text](https://github.com/golang/text) from 0.30.0 to 0.31.0.
- [Release notes](https://github.com/golang/text/releases)
- [Commits](golang/text@v0.30.0...v0.31.0)

---
updated-dependencies:
- dependency-name: golang.org/x/text
  dependency-version: 0.31.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump golang.org/x/net from 0.46.0 to 0.47.0

Bumps [golang.org/x/net](https://github.com/golang/net) from 0.46.0 to 0.47.0.
- [Commits](golang/net@v0.46.0...v0.47.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-version: 0.47.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump github.com/PuerkitoBio/goquery from 1.10.3 to 1.11.0

Bumps [github.com/PuerkitoBio/goquery](https://github.com/PuerkitoBio/goquery) from 1.10.3 to 1.11.0.
- [Release notes](https://github.com/PuerkitoBio/goquery/releases)
- [Commits](PuerkitoBio/goquery@v1.10.3...v1.11.0)

---
updated-dependencies:
- dependency-name: github.com/PuerkitoBio/goquery
  dependency-version: 1.11.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump the modules group with 5 updates

Bumps the modules group with 5 updates:

| Package | From | To |
| --- | --- | --- |
| [github.com/projectdiscovery/cdncheck](https://github.com/projectdiscovery/cdncheck) | `1.2.9` | `1.2.10` |
| [github.com/projectdiscovery/gologger](https://github.com/projectdiscovery/gologger) | `1.1.59` | `1.1.60` |
| [github.com/projectdiscovery/networkpolicy](https://github.com/projectdiscovery/networkpolicy) | `0.1.27` | `0.1.28` |
| [github.com/projectdiscovery/utils](https://github.com/projectdiscovery/utils) | `0.6.1-0.20251030144701-ce5c4b44e1e6` | `0.6.1` |
| [github.com/projectdiscovery/wappalyzergo](https://github.com/projectdiscovery/wappalyzergo) | `0.2.54` | `0.2.55` |


Updates `github.com/projectdiscovery/cdncheck` from 1.2.9 to 1.2.10
- [Release notes](https://github.com/projectdiscovery/cdncheck/releases)
- [Changelog](https://github.com/projectdiscovery/cdncheck/blob/main/.goreleaser.yaml)
- [Commits](projectdiscovery/cdncheck@v1.2.9...v1.2.10)

Updates `github.com/projectdiscovery/gologger` from 1.1.59 to 1.1.60
- [Release notes](https://github.com/projectdiscovery/gologger/releases)
- [Commits](projectdiscovery/gologger@v1.1.59...v1.1.60)

Updates `github.com/projectdiscovery/networkpolicy` from 0.1.27 to 0.1.28
- [Release notes](https://github.com/projectdiscovery/networkpolicy/releases)
- [Commits](projectdiscovery/networkpolicy@v0.1.27...v0.1.28)

Updates `github.com/projectdiscovery/utils` from 0.6.1-0.20251030144701-ce5c4b44e1e6 to 0.6.1
- [Release notes](https://github.com/projectdiscovery/utils/releases)
- [Changelog](https://github.com/projectdiscovery/utils/blob/main/CHANGELOG.md)
- [Commits](https://github.com/projectdiscovery/utils/commits/v0.6.1)

Updates `github.com/projectdiscovery/wappalyzergo` from 0.2.54 to 0.2.55
- [Release notes](https://github.com/projectdiscovery/wappalyzergo/releases)
- [Commits](projectdiscovery/wappalyzergo@v0.2.54...v0.2.55)

---
updated-dependencies:
- dependency-name: github.com/projectdiscovery/cdncheck
  dependency-version: 1.2.10
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: modules
- dependency-name: github.com/projectdiscovery/gologger
  dependency-version: 1.1.60
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: modules
- dependency-name: github.com/projectdiscovery/networkpolicy
  dependency-version: 0.1.28
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: modules
- dependency-name: github.com/projectdiscovery/utils
  dependency-version: 0.6.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: modules
- dependency-name: github.com/projectdiscovery/wappalyzergo
  dependency-version: 0.2.55
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: modules
...

Signed-off-by: dependabot[bot] <support@github.com>

* better error handling

* chore(deps): bump golang.org/x/crypto from 0.44.0 to 0.45.0

Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.44.0 to 0.45.0.
- [Commits](golang/crypto@v0.44.0...v0.45.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-version: 0.45.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* adding panic guard + tests

* lint

* chore(deps): bump github.com/weppos/publicsuffix-go

Bumps [github.com/weppos/publicsuffix-go](https://github.com/weppos/publicsuffix-go) from 0.50.0 to 0.50.1.
- [Changelog](https://github.com/weppos/publicsuffix-go/blob/main/CHANGELOG.md)
- [Commits](weppos/publicsuffix-go@v0.50.0...v0.50.1)

---
updated-dependencies:
- dependency-name: github.com/weppos/publicsuffix-go
  dependency-version: 0.50.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump the modules group with 11 updates

Bumps the modules group with 11 updates:

| Package | From | To |
| --- | --- | --- |
| [github.com/projectdiscovery/cdncheck](https://github.com/projectdiscovery/cdncheck) | `1.2.10` | `1.2.11` |
| [github.com/projectdiscovery/dsl](https://github.com/projectdiscovery/dsl) | `0.8.4` | `0.8.5` |
| [github.com/projectdiscovery/fastdialer](https://github.com/projectdiscovery/fastdialer) | `0.4.15` | `0.4.17` |
| [github.com/projectdiscovery/gologger](https://github.com/projectdiscovery/gologger) | `1.1.60` | `1.1.61` |
| [github.com/projectdiscovery/hmap](https://github.com/projectdiscovery/hmap) | `0.0.95` | `0.0.96` |
| [github.com/projectdiscovery/networkpolicy](https://github.com/projectdiscovery/networkpolicy) | `0.1.28` | `0.1.29` |
| [github.com/projectdiscovery/retryablehttp-go](https://github.com/projectdiscovery/retryablehttp-go) | `1.0.131` | `1.0.132` |
| [github.com/projectdiscovery/tlsx](https://github.com/projectdiscovery/tlsx) | `1.2.1` | `1.2.2` |
| [github.com/projectdiscovery/useragent](https://github.com/projectdiscovery/useragent) | `0.0.102` | `0.0.103` |
| [github.com/projectdiscovery/utils](https://github.com/projectdiscovery/utils) | `0.6.1` | `0.7.1` |
| [github.com/projectdiscovery/wappalyzergo](https://github.com/projectdiscovery/wappalyzergo) | `0.2.55` | `0.2.56` |


Updates `github.com/projectdiscovery/cdncheck` from 1.2.10 to 1.2.11
- [Release notes](https://github.com/projectdiscovery/cdncheck/releases)
- [Changelog](https://github.com/projectdiscovery/cdncheck/blob/main/.goreleaser.yaml)
- [Commits](projectdiscovery/cdncheck@v1.2.10...v1.2.11)

Updates `github.com/projectdiscovery/dsl` from 0.8.4 to 0.8.5
- [Release notes](https://github.com/projectdiscovery/dsl/releases)
- [Commits](projectdiscovery/dsl@v0.8.4...v0.8.5)

Updates `github.com/projectdiscovery/fastdialer` from 0.4.15 to 0.4.17
- [Release notes](https://github.com/projectdiscovery/fastdialer/releases)
- [Commits](projectdiscovery/fastdialer@v0.4.15...v0.4.17)

Updates `github.com/projectdiscovery/gologger` from 1.1.60 to 1.1.61
- [Release notes](https://github.com/projectdiscovery/gologger/releases)
- [Commits](projectdiscovery/gologger@v1.1.60...v1.1.61)

Updates `github.com/projectdiscovery/hmap` from 0.0.95 to 0.0.96
- [Release notes](https://github.com/projectdiscovery/hmap/releases)
- [Commits](projectdiscovery/hmap@v0.0.95...v0.0.96)

Updates `github.com/projectdiscovery/networkpolicy` from 0.1.28 to 0.1.29
- [Release notes](https://github.com/projectdiscovery/networkpolicy/releases)
- [Commits](projectdiscovery/networkpolicy@v0.1.28...v0.1.29)

Updates `github.com/projectdiscovery/retryablehttp-go` from 1.0.131 to 1.0.132
- [Release notes](https://github.com/projectdiscovery/retryablehttp-go/releases)
- [Commits](projectdiscovery/retryablehttp-go@v1.0.131...v1.0.132)

Updates `github.com/projectdiscovery/tlsx` from 1.2.1 to 1.2.2
- [Release notes](https://github.com/projectdiscovery/tlsx/releases)
- [Changelog](https://github.com/projectdiscovery/tlsx/blob/main/.goreleaser.yml)
- [Commits](projectdiscovery/tlsx@v1.2.1...v1.2.2)

Updates `github.com/projectdiscovery/useragent` from 0.0.102 to 0.0.103
- [Release notes](https://github.com/projectdiscovery/useragent/releases)
- [Commits](projectdiscovery/useragent@v0.0.102...v0.0.103)

Updates `github.com/projectdiscovery/utils` from 0.6.1 to 0.7.1
- [Release notes](https://github.com/projectdiscovery/utils/releases)
- [Changelog](https://github.com/projectdiscovery/utils/blob/main/CHANGELOG.md)
- [Commits](projectdiscovery/utils@v0.6.1...v0.7.1)

Updates `github.com/projectdiscovery/wappalyzergo` from 0.2.55 to 0.2.56
- [Release notes](https://github.com/projectdiscovery/wappalyzergo/releases)
- [Commits](projectdiscovery/wappalyzergo@v0.2.55...v0.2.56)

---
updated-dependencies:
- dependency-name: github.com/projectdiscovery/cdncheck
  dependency-version: 1.2.11
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: modules
- dependency-name: github.com/projectdiscovery/dsl
  dependency-version: 0.8.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: modules
- dependency-name: github.com/projectdiscovery/fastdialer
  dependency-version: 0.4.17
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: modules
- dependency-name: github.com/projectdiscovery/gologger
  dependency-version: 1.1.61
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: modules
- dependency-name: github.com/projectdiscovery/hmap
  dependency-version: 0.0.96
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: modules
- dependency-name: github.com/projectdiscovery/networkpolicy
  dependency-version: 0.1.29
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: modules
- dependency-name: github.com/projectdiscovery/retryablehttp-go
  dependency-version: 1.0.132
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: modules
- dependency-name: github.com/projectdiscovery/tlsx
  dependency-version: 1.2.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: modules
- dependency-name: github.com/projectdiscovery/useragent
  dependency-version: 0.0.103
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: modules
- dependency-name: github.com/projectdiscovery/utils
  dependency-version: 0.7.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: modules
- dependency-name: github.com/projectdiscovery/wappalyzergo
  dependency-version: 0.2.56
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: modules
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix test

* chore(deps): bump github.com/JohannesKaufmann/html-to-markdown/v2

Bumps [github.com/JohannesKaufmann/html-to-markdown/v2](https://github.com/JohannesKaufmann/html-to-markdown) from 2.4.0 to 2.5.0.
- [Release notes](https://github.com/JohannesKaufmann/html-to-markdown/releases)
- [Commits](JohannesKaufmann/html-to-markdown@v2.4.0...v2.5.0)

---
updated-dependencies:
- dependency-name: github.com/JohannesKaufmann/html-to-markdown/v2
  dependency-version: 2.5.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump the modules group with 10 updates

Bumps the modules group with 10 updates:

| Package | From | To |
| --- | --- | --- |
| [github.com/projectdiscovery/cdncheck](https://github.com/projectdiscovery/cdncheck) | `1.2.11` | `1.2.12` |
| [github.com/projectdiscovery/dsl](https://github.com/projectdiscovery/dsl) | `0.8.5` | `0.8.6` |
| [github.com/projectdiscovery/fastdialer](https://github.com/projectdiscovery/fastdialer) | `0.4.17` | `0.4.18` |
| [github.com/projectdiscovery/gologger](https://github.com/projectdiscovery/gologger) | `1.1.61` | `1.1.62` |
| [github.com/projectdiscovery/hmap](https://github.com/projectdiscovery/hmap) | `0.0.96` | `0.0.97` |
| [github.com/projectdiscovery/networkpolicy](https://github.com/projectdiscovery/networkpolicy) | `0.1.29` | `0.1.30` |
| [github.com/projectdiscovery/retryablehttp-go](https://github.com/projectdiscovery/retryablehttp-go) | `1.0.132` | `1.0.133` |
| [github.com/projectdiscovery/useragent](https://github.com/projectdiscovery/useragent) | `0.0.103` | `0.0.104` |
| [github.com/projectdiscovery/utils](https://github.com/projectdiscovery/utils) | `0.7.1` | `0.7.3` |
| [github.com/projectdiscovery/wappalyzergo](https://github.com/projectdiscovery/wappalyzergo) | `0.2.56` | `0.2.57` |


Updates `github.com/projectdiscovery/cdncheck` from 1.2.11 to 1.2.12
- [Release notes](https://github.com/projectdiscovery/cdncheck/releases)
- [Commits](projectdiscovery/cdncheck@v1.2.11...v1.2.12)

Updates `github.com/projectdiscovery/dsl` from 0.8.5 to 0.8.6
- [Release notes](https://github.com/projectdiscovery/dsl/releases)
- [Commits](projectdiscovery/dsl@v0.8.5...v0.8.6)

Updates `github.com/projectdiscovery/fastdialer` from 0.4.17 to 0.4.18
- [Release notes](https://github.com/projectdiscovery/fastdialer/releases)
- [Commits](projectdiscovery/fastdialer@v0.4.17...v0.4.18)

Updates `github.com/projectdiscovery/gologger` from 1.1.61 to 1.1.62
- [Release notes](https://github.com/projectdiscovery/gologger/releases)
- [Commits](projectdiscovery/gologger@v1.1.61...v1.1.62)

Updates `github.com/projectdiscovery/hmap` from 0.0.96 to 0.0.97
- [Release notes](https://github.com/projectdiscovery/hmap/releases)
- [Commits](projectdiscovery/hmap@v0.0.96...v0.0.97)

Updates `github.com/projectdiscovery/networkpolicy` from 0.1.29 to 0.1.30
- [Release notes](https://github.com/projectdiscovery/networkpolicy/releases)
- [Commits](projectdiscovery/networkpolicy@v0.1.29...v0.1.30)

Updates `github.com/projectdiscovery/retryablehttp-go` from 1.0.132 to 1.0.133
- [Release notes](https://github.com/projectdiscovery/retryablehttp-go/releases)
- [Commits](projectdiscovery/retryablehttp-go@v1.0.132...v1.0.133)

Updates `github.com/projectdiscovery/useragent` from 0.0.103 to 0.0.104
- [Release notes](https://github.com/projectdiscovery/useragent/releases)
- [Commits](projectdiscovery/useragent@v0.0.103...v0.0.104)

Updates `github.com/projectdiscovery/utils` from 0.7.1 to 0.7.3
- [Release notes](https://github.com/projectdiscovery/utils/releases)
- [Changelog](https://github.com/projectdiscovery/utils/blob/main/CHANGELOG.md)
- [Commits](projectdiscovery/utils@v0.7.1...v0.7.3)

Updates `github.com/projectdiscovery/wappalyzergo` from 0.2.56 to 0.2.57
- [Release notes](https://github.com/projectdiscovery/wappalyzergo/releases)
- [Commits](projectdiscovery/wappalyzergo@v0.2.56...v0.2.57)

---
updated-dependencies:
- dependency-name: github.com/projectdiscovery/cdncheck
  dependency-version: 1.2.12
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: modules
- dependency-name: github.com/projectdiscovery/dsl
  dependency-version: 0.8.6
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: modules
- dependency-name: github.com/projectdiscovery/fastdialer
  dependency-version: 0.4.18
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: modules
- dependency-name: github.com/projectdiscovery/gologger
  dependency-version: 1.1.62
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: modules
- dependency-name: github.com/projectdiscovery/hmap
  dependency-version: 0.0.97
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: modules
- dependency-name: github.com/projectdiscovery/networkpolicy
  dependency-version: 0.1.30
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: modules
- dependency-name: github.com/projectdiscovery/retryablehttp-go
  dependency-version: 1.0.133
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: modules
- dependency-name: github.com/projectdiscovery/useragent
  dependency-version: 0.0.104
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: modules
- dependency-name: github.com/projectdiscovery/utils
  dependency-version: 0.7.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: modules
- dependency-name: github.com/projectdiscovery/wappalyzergo
  dependency-version: 0.2.57
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: modules
...

Signed-off-by: dependabot[bot] <support@github.com>

* feat: update `-ldp` option to show default ports in CLI output (#2331)

feat: update -ldp option to show default ports in CLI output

- Modified URL formatting in runner.go to respect LeaveDefaultPorts option
- Fixed AddURLDefaultPort function to actually add default ports (80/443)
- When -ldp is used, CLI output now shows https://example.com:443 instead of https://example.com
- Maintains backward compatibility - default behavior unchanged

Fixes CLI output inconsistency where -ldp flag only affected Host headers
but not the displayed URL in console output.
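
The display behavior described for -ldp can be sketched with the standard library as follows. This is only an illustration of making the default port explicit; `withDefaultPort` is a hypothetical helper and is not httpx's AddURLDefaultPort.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// withDefaultPort returns rawURL with an explicit :80 or :443 when the URL
// uses http or https and carries no port. Other URLs are returned unchanged.
// Hypothetical helper for illustration only.
func withDefaultPort(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	if u.Port() != "" {
		return u.String(), nil
	}
	switch u.Scheme {
	case "http":
		u.Host = net.JoinHostPort(u.Hostname(), "80")
	case "https":
		u.Host = net.JoinHostPort(u.Hostname(), "443")
	}
	return u.String(), nil
}

func main() {
	out, err := withDefaultPort("https://example.com")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out) // https://example.com:443
}
```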

* fix: HTML parser panic protection with multiple fallback (#2330)

fix: enhance HTML parser panic protection with multiple fallback strategies

- Add ultra-aggressive HTML sanitization to reduce nesting depth
- Implement size limiting (1MB) to prevent processing huge documents
- Add plain text extraction fallback for complex HTML structures
- Enhance panic recovery with comprehensive error handling
- Remove deeply nestable elements (div, span, ul, ol, li) from sanitizer
- Add comprehensive test coverage for edge cases

Resolves HTML parser panic: 'html: open stack of elements exceeds 512 nodes'
that occurred after switching to html-to-markdown/v2 library in PR #2255

* fix: host JSON field now returns hostname instead of IP (#2333)

- Changed 'host' field to return actual hostname (e.g., example.com)
- Added new 'host_ip' field for the resolved/dialed IP address
- Fixes semantic issue where 'host' was incorrectly returning IP
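
As a small illustration of the resulting JSON shape, assuming a trimmed-down record: only the `host` and `host_ip` keys come from the notes above, while the `result` type, the `encode` helper, and the sample values are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// result is a hypothetical, trimmed-down output record; the real httpx
// result type has many more fields.
type result struct {
	Host   string `json:"host"`    // hostname as requested, e.g. example.com
	HostIP string `json:"host_ip"` // resolved/dialed IP address
}

// encode renders r as a single JSON line, or "" if marshaling fails.
func encode(r result) string {
	b, err := json.Marshal(r)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// 192.0.2.1 is a documentation-only address used as a placeholder.
	fmt.Println(encode(result{Host: "example.com", HostIP: "192.0.2.1"}))
	// {"host":"example.com","host_ip":"192.0.2.1"}
}
```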

* version update

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: @GDATTACKER <37478652+GDATTACKER-RESEARCHER@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mzack9999 <mzack9999@protonmail.com>
Development

Successfully merging this pull request may close these issues.

html: open stack of elements exceeds 512 nodes

3 participants