Skip to content

Comments

chore(): improve new link checker#7547

Open
mhartington wants to merge 4 commits intomainfrom
improve-external-link-check
Open

chore(): improve new link checker#7547
mhartington wants to merge 4 commits intomainfrom
improve-external-link-check

Conversation

@mhartington
Copy link
Member

@mhartington mhartington commented Feb 23, 2026

Summary by CodeRabbit

  • New Features

    • Added a new External Links CI workflow that scans PRs, produces a detailed JSON report, formats a Markdown summary, comments on PRs, and fails the job if broken links are found.
    • CLI now supports emitting detailed JSON output for link scans.
  • Chores

    • Replaced the old link-check workflow with the new checker; removed the previous workflow.
    • Introduced configurable concurrency for file extraction and URL checks via environment variables and improved per-file aggregation and reporting.
  • Other

    • Minor CLI parsing adjustment (argument terminator).

@vercel
Copy link

vercel bot commented Feb 23, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
blog Ready Ready Preview, Comment Feb 23, 2026 5:47pm
docs Ready Ready Preview, Comment Feb 23, 2026 5:47pm

Request Review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae60e40 and 4e70f2c.

📒 Files selected for processing (2)
  • .github/workflows/external-links.yml
  • turbo.json
✅ Files skipped from review due to trivial changes (1)
  • turbo.json
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/external-links.yml

Walkthrough

Refactors the external-links checker to a two-phase, concurrency-configurable flow with per-file aggregation and JSON reporting; adds a GitHub Actions workflow to run and format the checker output on PRs; removes the old Lychee workflow; and updates the npm script argument separator.

Changes

Cohort / File(s) Summary
External links linter script
apps/docs/scripts/lint-external-links.ts
Reworks link checker: splits single concurrency into DEFAULT_FILE_CONCURRENCY (15) and DEFAULT_URL_CHECK_CONCURRENCY (40); adds readPositiveIntEnv(); makes collectFileLinks async (uses fs.promises); introduces LinkCheckReport type; performs per-file occurrence merging (mergeOccurrences); implements two-phase flow (concurrent file extraction, then concurrent URL checks); emits detailed JSON with --json; respects EXTERNAL_LINKS_FILE_CONCURRENCY and EXTERNAL_LINKS_URL_CONCURRENCY; updates exit codes and some formatting.
New workflow: run external checker
.github/workflows/external-links.yml
Adds "External Links" GitHub Actions workflow on pull_request: checkout, setup pnpm/Node, install deps, run the external link checker (capture JSON and stderr), format a Markdown report (falls back to stderr on parse issues), append formatted report to workspace, post PR comment for non-forks, and fail the job if checker exit code indicates failures.
Removed workflow: lychee
.github/workflows/lychee.yml
Deletes the previous Lychee-based link-check workflow (all steps, retries, and report formatting removed).
npm script separator
package.json
Updates lint:external-links script to append -- (argument terminator) to the turbo invocation to change how subsequent CLI args are forwarded.
Minor file metadata
turbo.json
Only a trailing newline was added (no functional changes).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title "chore(): improve new link checker" is vague and generic, using non-descriptive phrasing that doesn't convey what specifically was improved or which component was changed. Replace with a more specific title that describes the main improvement, such as "chore: refactor external link checker with configurable concurrency and JSON reporting" or "chore: replace Lychee with custom external link validation."
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

🍈 Lychee Link Check Report

3664 links: ✅ 2743 OK | 🚫 0 errors | 🔀 255 redirects | 👻 664 excluded

✅ All links are working!


Full Statistics Table
Status Count
✅ Successful 2743
🔀 Redirected 255
👻 Excluded 664
🚫 Errors 0
⛔ Unsupported 2
⏳ Timeouts 0
❓ Unknown 0

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
apps/docs/scripts/lint-external-links.ts (2)

297-316: Two-phase design (collect then validate) is a solid architectural choice.

Separating link extraction from URL validation is cleaner and more efficient — you deduplicate URLs before hitting the network, and the two phases have independent concurrency knobs. Good stuff.

One note on lines 314–315: perFileOccurrences.push(...) and extractedLinks += ... mutate shared state from within concurrent workers. This is safe in Node.js because there's no await between the mutation and the next yield point (JS is cooperative, not preemptive). Just calling it out so future readers don't panic — a brief inline comment would help.

📝 Optional: add a clarifying comment
+    // Safe to mutate – no `await` between here and the next iteration,
+    // so no other worker can interleave.
     perFileOccurrences.push(localOccurrences);
     extractedLinks += links.length;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/docs/scripts/lint-external-links.ts` around lines 297 - 316, The
concurrent worker body mutates shared variables perFileOccurrences and
extractedLinks which can look unsafe; add a brief inline comment right above the
perFileOccurrences.push(localOccurrences) and extractedLinks += links.length
lines explaining that this mutation is safe because runWithConcurrency runs on
Node's single-threaded event loop (no preemptive multithreading) and there is no
await between the read/modify/write steps, so the operations are atomic in this
context; reference the worker function passed to runWithConcurrency and the
local variable localOccurrences in the comment so readers can locate the code
quickly.

267-279: mergeOccurrences aliases source arrays into target — consider cloning.

On line 277, when a URL isn't yet in the target map, you set the reference directly: target.set(url, occurrences). This means target and source now share the same array instance. If anything ever mutates the source map's arrays after merging, the target map silently reflects those changes too.

In your current usage this is safe because perFileOccurrences entries aren't touched after merge. But it's a subtle footgun for future maintainers. A shallow copy ([...occurrences]) on line 277 would eliminate this coupling at negligible cost.

♻️ Optional: break the shared reference
-    target.set(url, occurrences);
+    target.set(url, [...occurrences]);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/docs/scripts/lint-external-links.ts` around lines 267 - 279, The
mergeOccurrences function currently aliases the source array into the target by
calling target.set(url, occurrences), which creates a shared mutable reference;
change the behavior so you insert a shallow copy of the occurrences array when
the URL is new (i.e., set the target entry to a new array containing the same
items instead of the original array) to avoid accidental future mutations of
source affecting target while preserving current semantics; update the logic in
mergeOccurrences where target.set is called for a non-existing key to store a
copied array rather than the original.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@apps/docs/scripts/lint-external-links.ts`:
- Around line 297-316: The concurrent worker body mutates shared variables
perFileOccurrences and extractedLinks which can look unsafe; add a brief inline
comment right above the perFileOccurrences.push(localOccurrences) and
extractedLinks += links.length lines explaining that this mutation is safe
because runWithConcurrency runs on Node's single-threaded event loop (no
preemptive multithreading) and there is no await between the read/modify/write
steps, so the operations are atomic in this context; reference the worker
function passed to runWithConcurrency and the local variable localOccurrences in
the comment so readers can locate the code quickly.
- Around line 267-279: The mergeOccurrences function currently aliases the
source array into the target by calling target.set(url, occurrences), which
creates a shared mutable reference; change the behavior so you insert a shallow
copy of the occurrences array when the URL is new (i.e., set the target entry to
a new array containing the same items instead of the original array) to avoid
accidental future mutations of source affecting target while preserving current
semantics; update the logic in mergeOccurrences where target.set is called for a
non-existing key to store a copied array rather than the original.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1099089 and b6e4c0c.

📒 Files selected for processing (1)
  • apps/docs/scripts/lint-external-links.ts

coderabbitai[bot]
coderabbitai bot previously approved these changes Feb 23, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Feb 23, 2026

External Link Check Report

❌ Failed to parse JSON output from checker.

Raw output
Attention:
Turborepo now collects completely anonymous telemetry regarding usage.
This information is used to shape the Turborepo roadmap and prioritize features.
You can learn more, including how to opt-out if you'd not like to participate in this anonymous program, by visiting the following URL:
https://turborepo.dev/docs/telemetry

::error::docs#lint:external-links: command (/home/runner/work/web/web/apps/docs) /home/runner/setup-pnpm/node_modules/.bin/pnpm run lint:external-links -- --json exited (1)
 ERROR  run failed: command  exited (1)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
apps/docs/scripts/lint-external-links.ts (1)

153-153: MARKDOWN_IMAGE_REGEX.lastIndex = 0 is dead code — the regex is never iterated.

MARKDOWN_IMAGE_REGEX is reset here but has no corresponding while loop in collectFileLinks. Image links are already excluded by the (?<!!) negative lookbehind in MARKDOWN_LINK_REGEX (line 30), so this reset serves no purpose.

🧹 Proposed cleanup
-  MARKDOWN_IMAGE_REGEX.lastIndex = 0;
   MARKDOWN_LINK_REGEX.lastIndex = 0;
   AUTOLINK_REGEX.lastIndex = 0;
   HTML_ANCHOR_REGEX.lastIndex = 0;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/docs/scripts/lint-external-links.ts` at line 153, Remove the dead reset
of the MARKDOWN_IMAGE_REGEX in collectFileLinks: the statement
"MARKDOWN_IMAGE_REGEX.lastIndex = 0" is unnecessary because MARKDOWN_IMAGE_REGEX
is never iterated and image links are already excluded by the negative
lookbehind in MARKDOWN_LINK_REGEX; delete that line (referencing
MARKDOWN_IMAGE_REGEX and the collectFileLinks function) to clean up the
codebase.
.github/workflows/external-links.yml (1)

33-42: Add a step-level timeout to prevent runaway link-checker runs.

Without a timeout-minutes cap, a large docs site with hundreds of slow-responding external URLs could stall the job indefinitely (the script-level TIMEOUT_MS only gates individual URL fetches, not the whole run).

⏱️ Proposed fix
     - name: Run external link checker
       id: external-links
+      timeout-minutes: 30
       shell: bash
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/external-links.yml around lines 33 - 42, The "Run external
link checker" step (id: external-links) needs a step-level timeout to avoid
runaway runs; add a timeout-minutes value (for example timeout-minutes: 30 or an
appropriate cap for your CI budget) to the step definition that runs "pnpm run
lint:external-links" so the whole step is terminated if it exceeds that limit,
keeping the existing commands and exit handling intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In @.github/workflows/external-links.yml:
- Around line 33-42: The "Run external link checker" step (id: external-links)
needs a step-level timeout to avoid runaway runs; add a timeout-minutes value
(for example timeout-minutes: 30 or an appropriate cap for your CI budget) to
the step definition that runs "pnpm run lint:external-links" so the whole step
is terminated if it exceeds that limit, keeping the existing commands and exit
handling intact.

In `@apps/docs/scripts/lint-external-links.ts`:
- Line 153: Remove the dead reset of the MARKDOWN_IMAGE_REGEX in
collectFileLinks: the statement "MARKDOWN_IMAGE_REGEX.lastIndex = 0" is
unnecessary because MARKDOWN_IMAGE_REGEX is never iterated and image links are
already excluded by the negative lookbehind in MARKDOWN_LINK_REGEX; delete that
line (referencing MARKDOWN_IMAGE_REGEX and the collectFileLinks function) to
clean up the codebase.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b6e4c0c and ae60e40.

📒 Files selected for processing (4)
  • .github/workflows/external-links.yml
  • .github/workflows/lychee.yml
  • apps/docs/scripts/lint-external-links.ts
  • package.json
💤 Files with no reviewable changes (1)
  • .github/workflows/lychee.yml

coderabbitai[bot]
coderabbitai bot previously approved these changes Feb 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant