@roberth roberth commented May 15, 2025

Hi 👋

This implements markhildreth's suggestion from the conversation in #1238: a `canonical-site-url` setting.
<link rel="canonical"> plays an important role in avoiding an SEO penalty when users deploy multiple versions of their site.

I've additionally infixed -site- to highlight the relationship with the site-url, and to distinguish it from the canonical URL as it occurs in a page.

This is a more user-friendly alternative to #2415, as it's an "end to end" solution with documentation. (#2415 may have uses beyond this use case, though.)

Let me know if there's anything I should improve.
When this is done, I'd like to implement a setting for opting in to clean URLs without the .html suffix so that we can close the whole issue.

Changelog suggestion (not included due to repeated conflicts...)

- Added [`canonical-site-url`](https://rust-lang.github.io/mdBook/format/configuration/renderers.html?highlight=canonical-site-url#html-renderer-options) setting, to set `<link rel="canonical">` in the HTML output of each page.

@rustbot rustbot added the S-waiting-on-review Status: waiting on a review label May 15, 2025
@roberth roberth force-pushed the canonical-site-url branch from e421180 to 9c26fc4 Compare May 25, 2025 14:10
@roberth roberth force-pushed the canonical-site-url branch from 9c26fc4 to cbca16c Compare May 27, 2025 07:28
@roberth roberth force-pushed the canonical-site-url branch from cbca16c to fe29dc1 Compare July 24, 2025 15:53
@pinage404 pinage404 left a comment

I tested locally, it works great

I hope a solution will be merged

@whitequark

I'd really like to have this functionality; is there a way I can help advance this PR? I want it for #1238.

@roberth roberth force-pushed the canonical-site-url branch from fe29dc1 to 2554834 Compare October 18, 2025 20:32
@roberth
Author

roberth commented Oct 18, 2025

I've rebased and put my suggestion for a changelog entry in the PR description, because it kept causing conflicts.

Let us know if there's anything else we can do!

@roberth roberth force-pushed the canonical-site-url branch from 2554834 to 4bff912 Compare October 28, 2025 14:45
@roberth roberth force-pushed the canonical-site-url branch from 4bff912 to 1231a81 Compare October 28, 2025 14:50
<base href="{{ base_url }}">
{{/if}}
{{#if canonical_url}}
<link rel="canonical" href="{{ canonical_url }}">

What do you think about adding also the Open Graph URL ?

<meta property="og:url" content="{{ canonical_url }}" />

Author

In my use case, where we have versioned copies of the docs plus latest, we wouldn't want social media to rewrite links to point to latest; that would make it impossible to share this kind of "permalink". So I don't think this is automatically desirable just because of the SEO-oriented canonical URL.

IIUC you could add it in the custom html below if it makes sense for your use case.

I guess if you have more equivalent locations in terms of content, you may want social media to rewrite those?
Maybe if your content is published by others too in places where you can't get them to redirect or something, but you can control (a bit) the files that get published. (Sounds almost adversarial, but then it would be pointless to try. Maybe this can happen with some systems/ecosystems that cause duplicates of stuff to be published everywhere? idk - I'll stop speculating)

Author

I guess the more interesting question is: would you want to rewrite social media links without also hinting the search engines?
If that's a use case, maybe we should only define canonical_url and leave the socials-vs-search decision up to the custom HTML template, to keep things simple for the mdBook implementation?
If we don't have such a use case, I'd like to keep this built-in behavior right here, because otherwise you have a weird setting that has no effect unless you customize your theme. That seems like a step too far to me in terms of UX.
So right back at ya, I guess :)
What do you think of removing <link rel="canonical" .../>?

@pinage404 pinage404 Nov 5, 2025

You are right, I can add the <meta property="og:url" content="{{ canonical_url }}" /> in my own head.hbs

I would expect to get the <link rel="canonical" .../> by default when I provide a canonical_url; the UX is better that way, so you are right about that too!

So, let's change nothing 😅

(I don't know how to close this thread)

@roberth roberth force-pushed the canonical-site-url branch from 1231a81 to df6f055 Compare November 7, 2025 00:44
@rustbot
Collaborator

rustbot commented Nov 7, 2025

This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

Based on the conversation in rust-lang#1238,
this implements the suggestion by markhildreth to implement such a setting.

I've additionally infixed `-site-` to highlight the relationship with the
`site-url`, and to distinguish it from the canonical URL as it occurs in a
page.
@roberth roberth force-pushed the canonical-site-url branch from df6f055 to a5ef49b Compare January 17, 2026 12:03
@pinage404 pinage404 left a comment

It still works great

@GuillaumeGomez
Member

Looks good to me, thanks!

cc @ehuss

@ehuss
Contributor

ehuss commented Jan 17, 2026

Can you please explain the relationship and picture for how this relates with:

When might these be different? How do we avoid adding four different settings that specify the "url" of the site? If we do need to add multiple URLs, is there a way we can do that so it isn't too confusing?

Some other questions:

  • What are the consequences of setting this?
  • How would this deal with the first page being translated to index.html?

@GuillaumeGomez What is your interest in this feature? Do you want this for doc.rust-lang.org?

@GuillaumeGomez
Member

No interest for docs.rs. Just a feature that seems useful for people hosting their books (like we do for askama for example).

@roberth
Author

roberth commented Jan 17, 2026

Sitemap purpose: Tell search engines "here are all the pages that exist at this location"

Canonical purpose: Tell search engines "if you found this page, the authoritative version is at this other location"

site-url: The url where the book will be hosted. This is required to ensure navigation links and script/css imports in the 404 file work correctly, even when accessing urls in subdirectories. Defaults to /. If site-url is set, make sure to use document relative links for your assets, meaning they should not start with /. — mdbook docs


> how this relates

site-url <-> canonical-site-url

The latter is optional. If unspecified, pages are generated as usual, and crawlers assume that

  • if the content is unique, it is canonical
  • if the content is not unique, it is duplicated, and every site that hosts it is considered less valuable.
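
To make the "optional" part concrete, here is a minimal sketch of the behavior described above. It is not the code in this PR: the function name `canonical_link_tag`, its parameters, and the example URL in the usage note are made up for illustration, and HTML escaping of the URL is left out.

```rust
/// Hypothetical helper, for illustration only (not mdBook's API).
///
/// `canonical_site_url` stands in for the proposed `canonical-site-url`
/// setting; `page_path` is the page's path relative to the book root,
/// e.g. "guide/intro.html".
///
/// Returns `None` when the setting is unspecified, so no tag is rendered
/// and the page is generated as usual.
fn canonical_link_tag(canonical_site_url: Option<&str>, page_path: &str) -> Option<String> {
    // Unspecified setting: bail out early with `None`.
    let base = canonical_site_url?;

    // Join base and page path with exactly one slash between them.
    let url = format!(
        "{}/{}",
        base.trim_end_matches('/'),
        page_path.trim_start_matches('/')
    );

    // Assumption: the URL needs no HTML escaping in this sketch.
    Some(format!(r#"<link rel="canonical" href="{}">"#, url))
}
```

With this sketch, every versioned copy built with `Some("https://example.com/book/latest/")` would emit the same tag for `guide/intro.html`, while a build with `None` emits nothing.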

Sitemap URL <-> site-url

Not entirely clear to me; not the thing I've focused on.

  1. I assume each site would contain a correct sitemap file based on site-url.
  2. Each HTML page can reference a sitemap. I don't know what's best practice for that.
  • Maybe it's good SEO practice to point to the canonical sitemap instead of the local sitemap, but I don't really know.
  • I imagine for other tools that use the sitemap (browser extensions? accessibility? LLM harnesses?), those would prefer a reference to the local sitemap.

Link support of site-url

My assumption would be that any links that are not relative (to the page base directory) would be interpreted against site-url. I'm surprised that site-url does not have to specify a full URL with an authority part.

My model for this

For the intrinsic structure of the site, in order: use relative paths, or site-url, preferably relative to /, alternatively relative to the absolute site URL.
I'm surprised that site-url can be relative, but I can see that being useful for e.g. having the same content on staging and prod.

Only use a different URL where it matters for such things as SEO.
  • Sitemap content URL in the file itself: always the absolute site URL
  • Sitemap link in each page: default to absolute site URL or canonical URL + /sitemap.xml. Unsure, so maybe users should choose => add sitemap-link-url setting that defaults to either one (don't know which).
  • Canonical URL: separate URL, do not render unless specified.


> If we do need to add multiple URLs, is there a way we can do that so it isn't too confusing?

This complexity is intrinsic to the web domain, but users with a simple one-version, one-location deployment can perhaps be somewhat shielded from it by clearly marking the extra URL settings as "optional" or "for multi-version deployments" in the docs.


> What are the consequences of setting this?

Higher rankings in search engines, and visitors are routed to your preferred version of the site.


> How would this deal with the first page being translated to index.html?

Not sure what's special about this?
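
One possible reading of the question: the first chapter is written twice, once at its own path and once as index.html, so the same content is reachable at two URLs. Below is a minimal sketch of one way both copies could share a canonical path. The names `canonical_path` and `first_chapter` are hypothetical, and nothing here is confirmed to be what this PR does.

```rust
/// Hypothetical normalization, for illustration only (not mdBook's API).
///
/// `page_path` is the rendered output path relative to the book root.
/// `first_chapter` is assumed to be the output path of the book's first
/// chapter, whose content is also emitted as "index.html".
///
/// Returns the path that would be appended to the canonical site URL.
fn canonical_path<'a>(page_path: &'a str, first_chapter: &'a str) -> &'a str {
    if page_path == "index.html" {
        // The index copy points at the first chapter's own URL,
        // so both copies advertise a single canonical location.
        first_chapter
    } else {
        // Every other page is canonical at its own path.
        page_path
    }
}
```

The opposite choice (treating index.html, or the bare directory URL, as canonical for the first chapter) would be just as easy to express; which one is preferable is an open question in this sketch.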
