
Handle xmllint versions that return zero even if validation fails #700

@climbfuji


From a comment on PR #691:

I am not sure I have remembered this correctly, and I don't know the best way to deal with it. I asked ChatGPT which versions of xmllint are "good" (exit code /= 0 if XML validation fails) and which are "bad" (exit code == 0 if validation fails), but the answer isn't simple (see below).
It seems the best way to handle these xmllint differences is not to use `call_command` as it is, but to check for both (a) a return code /= 0 and (b) text in stdout/stderr suggesting that validation failed. This is definitely beyond the scope of this PR and should be addressed separately. The PR that addresses these xmllint issues must ensure that both good and bad versions of xmllint are installed in GitHub Actions so that we can test the solution. For now, since GitHub Actions and my laptop both have xmllint versions that behave correctly (rc /= 0), the code in this PR works correctly.
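A minimal Python sketch of the two-part check described above, assuming validation is done against an XSD with `xmllint --noout --schema`. It calls `subprocess.run` directly rather than the project's `call_command`; the function names and the `FAILURE_MARKERS` list are hypothetical, and the markers are substrings xmllint is commonly seen to print on failure, not an exhaustive or guaranteed list.

```python
import subprocess

# Hypothetical list of substrings that xmllint commonly prints when parsing or
# validation fails. Not exhaustive; extend as new failure messages are seen.
FAILURE_MARKERS = (
    "fails to validate",
    "parser error",
    "validity error",
    "namespace error",
)


def xmllint_output_indicates_failure(returncode: int, stdout: str, stderr: str) -> bool:
    """Return True if either the exit code or the output suggests failure."""
    if returncode != 0:
        return True  # a "good" xmllint already signals failure here
    # A "bad" xmllint may exit 0, so also scan both output streams.
    text = (stdout + "\n" + stderr).lower()
    return any(marker in text for marker in FAILURE_MARKERS)


def validate_xml(xml_file: str, schema_file: str, xmllint: str = "xmllint") -> bool:
    """Run xmllint against an XSD schema; return True only if validation passed."""
    result = subprocess.run(
        [xmllint, "--noout", "--schema", schema_file, xml_file],
        capture_output=True,
        text=True,
        check=False,  # inspect the return code ourselves instead of raising
    )
    return not xmllint_output_indicates_failure(
        result.returncode, result.stdout, result.stderr
    )
```

Keeping the decision in a pure function means the "bad xmllint" case (exit code 0 plus an error message) can be unit-tested without installing a misbehaving xmllint; this complements, rather than replaces, testing against real good and bad versions in GitHub Actions. One caveat of plain substring matching: a file path that happens to contain one of the markers would produce a false positive.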

ChatGPT's answer:

Good question. I dug through public bug reports, mailing lists and forum posts; unfortunately there is not a clean, authoritative “matrix” mapping every version of xmllint/libxml2 to its exit-code behavior. But I can summarise the known confirmed cases, plus what is unknown. Use this as a working “table of suspicion” (not a guarantee).


✅ What we know (sources / reports)

| libxml2 / xmllint version (or build) | Observed “bad” behavior (validation or namespace error → exit code 0) | Source / context |
|---|---|---|
| “libxml version 20912” (Ubuntu 20.04) | Namespace error (undefined prefix), but exit code 0 | Mailing-list report: a user on Ubuntu 20.04 ran `xmllint --noout text.xml; echo $?` and got 0 despite a “namespace error : Namespace prefix v … not defined” message. (GNOME Mail) |
| “libxml version 20510” (an older distro) | Schema validation failure (content invalid per XSD), but exit code 0 | Report on Unix.com: a user ran `xmllint --schema … myinput.xml` and saw a “fails to validate” message, but `$?` was 0. Their xmllint reported “using libxml version 20510” in that session. (Unix Linux Community) |
| Various versions (unspecified) | Inconsistent exit codes: sometimes non-zero, sometimes zero, depending on error type or build | The upstream bug tracker has an issue titled “xmllint doesn’t always terminate with a non-zero exit status in case of bad usage”. (GitLab GNOME) |
| Unspecified builds (e.g. older Debian; SGML/DTD-based validation) | HTML/DTD validation sometimes fails but the exit code remains 0 | A Debian bug report, “XHTML 1.0 validation is broken”, includes the comment that “the return code of xmllint is 0 … don’t look at it; it is not reliable.” (Google Groups) |

Summary of what’s reliably documented: there are definitely some builds of libxml2/xmllint, including versions 20510 and 20912, that have been observed to treat validation or namespace errors as “no error” as far as the exit status is concerned.
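The version numbers in these reports use libxml2's integer encoding (major × 10000 + minor × 100 + micro), so 20510 is 2.5.10 and 20912 is 2.9.12. Below is a small sketch for logging the version in CI, assuming the `xmllint --version` banner contains “using libxml version” followed by that integer. The banner's format and output stream may vary between builds, in which case the helper returns None; both function names are hypothetical.

```python
import re
import subprocess
from typing import Optional, Tuple


def decode_libxml_version(number: int) -> Tuple[int, int, int]:
    """Split a LIBXML_VERSION-style integer (major*10000 + minor*100 + micro)."""
    return number // 10000, (number // 100) % 100, number % 100


def xmllint_libxml_version(xmllint: str = "xmllint") -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, micro) parsed from `xmllint --version`, or None."""
    try:
        result = subprocess.run(
            [xmllint, "--version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None  # xmllint is not installed or not on PATH
    # The banner has been seen on stderr; search both streams to be safe.
    # Require 5+ digits so a dotted form like "2.13.0" is not misparsed.
    match = re.search(r"using libxml version (\d{5,})\b", result.stdout + result.stderr)
    if match is None:
        return None  # unexpected banner format
    return decode_libxml_version(int(match.group(1)))
```

For example, `decode_libxml_version(20912)` returns `(2, 9, 12)`. As noted in the practical advice on empirical testing, a version number alone does not settle the behavior, so this is mainly useful for recording which build a CI runner used.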


⚠ What we don’t know (lack of evidence / no systematic listing)

  • There is no publicly maintained, comprehensive registry that maps every libxml2 release (or distro build) to “good” vs “bad” exit-code behavior.

  • Many bug reports or forum threads omit the exact version (or build metadata) of xmllint, or only mention a generic “xmllint” — so you can’t reliably attribute behavior to a version number.

  • There may well be additional “bad” versions beyond 20510 and 20912, but without a public report that includes version information, they remain suspected, not confirmed.

  • Fixes or regressions may occur in particular platform/distribution builds (e.g. Debian, Ubuntu, Gentoo, MSYS2), so even a version number doesn’t guarantee consistent behavior if compilation flags, patches, or packaging differ.

Finally, note that some conditions a user would consider a “validation failure” (e.g. an undefined namespace prefix, a missing DTD, a missing schema import) may be treated differently internally, perhaps as warnings rather than validation errors, which may also affect the exit code.


🎯 What this means in practice

  • You cannot rely solely on xmllint’s exit code to guarantee that validation (schema, namespace, DTD) succeeded — at least not on all systems.

  • If you're writing scripts or CI checks that must catch invalid XML, you should combine exit-code checking with parsing stdout/stderr (look for “error”, “fails to validate”, etc.).

  • If you control your toolchain, it may be wise to upgrade to a recent libxml2 build, test that the exit code behaves as expected, and possibly patch or wrap xmllint.

  • Before trusting xmllint in a new environment, run a minimal test: feed it intentionally invalid XML (e.g. an undefined namespace prefix or a schema violation) and check the exit code. That empirical test is more reliable than trusting version numbers.
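The empirical test from the last bullet could look roughly like this sketch, which validates a deliberately invalid file against a tiny XSD and reports whether xmllint exits non-zero. The schema, the file contents and the function name are illustrative assumptions; an undefined-namespace-prefix case could be probed in the same way with `xmllint --noout` alone.

```python
import os
import subprocess
import tempfile
from typing import Optional

# Minimal schema: the root element must contain an integer.
PROBE_XSD = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root" type="xs:integer"/>
</xs:schema>
"""

# Well-formed instance that deliberately violates the schema above.
PROBE_BAD_XML = '<?xml version="1.0"?>\n<root>not-a-number</root>\n'


def xmllint_exit_code_is_reliable(xmllint: str = "xmllint") -> Optional[bool]:
    """Validate a known-invalid file and report how xmllint's exit code behaves.

    Returns True if xmllint exits non-zero ("good"), False if it exits 0
    ("bad"), or None if the xmllint executable could not be found.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        xsd_path = os.path.join(tmpdir, "probe.xsd")
        xml_path = os.path.join(tmpdir, "probe.xml")
        with open(xsd_path, "w") as handle:
            handle.write(PROBE_XSD)
        with open(xml_path, "w") as handle:
            handle.write(PROBE_BAD_XML)
        try:
            result = subprocess.run(
                [xmllint, "--noout", "--schema", xsd_path, xml_path],
                capture_output=True,
                text=True,
                check=False,
            )
        except FileNotFoundError:
            return None  # xmllint is not installed or not on PATH
        return result.returncode != 0
```

Running this once at the start of a CI job, and logging the result, would show which kind of xmllint each runner has; a `False` result signals that the exit code alone should not be trusted on that system.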


If you like, I can try to survey (via open-source repositories and distributions) what the “latest” libxml2 versions (2024–2025) do, i.e. test whether the exit-code behavior has been fixed upstream. That might help you decide whether updating helps. Do you want me to start such a survey?


Originally posted by @climbfuji in #691 (comment)
