meshtls: log errors parsing client certs #2467
This commit changes the `linkerd-meshtls-rustls` crate to use the upstream `rustls-webpki` crate, maintained by Rustls, rather than our fork of `briansmith/webpki` from GitHub. Since `rustls-webpki` includes the change which was the initial motivation for the `linkerd/webpki` fork (rustls/webpki#42), we can now depend on upstream.
This picks up the upstream fix for rustls/webpki#167.
Currently, if errors occur while parsing a client identity from a TLS certificate, the `client_identity` function in `linkerd-meshtls-rustls` simply discards the error and returns `None`. This means that we cannot easily determine *why* a connection has no client identity: there may have been no client cert, or a client cert may have been present but failed to parse.

In order to make debugging these issues a little easier, I've changed this function to log any errors returned by `rustls-webpki` while parsing client certs.
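The before/after behavior can be illustrated with a minimal, std-only sketch. Everything in it is hypothetical: `CertError` stands in for the error type `rustls-webpki` returns, `parse_dns_name` is a toy parser that treats the certificate bytes as a UTF-8 name instead of parsing DER, and `eprintln!` stands in for whatever logging macro the proxy actually uses (presumably a `tracing` event). The point is only the shape of the change: the "no cert" case still returns `None` silently, while a cert that fails to parse now leaves a log line explaining why.

```rust
use std::fmt;

/// Hypothetical stand-in for the errors `rustls-webpki` can return while
/// parsing an end-entity certificate (the real crate has its own error enum).
#[derive(Debug, Clone, PartialEq)]
enum CertError {
    BadDer,
    NoDnsName,
}

impl fmt::Display for CertError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CertError::BadDer => write!(f, "certificate is not valid DER"),
            CertError::NoDnsName => write!(f, "certificate has no valid DNS name"),
        }
    }
}

/// Hypothetical toy parser: treats the "certificate" as UTF-8 bytes holding a
/// dotted name. Real code would parse DER via `rustls-webpki`.
fn parse_dns_name(cert: &[u8]) -> Result<String, CertError> {
    let s = std::str::from_utf8(cert).map_err(|_| CertError::BadDer)?;
    if s.is_empty() || !s.contains('.') {
        return Err(CertError::NoDnsName);
    }
    Ok(s.to_string())
}

/// Returns the client's identity. A missing cert yields `None` silently;
/// a cert that fails to parse yields `None` *and* logs the reason.
fn client_identity(peer_certs: Option<&[Vec<u8>]>) -> Option<String> {
    // No client certificate at all: nothing to log.
    let end_entity = peer_certs?.first()?;
    match parse_dns_name(end_entity) {
        Ok(name) => Some(name),
        Err(error) => {
            // Previously an error like this was simply discarded.
            eprintln!("failed to parse client certificate: {error}");
            None
        }
    }
}
```

Both failure paths still return `None`, so callers are unaffected; only the diagnostics differ.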
olix0r left a comment
My understanding is that the libraries already have some onerous logging behavior in these cases. Is that true? Can you test what happens when this behavior is triggered?
I believe you're thinking of … These errors would occur if a TLS handshake completed successfully, but we couldn't extract a valid DNS name from the client cert in that handshake. In that case, …
328826caa updated the balancer's discovery channel to prevent backing up into the discovery stream by dropping the discovery stream. This results in balancers becoming permanently stale (should they ever be used again).

This change modifies the discovery stream so that these errors are fatal for the balancer. These errors are recorded distinctly by the error counters.

To fix this, we replace the `DiscoverNew` module with a `discover::NewServices` module that wraps the buffering layer. The buffer now only holds target metadata, and services are only built as the entry is dequeued from the channel. This has the (positive) side-effect that the proxy's `stack_create_total` metric will not be incremented before the balancer actually uses an endpoint stack. Previously, this metric would be incremented for all queued endpoint updates.

We also now log at INFO the address of all additions and removals from a balancer. This should dramatically improve diagnostics in stale endpoint situations.

---

* build(deps): bump DavidAnson/markdownlint-cli2-action (linkerd/linkerd2-proxy#2460)
* build(deps): bump tj-actions/changed-files from 36.2.1 to 39.0.2 (linkerd/linkerd2-proxy#2468)
* build(deps): bump EmbarkStudios/cargo-deny-action from 1.5.0 to 1.5.4 (linkerd/linkerd2-proxy#2448)
* meshtls: log errors parsing client certs (linkerd/linkerd2-proxy#2467)
* build(deps): bump actions/checkout from 3.5.0 to 4.1.0 (linkerd/linkerd2-proxy#2474)
* build(deps): bump tj-actions/changed-files from 39.0.2 to 39.2.0 (linkerd/linkerd2-proxy#2475)
* build(deps): bump EmbarkStudios/cargo-deny-action from 1.5.4 to 1.5.5 (linkerd/linkerd2-proxy#2478)
* build(deps): bump DavidAnson/markdownlint-cli2-action (linkerd/linkerd2-proxy#2476)
* build(deps): bump actions/upload-artifact from 3.1.2 to 3.1.3 (linkerd/linkerd2-proxy#2479)
* Render grpc_status metric label as number (linkerd/linkerd2-proxy#2480)
* balance: Log and fail stuck discovery streams (linkerd/linkerd2-proxy#2484)
* build(deps): update `rustix` to v0.36.16/v0.37.7 (linkerd/linkerd2-proxy#2488)
* balance: Fail the discovery stream on queue backup (linkerd/linkerd2-proxy#2486)

Signed-off-by: Oliver Gould <ver@buoyant.io>
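The buffering change in the commit above can be sketched synchronously with a bounded `std::sync::mpsc::sync_channel`. This is not the proxy's `discover::NewServices` (the proxy is built on async Tokio/Tower services); `Update`, `DiscoveryError`, `Endpoint`, `forward`, and `drain` are hypothetical names. The sketch shows two ideas from the description: the queue holds only target metadata (an address), with the "service" built when the entry is dequeued, and a full queue surfaces as a fatal error instead of silently dropping the discovery stream.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical discovery update carrying only target metadata, not a built service.
#[derive(Debug, Clone, PartialEq)]
enum Update {
    Insert(String),
    Remove(String),
}

/// Hypothetical fatal errors for the discovery stream.
#[derive(Debug, PartialEq)]
enum DiscoveryError {
    /// The balancer is not draining updates; fail rather than go stale.
    QueueFull,
    /// The balancer side of the channel is gone.
    Closed,
}

/// Pushes an update into the bounded buffer, turning backpressure into an error.
fn forward(tx: &SyncSender<Update>, update: Update) -> Result<(), DiscoveryError> {
    match tx.try_send(update) {
        Ok(()) => Ok(()),
        Err(TrySendError::Full(_)) => Err(DiscoveryError::QueueFull),
        Err(TrySendError::Disconnected(_)) => Err(DiscoveryError::Closed),
    }
}

/// Hypothetical endpoint "stack", only built once its update is dequeued.
#[derive(Debug)]
struct Endpoint {
    addr: String,
}

/// Drains queued updates, building endpoints on dequeue. `created` plays the
/// role of a stack-creation counter: it only counts endpoints actually used.
fn drain(rx: &Receiver<Update>, endpoints: &mut Vec<Endpoint>, created: &mut usize) {
    while let Ok(update) = rx.try_recv() {
        match update {
            Update::Insert(addr) => {
                eprintln!("adding endpoint {addr}");
                *created += 1;
                endpoints.push(Endpoint { addr });
            }
            Update::Remove(addr) => {
                eprintln!("removing endpoint {addr}");
                endpoints.retain(|e| e.addr != addr);
            }
        }
    }
}
```

With a capacity of two, a third `forward` before any `drain` returns `DiscoveryError::QueueFull`, and the rejected address is never counted as a created stack.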
Depends on #2465