Skip to content

Conversation

@erfrimod
Copy link
Contributor

@erfrimod erfrimod commented Jan 9, 2026

Adding tracing spans to control-path operations in VF Manager to try and help answer two questions:

  1. If a VF Manager operation like ManaDeviceRemoved fails to complete in a timely fashion, which 'await' call are we stuck in? It could be try_notify_guest_and_revoke_vtl0_vf(), shutdown_vtl2_device(), or update_vtl2_device_bind_state() and tracing spans will provide entry and exit opcodes.
  2. If a VF Manager operation like ManaDeviceRemoved starts taking longer and missing SLAs, tracing spans can help us narrow down which call is taking up additional time.
  • Improves NetVSP observability by adding structured tracing spans around VF Manager operations: VTL2 device startup/shutdown, VTL0 VF arrival/removal/revoke, endpoint disconnects, bind-state updates.
  • Makes NetVSP device logging more actionable by adding instance_id and/or channel_idx to queue/channel errors: open/close/restore.
  • Minor consistency improvements to error tracing. err => error and making that field the first element.
  • Readability improvement to state transition logic in DataPathSwitchPending handler. Turning double-nested 'if' check into a 'match(a,b)'.
  • Fixes to minor typos found in comments in net_mana and tracelimit..

Copilot AI review requested due to automatic review settings January 9, 2026 00:22
@erfrimod erfrimod requested a review from a team as a code owner January 9, 2026 00:22
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances observability and tracing in the NetVSP device by adding structured tracing spans around VF Manager control-path operations and improving error logging consistency.

Key changes:

  • Adds tracing spans to VF Manager operations (VTL2 device startup/shutdown, VTL0 VF arrival/removal/revoke, endpoint disconnects, bind-state updates) to help identify which async operations are blocking or taking too long
  • Standardizes error field naming from err to error and positions error fields first in trace logs for consistency
  • Adds instance_id and/or channel_idx context to device and channel error logs for better diagnostics
  • Refactors nested conditionals to match expressions for improved readability
  • Fixes minor typos in comments

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
vm/devices/net/netvsp/src/lib.rs Standardizes error logging (err→error), adds instance_id/channel_idx context to device errors, refactors state transition logic, removes redundant imports, fixes typos
vm/devices/net/net_mana/src/lib.rs Fixes comment typos (Hardware→hardware, Encapsulation→encapsulation, affects grammar)
support/tracelimit/src/lib.rs Fixes typo in documentation example (custome→custom)
openhcl/underhill_core/src/emuplat/netvsp.rs Adds Display impl for Vtl0Bus, adds tracing spans to VF Manager operations, extracts update_vtl2_device_bind_state helper method, adds save_state tracing

chris-oo
chris-oo previously approved these changes Jan 12, 2026
Copy link
Member

@chris-oo chris-oo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but see comment if that's useful or not.

PrimaryChannelGuestVfState::DataPathSynthetic
} else {

match (to_guest, result) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you want to log the actual matrix of errors here or is it not useful?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants