Skip to content

Conversation

@stunes-ms
Copy link
Contributor

For brownout recovery purposes, we want to avoid slamming MAA/AKV with traffic before trying to locally unseal CVMs. OpenHCL will currently make 10 attestation/SKR attempts before trying to use hardware unsealing.

This change moves the loop inside of request_vmgs_encryption_keys out to initialize_platform_security, looping over SKR and key derivation (which includes trying hardware unseal).

@stunes-ms stunes-ms requested a review from a team as a code owner January 9, 2026 19:12
Copilot AI review requested due to automatic review settings January 9, 2026 19:12
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the retry logic for VMGS encryption key retrieval in OpenHCL attestation. The retry loop is moved from request_vmgs_encryption_keys to initialize_platform_security to enable hardware unsealing attempts between SKR (Secure Key Release) attempts. This is intended to reduce load on MAA/AKV services during brownout recovery by trying local hardware unsealing before retrying remote attestation.

Key Changes:

  • Moved retry loop from request_vmgs_encryption_keys to initialize_platform_security
  • Changed return type of request_vmgs_encryption_keys to include a retry signal boolean
  • Created new unlock_vmgs helper function that attempts SKR and then key derivation (including hardware unsealing)
  • Updated error handling to propagate retry signals through the call stack

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
openhcl/underhill_attestation/src/secure_key_release.rs Removed retry loop logic, changed return type to include retry signal, removed LocalDriver parameter and MaximumAttemptsReached error variant
openhcl/underhill_attestation/src/lib.rs Added unlock_vmgs function, refactored initialize_platform_security with new retry loop, updated error handling to propagate retry signals, updated tests to reflect new attempt count

@github-actions
Copy link

github-actions bot commented Jan 9, 2026

@stunes-ms stunes-ms changed the title Attempt hardware unseal between SKR attempts openhcl/underhill_attestation: Attempt hardware unseal between SKR attempts Jan 9, 2026
VecDeque::from([
IgvmAgentAction::RespondSuccess,
// initialize_platform_security will attempt SKR/unlock 10 times
IgvmAgentAction::RespondFailure,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you see the 10 retries on your local test? I recalled that the test agent in GED currently returns retry signal as false.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if sealing is available, shouldn't the boot succeed in the very first failure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think you're right -- let me play with this a bit more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did confirm that if I modified the test agent to return retry=true, I see the retries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Not-blocking- We can have a follow up to let retry bit configurable. Maybe part of the plan.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did confirm that if I modified the test agent to return retry=true, I see the retries.

But if the hw resiliency kicks on, why do you see retries?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I should have specified: for init_sec_secure_key_release_no_hw_sealing_backup, where HW sealing fails, I see retries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants