-
Notifications
You must be signed in to change notification settings - Fork 162
openhcl/underhill_attestation: Attempt hardware unseal between SKR attempts #2630
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the retry logic for VMGS encryption key retrieval in OpenHCL attestation. The retry loop is moved from request_vmgs_encryption_keys to initialize_platform_security to enable hardware unsealing attempts between SKR (Secure Key Release) attempts. This is intended to reduce load on MAA/AKV services during brownout recovery by trying local hardware unsealing before retrying remote attestation.
Key Changes:
- Moved retry loop from
request_vmgs_encryption_keystoinitialize_platform_security - Changed return type of
request_vmgs_encryption_keysto include a retry signal boolean - Created new
unlock_vmgshelper function that attempts SKR and then key derivation (including hardware unsealing) - Updated error handling to propagate retry signals through the call stack
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| openhcl/underhill_attestation/src/secure_key_release.rs | Removed retry loop logic, changed return type to include retry signal, removed LocalDriver parameter and MaximumAttemptsReached error variant |
| openhcl/underhill_attestation/src/lib.rs | Added unlock_vmgs function, refactored initialize_platform_security with new retry loop, updated error handling to propagate retry signals, updated tests to reflect new attempt count |
| VecDeque::from([ | ||
| IgvmAgentAction::RespondSuccess, | ||
| // initialize_platform_security will attempt SKR/unlock 10 times | ||
| IgvmAgentAction::RespondFailure, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you see the 10 retries on your local test? I recalled that the test agent in GED currently returns retry signal as false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if sealing is available, shouldn't the boot succeed in the very first failure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think you're right -- let me play with this a bit more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did confirm that if I modified the test agent to return retry=true, I see the retries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. Not-blocking- We can have a follow up to let retry bit configurable. Maybe part of the plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did confirm that if I modified the test agent to return retry=true, I see the retries.
But if the hw resiliency kicks on, why do you see retries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I should have specified: for init_sec_secure_key_release_no_hw_sealing_backup, where HW sealing fails, I see retries.
For brownout recovery purposes, we want to avoid slamming MAA/AKV with traffic before trying to locally unseal CVMs. OpenHCL will currently make 10 attestation/SKR attempts before trying to use hardware unsealing.
This change moves the loop inside of request_vmgs_encryption_keys out to initialize_platform_security, looping over SKR and key derivation (which includes trying hardware unseal).