@yamahata
Contributor

The TDX module supports a timer service for L1. When L1 writes the TSC deadline with TDG.VP.WR(TSC DEADLINE), tdg.vp.enter() exits with a preemption timer exit when the deadline expires. (It does not inject a timer interrupt into the L2 guest.)

Update the tdx vp context shared between the L1 kernel and L1 userspace so that openvmm can use the TDX timer service.
Unless userspace uses it, the L1 OHCL kernel behaves the same as before.

@yamahata yamahata force-pushed the ohcl-tdx-timer-service-2025-11-13 branch 4 times, most recently from 1e0e3f7 to 4b8ffbb Compare November 20, 2025 09:23
@yamahata yamahata changed the title Work-In-Progress: Ohcl tdx timer service support Ohcl tdx timer service support Nov 20, 2025
@yamahata
Contributor Author

This kernel now successfully boots an L2 Linux kernel, so I removed the work-in-progress marker.

@yamahata yamahata force-pushed the ohcl-tdx-timer-service-2025-11-13 branch 3 times, most recently from a6df45b to 8cce60e Compare November 20, 2025 19:33
@hargar19 hargar19 requested review from chris-oo and dcui November 24, 2025 19:47
*
* TDX TDVPS deadline:
* 0: immediate inject timer interrupt.
* -1: disarmed.
Member

is this disarmed value present in the spec, or just an effect of setting an all Fs TSC value?

Contributor Author

It's in the public spec,
Intel® Trust Domain Extensions (Intel® TDX) Module
TD Partitioning Architecture Specification
354807-005US
September 2025

23.13.2. L2 VM TSC Deadline Support

Setting TSC_DEADLINE to -1 disables its operation.

Contributor Author

I'll add this reference on the next update.

};

/*
* The L1 VMM needs to tell wake up time from HLT emulation because The host
Member

nit: capitalization here

Contributor Author

OK, will fix. Do you mean "The" => "the" after "because"? If not, please point out concretely which word to capitalize.

Member

Right, the "because The" here should be "because the".

}
raw_local_irq_enable();
} else {
enum TDX_HALT_TIMER armed;
Member

I think I'd want some other reviewers to chime in on how they want to manage this TDX-specific code here.

Contributor Author

As this change would be intrusive, review from someone else would help.

Contributor

@dcui dcui Dec 10, 2025

We already have a lot of x86/TDX-specific code in drivers/hv/mshv_vtl_main.c:

root@decui:~/OHCL-Linux-Kernel# grep CONFIG_X86_64 drivers/hv/mshv_vtl_main.c  | wc -l
21
root@decui:~/OHCL-Linux-Kernel# grep  CONFIG_INTEL_TDX_GUEST  drivers/hv/mshv_vtl_main.c  | wc -l
11
root@decui:~/OHCL-Linux-Kernel# grep  CONFIG_SEV_GUEST  drivers/hv/mshv_vtl_main.c  | wc -l
2

This PR adds more x86/TDX-specific code:

root@decui:~/OHCL-Linux-Kernel# grep CONFIG_X86_64 drivers/hv/mshv_vtl_main.c  | wc -l
28
root@decui:~/OHCL-Linux-Kernel# grep  CONFIG_INTEL_TDX_GUEST  drivers/hv/mshv_vtl_main.c  | wc -l
19
root@decui:~/OHCL-Linux-Kernel# grep  CONFIG_SEV_GUEST  drivers/hv/mshv_vtl_main.c  | wc -l
2

IMO ideally the x86/TDX-specific code should be moved to "arch/x86/hyperv/hv_vtl.c", but that would require a major refactoring and I suppose we would not like to block this PR for too long...

struct mshv_vtl_per_cpu *per_cpu = this_cpu_ptr(&mshv_vtl_per_cpu);
u64 vm_idx = TDG_VP_ENTRY_VM_IDX(context->entry_rcx);

if (is_tdx_vm_idx_valid(vm_idx))
Member

we use the prev value here because there wasn't an update call on this run? This handles the case when the timer was disarmed or disabled by the guest (because we set a large value of 0xFFs), is that right?

Contributor Author

> we use the prev value here because there wasn't an update call on this run?

Yes.

> This handles the case when the timer was disarmed or disabled by the guest (because we set a large value of 0xFFs), is that right?

Yes. Several cases are covered. The scenarios are:

  • The L2 (VTL0) guest updates the timer => userspace openvmm sets the deadline and update = 1, then runs the L2 vCPU. This can arm or disarm the timer depending on the value.
  • Optional: the kernel may run the L2 vCPU and return to L1 (VTL2) before the timer expires. In that case update is set to 0 and the value is remembered as the previous value.
  • In the L1 kernel, go to HLT emulation. At that point, look up the timer expiry value: the one in the context if update = 1, otherwise the remembered previous value.
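
The selection in the last scenario can be sketched in plain C. This is only an illustration of the rule described above, not the driver code: the struct `l2_deadline_ctx`, its field names, and the helpers `halt_deadline()` and `halt_timer_needed()` are hypothetical names. The disarmed value of -1 comes from the code comment quoted earlier in this review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per the quoted code comment, a deadline of -1 (all ones) means "disarmed". */
#define TSC_DEADLINE_DISARMED UINT64_MAX

/* Hypothetical mirror of the state shared between userspace and the kernel. */
struct l2_deadline_ctx {
	uint64_t deadline; /* value requested by userspace */
	uint8_t update;    /* 1 if userspace changed the deadline for this run */
};

/*
 * Pick the deadline the HLT emulation path should wait for: the context
 * value if userspace flagged an update, otherwise the value remembered
 * from an earlier run.
 */
uint64_t halt_deadline(const struct l2_deadline_ctx *ctx, uint64_t prev)
{
	assert(ctx != NULL);
	return ctx->update ? ctx->deadline : prev;
}

/* A disarmed deadline means no wakeup timer is needed. */
int halt_timer_needed(uint64_t deadline)
{
	return deadline != TSC_DEADLINE_DISARMED;
}
```

With update = 0 and a remembered value of `TSC_DEADLINE_DISARMED`, `halt_timer_needed()` returns 0, which corresponds to the disarmed case asked about above.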

@yamahata yamahata force-pushed the ohcl-tdx-timer-service-2025-11-13 branch from ac07f44 to 49cea0c Compare November 26, 2025 22:11
TD partitioning provides a timer service for L1 (VTL2) guest to set
a preemption timer for L2 (VTL0) vCPUs.

Add members for a new timer service to the tdx_vp_context struct for
the L1 (VTL2) userspace to pass a timeout value down to the L1 (VTL2)
kernel.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Refactor __tdcall() for a dedicated wrapper for TDG.VP.WR() operation.
This prepares for additional calls of TDG.VP.WR() cleanly while avoiding
repeated open-coding.

No functional change intended.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Program the TD partitioning TSC deadline timer service for L2 (VTL0) vCPUs
when the L1 (VTL2) userspace requests.  Then, the TDX module sets
preemption timer for L2 vCPU.  If the timer expires, the L2 (VTL0) vCPU
exits with a VMX preemption timer exit reason.  The mshv_vtl driver then
exits to the userspace, and the userspace is notified of the exit.

The TDX module does not clear TDVPS deadline on a preemption timer exit.
Disarm the TSC deadline explicitly on the preemption timer exit.  Otherwise
the following TDG.VP.ENTER() immediately exits without executing the L2
guest.

Reported-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
As the tdcall is slow, cache the previously written TSC deadline value and
skip the unnecessary tdg.vp.wr(TSC deadline) if the value doesn't change.
This also prepares for the HLT emulation case, which requires the previously
written TSC deadline value.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
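
The caching idea in the commit message above can be sketched as follows. This is a userspace illustration under stated assumptions, not the driver's code: `struct deadline_cache`, `fake_tdg_vp_wr_deadline()`, and `set_deadline_cached()` are hypothetical names, and the "tdcall" here is a stand-in that only counts invocations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TSC_DEADLINE_DISARMED UINT64_MAX

/* Hypothetical per-CPU cache of the last deadline handed to the TDX module. */
struct deadline_cache {
	uint64_t last_written;
	unsigned long tdcall_count; /* counts stand-in "tdcalls" for illustration */
};

/* Stand-in for a TDG.VP.WR wrapper; it only records that a write happened. */
void fake_tdg_vp_wr_deadline(struct deadline_cache *c, uint64_t deadline)
{
	c->tdcall_count++;
	c->last_written = deadline;
}

/* Issue the write only when the value changed; returns true if it was issued. */
bool set_deadline_cached(struct deadline_cache *c, uint64_t deadline)
{
	assert(c != NULL);
	if (c->last_written == deadline)
		return false;
	fake_tdg_vp_wr_deadline(c, deadline);
	return true;
}
```

Starting from a cache initialized to `TSC_DEADLINE_DISARMED`, writing the same deadline twice issues one stand-in tdcall, and the cached value stays available for a later HLT emulation lookup.
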
The TDX timer service sets a preemption timer for the L2 (VTL0) vCPU.
tdg.vp.enter() exits with preemption timer exit reason on timer expiry.
The HLT emulation path needs extra change where the L1 (VTL2) kernel issues
TDG.VP.VMCALL(HLT) because the host (L0) VMM doesn't know the L2 deadline
timer value.

When the L1 kernel issues TDG.VP.VMCALL(HLT), start a per-CPU hrtimer so
that a timer interrupt delivered to L1 wakes it up from the L0 HLT
emulation.  Cancel the hrtimer after the call returns from the L0 VMM.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
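
Arming an hrtimer requires converting the remaining TSC ticks into a time value. The arm64 build log quoted later in this thread shows the driver computing `mul_u64_u64_div_u64(deadline - now, 1000 * 1000, tsc_khz)`, which is ticks * 10^6 / tsc_khz, a value in nanoseconds. A minimal userspace sketch of that arithmetic follows. The function name `tsc_delta_to_ns()`, the saturation on overflow, and returning 0 for a deadline already in the past are assumptions made for this illustration, not behavior taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000u

/*
 * Convert the TSC ticks remaining until `deadline` into nanoseconds.
 * tsc_khz is ticks per millisecond, so ns = ticks * 10^6 / tsc_khz.
 * The division is split into quotient and remainder so the intermediate
 * product stays within 64 bits without a 128-bit helper.
 */
uint64_t tsc_delta_to_ns(uint64_t now, uint64_t deadline, uint64_t tsc_khz)
{
	uint64_t ticks, q, r;

	assert(tsc_khz != 0);
	/* Keeps r * 10^6 below 2^64, since r < tsc_khz. */
	assert(tsc_khz <= UINT64_MAX / NSEC_PER_MSEC);

	if (deadline <= now)
		return 0; /* assumption: an expired deadline means "wake up now" */

	ticks = deadline - now;
	q = ticks / tsc_khz;
	r = ticks % tsc_khz;

	if (q >= UINT64_MAX / NSEC_PER_MSEC)
		return UINT64_MAX; /* assumption: saturate instead of wrapping */

	return q * NSEC_PER_MSEC + (r * NSEC_PER_MSEC) / tsc_khz;
}
```

For example, with a 2 GHz TSC (`tsc_khz` = 2000000), a delta of 2000000 ticks converts to 1000000 ns, that is 1 ms. A caller would still need to check for the disarmed value before converting.
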
On the timer expiry path, the driver unconditionally issues
tdg.vp.wr(TSC deadline = disarm).  The following tdg.vp.enter() execution
path may then overwrite it with tdg.vp.wr(new TSC deadline).  Delete the
duplicated tdg.vp.wr() call as an optimization.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Add an extension for the TDX timer service, so that the userspace can query
the feature before use.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
…akeup AP callback")

The commit df21bf3 ("arch/x86: Provide the CPU number in the wakeup AP
callback") changed the signature of struct apic::wakeup_secondary_cpu(),
but it did not update numachip_wakeup_secondary().  Update it to fix the
compile error.

arch/x86/kernel/apic/apic_numachip.c:228:43: error: initialization of 'int (*)(u32,  long unsigned int,  unsigned int)' {aka 'int (*)(unsigned int,  long unsigned int,  unsigned int)'} from incompatible pointer type 'int (*)(u32,  long unsigned int)' {aka 'int (*)(unsigned int,  long unsigned int)'} [-Wincompatible-pointer-types]
  228 |         .wakeup_secondary_cpu           = numachip_wakeup_secondary,
      |                                           ^~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: df21bf3 ("arch/x86: Provide the CPU number in the wakeup AP callback")
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
@yamahata yamahata force-pushed the ohcl-tdx-timer-service-2025-11-13 branch from 49cea0c to 8bfc1c1 Compare December 10, 2025 18:22
Contributor

@dcui dcui left a comment

The PR looks good to me.

@hargar19 hargar19 merged commit f526808 into microsoft:product/hcl-main/6.12 Dec 10, 2025
5 of 6 checks passed
hargar19 added a commit that referenced this pull request Dec 11, 2025
Fix arm64 build failure caused by this PR #107

build failure:
drivers/hv/mshv_vtl_main.c: In function ‘mshv_tdx_setup_halt_timer’: drivers/hv/mshv_vtl_main.c:1163:15: error: implicit declaration of function ‘rdtsc’ [-Werror=implicit-function-declaration]
1163 | now = rdtsc();
| ^~~~~
drivers/hv/mshv_vtl_main.c:1170:73: error: ‘tsc_khz’ undeclared (first use in this function)
1170 | time = mul_u64_u64_div_u64(deadline - now, 1000 * 1000, tsc_khz);
| ^~~~~~~
drivers/hv/mshv_vtl_main.c:1170:73: note: each undeclared identifier is reported only once for each function it appears in
drivers/hv/mshv_vtl_main.c: In function ‘mshv_vtl_switch_to_vtl0_irqoff’:
drivers/hv/mshv_vtl_main.c:1242:49: error: ‘MSHV_VTL_RUN_FLAG_HALTED’ undeclared (first use in this function)
1242 | armed = mshv_tdx_halt_timer_pre(flags & MSHV_VTL_RUN_FLAG_HALTED);
| ^~~~~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
smalis-msft pushed a commit to microsoft/openvmm that referenced this pull request Dec 16, 2025
Implements #2028

This PR implements hardware timer virtualization for lower VTLs on TDX
CVM using the L2-VM TSC deadline timer, an architectural capability
provided by the TDX module. This improves CVM performance by eliminating
guest exits to the hypervisor for timer arming and expiry notifications
for the lower VTLs' timer requirements.

The related changes in the OHCL kernel are implemented by
microsoft/OHCL-Linux-Kernel#107

### Background - TDX L2-VM TSC Deadline Timer
This allows VTL2 to set an execution deadline for lower VTLs. If the
lower VTL is running when the deadline time arrives, it exits to VTL2
with exit reason `VmxExitBasic::TIMER_EXPIRED`.
If the TSC deadline is in the past during entry into lower VTL (i.e.,
TSC deadline value is lower than the current virtual TSC value), it will
immediately exit back to VTL2 with exit reason
`VmxExitBasic::TIMER_EXPIRED`.

The TSC deadline is set using `TDG.VP.WR` for `TDVPS.TSC_DEADLINE[L2-VM
Index]`.

### Implementation
- With these changes, openvmm evaluates the earliest deadline across all
lower VTLs and sets it in `tdx_vp_context->tdx_l2_tsc_deadline_state`,
which is shared with the `mshv_vtl` driver.
- During entry into a lower VTL, the `mshv_vtl` driver makes the `TDG.VP.WR`
call to set the deadline when an update is needed.
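
The "earliest deadline across all lower VTLs" step can be illustrated with a short sketch. openvmm itself is written in Rust; the sketch below uses C only to match the kernel code discussed in this thread, and `earliest_deadline()` is a hypothetical name. It assumes each VTL reports either a TSC deadline or the disarmed value (-1, all ones), so a plain minimum works because the disarmed value is the largest possible 64-bit value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A deadline of -1 (all ones) means "disarmed". */
#define TSC_DEADLINE_DISARMED UINT64_MAX

/*
 * Return the earliest deadline among the per-VTL deadlines. If every entry
 * is disarmed, or the list is empty, the result is the disarmed value.
 */
uint64_t earliest_deadline(const uint64_t *deadlines, size_t n)
{
	uint64_t best = TSC_DEADLINE_DISARMED;
	size_t i;

	assert(deadlines != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (deadlines[i] < best)
			best = deadlines[i];
	}
	return best;
}
```

The result is the single value that would be placed in the shared context for the driver to program on the next entry.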

### Changes
- Added `HardwareIsolatedGuestTimer` trait as an abstraction for
managing lower VTL timer deadlines.
- Moved current `VmTime` interface as default/fallback implementation
into this trait.
- Added `TdxTscDeadlineService` to implement the TDX specific timer
virtualization.
balajimc55 added a commit to balajimc55/openvmm that referenced this pull request Dec 16, 2025
Implements microsoft#2028
