drivers: hv: Support for SNP Secure AVIC #129
hargar19 merged 6 commits into microsoft:product/hcl-main/6.18
When Secure AVIC is enabled, VMBus driver should call x2apic Secure AVIC interface to allow Hyper-V to inject VMBus message interrupt.

Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Tianyu Lan <tiala@microsoft.com>

When Secure AVIC is enabled, call Secure AVIC function to allow Hyper-V to inject STIMER0 interrupt.

Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Tianyu Lan <tiala@microsoft.com>

When Secure AVIC is available, the AMD x2apic Secure AVIC driver will be selected. In that case, have hv_apic_init() return immediately without doing anything.

Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Tianyu Lan <tiala@microsoft.com>

Hyper-V doesn't support auto-eoi with Secure AVIC. So set the HV_DEPRECATING_AEOI_RECOMMENDED flag to force writing the EOI register after handling an interrupt.

Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Tianyu Lan <tiala@microsoft.com>
Pull request overview
This PR extends the Hyper-V VTL/MShv stack to better support hardware-isolated guests, adding SEV-SNP Secure AVIC related plumbing (including new ioctls and page handling) and additional TDX-oriented in-kernel acceleration paths (interrupt handling, timers, redirected interrupts), plus associated platform/boot changes.
Changes:
- Add new MSHV VTL ioctls and UAPI structures for SEV-SNP operations (pvalidate/rmpadjust/rmpquery/invlpgb/tlbsync) and Secure AVIC PFN reporting, plus redirected device interrupt mapping.
- Introduce a per-CPU “local mapping” mechanism to temporarily map PFNs for SNP operations, and expand mshv_vtl fast-path interrupt handling for SNP/TDX.
- Adjust Hyper-V/x86 platform initialization and CPU bring-up paths (realmode limits, wakeup mailbox DT support, Secure AVIC integration).
Reviewed changes
Copilot reviewed 48 out of 49 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| include/uapi/linux/mshv.h | New capability, structs, and ioctls for SNP/TDX/VTL features. |
| include/hyperv/hvhdk.h | Add x64 exception intercept message definitions. |
| include/hyperv/hvgdk_mini.h | Add new register IDs and helper structs for VP register operations/time restore. |
| include/asm-generic/mshyperv.h | Add hv_enable_coco_interrupt() prototype. |
| drivers/hv/mshv_vtl_main.c | Large expansion for SNP Secure AVIC + TDX fast paths, new ioctls, proxy interrupt handling, time restore, etc. |
| drivers/hv/mshv_vtl_local_maps.h | New header for SNP local PFN mapping facility. |
| drivers/hv/mshv_vtl_local_maps.c | New implementation for per-CPU VA “local maps” PFN mapping. |
| drivers/hv/mshv_vtl.h | Extend run-page structures and TDX state/offload flags. |
| drivers/hv/mshv_tdx_asm_offsets.c | New offsets generator for TDX assembly. |
| drivers/hv/mshv_tdx.S | New assembly helper for TDG.VP.ENTER. |
| drivers/hv/hv_common.c | Adjust VP assist page handling for CoCo guests and add weak interrupt hook. |
| drivers/hv/hv.c | Call hv_enable_coco_interrupt() when enabling/disabling SynIC regs. |
| drivers/hv/Makefile | Add build rules/objects for TDX asm + local maps, but reorganizes object lists. |
| drivers/hv/Kconfig | Select USER_RETURN_NOTIFIER for MSHV_VTL on x86. |
| block/blk.h | Add pgmap match check to physical mergeability path. |
| block/bio.c | Reorder/merge pgmap check into merge attempt path. |
| arch/x86/realmode/init.c | Make realmode trampoline allocation limit configurable via x86_init.resources.realmode_limit. |
| arch/x86/pci/init.c | Downgrade missing config-space access message to KERN_INFO. |
| arch/x86/kernel/x86_init.c | Set default realmode_limit to 1M. |
| arch/x86/kernel/e820.c | Downgrade PCI gap failure logs from pr_err to pr_info on x86_64. |
| arch/x86/kernel/devicetree.c | Add parsing for DT wakeup mailbox and adjust IOAPIC DT logging. |
| arch/x86/kernel/cpu/mshyperv.c | Export hv_save_sched_clock_state()/hv_restore_sched_clock_state(); tweak hints for Secure AVIC. |
| arch/x86/kernel/cpu/bugs.c | Downgrade some retbleed logs from pr_err to pr_warn. |
| arch/x86/kernel/cpu/amd.c | Add Hyper-V include, workarounds/cap changes (incl. INVLPGB cap clear). |
| arch/x86/kernel/apic/x2apic_savic.c | Integrate Secure AVIC backing page init and Hyper-V SNP path. |
| arch/x86/kernel/acpi/madt_wakeup.c | Export helpers to configure/read MP wakeup mailbox. |
| arch/x86/include/asm/x86_init.h | Add realmode_limit field to x86_init_resources. |
| arch/x86/include/asm/topology.h | Add linux/cache.h include under CONFIG_SCHED_MC_PRIO. |
| arch/x86/include/asm/tdx.h | Add tdx_safe_halt() declaration/stub. |
| arch/x86/include/asm/svm.h | Add V_INT_SHADOW and V_GUEST_BUSY flag definitions. |
| arch/x86/include/asm/sev.h | Add RMPADJUST flag bits, rmpquery helper, and SNP VTL return API. |
| arch/x86/include/asm/mshyperv.h | Add early input arg pointer, exported clock state funcs, Secure AVIC helper decls, and VTL-mode APIs. |
| arch/x86/include/asm/apic.h | Add x2apic_savic_init_backing_page() declaration/stub. |
| arch/x86/include/asm/acpi.h | Add MP wakeup mailbox helpers declarations/stubs. |
| arch/x86/hyperv/ivm.c | Add hv_set_savic_backing_page() and adjust SNP AP bring-up path for VTL mode. |
| arch/x86/hyperv/hv_vtl.c | Rework VTL platform init/CPU bring-up; handle TDX private MMIO and realmode limit. |
| arch/x86/hyperv/hv_trampoline.S | Remove old TDX trampoline stub. |
| arch/x86/hyperv/hv_init.c | Allocate decrypted early input pages for Secure AVIC and update CPU init for STIMER vector. |
| arch/x86/hyperv/hv_apic.c | Implement hv_enable_coco_interrupt(); skip hv_apic_init on Secure AVIC. |
| arch/x86/hyperv/Makefile | Drop hv_trampoline.o from build. |
| arch/x86/coco/sev/core.c | Add snp_mshv_vtl_return() implementation. |
| arch/x86/Kconfig | Relax TDX/AMD_MEM_ENCRYPT dependencies and UNACCEPTED_MEMORY selection. |
| arch/arm64/include/asm/mshyperv.h | Add VTL init APIs and mshv_vtl_return() stub. |
| arch/arm64/hyperv/hv_vtl.c | Minor whitespace change around mshv_vtl_return_call(). |
| Microsoft/x64-cvm.config | Enable ACPI and related options for CVM config. |
| Microsoft/hcl-x64.config | Disable multiple mitigations and add PCI quirk config toggle. |
| Microsoft/hcl-arm64.config | Disable several ARM64 hardening/features in config. |
| MSFT-Merge/config.json | Update merge branch reference. |
| Documentation/devicetree/bindings/reserved-memory/intel,wakeup-mailbox.yaml | New DT binding for Intel wakeup mailbox reserved memory. |

    pte_t *pte = pte_offset_kernel(pmdp, vaddr);
    if (!pte_present(*pte)) {
    	pte_t *pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
    	if (!pte)
    		goto error;
    	page_tables_allocated++;

    	if (!mshv_vtl_local_map_list_add_entry(pte, PTE_PAGE, maps))
    		goto error;
    }

The PTE allocation logic looks incorrect: pte_offset_kernel() returns a pointer to a PTE entry within the already-populated PTE page (ptep), but on !pte_present(*pte) this code allocates a new page (get_zeroed_page) and never links it into the page tables, effectively leaking it. If the goal is just to have the PTE page present, the allocation should happen only once via pmd_populate_kernel(), and per-entry PTEs should remain zero until used.
a646613 to 1e7db3f
1e7db3f to 7a8ff1c
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 7 comments.

    	u64 u64;
    	struct {
    		u64 enabled:1;
    		u64 reserved:11;
    		u64 pagenumber:52;
    	};
    } __packed;

    struct hv_set_vp_registers_input {
    	struct {
    		u64 partitionid;
    		u32 vpindex;
    		u8 inputvtl;
    		u8 padding[3];
    	} header;
    	struct {
    		u32 name;
    		u32 padding1;
    		u64 padding2;
    		union {
    			union hv_register_value value;
    			struct {
    				u64 valuelow;
    				u64 valuehigh;
    			};
    		};
    	} element[];

The newly added hv_x64_register_sev_gpa_page / hv_set_vp_registers_input blocks use inconsistent indentation (spaces instead of tabs) and omit __packed on the inner bitfield struct (most similar unions in this header mark the bitfield struct as __packed). Please align formatting with surrounding code and ensure packing annotations are consistent to avoid layout surprises.

Suggested change:

    	u64 u64;
    	struct {
    		u64 enabled:1;
    		u64 reserved:11;
    		u64 pagenumber:52;
    	} __packed;
    } __packed;

    struct hv_set_vp_registers_input {
    	struct {
    		u64 partitionid;
    		u32 vpindex;
    		u8 inputvtl;
    		u8 padding[3];
    	} header;
    	struct {
    		u32 name;
    		u32 padding1;
    		u64 padding2;
    		union {
    			union hv_register_value value;
    			struct {
    				u64 valuelow;
    				u64 valuehigh;
    			};
    		};
    	} element[];


    u64 control = HV_HYPERCALL_REP_COMP_1 | HVCALL_SET_VP_REGISTERS;
    struct hv_set_vp_registers_input *input =
    	(struct hv_set_vp_registers_input *)
    	((u8 *)hv_vp_early_input_arg + smp_processor_id() * PAGE_SIZE);
    union hv_x64_register_sev_gpa_page value;
    unsigned long flags;
    int retry = 5;
    u64 ret;

    local_irq_save(flags);

    value.enabled = 1;
    value.reserved = 0;
    value.pagenumber = gfn;

    memset(input, 0, struct_size(input, element, 1));
    input->header.partitionid = HV_PARTITION_ID_SELF;
    input->header.vpindex = HV_VP_INDEX_SELF;
    input->header.inputvtl = ms_hyperv.vtl;
    input->element[0].name = HV_X64_REGISTER_SEV_AVIC_GPA;
    input->element[0].value.reg64 = value.u64;

    do {
    	ret = hv_do_hypercall(control, input, NULL);
    } while (ret == HV_STATUS_TIME_OUT && retry--);

hv_set_savic_backing_page() indexes hv_vp_early_input_arg using smp_processor_id() but only disables interrupts. On PREEMPT kernels this can still trigger smp_processor_id() warnings if preemption is enabled. Consider using get_cpu()/put_cpu() (or preempt_disable()/enable()) around the smp_processor_id() usage, or use a per-cpu pointer (this_cpu_ptr) instead of a global array indexed by CPU id.

    #define MSHV_VTL_RMPQUERY	_IOW(MSHV_IOCTL, 0x35, struct mshv_rmpquery)
    #define MSHV_VTL_INVLPGB	_IOW(MSHV_IOCTL, 0x36, struct mshv_invlpgb)
    #define MSHV_VTL_TLBSYNC	_IO(MSHV_IOCTL, 0x37)
    #define MSHV_VTL_SECURE_AVIC_VTL0_PFN	_IOWR(MSHV_IOCTL, 0x40, __u64)

The new ioctl is declared as taking a single __u64, but the implementation uses the argument as an in/out buffer (input: cpu_id as u32, output: PFN as u64). This should be a dedicated UAPI struct (e.g., { __u32 cpu_id; __u32 pad; __u64 pfn; }) so the ABI is self-describing and works reliably for compat/ioctl decoding.

    static long mshv_vtl_ioctl_secure_avic_vtl0_pfn(void __user *user_arg)
    {
    	u64 pfn;
    	u32 cpu_id;
    	long ret;

    	ret = copy_from_user(&cpu_id, user_arg, sizeof(cpu_id)) ? -EFAULT : 0;
    	if (ret)
    		return ret;

    	ret = smp_call_function_single(cpu_id, secure_avic_vtl0_this_cpu, &pfn, true);
    	if (ret)
    		return ret;
    	ret = (long)pfn;
    	if (ret < 0)
    		return ret;

    	ret = copy_to_user(user_arg, &pfn, sizeof(pfn)) ? -EFAULT : 0;

This handler reads only a u32 from user_arg, but the ioctl is defined as _IOWR(..., __u64) and later writes back an 8-byte PFN to the same pointer. This is an ABI mismatch and can break user space/compat handlers. Please switch to a proper UAPI struct (cpu_id in, pfn out) and copy the full struct to/from user space.

Suggested change:

    struct mshv_vtl_secure_avic_vtl0_pfn_args {
    	u32 cpu_id;
    	u32 pad;
    	u64 pfn;
    };

    static long mshv_vtl_ioctl_secure_avic_vtl0_pfn(void __user *user_arg)
    {
    	struct mshv_vtl_secure_avic_vtl0_pfn_args args;
    	long ret;

    	ret = copy_from_user(&args, user_arg, sizeof(args)) ? -EFAULT : 0;
    	if (ret)
    		return ret;

    	ret = smp_call_function_single(args.cpu_id, secure_avic_vtl0_this_cpu,
    				       &args.pfn, true);
    	if (ret)
    		return ret;
    	ret = (long)args.pfn;
    	if (ret < 0)
    		return ret;

    	ret = copy_to_user(user_arg, &args, sizeof(args)) ? -EFAULT : 0;

Secure AVIC provides backing page to aid the guest in limiting which interrupt vectors can be injected into the guest. Hyper-V has specific hvcall to set backing page and call it in Secure AVIC driver.

Signed-off-by: Tianyu Lan <tiala@microsoft.com>

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
Signed-off-by: Tianyu Lan <tiala@microsoft.com>
7a8ff1c to 4d255b3
No description provided.