Automatic rebase of branch 'aem-next-qubes' met a conflict. #32
Open
3mdeb-robot wants to merge 3137 commits into
Rename all instances of ECLAIR MISRA C:2012 service identifiers, identified by the prefix MC3R1, to use the prefix MC3A2, which refers to MISRA C:2012 Amendment 2 guidelines. This update is motivated by the need to upgrade ECLAIR GitLab runners that use the new naming scheme for MISRA C:2012 Amendment 2 guidelines. Changes to the docs/misra directory are needed in order to keep comment-based deviation up to date. Signed-off-by: Alessandro Zucchelli <alessandro.zucchelli@bugseng.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> (cherry picked from commit 631f535)
Updating the Eclair runner has had knock-on effects with previously-clean rules now flagging violations: - x86: Rule 1.1, 1940 violations - ARM64: Rule 1.1, 725 violations, Rule 2.1, 255 violations Fixes: 631f535 ("xen: update ECLAIR service identifiers from MC3R1 to MC3A2.") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> (cherry picked from commit 171cb31)
Linux 6.12-rc2 fails to decompress with the current 128MiB, contrary to
the code comment. It results in a failure like this:
domainbuilder: detail: xc_dom_kernel_file: filename="/var/lib/qubes/vm-kernels/6.12-rc2-1.1.fc37/vmlinuz"
domainbuilder: detail: xc_dom_malloc_filemap : 12104 kB
domainbuilder: detail: xc_dom_module_file: filename="/var/lib/qubes/vm-kernels/6.12-rc2-1.1.fc37/initramfs"
domainbuilder: detail: xc_dom_malloc_filemap : 7711 kB
domainbuilder: detail: xc_dom_boot_xen_init: ver 4.19, caps xen-3.0-x86_64 hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
domainbuilder: detail: xc_dom_parse_image: called
domainbuilder: detail: xc_dom_find_loader: trying multiboot-binary loader ...
domainbuilder: detail: loader probe failed
domainbuilder: detail: xc_dom_find_loader: trying HVM-generic loader ...
domainbuilder: detail: loader probe failed
domainbuilder: detail: xc_dom_find_loader: trying Linux bzImage loader ...
domainbuilder: detail: _xc_try_lzma_decode: XZ decompression error: Memory usage limit reached
xc: error: panic: xg_dom_bzimageloader.c:761: xc_dom_probe_bzimage_kernel unable to XZ decompress kernel: Invalid kernel
domainbuilder: detail: loader probe failed
domainbuilder: detail: xc_dom_find_loader: trying ELF-generic loader ...
domainbuilder: detail: loader probe failed
xc: error: panic: xg_dom_core.c:689: xc_dom_find_loader: no loader found: Invalid kernel
libxl: error: libxl_dom.c:566:libxl__build_dom: xc_dom_parse_image failed
The important part: XZ decompression error: Memory usage limit reached
This looks to be related to the following change in Linux:
8653c909922743bceb4800e5cc26087208c9e0e6 ("xz: use 128 MiB dictionary and force single-threaded mode")
Fix this by increasing the block size to 256MiB, and remove the
misleading comment (for lack of better ideas).
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@vates.tech>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: e6472d4
master date: 2024-12-19 17:33:54 +0000
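The failure can be pictured with a simplified model: a decoder needs the dictionary plus some working state, so a memory limit equal to the dictionary size cannot be enough. The function name and the overhead constant below are assumptions made for illustration; this is not libxenguest or liblzma code.

```c
#include <stdbool.h>
#include <stdint.h>

#define MiB (UINT64_C(1) << 20)

/* Hypothetical fixed decoder overhead; the real figure depends on liblzma. */
#define DECODER_OVERHEAD (1 * MiB)

/* True if a decoder using dict_size bytes of dictionary fits under memlimit. */
static bool xz_fits_memlimit(uint64_t dict_size, uint64_t memlimit)
{
    return dict_size + DECODER_OVERHEAD <= memlimit;
}
```

Under this model a 128 MiB dictionary fails against a 128 MiB limit and passes against 256 MiB, which matches the behaviour described above.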
The objdump output is fed to grep, so make sure it doesn't change with different user locales and break the grep parsing. This problem was identified while updating xen in Debian and the fix is needed for generating reproducible builds in varying environments. Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> master commit: 0d72922 master date: 2024-12-30 21:40:37 +0000
AMD have updated the SRSO whitepaper[1] with further information. These
features exist on AMD Zen5 CPUs and are necessary for Xen to use.
The two features are in principle unrelated:
* SRSO_U/S_NO is an enumeration saying that SRSO attacks can't cross the
User(CPL3) / Supervisor(CPL<3) boundary. i.e. Xen don't need to use
IBPB-on-entry for PV64. PV32 guests are explicitly unsupported for
speculative issues, and excluded from consideration for simplicity.
* SRSO_MSR_FIX is an enumeration identifying that the BP_SPEC_REDUCE bit is
available in MSR_BP_CFG. When set, SRSO attacks can't cross the host/guest
boundary. i.e. Xen don't need to use IBPB-on-entry for HVM.
Extend ibpb_calculations() to account for these when calculating
opt_ibpb_entry_{pv,hvm} defaults. Add a `bp-spec-reduce=<bool>` option to
control the use of BP_SPEC_REDUCE, with it active by default.
Because MSR_BP_CFG is core-scoped with a race condition updating it, repurpose
amd_check_erratum_1485() into amd_check_bp_cfg() and calculate all updates at
once.
Xen also needs to advertise SRSO_U/S_NO to guests to allow the guest kernel
to skip SRSO mitigations too:
* This is trivial for HVM guests. It is also accurate for PV32 guests,
but we have already excluded them from consideration, and do so again
here to simplify the policy logic.
* As written, SRSO_U/S_NO does not help for the PV64 user->kernel boundary.
However, after discussing with AMD, an implementation detail of having
BP_SPEC_REDUCE active causes the PV64 user->kernel boundary to have the
property described by SRSO_U/S_NO, so we can advertise SRSO_U/S_NO to
guests when the BP_SPEC_REDUCE precondition is met.
Finally, fix a typo in the SRSO_NO's comment.
[1] https://www.amd.com/content/dam/amd/en/documents/corporate/cr/speculative-return-stack-overflow-whitepaper.pdf
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: a1746cd
master date: 2025-01-02 18:44:49 +0000
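The default selection described in this commit message can be sketched roughly as follows. The struct and function names are hypothetical, and the logic is a deliberately simplified reading of the text above rather than the body of ibpb_calculations().

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the state consulted by the defaults. */
struct srso_state {
    bool vulnerable;      /* CPU is affected by SRSO at all */
    bool srso_us_no;      /* SRSO_U/S_NO enumerated */
    bool srso_msr_fix;    /* SRSO_MSR_FIX enumerated (BP_SPEC_REDUCE available) */
    bool bp_spec_reduce;  /* bp-spec-reduce=<bool>, active by default */
};

/* PV64: IBPB-on-entry is unnecessary if attacks can't cross user/supervisor. */
static bool default_ibpb_entry_pv(const struct srso_state *s)
{
    return s->vulnerable && !s->srso_us_no;
}

/* HVM: IBPB-on-entry is unnecessary once BP_SPEC_REDUCE is actually in use. */
static bool default_ibpb_entry_hvm(const struct srso_state *s)
{
    return s->vulnerable && !(s->srso_msr_fix && s->bp_spec_reduce);
}
```

Note that turning bp-spec-reduce off in this model brings the HVM default back, even on hardware that enumerates SRSO_MSR_FIX.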
AMD have always used the architectural MSRs for LER. As the first processor to support LER was the K7 (which was 32bit), we can assume its presence unconditionally in 64bit mode. Intel are about to run out of space in Family 6 and start using 19. It is only the Pentium 4 which uses non-architectural LER MSRs. percpu_traps_init(), which runs on every CPU, contains a lot of code which should be init-only, and is the only reason why opt_ler can't be in initdata. Write a brand new init_ler() which expects all future Intel and AMD CPUs to continue using the architectural MSRs, and does all setup together. Call it from trap_init(), and remove the setup logic from percpu_traps_init() except for the single path configuring MSR_IA32_DEBUGCTLMSR. Leave behind a warning if the user asked for LER and Xen couldn't enable it. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> master commit: 555866c master date: 2025-01-06 12:24:05 +0000
Fam1Ah is similar to Fam19h in these regards. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> master commit: f29cc14 master date: 2025-01-06 18:01:32 +0000
IOW we shouldn't raise #UD in that case. Be on the safe side though and
only encode fully legitimate forms into the stub to be executed.
Things weren't quite right for VCVT{,U}SI2SD either, in the attempt to
be on the safe side: Clearing EVEX.L'L isn't useful; it's EVEX.b which
primarily needs clearing. Also reflect the somewhat improved doc
situation in the comment there.
Fixes: ed806f3 ("x86emul: support AVX512F legacy-equivalent packed int/FP conversion insns")
Fixes: baf4a37 ("x86emul: support AVX512F legacy-equivalent scalar int/FP conversion insns")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: d3709d1
master date: 2025-01-08 11:01:17 +0100
All selector fields under ctxt->regs are (normally) poisoned in the HVM case, and the four ones besides CS and SS are potentially stale for PV. Avoid using them in the hypervisor incarnation of the emulator, when trying to cover for a missing ->read_segment() hook. To make sure there's always a valid ->read_segment() handler for all HVM cases, add a respective function to shadow code, even if it is not expected for FPU insns to be used to update page tables. Fixes: 0711b59 ("x86emul: correct FPU code/data pointers and opcode handling") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> master commit: 645b8d4 master date: 2025-01-08 11:02:16 +0100
Addition of FLASK permission for this hypercall was overlooked in the original patch. Fix it. The only VUART operation is initialization that can occur only during domain creation. Fixes: 86039f2 ("xen/arm: vpl011: Add a new domctl API to initialize vpl011") Signed-off-by: Michal Orzel <michal.orzel@amd.com> Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com> master commit: 29daa72 master date: 2025-01-08 13:05:38 +0100
Addition of FLASK permission for this hypercall was overlooked in the original patch. Fix it. The only dt overlay operation is attaching that can happen only after the domain is created. Dom0 can attach overlay to itself as well. Fixes: 4c73387 ("xen/arm: Add XEN_DOMCTL_dt_overlay and device attachment to domains") Signed-off-by: Michal Orzel <michal.orzel@amd.com> Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com> master commit: 7fa1411 master date: 2025-01-08 13:05:50 +0100
There is a possible race scenario between set_global_virq_handler() and clear_global_virq_handlers() targeting the same domain, which might result in that domain ending as a zombie domain. In case set_global_virq_handler() is being called for a domain which is just dying, it might happen that clear_global_virq_handlers() is running first, resulting in set_global_virq_handler() taking a new reference for that domain and entering in the global_virq_handlers[] array afterwards. The reference will never be dropped, thus the domain will never be freed completely. This can be fixed by checking the is_dying state of the domain inside the region guarded by global_virq_handlers_lock. In case the domain is dying, handle it as if the domain wouldn't exist, which will be the case in near future anyway. Fixes: 8752158 ("xen: allow global VIRQ handlers to be delegated to other domains") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> master commit: 4d8acc9 master date: 2025-01-09 17:34:01 +0100
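The shape of the fix, checking the dying state inside the locked region, can be sketched in user-space C. Everything here (the struct, the single-slot handler table, the pthread mutex standing in for global_virq_handlers_lock) is a hypothetical stand-in, not Xen code.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for Xen's domain and handler table. */
struct domain {
    bool is_dying;
    int refcnt;
};

static pthread_mutex_t handlers_lock = PTHREAD_MUTEX_INITIALIZER;
static struct domain *global_handler;

/* Returns 0 on success, -1 if the domain is already being torn down. */
static int set_handler(struct domain *d)
{
    int rc = -1;

    pthread_mutex_lock(&handlers_lock);
    /* Checked under the lock, so clear_handlers() can't slip in between. */
    if (!d->is_dying) {
        if (global_handler)
            global_handler->refcnt--;
        d->refcnt++;
        global_handler = d;
        rc = 0;
    }
    pthread_mutex_unlock(&handlers_lock);

    return rc;
}

/* Called during domain destruction, after is_dying has been set. */
static void clear_handlers(struct domain *d)
{
    pthread_mutex_lock(&handlers_lock);
    if (global_handler == d) {
        d->refcnt--;
        global_handler = NULL;
    }
    pthread_mutex_unlock(&handlers_lock);
}
```

Because a dying domain is refused under the same lock that clear_handlers() takes, no reference can be taken after the cleanup has run.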
Let's make explicit what the compiler may or may not do on our behalf: The 2nd of the recursive invocations each can fall through rather than re-invoking the function. This will save us from adding yet another parameter (or more) to the function, just for the recursive invocations. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> master commit: 1805305 master date: 2024-09-09 13:40:47 +0200
To avoid overrunning the internal buffer we need to take the offset into the buffer into account. Fixes: d95da91 ("x86/HVM: grow MMIO cache data size to 64 bytes") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> master commit: e5339bb master date: 2025-01-23 11:14:48 +0100
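A minimal sketch of the kind of bounds check involved, assuming a 64-byte buffer as named in the Fixes: tag. The function names are illustrative and do not correspond to the emulator's actual helpers.

```c
#include <stdbool.h>
#include <stddef.h>

#define CACHE_BUF_SIZE 64  /* the 64-byte size named in the Fixes: tag */

/* Buggy form: ignores where in the buffer the access starts. */
static bool fits_ignoring_offset(size_t offset, size_t size)
{
    (void)offset;
    return size <= CACHE_BUF_SIZE;
}

/* Fixed form: the whole range [offset, offset + size) must lie in the buffer. */
static bool fits_with_offset(size_t offset, size_t size)
{
    return offset <= CACHE_BUF_SIZE && size <= CACHE_BUF_SIZE - offset;
}
```

An 8-byte access at offset 60 passes the first check but is rejected by the second, which is written as a subtraction so that the comparison itself cannot overflow.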
Both caches may need higher capacity, and the upper bound will need to be determined dynamically based on CPUID policy (for AMX's TILELOAD / TILESTORE at least). Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> master commit: 23d60db master date: 2025-01-24 10:15:29 +0100
The MMIO cache is intended to have one entry used per independent memory access that an insn does. This, in particular, is supposed to be ignoring any page boundary crossing. Therefore when looking up a cache entry, the access's starting (linear) address is relevant, not the one possibly advanced past a page boundary. In order for the same offset-into-buffer variable to be usable in hvmemul_phys_mmio_access() for both the caller's buffer and the cache entry's it is further necessary to have the un-adjusted caller buffer passed into there. Fixes: 2d527ba ("x86/hvm: split all linear reads and writes at page boundary") Reported-by: Manuel Andreas <manuel.andreas@tum.de> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> master commit: 672894a master date: 2025-01-24 10:15:56 +0100
All hardware with VT-d/AMD-Vi has CMPXCHG16B support. Check this at initialisation time, and otherwise refuse to use the IOMMU. If the local APICs support x2APIC mode the IOMMU support for interrupt remapping will be checked earlier using a specific helper. If no support for CX16 is detected by that earlier hook disable the IOMMU at that point and prevent further poking for CX16 later in the boot process, which would also fail. There's a possible corner case when running virtualized, and the underlying hypervisor exposing an IOMMU but no CMPXCHG16B support. In which case ignoring the IOMMU is fine, albeit the most natural would be for the underlying hypervisor to also expose CMPXCHG16B support if an IOMMU is available to the VM. Note this change only introduces the checks, but doesn't remove the now stale checks for CX16 support sprinkled in the IOMMU code. Further changes will take care of that. Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Teddy Astie <teddy.astie@vates.tech> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> master commit: 2636fcd master date: 2025-01-27 13:05:11 +0100
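The check itself boils down to one CPUID feature bit. A hedged sketch, assuming the caller has already read CPUID leaf 1 and passes in ECX; the helper name is hypothetical and the surrounding IOMMU setup is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, ECX bit 13 enumerates CMPXCHG16B. */
#define CPUID1_ECX_CX16 (UINT32_C(1) << 13)

/* Hypothetical policy helper: only use the IOMMU when CX16 is present. */
static bool iommu_usable(uint32_t cpuid1_ecx, bool iommu_present)
{
    return iommu_present && (cpuid1_ecx & CPUID1_ECX_CX16) != 0;
}
```

In the nested-virtualisation corner case mentioned above, this returns false even though an IOMMU is exposed, and the IOMMU is simply ignored.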
Either when using a 32bit Interrupt Remapping Entry or a 128bit one update the entry atomically, by using cmpxchg unconditionally as IOMMU depends on it. No longer disable the entry by setting RemapEn = 0 ahead of updating it. As a consequence of not toggling RemapEn ahead of the update the Interrupt Remapping Table needs to be flushed after the entry update. This avoids a window where the IRTE has RemapEn = 0, which can lead to IO_PAGE_FAULT if the underlying interrupt source is not masked. There's no guidance in AMD-Vi specification about how IRTE update should be performed as opposed to DTE updating which has specific guidance. However DTE updating claims that reads will always be at least 128bits in size, and hence for the purposes here assume that reads and caching of the IRTE entries in either 32 or 128 bit format will be done atomically from the IOMMU. Note that as part of introducing a new raw128 field in the IRTE struct, the current raw field is renamed to raw64 to explicitly contain the size in the field name. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> master commit: b953a99 master date: 2025-01-27 13:05:11 +0100
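The idea of replacing an entry in one atomic step, without first clearing RemapEn, can be sketched with C11 atomics on a 32-bit value. The entry layout and function name are assumptions for illustration; the real code also covers the 128-bit format and performs the table flush.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 32-bit IRTE with RemapEn in bit 0. */
#define IRTE_REMAP_EN UINT32_C(1)

/* Replace the entry in one atomic step; RemapEn is never transiently clear. */
static void irte_update(_Atomic uint32_t *irte, uint32_t new_val)
{
    uint32_t old = atomic_load(irte);

    /* On failure 'old' is refreshed with the current value, so just retry. */
    while (!atomic_compare_exchange_weak(irte, &old, new_val))
        ;

    /* A real implementation must flush the Interrupt Remapping Table here. */
}
```

Any observer sees either the old or the new entry, both with RemapEn set, so there is no window in which the entry looks disabled.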
…handling
In an entirely different context I came across Linux commit 428e3d08574b
("KVM: x86: Fix zero iterations REP-string"), which points out that
we're still doing things wrong: For one, there's no zero-extension at
all on AMD. And then while RCX is zero-extended from 32 bits uniformly
for all string instructions on newer hardware, RSI/RDI are only for MOVS
and STOS on the systems I have access to. (On an old family 0xf system
I've further found that for REP LODS even RCX is not zero-extended.)
While touching the lines anyway, replace two casts in get_rep_prefix().
Fixes: 79e996a ("x86emul: correct 64-bit mode repeated string insn handling with zero count")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: 5310a04
master date: 2025-01-27 15:23:19 +0100
The original implementation has two issues: For one it doesn't preserve non-canonical-ness of inputs in the range 0x8000000000000000 through 0x80007fffffffffff. Bogus guest pointers in that range would not cause a (#GP) fault upon access, when they should. And then there is an AMD-specific aspect, where only the low 48 bits of an address are used for speculative execution; the architecturally mandated #GP for non-canonical addresses would be raised at a later execution stage. Therefore to prevent Xen controlled data from making it into any of the caches in a guest controllable manner, we need to additionally ensure that for non-canonical inputs bit 47 would be clear. See the code comment for how addressing both is being achieved. Fixes: 4dc1815 ("x86/PV: harden guest memory accesses against speculative abuse") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> master commit: 8306d77 master date: 2025-01-27 15:23:59 +0100
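For readers unfamiliar with the terminology, the two properties discussed here can be expressed as small predicates, assuming 48-bit virtual addresses. These helpers are illustrative only; the actual hardening is done in the code the commit message points to.

```c
#include <stdbool.h>
#include <stdint.h>

/* With 48-bit virtual addresses, bits 63..47 must all be equal. */
static bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;   /* the 17 uppermost bits */

    return top == 0 || top == 0x1ffff;
}

/* Bit 47 decides which half an address lands in if only 48 bits are used. */
static bool bit47_set(uint64_t addr)
{
    return ((addr >> 47) & 1) != 0;
}
```

Both ends of the range quoted above, 0x8000000000000000 and 0x80007fffffffffff, are non-canonical with bit 47 clear, which is the combination the fix needs to preserve.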
Logic using performance counters needs to look at MSR_MISC_ENABLE.PERF_AVAILABLE before touching any other resources. When virtualised under ESX, Xen dies with a #GP fault trying to read MSR_CORE_PERF_GLOBAL_CTRL. Factor this logic out into a separate function (it's already too squashed to the RHS), and insert a check of MSR_MISC_ENABLE.PERF_AVAILABLE. This also avoids setting X86_FEATURE_ARCH_PERFMON if MSR_MISC_ENABLE says that PERF is unavailable, although oprofile (the only consumer of this flag) cross-checks too. Fixes: 6bdb965 ("x86/intel: ensure Global Performance Counter Control is setup correctly") Reported-by: Jonathan Katz <jonathan.katz@aptar.com> Link: https://xcp-ng.org/forum/topic/10286/nesting-xcp-ng-on-esx-8 Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Tested-by: Jonathan Katz <jonathan.katz@aptar.com> master commit: dd05d26 master date: 2025-01-28 11:19:45 +0000
These were needed by TMEM only, which is long gone. The Linux original doesn't have such either. This effectively reverts one of the "Other changes" from 8dc6738 ("Update radix-tree.[ch] from upstream Linux to gain RCU awareness"). Positive side effect: Two cf_check go away. While there also convert xmalloc()+memset() to xzalloc(). Requested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> master commit: 1275093 master date: 2025-02-07 09:59:11 +0100
... now that static initialization is possible. Use RADIX_TREE() for pci_segments and ivrs_maps. This then fixes an ordering issue on x86: With the call to radix_tree_init(), acpi_mmcfg_init()'s invocation of pci_segments_init() will zap the possible earlier introduction of segment 0 by amd_iommu_detect_one_acpi()'s call to pci_ro_device(), and thus the write-protection of the PCI devices representing AMD IOMMUs. Fixes: 3950f24 ("x86/x2APIC: defer probe until after IOMMU ACPI table parsing") Requested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> master commit: 26fe09e master date: 2025-02-07 10:00:04 +0100
The current shutdown logic in smp_send_stop() will disable the APs while having interrupts enabled on the BSP or possibly other APs. On AMD systems this can lead to local APIC errors:
APIC error on CPU0: 00(08), Receive accept error
Such an error message can be printed in a loop, thus blocking the system from rebooting. I assume this loop is created by the error being triggered by the console interrupt, which is further stirred by the ESR handler printing to the console.
Intel SDM states:
"Receive Accept Error. Set when the local APIC detects that the message it received was not accepted by any APIC on the APIC bus, including itself. Used only on P6 family and Pentium processors."
So the error shouldn't trigger on any Intel CPU supported by Xen. However AMD doesn't make such claims, and indeed the error is broadcast to all local APICs when an interrupt targets a CPU that's already offline.
To prevent the error from stalling the shutdown process perform the disabling of APs and the BSP local APIC with interrupts disabled on all CPUs in the system, so that by the time interrupts are unmasked on the BSP the local APIC is already disabled. This can still lead to a spurious:
APIC error on CPU0: 00(00)
As a result of an LVT Error getting injected while interrupts are masked on the CPU, and the vector only handled after the local APIC is already disabled. ESR reports 0 because as part of disable_local_APIC() the ESR register is cleared.
Note the NMI crash path doesn't have such an issue, because disabling of APs and the caller local APIC is already done in the same contiguous region with interrupts disabled. There's a possible window on the NMI crash path (nmi_shootdown_cpus()) where some APs might be disabled (and thus interrupts targeting them raising "Receive accept error") before other APs have interrupts disabled.
However the shutdown NMI will be handled, regardless of whether the AP is processing a local APIC error, and hence such interrupts will not cause the shutdown process to get stuck. Remove the call to fixup_irqs() in smp_send_stop(): it doesn't achieve the intended goal of moving all interrupts to the BSP anyway. The logic in fixup_irqs() will move interrupts whose affinity doesn't overlap with the passed mask, but the movement of interrupts is done to any CPU set in cpu_online_map. As in the shutdown path fixup_irqs() is called before APs are cleared from cpu_online_map this leads to interrupts being shuffled around, but not assigned to the BSP exclusively. The Fixes tag is more of a guess than a certainty; it's possible the previous sleep window in fixup_irqs() allowed any in-flight interrupt to be delivered before APs went offline. However fixup_irqs() was still incorrectly used, as it didn't (and still doesn't) move all interrupts to target the provided cpu mask. Fixes: e2bb28d ('x86/irq: forward pending interrupts to new destination in fixup_irqs()') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> master commit: 1191ce9 master date: 2025-02-12 15:56:07 +0100
Move the disabling of interrupt sources so it's done ahead of the offlining of APs. This is to prevent AMD systems triggering "Receive accept error" when interrupts target CPUs that are no longer online. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> master commit: db6daa9 master date: 2025-02-12 15:56:07 +0100
Attempt to disable MSI(-X) capabilities on all PCI devices known by Xen at shutdown. Doing so should help a kexec chained kernel boot more reliably, as device MSI(-X) interrupt generation should be quiesced. Only attempt to disable MSI(-X) on all devices in the crash context if the PCI lock is not taken, otherwise the PCI device list could be in an inconsistent state. This requires introducing a new pcidevs_trylock() helper to check whether the lock is currently taken. Disabling MSI(-X) should prevent "Receive accept error" being raised as a result of non-disabled interrupts targeting offline CPUs. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> master commit: 7ab6951 master date: 2025-02-12 15:56:07 +0100
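The trylock pattern used on the crash path can be sketched in user-space C. The pthread mutex stands in for the PCI devices lock, and the function name and counter are hypothetical placeholders for the per-device MSI(-X) disabling.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t pcidevs_lock = PTHREAD_MUTEX_INITIALIZER;
static int msi_disabled_count;

/* Hypothetical crash-path helper: returns true if the device walk was done. */
static bool crash_disable_msi(int nr_devices)
{
    /* Never block in crash context: if the lock is held, skip the walk. */
    if (pthread_mutex_trylock(&pcidevs_lock) != 0)
        return false;

    /* Stand-in for disabling MSI(-X) on each device in the list. */
    msi_disabled_count += nr_devices;

    pthread_mutex_unlock(&pcidevs_lock);
    return true;
}
```

If the lock is already held, the device list may be mid-update, so the walk is skipped rather than risking a traversal of inconsistent state.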
Add a new hook to inhibit interrupt generation by the IOMMU(s). Note the hook is currently only implemented for x86 IOMMUs. The purpose is to disable interrupt generation at shutdown so any kexec chained image finds the IOMMU(s) in a quiesced state. It would also prevent "Receive accept error" being raised as a result of non-disabled interrupts targeting offline CPUs. Note that the iommu_quiesce() call in nmi_shootdown_cpus() is still required even when there's a preceding iommu_crash_shutdown() call; the latter can become a no-op depending on the setting of the "crash-disable" command line option. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> master commit: 819c3cb master date: 2025-02-12 15:56:07 +0100
The function's use from set_msi_source_id() is guaranteed to be in an IRQs-off region. While the invocation of that function could be moved ahead in msi_msg_to_remap_entry() (doesn't need to be in the IOMMU- intremap-locked region), the call tree from map_domain_pirq() holds an IRQ descriptor lock. Hence all use sites of the lock need to become IRQ- safe ones. In find_upstream_bridge() do a tiny bit of tidying in adjacent code: Change a variable's type to unsigned and merge a redundant assignment into another variable's initializer. This is XSA-467 / CVE-2025-1713. Fixes: 476bbcc ("VT-d: fix MSI source-id of interrupt remapping") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> (cherry picked from commit 39bc6af)
The panic() function uses a static buffer to format its arguments into, simply
to emit the result via printk("%s", buf). This buffer is not large enough for
some existing users in Xen. e.g.:
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Invalid device tree blob at physical address 0x46a00000.
(XEN) The DTB must be 8-byte aligned and must not exceed 2 MB in size.
(XEN)
(XEN) Plea****************************************
The remainder of this particular message is 'e check your bootloader.', but
has been inherited by RISC-V from ARM.
It is also pointless double buffering. Implement vprintk() beside printk(),
and use it directly rather than rendering into a local buffer, removing it as
one source of message limitation.
This marginally simplifies panic(), and drops a global used-once buffer.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 81f8b1d
master date: 2025-02-18 14:15:58 +0000
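The difference between the two approaches can be sketched with the standard C library, using vsnprintf() and vprintf() as stand-ins for Xen's internal formatting and vprintk(). The function names are hypothetical and the buffer size is chosen by the caller.

```c
#include <stdarg.h>
#include <stdio.h>

/* Old pattern: render into a fixed buffer first; long messages get cut off. */
static int format_fixed(char *buf, size_t len, const char *fmt, ...)
{
    va_list args;
    int needed;

    va_start(args, fmt);
    needed = vsnprintf(buf, len, fmt, args);  /* returns the untruncated length */
    va_end(args);

    return needed;
}

/* New pattern: hand the va_list straight to the v*() printer, no buffer. */
static int panic_print(const char *fmt, ...)
{
    va_list args;
    int written;

    va_start(args, fmt);
    written = vprintf(fmt, args);  /* stand-in for Xen's vprintk() */
    va_end(args);

    return written;
}
```

With an 8-byte buffer, format_fixed() keeps only the first seven characters of a longer message, which is the same kind of truncation visible in the panic output above.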
This is actually what the caller acquire_resource() expects on any kind of error (the comment on top of resource_max_frames() also suggests that). Otherwise, the caller will treat -errno as a valid value and propagate incorrect nr_frames to the VM. As a possible consequence, a VM trying to query a resource size of an unknown type will get the success result from the hypercall and obtain nr_frames 4294967201. Also, add an ASSERT_UNREACHABLE() in the default case of _acquire_resource(), normally we won't get to this point, as an unknown type will always be rejected earlier in resource_max_frames(). Also, update test-resource app to verify that Xen can deal with invalid (unknown) resource type properly. Fixes: 9244528 ("xen/memory: Fix acquire_resource size semantics") Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> master commit: 9b87082 master date: 2025-02-18 14:47:34 +0000
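The number quoted above is what a negative error code looks like once it passes through a 32-bit unsigned return type. The sketch below assumes the error value is 95, which is what 4294967201 corresponds to, and that the fix is to return 0 for unknown types; both function names are hypothetical.

```c
/* Assumed error value: 4294967201 is 2^32 - 95 for a 32-bit unsigned int. */
#define ERR_UNKNOWN_TYPE 95

/* Buggy shape: an error code returned through an unsigned "frame count". */
static unsigned int max_frames_buggy(int known_type)
{
    return known_type ? 1 : -ERR_UNKNOWN_TYPE;
}

/* Fixed shape: unknown types report zero frames instead of a huge count. */
static unsigned int max_frames_fixed(int known_type)
{
    return known_type ? 1 : 0;
}
```

A caller that only tests for zero cannot distinguish the buggy return value from a legitimate, very large frame count.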
64-bit BAR memory address is truncated when removing a passthrough pci device from guest since it uses "unsigned int". So, change to use 64-bit type to fix this problem. This is XSA-476 / CVE-2025-58149. Fixes: b0a1af6 ("libxenlight: implement pci passthrough") Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Anthony PERARD <anthony.perard@vates.tech> (cherry picked from commit 421432b)
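The truncation is easy to reproduce in isolation. A minimal sketch, with hypothetical function names and an arbitrary example address above 4 GiB; it assumes the common case of a 32-bit unsigned int.

```c
#include <stdint.h>

/* Buggy shape: a 64-bit BAR address squeezed through unsigned int. */
static uint64_t bar_start_truncated(uint64_t bar)
{
    unsigned int start = (unsigned int)bar;  /* upper 32 bits are lost */

    return start;
}

/* Fixed shape: keep the address in a 64-bit type throughout. */
static uint64_t bar_start_full(uint64_t bar)
{
    uint64_t start = bar;

    return start;
}
```

For a BAR at 0x8012340000 the truncated variant yields 0x12340000, so the wrong range would be operated on when the device is removed.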
Currently, the URL where the ECLAIR MISRA C scan reports are saved is hardcoded; making it configurable allows multiple runners and storage servers to be used without resorting to publishing all artifacts to the same report server. Additionally, reports will be accessed publicly by using a proxy, therefore the address that needs to be printed in GitLab analysis logs is that of the public url, rather than the location where they are stored. Signed-off-by: Alessandro Zucchelli <alessandro.zucchelli@bugseng.com> Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com> [stefano: remove unneeded exports] Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech> (cherry picked from commit bea38de)
EIO is not the only error that ucode_ops.apply_microcode() can produce. EINVAL, EEXIST and ENXIO can be generated too, each of which means that Xen is unhappy in some way with the proposed blob. Some of these can be bypassed with --force, which will cause the parallel load to be attempted. Fixes: 5ed1256 ("microcode: rendezvous CPUs in NMI handler and load ucode") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com> (cherry picked from commit e0bb712) [Note: --force doesn't exist in this version of Xen]
For Zen3-5 microcode blobs signed with the updated signature scheme, the checksum field has been reused to be a min_revision field, referring to the microcode revision which fixed Entrysign (SB-7033, CVE-2024-36347). Cross-check this when trying to load microcode, but allow --force to override it. If the signature scheme is genuinely different, a #GP will occur. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com> (cherry picked from commit b3f015b) [Note: --force doesn't exist in this version of Xen]
After initial publication, the SB-7033 / CVE-2024-36347 bulletin was updated to list Zen5 CPUs as vulnerable. Use Fam1ah as an upper bound, and adjust the command line documentation. When the Zen6 (also Fam1ah processors) model numbers are known, they'll want excluding from the range. Fixes: 630e887 ("x86/ucode: Perform extra SHA2 checks on AMD Fam17h/19h microcode") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com> (cherry picked from commit c252949)
When Entrysign has been mitigated in firmware, it is believed to be safe to rely on the CPU patchloader again. This avoids us needing to maintain the digest table for all new microcode indefinitely. Relax the digest check when firmware looks to be up to date, and leave behind a clear message when not. When the Zen6 (also Fam1ah processors) model numbers are known, they'll want excluding from the range. This is best-effort only. If a malicious microcode has been loaded prior to Xen running, then all bets are off. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com> (cherry picked from commit ff8228a)
By observation GNU ld 2.25 may emit file symbols for .data.read_mostly when linking xen.efi. Due to the nature of file symbols in COFF symbol tables (see the code comment) the symbols_offsets[] entries for such symbols would cause assembler warnings regarding value truncation. Of course the resulting entries would also be both meaningless and useless. Add a heuristic to get rid of them, really taking effect only when --all-symbols is specified (otherwise these symbols are discarded anyway). Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> master commit: 2f21ce1 master date: 2025-10-21 14:10:46 +0200
A .disable handler can't typically be re-used for .ack: The latter needs
to deal with IRQ migration, while the former shouldn't. Furthermore
invoking just irq_complete_move() isn't enough; one of
move_{native,masked}_irq() also needs invoking.
Fixes: 487a1cf ("x86: Implement per-cpu vector for xen hypervisor")
Fixes: f821102 ("x86: IRQ Migration logic enhancement")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: aed6278
master date: 2025-10-21 14:11:45 +0200
Keeping channels enabled when they're unused is only causing problems: Extra interrupts harm performance, and extra nested interrupts could even have caused worse problems. However, on all Intel hardware I looked at closely, a 0->1 transition of the enable bit causes an immediate IRQ. Hence disabling channels isn't a good idea there. Set a "long" timeout instead. Along with that also "clear" the channel's "next event", for it to be properly written by whatever the next user is going to want (possibly avoiding too early an IRQ). Further, along the same lines, don't enable channels early when starting up an IRQ. This doesn't need to happen earlier than from set_channel_irq_affinity() (once a channel goes into use the very first time). This eliminates a single instance of (XEN) [VT-D]INTR-REMAP: Request device [0000:00:1f.0] fault index 0 (XEN) [VT-D]INTR-REMAP: reason 25 - Blocked a compatibility format interrupt request during boot. (Why exactly there's only one instance, when we use multiple counters and hence multiple IRQs, I can't tell. My understanding would be that this was due to __hpet_setup_msi_irq() being called only after request_irq() [and hence the .startup handler], yet that should have affected all channels.) Fixes: 3ba523f ("CPUIDLE: enable MSI capable HPET for timer broadcast") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> master commit: 24f608d master date: 2025-10-27 15:51:03 +0100
Using dynamically allocated / maintained vectors has several downsides:
- possible nesting of IRQs due to the effects of IRQ migration,
- reduction of vectors available for devices,
- IRQs not moving as intended if there's shortage of vectors,
- higher runtime overhead.
As the vector also doesn't need to be of any priority (first and foremost it really shouldn't be of higher or same priority as the timer IRQ, as that raises TIMER_SOFTIRQ anyway), simply use the lowest one above the legacy range. The vector needs reserving early, until it is known whether it actually is used. If it isn't, it's made available for general use.
With a fixed vector, less updating is now necessary in set_channel_irq_affinity(); in particular channels don't need transiently masking anymore, as the necessary update is now atomic. To fully leverage this, however, we want to stop using hpet_msi_set_affinity() there. With the transient masking dropped, we're no longer at risk of missing events.
AMD interrupt remapping code so far didn't "return" a consistent MSI address when translating an MSI message. Clear respective fields there, to keep the related assertion in set_channel_irq_affinity() from triggering.
Fixes: 996576b ("xen: allow up to 16383 cpus")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: 8ef3277
master date: 2025-10-27 15:51:42 +0100
If a SR-IOV card presents an I/O space inside a BAR the code will continue to loop on the same card. This is due to the missing increment of the cycle variable.
Fixes: a1a6d59 ("pci: split code to size BARs from pci_add_device")
Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: f7091c0
master date: 2025-10-27 15:52:20 +0100
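The bug class described above (a "continue" that skips the loop increment) can be illustrated with a minimal sketch. This is not the Xen code: struct fake_bar and count_mem_bars() are hypothetical names invented for this example, and the loop only assumes that a 64-bit memory BAR occupies two slots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a BAR: only the fields this sketch needs. */
struct fake_bar {
    bool is_io;     /* BAR decodes I/O space */
    bool is_64bit;  /* 64-bit memory BAR, occupies two slots */
};

/*
 * Count memory BARs, skipping I/O ones. *iters receives the number of
 * loop iterations, so a caller can observe that the loop terminated.
 */
static unsigned int count_mem_bars(const struct fake_bar *bars, size_t nr,
                                   unsigned int *iters)
{
    unsigned int found = 0;
    size_t i = 0;

    *iters = 0;
    while ( i < nr )
    {
        ++*iters;
        if ( bars[i].is_io )
        {
            /*
             * Without this increment, "continue" would re-examine the
             * same BAR forever, which is the failure described above.
             */
            i++;
            continue;
        }
        found++;
        i += bars[i].is_64bit ? 2 : 1;
    }

    return found;
}
```

With the increment in place, an I/O BAR costs exactly one iteration and the walk moves on to the next slot.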
With large NR_CPUS on-stack cpumask_t variables are problematic. Now that the IRQ handler can't be invoked in a nested manner anymore, we can instead use a per-CPU variable. While we can't use scratch_cpumask in code invoked from IRQ handlers, simply amend that one with a HPET-special form. (Note that only one of the two IRQ handling functions can come into play at any one time.)
Fixes: 996576b ("xen: allow up to 16383 cpus")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: 80c7c67
master date: 2025-10-29 09:02:53 +0100
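To give a feel for why on-stack CPU masks become a concern at that scale, here is a rough sizing sketch. It assumes a mask is a plain bitmap with one bit per CPU stored in 64-bit words; cpumask_bytes() is a hypothetical helper, not a Xen function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for a bitmap of nr_cpus bits stored in 64-bit words. */
static size_t cpumask_bytes(unsigned int nr_cpus)
{
    size_t words = ((size_t)nr_cpus + 63) / 64; /* round up to whole words */

    return words * sizeof(uint64_t);
}
```

Under that assumption, a 16383-CPU build needs 256 words, i.e. 2048 bytes, for every mask placed on the stack.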
Otherwise it's not possible for device models to map IRQs of devices on segments different than 0. Keep the same function prototype and pass the segment in the high 16bits of the bus parameter, like it's done for the hypercall itself.
Amends: 7620c0c ("PCI multi-seg: add new physdevop-s")
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
master commit: b7838d1
master date: 2025-10-21 16:56:19 +0100
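The encoding described above (segment in the high 16 bits of the bus parameter) can be sketched as follows. The helper names are hypothetical and not part of libxc or the hypercall interface; the sketch only assumes an 8-bit bus number in the low bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: combine segment and bus into one 32-bit value. */
static uint32_t pack_seg_bus(uint16_t seg, uint8_t bus)
{
    return ((uint32_t)seg << 16) | bus;
}

/* Hypothetical helper: recover the segment from the high 16 bits. */
static uint16_t unpack_seg(uint32_t v)
{
    return (uint16_t)(v >> 16);
}

/* Hypothetical helper: recover the bus number from the low 8 bits. */
static uint8_t unpack_bus(uint32_t v)
{
    return (uint8_t)(v & 0xff);
}
```

A caller that only ever deals with segment 0 passes the same value as before, which is why the function prototype can stay unchanged.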
The default terminal settings in Linux will enable echo which interferes with these tests. Set the value in the script to avoid failure caused by a settings reset.
Signed-off-by: Victor Lira <victorm.lira@amd.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 2f73ef4)
Link: https://git.kernel.org/tip/d23550efc6800841b4d1639784afaebdea946ae0
Fixes: ff8228a ("x86/ucode: Relax digest check when Entrysign is fixed in firmware")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 3907ea4
master date: 2025-11-10 16:56:03 +0000
The stimer enlightenment was removed from the defaults list in commit e83077a ("libxl: don't enable synthetic timers by default") but the corresponding docs change was not made. Remove it from the docs, as enabling the enlightenment will hang Windows 10 guests.
Fixes: e83077a ("libxl: don't enable synthetic timers by default")
Signed-off-by: James Dingwall <james@dingwall.me.uk>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: d510f9c
master date: 2025-11-10 16:56:03 +0000
These files 'docs/misc/kconfig{,-language}.txt' were deleted, but
references are still present in Xen. Remove them to clean-up.
Fixes: 044503f ("docs: Delete kconfig docs to fix licensing violation")
Fixes: f80fe2b ("xen: Update Kconfig to Linux v5.4")
Signed-off-by: Dmytro Prokopchuk <dmytro_prokopchuk1@epam.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: c22b6dc
master date: 2025-11-11 14:44:47 +0100
... rather than leaking whomever created the tarball.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Many *.c files are symlinked while building, so along with generated *.h files they ought to be removed. Conversely $(TARGET) doesn't need removing twice.
Fixes: cb4fcf7 ("x86emul: parallelize SIMD test code building")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 774c382
master date: 2025-11-13 09:08:27 +0100
Link: https://git.kernel.org/tip/dd14022a7ce96963aa923e35cf4bcc8c32f95840
Fixes: ff8228a ("x86/ucode: Relax digest check when Entrysign is fixed in firmware")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 07e57af
master date: 2025-11-17 14:46:40 +0000
Old binutils get confused about .buildid overlapping (in VA space) with
earlier section. That confusion results in weird errors down the road,
like this one:
objcopy: xen.efi: Data Directory size (1c) exceeds space left in section (8)
While the bug is fixed in later binutils version, force alignment of the
buildid to avoid overlapping and make it work with older versions too.
This can be reverted once toolchain base is raised at or above binutils
2.36.
Details at https://lore.kernel.org/xen-devel/3TMd7J2u5gCA8ouIG_Xfcw7s5JKMG06XsDIesEB3Fi9htUJ43Lfl057wXohlpCHcszqoCmicpIlneEDO26ZqT8QfC2Y39VxBuqD3nS1j5Q4=@trmm.net/T/#u
Fixes: eee5909 ("x86/EFI: use less crude a way of generating the build ID")
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 26b111c
master date: 2025-11-20 14:29:57 +0000
…ning
The error messages that the compiler may emit can be confusing. The check was also the wrong way round in case multiple make targets are specified: We want to do the check whenever targets other than the running and cleaning ones are specified.
Fixes: 05f4cc2 ("x86emul: suppress default test harness build with incapable compiler")
Fixes: d599739 ("x86emul: suppress "not built" warning for test harness'es run targets")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: c482d1f
master date: 2025-11-24 11:28:47 +0100
The xen-ucode utility is sensitive to the overall error as -EEXIST is a special case for success, but the real error can get clobbered with -EBUSY.
This can be demonstrated most easily by force loading an old microcode, which should yield -EIO but yields -EBUSY:
  # xen-ucode /lib/firmware/amd-ucode/microcode_amd_fam17h.bin --force
  Failed to update microcode. (err: Device or resource busy)
  (XEN) 256 cores are to update their microcode
  (XEN) microcode: CPU0 update rev 0x830107d to 0x830107c failed, result 0x830107d
  (XEN) Late loading aborted: CPU0 failed to update ucode: -5
wait_for_state() returns false on encountering LOADING_EXIT. Right now, this is always transformed into -EBUSY and passed back to callers. However, control_thread_fn() can move directly to this state in the case of an early error; it is not an error condition for APs, but the latest write into stopmachine_data.fn_result wins, causing the real error, -EIO, to get clobbered with -EBUSY.
Drop all the -EBUSY's, and treat hitting LOADING_EXIT as a success case. This causes only a single error to be returned through stop_machine_run(), and preserves the -EIO
  # xen-ucode /lib/firmware/amd-ucode/microcode_amd_fam17h.bin --force
  Failed to update microcode. (err: Input/output error)
  (XEN) 256 cores are to update their microcode
  (XEN) microcode: CPU0 update rev 0x830107d to 0x830107c failed, result 0x830107d
  (XEN) Late loading aborted: CPU0 failed to update ucode: -5
Fixes: 5ed1256 ("microcode: rendezvous CPUs in NMI handler and load ucode")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: a405bf4
master date: 2025-11-24 15:17:39 +0000
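The clobbering described above can be modelled with a small sketch. It assumes, purely for illustration, a single shared result slot that any non-zero return code overwrites (so the latest write wins); collect() is a hypothetical stand-in and not the stop_machine implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Model of a shared result slot: every non-zero return code overwrites
 * the slot, so the last CPU to report an error decides the outcome.
 * rcs[] holds the per-CPU return codes in the order they are written.
 */
static int collect(const int *rcs, size_t n)
{
    int result = 0;

    for ( size_t i = 0; i < n; i++ )
        if ( rcs[i] )
            result = rcs[i];

    return result;
}
```

In this model, the control thread's -EIO followed by -EBUSY from the other CPUs yields -EBUSY overall, whereas having those CPUs report success leaves -EIO as the single surviving error.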
The usage of atomic_dec_and_test() in msixtbl_pt_unregister() is inverted: the function will return true when the refcount reaches 0. The current code does the opposite and calls del_msixtbl_entry() when there are still refcounts held on the object.
However all callers of msixtbl_pt_unregister() are serialized on the domctl lock, and hence there cannot be parallel calls to msixtbl_pt_unregister() that could lead to double freeing of the same object. The incorrect freeing with active msixtbl entries will result in a possible guest visible malfunction, but no internal Xen state corruption.
While entries are leaked once the last pIRQ is unbound, the same entry would get re-used if the device has pIRQs bound again. The guest cannot exploit this incorrect refcount check to leak arbitrary amounts of memory by repeatedly enabling and disabling (binding and unbinding) MSI-X entries.
Fixes: 34097f0 ('hvm: passthrough MSI-X mask bit acceleration')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: ea87662
master date: 2025-11-26 09:46:17 +0100
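The decrement-and-test semantics described above can be sketched with C11 atomics. dec_and_test(), struct entry and put_entry() are hypothetical stand-ins written for this example, not the Xen definitions; the only property modelled is that the helper returns true exactly when the count reaches 0.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for a decrement-and-test helper: true iff the count hits 0. */
static bool dec_and_test(atomic_int *v)
{
    /* atomic_fetch_sub() returns the value held before the subtraction. */
    return atomic_fetch_sub(v, 1) == 1;
}

/* Hypothetical refcounted object; "freed" records the teardown. */
struct entry {
    atomic_int refcnt;
    bool freed;
};

/* Drop one reference; tear the entry down only on the last one. */
static void put_entry(struct entry *e)
{
    /* The inverted form, "if ( !dec_and_test(...) )", would tear the
     * entry down while references are still held. */
    if ( dec_and_test(&e->refcnt) )
        e->freed = true;
}
```

With two references held, the first put_entry() call leaves the entry alive and only the second one marks it freed.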
When setting a timer's config register, timer_sanitize_int_route will always reset the IRQ route value to what's valid corresponding to the !HPET_CFG_LEGACY case. This is applied even if the HPET is set to HPET_CFG_LEGACY.
When some operating systems (e.g. Windows) try to write to a timer config, they will verify and rewrite the register if the values don't match what they expect. This causes an unnecessary write to HPET_Tn_CFG.
Note, the HPET specification states that for the Tn_INT_ROUTE_CNF field: "If the value is not supported by this particular timer, then the value read back will not match what is written. [...] If the LegacyReplacement Route bit is set, then Timers 0 and 1 will have a different routing, and this bit field has no effect for those two timers."
Therefore, Xen should not reset timer_int_route if legacy mode is enabled, regardless of what's in there.
Fixes: ec40d3f ("x86/vhpet: check that the set interrupt route is valid")
Signed-off-by: Tu Dinh <ngoc-tu.dinh@vates.tech>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: fb0e37d
master date: 2025-11-26 12:10:21 +0100
This was potentially helpful when the chickenbit was the only mitigation and microcode had not been released, but that was two years ago.
Zenbleed microcode has been available since December 2023, and the subsequent Entrysign signature vulnerability means that firmware updates block OS-loading and more OS-loadable microcode will be produced for Zen2.
i.e. the Zenbleed fix is not going to appear at runtime these days.
No practical change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 5cd1ac1
master date: 2025-11-27 18:22:20 +0000
We have two different functions explaining that DE_CFG is Core-scoped and that writes are racy but happen to be safe. This is only true when there's one of them.
Introduce amd_init_de_cfg() to be the singular function which writes to DE_CFG, modelled after the logic we already have for BP_CFG.
While reworking amd_check_zenbleed() into a simple predicate used by amd_init_de_cfg(), fix the microcode table. The 'good_rev' was specific to an individual stepping and not valid to be matched by model, let alone a range. The only CPUs incorrectly matched that I can locate appear to be pre-production, and probably didn't get Zenbleed microcode.
Rework amd_init_lfence() to be amd_init_lfence_dispatch() with only the purpose of configuring X86_FEATURE_LFENCE_DISPATCH in the case that it needs synthesising. Run it on the BSP only and use setup_force_cpu_cap() to prevent the bit disappearing on a subsequent CPUID rescan.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: d0c75dc
master date: 2025-12-01 16:20:41 +0000
Summary:
First repo: https://github.com/TrenchBoot/xen.git
First repo branch: aem-next-qubes
Second repo: https://github.com/TrenchBoot/xen.git
Second repo commit: f1e7a46375f62600a0e5cf1e2f150403b6e1b6da
Branch with the successfully rebased commits: aem-next-qubes-d86fb95cfd070ce77ae646d3f8a96ad452174b2c-conflict
The commit that introduced the conflict: d86fb95cfd070ce77ae646d3f8a96ad452174b2c
Before relaunching the automatic rebase, please do the following to solve the conflict:
Fetch the remote repository:
Enter the repository.
Checkout the conflict branch created by the script:
Cherry-pick the commit that introduced the conflict
Solve the conflict and apply the resulting commit on top of the conflict branch. Important: if the conflict resolution resulted in an empty commit, or you have decided not to resolve the conflict but to drop the commit, you must still add one commit to the aem-next-qubes-d86fb95cfd070ce77ae646d3f8a96ad452174b2c-conflict branch, even if it is an empty commit. Otherwise the automated rebase will not continue.
Push to the remote repository.
Rerun all jobs for the workflow https://github.com/TrenchBoot/xen/actions/runs/27455088949 to resume automated rebase.
If you want to start the automatic rebase from the beginning, then make sure to:
aem-next-qubes-d86fb95cfd070ce77ae646d3f8a96ad452174b2c-conflict from the remote repository.