NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: port-xen/58561 (panic: kernel diagnostic assertion, "x86_read_psl() == 0" failed: file, "/home/netbsd/10/src/sys/arch/x86/x86/pmap.c", line 3581)



> Date: Sat, 10 Jan 2026 23:33:30 +0100
> From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
> 
> And this assembly work may have allowed me to find the problem.
> Basically, the CLI and STI macros are not atomic on Xen PV, and if preemption
> happens at a bad time we may end up updating the upcall_mask of our previous
> CPU. The attached patch should close this race condition; hopefully it's
> the last one.

Nice!  I fixed some of this a few years ago but I must have neglected
to search further for other uses like the assembly stubs using
EVTCHN_UPCALL_MASK and EVTCHN_UPCALL_PENDING:

https://mail-index.netbsd.org/source-changes/2023/02/25/msg143510.html

There are still a lot of suspicious cases that turn up when I grep -i
for evtchn_upcall:

- i386/spl.S -- might be fine because they're in the spllower and
  doreti paths, so maybe preemption is already blocked by virtue of
  having raised ipl, but needs another set of eyeballs to be sure:

   https://nxr.netbsd.org/xref/src/sys/arch/i386/i386/spl.S?r=1.58#218
   https://nxr.netbsd.org/xref/src/sys/arch/i386/i386/spl.S?r=1.58#291

- i386 CLI/STI macros:

  https://nxr.netbsd.org/xref/src/sys/arch/i386/include/frameasm.h?r=1.35#12

  (Maybe we can just nix XEN_BLOCK/UNBLOCK_EVENTS and XEN_TEST_PENDING
  and merge those into the CLI/STI macros?)

- amd64 profile stubs:

  https://nxr.netbsd.org/xref/src/sys/arch/amd64/include/profile.h?r=1.21#82

  (Not sure why there's no obvious i386 analogue of that?  The i386
  code has no XENPV conditionals.)

- xen_init_amd64/i386_vcpuctxt -- guessing this runs before interrupts
  or preemption can happen, but unclear, needs more eyeballs:

  https://nxr.netbsd.org/xref/src/sys/arch/xen/x86/cpu.c?r=1.145#850
  https://nxr.netbsd.org/xref/src/sys/arch/xen/x86/cpu.c?r=1.145#947

- do_hypervisor_callback -- does this run with ipl raised so
  preemption is not possible?  If so, can we assert the ipl is raised?

  https://nxr.netbsd.org/xref/src/sys/arch/xen/x86/hypervisor_machdep.c?r=1.46#301

- hypervisor_send_event -- this is #if 0 code, can we delete it and
  the one #if 0 caller in sys/arch/xen/xen/evtchn.c so we don't have
  to think about it?  If not, I think we should either
  KASSERT(kpreempt_disabled()) or assert the ipl is raised.

  https://nxr.netbsd.org/xref/src/sys/arch/xen/x86/hypervisor_machdep.c?r=1.46#336
  https://nxr.netbsd.org/xref/src/sys/arch/xen/xen/evtchn.c?r=1.100#383

- xen_debug_handler -- this runs with ipl raised, right?  Maybe not
  worth asserting because this is a debug vector so dumping info is
  better than crashing.

  https://nxr.netbsd.org/xref/src/sys/arch/xen/xen/evtchn.c?r=1.100#1142

Also, I think we should either:

(a) delete the PUSHF/POPF macros altogether, if they're unused, or

(b) make sure they too block preemption in the critical path.


Home | Main Index | Thread Index | Old Index