NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: port-xen/58561 (panic: kernel diagnostic assertion, "x86_read_psl() == 0" failed: file, "/home/netbsd/10/src/sys/arch/x86/x86/pmap.c", line 3581)



The following reply was made to PR port-xen/58561; it has been noted by GNATS.

From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
To: Taylor R Campbell <riastradh%NetBSD.org@localhost>
Cc: Konrad Schroder <perseant%hhhh.org@localhost>, gnats-bugs%NetBSD.org@localhost,
        port-xen-maintainer%NetBSD.org@localhost, netbsd-bugs%NetBSD.org@localhost,
        gnats-admin%NetBSD.org@localhost, cherry%NetBSD.org@localhost, port-xen%NetBSD.org@localhost
Subject: Re: port-xen/58561 (panic: kernel diagnostic assertion,
 "x86_read_psl() == 0" failed: file,
 "/home/netbsd/10/src/sys/arch/x86/x86/pmap.c", line 3581)
Date: Mon, 12 Jan 2026 22:42:57 +0100

 On Sun, Jan 11, 2026 at 05:21:12PM +0000, Taylor R Campbell wrote:
 > > Date: Sat, 10 Jan 2026 23:33:30 +0100
 > > From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
 > > 
 > > And this assembly work may have allowed me to find the problem.
 > > Basically, the CLI and STI macros are not atomic on Xen PV, and if preemption
 > > happens at a bad time we may end up updating the upcall_mask of our previous
 > > CPU. The attached patch should close this race condition; hopefully it's
 > > the last one.
 > 
 > Nice!  I fixed some of this a few years ago but I must have neglected
 > to search further for other uses like the assembly stubs using
 > EVTCHN_UPCALL_MASK and EVTCHN_UPCALL_PENDING:
 > 
 > https://mail-index.netbsd.org/source-changes/2023/02/25/msg143510.html
 > 
 > There are still a lot of suspicious cases that turn up when I grep -i
 > for evtchn_upcall:
 > 
 > - i386/spl.S -- might be fine because they're in the spllower and
 >   doreti paths, so maybe preemption is already blocked by virtue of
 >   having raised ipl, but needs another set of eyeballs to be sure:
 > 
 >    https://nxr.netbsd.org/xref/src/sys/arch/i386/i386/spl.S?r=1.58#218
 >    https://nxr.netbsd.org/xref/src/sys/arch/i386/i386/spl.S?r=1.58#291
 > 
 > - i386 CLI/STI macros:
 > 
 >   https://nxr.netbsd.org/xref/src/sys/arch/i386/include/frameasm.h?r=1.35#12
 > 
 >   (Maybe we can just nix XEN_BLOCK/UNBLOCK_EVENTS and XEN_TEST_PENDING
 >   and merge those into the CLI/STI macros?)
 
 Yes, i386 needs a look too. Wlll do later.
 
 > 
 > - amd64 profile stubs:
 > 
 >   https://nxr.netbsd.org/xref/src/sys/arch/amd64/include/profile.h?r=1.21#82
 > 
 >   (Not sure why there's no obvious i386 analogue of that?  The i386
 >   code has no XENPV conditionals.)
 
 Maybe a Xen/i386 profiling kernel has never been tried. Or maybe Xen emulates
 cli and related
 
 > 
 > - xen_init_amd64/i386_vcpuctxt -- guessing this runs before interrupts
 >   or preemption can happen, but unclear, needs more eyeballs:
 > 
 >   https://nxr.netbsd.org/xref/src/sys/arch/xen/x86/cpu.c?r=1.145#850
 >   https://nxr.netbsd.org/xref/src/sys/arch/xen/x86/cpu.c?r=1.145#947
 
 Yes, this runs before interrupts are enabled.
 
 > 
 > - do_hypervisor_callback -- does this run with ipl raised so
 >   preemption is not possible?  If so, can we assert the ipl is raised?
 > 
 >   https://nxr.netbsd.org/xref/src/sys/arch/xen/x86/hypervisor_machdep.c?r=1.46#301
 
 It's called from interrupt context (with ci_idepth raised) so preemtion should
 not be possible.
 
 > 
 > - hypervisor_send_event -- this is #if 0 code, can we delete it and
 >   the one #if 0 caller in sys/arch/xen/xen/evtchn.c so we don't have
 >   to think about it?  If not, I think we should either
 >   KASSERT(kpreempt_disabled()) or assert the ipl is raised.
 > 
 >   https://nxr.netbsd.org/xref/src/sys/arch/xen/x86/hypervisor_machdep.c?r=1.46#336
 >   https://nxr.netbsd.org/xref/src/sys/arch/xen/xen/evtchn.c?r=1.100#383
 
 Sure, I deleted it.
 
 > 
 > - xen_debug_handler -- this runs with ipl raised, right?  Maybe not
 >   worth asserting because this is a debug vector so dumping info is
 >   better than crashing.
 > 
 >   https://nxr.netbsd.org/xref/src/sys/arch/xen/xen/evtchn.c?r=1.100#1142
 
 That's the idea; and also it's called from interrupt context so no
 preemtion possible
 
 > 
 > Also, I think we should either:
 > 
 > (a) delete the PUSHF/POPF macros altogether, if they're unused, or
 > 
 > (b) make sure they too block preemption in the critical path.
 
 AFAIK they're unused, I will remove them.
 
 -- 
 Manuel Bouyer <bouyer%antioche.eu.org@localhost>
      NetBSD: 26 ans d'experience feront toujours la difference
 --
 


Home | Main Index | Thread Index | Old Index