Port-sparc archive
Re: panic at boot on 4/200
On Sat, Jan 10, 2026 at 6:48 AM David Brownlee <abs%absd.org@localhost> wrote:
>
> On Sat, 10 Jan 2026 at 03:16, foo bar <tokenalt%gmail.com@localhost> wrote:
> >
> > On Thu, Jan 8, 2026 at 2:25 AM Romain Dolbeau <romain%dolbeau.org@localhost> wrote:
> >
> > > When you say "everything else", did that include trying NetBSD-2.0 to
> > > narrow down the issue ?
> > > From the release note, 2.0 introduced SMP on SPARC, which /may/ have
> > > involved reworking the cache support a bit (for e.g. coherency).
> > > (Shot in the dark, and 4.1.1 was also pre-SMP support...).
> >
> > From memory I tried 1.5.1, 1.5.3, 2.0, 4.0, 6.0, 8.0, 9.0, and
> > current. I'll retest everything when I get back to the board.
>
> So based on the hypothesis we would expect 1.6.2 to be the last version to work
Because of all the interest, I decided to set up the 4200 again and
retest kernels. 1.5.x all work; 1.6 and later do not. The errors each
version gets are different but consistent for that version, even across
power cycles.
For example, 1.6 fails with this right at the start.
Memory alignment error with PC 0xF00096AC. Instruction "0xC2274000".
2.0 fails with this.
panic: lockmgr: no context
syncing disks... done
Frame pointer is at 0xf0304ef8
Call traceback:
pc = 0xf027a43c args = (0xf035cd80, 0x5, 0x0, 0x0, 0xf0305018, 0x1, 0xf0304f60) fp = 0xf0304f60
pc = 0xf01a3f20 args = (0x100, 0x0, 0xf0367480, 0x0, 0xf04aa008, 0xf000a22c, 0xf0304fd0) fp = 0xf0304fd0
pc = 0xf0183790 args = (0xf02d7da0, 0x80, 0x18800020, 0xf0231a44, 0xf035ec00, 0x100, 0xf0305038) fp = 0xf0305038
pc = 0xf02179b0 args = (0xf035cbb4, 0x1, 0x0, 0x8000e6, 0x0, 0xf04a3600, 0xf03050a8) fp = 0xf03050a8
pc = 0xf0216460 args = (0xf0305220, 0x0, 0x2000, 0x0, 0xffeb8e14, 0xf0365000, 0xf0305110) fp = 0xf0305110
pc = 0xf02888a8 args = (0xf035cbb0, 0x1ff0000, 0x0, 0x1, 0x0, 0xf424, 0xf0305248) fp = 0xf0305248
pc = 0xf000857c args = (0x9, 0x80, 0x1ff0106, 0xf027f9d8, 0x10003, 0xf0305380, 0xf0305320) fp = 0xf0305320
pc = 0xf0194f54 args = (0xf035cf18, 0x1ff0100, 0x100, 0x8000e3, 0xf1ee0000, 0x200, 0xf03053d0) fp = 0xf03053d0
pc = 0xf0288824 args = (0x0, 0x18800000, 0xf0367480, 0x0, 0xf04aa008, 0xf000a22c, 0xf0305438) fp = 0xf0305438
pc = 0xf000857c args = (0x9, 0x80, 0x18800020, 0xf0231a44, 0x20003, 0xf0305570, 0xf0305510) fp = 0xf0305510
pc = 0xf0008a90 args = (0xf043bf00, 0xf0233020, 0x300, 0x8000e6, 0x0, 0xf04a3600, 0xf03055c0) fp = 0xf03055c0
pc = 0xf02330ac args = (0xf0195710, 0x0, 0x2000, 0x0, 0xffeb8e14, 0xf0308940, 0xf0305628) fp = 0xf0305628
pc = 0x0 args = (0xf0305688, 0x927c0, 0xf02a2800, 0x0, 0x0, 0xf424, 0x0) fp = 0x0
rebooting
3.0, 4.0, 5.0, 7.0, 8.0 get a watchdog reset after probing the SCSI bus.
6.0 fails with this after probing the SCSI bus.
trap type 0x2: pc=0xf2ae9e80 npc=0xf2ae9e84 psr=0x8000c2<S,PS>
kernel: illegal instruction trap
Stopped in pid 0.5 (system) at f2ae9e80: illtrap f2ae9e80
db> bt
0xf2ae9e80(0x40, 0xf040c6bc, 0x4, 0xf0510308, 0xea60, 0xf0564c80) at netbsd:vmeintr4+0x1c
vmeintr4(0x0, 0xf0313aa8, 0x300, 0x8000e5, 0x0, 0x280) at netbsd:sparc_interrupt44c+0x134
sparc_interrupt44c(0xf03e7800, 0x0, 0xf0245b84, 0xf0564c80, 0x0, 0xf05aea00) at netbsd:mi_switch+0x1ec
mi_switch(0xf0577340, 0xfffffff8, 0x0, 0x0, 0xffeb8e14, 0xf0002800) at netbsd:softint_thread+0x158
softint_thread(0xf2ae2080, 0xf0577340, 0x1, 0x18, 0x10e1, 0x1fe1) at netbsd:lwp_setfunc_trampoline
9.0 fails with this after probing the SCSI bus.
[ 5.4900080] trap type 0x2: pc=0xf2c3de60 npc=0xf2c3de64 psr=0x8000c2<S,PS>
[ 5.5000080] kernel: illegal instruction trap
Stopped in pid 0.6 (system) at f2c3de60: bn,pn f2c40d78
db> bt
0xf2c3de60(0x40, 0x100000, 0xf06b2c58, 0x0, 0x300, 0xf2c3c000) at netbsd:vmeintr4+0x1c
vmeintr4(0x0, 0xf000ce9c, 0x300, 0x8000e5, 0x40, 0xf070a010) at netbsd:sparc_interrupt44c+0x134
sparc_interrupt44c(0xf0524c00, 0x1, 0x28f5c28, 0xf5c28f5c, 0x2eb, 0xf052a908) at netbsd:mi_switch+0x2f8
mi_switch(0x0, 0x0, 0x0, 0xf051b800, 0xf0002000, 0xf06c4b20) at netbsd:softint_thread+0x2b4
softint_thread(0xf04949b0, 0x0, 0xf067ebb0, 0xf0002000, 0xf06c4b20, 0x2000) at netbsd:lwp_trampoline+0x8
10.0 fails with this after probing the SCSI bus.
[ 6.8900030] swwdog0: software watchdog initialized
[ 6.9300030] trap type 0x2: pc=0xf2bbde6c npc=0xf2bbde70 psr=0x8000c2<S,PS>
[ 6.9400030] kernel: illegal instruction trap
Stopped in pid 0.5 (system) at f2bbde6c: illtrap f2bbe29c
db> bt
0xf2bbde6c(0x0, 0x100000, 0xf0682410, 0x0, 0x300, 0xf2bbc000) at netbsd:vmeintr4+0x1c
vmeintr4(0x0, 0xf000cc04, 0x300, 0x8000e5, 0x40, 0xf06623c0) at netbsd:sparc_interrupt44c+0x134
sparc_interrupt44c(0xf0646880, 0x1, 0x28f5c28, 0xf5c28f5c, 0xc28f8e7d, 0xf04b3088) at netbsd:mi_switch+0x1b4
mi_switch(0xf0646880, 0xf049d800, 0x1864, 0xf0002000, 0x0, 0xf049d800) at netbsd:softint_thread+0x164
softint_thread(0xf0002000, 0xf0646880, 0x1000, 0x0, 0xf2bb0080, 0xf062ffc0) at netbsd:lwp_trampoline+0x8
11.0beta fails with this after probing the SCSI bus.
[ 6.8900030] swwdog0: software watchdog initialized
[ 6.9300030] trap type 0x2: pc=0xf30afe5c npc=0xf30afe60 psr=0x8000c2<S,PS>
[ 6.9400030] kernel: illegal instruction trap
Stopped in pid 0.5 (system) at f30afe5c: ldx [%g5 + 0x44], %i0
db> bt
0xf30afe5c(0x40, 0x100000, 0xf0768950, 0x0, 0x300, 0xf30ae000) at netbsd:vmeintr4+0x1c
vmeintr4(0x0, 0xf000cc68, 0x300, 0x8000e5, 0x40, 0xf0758b80) at netbsd:sparc_interrupt44c+0x134
sparc_interrupt44c(0xf073a880, 0x1, 0xdeb88440, 0xf05a0608, 0x35e, 0xc829cfcf) at netbsd:mi_switch+0x318
mi_switch(0xf073a880, 0xf0594b80, 0x0, 0x80, 0xf0002000, 0xf058a8c0) at netbsd:softint_thread+0x294
softint_thread(0xf04a6b80, 0xf04fb918, 0xf04b2168, 0x0, 0xf0002000, 0x1000) at netbsd:lwp_trampoline+0x8
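Looking at all of these, I notice that the more recent kernels (6.0,
9.0, 10.0, and 11.0beta) all run into trouble while handling an
interrupt: every backtrace goes through sparc_interrupt44c and vmeintr4.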
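
One way to check that pattern mechanically is to pull the trap PC and the
immediate caller out of each DDB dump and compare them across versions. The
following is only a minimal sketch, not something that was run as part of the
testing described here: the SAMPLES table, the two regular expressions, and the
summarize() helper are illustrative names, and the input lines are copied from
the dumps above with the terminal line wraps rejoined.

```python
import re

# Trap line and first backtrace frame for each version, copied from the
# DDB output above (line wraps rejoined). Keys are the NetBSD versions.
SAMPLES = {
    "6.0": (
        "trap type 0x2: pc=0xf2ae9e80 npc=0xf2ae9e84 psr=0x8000c2<S,PS>",
        "0xf2ae9e80(0x40, 0xf040c6bc, 0x4, 0xf0510308, 0xea60, 0xf0564c80) at netbsd:vmeintr4+0x1c",
    ),
    "9.0": (
        "[ 5.4900080] trap type 0x2: pc=0xf2c3de60 npc=0xf2c3de64 psr=0x8000c2<S,PS>",
        "0xf2c3de60(0x40, 0x100000, 0xf06b2c58, 0x0, 0x300, 0xf2c3c000) at netbsd:vmeintr4+0x1c",
    ),
    "10.0": (
        "[ 6.9300030] trap type 0x2: pc=0xf2bbde6c npc=0xf2bbde70 psr=0x8000c2<S,PS>",
        "0xf2bbde6c(0x0, 0x100000, 0xf0682410, 0x0, 0x300, 0xf2bbc000) at netbsd:vmeintr4+0x1c",
    ),
    "11.0beta": (
        "[ 6.9300030] trap type 0x2: pc=0xf30afe5c npc=0xf30afe60 psr=0x8000c2<S,PS>",
        "0xf30afe5c(0x40, 0x100000, 0xf0768950, 0x0, 0x300, 0xf30ae000) at netbsd:vmeintr4+0x1c",
    ),
}

# "trap type 0x2: pc=0x..." with an optional dmesg timestamp in front.
TRAP_RE = re.compile(r"trap type (0x[0-9a-f]+): pc=(0x[0-9a-f]+)")
# ") at netbsd:symbol+0xoffset" at the end of a backtrace frame.
CALLER_RE = re.compile(r"\) at netbsd:(\w+)\+(0x[0-9a-f]+)")


def summarize(trap_line, frame_line):
    """Return the trap type, faulting PC, and calling symbol+offset."""
    trap = TRAP_RE.search(trap_line)
    caller = CALLER_RE.search(frame_line)
    if trap is None or caller is None:
        raise ValueError("line does not look like DDB trap/backtrace output")
    return {
        "trap_type": int(trap.group(1), 16),
        "pc": int(trap.group(2), 16),
        "caller": caller.group(1),
        "offset": int(caller.group(2), 16),
    }


summaries = {ver: summarize(*lines) for ver, lines in SAMPLES.items()}

for ver, s in summaries.items():
    print(f"{ver:>8}: trap {s['trap_type']:#x} at pc {s['pc']:#010x}, "
          f"called from {s['caller']}+{s['offset']:#x}")

# True when every version traps in a frame called from the same place.
same_caller = len({(s["caller"], s["offset"]) for s in summaries.values()}) == 1
print("same caller in every version:", same_caller)
```

In all four dumps the faulting frame is reached from vmeintr4+0x1c, which is
what the script compares. It says nothing about why the target address is bad;
it only confirms that the failures share a call site.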
> > > Alternatively to support your hypothesis, maybe it's possible to run a
> > > newer NetBSD kernel patched and recompiled to not enable the cache?
> >
> > That could work and it would probably be easier than trying to figure
> > out which of the 32 chips is bad.
>
> Should be able to do that by adjusting getcacheinfo_sun4() to treat a
> 4/200 the same as a 4/100
> https://nxr.netbsd.org/xref/src/sys/arch/sparc/sparc/cpu.c#1169
>
> If that works it might be interesting to try enabling just a small
> part of the start of the cache and see if it works (and if there is
> any discernible speedup :)
Because of your earlier comments, I ran the extended memory tests on
the board, and all passed. I swapped in the 3200 CPU board and ran the
memory tests on it, and they all passed with no errors. I then booted
into NetBSD and ran some benchmarks that stress the memory, and
everything worked. I'm still confident the memory board works, and
according to the PROM tests the memory controller on the 4200 works
fine. As for whether the 3200 runs the memory slower and so doesn't
trigger the problem, I don't think so. The 32MB memory board and 4200
CPU board were introduced at the same time and should work together,
and Sun never mentioned any kind of timing or speed related
incompatibilities with these boards.
After testing all this, I realized I've never tried using a VME
device on a 4300 board. It could be that bad cache or bad memory on
the 4200 is just a red herring and the real issue is broken VME
support that I never noticed before, since the 4300 has everything
built in. That wouldn't explain why SunOS couldn't boot on the 4200, but
that could just be bad luck. I'm going to spend some time testing VME
boards on the 4300 now, as well as some other things.