Port-sparc archive


Re: panic at boot on 4/200



On Sat, Jan 10, 2026 at 6:48 AM David Brownlee <abs%absd.org@localhost> wrote:
>
> On Sat, 10 Jan 2026 at 03:16, foo bar <tokenalt%gmail.com@localhost> wrote:
> >
> > On Thu, Jan 8, 2026 at 2:25 AM Romain Dolbeau <romain%dolbeau.org@localhost> wrote:
> >
> > > When you say "everything else", did that include trying NetBSD-2.0 to
> > > narrow down the issue ?
> > > From the release note, 2.0 introduced SMP on SPARC, which /may/ have
> > > involved reworking the cache support a bit (for e.g. coherency).
> > > (Shot in the dark, and  4.1.1 was also pre-SMP support...).
> >
> > From memory I tried 1.5.1, 1.5.3, 2.0, 4.0, 6.0, 8.0, 9.0, and
> > current. I'll retest everything when I get back to the board.
>
> So based on the hypothesis we would expect 1.6.2 to be the last version to work

So because of all the interest I decided to set up the 4200 again and
retest kernels. The 1.5.x kernels all work; 1.6 and later do not. The
errors each version gets are different but consistent for that version,
even across power cycles.

For example 1.6 fails with this right at the start.
Memory alignment error with PC 0xF00096AC.  Instruction "0xC2274000".
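
As an aside, that instruction word can be decoded by hand. A minimal sketch, assuming the standard SPARC V8 "format 3" field layout (op, rd, op3, rs1, i, rs2/simm13); the function names are made up for illustration and only the basic load/store op3 values are named:

```python
# Rough decoder for SPARC V8 format 3 memory instructions (op == 3).
# Names below are illustrative; only basic integer loads/stores are covered.

# Register numbers 0-31 map to %g0-7, %o0-7, %l0-7, %i0-7.
REG_NAMES = [f"%{bank}{n}" for bank in "goli" for n in range(8)]

# op3 values for the basic integer loads and stores.
MEM_OP3 = {
    0x00: "ld", 0x01: "ldub", 0x02: "lduh", 0x03: "ldd",
    0x04: "st", 0x05: "stb", 0x06: "sth", 0x07: "std",
}


def decode_format3(word):
    """Split a 32-bit instruction word into format 3 fields."""
    fields = {
        "op": (word >> 30) & 0x3,
        "rd": (word >> 25) & 0x1F,
        "op3": (word >> 19) & 0x3F,
        "rs1": (word >> 14) & 0x1F,
        "i": (word >> 13) & 0x1,
    }
    if fields["i"]:
        simm13 = word & 0x1FFF
        if simm13 & 0x1000:          # sign-extend the 13-bit immediate
            simm13 -= 0x2000
        fields["simm13"] = simm13
    else:
        fields["rs2"] = word & 0x1F
    return fields


def describe(word):
    """Return a rough assembly rendering of a memory instruction word."""
    f = decode_format3(word)
    if f["op"] != 3:
        return f"not a memory instruction (op={f['op']})"
    name = MEM_OP3.get(f["op3"], f"op3=0x{f['op3']:02x}")
    second = str(f["simm13"]) if f["i"] else REG_NAMES[f["rs2"]]
    addr = f"[{REG_NAMES[f['rs1']]} + {second}]"
    rd = REG_NAMES[f["rd"]]
    # Stores list the source register first, loads list it last.
    return f"{name} {rd}, {addr}" if name.startswith("st") else f"{name} {addr}, {rd}"


print(describe(0xC2274000))  # st %g1, [%i5 + %g0]
```

So 0xC2274000 is a 32-bit store of %g1 through %i5, which would be consistent with an alignment error if %i5 held an address that is not 4-byte aligned.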

2.0 fails with this.
panic: lockmgr: no context
syncing disks... done
Frame pointer is at 0xf0304ef8
Call traceback:
  pc = 0xf027a43c  args = (0xf035cd80, 0x5, 0x0, 0x0, 0xf0305018, 0x1,
0xf0304f60) fp = 0xf0304f60
  pc = 0xf01a3f20  args = (0x100, 0x0, 0xf0367480, 0x0, 0xf04aa008,
0xf000a22c, 0xf0304fd0) fp = 0xf0304fd0
  pc = 0xf0183790  args = (0xf02d7da0, 0x80, 0x18800020, 0xf0231a44,
0xf035ec00, 0x100, 0xf0305038) fp = 0xf0305038
  pc = 0xf02179b0  args = (0xf035cbb4, 0x1, 0x0, 0x8000e6, 0x0,
0xf04a3600, 0xf03050a8) fp = 0xf03050a8
  pc = 0xf0216460  args = (0xf0305220, 0x0, 0x2000, 0x0, 0xffeb8e14,
0xf0365000, 0xf0305110) fp = 0xf0305110
  pc = 0xf02888a8  args = (0xf035cbb0, 0x1ff0000, 0x0, 0x1, 0x0,
0xf424, 0xf0305248) fp = 0xf0305248
  pc = 0xf000857c  args = (0x9, 0x80, 0x1ff0106, 0xf027f9d8, 0x10003,
0xf0305380, 0xf0305320) fp = 0xf0305320
  pc = 0xf0194f54  args = (0xf035cf18, 0x1ff0100, 0x100, 0x8000e3,
0xf1ee0000, 0x200, 0xf03053d0) fp = 0xf03053d0
  pc = 0xf0288824  args = (0x0, 0x18800000, 0xf0367480, 0x0,
0xf04aa008, 0xf000a22c, 0xf0305438) fp = 0xf0305438
  pc = 0xf000857c  args = (0x9, 0x80, 0x18800020, 0xf0231a44, 0x20003,
0xf0305570, 0xf0305510) fp = 0xf0305510
  pc = 0xf0008a90  args = (0xf043bf00, 0xf0233020, 0x300, 0x8000e6,
0x0, 0xf04a3600, 0xf03055c0) fp = 0xf03055c0
  pc = 0xf02330ac  args = (0xf0195710, 0x0, 0x2000, 0x0, 0xffeb8e14,
0xf0308940, 0xf0305628) fp = 0xf0305628
  pc = 0x0  args = (0xf0305688, 0x927c0, 0xf02a2800, 0x0, 0x0, 0xf424,
0x0) fp = 0x0
rebooting

3.0, 4.0, 5.0, 7.0, 8.0 get a watchdog reset after probing the SCSI bus.

6.0 fails with this after probing the SCSI bus.
trap type 0x2: pc=0xf2ae9e80 npc=0xf2ae9e84 psr=0x8000c2<S,PS>
kernel: illegal instruction trap
Stopped in pid 0.5 (system) at  f2ae9e80:       illtrap         f2ae9e80
db> bt
0xf2ae9e80(0x40, 0xf040c6bc, 0x4, 0xf0510308, 0xea60, 0xf0564c80) at netbsd:vmeintr4+0x1c
vmeintr4(0x0, 0xf0313aa8, 0x300, 0x8000e5, 0x0, 0x280) at netbsd:sparc_interrupt44c+0x134
sparc_interrupt44c(0xf03e7800, 0x0, 0xf0245b84, 0xf0564c80, 0x0, 0xf05aea00) at netbsd:mi_switch+0x1ec
mi_switch(0xf0577340, 0xfffffff8, 0x0, 0x0, 0xffeb8e14, 0xf0002800) at netbsd:softint_thread+0x158
softint_thread(0xf2ae2080, 0xf0577340, 0x1, 0x18, 0x10e1, 0x1fe1) at netbsd:lwp_setfunc_trampoline


9.0 fails with this after probing the SCSI bus.
[   5.4900080] trap type 0x2: pc=0xf2c3de60 npc=0xf2c3de64 psr=0x8000c2<S,PS>
[   5.5000080] kernel: illegal instruction trap
Stopped in pid 0.6 (system) at  f2c3de60:       bn,pn           f2c40d78
db> bt
0xf2c3de60(0x40, 0x100000, 0xf06b2c58, 0x0, 0x300, 0xf2c3c000) at netbsd:vmeintr4+0x1c
vmeintr4(0x0, 0xf000ce9c, 0x300, 0x8000e5, 0x40, 0xf070a010) at netbsd:sparc_interrupt44c+0x134
sparc_interrupt44c(0xf0524c00, 0x1, 0x28f5c28, 0xf5c28f5c, 0x2eb, 0xf052a908) at netbsd:mi_switch+0x2f8
mi_switch(0x0, 0x0, 0x0, 0xf051b800, 0xf0002000, 0xf06c4b20) at netbsd:softint_thread+0x2b4
softint_thread(0xf04949b0, 0x0, 0xf067ebb0, 0xf0002000, 0xf06c4b20, 0x2000) at netbsd:lwp_trampoline+0x8


10.0 fails with this after probing the SCSI bus.
[   6.8900030] swwdog0: software watchdog initialized
[   6.9300030] trap type 0x2: pc=0xf2bbde6c npc=0xf2bbde70 psr=0x8000c2<S,PS>
[   6.9400030] kernel: illegal instruction trap
Stopped in pid 0.5 (system) at  f2bbde6c:       illtrap         f2bbe29c
db> bt
0xf2bbde6c(0x0, 0x100000, 0xf0682410, 0x0, 0x300, 0xf2bbc000) at netbsd:vmeintr4+0x1c
vmeintr4(0x0, 0xf000cc04, 0x300, 0x8000e5, 0x40, 0xf06623c0) at netbsd:sparc_interrupt44c+0x134
sparc_interrupt44c(0xf0646880, 0x1, 0x28f5c28, 0xf5c28f5c, 0xc28f8e7d, 0xf04b3088) at netbsd:mi_switch+0x1b4
mi_switch(0xf0646880, 0xf049d800, 0x1864, 0xf0002000, 0x0, 0xf049d800) at netbsd:softint_thread+0x164
softint_thread(0xf0002000, 0xf0646880, 0x1000, 0x0, 0xf2bb0080, 0xf062ffc0) at netbsd:lwp_trampoline+0x8

11.0beta fails with this after probing the SCSI bus.
[   6.8900030] swwdog0: software watchdog initialized
[   6.9300030] trap type 0x2: pc=0xf30afe5c npc=0xf30afe60 psr=0x8000c2<S,PS>
[   6.9400030] kernel: illegal instruction trap
Stopped in pid 0.5 (system) at  f30afe5c:       ldx             [%g5 + 0x44], %i0
db> bt
0xf30afe5c(0x40, 0x100000, 0xf0768950, 0x0, 0x300, 0xf30ae000) at netbsd:vmeintr4+0x1c
vmeintr4(0x0, 0xf000cc68, 0x300, 0x8000e5, 0x40, 0xf0758b80) at netbsd:sparc_interrupt44c+0x134
sparc_interrupt44c(0xf073a880, 0x1, 0xdeb88440, 0xf05a0608, 0x35e, 0xc829cfcf) at netbsd:mi_switch+0x318
mi_switch(0xf073a880, 0xf0594b80, 0x0, 0x80, 0xf0002000, 0xf058a8c0) at netbsd:softint_thread+0x294
softint_thread(0xf04a6b80, 0xf04fb918, 0xf04b2168, 0x0, 0xf0002000, 0x1000) at netbsd:lwp_trampoline+0x8

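So looking at all of these, I notice that the more recent kernels run
into trouble when handling an interrupt: the ones that drop into ddb
(6.0, 9.0, 10.0, 11.0beta) all take an illegal instruction trap at an
address reached from vmeintr4+0x1c.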
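
The psr value is also the same (0x8000c2) in all four of those traps. A small sketch for pulling it apart, assuming the SPARC V8 PSR field layout; the helper name is made up:

```python
def decode_psr(psr):
    """Split a 32-bit SPARC V8 PSR value into its named fields."""
    return {
        "impl": (psr >> 28) & 0xF,   # implementation
        "ver": (psr >> 24) & 0xF,    # version
        "N": (psr >> 23) & 1,        # integer condition codes
        "Z": (psr >> 22) & 1,
        "V": (psr >> 21) & 1,
        "C": (psr >> 20) & 1,
        "EC": (psr >> 13) & 1,       # coprocessor enable
        "EF": (psr >> 12) & 1,       # FPU enable
        "PIL": (psr >> 8) & 0xF,     # processor interrupt level
        "S": (psr >> 7) & 1,         # supervisor
        "PS": (psr >> 6) & 1,        # previous supervisor
        "ET": (psr >> 5) & 1,        # enable traps
        "CWP": psr & 0x1F,           # current window pointer
    }


fields = decode_psr(0x8000C2)
# Show only the fields that are non-zero.
print({name: value for name, value in fields.items() if value})
```

For 0x8000c2 that gives S=1, PS=1, ET=0, PIL=0 and CWP=2, which matches the <S,PS> the kernel prints next to the value.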

> > > Alternatively to support your hypothesis, maybe it's possible to run a
> > > newer NetBSD kernel patched and recompiled to not enable the cache?
> >
> > That could work and it would probably be easier than trying to figure
> > out which of the 32 chips is bad.
>
> Should be able to do that by adjusting getcacheinfo_sun4() to treat a
> 4/200 the same as a 4/100
> https://nxr.netbsd.org/xref/src/sys/arch/sparc/sparc/cpu.c#1169
>
> If that works it might be interesting to try enabling just a small
> part of the start of the cache and see if it works (and if there is
> any discernible speedup :)

Because of your earlier comments I ran the extended memory tests on
the board, and they all passed. I then swapped in the 3200 CPU board
and ran the memory tests on it; those also passed with no errors. I
then booted into NetBSD and ran some benchmarks that stress the memory,
and everything worked. I'm still confident the memory board works, and
according to the PROM tests the memory controller on the 4200 works
fine. As for whether the 3200 runs the memory slower and so doesn't
trigger the problem, I don't think so. The 32MB memory board and 4200
CPU board were introduced at the same time and should work together,
and Sun never mentioned any timing- or speed-related incompatibilities
with these boards.

After testing all this I realized I've never tried using a VME
device on a 4300 board. It could be that bad cache or bad memory on
the 4200 is just a red herring and the real issue is broken VME
support that I never noticed before, since the 4300 has everything
built in. That wouldn't explain why SunOS couldn't boot on the 4200, but
that could just be bad luck. I'm going to spend some time testing VME
boards on the 4300 now, as well as some other things.

