Port-sparc archive


Re: panic at boot on 4/200



(Sorry for the top post!)

I assume that the VME backplane has active termination.  If so, locate the voltage regulators and make sure they’re generating the correct levels (should be 2.9V or so, assuming the usual 330R upper and 470R lower resistors are used to set the reference voltage from the +5V rail).
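
As a quick check of the "2.9V or so" figure, here is a minimal sketch of the divider arithmetic. It assumes the 330R resistor goes to the +5V rail and the 470R resistor goes to ground; the function name `divider` is only illustrative, and the actual regulator circuit on a given backplane may differ.

```python
def divider(v_rail, r_upper, r_lower):
    """Return (open-circuit voltage, Thevenin resistance) of a two-resistor divider.

    Assumes r_upper connects to the supply rail and r_lower connects to ground.
    """
    v_out = v_rail * r_lower / (r_upper + r_lower)
    r_th = r_upper * r_lower / (r_upper + r_lower)
    return v_out, r_th


# Assumed values: +5V rail, 330 ohm upper resistor, 470 ohm lower resistor.
v_term, r_term = divider(5.0, 330.0, 470.0)
print(f"{v_term:.2f} V, {r_term:.0f} ohm")  # prints "2.94 V, 194 ohm"
```

A reading well away from roughly 2.94V at the regulator output would be worth investigating before chasing cache or memory faults.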

-- thorpej
Sent from my iPhone.

> On Jan 12, 2026, at 6:36 PM, foo bar <tokenalt%gmail.com@localhost> wrote:
> 
> On Sat, Jan 10, 2026 at 6:48 AM David Brownlee <abs%absd.org@localhost> wrote:
>> 
>>> On Sat, 10 Jan 2026 at 03:16, foo bar <tokenalt%gmail.com@localhost> wrote:
>>> 
>>>> On Thu, Jan 8, 2026 at 2:25 AM Romain Dolbeau <romain%dolbeau.org@localhost> wrote:
>>> 
>>>> When you say "everything else", did that include trying NetBSD-2.0 to
>>>> narrow down the issue ?
>>>> From the release note, 2.0 introduced SMP on SPARC, which /may/ have
>>>> involved reworking the cache support a bit (for e.g. coherency).
>>>> (Shot in the dark, and  4.1.1 was also pre-SMP support...).
>>> 
>>> From memory I tried 1.5.1, 1.5.3, 2.0, 4.0, 6.0, 8.0, 9.0, and
>>> current. I'll retest everything when I get back to the board.
>> 
>> So based on the hypothesis we would expect 1.6.2 to be the last version to work
> 
> So because of all the interest I decided to set up the 4200 again and
> retest kernels. 1.5.x all work, 1.6 and later do not. The errors each
> version gets are different but consistent for that version even across
> power cycles.
> 
> For example 1.6 fails with this right at the start.
> Memory alignment error with PC 0xF00096AC.  Instruction "0xC2274000".
> 
> 2.0 fails with this.
> panic: lockmgr: no context
> syncing disks... done
> Frame pointer is at 0xf0304ef8
> Call traceback:
>  pc = 0xf027a43c  args = (0xf035cd80, 0x5, 0x0, 0x0, 0xf0305018, 0x1,
> 0xf0304f60) fp = 0xf0304f60
>  pc = 0xf01a3f20  args = (0x100, 0x0, 0xf0367480, 0x0, 0xf04aa008,
> 0xf000a22c, 0xf0304fd0) fp = 0xf0304fd0
>  pc = 0xf0183790  args = (0xf02d7da0, 0x80, 0x18800020, 0xf0231a44,
> 0xf035ec00, 0x100, 0xf0305038) fp = 0xf0305038
>  pc = 0xf02179b0  args = (0xf035cbb4, 0x1, 0x0, 0x8000e6, 0x0,
> 0xf04a3600, 0xf03050a8) fp = 0xf03050a8
>  pc = 0xf0216460  args = (0xf0305220, 0x0, 0x2000, 0x0, 0xffeb8e14,
> 0xf0365000, 0xf0305110) fp = 0xf0305110
>  pc = 0xf02888a8  args = (0xf035cbb0, 0x1ff0000, 0x0, 0x1, 0x0,
> 0xf424, 0xf0305248) fp = 0xf0305248
>  pc = 0xf000857c  args = (0x9, 0x80, 0x1ff0106, 0xf027f9d8, 0x10003,
> 0xf0305380, 0xf0305320) fp = 0xf0305320
>  pc = 0xf0194f54  args = (0xf035cf18, 0x1ff0100, 0x100, 0x8000e3,
> 0xf1ee0000, 0x200, 0xf03053d0) fp = 0xf03053d0
>  pc = 0xf0288824  args = (0x0, 0x18800000, 0xf0367480, 0x0,
> 0xf04aa008, 0xf000a22c, 0xf0305438) fp = 0xf0305438
>  pc = 0xf000857c  args = (0x9, 0x80, 0x18800020, 0xf0231a44, 0x20003,
> 0xf0305570, 0xf0305510) fp = 0xf0305510
>  pc = 0xf0008a90  args = (0xf043bf00, 0xf0233020, 0x300, 0x8000e6,
> 0x0, 0xf04a3600, 0xf03055c0) fp = 0xf03055c0
>  pc = 0xf02330ac  args = (0xf0195710, 0x0, 0x2000, 0x0, 0xffeb8e14,
> 0xf0308940, 0xf0305628) fp = 0xf0305628
>  pc = 0x0  args = (0xf0305688, 0x927c0, 0xf02a2800, 0x0, 0x0, 0xf424,
> 0x0) fp = 0x0
> rebooting
> 
> 3.0, 4.0, 5.0, 7.0, 8.0 get a watchdog reset after probing the SCSI bus.
> 
> 6.0 fails with this after probing the SCSI bus.
> trap type 0x2: pc=0xf2ae9e80 npc=0xf2ae9e84 psr=0x8000c2<S,PS>
> kernel: illegal instruction trap
> Stopped in pid 0.5 (system) at  f2ae9e80:       illtrap         f2ae9e80
> db> bt
> 0xf2ae9e80(0x40, 0xf040c6bc, 0x4, 0xf0510308, 0xea60, 0xf0564c80) at netbsd:vmeintr4+0x1c
> vmeintr4(0x0, 0xf0313aa8, 0x300, 0x8000e5, 0x0, 0x280) at netbsd:sparc_interrupt44c+0x134
> sparc_interrupt44c(0xf03e7800, 0x0, 0xf0245b84, 0xf0564c80, 0x0, 0xf05aea00) at netbsd:mi_switch+0x1ec
> mi_switch(0xf0577340, 0xfffffff8, 0x0, 0x0, 0xffeb8e14, 0xf0002800) at netbsd:softint_thread+0x158
> softint_thread(0xf2ae2080, 0xf0577340, 0x1, 0x18, 0x10e1, 0x1fe1) at netbsd:lwp_setfunc_trampoline
> 
> 
> 9.0 fails with this after probing the SCSI bus.
> [   5.4900080] trap type 0x2: pc=0xf2c3de60 npc=0xf2c3de64 psr=0x8000c2<S,PS>
> [   5.5000080] kernel: illegal instruction trap
> Stopped in pid 0.6 (system) at  f2c3de60:       bn,pn           f2c40d78
> db> bt
> 0xf2c3de60(0x40, 0x100000, 0xf06b2c58, 0x0, 0x300, 0xf2c3c000) at netbsd:vmeintr4+0x1c
> vmeintr4(0x0, 0xf000ce9c, 0x300, 0x8000e5, 0x40, 0xf070a010) at netbsd:sparc_interrupt44c+0x134
> sparc_interrupt44c(0xf0524c00, 0x1, 0x28f5c28, 0xf5c28f5c, 0x2eb, 0xf052a908) at netbsd:mi_switch+0x2f8
> mi_switch(0x0, 0x0, 0x0, 0xf051b800, 0xf0002000, 0xf06c4b20) at netbsd:softint_thread+0x2b4
> softint_thread(0xf04949b0, 0x0, 0xf067ebb0, 0xf0002000, 0xf06c4b20, 0x2000) at netbsd:lwp_trampoline+0x8
> 
> 
> 10.0 fails with this after probing the SCSI bus.
> [   6.8900030] swwdog0: software watchdog initialized
> [   6.9300030] trap type 0x2: pc=0xf2bbde6c npc=0xf2bbde70 psr=0x8000c2<S,PS>
> [   6.9400030] kernel: illegal instruction trap
> Stopped in pid 0.5 (system) at  f2bbde6c:       illtrap         f2bbe29c
> db> bt
> 0xf2bbde6c(0x0, 0x100000, 0xf0682410, 0x0, 0x300, 0xf2bbc000) at netbsd:vmeintr4+0x1c
> vmeintr4(0x0, 0xf000cc04, 0x300, 0x8000e5, 0x40, 0xf06623c0) at netbsd:sparc_interrupt44c+0x134
> sparc_interrupt44c(0xf0646880, 0x1, 0x28f5c28, 0xf5c28f5c, 0xc28f8e7d, 0xf04b3088) at netbsd:mi_switch+0x1b4
> mi_switch(0xf0646880, 0xf049d800, 0x1864, 0xf0002000, 0x0, 0xf049d800) at netbsd:softint_thread+0x164
> softint_thread(0xf0002000, 0xf0646880, 0x1000, 0x0, 0xf2bb0080, 0xf062ffc0) at netbsd:lwp_trampoline+0x8
> 
> 11.0beta fails with this after probing the SCSI bus.
> [   6.8900030] swwdog0: software watchdog initialized
> [   6.9300030] trap type 0x2: pc=0xf30afe5c npc=0xf30afe60 psr=0x8000c2<S,PS>
> [   6.9400030] kernel: illegal instruction trap
> Stopped in pid 0.5 (system) at  f30afe5c:       ldx             [%g5 + 0x44], %i0
> db> bt
> 0xf30afe5c(0x40, 0x100000, 0xf0768950, 0x0, 0x300, 0xf30ae000) at netbsd:vmeintr4+0x1c
> vmeintr4(0x0, 0xf000cc68, 0x300, 0x8000e5, 0x40, 0xf0758b80) at netbsd:sparc_interrupt44c+0x134
> sparc_interrupt44c(0xf073a880, 0x1, 0xdeb88440, 0xf05a0608, 0x35e, 0xc829cfcf) at netbsd:mi_switch+0x318
> mi_switch(0xf073a880, 0xf0594b80, 0x0, 0x80, 0xf0002000, 0xf058a8c0) at netbsd:softint_thread+0x294
> softint_thread(0xf04a6b80, 0xf04fb918, 0xf04b2168, 0x0, 0xf0002000, 0x1000) at netbsd:lwp_trampoline+0x8
> 
> So looking at all these I notice the more recent kernels run into
> trouble when handling an interrupt.
> 
>>>> Alternatively to support your hypothesis, maybe it's possible to run a
>>>> newer NetBSD kernel patched and recompiled to not enable the cache?
>>> 
>>> That could work and it would probably be easier than trying to figure
>>> out which of the 32 chips is bad.
>> 
>> Should be able to do that by adjusting getcacheinfo_sun4() to treat a
>> 4/200 the same as a 4/100
>> https://nxr.netbsd.org/xref/src/sys/arch/sparc/sparc/cpu.c#1169
>> 
>> If that works it might be interesting to try enabling just a small
>> part of the start of the cache and see if it works (and if there is
>> any discernible speedup :)
> 
> Because of your comments previously I decided to run the extended
> memory tests on the board and all passed. I swapped in the 3200 CPU
> board and ran the memory tests on it and they all passed with no
> errors. I then booted into NetBSD and ran some benchmarks that stress
> the memory and everything worked. I'm still confident the memory board
> works and according to the PROM tests the memory controller on the
> 4200 works fine. As for whether the 3200 runs the memory more slowly
> and so does not trigger the problem, I don't think so. The 32MB memory
> board and 4200 CPU board were introduced at the same time and should
> work together, and Sun never mentioned any kind of timing or speed
> related incompatibilities with these boards.
> 
> So after testing all this I've realized I've never tried using a VME
> device on a 4300 board. It could be that bad cache or bad memory on
> the 4200 is just a red herring and the real issue is broken VME
> support that I never noticed before since the 4300 has everything
> built in. That wouldn't explain why SunOS couldn't boot on the 4200 but
> that could just be bad luck. I'm going to spend some time testing VME
> boards on the 4300 now as well as some other things.

