Discussion:
Oops on linux 2.4.20-ac1
Orion Poplawski
2002-12-10 17:49:16 UTC
I've been having a number of issues, mostly system lockups, with a
machine of ours - a dual proc athlon. I've removed some hardware and I
haven't seen a hang recently. However, we got an Oops message recently.
I lost that message because it wasn't written to any log (any way I can
fix that?). So, I upgraded the kernel to 2.4.20-ac1. Under that I
started getting Oops quite frequently. Here is my first attempt at
processing the message. Note that I switched back to the previous
kernel, but it's running the same module list and I tried to point
ksymoops to the correct pieces. I also typed the oops message in from
what I wrote down from the screen. Please let me know if I made a
mistake there.
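
For readers following along: the reason ksymoops has to be pointed at the matching vmlinux and System.map is that it names an address by finding the nearest symbol at or below it. The following is only a rough Python sketch of that lookup, not ksymoops' actual code. The symbol table is an assumption: its start addresses are back-computed from the "Trace;" lines in this post (address minus offset), not read from a real System.map.

```python
import bisect

# Illustrative symbol table in the style of System.map: (start address, name).
# ASSUMPTION: start addresses are back-computed from the "Trace;" lines in this
# post (address minus offset); they are not copied from a real System.map.
SYMBOLS = sorted([
    (0xc0109b50, "handle_IRQ_event"),
    (0xc0109c90, "do_IRQ"),
    (0xc011f8f0, "do_timer"),
    (0xc0120520, "send_sig_info"),
    (0xc01bea00, "netif_receive_skb"),
])
STARTS = [addr for addr, _ in SYMBOLS]


def resolve(addr):
    """Return 'name+offset' for the nearest symbol at or below addr, else None."""
    i = bisect.bisect_right(STARTS, addr) - 1
    if i < 0:
        return None  # address lies below every known symbol
    start, name = SYMBOLS[i]
    return f"{name}+{addr - start:x}"


# Addresses taken from the Call Trace in this post.
for a in (0xc01beaff, 0xc0120593, 0xc011f92d, 0xc0109b89, 0xc0109cf8):
    print(f"{a:08x} {resolve(a)}")
```

With a table from a different kernel build the same addresses would land on different names, and an address far above the last known symbol (such as one inside a module whose object file was not found) can only be reported as a large offset from whatever symbol comes last.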

ksymoops 2.4.1 on i686 2.4.19. Options used
-v /usr/src/linux-2.4.20-ac1/vmlinux (specified)
-k /var/log/ksyms.1 (specified)
-l /proc/modules (default)
-o /lib/modules/2.4.20-ac1 (specified)
-m /boot/System.map-2.4.20-ac1 (specified)

Error (expand_objects): cannot stat(/lib/ext3.o) for ext3
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/jbd.o) for jbd
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/sym53c8xx.o) for sym53c8xx
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/sd_mod.o) for sd_mod
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/scsi_mod.o) for scsi_mod
ksymoops: No such file or directory
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/net/ipv4/netfilter/netfilter.o
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/drivers/net/fc/fc.o
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/drivers/sound/sounddrivers.o
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/drivers/cdrom/driver.o
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/drivers/ide/raid/idedriver-raid.o
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/drivers/ide/ppc/idedriver-ppc.o
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/drivers/ide/legacy/idedriver-legacy.o
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/drivers/ide/arm/idedriver-arm.o
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/drivers/misc/misc.o
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/drivers/parport/driver.o
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/drivers/media/radio/radio.o
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/drivers/media/video/video.o
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/drivers/media/media.o
Warning (read_object): no symbols in /lib/modules/2.4.20-ac1/build/drivers/hotplug/vmlinux-obj.o
Warning (compare_ksyms_lsmod): module 3c59x is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module autofs is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module binfmt_misc is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module lockd is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module nfs is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module nfsd is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module sunrpc is in lsmod but not in ksyms, probably no symbols exported
Warning (map_ksym_to_module): cannot match loaded module ext3 to a unique module object. Trace may not be reliable.
Warning (map_ksym_to_module): cannot match loaded module jbd to a unique module object. Trace may not be reliable.
Warning (map_ksym_to_module): cannot match loaded module sym53c8xx to a unique module object. Trace may not be reliable.
Warning (map_ksym_to_module): cannot match loaded module sd_mod to a unique module object. Trace may not be reliable.
Warning (map_ksym_to_module): cannot match loaded module scsi_mod to a unique module object. Trace may not be reliable.
Oops: 0002
CPU: 0
EIP: 0010:[<f89641eb>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010216
eax: 00000040 ebx: 00000028 ecx: 00000060 edx: f590a960
esi: f590a800 edi: f590a960 ebp: c4650ed2 esp: f4a45eb0
ds: 0018 es: 0018 ss: 0018
Stack: 0000003c 0000001f 00000000 0000b800 00000060 f590a960 f5e4d62c 00000082
       00000001 00000082 f5dd3fd8 00000001 00000086 00000001 00000086 f5a98000
       00000000 0000000e f4a45f08 00000000 f5a4d600 f7ff0340 00000040 c01beaff
Call Trace: [<c01beaff>] [<c0120593>] [<f896382a>]
[<c011f92d>] [<c0109b89>] [<c0109cf8>]
Code: b8 01 30 00 00 83 c2 0e 66 ef 8b 7c 24 14 8b 87 dc 00 00 00
EIP; f89641eb <END_OF_CODE+10598c/????> <=====
Trace; c01beaff <netif_receive_skb+ff/130>
Trace; c0120593 <send_sig_info+73/90>
Trace; f896382a <END_OF_CODE+104fcb/????>
Trace; c011f92d <do_timer+3d/70>
Trace; c0109b89 <handle_IRQ_event+39/60>
Trace; c0109cf8 <do_IRQ+68/b0>
Code; f89641eb <END_OF_CODE+10598c/????>
00000000 <_EIP>:
Code; f89641eb <END_OF_CODE+10598c/????> <=====
0: b8 01 30 00 00 mov $0x3001,%eax <=====
Code; f89641f0 <END_OF_CODE+105991/????>
5: 83 c2 0e add $0xe,%edx
Code; f89641f3 <END_OF_CODE+105994/????>
8: 66 ef out %ax,(%dx)
Code; f89641f5 <END_OF_CODE+105996/????>
a: 8b 7c 24 14 mov 0x14(%esp,1),%edi
Code; f89641f9 <END_OF_CODE+10599a/????>
e: 8b 87 dc 00 00 00 mov 0xdc(%edi),%eax


26 warnings and 5 errors issued. Results may not be reliable.
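
A note on the "Oops: 0002" line above: if this Oops came from the i386 page-fault handler (an assumption, since the hand-copied message does not include the line that normally precedes it), the number is the hardware page-fault error code and its low three bits can be read off directly. A minimal Python sketch of that decoding, with a hypothetical function name:

```python
def decode_i386_fault_code(code):
    """Decode the low three bits of an i386 page-fault error code.

    bit 0: 0 = page not present, 1 = protection violation
    bit 1: 0 = read access,      1 = write access
    bit 2: 0 = kernel mode,      1 = user mode
    """
    return {
        "cause": "protection violation" if code & 0x1 else "page not present",
        "access": "write" if code & 0x2 else "read",
        "mode": "user" if code & 0x4 else "kernel",
    }


# The value reported in this post.
print(decode_i386_fault_code(0x0002))
```

Under that assumption, 0002 would mean a write to a page that was not present, made while running in kernel mode.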
Alan Cox
2002-12-10 21:00:18 UTC
Post by Orion Poplawski
I've been having a number of issues, mostly system lockups, with a
machine of ours - a dual proc athlon. I've removed some hardware and I
Random lockups on dual athlons are a notorious problem under all OS's.
Start by checking it passes memtest86, that will verify the RAM is ok -
and the AMD is -very- picky about RAM.

If that's ok then let me know which board you have, what is plugged into
it and what PSU you are using.
scott thomason
2002-12-11 02:00:54 UTC
Post by Alan Cox
Random lockups on dual athlons are a notorious problem under all
OS's. Start by checking it passes memtest86, that will verify the
RAM is ok - and the AMD is -very- picky about RAM.
If that's ok then let me know which board you have, what is plugged
into it and what PSU you are using.
I have two AMD MP 2000+ cpus in an ASUS A7M266-D. Even after returning
my memory for new chips the store owner memtest86'd, my combo of cpus
and mobo was finding the occasional error. I finally ended up
resolving it by simply underclocking the bus about 6 MHz :(

Next time, I'm buying ECC memory.
---scott
Orion Poplawski
2002-12-11 16:10:03 UTC
Post by scott thomason
I have two AMD MP 2000+ cpus in an ASUS A7M266-D. Even after returning
my memory for new chips the store owner memtest86'd, my combo of cpus
and mobo was finding the occasional error. I finally ended up
resolving it by simply underclocking the bus about 6 MHz :(
Next time, I'm buying ECC memory.
---scott
Underclocking has been my "solution" to these lockups as well. Would
ECC memory actually help in this case though?
Orion Poplawski
2002-12-11 16:08:34 UTC
Post by scott thomason
Post by Alan Cox
Random lockups on dual athlons are a notorious problem under all
OS's. Start by checking it passes memtest86, that will verify the
RAM is ok - and the AMD is -very- picky about RAM.
If that's ok then let me know which board you have, what is plugged
into it and what PSU you are using.
I have two AMD MP 2000+ cpus in an ASUS A7M266-D. Even after returning
my memory for new chips the store owner memtest86'd, my combo of cpus
and mobo was finding the occasional error. I finally ended up
resolving it by simply underclocking the bus about 6 MHz :(
Next time, I'm buying ECC memory.
---scott
Is there a good site for pointers towards assembling reliable Linux
machines? It seems to me the trickiest part of the whole operation is
choosing good hardware in the first place. I just started a new job and
inherited a bunch of new but flakey machines, and I'd like to avoid doing
that in the future.
John Bradford
2002-12-11 16:33:47 UTC
Post by scott thomason
Post by Alan Cox
Random lockups on dual athlons are a notorious problem under all
OS's. Start by checking it passes memtest86, that will verify the
RAM is ok - and the AMD is -very- picky about RAM.
If that's ok then let me know which board you have, what is plugged
into it and what PSU you are using.
I have two AMD MP 2000+ cpus in an ASUS A7M266-D. Even after returning
my memory for new chips the store owner memtest86'd, my combo of cpus
and mobo was finding the occasional error. I finally ended up
resolving it by simply underclocking the bus about 6 MHz :(
Next time, I'm buying ECC memory.
Why? ECC memory guards against a single bit error in the RAM, nothing
else, (except that it also reports double bit errors).

John.
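
To illustrate the point about single-bit correction and double-bit detection: the scheme is usually called SECDED. The Python sketch below is a toy version over 4 data bits (an extended Hamming code), chosen only for brevity. Real ECC DIMMs typically apply the same idea to much wider words, and all function names here are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit SECDED codeword (list of 0/1 bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                      # index 0: overall parity, 1..7: Hamming(7,4)
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # parity over positions 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]   # parity over positions 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]   # parity over positions 4,5,6,7
    code[0] = sum(code[1:]) % 2             # make parity of all 8 bits even
    return code


def decode(code):
    """Return (status, nibble); nibble is None when the error is uncorrectable."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos             # XOR of the positions of all set bits
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: treat as a single-bit error at index `syndrome`
        # (index 0 is the overall parity bit itself) and flip it back.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return "double-bit error detected", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = encode(0b1011)
word[5] ^= 1                # one flipped bit: corrected transparently
print(decode(word))
word[2] ^= 1                # a second flipped bit: reported, not corrected
print(decode(word))
```

Three or more flipped bits can be miscorrected or missed entirely by a code like this, and nothing in it addresses errors introduced outside the memory array.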
Alan Cox
2002-12-11 17:01:48 UTC
Post by Orion Poplawski
Is there a good site for pointers towards assembling reliable Linux
machines? It seems to me the trickiest part of the whole operation is
choosing good hardware in the first place. I just started a new job and
inherited a bunch of new but flakey machines, and I'd like to avoid doing
that in the future.
The AMD duals have been a disaster in my experience. It's a shame because
when they do go they really are very fast boxes. The biggest factor I've
found is chipsets.
Jason L Tibbitts III
2002-12-11 17:21:17 UTC
AC> The AMD duals have been a disaster in my experience.

I do have a bunch of these running reliably (RH 7.3 plus the latest
OpenMosix kernel). I had to go through a few combinations of
motherboard and RAM (four different manufacturers of RAM) before I got
something that works. Processors are MP 1900+ or 2000+, boards are
Tyan S2466, memory is in PC2100 ECC registered 512MB sticks from
Corsair. Case and power supply are PC Power and Cooling, mid tower,
450W PS, every fan bay filled. These machines have been rock
stable for months except for a failed IBM deathstar drive and an
over-temp shutdown when the room AC failed.

I still have a couple of the 760MP boards (as opposed to the MPX
boards) which I just can't get to run properly with two processors.

- J<
Patrick Finnegan
2002-12-11 23:35:44 UTC
Post by Alan Cox
Post by Orion Poplawski
Is there a good site for pointers towards assembling reliable Linux
machines? It seems to me the trickiest part of the whole operation is
choosing good hardware in the first place. I just started a new job and
inherited a bunch of new but flakey machines, and I'd like to avoid doing
that in the future.
The AMD duals have been a disaster in my experience. It's a shame because
when they do go they really are very fast boxes. The biggest factor I've
found is chipsets.
Which chipset - the new or the old one? I've got an ASUS A7M266D (or
something) that's based on the AMD 760MPX chipset and has 512MB of
Registered ECC memory, and a pair of XP 1800+'s... and it works just
beautifully. Truly rock solid.

Pat
--
Purdue University ITAP/RCS
Information Technology at Purdue
Research Computing and Storage
http://www-rcd.cc.purdue.edu
Alan Cox
2002-12-12 01:24:48 UTC
Post by Patrick Finnegan
Which chipset - the new or the old one? I've got an ASUS A7M266D (or
something) that's based on the AMD 760MPX chipset and has 512MB of
Registered ECC memory, and a pair of XP 1800+'s... and it works just
beautifully. Truly rock solid.
Same board you have.

Orion Poplawski
2002-12-11 23:00:06 UTC
Post by Alan Cox
Random lockups on dual athlons are a notorious problem under all OS's.
Start by checking it passes memtest86, that will verify the RAM is ok -
and the AMD is -very- picky about RAM.
If that's ok then let me know which board you have, what is plugged into
it and what PSU you are using.
memtest86 completed 3 passes with no errors, so:

MB:
Asus A7M266-D w/ Dual Athlon 2100 MP and 4 x 512MB PC2100 ECC Dimms
AMD 762 Chipset
RAM clocking is "normal"

Cards:
PCI 3com 3c905-TX ethernet
PCI Tekram DC-390U3W SCSI Controller
PCI ATI 3d Rage II Video

1 IDE Hard disk
1 external SCSI disk

PSU is a Turbo-Cool 475 ATX-PFC (appears to be 460W)