Discussion:
laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)
Matthew
2007-12-29 19:57:14 UTC
Hi everybody,

since yesterday my laptop kept on hard-locking when launching 32bit
binaries / apps
I didn't know what to do but

miguel botón was the one pointing me in the right direction, namely
bisect :)

kudos to him & the others involved in his zen-sources project:
http://repo.or.cz/w/linux-2.6/zen-sources.git

bisect said the following is the culprit:

bfba91b199b0e67497db81f05dd1105c269712cb is first bad commit
commit bfba91b199b0e67497db81f05dd1105c269712cb
Author: Roland McGrath <***@redhat.com>
Date: Sun Dec 23 12:47:41 2007 +0100

x86 user_regset math_emu

This converts the ptrace/signal accessors for i387 math_emu
state to the user_regset interface style, and calls these
from the old interfaces.

It also cleans up math_emulate's ptrace check to be a
single-step check, which is what it really wants.

Signed-off-by: Roland McGrath <***@redhat.com>
Signed-off-by: Ingo Molnar <***@elte.hu>
Signed-off-by: Thomas Gleixner <***@linutronix.de>

:040000 040000 829c61799b4618522fabf435b2e1b7f4b338cebe 859f184810d1f504af20ba9919819fd41dbcd37c M arch

I'm now waiting for others to confirm it (another user of that
kernel-tree has also reported the same behavior):
http://forums.gentoo.org/viewtopic-p-4667387.html#4667387
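
For readers who have not used bisect before: it is essentially a binary search over the commits between a known-good and a known-bad revision. The following is only a minimal Python sketch of that idea, not how git implements it. It assumes a linear history, and the commit names and the `is_bad` test are hypothetical stand-ins for "build, boot, start a 32bit app and see whether the box locks up".

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and every commit after the first bad one is
    also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the regression is at mid or earlier
        else:
            good = mid  # the regression is after mid
    return commits[bad]


# Hypothetical history: eight commits, the regression enters at "c5".
history = [f"c{i}" for i in range(8)]
broken_from = history.index("c5")
result = first_bad_commit(history, lambda c: history.index(c) >= broken_from)
print(result)
```

A hang that does not trigger on every run is easy to mark "good" by mistake, and a single wrong answer sends the search into the wrong half of the history.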

architecture:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz
stepping : 11

system:
Portage 2.1.4_rc11 (default-linux/amd64/2007.0, gcc-4.2.2,
glibc-2.7-r1, 2.6.24-rc6-ga25ef5f6-dirty x86_64)
app-shells/bash: 3.2_p17-r1
dev-java/java-config: 1.3.7, 2.1.3
dev-lang/python: 2.4.4-r7
dev-python/pycrypto: 2.0.1-r6
sys-apps/baselayout: 2.0.0_rc6
sys-apps/sandbox: 1.2.18.1-r2
sys-devel/autoconf: 2.13, 2.61-r1
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils: 2.16.1-r3, 2.18-r1
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool: 1.5.24
virtual/os-headers: 2.6.22-r2
ACCEPT_KEYWORDS="amd64 ~amd64"

keep up the good work !

it's my first run of bisect so sorry for any false-alarm caused by
wrong handling - in advance ;)

Mat
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Matthew
2007-12-29 23:04:17 UTC
so I was wrong XD

sorry,

the error was found in the meantime:

see: http://forums.gentoo.org/viewtopic-p-4667858.html#4667858

Don't need to do more testing. The culprit is the unification of the
x86 i387 code.

The culprit is 57c3da2f5bb3fafedc31284117ae43bc593b65ab or
f10c1cfd359660c01446807b6c2bc8ce3aee919a

see http://forums.gentoo.org/viewtopic-p-4667906.html#4667906 and next post

Greetings

Mat
Miguel Botón
2007-12-30 00:28:35 UTC
Post by Matthew
so I was wrong XD
sorry,
see: http://forums.gentoo.org/viewtopic-p-4667858.html#4667858
Don't need to do more testing. The culprit is the unification of the
x86 i387 code.
The culprit is 57c3da2f5bb3fafedc31284117ae43bc593b65ab or
f10c1cfd359660c01446807b6c2bc8ce3aee919a
see http://forums.gentoo.org/viewtopic-p-4667906.html#4667906 and next post
Post by Matthew
Greetings
Mat
These hardlocks start to appear with commit
f10c1cfd359660c01446807b6c2bc8ce3aee919a

--
Miguel Botón
Ingo Molnar
2008-01-02 08:40:02 UTC
Post by Miguel Botón
These hardlocks start to appear with commit
f10c1cfd359660c01446807b6c2bc8ce3aee919a
thanks, that's really useful! I don't see anything obviously wrong with
the commit though, and cannot (yet) reproduce it, so to help us track it
down further, could you try to figure out the lockup site? Does the NMI
watchdog work on your box?

if the NMI watchdog does not work, can you reproduce it on a VGA
console? If yes, then running the app that does the hard lockup via
"strace -f <app>" could perhaps show us the last system call that
happened before the box locked up. If we are lucky it's something
specific.
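
One rough way to answer the NMI watchdog question is to look at the NMI line of /proc/interrupts twice and see whether the per-CPU counters grow. The Python sketch below assumes the usual x86 layout (a line starting with "NMI:" followed by one count per CPU). The function names and the sample text are made up for illustration.

```python
import time
from typing import List


def nmi_counts(interrupts_text: str) -> List[int]:
    """Extract the per-CPU NMI counters from /proc/interrupts content."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            # Per-CPU counts are the numeric fields; the description follows.
            return [int(f) for f in fields[1:] if f.isdigit()]
    return []


def nmi_watchdog_ticking(path: str = "/proc/interrupts", wait: float = 10.0) -> bool:
    """Return True if any NMI counter grows within `wait` seconds."""
    with open(path) as f:
        before = nmi_counts(f.read())
    time.sleep(wait)
    with open(path) as f:
        after = nmi_counts(f.read())
    return any(b > a for a, b in zip(before, after))


# Fabricated sample, only to show the parsing step.
sample = """\
           CPU0       CPU1
  0:        125          0   IO-APIC-edge      timer
NMI:        342        339   Non-maskable interrupts
"""
print(nmi_counts(sample))
```

If the counters stay at zero or never change, the watchdog is probably not active and would not fire on a hard lockup.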

here's the guilty commit from latest x86.git#mm:

commit 81376371f1ca371de54aa4488edc541580d95a01
Author: Roland McGrath <***@redhat.com>
Date: Tue Jan 1 21:55:28 2008 +0100

x86 i387 user_regset

Ingo
Matthew
2008-01-08 11:40:54 UTC
Hi everyone,

sorry for the long delay

- I first had to get home & set up my rig to reproduce this hardlock
(repeatedly hardlocking / shutting down the laptop doesn't do too good
to the new hdd ;) )

and fortunately I was successful :)

sorry for the bad quality of the pics (they were taken with my phone):

http://omploader.org/vYWU1/moto_0025.jpg
http://omploader.org/vYWU2/moto_0026.jpg

steps to reproduce:
1.) log on
2.) startx
3.) opening some pure 64bit apps == working, no locks
4.) opening 32bit-apps (such as firefox-bin, thunderbird-bin) == hard
lock, only pulling power cord (on laptop) or reset button (rig) works,
magic sysrq key doesn't (keyboard & mouse == dead)
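
Since the steps hinge on which apps are really 32bit, it can help to check the binaries directly. A small Python sketch, assuming only the standard ELF header layout (byte 4, EI_CLASS, is 1 for 32-bit and 2 for 64-bit objects). The function name is illustrative, and the demo uses fabricated 5-byte headers rather than real binaries.

```python
import os
import tempfile


def elf_class(path: str) -> str:
    """Classify a file by its ELF header: ELF32, ELF64, or not ELF."""
    with open(path, "rb") as f:
        header = f.read(5)
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return "not ELF"
    # Byte 4 (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit objects.
    return {1: "ELF32", 2: "ELF64"}.get(header[4], "unknown ELF class")


# Demo with two fabricated headers written to temporary files.
with tempfile.TemporaryDirectory() as tmp:
    fake32 = os.path.join(tmp, "fake32")
    fake64 = os.path.join(tmp, "fake64")
    with open(fake32, "wb") as f:
        f.write(b"\x7fELF\x01")
    with open(fake64, "wb") as f:
        f.write(b"\x7fELF\x02")
    demo = (elf_class(fake32), elf_class(fake64))
print(demo)
```

Launchers such as firefox-bin may be wrapper scripts, in which case the check has to be pointed at the real executable behind them.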

I'm currently writing from my "rescue system" (winxp ;) )
so if you need my kernel-config or some more info of the system please tell

Cheers

Mat
H. Peter Anvin
2008-01-09 00:58:41 UTC
Post by Matthew
Hi everyone,
sorry for the long delay
- I first had to get home & set up my rig to reproduce this hardlock
(repeatedly hardlocking / shutting down the laptop doesn't do too good
to the new hdd ;) )
and fortunately I was successful :)
http://omploader.org/vYWU1/moto_0025.jpg
http://omploader.org/vYWU2/moto_0026.jpg
1.) log on
2.) startx
3.) opening some pure 64bit apps == working, no locks
4.) opening 32bit-apps (such as firefox-bin, thunderbird-bin) == hard
lock, only pulling power cord (on laptop) or reset button (rig) works,
magic sysrq key doesn't (keyboard & mouse == dead)
I'm currently writing from my "rescue system" (winxp ;) )
so if you need my kernel-config or some more info of the system please tell
I have been unable to reproduce your problem here, and I notice you have
the proprietary, highly invasive and closed-source Nvidia driver
installed in your kernel.

Can you try using the "nv" or "vesa" (unaccelerated) Xorg drivers and
reproduce the problem that way?

If you *do* reproduce the problem that way, it would be extremely
helpful if you could enable CONFIG_DEBUG_INFO and provide the vmlinux
(not vmlinuz/bzImage) file that goes with the crash dump screenshot.
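
Whether an option such as CONFIG_DEBUG_INFO is enabled can be read straight from the kernel .config, where enabled options appear as `NAME=value` and disabled ones as a `# NAME is not set` comment. A minimal Python sketch under that assumption; the helper name and the sample config are hypothetical.

```python
from typing import Optional


def config_value(config_text: str, option: str) -> Optional[str]:
    """Return the value of a kernel config option, or None if it is not set."""
    for line in config_text.splitlines():
        line = line.strip()
        # "# CONFIG_FOO is not set" lines never match because of the "=".
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


# Fabricated .config fragment for illustration.
sample = """\
CONFIG_DEBUG_KERNEL=y
# CONFIG_DEBUG_INFO is not set
CONFIG_DEBUG_BUGVERBOSE=y
"""
for opt in ("CONFIG_DEBUG_INFO", "CONFIG_DEBUG_BUGVERBOSE"):
    print(opt, config_value(sample, opt))
```

In this sample CONFIG_DEBUG_INFO comes back as None, meaning the kernel would need a rebuild before its vmlinux is useful for decoding the crash.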

Thanks!

-hpa
Matthew
2008-01-10 09:05:54 UTC
Post by H. Peter Anvin
I have been unable to reproduce your problem here, and I notice you have
the proprietary, highly invasive and closed-source Nvidia driver
installed in your kernel.
Can you try using the "nv" or "vesa" (unaccelerated) Xorg drivers and
reproduce the problem that way?
If you *do* reproduce the problem that way, it would be extremely
helpful if you could enable CONFIG_DEBUG_INFO and provide the vmlinux
(not vmlinuz/bzImage) file that goes with the crash dump screenshot.
Thanks!
I was able to reproduce it with removed nvidia module (rmmod nvidia) &
nv driver, and will post the pictures later if I find some time (it
was the same function if I recall right)
do you also need: CONFIG_DEBUG_BUGVERBOSE enabled ?
Post by H. Peter Anvin
-hpa
Mat
Ingo Molnar
2008-01-10 09:42:19 UTC
Post by Matthew
Post by H. Peter Anvin
I have been unable to reproduce your problem here, and I notice you have
the proprietary, highly invasive and closed-source Nvidia driver
installed in your kernel.
Can you try using the "nv" or "vesa" (unaccelerated) Xorg drivers and
reproduce the problem that way?
If you *do* reproduce the problem that way, it would be extremely
helpful if you could enable CONFIG_DEBUG_INFO and provide the vmlinux
(not vmlinuz/bzImage) file that goes with the crash dump screenshot.
Thanks!
I was able to reproduce it with removed nvidia module (rmmod nvidia) &
nv driver, and will post the pictures later if I find some time (it
was the same function if I recall right)
do you also need: CONFIG_DEBUG_BUGVERBOSE enabled ?
really, that module does all sorts of nasty stuff when inserted (and
then removed), so just to make sure (because you are about to crash your
box again to take a picture), could you try to boot up without ever
even once loading the nvidia module?

Ingo
Matthew
2008-01-10 12:43:50 UTC
Post by Ingo Molnar
really, that module does all sorts of nasty stuff when inserted (and
then removed), so just to make sure (because you are about to crash your
box again to take a picture), could you try to boot up without ever
even once loading the nvidia module?
and it still happens ;(
I un-emerged nvidia-drivers & checked via dmesg |grep nv -> it wasn't
loaded, but the box still hung
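
A substring grep for "nv" over dmesg can both miss and over-match, so a stricter check may be worth having. The Python sketch below makes two assumptions: that /proc/modules lists one module per line with the name in the first field, and that bit 0 of /proc/sys/kernel/tainted marks a proprietary module having been loaded at some point since boot. Helper names and sample data are made up.

```python
def module_loaded(modules_text: str, name: str) -> bool:
    """Return True if `name` is listed in /proc/modules-style content."""
    return any(
        line.split()[0] == name
        for line in modules_text.splitlines()
        if line.strip()
    )


def proprietary_taint(tainted_value: int) -> bool:
    """Return True if bit 0 (proprietary module loaded) is set."""
    return bool(tainted_value & 1)


# Fabricated /proc/modules content for illustration.
sample = """\
nvidia 7086240 24 - Live 0xffffffff88a1b000 (P)
i2c_core 27520 1 nvidia, Live 0xffffffff88a10000
"""
print(module_loaded(sample, "nvidia"), module_loaded(sample, "nv"))
print(proprietary_taint(1), proprietary_taint(0))
```

The taint value is the more telling of the two here, because it stays set after rmmod and so speaks to whether the module was ever loaded during this boot.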

it's a little tricky to reproduce it:
I tried it with root-account: firefox-bin, thunderbird-bin wouldn't trigger
user-account (with used account-directory of both apps):
thunderbird-bin triggers it more reliably
probably it has to do with the x86 compatibility apps of gentoo ?
gentoo amd64-users with 32bit firefox & thunderbird - anyone able to
reproduce it ?
it seemingly is being caused by softirq (see pictures; the zen-sources
is also using parts of rt-kernel); approx 1 minute later there also
was a spinlock lockup by syslog-ng (?)

I'll recompile the newest git-sources and see if it's still triggered
with hardirq & softirq disabled ...

http://www.kerneloftruth.neucode.org/other/crash_ia32_64/ (<--
omploader is down so I'll host the picture somewhere else)
hope there's everything relevant to see / read ...

I'll recompile the kernel in question with debug-info probably this
evening - if I find some time, you guys also need frame-pointers set ?

this also happens with rc7-based kernels, btw
Post by Ingo Molnar
Ingo
Mat
Ingo Molnar
2008-01-10 12:48:10 UTC
Post by Matthew
this also happens with rc7-based kernels, btw
hm, exactly what rc7 based kernel? Vanilla 2.6.24-rc7, built by you? Or
any patches ontop of it? (x86.git perhaps?)

Ingo
Matthew
2008-01-10 12:59:35 UTC
Post by Matthew
this also happens with rc7-based kernels, btw
hm, exactly what rc7 based kernel? Vanilla 2.6.24-rc7, built by you? Or
any patches ontop of it? (x86.git perhaps?)
see first post / mail (there are a few additional patches / trees
included: badram, wireless, alsa, tuxonice, madwifi, reiser4,
sched-devel, realtime-lsm, powertop, mactel)
since yesterday my laptop kept on hard-locking when launching 32bit
binaries / apps
I didn't know what to do but
miguel botón was the one pointing me in the right direction, namely
bisect :)
http://repo.or.cz/w/linux-2.6/zen-sources.git
so I guess I need to counter-check it against your realtime-tree:
is it the following ?
http://git.eu.kernel.org/?p=linux/kernel/git/cloos/rt-2.6.git;a=summary
(it's currently at rc5 ?)

or is hardirq / softirq also included in your sched-devel tree ?
http://git.eu.kernel.org/?p=linux/kernel/git/mingo/linux-2.6-sched-devel.git;a=summary
Ingo
Mat
Ingo Molnar
2008-01-10 13:24:48 UTC
Post by Matthew
Post by Ingo Molnar
Post by Matthew
this also happens with rc7-based kernels, btw
hm, exactly what rc7 based kernel? Vanilla 2.6.24-rc7, built by you?
Or any patches ontop of it? (x86.git perhaps?)
<