Discussion:
2.6.24-rc8 hangs at mfgpt-timer
Arnd Hannemann
2008-01-16 17:44:07 UTC
Permalink
Hi,

I'm trying to boot 2.6.24-rc8 on a GEODE LX board (ALIX.3),
and it hangs during boot:

[ 12.689971] NET: Registered protocol family 16
[ 12.703329] geode-mfgpt: Registered timer 0
[ 12.716149] mfgpt-timer: registering the MFGT timer as a clock event...

full output attached.
Config available here:
http://www.umic-mesh.net/~hannemann/config-2.6.24-rc8

Best regards,
Arnd Hannemann
Andres Salomon
2008-01-16 21:19:12 UTC
Permalink
On Wed, 16 Jan 2008 18:44:07 +0100
Post by Arnd Hannemann
Hi,
I'm trying to boot 2.6.24-rc8 on a GEODE LX board (ALIX.3),
[ 12.689971] NET: Registered protocol family 16
[ 12.703329] geode-mfgpt: Registered timer 0
[ 12.716149] mfgpt-timer: registering the MFGT timer as a clock event...
What BIOS are you using? It's possible that our detection code is
failing to detect in-use timers.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Andres Salomon
2008-01-16 21:56:06 UTC
Permalink
On Wed, 16 Jan 2008 16:19:12 -0500
Post by Andres Salomon
On Wed, 16 Jan 2008 18:44:07 +0100
Post by Arnd Hannemann
Hi,
I'm trying to boot 2.6.24-rc8 on a GEODE LX board (ALIX.3),
[ 12.689971] NET: Registered protocol family 16
[ 12.703329] geode-mfgpt: Registered timer 0
[ 12.716149] mfgpt-timer: registering the MFGT timer as a clock event...
What BIOS are you using? It's possible that our detection code is
failing to detect in-use timers.
Also, could you provide a log with the following (untested) patch? I'm
curious how many MFGPTs we're actually detecting as being available, and
the existing code is backwards.
Post by Andres Salomon
From 0ea825fd564056ac653507517beb5cea9034e464 Mon Sep 17 00:00:00 2001
From: Andres Salomon <***@debian.org>
Date: Wed, 16 Jan 2008 16:53:16 -0500
Subject: [PATCH] x86: geode: fix up ordering of messages during MFGPT detection

We're printing out informational messages in the wrong order while
probing for MFGPTs. This changes the order so that we first display
the number of available timers that were detected before displaying
messages about how we actually are using those timers.

Signed-off-by: Andres Salomon <***@debian.org>
---
arch/x86/kernel/geode_32.c | 5 +----
arch/x86/kernel/mfgpt_32.c | 7 +++----
include/asm-x86/geode.h | 2 +-
3 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/geode_32.c b/arch/x86/kernel/geode_32.c
index f12d8c5..00b45ae 100644
--- a/arch/x86/kernel/geode_32.c
+++ b/arch/x86/kernel/geode_32.c
@@ -145,14 +145,11 @@ EXPORT_SYMBOL_GPL(geode_gpio_setup_event);

static int __init geode_southbridge_init(void)
{
- int timers;
-
if (!is_geode())
return -ENODEV;

init_lbars();
- timers = geode_mfgpt_detect();
- printk(KERN_INFO "geode: %d MFGPT timers available.\n", timers);
+ geode_mfgpt_detect();
return 0;
}

diff --git a/arch/x86/kernel/mfgpt_32.c b/arch/x86/kernel/mfgpt_32.c
index 0ab680f..d11cba3 100644
--- a/arch/x86/kernel/mfgpt_32.c
+++ b/arch/x86/kernel/mfgpt_32.c
@@ -70,14 +70,14 @@ __setup("nomfgpt", mfgpt_disable);
* In other cases (such as with VSAless OpenFirmware), the system firmware
* leaves timers available for us to use.
*/
-int __init geode_mfgpt_detect(void)
+void __init geode_mfgpt_detect(void)
{
int count = 0, i;
u16 val;

if (disable) {
printk(KERN_INFO "geode-mfgpt: Skipping MFGPT setup\n");
- return 0;
+ return;
}

for (i = 0; i < MFGPT_MAX_TIMERS; i++) {
@@ -87,11 +87,10 @@ int __init geode_mfgpt_detect(void)
count++;
}
}
+ printk(KERN_INFO "geode-mfgpt: %d MFGPT timers available.\n", count);

/* set up clock event device, if desired */
i = mfgpt_timer_setup();
Arnd Hannemann
2008-01-17 09:54:30 UTC
Permalink
Post by Andres Salomon
On Wed, 16 Jan 2008 16:19:12 -0500
Post by Andres Salomon
On Wed, 16 Jan 2008 18:44:07 +0100
Post by Arnd Hannemann
Hi,
I'm trying to boot 2.6.24-rc8 on a GEODE LX board (ALIX.3),
[ 12.689971] NET: Registered protocol family 16
[ 12.703329] geode-mfgpt: Registered timer 0
[ 12.716149] mfgpt-timer: registering the MFGT timer as a clock event...
What BIOS are you using? It's possible that our detection code is
failing to detect in-use timers.
I'm using v0.99 (latest available).
Also note when I do enable the mysterios "MFGPT workaround" option in
the bios the machine hangs directly after:
[ 36.780990] NET: Registered protocol family 16
Post by Andres Salomon
Also, could you provide a log with the following (untested) patch? I'm
curious how many MFGPTs we're actually detecting as being available, and
the existing code is backwards.
The relevant part would be:
[ 23.092507] NET: Registered protocol family 16
[ 23.105875] geode-mfgpt: 8 MFGPT timers available
[ 23.120247] geode-mfgpt: Registered timer 0
[ 23.133076] mfgpt-timer: registering the MFGT timer as a clock event.

Best regards,
Arnd Hannemann

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Andres Salomon
2008-01-17 18:40:32 UTC
Permalink
On Thu, 17 Jan 2008 10:54:30 +0100
Post by Arnd Hannemann
Post by Andres Salomon
On Wed, 16 Jan 2008 16:19:12 -0500
Post by Andres Salomon
On Wed, 16 Jan 2008 18:44:07 +0100
Post by Arnd Hannemann
Hi,
I'm trying to boot 2.6.24-rc8 on a GEODE LX board (ALIX.3),
[ 12.689971] NET: Registered protocol family 16
[ 12.703329] geode-mfgpt: Registered timer 0
[ 12.716149] mfgpt-timer: registering the MFGT timer as a clock event...
What BIOS are you using? It's possible that our detection code is
failing to detect in-use timers.
I'm using v0.99 (latest available).
v0.99 of what? Jordan seems to think it's an Award BIOS, but I'd like
to make sure.
Post by Arnd Hannemann
Also note when I do enable the mysterios "MFGPT workaround" option in
[ 36.780990] NET: Registered protocol family 16
"MFGPT workaround"? That sounds a bit frightening.

Presumably, the BIOS is using the MFGPTs, but we're not detecting them as
being in use.

I'm assuming that booting with 'nomfgpt' works for you?
Post by Arnd Hannemann
Post by Andres Salomon
Also, could you provide a log with the following (untested) patch? I'm
curious how many MFGPTs we're actually detecting as being available, and
the existing code is backwards.
[ 23.092507] NET: Registered protocol family 16
[ 23.105875] geode-mfgpt: 8 MFGPT timers available
[ 23.120247] geode-mfgpt: Registered timer 0
[ 23.133076] mfgpt-timer: registering the MFGT timer as a clock event.
Best regards,
Arnd Hannemann
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Arnd Hannemann
2008-01-17 19:53:57 UTC
Permalink
Post by Andres Salomon
On Thu, 17 Jan 2008 10:54:30 +0100
Post by Arnd Hannemann
Post by Andres Salomon
On Wed, 16 Jan 2008 16:19:12 -0500
Post by Andres Salomon
On Wed, 16 Jan 2008 18:44:07 +0100
Post by Arnd Hannemann
Hi,
I'm trying to boot 2.6.24-rc8 on a GEODE LX board (ALIX.3),
[ 12.689971] NET: Registered protocol family 16
[ 12.703329] geode-mfgpt: Registered timer 0
[ 12.716149] mfgpt-timer: registering the MFGT timer as a clock event...
What BIOS are you using? It's possible that our detection code is
failing to detect in-use timers.
I'm using v0.99 (latest available).
v0.99 of what? Jordan seems to think it's an Award BIOS, but I'd like
to make sure.
Its an ALIX board from PCEngines, they have their own BIOS
implementation (tinyBios).
http://www.pcengines.ch/alix.htm
Post by Andres Salomon
Post by Arnd Hannemann
Also note when I do enable the mysterios "MFGPT workaround" option in
[ 36.780990] NET: Registered protocol family 16
"MFGPT workaround"? That sounds a bit frightening.
Presumably, the BIOS is using the MFGPTs, but we're not detecting them as
being in use.
Yes I think so too, for the fun of it I compiled a 2.6.16.29 kernel with
the attached patch from fi4l.

relevant output is this:
[ 31.015425] geode-mfgpt: 7 timers available.
...
[ 31.245875] geode-mfgpt: Registered timer 0

So the above kernel detects only 7 timers not 8, and it works. But note
that timer 0 is not used as a clock event source but as a watchdog,
which btw actually works fine :-)
The funny thing is the #define workaround part of this dubious patch and
its interaction with the bios:

#ifdef WORKAROUND:
I have to turn the "MFPGT workaround" option in the bios ON, to boot
the kernel probably.

#ifndef WORKAROUND:
I have to turn the "MFPGT workaround" option in the bios OFF, to boot
the kernel probably.

All other combinations will hang the system.

Doest it make sense to try to boot older release candidates of 2.6.24 ?
Post by Andres Salomon
I'm assuming that booting with 'nomfgpt' works for you?
In fact it does.
It just says:
geode-mfgpt: Skipping MFGPT setup
and boots.
Post by Andres Salomon
Post by Arnd Hannemann
Post by Andres Salomon
Also, could you provide a log with the following (untested) patch? I'm
curious how many MFGPTs we're actually detecting as being available, and
the existing code is backwards.
[ 23.092507] NET: Registered protocol family 16
[ 23.105875] geode-mfgpt: 8 MFGPT timers available
[ 23.120247] geode-mfgpt: Registered timer 0
[ 23.133076] mfgpt-timer: registering the MFGT timer as a clock event.
Andres Salomon
2008-01-17 20:42:22 UTC
Permalink
On Thu, 17 Jan 2008 20:53:57 +0100
Post by Arnd Hannemann
Post by Andres Salomon
On Thu, 17 Jan 2008 10:54:30 +0100
Post by Arnd Hannemann
Post by Andres Salomon
On Wed, 16 Jan 2008 16:19:12 -0500
Post by Andres Salomon
On Wed, 16 Jan 2008 18:44:07 +0100
Post by Arnd Hannemann
Hi,
I'm trying to boot 2.6.24-rc8 on a GEODE LX board (ALIX.3),
[ 12.689971] NET: Registered protocol family 16
[ 12.703329] geode-mfgpt: Registered timer 0
[ 12.716149] mfgpt-timer: registering the MFGT timer as a clock event...
What BIOS are you using? It's possible that our detection code is
failing to detect in-use timers.
I'm using v0.99 (latest available).
v0.99 of what? Jordan seems to think it's an Award BIOS, but I'd like
to make sure.
Its an ALIX board from PCEngines, they have their own BIOS
implementation (tinyBios).
http://www.pcengines.ch/alix.htm
Ah, okay. I couldn't find anything about MFGPTs in the available
source, but I did find the following:

http://article.gmane.org/gmane.linux.distributions.voyage.general/1824

It would appear that something between 0.90 and 0.92 changed that
has broken things. There's also a changelog here:

http://forum.pfsense.org/index.php?topic=6729.15

Note two MFGPT-related entries in 0.92:

- disable CS5536 diverse device power management to avoid
MFGPT /
interrupt issues.

- MFGPT issues: please observe AMD CS5536 data book section
5.16.3,
incorrect initialization sequence can HANG the system.


I'm assuming the first is messing w/ DIVIL_GLD_MSR_PM...


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Jordan Crouse
2008-01-17 21:19:17 UTC
Permalink
Post by Arnd Hannemann
Post by Andres Salomon
On Thu, 17 Jan 2008 10:54:30 +0100
Post by Arnd Hannemann
Post by Andres Salomon
On Wed, 16 Jan 2008 16:19:12 -0500
Post by Andres Salomon
On Wed, 16 Jan 2008 18:44:07 +0100
Post by Arnd Hannemann
Hi,
I'm trying to boot 2.6.24-rc8 on a GEODE LX board (ALIX.3),
[ 12.689971] NET: Registered protocol family 16
[ 12.703329] geode-mfgpt: Registered timer 0
[ 12.716149] mfgpt-timer: registering the MFGT timer as a clock event...
What BIOS are you using? It's possible that our detection code is
failing to detect in-use timers.
I'm using v0.99 (latest available).
v0.99 of what? Jordan seems to think it's an Award BIOS, but I'd like
to make sure.
Its an ALIX board from PCEngines, they have their own BIOS
implementation (tinyBios).
http://www.pcengines.ch/alix.htm
Post by Andres Salomon
Post by Arnd Hannemann
Also note when I do enable the mysterios "MFGPT workaround" option in
[ 36.780990] NET: Registered protocol family 16
"MFGPT workaround"? That sounds a bit frightening.
Presumably, the BIOS is using the MFGPTs, but we're not detecting them as
being in use.
Yes I think so too, for the fun of it I compiled a 2.6.16.29 kernel with
the attached patch from fi4l.
Okay - thats an MFPGT patch from pre-OLPC days. I am the guilty and
dubious party. We changed the API to work better with the timer tick,
and thats the version that ended up in the kernel.

I really wish I could take back this patch, because it keeps coming back
to torment me. We must, as a people, put it behind us and forgot it. :)
Post by Arnd Hannemann
[ 31.015425] geode-mfgpt: 7 timers available.
...
[ 31.245875] geode-mfgpt: Registered timer 0
So the above kernel detects only 7 timers not 8, and it works. But note
that timer 0 is not used as a clock event source but as a watchdog,
which btw actually works fine :-)
It detects 7 timers because of a bug in the code - there really are 8
timers, which the current code correctly identifies.
Post by Arnd Hannemann
The funny thing is the #define workaround part of this dubious patch and
I have to turn the "MFPGT workaround" option in the bios ON, to boot
the kernel probably.
I have to turn the "MFPGT workaround" option in the bios OFF, to boot
the kernel probably.
So the workaround works around the workaround. Fun. I think that Mitch
Bradley verified that if you write the magic MSR when all the clocks are
already clear that bad things happen. The workaround probably adds a
dummy clock in. Notice that the "magic MSR" no longer is in the vanilla
code, and thats the way it should be. If the BIOS doesn't allow use of
the clocks, then we have to live with that.

So, based on everything you are saying, I think its clear that our
problem isn't in the MFGPT, but rather in the timer tick (because, as
you said, the watchdog works). We try to use IRQ 7 for the tick, which
Andres and I totally plucked out of thin air based on what we had to work
with on OLPC. Its totally possible that the TinyBIOS had other ideas.
Please try to boot with nomfgpt, and see which interrupts are free, and
use mfgpt_irq= to change it to something else if 7 is in use. Based on
your findings above, you'll probably need to leave the MFGPT workaround
off from now on.

I'll port the watchdog timer to the new API, and we can use that instead
of the timer tick to just make sure that it isn't the timer that is broken.
Also, hopefully that will cease the stream of angry emails asking me why
the ancient patch doesn't work on a current kernel... :)

Jordan


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Arnd Hannemann
2008-01-17 21:50:46 UTC
Permalink
Post by Jordan Crouse
Post by Arnd Hannemann
Post by Andres Salomon
On Thu, 17 Jan 2008 10:54:30 +0100
Post by Arnd Hannemann
Post by Andres Salomon
On Wed, 16 Jan 2008 16:19:12 -0500
Post by Andres Salomon
On Wed, 16 Jan 2008 18:44:07 +0100
Post by Arnd Hannemann
Hi,
I'm trying to boot 2.6.24-rc8 on a GEODE LX board (ALIX.3),
[ 12.689971] NET: Registered protocol family 16
[ 12.703329] geode-mfgpt: Registered timer 0
[ 12.716149] mfgpt-timer: registering the MFGT timer as a clock event...
What BIOS are you using? It's possible that our detection code is
failing to detect in-use timers.
I'm using v0.99 (latest available).
v0.99 of what? Jordan seems to think it's an Award BIOS, but I'd like
to make sure.
Its an ALIX board from PCEngines, they have their own BIOS
implementation (tinyBios).
http://www.pcengines.ch/alix.htm
Post by Andres Salomon
Post by Arnd Hannemann
Also note when I do enable the mysterios "MFGPT workaround" option in
[ 36.780990] NET: Registered protocol family 16
"MFGPT workaround"? That sounds a bit frightening.
Presumably, the BIOS is using the MFGPTs, but we're not detecting them as
being in use.
Yes I think so too, for the fun of it I compiled a 2.6.16.29 kernel with
the attached patch from fi4l.
Okay - thats an MFPGT patch from pre-OLPC days. I am the guilty and
dubious party. We changed the API to work better with the timer tick,
and thats the version that ended up in the kernel.
I really wish I could take back this patch, because it keeps coming back
to torment me. We must, as a people, put it behind us and forgot it. :)
Post by Arnd Hannemann
[ 31.015425] geode-mfgpt: 7 timers available.
...
[ 31.245875] geode-mfgpt: Registered timer 0
So the above kernel detects only 7 timers not 8, and it works. But note
that timer 0 is not used as a clock event source but as a watchdog,
which btw actually works fine :-)
It detects 7 timers because of a bug in the code - there really are 8
timers, which the current code correctly identifies.
Yes I can confirm this, changed MFGPT_MAX_TIMERS from 7 to 8 in the old
kernel and it still works.
Post by Jordan Crouse
Post by Arnd Hannemann
The funny thing is the #define workaround part of this dubious patch and
I have to turn the "MFPGT workaround" option in the bios ON, to boot
the kernel probably.
I have to turn the "MFPGT workaround" option in the bios OFF, to boot
the kernel probably.
So the workaround works around the workaround. Fun. I think that Mitch
Bradley verified that if you write the magic MSR when all the clocks are
already clear that bad things happen. The workaround probably adds a
dummy clock in. Notice that the "magic MSR" no longer is in the vanilla
code, and thats the way it should be. If the BIOS doesn't allow use of
the clocks, then we have to live with that.
So, based on everything you are saying, I think its clear that our
problem isn't in the MFGPT, but rather in the timer tick (because, as
you said, the watchdog works). We try to use IRQ 7 for the tick, which
Andres and I totally plucked out of thin air based on what we had to work
with on OLPC. Its totally possible that the TinyBIOS had other ideas.
Please try to boot with nomfgpt, and see which interrupts are free, and
use mfgpt_irq= to change it to something else if 7 is in use. Based on
your findings above, you'll probably need to leave the MFGPT workaround
off from now on.
Great analysis! I think I can confirm this too. I tried the following:

First in mfgpt_timer_setup I commented out "clockevents_register_device"
result: the system still hangs with "registering the MFGT timer as a
clock event" !

Then I also commented out "ret = setup_irq(irq, &mfgptirq)".
result: system boots, voila!

However the vendor claims that 7 should be used (from the bios changelog):
"v0.90 (IRQ7 is no longer directed to the LPC bus, used as a default
interrupt for MFGPT high resolution timer."

There is also a interrupt map in the bios[0] readme:

IRQ0 timer
IRQ1 KBD (LPC)
IRQ2 cascade
IRQ3 COM1 serial (internal / LPC)
IRQ4 COM2 serial (LPC)
IRQ5 audio (CS5536)
IRQ6 FDC (LPC)
IRQ7 spare, used for MFGPT high resolution timer

IRQ8 RTC
IRQ9 PCI INTA
IRQ10 PCI INTB
IRQ11 PCI INTC
IRQ12 PCI INTD
IRQ13 floating point
IRQ14 IDE HDD
IRQ15 USB (CS5536)

/proc/interrupts on a running system looks like this:
CPU0
0: 12329 XT-PIC timer
2: 0 XT-PIC cascade
4: 240 XT-PIC serial
8: 3 XT-PIC rtc
9: 558 XT-PIC wifi0
10: 67591 XT-PIC eth0
11: 622 XT-PIC wifi1
NMI: 0
ERR: 0
Post by Jordan Crouse
I'll port the watchdog timer to the new API, and we can use that instead
of the timer tick to just make sure that it isn't the timer that is broken.
Also, hopefully that will cease the stream of angry emails asking me why
the ancient patch doesn't work on a current kernel... :)
Watchdog for the new API would be great :-)
Post by Jordan Crouse
Jordan
Thanks for your effort!

[0] http://www.pcengines.ch/file/alixb099.zip
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Jordan Crouse
2008-01-17 22:36:44 UTC
Permalink
Post by Arnd Hannemann
Post by Jordan Crouse
Post by Arnd Hannemann
Post by Andres Salomon
On Thu, 17 Jan 2008 10:54:30 +0100
Post by Arnd Hannemann
Post by Andres Salomon
On Wed, 16 Jan 2008 16:19:12 -0500
Post by Andres Salomon
On Wed, 16 Jan 2008 18:44:07 +0100
Post by Arnd Hannemann
Hi,
I'm trying to boot 2.6.24-rc8 on a GEODE LX board (ALIX.3),
[ 12.689971] NET: Registered protocol family 16
[ 12.703329] geode-mfgpt: Registered timer 0
[ 12.716149] mfgpt-timer: registering the MFGT timer as a clock event...
What BIOS are you using? It's possible that our detection code is
failing to detect in-use timers.
I'm using v0.99 (latest available).
v0.99 of what? Jordan seems to think it's an Award BIOS, but I'd like
to make sure.
Its an ALIX board from PCEngines, they have their own BIOS
implementation (tinyBios).
http://www.pcengines.ch/alix.htm
Post by Andres Salomon
Post by Arnd Hannemann
Also note when I do enable the mysterios "MFGPT workaround" option in
[ 36.780990] NET: Registered protocol family 16
"MFGPT workaround"? That sounds a bit frightening.
Presumably, the BIOS is using the MFGPTs, but we're not detecting them as
being in use.
Yes I think so too, for the fun of it I compiled a 2.6.16.29 kernel with
the attached patch from fi4l.
Okay - thats an MFPGT patch from pre-OLPC days. I am the guilty and
dubious party. We changed the API to work better with the timer tick,
and thats the version that ended up in the kernel.
I really wish I could take back this patch, because it keeps coming back
to torment me. We must, as a people, put it behind us and forgot it. :)
Post by Arnd Hannemann
[ 31.015425] geode-mfgpt: 7 timers available.
...
[ 31.245875] geode-mfgpt: Registered timer 0
So the above kernel detects only 7 timers not 8, and it works. But note
that timer 0 is not used as a clock event source but as a watchdog,
which btw actually works fine :-)
It detects 7 timers because of a bug in the code - there really are 8
timers, which the current code correctly identifies.
Yes I can confirm this, changed MFGPT_MAX_TIMERS from 7 to 8 in the old
kernel and it still works.
Post by Jordan Crouse
Post by Arnd Hannemann
The funny thing is the #define workaround part of this dubious patch and
I have to turn the "MFPGT workaround" option in the bios ON, to boot
the kernel probably.
I have to turn the "MFPGT workaround" option in the bios OFF, to boot
the kernel probably.
So the workaround works around the workaround. Fun. I think that Mitch
Bradley verified that if you write the magic MSR when all the clocks are
already clear that bad things happen. The workaround probably adds a
dummy clock in. Notice that the "magic MSR" no longer is in the vanilla
code, and thats the way it should be. If the BIOS doesn't allow use of
the clocks, then we have to live with that.
So, based on everything you are saying, I think its clear that our
problem isn't in the MFGPT, but rather in the timer tick (because, as
you said, the watchdog works). We try to use IRQ 7 for the tick, which
Andres and I totally plucked out of thin air based on what we had to work
with on OLPC. Its totally possible that the TinyBIOS had other ideas.
Please try to boot with nomfgpt, and see which interrupts are free, and
use mfgpt_irq= to change it to something else if 7 is in use. Based on
your findings above, you'll probably need to leave the MFGPT workaround
off from now on.
First in mfgpt_timer_setup I commented out "clockevents_register_device"
result: the system still hangs with "registering the MFGT timer as a
clock event" !
Then I also commented out "ret = setup_irq(irq, &mfgptirq)".
result: system boots, voila!
Hmmm - not sure whats happening here. I wonder if we're stuck in an
interrupt storm of some sort as soon as you register the interrupt handler.
But I would think that whatever was causing the interrupt storm would be
running well before we hit setup_irq(), and you would be recording "nobody
cared" interrupts left and right.

The thing that scares me is that the TinyBIOS seems to know that we want
to use the MFGPT timers, and I wonder if they did anything behind the scenes
to "help us out" even though we didn't ask for it.

I don't know how easy it would be for you - but can you try reading
MSRs 0x51400020 - 0x51400023? If you need a command line app to do it,
you can use rdmsr from here:

http://wiki.laptop.org/go/Flashing_LinuxBIOS_on_A-Test_Boards
Post by Arnd Hannemann
Watchdog for the new API would be great :-)
Coming soon.

Jordan
--
Jordan Crouse
Systems Software Development Engineer
Advanced Micro Devices, Inc.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Arnd Hannemann
2008-01-17 22:52:02 UTC
Permalink
Post by Jordan Crouse
Post by Arnd Hannemann
Post by Jordan Crouse
Post by Arnd Hannemann
Post by Andres Salomon
On Thu, 17 Jan 2008 10:54:30 +0100
Post by Arnd Hannemann
Post by Andres Salomon
On Wed, 16 Jan 2008 16:19:12 -0500
Post by Andres Salomon
On Wed, 16 Jan 2008 18:44:07 +0100
Post by Arnd Hannemann
Hi,
I'm trying to boot 2.6.24-rc8 on a GEODE LX board (ALIX.3),
[ 12.689971] NET: Registered protocol family 16
[ 12.703329] geode-mfgpt: Registered timer 0
[ 12.716149] mfgpt-timer: registering the MFGT timer as a clock event...
What BIOS are you using? It's possible that our detection code is
failing to detect in-use timers.
I'm using v0.99 (latest available).
v0.99 of what? Jordan seems to think it's an Award BIOS, but I'd like
to make sure.
Its an ALIX board from PCEngines, they have their own BIOS
implementation (tinyBios).
http://www.pcengines.ch/alix.htm
Post by Andres Salomon
Post by Arnd Hannemann
Also note when I do enable the mysterios "MFGPT workaround" option in
[ 36.780990] NET: Registered protocol family 16
"MFGPT workaround"? That sounds a bit frightening.
Presumably, the BIOS is using the MFGPTs, but we're not detecting them as
being in use.
Yes I think so too, for the fun of it I compiled a 2.6.16.29 kernel with
the attached patch from fi4l.
Okay - thats an MFPGT patch from pre-OLPC days. I am the guilty and
dubious party. We changed the API to work better with the timer tick,
and thats the version that ended up in the kernel.
I really wish I could take back this patch, because it keeps coming back
to torment me. We must, as a people, put it behind us and forgot it. :)
Post by Arnd Hannemann
[ 31.015425] geode-mfgpt: 7 timers available.
...
[ 31.245875] geode-mfgpt: Registered timer 0
So the above kernel detects only 7 timers not 8, and it works. But note
that timer 0 is not used as a clock event source but as a watchdog,
which btw actually works fine :-)
It detects 7 timers because of a bug in the code - there really are 8
timers, which the current code correctly identifies.
Yes I can confirm this, changed MFGPT_MAX_TIMERS from 7 to 8 in the old
kernel and it still works.
Post by Jordan Crouse
Post by Arnd Hannemann
The funny thing is the #define workaround part of this dubious patch and
I have to turn the "MFPGT workaround" option in the bios ON, to boot
the kernel probably.
I have to turn the "MFPGT workaround" option in the bios OFF, to boot
the kernel probably.
So the workaround works around the workaround. Fun. I think that Mitch
Bradley verified that if you write the magic MSR when all the clocks are
already clear that bad things happen. The workaround probably adds a
dummy clock in. Notice that the "magic MSR" no longer is in the vanilla
code, and thats the way it should be. If the BIOS doesn't allow use of
the clocks, then we have to live with that.
So, based on everything you are saying, I think its clear that our
problem isn't in the MFGPT, but rather in the timer tick (because, as
you said, the watchdog works). We try to use IRQ 7 for the tick, which
Andres and I totally plucked out of thin air based on what we had to work
with on OLPC. Its totally possible that the TinyBIOS had other ideas.
Please try to boot with nomfgpt, and see which interrupts are free, and
use mfgpt_irq= to change it to something else if 7 is in use. Based on
your findings above, you'll probably need to leave the MFGPT workaround
off from now on.
First in mfgpt_timer_setup I commented out "clockevents_register_device"
result: the system still hangs with "registering the MFGT timer as a
clock event" !
Then I also commented out "ret = setup_irq(irq, &mfgptirq)".
result: system boots, voila!
Hmmm - not sure whats happening here. I wonder if we're stuck in an
interrupt storm of some sort as soon as you register the interrupt handler.
But I would think that whatever was causing the interrupt storm would be
running well before we hit setup_irq(), and you would be recording "nobody
cared" interrupts left and right.
Interesting thing is that it hangs not in setup_irq() but later, right
after printing the newline of the printk.
Post by Jordan Crouse
The thing that scares me is that the TinyBIOS seems to know that we want
to use the MFGPT timers, and I wonder if they did anything behind the scenes
to "help us out" even though we didn't ask for it.
I don't know how easy it would be for you - but can you try reading
MSRs 0x51400020 - 0x51400023? If you need a command line app to do it,
http://wiki.laptop.org/go/Flashing_LinuxBIOS_on_A-Test_Boards
MSR register 0x51400020 => b7:ef:5f:f4:bf:d1:95:68
MSR register 0x51400021 => b7:fd:1f:f4:bf:cf:5a:d8
MSR register 0x51400022 => b7:f3:bf:f4:bf:f5:fb:a8
MSR register 0x51400023 => b7:fb:9f:f4:bf:fd:d9:f8
Post by Jordan Crouse
Post by Arnd Hannemann
Watchdog for the new API would be great :-)
Coming soon.
Jordan
Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Jordan Crouse
2008-01-17 22:57:44 UTC
Permalink
On 17/01/08 23:52 +0100, Arnd Hannemann wrote:

<snip>
Post by Arnd Hannemann
Post by Jordan Crouse
Hmmm - not sure whats happening here. I wonder if we're stuck in an
interrupt storm of some sort as soon as you register the interrupt handler.
But I would think that whatever was causing the interrupt storm would be
running well before we hit setup_irq(), and you would be recording "nobody
cared" interrupts left and right.
Interesting thing is that it hangs not in setup_irq() but later, right
after printing the newline of the printk.
THat makes me think interrupt storm even more.
Post by Arnd Hannemann
Post by Jordan Crouse
The thing that scares me is that the TinyBIOS seems to know that we want
to use the MFGPT timers, and I wonder if they did anything behind the scenes
to "help us out" even though we didn't ask for it.
I don't know how easy it would be for you - but can you try reading
MSRs 0x51400020 - 0x51400023? If you need a command line app to do it,
http://wiki.laptop.org/go/Flashing_LinuxBIOS_on_A-Test_Boards
MSR register 0x51400020 => b7:ef:5f:f4:bf:d1:95:68
MSR register 0x51400021 => b7:fd:1f:f4:bf:cf:5a:d8
MSR register 0x51400022 => b7:f3:bf:f4:bf:f5:fb:a8
MSR register 0x51400023 => b7:fb:9f:f4:bf:fd:d9:f8
Hmmm - those look wrong. Is /dev/cpu/0/msr there? The applet on the
wiki has a bug that doesn't check for it.

Jordan
--
Jordan Crouse
Systems Software Development Engineer
Advanced Micro Devices, Inc.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Arnd Hannemann
2008-01-17 23:39:31 UTC
Permalink
Post by Jordan Crouse
<snip>
Post by Arnd Hannemann
Post by Jordan Crouse
Hmmm - not sure whats happening here. I wonder if we're stuck in an
interrupt storm of some sort as soon as you register the interrupt handler.
But I would think that whatever was causing the interrupt storm would be
running well before we hit setup_irq(), and you would be recording "nobody
cared" interrupts left and right.
Interesting thing is that it hangs not in setup_irq() but later, right
after printing the newline of the printk.
THat makes me think interrupt storm even more.
Post by Arnd Hannemann
Post by Jordan Crouse
The thing that scares me is that the TinyBIOS seems to know that we want
to use the MFGPT timers, and I wonder if they did anything behind the scenes
to "help us out" even though we didn't ask for it.
I don't know how easy it would be for you - but can you try reading
MSRs 0x51400020 - 0x51400023? If you need a command line app to do it,
http://wiki.laptop.org/go/Flashing_LinuxBIOS_on_A-Test_Boards
MSR register 0x51400020 => b7:ef:5f:f4:bf:d1:95:68
MSR register 0x51400021 => b7:fd:1f:f4:bf:cf:5a:d8
MSR register 0x51400022 => b7:f3:bf:f4:bf:f5:fb:a8
MSR register 0x51400023 => b7:fb:9f:f4:bf:fd:d9:f8
Hmmm - those look wrong. Is /dev/cpu/0/msr there? The applet on the
wiki has a bug that doesn't check for it.
I'm sorry, I should have checked: I didn't execute rdmsr as root.
The correct ones:

MSR register 0x51400020 => 00:00:00:00:00:00:0f:00
MSR register 0x51400021 => 00:00:00:00:04:00:00:00
MSR register 0x51400022 => 00:00:00:00:00:00:00:00
MSR register 0x51400023 => 00:00:00:00:00:0c:ba:90

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Jordan Crouse
2008-01-18 00:40:53 UTC
Permalink
Post by Arnd Hannemann
Post by Jordan Crouse
<snip>
Post by Arnd Hannemann
Post by Jordan Crouse
Hmmm - not sure whats happening here. I wonder if we're stuck in an
interrupt storm of some sort as soon as you register the interrupt handler.
But I would think that whatever was causing the interrupt storm would be
running well before we hit setup_irq(), and you would be recording "nobody
cared" interrupts left and right.
Interesting thing is that it hangs not in setup_irq() but later, right
after printing the newline of the printk.
THat makes me think interrupt storm even more.
Post by Arnd Hannemann
Post by Jordan Crouse
The thing that scares me is that the TinyBIOS seems to know that we want
to use the MFGPT timers, and I wonder if they did anything behind the scenes
to "help us out" even though we didn't ask for it.
I don't know how easy it would be for you - but can you try reading
MSRs 0x51400020 - 0x51400023? If you need a command line app to do it,
http://wiki.laptop.org/go/Flashing_LinuxBIOS_on_A-Test_Boards
MSR register 0x51400020 => b7:ef:5f:f4:bf:d1:95:68
MSR register 0x51400021 => b7:fd:1f:f4:bf:cf:5a:d8
MSR register 0x51400022 => b7:f3:bf:f4:bf:f5:fb:a8
MSR register 0x51400023 => b7:fb:9f:f4:bf:fd:d9:f8
Hmmm - those look wrong. Is /dev/cpu/0/msr there? The applet on the
wiki has a bug that doesn't check for it.
I'm sorry, I should have checked: I didn't execute rdmsr as root.
MSR register 0x51400020 => 00:00:00:00:00:00:0f:00
MSR register 0x51400021 => 00:00:00:00:04:00:00:00
MSR register 0x51400022 => 00:00:00:00:00:00:00:00
MSR register 0x51400023 => 00:00:00:00:00:0c:ba:90
Okay - those are sane. Those are the IRQ routing MSRs - each nibble is an
IRQ for something in the southbridge - since 7 doesn't appear, a rogue
interrupt isn't likely. Also, [0:31] in 0x51400022 are the MFGPT interrupts
specifically, and they are 0. The timer tick code would set the first nibble
to 7 (in a sane system).

So, everything _seems_ okay from the hardware side. The next step would be
to comment out the setup_irq() function and write a C app or something to
verify that all the MFGPT registers are set up like we expect them to be.
However, I think that maybe we can accomplish the same thing with the
forthcoming watchdog timer, since it will prove the MFGPT is behaving
well (though it doesn't use the interrupt wrapper, but one step at a time).

Jordan
--
Jordan Crouse
Systems Software Development Engineer
Advanced Micro Devices, Inc.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Jordan Crouse
2008-01-21 23:27:09 UTC
Permalink
Okay - I've been exploring a little bit more. I talked to the TinyBIOS
developer, and he verified that TinyBIOS shouldn't use any MFGPT timers.
He also told me that the mysterious "MFGPT workaround" was in fact the
magic MFGPT erasing MSR that was in the old kernel driver.

So with the "MFGPT workaround" turned off, TinyBIOS should be acting like
the OLPC firmware with regards to timers, yet it is not. So that is
curious. I think I might have identified a race condition in the code,
but I'm not 100% sure thats the same problem that the ALIX platform is
seeing.

Anrd and others - will you please try the attached patch on your platform
with the "MFGPT workaround" turned off and mfgpts enabled, and send out
the dmesg?

This will give us some debug information that I can use to ensure that
the interrupts are set up correctly. You can leave the timer tick disabled
if you want.

Thanks,
Jordan
--
Jordan Crouse
Systems Software Development Engineer
Advanced Micro Devices, Inc.
Willy Tarreau
2008-01-21 23:32:26 UTC
Permalink
Hi Jordan,
Post by Jordan Crouse
Okay - I've been exploring a little bit more. I talked to the TinyBIOS
developer, and he verified that TinyBIOS shouldn't use any MFGPT timers.
He also told me that the mysterious "MFGPT workaround" was in fact the
magic MFGPT erasing MSR that was in the old kernel driver.
So with the "MFGPT workaround" turned off, TinyBIOS should be acting like
the OLPC firmware with regards to timers, yet it is not. So that is
curious. I think I might have identified a race condition in the code,
but I'm not 100% sure thats the same problem that the ALIX platform is
seeing.
Anrd and others - will you please try the attached patch on your platform
with the "MFGPT workaround" turned off and mfgpts enabled, and send out
the dmesg?
Yes of course :

[ 44.013100] NET: Registered protocol family 16
[ 44.066308] geode-mfgpt: IRQ MSR=0:0
[ 44.110161] geode-mfgpt: NMI MSR=0:0
[ 44.154037] geode-mfgpt: Unrestricted sources=0

Then it hangs here.
In another mail I sent privately to Andres, I noticed that it was the
following line which hangs at first iteration (i == 0) :

val = geode_mfgpt_read(i, MFGPT_REG_SETUP);

I have a tinybios 0.98 with no workaround option configurable. I also
have CONFIG_GEODE_MFGPT_TIMER=n.

Hoping this helps,
Willy
Willy Tarreau
2008-01-22 20:15:24 UTC
Permalink
Hi guys,
Post by Willy Tarreau
[ 44.013100] NET: Registered protocol family 16
[ 44.066308] geode-mfgpt: IRQ MSR=0:0
[ 44.110161] geode-mfgpt: NMI MSR=0:0
[ 44.154037] geode-mfgpt: Unrestricted sources=0
Then it hangs here.
In another mail I sent privately to Andres, I noticed that it was the
val = geode_mfgpt_read(i, MFGPT_REG_SETUP);
I have a tinybios 0.98 with no workaround option configurable. I also
have CONFIG_GEODE_MFGPT_TIMER=n.
Good news! I read the mfgpt patch for 2.6.22 and saw what the workaround
consisted in (writing 0xff at MSR 0x5140002B). So I tried adding the
following on top of 2.6.24-rc8 :

static int __init mfgpt_fix(char *s)
{
u32 val, dummy;

/* The following udocumented bit resets the MFGPT timers */
val = 0xFF;
wrmsr(0x5140002B, val, dummy);

return 1;
}
__setup("mfgptfix", mfgpt_fix);

and booted with the newly added option (mfgptfix). It worked like a charm :

[ 29.173796] NET: Registered protocol family 16
[ 29.226986] geode: lbars[0].base = 0x9d00
[ 29.275003] geode: lbars[1].base = 0x9c00
[ 29.323042] geode: lbars[2].base = 0x6100
[ 29.371082] geode: lbars[3].base = 0x6200
[ 29.419121] geode-mfgpt: IRQ MSR=0:0
[ 29.463000] geode-mfgpt: NMI MSR=0:0
[ 29.506881] geode-mfgpt: Unrestricted sources=0
[ 29.562202] geode_mfgpt: reading 0x6200 + 0x6 + (0x0 * 8) = 0x6206
[ 29.636237] geode_mfgpt: reading 0x6200 + 0x6 + (0x1 * 8) = 0x620e
[ 29.710270] geode_mfgpt: reading 0x6200 + 0x6 + (0x2 * 8) = 0x6216
[ 29.784305] geode_mfgpt: reading 0x6200 + 0x6 + (0x3 * 8) = 0x621e
[ 29.858338] geode_mfgpt: reading 0x6200 + 0x6 + (0x4 * 8) = 0x6226
[ 29.932376] geode_mfgpt: reading 0x6200 + 0x6 + (0x5 * 8) = 0x622e
[ 30.006409] geode_mfgpt: reading 0x6200 + 0x6 + (0x6 * 8) = 0x6236
[ 30.080444] geode_mfgpt: reading 0x6200 + 0x6 + (0x7 * 8) = 0x623e
[ 30.154475] geode: 8 MFGPT timers available.

So it seems like applying the workaround on top of TinyBIOS 0.98 undoes
this BIOS's workaround. I'm now wondering how we could detect whether
the workaround was applied or not :-/

Next, I retried with MFGPT_TIMER=y + latest fix you posted moving the
initialization race, and here's what I get now :

***@ALOHA-500:~# cat /proc/interrupts
CPU0
0: 77 XT-PIC-XT timer
2: 0 XT-PIC-XT cascade
4: 348 XT-PIC-XT serial
7: 12273 XT-PIC-XT mfgpt-timer
8: 0 XT-PIC-XT rtc
10: 80 XT-PIC-XT eth0
11: 0 XT-PIC-XT eth1
12: 1 XT-PIC-XT eth2
14: 13614 XT-PIC-XT ide0
NMI: 0 Non-maskable interrupts
TRM: 0 Thermal event interrupts
SPU: 0 Spurious interrupts
ERR: 0

Interestingly, during the boot, I got thousands of the following line :
geode_mfgpt: reading 0x6200 + 0x6 + (0x0 * 8) = 0x6206

It's a debug line I added in geode_mfgpt_read() which writes the timer and
reg being read. It slowed the boot down due to being written to a serial
console, but fortunately stopped when syslogd had changed the console log
level. Just checking...

# grep mfgpt /proc/interrupts;sleep 10;grep mfgpt /proc/interrupts
7: 82320 XT-PIC-XT mfgpt-timer
7: 84882 XT-PIC-XT mfgpt-timer

It delivers 256 IRQs/s. That makes me think about this comment in mfgpt_32.c :

* We are using the 32Khz input clock - its the only one that has the
* ranges we find desirable. The following table lists the suitable
* divisors and the associated hz, minimum interval
* and the maximum interval:
*
* Divisor Hz Min Delta (S) Max Delta (S)
* 1 32000 .0005 2.048
* 2 16000 .001 4.096
* 4 8000 .002 8.192
...

When you say a 32kHz clock, you mean 32000 Hz. Are you really sure ? Most
32 kHz clocks everywhere are really 32768 Hz (the watch quartz). BTW, I'm
seeing a 32.768 kHz xtal close to the CS5536, and the numbers above seem
to support this suggestion too.

So right now that I've found what caused old kernel to unexpectedly work,
I'm planning a BIOS upgrade. I'm still just wondering what we can do to
detect that the workaround should be needed. I suspect nothing, of course,
but just in case... Maybe we can detect the effects of the workaround ?

Best regards,
Willy
Jordan Crouse
2008-01-22 21:08:58 UTC
Permalink
Post by Willy Tarreau
Good news! I read the mfgpt patch for 2.6.22 and saw what the workaround
consisted in (writing 0xff at MSR 0x5140002B). So I tried adding the
static int __init mfgpt_fix(char *s)
{
u32 val, dummy;
/* The following udocumented bit resets the MFGPT timers */
val = 0xFF;
wrmsr(0x5140002B, val, dummy);
return 1;
}
__setup("mfgptfix", mfgpt_fix);
<snip>
Post by Willy Tarreau
So it seems like applying the workaround on top of TinyBIOS 0.98 undoes
this BIOS's workaround. I'm now wondering how we could detect whether
the workaround was applied or not :-/
Actually, the TinyBIOS "workaround" is the same thing.

Like I may have said before, there is a reason why this MSR is undocumented.
It works, but the behavior is unvalidated, and obviously erratic. When we
first developed this code, we were using the Geode GX on the OLPC with VSA,
so using the MSR was nessesary if we wanted to get our hands on any
timers at all. Mitch Bradley (author of OpenFirmware) determined through
testing that the MSR was erratic, especially when you ran it when all the
timers were already clear. I suspect thats the problem that we see here -
writing the MSR in TinyBIOS unstablizied the registers, and writing the
MSR again kicked it back. Of course, the unfortunate corollary is that
when the workaround isn't in v0.99, if you run the MSR in the kernel, then
you will end up destabilizing everything.

Furthermore, using the MSR is okay with TinyBIOS, but not okay with the
other Geode BIOSen (Insyde, General Software, and for the moment LinuxBIOS)
because VSA (the SMM handler) _does_ use some of the timers. So needless
to say, I'm concerned.
Post by Willy Tarreau
geode_mfgpt: reading 0x6200 + 0x6 + (0x0 * 8) = 0x6206
It's a debug line I added in geode_mfgpt_read() which writes the timer and
reg being read. It slowed the boot down due to being written to a serial
console, but fortunately stopped when syslogd had changed the console log
level. Just checking...
We control the timer and the status bits for the timer in the setup
register, which is what you are showing above. Thats fine.
Post by Willy Tarreau
# grep mfgpt /proc/interrupts;sleep 10;grep mfgpt /proc/interrupts
7: 82320 XT-PIC-XT mfgpt-timer
7: 84882 XT-PIC-XT mfgpt-timer
Hmm - are you running with nohz? I ran the same thing on the OLPC
and I'm getting 81 IRQ/s which is okay, considering that sugar was running
in the background.
Post by Willy Tarreau
* We are using the 32Khz input clock - its the only one that has the
* ranges we find desirable. The following table lists the suitable
* divisors and the associated hz, minimum interval
*
* Divisor Hz Min Delta (S) Max Delta (S)
* 1 32000 .0005 2.048
* 2 16000 .001 4.096
* 4 8000 .002 8.192
...
When you say a 32kHz clock, you mean 32000 Hz. Are you really sure ? Most
32 kHz clocks everywhere are really 32768 Hz (the watch quartz). BTW, I'm
seeing a 32.768 kHz xtal close to the CS5536, and the numbers above seem
to support this suggestion too.
Hmm - yeah, my math is off there - it is a 32768 Hz clock. That
shouldn't affect things too negatively - if it does we can adjust the
divisor we use - currently we use 16.
Post by Willy Tarreau
So right now that I've found what caused old kernel to unexpectedly work,
I'm planning a BIOS upgrade. I'm still just wondering what we can do to
detect that the workaround should be needed. I suspect nothing, of course,
but just in case... Maybe we can detect the effects of the workaround ?
I'm not sure if we can - all we can tell is if the registers are zero or
not. Like I said, running the MSR is probably dangerous in 9 out of 10
situations, the one good use being the one you determined. I would
support adding the mfgptfix bit though - just as long as it isn't
automagic.

Thank you very much for your help!
Jordan
--
Jordan Crouse
Systems Software Development Engineer
Advanced Micro Devices, Inc.
Willy Tarreau
2008-01-22 21:15:30 UTC
Permalink
Post by Jordan Crouse
Post by Willy Tarreau
So it seems like applying the workaround on top of TinyBIOS 0.98 undoes
this BIOS's workaround. I'm now wondering how we could detect whether
the workaround was applied or not :-/
Actually, the TinyBIOS "workaround" is the same thing.
Like I may have said before, there is a reason why this MSR is undocumented.
It works, but the behavior is unvalidated, and obviously erratic. When we
first developed this code, we were using the Geode GX on the OLPC with VSA,
so using the MSR was nessesary if we wanted to get our hands on any
timers at all. Mitch Bradley (author of OpenFirmware) determined through
testing that the MSR was erratic, especially when you ran it when all the
timers were already clear. I suspect thats the problem that we see here -
writing the MSR in TinyBIOS unstablizied the registers, and writing the
MSR again kicked it back. Of course, the unfortunate corollary is that
when the workaround isn't in v0.99, if you run the MSR in the kernel, then
you will end up destabilizing everything.
Indeed, that looks exactly like what's happening.
Post by Jordan Crouse
Furthermore, using the MSR is okay with TinyBIOS, but not okay with the
other Geode BIOSen (Insyde, General Software, and for the moment LinuxBIOS)
because VSA (the SMM handler) _does_ use some of the timers. So needless
to say, I'm concerned.
I certainly can understand.
Post by Jordan Crouse
Post by Willy Tarreau
geode_mfgpt: reading 0x6200 + 0x6 + (0x0 * 8) = 0x6206
It's a debug line I added in geode_mfgpt_read() which writes the timer and
reg being read. It slowed the boot down due to being written to a serial
console, but fortunately stopped when syslogd had changed the console log
level. Just checking...
We control the timer and the status bits for the timer in the setup
register, which is what you are showing above. Thats fine.
OK.
Post by Jordan Crouse
Post by Willy Tarreau
# grep mfgpt /proc/interrupts;sleep 10;grep mfgpt /proc/interrupts
7: 82320 XT-PIC-XT mfgpt-timer
7: 84882 XT-PIC-XT mfgpt-timer
Hmm - are you running with nohz?
No but I have CONFIG_HZ=250. That makes sense then :

#define MFGPT_HZ (32000 / MFGPT_DIVISOR)
#define MFGPT_PERIODIC (MFGPT_HZ / HZ)

=> MFGPT_PERIODIC = (32000 / 16) / 250 = 8
Given that it's 32.768 kHz in fact, we get 32768/16/8 = 256 Hz
Post by Jordan Crouse
Post by Willy Tarreau
* We are using the 32Khz input clock - its the only one that has the
* ranges we find desirable. The following table lists the suitable
* divisors and the associated hz, minimum interval
*
* Divisor Hz Min Delta (S) Max Delta (S)
* 1 32000 .0005 2.048
* 2 16000 .001 4.096
* 4 8000 .002 8.192
...
When you say a 32kHz clock, you mean 32000 Hz. Are you really sure ? Most
32 kHz clocks everywhere are really 32768 Hz (the watch quartz). BTW, I'm
seeing a 32.768 kHz xtal close to the CS5536, and the numbers above seem
to support this suggestion too.
Hmm - yeah, my math is off there - it is a 32768 Hz clock. That
shouldn't affect things too negatively - if it does we can adjust the
divisor we use - currently we use 16.
Why not just ajust the constant in MFGPT_HZ since it's the only place (aside
the comments) where this is referenced ?

If you want I can send you a patch and adjust the comments at the same time.
Post by Jordan Crouse
Post by Willy Tarreau
So right now that I've found what caused old kernel to unexpectedly work,
I'm planning a BIOS upgrade. I'm still just wondering what we can do to
detect that the workaround should be needed. I suspect nothing, of course,
but just in case... Maybe we can detect the effects of the workaround ?
I'm not sure if we can - all we can tell is if the registers are zero or
not. Like I said, running the MSR is probably dangerous in 9 out of 10
situations, the one good use being the one you determined. I would
support adding the mfgptfix bit though - just as long as it isn't
automagic.
OK I perfectly understand. While I see the current behaviour as a regression
compared to 2.6.22-mainline (since 2.6.24-rc8 does not boot off from this
board without nomfgpt), the first problem is in the BIOS and it requires an
upgrade. Maybe we should extend the scope of CONFIG_GEODE_MFGPT_TIMER to
allow it to completely disable the detection logic when it's off ? I certainly
can work a clean patch for "mfgptfix", but for users who will get caught with
this board not booting at all anymore, adding an option in their boot loaders
may be as problematic as upgrading the BIOS. I'm also thinking about the ones
preparing new software updates from a recent boards and deploying them miles
away without knowing they are running a 0.98 BIOS which will prevent their
new kernel from booting.
Post by Jordan Crouse
Thank you very much for your help!
You're welcome, thanks to you too :-)
Willy
Jordan Crouse
2008-01-23 16:36:46 UTC
Permalink
Post by Willy Tarreau
Post by Jordan Crouse
Post by Willy Tarreau
So it seems like applying the workaround on top of TinyBIOS 0.98 undoes
this BIOS's workaround. I'm now wondering how we could detect whether
the workaround was applied or not :-/
Actually, the TinyBIOS "workaround" is the same thing.
Like I may have said before, there is a reason why this MSR is undocumented.
It works, but the behavior is unvalidated, and obviously erratic. When we
first developed this code, we were using the Geode GX on the OLPC with VSA,
so using the MSR was nessesary if we wanted to get our hands on any
timers at all. Mitch Bradley (author of OpenFirmware) determined through
testing that the MSR was erratic, especially when you ran it when all the
timers were already clear. I suspect thats the problem that we see here -
writing the MSR in TinyBIOS unstablizied the registers, and writing the
MSR again kicked it back. Of course, the unfortunate corollary is that
when the workaround isn't in v0.99, if you run the MSR in the kernel, then
you will end up destabilizing everything.
Indeed, that looks exactly like what's happening.
Post by Jordan Crouse
Furthermore, using the MSR is okay with TinyBIOS, but not okay with the
other Geode BIOSen (Insyde, General Software, and for the moment LinuxBIOS)
because VSA (the SMM handler) _does_ use some of the timers. So needless
to say, I'm concerned.
I certainly can understand.
Post by Jordan Crouse
Post by Willy Tarreau
geode_mfgpt: reading 0x6200 + 0x6 + (0x0 * 8) = 0x6206
It's a debug line I added in geode_mfgpt_read() which writes the timer and
reg being read. It slowed the boot down due to being written to a serial
console, but fortunately stopped when syslogd had changed the console log
level. Just checking...
We control the timer and the status bits for the timer in the setup
register, which is what you are showing above. Thats fine.
OK.
Post by Jordan Crouse
Post by Willy Tarreau
# grep mfgpt /proc/interrupts;sleep 10;grep mfgpt /proc/interrupts
7: 82320 XT-PIC-XT mfgpt-timer
7: 84882 XT-PIC-XT mfgpt-timer
Hmm - are you running with nohz?
#define MFGPT_HZ (32000 / MFGPT_DIVISOR)
#define MFGPT_PERIODIC (MFGPT_HZ / HZ)
=> MFGPT_PERIODIC = (32000 / 16) / 250 = 8
Given that it's 32.768 kHz in fact, we get 32768/16/8 = 256 Hz
Post by Jordan Crouse
Post by Willy Tarreau
* We are using the 32Khz input clock - its the only one that has the
* ranges we find desirable. The following table lists the suitable
* divisors and the associated hz, minimum interval
*
* Divisor Hz Min Delta (S) Max Delta (S)
* 1 32000 .0005 2.048
* 2 16000 .001 4.096
* 4 8000 .002 8.192
...
When you say a 32kHz clock, you mean 32000 Hz. Are you really sure ? Most
32 kHz clocks everywhere are really 32768 Hz (the watch quartz). BTW, I'm
seeing a 32.768 kHz xtal close to the CS5536, and the numbers above seem
to support this suggestion too.
Hmm - yeah, my math is off there - it is a 32768 Hz clock. That
shouldn't affect things too negatively - if it does we can adjust the
divisor we use - currently we use 16.
Why not just ajust the constant in MFGPT_HZ since it's the only place (aside
the comments) where this is referenced ?
If you want I can send you a patch and adjust the comments at the same time.
That would be great.
Post by Willy Tarreau
Post by Jordan Crouse
Post by Willy Tarreau
So right now that I've found what caused old kernel to unexpectedly work,
I'm planning a BIOS upgrade. I'm still just wondering what we can do to
detect that the workaround should be needed. I suspect nothing, of course,
but just in case... Maybe we can detect the effects of the workaround ?
I'm not sure if we can - all we can tell is if the registers are zero or
not. Like I said, running the MSR is probably dangerous in 9 out of 10
situations, the one good use being the one you determined. I would
support adding the mfgptfix bit though - just as long as it isn't
automagic.
OK I perfectly understand. While I see the current behaviour as a regression
compared to 2.6.22-mainline (since 2.6.24-rc8 does not boot off from this
board without nomfgpt), the first problem is in the BIOS and it requires an
upgrade. Maybe we should extend the scope of CONFIG_GEODE_MFGPT_TIMER to
allow it to completely disable the detection logic when it's off ? I certainly
can work a clean patch for "mfgptfix", but for users who will get caught with
this board not booting at all anymore, adding an option in their boot loaders
may be as problematic as upgrading the BIOS. I'm also thinking about the ones
preparing new software updates from a recent boards and deploying them miles
away without knowing they are running a 0.98 BIOS which will prevent their
new kernel from booting.
I think not allowing MFGPT timers for broken BIOSen is the right way to go.
We can do just-in-time detection the first time one of the drivers (timer
tick or watchdog) asks for a timer, instead of the current way of probing
automatically.

That way, you can go merrily on your way if you don't try to use any of
the MFGPTs. TinyBIOS users can compile out CONFIG_GEODE_MFGPT_TIMER and
boot.

I'll write up a patch.

Thanks,
Jordan
--
Jordan Crouse
Systems Software Development Engineer
Advanced Micro Devices, Inc.
Willy Tarreau
2008-01-23 16:10:44 UTC
Permalink
Post by Jordan Crouse
Post by Willy Tarreau
Why not just ajust the constant in MFGPT_HZ since it's the only place (aside
the comments) where this is referenced ?
If you want I can send you a patch and adjust the comments at the same time.
That would be great.
OK will do ASAP.
Post by Jordan Crouse
I think not allowing MFGPT timers for broken BIOSen is the right way to go.
We can do just-in-time detection the first time one of the drivers (timer
tick or watchdog) asks for a timer, instead of the current way of probing
automatically.
That way, you can go merrily on your way if you don't try to use any of
the MFGPTs. TinyBIOS users can compile out CONFIG_GEODE_MFGPT_TIMER and
boot.
I like this method a lot. At least, people with broken bioses would have
trouble only if they modprobe geodewdt or do such things. They will easily
find the relation between their action and the problem. Right now, we only
know that it hangs after "NET: ...".
Post by Jordan Crouse
I'll write up a patch.
Fine.

Thanks,
Willy
Arnd Hannemann
2008-01-22 09:03:08 UTC
Permalink
Post by Jordan Crouse
Okay - I've been exploring a little bit more. I talked to the TinyBIOS
developer, and he verified that TinyBIOS shouldn't use any MFGPT timers.
He also told me that the mysterious "MFGPT workaround" was in fact the
magic MFGPT erasing MSR that was in the old kernel driver.
So with the "MFGPT workaround" turned off, TinyBIOS should be acting like
the OLPC firmware with regards to timers, yet it is not. So that is
curious. I think I might have identified a race condition in the code,
but I'm not 100% sure thats the same problem that the ALIX platform is
seeing.
Anrd and others - will you please try the attached patch on your platform
with the "MFGPT workaround" turned off and mfgpts enabled, and send out
the dmesg?
Of course, tinyBios version v0.99, "MFGPT workaround" turned off,
CONFIG_GEODE_MFGPT_TIMER=n:

[ 67.369697] NET: Registered protocol family 16
[ 67.383059] geode-mfgpt: IRQ MSR=0:0
[ 67.394058] geode-mfgpt: NMI MSR=0:0
[ 67.405049] geode-mfgpt: Unrestricted sources=0
[ 67.418909] geode: 8 MFGPT timers available.
[ 67.433211] PCI: PCI BIOS revision 2.10 entry at 0xfcd03, last bus=0

same with CONFIG_GEODE_MFGPT_TIMER=y (sorry, without move printk patch):

[ 22.289349] NET: Registered protocol family 16
[ 22.302716] geode-mfgpt: IRQ MSR=0:0
[ 22.313716] geode-mfgpt: NMI MSR=0:0
[ 22.324704] geode-mfgpt: Unrestricted sources=0
[ 22.338566] geode-mfgpt: Registered timer 0
[ 22.351393] mfgpt-timer: registering the MFGT timer as a clock event.
^^^^ Hangs here
Post by Jordan Crouse
This will give us some debug information that I can use to ensure that
the interrupts are set up correctly. You can leave the timer tick disabled
if you want.
Thanks,
Jordan
Arnd
Lars Heete
2008-01-22 10:11:20 UTC
Permalink
Hello,
Post by Arnd Hannemann
Post by Jordan Crouse
Okay - I've been exploring a little bit more. I talked to the TinyBIOS
developer, and he verified that TinyBIOS shouldn't use any MFGPT timers.
He also told me that the mysterious "MFGPT workaround" was in fact the
magic MFGPT erasing MSR that was in the old kernel driver.
So with the "MFGPT workaround" turned off, TinyBIOS should be acting like
the OLPC firmware with regards to timers, yet it is not. So that is
curious. I think I might have identified a race condition in the code,
but I'm not 100% sure thats the same problem that the ALIX platform is
seeing.
Anrd and others - will you please try the attached patch on your platform
with the "MFGPT workaround" turned off and mfgpts enabled, and send out
the dmesg?
Of course, tinyBios version v0.99, "MFGPT workaround" turned off,
[ 67.369697] NET: Registered protocol family 16
[ 67.383059] geode-mfgpt: IRQ MSR=0:0
[ 67.394058] geode-mfgpt: NMI MSR=0:0
[ 67.405049] geode-mfgpt: Unrestricted sources=0
[ 67.418909] geode: 8 MFGPT timers available.
[ 67.433211] PCI: PCI BIOS revision 2.10 entry at 0xfcd03, last bus=0
[ 22.289349] NET: Registered protocol family 16
[ 22.302716] geode-mfgpt: IRQ MSR=0:0
[ 22.313716] geode-mfgpt: NMI MSR=0:0
[ 22.324704] geode-mfgpt: Unrestricted sources=0
[ 22.338566] geode-mfgpt: Registered timer 0
[ 22.351393] mfgpt-timer: registering the MFGT timer as a clock event.
^^^^ Hangs here
I had the same problem with MFGPT Timers and alix (BIOS v0.99). I found that
if you use a different interrupt than the default 7 for MFGPT (just append
mfgpt_irq=8 to the kernel commandline), the timer seems to work.

CPU0
0: 33 XT-PIC-XT timer
2: 0 XT-PIC-XT cascade
4: 1662 XT-PIC-XT serial
8: 766 XT-PIC-XT mfgpt-timer
14: 473 XT-PIC-XT libata
NMI: 0
ERR: 0
Post by Arnd Hannemann
Post by Jordan Crouse
This will give us some debug information that I can use to ensure that
the interrupts are set up correctly. You can leave the timer tick
disabled if you want.
Thanks,
Jordan
Arnd
Lars
Arnd Hannemann
2008-01-22 11:18:56 UTC
Permalink
Hi Lars,
Post by Lars Heete
Hello,
Post by Arnd Hannemann
Post by Jordan Crouse
Okay - I've been exploring a little bit more. I talked to the TinyBIOS
developer, and he verified that TinyBIOS shouldn't use any MFGPT timers.
He also told me that the mysterious "MFGPT workaround" was in fact the
magic MFGPT erasing MSR that was in the old kernel driver.
So with the "MFGPT workaround" turned off, TinyBIOS should be acting like
the OLPC firmware with regards to timers, yet it is not. So that is
curious. I think I might have identified a race condition in the code,
but I'm not 100% sure thats the same problem that the ALIX platform is
seeing.
Anrd and others - will you please try the attached patch on your platform
with the "MFGPT workaround" turned off and mfgpts enabled, and send out
the dmesg?
Of course, tinyBios version v0.99, "MFGPT workaround" turned off,
[ 67.369697] NET: Registered protocol family 16
[ 67.383059] geode-mfgpt: IRQ MSR=0:0
[ 67.394058] geode-mfgpt: NMI MSR=0:0
[ 67.405049] geode-mfgpt: Unrestricted sources=0
[ 67.418909] geode: 8 MFGPT timers available.
[ 67.433211] PCI: PCI BIOS revision 2.10 entry at 0xfcd03, last bus=0
[ 22.289349] NET: Registered protocol family 16
[ 22.302716] geode-mfgpt: IRQ MSR=0:0
[ 22.313716] geode-mfgpt: NMI MSR=0:0
[ 22.324704] geode-mfgpt: Unrestricted sources=0
[ 22.338566] geode-mfgpt: Registered timer 0
[ 22.351393] mfgpt-timer: registering the MFGT timer as a clock event.
^^^^ Hangs here
I had the same problem with MFGPT Timers and alix (BIOS v0.99). I found that
if you use a different interrupt than the default 7 for MFGPT (just append
mfgpt_irq=8 to the kernel commandline), the timer seems to work.
Indeed.
Strange, it works at least with mfgpt_irq=8 (rtc) and mfgpt_irq=5 (audio):

[ 21.805129] geode-mfgpt: IRQ MSR=0:0
[ 21.816129] geode-mfgpt: NMI MSR=0:0
[ 21.827116] geode-mfgpt: Unrestricted sources=0
[ 21.840979] geode-mfgpt: Registered timer 0
[ 21.853806] mfgpt-timer: registering the MFGT timer as a clock event.
[ 21.873576] geode: 8 MFGPT timers available.
[ 21.887962] PCI: PCI BIOS revision 2.10 entry at 0xfcd03, last bus=0


/proc/interrupts with mfgpt_irq=5:
CPU0
0: 48 XT-PIC-XT timer
2: 0 XT-PIC-XT cascade
4: 674 XT-PIC-XT serial
5: 13706 XT-PIC-XT mfgpt-timer
8: 3 XT-PIC-XT rtc
10: 57137 XT-PIC-XT eth0
15: 1 XT-PIC-XT ehci_hcd:usb1, ohci_hcd:usb2
NMI: 0 Non-maskable interrupts
TRM: 0 Thermal event interrupts
SPU: 0 Spurious interrupts
ERR: 0

/proc/interrupts with mfgpt_irq=8:
CPU0
0: 48 XT-PIC-XT timer
2: 0 XT-PIC-XT cascade
4: 2511 XT-PIC-XT serial
8: 29061 XT-PIC-XT mfgpt-timer
10: 60701 XT-PIC-XT eth0
15: 1 XT-PIC-XT ehci_hcd:usb1, ohci_hcd:usb2
NMI: 0 Non-maskable interrupts
TRM: 0 Thermal event interrupts
SPU: 0 Spurious interrupts
ERR: 0

I noted that "rtc" disappeared, do I have any drawback of this?
I assume that the mfgpt-timer cannot be shared?
Post by Lars Heete
CPU0
0: 33 XT-PIC-XT timer
2: 0 XT-PIC-XT cascade
4: 1662 XT-PIC-XT serial
8: 766 XT-PIC-XT mfgpt-timer
14: 473 XT-PIC-XT libata
NMI: 0
ERR: 0
Post by Arnd Hannemann
Post by Jordan Crouse
This will give us some debug information that I can use to ensure that
the interrupts are set up correctly. You can leave the timer tick
disabled if you want.
Thanks,
Jordan
Arnd
Lars
Thanks for the hint!
Arnd
Jordan Crouse
2008-01-22 18:15:42 UTC
Permalink
Post by Arnd Hannemann
Hi Lars,
Post by Lars Heete
Hello,
Post by Arnd Hannemann
Post by Jordan Crouse
Okay - I've been exploring a little bit more. I talked to the TinyBIOS
developer, and he verified that TinyBIOS shouldn't use any MFGPT timers.
He also told me that the mysterious "MFGPT workaround" was in fact the
magic MFGPT erasing MSR that was in the old kernel driver.
So with the "MFGPT workaround" turned off, TinyBIOS should be acting like
the OLPC firmware with regards to timers, yet it is not. So that is
curious. I think I might have identified a race condition in the code,
but I'm not 100% sure thats the same problem that the ALIX platform is
seeing.
Anrd and others - will you please try the attached patch on your platform
with the "MFGPT workaround" turned off and mfgpts enabled, and send out
the dmesg?
Of course, tinyBios version v0.99, "MFGPT workaround" turned off,
[ 67.369697] NET: Registered protocol family 16
[ 67.383059] geode-mfgpt: IRQ MSR=0:0
[ 67.394058] geode-mfgpt: NMI MSR=0:0
[ 67.405049] geode-mfgpt: Unrestricted sources=0
[ 67.418909] geode: 8 MFGPT timers available.
[ 67.433211] PCI: PCI BIOS revision 2.10 entry at 0xfcd03, last bus=0
[ 22.289349] NET: Registered protocol family 16
[ 22.302716] geode-mfgpt: IRQ MSR=0:0
[ 22.313716] geode-mfgpt: NMI MSR=0:0
[ 22.324704] geode-mfgpt: Unrestricted sources=0
[ 22.338566] geode-mfgpt: Registered timer 0
[ 22.351393] mfgpt-timer: registering the MFGT timer as a clock event.
^^^^ Hangs here
I had the same problem with MFGPT Timers and alix (BIOS v0.99). I found that
if you use a different interrupt than the default 7 for MFGPT (just append
mfgpt_irq=8 to the kernel commandline), the timer seems to work.
Indeed.
That is very unfortunate. We must continue to investigate.
Post by Arnd Hannemann
[ 21.805129] geode-mfgpt: IRQ MSR=0:0
[ 21.816129] geode-mfgpt: NMI MSR=0:0
[ 21.827116] geode-mfgpt: Unrestricted sources=0
[ 21.840979] geode-mfgpt: Registered timer 0
[ 21.853806] mfgpt-timer: registering the MFGT timer as a clock event.
[ 21.873576] geode: 8 MFGPT timers available.
[ 21.887962] PCI: PCI BIOS revision 2.10 entry at 0xfcd03, last bus=0
<snip>
Post by Arnd Hannemann
I noted that "rtc" disappeared, do I have any drawback of this?
I assume that the mfgpt-timer cannot be shared?
No. We figured that was the best way to go. I don't know if there
are any technical reasons why it can't be shared, other then latency
problems.

But I would still prefer to figure out why 7 doesn't work - because there
is nothing in the hardware or kernel code that should prevent that.

Arnd - if you don't mind doing a little bit more debug work. You've already
given us the MSRs for 0x51400020 - 0x51400023. Can you now give us
0x51400024-0x51400027, as well as 5140004E? Our current working theory
is that IRQ7 (traditionally used by the LPT port) is still conflicting
somewhere.

Jordan
--
Jordan Crouse
Systems Software Development Engineer
Advanced Micro Devices, Inc.
Jordan Crouse
2008-01-22 19:27:57 UTC
Permalink
Post by Arnd Hannemann
Indeed.
I have most excellent news. I was able to get tinyBIOS booting on my
development platform. I looked at the problem with the debugger and
I think I might have found something. It looks like the interrupt is
firing immediately before the clock is enabled. In the handler, we
were returning immediately if the clock wasn't enabled (and not clearing
the event), so we were caught in a classic interrupt storm.

The attached patch rearranges the code so that the handler is installed
before we setup the interrupt (so we have somebody to listen to the
immediate interrupt), and it makes sure that we clear the event in the IRQ
handler regardless of the state of the timer tick.

I'm not 100% sure why this happens on IRQ7 but not on 5 or 8, but it might
have something to do with the interrupts already being enabled on the other
vectors. Anyway, please try this test patch and let me know what happens.

Jordan
--
Jordan Crouse
Systems Software Development Engineer
Advanced Micro Devices, Inc.
Arnd Hannemann
2008-01-22 20:54:30 UTC
Permalink
Hi,
Post by Jordan Crouse
Post by Arnd Hannemann
Indeed.
I have most excellent news. I was able to get tinyBIOS booting on my
development platform. I looked at the problem with the debugger and
I think I might have found something. It looks like the interrupt is
firing immediately before the clock is enabled. In the handler, we
were returning immediately if the clock wasn't enabled (and not clearing
the event), so we were caught in a classic interrupt storm.
The attached patch rearranges the code so that the handler is installed
before we setup the interrupt (so we have somebody to listen to the
immediate interrupt), and it makes sure that we clear the event in the IRQ
handler regardless of the state of the timer tick.
This patch indeed solves the problem. The board boots fine. Great work!

0: 48 XT-PIC-XT timer
2: 0 XT-PIC-XT cascade
4: 493 XT-PIC-XT serial
7: 25875 XT-PIC-XT mfgpt-timer
8: 3 XT-PIC-XT rtc
10: 56963 XT-PIC-XT eth0
15: 1 XT-PIC-XT ehci_hcd:usb1, ohci_hcd:usb2
NMI: 0 Non-maskable interrupts
TRM: 0 Thermal event interrupts
SPU: 0 Spurious interrupts
ERR: 0
Post by Jordan Crouse
I'm not 100% sure why this happens on IRQ7 but not on 5 or 8, but it might
have something to do with the interrupts already being enabled on the other
vectors. Anyway, please try this test patch and let me know what happens.
Congratulations for this long but successful remote debugging ;-)

Greetings,
Arnd
Ingo Molnar
2008-01-22 21:10:14 UTC
Permalink
Post by Arnd Hannemann
Post by Jordan Crouse
The attached patch rearranges the code so that the handler is
installed before we setup the interrupt (so we have somebody to
listen to the immediate interrupt), and it makes sure that we clear
the event in the IRQ handler regardless of the state of the timer
tick.
This patch indeed solves the problem. The board boots fine. Great work!
since this driver is new in 2.6.24, perhaps we should apply this fix to
v2.6.24?

Ingo
Willy Tarreau
2008-01-22 21:20:59 UTC
Permalink
Post by Ingo Molnar
Post by Arnd Hannemann
Post by Jordan Crouse
The attached patch rearranges the code so that the handler is
installed before we setup the interrupt (so we have somebody to
listen to the immediate interrupt), and it makes sure that we clear
the event in the IRQ handler regardless of the state of the timer
tick.
This patch indeed solves the problem. The board boots fine. Great work!
since this driver is new in 2.6.24, perhaps we should apply this fix to
v2.6.24?
Yes, I do think so. You can add me too to the "Tested-by" lines if you want.

Willy
Thomas Gleixner
2008-01-22 21:53:18 UTC
Permalink
Post by Ingo Molnar
Post by Arnd Hannemann
Post by Jordan Crouse
The attached patch rearranges the code so that the handler is
installed before we setup the interrupt (so we have somebody to
listen to the immediate interrupt), and it makes sure that we clear
the event in the IRQ handler regardless of the state of the timer
tick.
This patch indeed solves the problem. The board boots fine. Great work!
since this driver is new in 2.6.24, perhaps we should apply this fix to
v2.6.24?
Yes. It is definitely a safe change and makes otherwise broken systems
work. There is no impact to anything else than the GEODEs.

Linus,

if you agree, please pull the fix from:

ssh://master.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git

Thanks,

tglx
Willy Tarreau
2008-01-23 21:17:46 UTC
Permalink
Post by Ingo Molnar
Post by Arnd Hannemann
Post by Jordan Crouse
The attached patch rearranges the code so that the handler is
installed before we setup the interrupt (so we have somebody to
listen to the immediate interrupt), and it makes sure that we clear
the event in the IRQ handler regardless of the state of the timer
tick.
This patch indeed solves the problem. The board boots fine. Great work!
since this driver is new in 2.6.24, perhaps we should apply this fix to
v2.6.24?
Ingo, Jordan,

I have written the two minor patches we were talking about in previous
mail. The first one fixes the input clock from 32000 to 32768 Hz, and
the second one adds an "mfgptfix" boot parameter to enable the workaround
in order to fix a regression on motherboards with a broken BIOS on which
2.6.24-rc8 does not boot anymore whether mfgpt timers are enabled or not.

Jordan, if you think you'll push your changes for 2.6.24, feel free to
merge this patch with your work.

Both patches will be sent as a reply to this mail.

Regards,
Willy
Willy Tarreau
2008-01-23 21:18:59 UTC
Permalink
The GEODE MFGPT code assumed that 32kHz was 32000 Hz while the boards
run on a 32.768 kHz digital watch crystal. In practise, it will not
change the timer's frequency as the skew was only 2.4%, but it
should provide more accurate intervals.

Signed-off-by: Willy Tarreau <***@1wt.eu>
---
arch/x86/kernel/mfgpt_32.c | 24 ++++++++++++------------
1 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/mfgpt_32.c b/arch/x86/kernel/mfgpt_32.c
index 3960ab7..5519091 100644
--- a/arch/x86/kernel/mfgpt_32.c
+++ b/arch/x86/kernel/mfgpt_32.c
@@ -12,21 +12,21 @@
*/

/*
- * We are using the 32Khz input clock - its the only one that has the
+ * We are using the 32.768Khz input clock - its the only one that has the
* ranges we find desirable. The following table lists the suitable
* divisors and the associated hz, minimum interval
* and the maximum interval:
*
- * Divisor Hz Min Delta (S) Max Delta (S)
- * 1 32000 .0005 2.048
- * 2 16000 .001 4.096
- * 4 8000 .002 8.192
- * 8 4000 .004 16.384
- * 16 2000 .008 32.768
- * 32 1000 .016 65.536
- * 64 500 .032 131.072
- * 128 250 .064 262.144
- * 256 125 .128 524.288
+ * Divisor Hz Min Delta (S) Max Delta (S)
+ * 1 32768 .00048828125 2.000
+ * 2 16384 .0009765625 4.000
+ * 4 8192 .001953125 8.000
+ * 8 4096 .00390625 16.000
+ * 16 2048 .0078125 32.000
+ * 32 1024 .015625 64.000
+ * 64 512 .03125 128.000
+ * 128 256 .0625 256.000
+ * 256 128 .125 512.000
*/

#include <linux/kernel.h>
@@ -45,7 +45,7 @@ static struct mfgpt_timer_t {

#define MFGPT_DIVISOR 16
#define MFGPT_SCALE 4 /* divisor = 2^(scale) */
-#define MFGPT_HZ (32000 / MFGPT_DIVISOR)
+#define MFGPT_HZ (32768 / MFGPT_DIVISOR)
#define MFGPT_PERIODIC (MFGPT_HZ / HZ)

#ifdef CONFIG_GEODE_MFGPT_TIMER
--
1.5.3.4
H. Peter Anvin
2008-01-23 21:59:32 UTC
Permalink
Post by Willy Tarreau
The GEODE MFGPT code assumed that 32kHz was 32000 Hz while the boards
run on a 32.768 kHz digital watch crystal. In practise, it will not
change the timer's frequency as the skew was only 2.4%, but it
should provide more accurate intervals.
- * Divisor Hz Min Delta (S) Max Delta (S)
Seconds are "s", not "S" (S = siemens.)

-hpa
Willy Tarreau
2008-01-23 22:11:29 UTC
Permalink
Hi Peter,
Post by H. Peter Anvin
Post by Willy Tarreau
The GEODE MFGPT code assumed that 32kHz was 32000 Hz while the boards
run on a 32.768 kHz digital watch crystal. In practise, it will not
change the timer's frequency as the skew was only 2.4%, but it
should provide more accurate intervals.
- * Divisor Hz Min Delta (S) Max Delta (S)
Seconds are "s", not "S" (S = siemens.)
You're quite right. Same as we should write kHz and not Khz. But I'm
used to change other people's work and particularly comments the least
possible. Do you want an update ?

Willy
H. Peter Anvin
2008-01-23 22:22:59 UTC
Permalink
Post by Willy Tarreau
Hi Peter,
Post by H. Peter Anvin
Post by Willy Tarreau
The GEODE MFGPT code assumed that 32kHz was 32000 Hz while the boards
run on a 32.768 kHz digital watch crystal. In practise, it will not
change the timer's frequency as the skew was only 2.4%, but it
should provide more accurate intervals.
- * Divisor Hz Min Delta (S) Max Delta (S)
Seconds are "s", not "S" (S = siemens.)
You're quite right. Same as we should write kHz and not Khz. But I'm
used to change other people's work and particularly comments the least
possible. Do you want an update ?
I much prefer to see this done right. I have a pretty case-sensitive
brain, it seems.

-hpa
Willy Tarreau
2008-01-23 22:10:53 UTC
Permalink
Post by H. Peter Anvin
Post by Willy Tarreau
Hi Peter,
Post by H. Peter Anvin
Post by Willy Tarreau
The GEODE MFGPT code assumed that 32kHz was 32000 Hz while the boards
run on a 32.768 kHz digital watch crystal. In practise, it will not
change the timer's frequency as the skew was only 2.4%, but it
should provide more accurate intervals.
- * Divisor Hz Min Delta (S) Max Delta (S)
Seconds are "s", not "S" (S = siemens.)
You're quite right. Same as we should write kHz and not Khz. But I'm
used to change other people's work and particularly comments the least
possible. Do you want an update ?
I much prefer to see this done right. I have a pretty case-sensitive
brain, it seems.
Oh rest assured that you're not alone. I was smiling reading Andrew's
comments about people who "cnat tpye", and I must admit that I too am
often quite irritated by incorrect case and inverted letters, but I try
to refrain from whining, people say that I do it too much and for nothing :-)

OK, here it comes updated.

Willy
Post by H. Peter Anvin
From 3a314dd5c2a694f5b0a1c1b8b5690ee28f711b5e Mon Sep 17 00:00:00 2001
From: Willy Tarreau <***@1wt.eu>
Date: Wed, 23 Jan 2008 23:05:50 +0100
Subject: [PATCH 1/2] x86: GEODE fix MFGPT input clock value

The GEODE MFGPT code assumed that 32kHz was 32000 Hz while the boards
run on a 32.768 kHz digital watch crystal. In practise, it will not
change the timer's frequency as the skew was only 2.4%, but it
should provide more accurate intervals.

Signed-off-by: Willy Tarreau <***@1wt.eu>
---
arch/x86/kernel/mfgpt_32.c | 27 +++++++++++++--------------
1 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/mfgpt_32.c b/arch/x86/kernel/mfgpt_32.c
index 3960ab7..f97e6e3 100644
--- a/arch/x86/kernel/mfgpt_32.c
+++ b/arch/x86/kernel/mfgpt_32.c
@@ -12,21 +12,20 @@
*/

/*
- * We are using the 32Khz input clock - its the only one that has the
+ * We are using the 32.768kHz input clock - its the only one that has the
* ranges we find desirable. The following table lists the suitable
- * divisors and the associated hz, minimum interval
- * and the maximum interval:
+ * divisors and the associated Hz, minimum interval and the maximum interval:
*
- * Divisor Hz Min Delta (S) Max Delta (S)
- * 1 32000 .0005 2.048
- * 2 16000 .001 4.096
- * 4 8000 .002 8.192
- * 8 4000 .004 16.384
- * 16 2000 .008 32.768
- * 32 1000 .016 65.536
- * 64 500 .032 131.072
- * 128 250 .064 262.144
- * 256 125 .128 524.288
+ * Divisor Hz Min Delta (s) Max Delta (s)
+ * 1 32768 .00048828125 2.000
+ * 2 16384 .0009765625 4.000
+ * 4 8192 .001953125 8.000
+ * 8 4096 .00390625 16.000
+ * 16 2048 .0078125 32.000
+ * 32 1024 .015625 64.000
+ * 64 512 .03125 128.000
+ * 128 256 .0625 256.000
+ * 256 128 .125 512.000
*/

#include <linux/kernel.h>
@@ -45,7 +44,7 @@ static struct mfgpt_timer_t {

#define MFGPT_DIVISOR 16
#define MFGPT_SCALE 4 /* divisor = 2^(scale) */
-#define MFGPT_HZ (32000 / MFGPT_DIVISOR)
+#define MFGPT_HZ (32768 / MFGPT_DIVISOR)
#define MFGPT_PERIODIC (MFGPT_HZ / HZ)

#ifdef CONFIG_GEODE_MFGPT_TIMER
--
1.5.3.3
Jordan Crouse
2008-01-23 22:38:30 UTC
Permalink
Post by Willy Tarreau
Hi Peter,
Post by H. Peter Anvin
Post by Willy Tarreau
The GEODE MFGPT code assumed that 32kHz was 32000 Hz while the boards
run on a 32.768 kHz digital watch crystal. In practise, it will not
change the timer's frequency as the skew was only 2.4%, but it
should provide more accurate intervals.
- * Divisor Hz Min Delta (S) Max Delta (S)
Seconds are "s", not "S" (S = siemens.)
You're quite right. Same as we should write kHz and not Khz. But I'm
used to change other people's work and particularly comments the least
possible. Do you want an update ?
Please do - I don't mind if you change the comments - better that they
are right.

Jordan
--
Jordan Crouse
Systems Software Development Engineer
Advanced Micro Devices, Inc.
Arnd Hannemann
2008-01-23 23:17:53 UTC
Permalink
Hi,
Post by Jordan Crouse
Post by Willy Tarreau
Hi Peter,
Post by H. Peter Anvin
Post by Willy Tarreau
The GEODE MFGPT code assumed that 32kHz was 32000 Hz while the boards
run on a 32.768 kHz digital watch crystal. In practise, it will not
change the timer's frequency as the skew was only 2.4%, but it
should provide more accurate intervals.
- * Divisor Hz Min Delta (S) Max Delta (S)
Seconds are "s", not "S" (S = siemens.)
You're quite right. Same as we should write kHz and not Khz. But I'm
used to change other people's work and particularly comments the least
possible. Do you want an update ?
Please do - I don't mind if you change the comments - better that they
are right.
While you are on it, please include this one:

--- a/arch/x86/kernel/mfgpt_32.c 2008-01-24 00:08:20.000000000 +0100
+++ b/arch/x86/kernel/mfgpt_32.c 2008-01-24 00:05:17.000000000 +0100
@@ -361,7 +361,7 @@
&mfgpt_clockevent);

printk(KERN_INFO
- "mfgpt-timer: registering the MFGT timer as a clock event.\n");
+ "mfgpt-timer: registering the MFGPT timer as a clock event.\n");
clockevents_register_device(&mfgpt_clockevent);

return 0;
Post by Jordan Crouse
Jordan
Arnd
Willy Tarreau
2008-01-23 21:19:52 UTC
Permalink
The new "mfgptfix" boot command line option may be usd to fix MFGPT
timers on AMD Geode platforms when the BIOS has incorrectly applied
a workaround. TinyBIOS version 0.98 is known to be affected, 0.99
fixes the problem by letting the user disable the workaround.

Signed-off-by: Willy Tarreau <***@1wt.eu>
---
Documentation/kernel-parameters.txt | 5 +++++
arch/x86/kernel/mfgpt_32.c | 15 +++++++++++++++
2 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index c417877..83c6704 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1051,6 +1051,11 @@ and is between 256 and 4096 characters. It is defined in the file
Multi-Function General Purpose Timers on AMD Geode
platforms.

+ mfgptfix [X86-32] Fix MFGPT timers on AMD Geode platforms when
+ the BIOS has incorrectly applied a workaround. TinyBIOS
+ version 0.98 is known to be affected, 0.99 fixes the
+ problem by letting the user disable the workaround.
+
mga= [HW,DRM]

mousedev.tap_time=
diff --git a/arch/x86/kernel/mfgpt_32.c b/arch/x86/kernel/mfgpt_32.c
index 5519091..f38d4a9 100644
--- a/arch/x86/kernel/mfgpt_32.c
+++ b/arch/x86/kernel/mfgpt_32.c
@@ -63,6 +63,21 @@ static int __init mfgpt_disable(char *s)
}
__setup("nomfgpt", mfgpt_disable);

+/* Reset the MFGPT timers. This is required by some broken BIOSes which already
+ * do the same and leave the system in an unstable state. TinyBIOS 0.98 is
+ * affected at least (0.99 is OK with MFGPT workaround left to off).
+ */
+static int __init mfgpt_fix(char *s)
+{
+ u32 val, dummy;
+
+ /* The following udocumented bit resets the MFGPT timers */
+ val = 0xFF; dummy = 0;
+ wrmsr(0x5140002B, val, dummy);
+ return 1;
+}
+__setup("mfgptfix", mfgpt_fix);
+
/*
* Check whether any MFGPTs are available for the kernel to use. In most
* cases, firmware that uses AMD's VSA code will claim all timers during
--
1.5.3.4
Jordan Crouse
2008-01-19 01:06:24 UTC
Permalink
Post by Jordan Crouse
Post by Arnd Hannemann
Watchdog for the new API would be great :-)
Coming soon.
As promised, a watchdog driver for the Geode GX/LX processors is attached.
I basically just ported the previous patch forward to 2.6.24.

I also have good news or bad news depending on your perspective. I wanted
to test this against 2.6.24, and OLPC is stuck at an older kernel version,
so I had to test this with coreboot (LinuxBIOS) on another Geode
platform. Like all BIOSen execpt for the OLPC firmware, coreboot uses
VSA (SMM handler) which consumes all the timers.

So I used the magical MSR and surprise! - the timer tick hung.
I compiled out the timer tick, and tested the watchdog timer instead,
and it worked fine on timer 0. So I don't think the MFGPTs themselves
have anything to do with this problem, but I do think it might be
related to VSA and possibly interrupts too. I'm going to invoke the
strong BIOS fu of our LinuxBIOS / BIOS expert Marc Jones, and see what
he comes up with.

I don't know how much of a hassle it would be for Andres to get a 2.6.24
kernel running on the OLPC to make sure that this isn't a regression
in the timer tick code (I suspect it isn't a regression, but you never
know). I also think that it would probably be in our best interest to
default CONFIG_GEODE_MFGPT_TIMER to 'n' until we get this figured
out. Since most BIOSen don't have timers available, that shouldn't affect
too many people.

So, anyway, enjoy the watchdog timer - I hope it meets everybody's
expectations for the 2.6.25 kernel.

Jordan
--
Jordan Crouse
Systems Software Development Engineer
Advanced Micro Devices, Inc.
Willy Tarreau
2008-01-19 06:36:36 UTC
Permalink
Post by Jordan Crouse
I don't know how much of a hassle it would be for Andres to get a 2.6.24
kernel running on the OLPC to make sure that this isn't a regression
in the timer tick code (I suspect it isn't a regression, but you never
know). I also think that it would probably be in our best interest to
default CONFIG_GEODE_MFGPT_TIMER to 'n' until we get this figured
out. Since most BIOSen don't have timers available, that shouldn't affect
too many people.
Well, I've successfully used earlier version of this code with 2.6.22
on a PCEngines ALIX motherboard equipped with LX800/CS5536. It boots
on a TinyBIOS.

I will try 2.6.24 + this patch on these boards when I have some time.

Willy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Arnd Hannemann
2008-01-20 13:22:54 UTC
Permalink
Hi,
Post by Jordan Crouse
Post by Arnd Hannemann
Watchdog for the new API would be great :-)
Coming soon.
=20
As promised, a watchdog driver for the Geode GX/LX processors is atta=
ched.
I basically just ported the previous patch forward to 2.6.24.
Great work!
=20
I also have good news or bad news depending on your perspective. I w=
anted
to test this against 2.6.24, and OLPC is stuck at an older kernel ver=
sion,
so I had to test this with coreboot (LinuxBIOS) on another Geode=20
platform. Like all BIOSen execpt for the OLPC firmware, coreboot use=
s
VSA (SMM handler) which consumes all the timers.
=20
So I used the magical MSR and surprise! - the timer tick hung. =20
I compiled out the timer tick, and tested the watchdog timer instead,
and it worked fine on timer 0. So I don't think the MFGPTs themselve=
s
have anything to do with this problem, but I do think it might be=20
related to VSA and possibly interrupts too. I'm going to invoke the
strong BIOS fu of our LinuxBIOS / BIOS expert Marc Jones, and see wha=
t
he comes up with.
=20
I don't know how much of a hassle it would be for Andres to get a 2.6=
=2E24
kernel running on the OLPC to make sure that this isn't a regression
in the timer tick code (I suspect it isn't a regression, but you neve=
r
know). I also think that it would probably be in our best interest t=
o
default CONFIG_GEODE_MFGPT_TIMER to 'n' until we get this figured
out. Since most BIOSen don't have timers available, that shouldn't a=
ffect
too many people.
=20
So, anyway, enjoy the watchdog timer - I hope it meets everybody's
expectations for the 2.6.25 kernel.
Thanks a lot for this, it works great! (with CONFIG_GEODE_MFGPT_TIMER
not set).
However some minor issues:
Could the name of the /dev entry perhaps be changed from
"geode-watchdog" to "watchdog" instead?
I think all other watchdogs use "watchdog", and using two different
watchdogs in the same machine won't work anyway, because of the same
minor number, right?

As a second point my gcc (4.1.2) issues a warning:

drivers/watchdog/geodewdt.c: In function =E2=80=98geodewdt_remove=E2=80=
=99:
drivers/watchdog/geodewdt.c:256: warning: control reaches end of
non-void function

which I think is a valid one.
=20
Jordan
=20
Best regards,
Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel"=
in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Jordan Crouse
2008-01-20 16:34:47 UTC
Permalink
Post by Arnd Hannemann
Hi,
=20
Post by Jordan Crouse
Post by Arnd Hannemann
Watchdog for the new API would be great :-)
Coming soon.
=20
As promised, a watchdog driver for the Geode GX/LX processors is at=
tached.
Post by Arnd Hannemann
I basically just ported the previous patch forward to 2.6.24.
=20
Great work!
=20
=20
I also have good news or bad news depending on your perspective. I=
wanted
Post by Arnd Hannemann
to test this against 2.6.24, and OLPC is stuck at an older kernel v=
ersion,
Post by Arnd Hannemann
so I had to test this with coreboot (LinuxBIOS) on another Geode=20
platform. Like all BIOSen execpt for the OLPC firmware, coreboot u=
ses
Post by Arnd Hannemann
VSA (SMM handler) which consumes all the timers.
=20
So I used the magical MSR and surprise! - the timer tick hung. =20
I compiled out the timer tick, and tested the watchdog timer instea=
d,
Post by Arnd Hannemann
and it worked fine on timer 0. So I don't think the MFGPTs themsel=
ves
Post by Arnd Hannemann
have anything to do with this problem, but I do think it might be=20
related to VSA and possibly interrupts too. I'm going to invoke th=
e
Post by Arnd Hannemann
strong BIOS fu of our LinuxBIOS / BIOS expert Marc Jones, and see w=
hat
Post by Arnd Hannemann
he comes up with.
=20
I don't know how much of a hassle it would be for Andres to get a 2=
=2E6.24
Post by Arnd Hannemann
kernel running on the OLPC to make sure that this isn't a regressio=
n
Post by Arnd Hannemann
in the timer tick code (I suspect it isn't a regression, but you ne=
ver
Post by Arnd Hannemann
know). I also think that it would probably be in our best interest=
to
Post by Arnd Hannemann
default CONFIG_GEODE_MFGPT_TIMER to 'n' until we get this figured
out. Since most BIOSen don't have timers available, that shouldn't=
affect
Post by Arnd Hannemann
too many people.
=20
So, anyway, enjoy the watchdog timer - I hope it meets everybody's
expectations for the 2.6.25 kernel.
=20
Thanks a lot for this, it works great! (with CONFIG_GEODE_MFGPT_TIMER
not set).
Could the name of the /dev entry perhaps be changed from
"geode-watchdog" to "watchdog" instead?
I think all other watchdogs use "watchdog", and using two different
watchdogs in the same machine won't work anyway, because of the same
minor number, right?
Very much yes - complete oversight on my part.
Post by Arnd Hannemann
=20
drivers/watchdog/geodewdt.c: In function =E2=80=98geodewdt_remove=E2=80=
drivers/watchdog/geodewdt.c:256: warning: control reaches end of
non-void function
=20
which I think is a valid one.
Yes again. I'll refactor. Thanks for your comments.

Jordan
--=20
Jordan Crouse
Systems Software Development Engineer=20
Advanced Micro Devices, Inc.
Jordan Crouse
2008-01-21 17:07:00 UTC
Permalink
Post by Arnd Hannemann
Hi,
Post by Jordan Crouse
Post by Jordan Crouse
Post by Arnd Hannemann
Watchdog for the new API would be great :-)
Coming soon.
As promised, a watchdog driver for the Geode GX/LX processors is attached.
I basically just ported the previous patch forward to 2.6.24.
Great work!
Post by Jordan Crouse
I also have good news or bad news depending on your perspective. I wanted
to test this against 2.6.24, and OLPC is stuck at an older kernel version,
so I had to test this with coreboot (LinuxBIOS) on another Geode
platform. Like all BIOSen execpt for the OLPC firmware, coreboot uses
VSA (SMM handler) which consumes all the timers.
So I used the magical MSR and surprise! - the timer tick hung.
I compiled out the timer tick, and tested the watchdog timer instead,
and it worked fine on timer 0. So I don't think the MFGPTs themselves
have anything to do with this problem, but I do think it might be
related to VSA and possibly interrupts too. I'm going to invoke the
strong BIOS fu of our LinuxBIOS / BIOS expert Marc Jones, and see what
he comes up with.
I don't know how much of a hassle it would be for Andres to get a 2.6.24
kernel running on the OLPC to make sure that this isn't a regression
in the timer tick code (I suspect it isn't a regression, but you never
know). I also think that it would probably be in our best interest to
default CONFIG_GEODE_MFGPT_TIMER to 'n' until we get this figured
out. Since most BIOSen don't have timers available, that shouldn't affect
too many people.
So, anyway, enjoy the watchdog timer - I hope it meets everybody's
expectations for the 2.6.25 kernel.
Thanks a lot for this, it works great! (with CONFIG_GEODE_MFGPT_TIMER
not set).
Could the name of the /dev entry perhaps be changed from
"geode-watchdog" to "watchdog" instead?
I think all other watchdogs use "watchdog", and using two different
watchdogs in the same machine won't work anyway, because of the same
minor number, right?
drivers/watchdog/geodewdt.c:256: warning: control reaches end of
non-void function
which I think is a valid one.
Resending with Arnd's concern's addressed. Thanks.

Jordan
Arnd Hannemann
2008-01-21 18:37:24 UTC
Permalink
Post by Arnd Hannemann
Hi,
Post by Jordan Crouse
Post by Arnd Hannemann
Watchdog for the new API would be great :-)
Coming soon.
As promised, a watchdog driver for the Geode GX/LX processors is at=
tached.
Post by Arnd Hannemann
I basically just ported the previous patch forward to 2.6.24.
Great work!
I also have good news or bad news depending on your perspective. I=
wanted
Post by Arnd Hannemann
to test this against 2.6.24, and OLPC is stuck at an older kernel v=
ersion,
Post by Arnd Hannemann
so I had to test this with coreboot (LinuxBIOS) on another Geode=20
platform. Like all BIOSen execpt for the OLPC firmware, coreboot u=
ses
Post by Arnd Hannemann
VSA (SMM handler) which consumes all the timers.
So I used the magical MSR and surprise! - the timer tick hung. =20
I compiled out the timer tick, and tested the watchdog timer instea=
d,
Post by Arnd Hannemann
and it worked fine on timer 0. So I don't think the MFGPTs themsel=
ves
Post by Arnd Hannemann
have anything to do with this problem, but I do think it might be=20
related to VSA and possibly interrupts too. I'm going to invoke th=
e
Post by Arnd Hannemann
strong BIOS fu of our LinuxBIOS / BIOS expert Marc Jones, and see w=
hat
Post by Arnd Hannemann
he comes up with.
I don't know how much of a hassle it would be for Andres to get a 2=
=2E6.24
Post by Arnd Hannemann
kernel running on the OLPC to make sure that this isn't a regressio=
n
Post by Arnd Hannemann
in the timer tick code (I suspect it isn't a regression, but you ne=
ver
Post by Arnd Hannemann
know). I also think that it would probably be in our best interest=
to
Post by Arnd Hannemann
default CONFIG_GEODE_MFGPT_TIMER to 'n' until we get this figured
out. Since most BIOSen don't have timers available, that shouldn't=
affect
Post by Arnd Hannemann
too many people.
So, anyway, enjoy the watchdog timer - I hope it meets everybody's
expectations for the 2.6.25 kernel.
Thanks a lot for this, it works great! (with CONFIG_GEODE_MFGPT_TIME=
R
Post by Arnd Hannemann
not set).
Could the name of the /dev entry perhaps be changed from
"geode-watchdog" to "watchdog" instead?
I think all other watchdogs use "watchdog", and using two different
watchdogs in the same machine won't work anyway, because of the same
minor number, right?
drivers/watchdog/geodewdt.c: In function =E2=80=98geodewdt_remove=E2=
drivers/watchdog/geodewdt.c:256: warning: control reaches end of
non-void function
which I think is a valid one.
=20
Resending with Arnd's concern's addressed. Thanks. =20
I can confirm that it is still working fine :-)
=20
Jordan
=20
Greetings,
Arnd
Iain Paton
2008-02-17 14:14:18 UTC
Permalink
Post by Arnd Hannemann
I can confirm that it is still working fine :-)
Hi,

Has anyone managed to build this as a module against the full 2.6.24
release ?

I am seeing the following error:

CC [M] lib/zlib_inflate/infutil.o
CC [M] lib/zlib_inflate/inftrees.o
CC [M] lib/zlib_inflate/inflate_syms.o
LD [M] lib/zlib_inflate/zlib_inflate.o
Building modules, stage 2.
MODPOST 251 modules
ERROR: "geode_mfgpt_toggle_event" [drivers/watchdog/geodewdt.ko] undefined!
ERROR: "geode_mfgpt_alloc_timer" [drivers/watchdog/geodewdt.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2


arch/x86/kernel/mfgpt_32.c where these are defined seems to have been
built ok and they appear in System.map.

The only other thing I've done is to add "[PATCH] pata_cs5536 Fix
secondary port configuration" recently posted by Martin K. Petersen but
I don't believe that would cause this.

.config is available at http://pastebin.ca/907299

Iain
Arnd Hannemann
2008-02-17 14:46:01 UTC
Permalink
Hi,
Post by Iain Paton
Post by Arnd Hannemann
I can confirm that it is still working fine :-)
Hi,
Has anyone managed to build this as a module against the full 2.6.24
release ?
CC [M] lib/zlib_inflate/infutil.o
CC [M] lib/zlib_inflate/inftrees.o
CC [M] lib/zlib_inflate/inflate_syms.o
LD [M] lib/zlib_inflate/zlib_inflate.o
Building modules, stage 2.
MODPOST 251 modules
ERROR: "geode_mfgpt_toggle_event" [drivers/watchdog/geodewdt.ko] undefined!
ERROR: "geode_mfgpt_alloc_timer" [drivers/watchdog/geodewdt.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2
arch/x86/kernel/mfgpt_32.c where these are defined seems to have been
built ok and they appear in System.map.
The only other thing I've done is to add "[PATCH] pata_cs5536 Fix
secondary port configuration" recently posted by Martin K. Petersen but
I don't believe that would cause this.
.config is available at http://pastebin.ca/907299
Never tried to built it as a module.
Probably there are issues with that. If I remember correctly I saw a patch in 2.6.25-rc which
mentioned that using mfgpt in modules won't work. Does this apply to 2.6.24 as well?

Best regards,
Arnd Hannemann
Adrian Bunk
2008-02-17 14:54:50 UTC
Permalink
Post by Arnd Hannemann
Hi,
Post by Iain Paton
Post by Arnd Hannemann
I can confirm that it is still working fine :-)
Hi,
Has anyone managed to build this as a module against the full 2.6.24
release ?
CC [M] lib/zlib_inflate/infutil.o
CC [M] lib/zlib_inflate/inftrees.o
CC [M] lib/zlib_inflate/inflate_syms.o
LD [M] lib/zlib_inflate/zlib_inflate.o
Building modules, stage 2.
MODPOST 251 modules
ERROR: "geode_mfgpt_toggle_event" [drivers/watchdog/geodewdt.ko] undefined!
ERROR: "geode_mfgpt_alloc_timer" [drivers/watchdog/geodewdt.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2
arch/x86/kernel/mfgpt_32.c where these are defined seems to have been
built ok and they appear in System.map.
The only other thing I've done is to add "[PATCH] pata_cs5536 Fix
secondary port configuration" recently posted by Martin K. Petersen but
I don't believe that would cause this.
.config is available at http://pastebin.ca/907299
Never tried to built it as a module.
Probably there are issues with that. If I remember correctly I saw a patch in 2.6.25-rc which
mentioned that using mfgpt in modules won't work. Does this apply to 2.6.24 as well?
For using code from modules it must be explicitely EXPORT_SYMBOL{,GPL}'ed.

Adding
EXPORT_SYMBOL_GPL(geode_mfgpt_toggle_event);
and
EXPORT_SYMBOL_GPL(geode_mfgpt_alloc_timer);
below the respective functions in mfgpt_32.c should fix this issue.
Post by Arnd Hannemann
Best regards,
Arnd Hannemann
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
Iain Paton
2008-02-17 16:10:09 UTC
Permalink
Post by Adrian Bunk
Post by Arnd Hannemann
Never tried to built it as a module.
Probably there are issues with that. If I remember correctly I saw a patch in 2.6.25-rc which
mentioned that using mfgpt in modules won't work. Does this apply to 2.6.24 as well?
For using code from modules it must be explicitely EXPORT_SYMBOL{,GPL}'ed.
Adding
EXPORT_SYMBOL_GPL(geode_mfgpt_toggle_event);
and
EXPORT_SYMBOL_GPL(geode_mfgpt_alloc_timer);
below the respective functions in mfgpt_32.c should fix this issue.
I couldn't find the patch Arnd mentioned on lkml or in Linus git tree,
but adding the lines suggested by Adrian gets me a working module.

Hopefully the patch will arrive in the mainline tree at some point.

Thanks to both of you for the help!

Iain
Andres Salomon
2008-02-17 17:32:18 UTC
Permalink
On Sun, 17 Feb 2008 16:10:09 +0000
Post by Iain Paton
Post by Adrian Bunk
Post by Arnd Hannemann
Never tried to built it as a module.
Probably there are issues with that. If I remember correctly I saw a patch in 2.6.25-rc which
mentioned that using mfgpt in modules won't work. Does this apply to 2.6.24 as well?
For using code from modules it must be explicitely EXPORT_SYMBOL{,GPL}'ed.
Adding
EXPORT_SYMBOL_GPL(geode_mfgpt_toggle_event);
and
EXPORT_SYMBOL_GPL(geode_mfgpt_alloc_timer);
below the respective functions in mfgpt_32.c should fix this issue.
I couldn't find the patch Arnd mentioned on lkml or in Linus git tree,
but adding the lines suggested by Adrian gets me a working module.
Hopefully the patch will arrive in the mainline tree at some point.
This was originally split out into two separate patches; one that exported
the proper symbols, and the other containing the watchdog timer. I merged
them in the geode tree. The patch is here:

http://git.infradead.org/?p=geode.git;a=commitdiff;h=5a840828ddb5bb7381435509a9460e0ba4aab550

That's also checkpatch.pl happy (or at least, it was when I committed it).
Arnd Hannemann
2008-02-17 19:46:55 UTC
Permalink
Post by Iain Paton
Post by Adrian Bunk
Post by Arnd Hannemann
Never tried to built it as a module.
Probably there are issues with that. If I remember correctly I saw a
patch in 2.6.25-rc which
mentioned that using mfgpt in modules won't work. Does this apply to 2.6.24 as well?
For using code from modules it must be explicitely
EXPORT_SYMBOL{,GPL}'ed.
Adding
EXPORT_SYMBOL_GPL(geode_mfgpt_toggle_event);
and
EXPORT_SYMBOL_GPL(geode_mfgpt_alloc_timer);
below the respective functions in mfgpt_32.c should fix this issue.
I couldn't find the patch Arnd mentioned on lkml or in Linus git tree,
but adding the lines suggested by Adrian gets me a working module.
I meant commit fa28e067c3b8af96c79c060e163b1387c172ae75 in linus git, but seems
I misinterpreted it.
Post by Iain Paton
x86: GEODE: MFGPT: drop module owner usage from MFGPT API
We had planned to use the 'owner' field for allowing re-allocation of
MFGPTs; however, doing it by module owner name isn't flexible enough. So,
drop this for now. If it turns out that we need timers in modules, we'll
need to come up with a scheme that matches the write-once fields of the
MFGPTx_SETUP register, and drops ponies from the sky.
Best regards,
Arnd

Lennart Sorensen
2008-01-20 20:16:03 UTC
Permalink
Post by Jordan Crouse
As promised, a watchdog driver for the Geode GX/LX processors is attached.
I basically just ported the previous patch forward to 2.6.24.
I also have good news or bad news depending on your perspective. I wanted
to test this against 2.6.24, and OLPC is stuck at an older kernel version,
so I had to test this with coreboot (LinuxBIOS) on another Geode
platform. Like all BIOSen execpt for the OLPC firmware, coreboot uses
VSA (SMM handler) which consumes all the timers.
Hmm, the Geode LX platform I work with uses VSA/SMM and I have had no
problems using timer 2 for a watchdog. Some of the timers are certainly
already setup by something in the BIOS, but quite a few are not in use.
The board I am using is the compulab iGLX module. I did have to mangle
together a watchdog driver myself some time ago since there wasn't one
for the Geode LX at the time.

Of course I haven't gone past 2.6.18 kernel for now (I am tracking
Debian stable for the most part), so I have no idea if something in a
newer kernel breaks things.
Post by Jordan Crouse
So I used the magical MSR and surprise! - the timer tick hung.
I compiled out the timer tick, and tested the watchdog timer instead,
and it worked fine on timer 0. So I don't think the MFGPTs themselves
have anything to do with this problem, but I do think it might be
related to VSA and possibly interrupts too. I'm going to invoke the
strong BIOS fu of our LinuxBIOS / BIOS expert Marc Jones, and see what
he comes up with.
I don't know how much of a hassle it would be for Andres to get a 2.6.24
kernel running on the OLPC to make sure that this isn't a regression
in the timer tick code (I suspect it isn't a regression, but you never
know). I also think that it would probably be in our best interest to
default CONFIG_GEODE_MFGPT_TIMER to 'n' until we get this figured
out. Since most BIOSen don't have timers available, that shouldn't affect
too many people.
So, anyway, enjoy the watchdog timer - I hope it meets everybody's
expectations for the 2.6.25 kernel.
--
Len Sorensen
Loading...