Discussion:
[PATCH] RTC: Add mmap method to rtc character driver
Neil Horman
2006-07-25 17:41:00 UTC
Hey-
At OLS last week, during Dave Jones' "Userspace Sucks" presentation, Jim
Gettys and some of the Xorg guys noted that they would be able to stop using
gettimeofday so frequently if they had some other way to get a
millisecond-resolution timer in userspace, one that they could perhaps read
from a memory-mapped page. I was right behind them and thought that seemed
like a reasonable request, so I've taken a stab at it. This patch allows a
page to be mmapped from the /dev/rtc character interface, the first 4 bytes
of which provide a regularly increasing count, incremented once every rtc
interrupt. The frequency is of course controlled by the regular ioctls
provided by the rtc driver. I've done some basic testing on it, and it seems
to work well.
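For reference, here's roughly what a userspace consumer of the interface would
look like (an untested sketch written against the patch as posted; the ioctls
are the existing /dev/rtc ones, and the counter is the first unsigned long on
the mapped page):

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/rtc.h>

int main(void)
{
	int fd = open("/dev/rtc", O_RDONLY);
	if (fd < 0) { perror("open"); return 1; }

	/* Pick a periodic interrupt rate with the existing ioctls. */
	if (ioctl(fd, RTC_IRQP_SET, 64) < 0 || ioctl(fd, RTC_PIE_ON, 0) < 0) {
		perror("ioctl");
		return 1;
	}

	/* Map the counter page read-only; the first word ticks once per
	 * rtc interrupt. */
	volatile unsigned long *count = mmap(NULL, getpagesize(), PROT_READ,
					     MAP_SHARED, fd, 0);
	if (count == MAP_FAILED) { perror("mmap"); return 1; }

	unsigned long last = *count;
	for (int i = 0; i < 10; i++) {
		while (*count == last)
			;	/* or go do real work and check back later */
		last = *count;
		printf("tick %lu\n", last);
	}

	ioctl(fd, RTC_PIE_OFF, 0);
	return 0;
}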

Thanks And Regards
Neil

Signed-off-by: Neil Horman



rtc.c | 41 ++++++++++++++++++++++++++++++++++++++++-
1 files changed, 40 insertions(+), 1 deletion(-)


diff --git a/drivers/char/rtc.c b/drivers/char/rtc.c
index 6e6a7c7..4ed673e 100644
--- a/drivers/char/rtc.c
+++ b/drivers/char/rtc.c
@@ -48,9 +48,10 @@
* CONFIG_HPET_EMULATE_RTC
* 1.12a Maciej W. Rozycki: Handle memory-mapped chips properly.
* 1.12ac Alan Cox: Allow read access to the day of week register
+ * 1.12b Neil Horman: Allow memory mapping of /dev/rtc
*/

-#define RTC_VERSION "1.12ac"
+#define RTC_VERSION "1.12b"

/*
* Note that *all* calls to CMOS_READ and CMOS_WRITE are done with
@@ -183,6 +184,8 @@ static int rtc_proc_open(struct inode *i
*/
static unsigned long rtc_status = 0; /* bitmapped status byte. */
static unsigned long rtc_freq = 0; /* Current periodic IRQ rate */
+#define BUF_SIZE (PAGE_SIZE/sizeof(unsigned long))
+static unsigned long rtc_irq_buf[BUF_SIZE] __attribute__ ((aligned (PAGE_SIZE)));
static unsigned long rtc_irq_data = 0; /* our output to the world */
static unsigned long rtc_max_user_freq = 64; /* > this, need CAP_SYS_RESOURCE */

@@ -230,6 +233,7 @@ static inline unsigned char rtc_is_updat

irqreturn_t rtc_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
+ unsigned long *count_ptr = (unsigned long *)rtc_irq_buf;
/*
* Can be an alarm interrupt, update complete interrupt,
* or a periodic interrupt. We store the status in the
@@ -265,6 +269,7 @@ irqreturn_t rtc_interrupt(int irq, void

kill_fasync (&rtc_async_queue, SIGIO, POLL_IN);

+ (*count_ptr)++;
return IRQ_HANDLED;
}
#endif
@@ -389,6 +394,37 @@ static ssize_t rtc_read(struct file *fil
#endif
}

+static int rtc_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ unsigned long rtc_addr;
+ unsigned long *count_ptr = rtc_irq_buf;
+
+ if (vma->vm_end - vma->vm_start != PAGE_SIZE)
+ return -EINVAL;
+
+ if (vma->vm_flags & VM_WRITE)
+ return -EPERM;
+
+ if (PAGE_SIZE > (1 << 16))
+ return -ENOSYS;
+
+ vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+
+ rtc_addr = __pa(rtc_irq_buf);
+ rtc_addr &= ~(PAGE_SIZE - 1);
+ rtc_addr &= -1;
+
+ if (remap_pfn_range(vma, vma->vm_start, rtc_addr >> PAGE_SHIFT,
+ PAGE_SIZE, vma->vm_page_prot)) {
+ printk(KERN_ERR "remap_pfn_range failed in rtc.c\n");
+ return -EAGAIN;
+ }
+
+ *count_ptr = 0;
+ return 0;
+
+}
+
static int rtc_do_ioctl(unsigned int cmd, unsigned long arg, int kernel)
{
struct rtc_time wtime;
@@ -890,6 +926,7 @@ static const struct file_operations rtc_
.owner = THIS_MODULE,
.llseek = no_llseek,
.read = rtc_read,
+ .mmap = rtc_mmap,
#ifdef RTC_IRQ
.poll = rtc_poll,
#endif
@@ -1082,6 +1119,8 @@ no_irq:
no_irq2:
#endif

+ memset(rtc_irq_buf, 0, PAGE_SIZE);
+
(void) init_sysctl();

printk(KERN_INFO "Real Time Clock Driver v" RTC_VERSION "\n");
--
/***************************************************
*Neil Horman
*Software Engineer
*gpg keyid: 1024D / 0x92A74FA1 - http://pgp.mit.edu
***************************************************/
Arjan van de Ven
2006-07-25 17:55:39 UTC
Post by Neil Horman
@@ -265,6 +269,7 @@ irqreturn_t rtc_interrupt(int irq, void
kill_fasync (&rtc_async_queue, SIGIO, POLL_IN);
+ (*count_ptr)++;
Hi,

it's a cute idea; however, 3 questions:
1) you probably want to add a few memory barriers around this, right?
2) why use the rtc and not the regular timer interrupt?

(and
3) this will negate the power gain you get for tickless kernels, since
now they need to start ticking again ;( )

Greetings,
Arjan van de Ven
Jim Gettys
2006-07-25 18:01:54 UTC
Post by Arjan van de Ven
Post by Neil Horman
@@ -265,6 +269,7 @@ irqreturn_t rtc_interrupt(int irq, void
kill_fasync (&rtc_async_queue, SIGIO, POLL_IN);
+ (*count_ptr)++;
Hi,
1) you probably want to add a few memory barriers around this, right?
2) why use the rtc and not the regular timer interrupt?
(and
3) this will negate the power gain you get for tickless kernels, since
now they need to start ticking again ;( )
The field only needs to get updated if you've scheduled something to
run...
- Jim
--
Jim Gettys
One Laptop Per Child
Neil Horman
2006-07-25 18:22:08 UTC
Post by Arjan van de Ven
Post by Neil Horman
@@ -265,6 +269,7 @@ irqreturn_t rtc_interrupt(int irq, void
kill_fasync (&rtc_async_queue, SIGIO, POLL_IN);
+ (*count_ptr)++;
Hi,
1) you probably want to add a few memory barriers around this, right?
Actually, I was curious about that. I had initially planned on using xchg to
update the counter on the mmapped page, but then it occurred to me that an
aligned unsigned long store would always be atomic (or so I thought). Is there
a case in which userspace will read a munged value the way the code is now?
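(To spell out the property I'm assuming -- a sketch in portable C11 terms
rather than kernel code, so the atomics here are stand-ins for whatever
primitives the kernel would actually use:)

#include <stdatomic.h>

/* The counter word at the start of the shared, mmapped page. */
_Atomic unsigned long count;

/* Interrupt side: a single atomic increment; readers can never observe
 * a torn value of an aligned native word. */
void tick(void)
{
	atomic_fetch_add_explicit(&count, 1, memory_order_relaxed);
}

/* Userspace side: an aligned atomic load likewise cannot return a munged
 * value. Barriers only become necessary if readers must also observe
 * *other* data in a particular order relative to the counter. */
unsigned long read_count(void)
{
	return atomic_load_explicit(&count, memory_order_relaxed);
}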
Post by Arjan van de Ven
2) why use the rtc and not the regular timer interrupt?
Honestly, because it seemed to be a quick way forward. The real-time clock
driver was there, and its infrastructure lent itself rather easily to adding
this functionality. If you can elaborate on a better suggestion, I'll happily
take a crack at it.
Post by Arjan van de Ven
(and
3) this will negate the power gain you get for tickless kernels, since
now they need to start ticking again ;( )
That is true, but only in the case where someone opens up /dev/rtc, and if they
open that driver and send it a UIE or PIE ioctl, it will start ticking
regardless of this patch (or that is at least my impression).

Regards
Neil
--
/***************************************************
*Neil Horman
*Software Engineer
*gpg keyid: 1024D / 0x92A74FA1 - http://pgp.mit.edu
***************************************************/
Arjan van de Ven
2006-07-25 18:32:55 UTC
Post by Neil Horman
Post by Arjan van de Ven
3) this will negate the power gain you get for tickless kernels, since
now they need to start ticking again ;( )
That is true, but only in the case where someone opens up /dev/rtc, and if they
open that driver and send it a UIE or PIE ioctl, it will start ticking
regardless of this patch (or that is at least my impression).
but.. if that's X like you said.. then it's basically "always"...
--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Neil Horman
2006-07-25 18:43:28 UTC
Post by Arjan van de Ven
Post by Neil Horman
Post by Arjan van de Ven
3) this will negate the power gain you get for tickless kernels, since
now they need to start ticking again ;( )
That is true, but only in the case where someone opens up /dev/rtc, and if they
open that driver and send it a UIE or PIE ioctl, it will start ticking
regardless of this patch (or that is at least my impression).
but.. if that's X like you said.. then it's basically "always"...
Well, not always (considering the number of non-X embedded systems out there),
but I take your point. So it really boils down to not having a tickless kernel,
or an X server that calls gettimeofday 1 million times per second (I think that's
the number that Dave threw out there). Unless, of course, you have a third
alternative, which, as I mentioned before, I would be happy to take a crack at
if you would elaborate on your idea a little more.

Thanks & Regards
Neil
--
/***************************************************
*Neil Horman
*Software Engineer
*gpg keyid: 1024D / 0x92A74FA1 - http://pgp.mit.edu
***************************************************/
Arjan van de Ven
2006-07-25 18:53:16 UTC
Post by Neil Horman
Post by Arjan van de Ven
Post by Neil Horman
Post by Arjan van de Ven
3) this will negate the power gain you get for tickless kernels, since
now they need to start ticking again ;( )
That is true, but only in the case where someone opens up /dev/rtc, and if they
open that driver and send it a UIE or PIE ioctl, it will start ticking
regardless of this patch (or that is at least my impression).
but.. if that's X like you said.. then it's basically "always"...
Well, not always (considering the number of non-X embedded systems out there),
but I take your point. So it really boils down to not having a tickless kernel,
or an X server that calls gettimeofday 1 million times per second (I think that's
the number that Dave threw out there). Unless, of course, you have a third
alternative, which, as I mentioned before, I would be happy to take a crack at
if you would elaborate on your idea a little more.
well the idea that has been tossed about a few times is using a vsyscall
function that either calls into the kernel, or directly uses the hpet
page (which can be user mapped) to get time information that way...
or even would use rdtsc in a way the kernel knows is safe (eg corrected
for the local cpu's speed and offset etc etc).
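(For the curious, the shape such a vsyscall-style reader usually takes -- a
sketch with made-up field names, not any actual kernel's layout: the kernel
periodically stores a timebase plus TSC calibration in a shared page under a
seqlock-style counter, and userspace extrapolates from there:)

#include <sys/time.h>

/* Hypothetical layout of the kernel-shared time page. */
struct vsys_time {
	unsigned seq;			/* odd while the kernel is mid-update */
	unsigned long long base_cycles;	/* TSC reading at the last update */
	long base_sec, base_usec;	/* wall time at the last update */
	unsigned long long mult;	/* cycles -> microseconds calibration */
	unsigned shift;
};

static unsigned long long rdtsc(void)
{
	unsigned lo, hi;
	asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
	return ((unsigned long long)hi << 32) | lo;
}

void fast_gettimeofday(const volatile struct vsys_time *vt, struct timeval *tv)
{
	unsigned seq;
	unsigned long long delta;

	do {
		while ((seq = vt->seq) & 1)
			;		/* kernel is updating the page: wait */
		tv->tv_sec = vt->base_sec;
		tv->tv_usec = vt->base_usec;
		delta = rdtsc() - vt->base_cycles;
		tv->tv_usec += (delta * vt->mult) >> vt->shift;
	} while (vt->seq != seq);	/* page changed underneath us: retry */

	while (tv->tv_usec >= 1000000) {
		tv->tv_usec -= 1000000;
		tv->tv_sec++;
	}
}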
--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Neil Horman
2006-07-25 19:03:57 UTC
Post by Arjan van de Ven
Post by Neil Horman
Post by Arjan van de Ven
Post by Neil Horman
Post by Arjan van de Ven
3) this will negate the power gain you get for tickless kernels, since
now they need to start ticking again ;( )
That is true, but only in the case where someone opens up /dev/rtc, and if they
open that driver and send it a UIE or PIE ioctl, it will start ticking
regardless of this patch (or that is at least my impression).
but.. if that's X like you said.. then it's basically "always"...
Well, not always (considering the number of non-X embedded systems out there),
but I take your point. So it really boils down to not having a tickless kernel,
or an X server that calls gettimeofday 1 million times per second (I think that's
the number that Dave threw out there). Unless, of course, you have a third
alternative, which, as I mentioned before, I would be happy to take a crack at
if you would elaborate on your idea a little more.
well the idea that has been tossed about a few times is using a vsyscall
function that either calls into the kernel, or directly uses the hpet
page (which can be user mapped) to get time information that way...
or even would use rdtsc in a way the kernel knows is safe (eg corrected
for the local cpu's speed and offset etc etc).
Ok, that makes sense, although that's only going to be supportable on
HPET-enabled systems, right? Would a "both" make more sense, so that things
like X can get user space monotonic time regardless of cpu abilities?

Regards
Neil
--
/***************************************************
*Neil Horman
*Software Engineer
*gpg keyid: 1024D / 0x92A74FA1 - http://pgp.mit.edu
***************************************************/
Arjan van de Ven
2006-07-25 19:06:52 UTC
Post by Neil Horman
Post by Arjan van de Ven
well the idea that has been tossed about a few times is using a vsyscall
function that either calls into the kernel, or directly uses the hpet
page (which can be user mapped) to get time information that way...
or even would use rdtsc in a way the kernel knows is safe (eg corrected
for the local cpu's speed and offset etc etc).
Ok, that makes sense, although that's only going to be supportable on
HPET-enabled systems, right?
well it's only going to be *fast* on HPET-enabled systems (which should
be the *vast* majority nowadays if it weren't for some silly BIOS
defaults by some vendors); all others can just fall back to other
methods. The beauty of the vsyscall concept :)
--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
John W. Linville
2006-07-25 19:07:48 UTC
Post by Arjan van de Ven
Post by Neil Horman
alternative, which, as I mentioned before I would be happy to take a crack at,
if you would elaborate on your idea a little more.
well the idea that has been tossed about a few times is using a vsyscall
function that either calls into the kernel, or directly uses the hpet
page (which can be user mapped) to get time information that way...
or even would use rdtsc in a way the kernel knows is safe (eg corrected
for the local cpu's speed and offset etc etc).
Aren't both of those examples x86(_64)-specific? Wouldn't a generic
solution be preferable?

John
--
John W. Linville
***@tuxdriver.com
Arjan van de Ven
2006-07-25 19:16:02 UTC
Post by John W. Linville
Post by Arjan van de Ven
Post by Neil Horman
alternative, which, as I mentioned before I would be happy to take a crack at,
if you would elaborate on your idea a little more.
well the idea that has been tossed about a few times is using a vsyscall
function that either calls into the kernel, or directly uses the hpet
page (which can be user mapped) to get time information that way...
or even would use rdtsc in a way the kernel knows is safe (eg corrected
for the local cpu's speed and offset etc etc).
Aren't both of those examples x86(_64)-specific? Wouldn't a generic
solution be preferable?
The implementation is; the interface... not so. Other platforms can
obviously implement their own optimal solution...
H. Peter Anvin
2006-07-25 19:08:38 UTC
Post by Arjan van de Ven
well the idea that has been tossed about a few times is using a vsyscall
function that either calls into the kernel, or directly uses the hpet
page (which can be user mapped) to get time information that way...
or even would use rdtsc in a way the kernel knows is safe (eg corrected
for the local cpu's speed and offset etc etc).
x86-64 already does that, IIRC.

-hpa
Segher Boessenkool
2006-07-25 17:57:30 UTC
Post by Neil Horman
At OLS last week, during Dave Jones' "Userspace Sucks" presentation, Jim
Gettys and some of the Xorg guys noted that they would be able to stop using
gettimeofday so frequently if they had some other way to get a
millisecond-resolution timer in userspace, one that they could perhaps read
from a memory-mapped page. I was right behind them and thought that seemed
like a reasonable request, so I've taken a stab at it. This patch allows a
page to be mmapped from the /dev/rtc character interface, the first 4 bytes
of which provide a regularly increasing count, incremented once every rtc
interrupt. The frequency is of course controlled by the regular ioctls
provided by the rtc driver. I've done some basic testing on it, and it seems
to work well.
Similar functionality is already available via VDSO on
platforms that support it (currently PowerPC and AMD64?) --
seems like a better way forward.


Segher
Neil Horman
2006-07-25 18:28:33 UTC
Post by Segher Boessenkool
Post by Neil Horman
At OLS last week, during Dave Jones' "Userspace Sucks" presentation, Jim
Gettys and some of the Xorg guys noted that they would be able to stop using
gettimeofday so frequently if they had some other way to get a
millisecond-resolution timer in userspace, one that they could perhaps read
from a memory-mapped page. I was right behind them and thought that seemed
like a reasonable request, so I've taken a stab at it. This patch allows a
page to be mmapped from the /dev/rtc character interface, the first 4 bytes
of which provide a regularly increasing count, incremented once every rtc
interrupt. The frequency is of course controlled by the regular ioctls
provided by the rtc driver. I've done some basic testing on it, and it seems
to work well.
Similar functionality is already available via VDSO on
platforms that support it (currently PowerPC and AMD64?) --
seems like a better way forward.
In general I agree, but that only works if you operate on a platform that
supports virtual syscalls, and has vdso configured. I'm not overly familiar
with vdso, but I didn't think vdso could be supported on all platforms/arches.
This seems like it might be a nice addition in those cases.

Neil
--
/***************************************************
*Neil Horman
*Software Engineer
*gpg keyid: 1024D / 0x92A74FA1 - http://pgp.mit.edu
***************************************************/
Segher Boessenkool
2006-07-25 18:56:14 UTC
Post by Neil Horman
Post by Segher Boessenkool
Similar functionality is already available via VDSO on
platforms that support it (currently PowerPC and AMD64?) --
seems like a better way forward.
In general I agree, but that only works if you operate on a platform that
supports virtual syscalls, and has vdso configured.
That's why I said "a better way forward", not "this already
works everywhere".
Post by Neil Horman
I'm not overly familiar
with vdso, but I didn't think vdso could be supported on all
platforms/arches.
Oh? Which can not, and why?


Segher
Neil Horman
2006-07-25 19:07:47 UTC
Post by Segher Boessenkool
Post by Neil Horman
Post by Segher Boessenkool
Similar functionality is already available via VDSO on
platforms that support it (currently PowerPC and AMD64?) --
seems like a better way forward.
In general I agree, but that only works if you operate on a platform that
supports virtual syscalls, and has vdso configured.
That's why I said "a better way forward", not "this already
works everywhere".
Post by Neil Horman
I'm not overly familiar
with vdso, but I didn't think vdso could be supported on all
platforms/arches.
Oh? Which can not, and why?
I'm sorry, I shouldn't say that vdso itself can't be supported, but rather a
vsyscall that doesn't just wind up trapping into the kernel anyway. Older
systems without an HPET timer to map into user space jump immediately to mind.
Arjan had mentioned a calibration on rdtsc as another alternative, which I had
not considered, so this may all be moot, but I was worried that a vdso solution
wouldn't always give the X guys what they were really after.

Regards
Neil
--
/***************************************************
*Neil Horman
*Software Engineer
*gpg keyid: 1024D / 0x92A74FA1 - http://pgp.mit.edu
***************************************************/
H. Peter Anvin
2006-07-25 19:10:09 UTC
Post by Neil Horman
In general I agree, but that only works if you operate on a platform that
supports virtual syscalls, and has vdso configured. I'm not overly familiar
with vdso, but I didn't think vdso could be supported on all platforms/arches.
This seems like it might be a nice addition in those cases.
Not really. This introduces a user-visible interface that is potentially
very difficult to support. Consider a tickless kernel -- you might end up
taking tick interrupts ONLY to update this page, since you don't have
any way of knowing when userspace wants to look at it.

-hpa
Neil Horman
2006-07-25 19:21:38 UTC
Post by H. Peter Anvin
Post by Neil Horman
In general I agree, but that only works if you operate on a platform that
supports virtual syscalls, and has vdso configured. I'm not overly familiar
with vdso, but I didn't think vdso could be supported on all
platforms/arches.
This seems like it might be a nice addition in those cases.
Not really. This introduces a user-visible interface that is potentially
very difficult to support. Consider a tickless kernel -- you might end up
taking tick interrupts ONLY to update this page, since you don't have
any way of knowing when userspace wants to look at it.
Well, you do actually know when they want to look at it. The rtc driver only
unmasks its interrupt when a user space process has opened the device and sent
it a RTC_UIE_ON or RTC_PIE_ON (or other such ioctl). So if you open /dev/rtc,
and memory map the page, but never enable a timer method, then every read of the
page returns zero. The only overhead this patch currently adds, execution
time-wise, is the extra time it takes to write to the shared page variable. If
the timer tick interrupt is executing, it's because someone is reading tick data,
or plans to very soon.

Neil
--
/***************************************************
*Neil Horman
*Software Engineer
*gpg keyid: 1024D / 0x92A74FA1 - http://pgp.mit.edu
***************************************************/
Segher Boessenkool
2006-07-25 19:31:32 UTC
Post by Neil Horman
Post by H. Peter Anvin
Not really. This introduces a user-visible interface that is potentially
very difficult to support. Consider a tickless kernel -- you might end up
taking tick interrupts ONLY to update this page, since you don't have
any way of knowing when userspace wants to look at it.
Well, you do actually know when they want to look at it. The rtc driver only
unmasks its interrupt when a user space process has opened the device and sent
it a RTC_UIE_ON or RTC_PIE_ON (or other such ioctl). So if you open /dev/rtc,
and memory map the page, but never enable a timer method, then every read of the
page returns zero. The only overhead this patch currently adds, execution
time-wise, is the extra time it takes to write to the shared page variable. If
the timer tick interrupt is executing, it's because someone is reading tick data,
or plans to very soon.
But userland cannot know if there is a more efficient option to
use than this /dev/rtc way, without using VDSO/vsyscall.


Segher
Neil Horman
2006-07-25 19:47:33 UTC
Post by Segher Boessenkool
Post by Neil Horman
Post by H. Peter Anvin
Not really. This introduces a user-visible interface that is potentially
very difficult to support. Consider a tickless kernel -- you might end up
taking tick interrupts ONLY to update this page, since you don't have
any way of knowing when userspace wants to look at it.
Well, you do actually know when they want to look at it. The rtc driver only
unmasks its interrupt when a user space process has opened the device and sent
it a RTC_UIE_ON or RTC_PIE_ON (or other such ioctl). So if you open /dev/rtc,
and memory map the page, but never enable a timer method, then every read of the
page returns zero. The only overhead this patch currently adds, execution
time-wise, is the extra time it takes to write to the shared page variable. If
the timer tick interrupt is executing, it's because someone is reading tick data,
or plans to very soon.
But userland cannot know if there is a more efficient option to
use than this /dev/rtc way, without using VDSO/vsyscall.
Sure, but detecting if /dev/rtc via mmap is faster than gettimeofday is an
orthogonal issue to having the choice in the first place. I say let the X guys
write code to determine at run time what is more efficient to get their job
done. I really just wanted to give them the ability to avoid making a million
kernel traps a second for those arches where a userspace gettimeofday is not
yet implemented, or cannot be implemented. It won't cost anything to add this
feature, and the Xorg people can write code to use gettimeofday if it's faster
than mmapped /dev/rtc (or even be configured to do so at compile-time). This
patch doesn't create any interrupts that wouldn't be generated already anyway
by any user using /dev/rtc, and even if X doesn't already use /dev/rtc, the
added interrupts are in trade for an equal reduction in kernel traps, which I
think has to be a net savings.

I'm not saying we shouldn't implement a vsyscall on more platforms to provide a
speedup for this problem (in fact I'm interested to learn how, since I hadn't
previously considered that as a possibility), but I think offering the choice is
a smart thing to do until the latter solution gets propagated to other
arches/platforms besides x86_64.

Regards
Neil
--
/***************************************************
*Neil Horman
*Software Engineer
*gpg keyid: 1024D / 0x92A74FA1 - http://pgp.mit.edu
***************************************************/
Dave Airlie
2006-07-25 20:04:14 UTC
Post by Neil Horman
Post by Segher Boessenkool
But userland cannot know if there is a more efficient option to
use than this /dev/rtc way, without using VDSO/vsyscall.
Sure, but detecting if /dev/rtc via mmap is faster than gettimeofday is an
orthogonal issue to having the choice in the first place. I say let the X guys
write code to determine at run time what is more efficient to get their job
done. I really just wanted to give them the ability to avoid making a million
kernel traps a second for those arches where a userspace gettimeofday is not
yet implemented, or cannot be implemented. It won't cost anything to add this
feature, and the Xorg people can write code to use gettimeofday if it's faster
than mmapped /dev/rtc (or even be configured to do so at compile-time). This
patch doesn't create any interrupts that wouldn't be generated already anyway
by any user using /dev/rtc, and even if X doesn't already use /dev/rtc, the
added interrupts are in trade for an equal reduction in kernel traps, which I
think has to be a net savings.
I'm not saying we shouldn't implement a vsyscall on more platforms to provide a
speedup for this problem (in fact I'm interested to learn how, since I hadn't
previously considered that as a possibility), but I think offering the choice is
a smart thing to do until the latter solution gets propagated to other
arches/platforms besides x86_64.
So far the requirements are pretty much not high resolution, but accurate
and increasing; something like 10ms is fine, and the current X timer is
in the 20ms range.

I think an mmap'ed page with whatever cgt(CLOCK_MONOTONIC) returns
would be very good, but it might be nice to implement some sort of new
generic /dev that X can mmap and each arch can do what it wants in it.

I'm wondering why x86 doesn't have gettimeofday vDSO (does x86 have
proper vDSO support at all apart from sysenter?).

Dave.
H. Peter Anvin
2006-07-25 20:24:58 UTC
Post by Dave Airlie
So far the requirements are pretty much not high resolution, but accurate
and increasing; something like 10ms is fine, and the current X timer is
in the 20ms range.
I think an mmap'ed page with whatever cgt(CLOCK_MONOTONIC) returns
would be very good, but it might be nice to implement some sort of new
generic /dev that X can mmap and each arch can do what it wants in it.
I'm wondering why x86 doesn't have gettimeofday vDSO (does x86 have
proper vDSO support at all apart from sysenter?).
The i386 vdso right now has only two entry points, as far as I can tell:
system call and signal return.

There is no reason it couldn't have more than that. A low-resolution
and a high-resolution gettimeofday might be a good idea.

-hpa
Neil Horman
2006-07-25 20:47:36 UTC
Post by H. Peter Anvin
Post by Dave Airlie
So far the requirements are pretty much not high resolution, but accurate
and increasing; something like 10ms is fine, and the current X timer is
in the 20ms range.
I think an mmap'ed page with whatever cgt(CLOCK_MONOTONIC) returns
would be very good, but it might be nice to implement some sort of new
generic /dev that X can mmap and each arch can do what it wants in it.
I'm wondering why x86 doesn't have gettimeofday vDSO (does x86 have
proper vDSO support at all apart from sysenter?).
The i386 vdso right now has only two entry points, as far as I can tell:
system call and signal return.
There is no reason it couldn't have more than that. A low-resolution
and a high-resolution gettimeofday might be a good idea.
-hpa
Agreed. How about we take the /dev/rtc patch now (since it's an added feature
that doesn't hurt anything if it's not used, as far as tickless kernels go), and
I'll start working on doing gettimeofday in vdso for arches other than x86_64.
That will give the X guys what they wanted until all the other arches have a
gettimeofday alternative that doesn't require kernel traps.

Regards
Neil
--
/***************************************************
*Neil Horman
*Software Engineer
*gpg keyid: 1024D / 0x92A74FA1 - http://pgp.mit.edu
***************************************************/
H. Peter Anvin
2006-07-25 20:50:52 UTC
Post by Neil Horman
Agreed. How about we take the /dev/rtc patch now (since it's an added feature
that doesn't hurt anything if it's not used, as far as tickless kernels go), and
I'll start working on doing gettimeofday in vdso for arches other than x86_64.
That will give the X guys what they wanted until all the other arches have a
gettimeofday alternative that doesn't require kernel traps.
It hurts if it DOES get used.

-hpa
Neil Horman
2006-07-25 22:25:47 UTC
Post by H. Peter Anvin
Post by Neil Horman
Agreed. How about we take the /dev/rtc patch now (since it's an added feature
that doesn't hurt anything if it's not used, as far as tickless kernels go), and
I'll start working on doing gettimeofday in vdso for arches other than x86_64.
That will give the X guys what they wanted until all the other arches have a
gettimeofday alternative that doesn't require kernel traps.
It hurts if it DOES get used.
Yes, but if it's in trade for something that's being used currently which hurts
more (case in point being the X server), using this solution is a net gain.

I'm not arguing with you that adding a low-res gettimeofday vsyscall is a better
long term solution, but doing that requires potentially several implementations
in the C library across a range of architectures, some of which may not be able
to provide a time solution any better than what the gettimeofday syscall
provides today. The /dev/rtc solution is easy, available right now, and applies
to all arches. It has zero impact for systems which do not use it, and for
those applications which make a decision to use it instead of an alternate
method, the result I expect will be a net gain, until such time as we code up,
test and roll out a vsyscall solution.

Thanks & Regards
Neil
H. Peter Anvin
2006-07-25 22:33:18 UTC
Post by Neil Horman
Yes, but if it's in trade for something that's being used currently which hurts
more (case in point being the X server), using this solution is a net gain.
I'm not arguing with you that adding a low-res gettimeofday vsyscall is a better
long term solution, but doing that requires potentially several implementations
in the C library across a range of architectures, some of which may not be able
to provide a time solution any better than what the gettimeofday syscall
provides today. The /dev/rtc solution is easy, available right now, and applies
to all arches. It has zero impact for systems which do not use it, and for
those applications which make a decision to use it instead of an alternate
method, the result I expect will be a net gain, until such time as we code up,
test and roll out a vsyscall solution.
Quick hacks are frowned upon in the Linux universe. The kernel-user
space interface is supposed to be stable, and thus a hack like this has
to be maintained indefinitely.

Putting temporary hacks like this in is not a good idea.

-hpa
Neil Horman
2006-07-25 23:10:43 UTC
Post by H. Peter Anvin
Post by Neil Horman
Yes, but if it's in trade for something that's being used currently which hurts
more (case in point being the X server), using this solution is a net gain.
I'm not arguing with you that adding a low-res gettimeofday vsyscall is a better
long term solution, but doing that requires potentially several implementations
in the C library across a range of architectures, some of which may not be able
to provide a time solution any better than what the gettimeofday syscall
provides today. The /dev/rtc solution is easy, available right now, and applies
to all arches. It has zero impact for systems which do not use it, and for
those applications which make a decision to use it instead of an alternate
method, the result I expect will be a net gain, until such time as we code up,
test and roll out a vsyscall solution.
Quick hacks are frowned upon in the Linux universe. The kernel-user
space interface is supposed to be stable, and thus a hack like this has
to be maintained indefinitely.
Putting temporary hacks like this in is not a good idea.
Only if you make the mental leap that this is a hack; it's not. It's a new
feature for a driver. mmap on device drivers is a well-known and understood
interface. There is nothing hackish about it. And there is no need for it to
be temporary either. Why shouldn't the rtc driver be able to export a monotonic
counter via the mmap interface? mmtimer does it already, as do many other
drivers. There's nothing unstable about this interface, and it need not be short
lived. It can live in perpetuity, and applications can choose to use it, or
migrate away from it should something else more efficient become available (a
gettimeofday vsyscall). More importantly, it can continue to be used in those
situations where a vsyscall is not feasible, or simply maps to the nominal slow
path kernel trap that one would find too heavyweight to use in comparison to an
mmapped page.

Neil
H. Peter Anvin
2006-07-25 23:22:31 UTC
Post by Neil Horman
Post by H. Peter Anvin
Quick hacks are frowned upon in the Linux universe. The kernel-user
space interface is supposed to be stable, and thus a hack like this has
to be maintained indefinitely.
Putting temporary hacks like this in is not a good idea.
Only if you make the mental leap that this is a hack; it's not. It's a new
feature for a driver. mmap on device drivers is a well-known and understood
interface. There is nothing hackish about it. And there is no need for it to
be temporary either. Why shouldn't the rtc driver be able to export a monotonic
counter via the mmap interface? mmtimer does it already, as do many other
drivers. There's nothing unstable about this interface, and it need not be short
lived. It can live in perpetuity, and applications can choose to use it, or
migrate away from it should something else more efficient become available (a
gettimeofday vsyscall). More importantly, it can continue to be used in those
situations where a vsyscall is not feasible, or simply maps to the nominal slow
path kernel trap that one would find too heavyweight to use in comparison to an
mmapped page.
The reason it is a hack is because you're hard-coding the fact that
you're taking a global, periodic interrupt. Yes, it can be dealt with via
scheduler hacks in the tickless case, but that seems really heavyweight.

-hpa
Neil Horman
2006-07-26 00:03:14 UTC
Post by H. Peter Anvin
Post by Neil Horman
Post by H. Peter Anvin
Quick hacks are frowned upon in the Linux universe. The kernel-user
space interface is supposed to be stable, and thus a hack like this has
to be maintained indefinitely.
Putting temporary hacks like this in is not a good idea.
Only if you make the mental leap that this is a hack; it's not. It's a new
feature for a driver. mmap on device drivers is a well-known and understood
interface. There is nothing hackish about it. And there is no need for it to
be temporary either. Why shouldn't the rtc driver be able to export a monotonic
counter via the mmap interface? mmtimer does it already, as do many other
drivers. There's nothing unstable about this interface, and it need not be short
lived. It can live in perpetuity, and applications can choose to use it, or
migrate away from it should something else more efficient become available (a
gettimeofday vsyscall). More importantly, it can continue to be used in those
situations where a vsyscall is not feasible, or simply maps to the nominal slow
path kernel trap that one would find too heavyweight to use in comparison to an
mmapped page.
The reason it is a hack is because you're hard-coding the fact that
you're taking a global, periodic interrupt. Yes, it can be dealt with via
scheduler hacks in the tickless case, but that seems really heavyweight.
I think that is an enormous overstatement.

My patch most certainly does not export that fact. The only thing it provides
to userspace is a regular monotonically increasing counter independent of any
userspace scheduling. The implementation using a regular interrupt is
completely hidden from userspace. The rtc driver itself is what's responsible
for the global periodic interrupt. By your logic the driver itself is a hack.

Honestly, this patch doesn't do any harm. Any application using it currently
creates the same interrupt behavior that it would if it used the mmap interface.
I think the only argument here is that applications using other timing
facilities would create additional interrupts, but given that those applications
are using interfaces with more overhead than this one, making the switch would
be a net gain.

Neil
David Lang
2006-07-25 23:29:27 UTC
Post by Neil Horman
Post by H. Peter Anvin
Quick hacks are frowned upon in the Linux universe. The kernel-user
space interface is supposed to be stable, and thus a hack like this has
to be maintained indefinitely.
Putting temporary hacks like this in is not a good idea.
Only if you make the mental leap that this is a hack; it's not. It's a new
feature for a driver. mmap on device drivers is a well-known and understood
interface. There is nothing hackish about it. And there is no need for it to
be temporary either. Why shouldn't the rtc driver be able to export a monotonic
counter via the mmap interface? mmtimer does it already, as do many other
drivers. There's nothing unstable about this interface, and it need not be short
lived. It can live in perpetuity, and applications can choose to use it, or
migrate away from it should something else more efficient become available (a
gettimeofday vsyscall). More importantly, it can continue to be used in those
situations where a vsyscall is not feasible, or simply maps to the nominal slow
path kernel trap that one would find too heavyweight to use in comparison to an
mmapped page.
given that this won't go into 2.6.18 at this point, isn't there time to figure
out the gettimeofday vsyscall before the 2.6.19 merge window (in a month or
so)? Even if you have to wait until 2.6.20, it's unlikely that any apps would
be released with an interface to /dev/rtc rather than waiting a little bit for
the better interface.

David Lang
Neil Horman
2006-07-26 00:18:15 UTC
Post by David Lang
Post by Neil Horman
Post by H. Peter Anvin
Quick hacks are frowned upon in the Linux universe. The kernel-user
space interface is supposed to be stable, and thus a hack like this has
to be maintained indefinitely.
Putting temporary hacks like this in is not a good idea.
Only if you make the mental leap that this is a hack; it's not. It's a new
feature for a driver. mmap on device drivers is a well-known and understood
interface. There is nothing hackish about it. And there is no need for it to
be temporary either. Why shouldn't the rtc driver be able to export a monotonic
counter via the mmap interface? mmtimer does it already, as do many other
drivers. There's nothing unstable about this interface, and it need not be short
lived. It can live in perpetuity, and applications can choose to use it, or
migrate away from it should something else more efficient become available (a
gettimeofday vsyscall). More importantly, it can continue to be used in those
situations where a vsyscall is not feasible, or simply maps to the nominal slow
path kernel trap that one would find too heavyweight to use in comparison to an
mmapped page.
given that this won't go into 2.6.18 at this point, isn't there time to figure
out the gettimeofday vsyscall before the 2.6.19 merge window (in a month or
so)? Even if you have to wait until 2.6.20, it's unlikely that any apps would
be released with an interface to /dev/rtc rather than waiting a little bit for
the better interface.
David Lang
My primary concern is my skill level. I normally work in the kernel, and I'm
not too familiar with glibc, and completely unfamiliar with vdso
implementations. I'm interested to do it, but I have no idea how long it will
take to understand vsyscall implementations, code one up, and get it right. If
you think a month is sufficient, I'll take your word for it, but I'm starting
from zero in this area.
Neil
Segher Boessenkool
2006-07-25 23:29:25 UTC
Post by Neil Horman
Yes, but if it's in trade for something that's being used currently which hurts
more (case in point being the X server), using this solution is a net gain.
...in the short term.
Post by Neil Horman
I'm not arguing with you that adding a low-res gettimeofday vsyscall is a better
long term solution, but doing that requires potentially several implementations
in the C library across a range of architectures, some of which may not be able
to provide a time solution any better than what the gettimeofday syscall
provides today. The /dev/rtc solution is easy, available right now, and applies
to all arches.
"All"?


Segher
Neil Horman
2006-07-25 23:56:44 UTC
Post by Segher Boessenkool
Post by Neil Horman
Yes, but if it's in trade for something that's being used currently which hurts
more (case in point being the X server), using this solution is a net gain.
...in the short term.
And for any arch that isn't able to leverage a speedup via a vdso implementation
of similar functionality in the long term.
Post by Segher Boessenkool
Post by Neil Horman
I'm not arguing with you that adding a low-res gettimeofday vsyscall is a better
long term solution, but doing that requires potentially several implementations
in the C library across a range of architectures, some of which may not be able
to provide a time solution any better than what the gettimeofday syscall
provides today. The /dev/rtc solution is easy, available right now, and applies
to all arches.
"All"?
Is there any arch for which the rtc driver doesn't function?
Neil
H. Peter Anvin
2006-07-26 00:02:31 UTC
Post by Neil Horman
Post by Segher Boessenkool
Post by Neil Horman
Yes, but if it's in trade for something that's being used currently which hurts
more (case in point being the X server), using this solution is a net gain.
...in the short term.
And for any arch that isn't able to leverage a speedup via a vdso implementation
of similar functionality in the long term.
If they can't, then they can't use your driver either.
Post by Neil Horman
Post by Segher Boessenkool
Post by Neil Horman
I'm not arguing with you that adding a low-res gettimeofday vsyscall is a better
long term solution, but doing that requires potentially several implementations
in the C library across a range of architectures, some of which may not be able
to provide a time solution any better than what the gettimeofday syscall
provides today. The /dev/rtc solution is easy, available right now, and applies
to all arches.
"All"?
Is there any arch for which the rtc driver doesn't function?
Yes, there are plenty of systems which don't have an RTC, or have an RTC
which can't generate interrupts.

-hpa
Neil Horman
2006-07-26 00:20:43 UTC
Post by H. Peter Anvin
Post by Neil Horman
Post by Segher Boessenkool
Post by Neil Horman
Yes, but if it's in trade for something that's being used currently which hurts
more (case in point being the X server), using this solution is a net gain.
...in the short term.
And for any arch that isn't able to leverage a speedup via a vdso implementation
of similar functionality in the long term.
If they can't, then they can't use your driver either.
What's your reasoning here?
Post by H. Peter Anvin
Post by Neil Horman
Post by Segher Boessenkool
Post by Neil Horman
I'm not arguing with you that adding a low-res gettimeofday vsyscall is a better
long term solution, but doing that requires potentially several implementations
in the C library across a range of architectures, some of which may not be able
to provide a time solution any better than what the gettimeofday syscall
provides today. The /dev/rtc solution is easy, available right now, and applies
to all arches.
"All"?
Is there any arch for which the rtc driver doesn't function?
Yes, there are plenty of systems which don't have an RTC, or have an RTC
which can't generate interrupts.
Ok, for those implementations which don't have an RTC that the rtc driver can
drive, the mmap functionality will not work, but at that point what interface
are you left with at all for obtaining periodic time?
Neil
H. Peter Anvin
2006-07-26 00:36:53 UTC
Post by Neil Horman
Post by H. Peter Anvin
Post by Neil Horman
Is there any arch for which the rtc driver doesn't function?
Yes, there are plenty of systems which don't have an RTC, or have an RTC
which can't generate interrupts.
Ok, for those implementations which don't have an RTC that the rtc driver can
drive, the mmap functionality will not work, but at that point what interface
are you left with at all for obtaining periodic time?
Depends completely on the hardware. Some hardware will rely on cycle
counters, some may rely on I/O devices which may or may not be mappable
into user space, and some will have to enter the kernel.

These aren't compatible with your programming model.

-hpa
Theodore Tso
2006-07-26 14:45:36 UTC
Post by Neil Horman
Post by H. Peter Anvin
Yes, there are plenty of systems which don't have an RTC, or have an RTC
which can't generate interrupts.
Ok, for those implementations which don't have an RTC that the rtc driver can
drive, the mmap functionality will not work, but at that point what interface
are you left with at all for obtaining periodic time?
Well, the HPET, for one. My main problem with this interface is that
it is tied to /dev/rtc, and the system may have any number of pieces
of timer hardware that may be more appropriate, and it shouldn't be up
to the user application to select which one.

But this does bring up an interesting coding paradigm which is used by
more than just the X server. As it turns out, there is a real-time
garbage collector[1] for Java that needs exactly the same thing, although
the resolution window is a few orders of magnitude faster than what X
needs. Fundamentally, this coding paradigm is:

while (work to do) {
	do_a_bit_of_work();
	if (we_have_exceeded_a_timeout_period())
		break;
}
/* Clean up and let some other client/thread run */

So there are a couple of things to note about this high-level
abstracted paradigm. The application doesn't need to know _exactly_
how much time has passed, just whether or not the appointed time
slice has expired (which might be 10ms or it might be 100us in the
case of the rt garbage collector). So calculating exactly how much
time has elapsed is not necessary, and if there is single-shot
event timer hardware available to the system, it might be sufficient.
So even if a VDSO implementation of gettimeofday() would be faster
than calling gettimeofday(), it still may be doing work that strictly
speaking doesn't need to happen; if the application doesn't need to
know exactly how many microseconds have gone by, but just whether or
not 150us has elapsed, why calculate the exact time? (Especially
if it requires using some ACPI interface...)

Secondly, it's different from a kernel-mediated scheduler timeslice
because the application needs to give up control only at certain
specifically defined stopping points (i.e., after copying a tiny
amount of live data in an incremental garbage collector design, or
after servicing a single X request, for example), and it may need to
do some cleanups. So it's often not possible to just say, well, put
it in its own thread, and let the scheduler handle it.

So maybe what we need is an interface where a particular memory
location gets incremented when a timeout has happened. It's probably
enough to say that each thread (task_struct) can have one of these
(another problem with using /dev/rtc and tying it directly to
interrupts is that what happens if two processes want to use this
facility?), and what hardware timer source gets used is hidden from
the user application. In fact, depending on the resolution which is
specified (i.e., 100's of microseconds versus 10's of milliseconds),
different hardware might get used; we should leave that up to the
kernel.

The other thing which would be nice is if the application could
specify whether it is interested in CPU time or wall clock time for
this timeout.

If we had such an interface, then the application would look like
this:

volatile int flag = 0;

register_timeout(&time_val, &flag);
while (work to do) {
	do_a_bit_of_work();
	if (flag)
		break;
}

Finally, a note about tickless designs. Very often such applications
don't need a constantly ticking design. For example, the X server
only needs to have the memory location incremented while it is
processing events; if the laptop is idle, there's no reason to have
the RTC generating interrupts and incrementing memory locations.
Similarly, the Metronome garbage collector would only need to poll to
see if the timeout has expired while the garbage collector is running,
which is _not_ all of the time.

Yes, you could use ioctl's to start and stop the RTC interrupt
handler, but that's just ugly, and points out that maybe the interface
should not be one of programming the RTC interrupt frequency directly,
but rather one of "increment this flag after X units of
(CPU/wallclock) time, and I don't care how it is implemented at the
hardware level."

Regards,

- Ted

[1] http://www.research.ibm.com/people/d/dfb/papers/Bacon03Metronome.pdf
"The Metronome: A Simpler Approach to Garbage Collection in Real-time
Systems", by David Bacon, Perry Cheng, and V.T. Rajan, Workshop on
Java Technologies for Real-Time and Embedded Systems (Catania, Sicily,
November 2003. (See also http://www.research.ibm.com/metronome)
Steven Rostedt
2006-07-28 13:33:26 UTC
Post by Theodore Tso
If we had such an interface, then the application would look like
volatile int flag = 0;
register_timeout(&time_val, &flag);
while (work to do) {
do_a_bit_of_work();
if (flag)
break;
}
This wouldn't work simply because the timeout would most likely be
implemented with an interrupt, and the address of flag is in userspace,
so the interrupt handler couldn't modify it (without doing some sort of
special handling, and thus slowing down what you want).

What you could have is this:

volatile int *flag;

register_timeout(&time_val, &flag);
while (work_to_do()) {
	do_a_bit_of_work();
	if (*flag)
		break;
}

Where the kernel would register a location to set a timeout with, and
the kernel would set up a flag for you and then map it into userspace.
Perhaps only allow one flag per task and place it as a field of the task
structure. There's no reason that the task's own task struct can't be
mapped read-only to user space, is there?

-- Steve
Theodore Tso
2006-07-28 14:52:10 UTC
Post by Steven Rostedt
volatile int *flag;
register_timeout(&time_val, &flag);
while (work_to_do()) {
do_a_bit_of_work();
if (*flag)
break;
}
Where the kernel would register a location to set a timeout with, and
the kernel would set up a flag for you and then map it into userspace.
Perhaps only allow one flag per task and place it as a field of the task
structure. There's no reason that the task's own task struct can't be
mapped read-only to user space, is there?
Good point, and limiting this facility to one such timeout per
task_struct seems like a reasonable restriction. The downsides I can
see about mapping the task's own task struct would be (a) a
potential security leak either now or in the future if some field in
the task_struct shouldn't be visible to a non-privileged userspace
program, and (b) exposing the task_struct might cause some (stupid)
programs to depend on the task_struct layout. Allocating an otherwise
empty 4k page just for this purpose wouldn't be all that horrible,
though, and would avoid these potential problems.

- Ted
Steven Rostedt
2006-07-28 15:05:16 UTC
Post by Theodore Tso
Good point, and limiting this facility to one such timeout per
task_struct seems like a reasonable restriction. The downsides I can
see about mapping the task's own task struct would be (a) a
potential security leak either now or in the future if some field in
the task_struct shouldn't be visible to a non-privileged userspace
program, and (b) exposing the task_struct might cause some (stupid)
programs to depend on the task_struct layout. Allocating an otherwise
empty 4k page just for this purpose wouldn't be all that horrible,
though, and would avoid these potential problems.
Actually, if you are going to map a page, then allow the user to do
PAGE_SIZE / sizeof(*flag) timers. That way the user gets a single page
mapped for this purpose, and can have multiple flags.

I would only limit it to one page though. Since this page cannot be
swapped out, if you allow for more than one page, a non-privileged user
can map in a bunch of non-swappable pages and might be able to perform a
DoS attack.

-- Steve
Alan Cox
2006-07-28 16:41:25 UTC
Post by Theodore Tso
Good point, and limiting this facility to one such timeout per
task_struct seems like a reasonable restriction.
Why is this any better than using a thread or signal handler? From the
implementation side it's certainly horrible - we will be trying to write
user pages from an IRQ event. Far better to let the existing thread code
deal with it.
Steven Rostedt
2006-07-28 16:44:49 UTC
Post by Alan Cox
Post by Theodore Tso
Good point, and limiting this facility to one such timeout per
task_struct seems like a reasonable restriction.
Why is this any better than using a thread or signal handler? From the
implementation side it's certainly horrible - we will be trying to write
user pages from an IRQ event. Far better to let the existing thread code
deal with it.
If the user page is special, in that it is really a kernel page mapped
to userspace, then making sure it doesn't disappear on the interrupt
isn't that difficult.

But for real-time applications, the signal handling has a huge latency,
whereas what Theodore wants to do is very lightweight, i.e. have a
high-prio task doing smaller tasks until a specific time that tells it
to stop. Having a signal would add latency to having that task
stop.

These little requests make sense really only in the real-time space.
The normal uses can get by with signals. But I will say, the normal
uses for computing these days are starting to want the real-time
powers. :)

-- Steve
Alan Cox
2006-07-28 20:01:58 UTC
Post by Steven Rostedt
But for real-time applications, the signal handling has a huge latency.
For real-time you want a thread. Our thread switching is extremely fast,
and threads, unlike signals, can have RT priorities of their own.
Steven Rostedt
2006-07-28 20:12:12 UTC
Post by Alan Cox
Post by Steven Rostedt
But for real-time applications, the signal handling has a huge latency.
For real-time you want a thread. Our thread switching is extremely fast,
and threads, unlike signals, can have RT priorities of their own.
You mean to have a thread that does a nanosleep till the expected
timeout, then writes some variable that the other high-prio thread can
see when the timeout has expired?

Hmm, so that register_timeout can be implemented with a thread that
does a nanosleep then updates the flag.

The only problem is that the thread needs to go up to a higher priority
(perhaps the highest), which means that this can only be implemented
with special capabilities. Then again, pretty much all RT tasks are
special, and usually run with privileged capabilities.


There's also something else that would be a nice addition to the kernel
API: a sleep and wakeup that is implemented without signals, similar to
what the kernel does with wake_up. That way you can sleep till another
process/thread is done with what it was doing and wake up the other task
when done, without the use of signals. Or is there something that
already does this?

-- Steve
Alan Cox
2006-07-28 20:36:02 UTC
Permalink
Post by Steven Rostedt
what the kernel does with wake_up. That way you can sleep till another
process/thread is done with what it was doing and wake up the other task
when done, without the use of signals. Or is there something that
already does this?
futex and sys5 semaphore both do this. The latter is very portable but a
bit less efficient.
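For the futex case the core of it is tiny (rough sketch using the raw
syscall; error handling and the retry loop around FUTEX_WAIT are
omitted):

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static int done;	/* shared between the two threads, initially 0 */

/* sleeper: block for as long as done is still 0 */
syscall(SYS_futex, &done, FUTEX_WAIT, 0, NULL, NULL, 0);

/* waker: flip the value, then wake one sleeper */
done = 1;
syscall(SYS_futex, &done, FUTEX_WAKE, 1, NULL, NULL, 0);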
Steven Rostedt
2006-07-28 20:31:16 UTC
Permalink
So what language does the above come from ;-) "Ar Gwe" "ysgrifennodd" ?
Post by Alan Cox
Post by Steven Rostedt
what the kernel does with wake_up. That way you can sleep till another
process/thread is done with what it was doing and wake up the other task
when done, without the use of signals. Or is there something that
already does this?
futex and sys5 semaphore both do this. The latter is very portable but a
bit less efficient.
A semaphore is a bit awkward for this (I have implemented it for this
type of purpose and it really feels like a hack).

How can this be implemented with a futex? Let me make another scenario:
a task is sleeping and needs to be woken when some other task needs it
(kind of like a kthread), but it doesn't know which task will wake it.
A futex is like a mutex in that it has one owner, so you can sleep till
the owner wakes you, but here you don't know who the owner is.

I really like the way the kernel has the wake_up_process function, and
it would be very handy to have even in user space. Right now the most
common way to do it is semaphores (yuck!) or signals. Both are very
heavy, and I don't really see why a new interface can't be introduced.
Yes, it breaks portability, but if it becomes a standard, then others
will port to it (and maybe it will become a new POSIX standard :)

-- Steve
H. Peter Anvin
2006-07-28 17:11:25 UTC
Permalink
Post by Theodore Tso
So maybe what we need is an interface where a particular memory
location gets incremented when a timeout has happened. It's probably
enough to say that each thread (task_struct) can have one of these
(another problem with using /dev/rtc and tying it directly to
interrupts is that what happens if two processes want to use this
facility?), and what hardware timer source gets used is hidden from
the user application. In fact, depending on the resolution which is
specified (i.e., 100's of microseconds versus 10's of milliseconds),
different hardware might get used; we should leave that up to the
kernel.
The other thing which would be nice is if the application could
specify whether it is interested in CPU time or wall clock time for
this timeout.
It seems to me that this still assumes that we need to take an interrupt
in the kernel. If so, why not just use the timers already present in
the kernel as opposed to polling gettimeofday()?

-hpa
Jim Gettys
2006-07-25 20:58:14 UTC
Permalink
Post by H. Peter Anvin
Post by H. Peter Anvin
The i386 vdso right now has only two entry points, as far as I can
tell: system call and signal return.
There is no reason it couldn't have more than that. A low-resolution
and a high-resolution gettimeofday might be a good idea.
-hpa
Post by Neil Horman
Agreed. How about we take the /dev/rtc patch now (since it's an added
feature that doesn't hurt anything if it's not used, as far as tickless
kernels go), and I'll start working on doing gettimeofday in the vdso for
arches other than x86_64. That will give the X guys what they wanted
until such time as all the other arches have a gettimeofday alternative
that doesn't require kernel traps.
Some of us want/need both tickless and smart scheduling in the X server.

To give a bit of data: on my machine, 1x1 rectangles can be drawn at
almost 2 million rectangles/second; 500x500 rectangles run at only
2800/second. You can see the tremendous variation (and this is on
accelerated hardware; the variation can in fact be much larger than
this, if the operation has to be done in software fallbacks).

This is why the X server needs to know the time so much, so cheaply; we
have to be able to tell how much time a given client has been using, and
it can't be computed from anything but the time; otherwise individual
clients can "starve" other clients, and interactive feel goes to pot.
- Jim
--
Jim Gettys
One Laptop Per Child
H. Peter Anvin
2006-07-25 21:04:31 UTC
Permalink
Post by Jim Gettys
Some of us want/need both tickless and smart scheduling in the X server.
To give a bit of data: on my machine, 1x1 rectangles can be drawn at
almost 2 million rectangles/second; 500x500 rectangles run at only
2800/second. You can see the tremendous variation (and this is on
accelerated hardware; the variation can in fact be much larger than
this, if the operation has to be done in software fallbacks).
This is why the X server needs to know the time so much, so cheaply; we
have to be able to tell how much time a given client has been using, and
it can't be computed from anything but the time; otherwise individual
clients can "starve" other clients, and interactive feel goes to pot.
- Jim
That's why I'm suggesting adding a cheap, possibly low-res, gettimeofday
virtual system call in case there is no way for the kernel to provide
userspace with a cheap full-resolution gettimeofday. Obviously, if a
high-quality gettimeofday is available, then they can be linked together
by the kernel.

-hpa
Jim Gettys
2006-07-25 21:14:47 UTC
Permalink
Post by H. Peter Anvin
That's why I'm suggesting adding a cheap, possibly low-res, gettimeofday
virtual system call in case there is no way for the kernel to provide
userspace with a cheap full-resolution gettimeofday. Obviously, if a
high-quality gettimeofday is available, then they can be linked together
by the kernel.
Low res is fine: X Timestamps are 1 millisecond values, and wrap after a
few hundred days. What we do care about is monotonically increasing
values (until it wraps). On machines of the past, this was very
convenient; we'd just store a 32 bit value for clients to read, and not
bother with locking. I guess these days, you'd at least have to protect
the store with a memory barrier, maybe....
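In sketch form (shared_page and new_ms are made-up names here, and
smp_wmb() stands in for whatever writer-side barrier is appropriate):

/* writer: a single aligned 32-bit store, atomic on sane CPUs */
shared_page->ms = new_ms;
smp_wmb();	/* make it visible before anything stored later */

/* client: a plain load, no lock needed */
unsigned int now = shared_page->ms;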

It was amusing years ago to find toolkit bugs after applications had
been up for that long (32 bits of milliseconds)... Yes, there are
applications and machines that stay up that long, really there are....

Regards,
- Jim
--
Jim Gettys
One Laptop Per Child
H. Peter Anvin
2006-07-25 21:18:32 UTC
Permalink
Post by Jim Gettys
Post by H. Peter Anvin
That's why I'm suggesting adding a cheap, possibly low-res, gettimeofday
virtual system call in case there is no way for the kernel to provide
userspace with a cheap full-resolution gettimeofday. Obviously, if a
high-quality gettimeofday is available, then they can be linked together
by the kernel.
Low res is fine: X Timestamps are 1 millisecond values, and wrap after a
few hundred days. What we do care about is monotonically increasing
values (until it wraps). On machines of the past, this was very
convenient; we'd just store a 32 bit value for clients to read, and not
bother with locking. I guess these days, you'd at least have to protect
the store with a memory barrier, maybe....
It was amusing years ago to find toolkit bugs after applications had
been up for that long (32 bits of milliseconds)... Yes, there are
applications and machines that stay up that long, really there are....
Do you need 1 ms resolution, or is 10 ms good enough?

-hpa
Jim Gettys
2006-07-25 21:39:01 UTC
Permalink
Keith's the expert (who wrote the smart scheduler): I'd take a wild ass
guess that 10ms is good enough.

Maybe people can keep him on the cc list this time...
- Jim
Post by H. Peter Anvin
Post by Jim Gettys
Post by H. Peter Anvin
That's why I'm suggesting adding a cheap, possibly low-res, gettimeofday
virtual system call in case there is no way for the kernel to provide
userspace with a cheap full-resolution gettimeofday. Obviously, if a
high-quality gettimeofday is available, then they can be linked together
by the kernel.
Low res is fine: X Timestamps are 1 millisecond values, and wrap after a
few hundred days. What we do care about is monotonically increasing
values (until it wraps). On machines of the past, this was very
convenient; we'd just store a 32 bit value for clients to read, and not
bother with locking. I guess these days, you'd at least have to protect
the store with a memory barrier, maybe....
It was amusing years ago to find toolkit bugs after applications had
been up for that long (32 bits of milliseconds)... Yes, there are
applications and machines that stay up that long, really there are....
Do you need 1 ms resolution, or is 10 ms good enough?
-hpa
--
Jim Gettys
One Laptop Per Child
Bill Huey (hui)
2006-07-29 04:28:20 UTC
Permalink
Post by Jim Gettys
Keith's the expert (who wrote the smart scheduler): I'd take a wild ass
guess that 10ms is good enough.
Maybe people can keep him on the cc list this time...
Not to poop on people's parade, but the last time I looked /dev/rtc was
a single instance device, right ? If this reasoning is true, then mplayer
and other apps that want to open it can't.

What's the story with this ?

bill
Neil Horman
2006-07-29 12:54:27 UTC
Permalink
Post by Bill Huey (hui)
Post by Jim Gettys
Keith's the expert (who wrote the smart scheduler): I'd take a wild ass
guess that 10ms is good enough.
Maybe people can keep him on the cc list this time...
Not to poop on people's parade, but the last time I looked /dev/rtc was
a single instance device, right ? If this reasoning is true, then mplayer
and other apps that want to open it can't.
What's the story with this ?
It's always been the case. The hardware can only support one timer (or
at least one timer period), and as such multiple users would interfere
with each other.

Regards
Neil
Post by Bill Huey (hui)
bill
Bill Huey (hui)
2006-07-29 20:41:07 UTC
Permalink
Post by Neil Horman
Post by Bill Huey (hui)
Not to poop on people's parade, but the last time I looked /dev/rtc was
a single instance device, right ? If this reasoning is true, then mplayer
and other apps that want to open it can't.
What's the story with this ?
It's always been the case. The hardware can only support one timer (or
at least one timer period), and as such multiple users would interfere
with each other.
Well, this points out a serious problem with doing an mmap extension to
/dev/rtc. It would be better to have a page mapped by another device like
/dev/jiffy_counter, or something like that rather than to overload the
/dev/rtc with that functionality. The semantics of this change are shady
in the first place, so you should consider that option. It basically means
that, if Xorg adopts this interface, no userspace applications can get at
and use the 'rtc' for any existing purposes with X running (mplayer and
friends).

It's not a really usable interface for Keith's purposes because of this,
and it is a serious problem. You might want to consider writing a special
device to do this, like I mentioned above, instead of an mmap extension to
'rtc'.

bill
Neil Horman
2006-07-29 21:43:34 UTC
Permalink
Post by Bill Huey (hui)
Post by Neil Horman
Post by Bill Huey (hui)
Not to poop on people's parade, but the last time I looked /dev/rtc was
a single instance device, right ? If this reasoning is true, then mplayer
and other apps that want to open it can't.
What's the story with this ?
It's always been the case. The hardware can only support one timer (or
at least one timer period), and as such multiple users would interfere
with each other.
Well, this points out a serious problem with doing an mmap extension to
/dev/rtc. It would be better to have a page mapped by another device like
Not really. The rtc driver can only have a single user regardless of
whether or not it has an mmap interface. Using mmap just provides
another method for accessing the rtc.
Post by Bill Huey (hui)
/dev/jiffy_counter, or something like that rather than to overload the
/dev/rtc with that functionality. The semantics of this change are shady
in the first place, so you should consider that option. It basically means
that, if Xorg adopts this interface, no userspace applications can get at
and use the 'rtc' for any existing purposes with X running (mplayer and
friends).
It's not a really usable interface for Keith's purposes because of this,
and it is a serious problem. You might want to consider writing a special
I think that was the consensus quite some time ago. :).
Post by Bill Huey (hui)
device to do this, like I mentioned above, instead of an mmap extension to
'rtc'.
Sure, an mmapped jiffy counter would certainly be useful. I think the only
thing left to be determined in this thread is whether adding mmap to the
rtc driver has any merit regardless of any potential users (iow, would
current users of /dev/rtc find it helpful to have the rtc driver provide
an mmap interface).

Regards
Neil
Post by Bill Huey (hui)
bill
Keith Packard
2006-07-29 22:45:19 UTC
Permalink
Post by Neil Horman
Sure, an mmapped jiffy counter would certainly be useful. I think the only
thing left to be determined in this thread is whether adding mmap to the
rtc driver has any merit regardless of any potential users (iow, would
current users of /dev/rtc find it helpful to have the rtc driver provide
an mmap interface).
A jiffy counter is sufficient for the X server; all I need is some
indication that time has passed with a resolution of 10 to 20 ms. I
check this after each X request is processed as that is the scheduling
granularity. An X request can range in time from .1us to 100 seconds, so
I really want to just check after each request rather than attempt some
heuristic.
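In loop form that check is just something like this (sketch; counter is
the mapped jiffy counter, and the helper name is made up):

/* after each request: has the counter advanced since this client's
 * slice began? */
if (*counter != slice_start) {
	schedule_next_client();	/* hypothetical scheduler hook */
	slice_start = *counter;
}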
--
***@intel.com
Edgar Toernig
2006-07-29 23:18:15 UTC
Permalink
Post by Keith Packard
A jiffy counter is sufficient for the X server; all I need is some
indication that time has passed with a resolution of 10 to 20 ms. I
check this after each X request is processed as that is the scheduling
granularity. An X request can range in time from .1us to 100 seconds, so
I really want to just check after each request rather than attempt some
heuristic.
That's exactly what the mmap interface of /dev/itimer does.
See: http://marc.theaimsgroup.com/?m=115412412427996

Example code:

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

volatile unsigned long *counter;
int fd;
...
fd = open("/dev/itimer", O_RDWR);
write(fd, "20/1000\n", 8);	/* interval = 20/1000 s, i.e. 20 ms */
counter = mmap(0, sizeof(*counter), PROT_READ, MAP_PRIVATE, fd, 0);
close(fd);

Now, "*counter" is incremented every 20 ms by the kernel.

Ciao, ET.
Edgar Toernig
2006-07-29 21:49:48 UTC
Permalink
Post by Bill Huey (hui)
Post by Neil Horman
It's always been the case. The hardware can only support one timer (or
at least one timer period), and as such multiple users would interfere
with each other.
Well, this points out a serious problem with doing an mmap extension to
/dev/rtc. It would be better to have a page mapped by another device like
/dev/jiffy_counter, or something like that rather than to overload the
/dev/rtc with that functionality.
You mean something like this, /dev/itimer?

http://marc.theaimsgroup.com/?m=115412412427996

Ciao, ET.
Bill Huey (hui)
2006-07-29 22:51:38 UTC
Permalink
Post by Edgar Toernig
Post by Bill Huey (hui)
Well, this points out a serious problem with doing an mmap extension to
/dev/rtc. It would be better to have a page mapped by another device like
/dev/jiffy_counter, or something like that rather than to overload the
/dev/rtc with that functionality.
You mean something like this, /dev/itimer?
http://marc.theaimsgroup.com/?m=115412412427996
[CCing Steve and Ingo on this thread]

It's a different topic than what Keith needs, but this is useful for another
set of purposes. It's something that's really useful in the RT patch since
there isn't a decent API to get at high resolution timers in userspace. What
you've written is something that I articulated to Steve Rostedt over a dinner
at OLS and is badly needed in the -rt patches IMO. I suggest targeting that
for some kind of inclusion in Ingo Molnar's patchset.

If itimer can be abstracted a bit so it serves more generically as a
bidirectional communication pipe, not just to a timer (although it's good
for now) but possibly to bandwidth scheduler policies as a backend, then
you have the possibility of this driver being a real winner. The blocking
read can be a yield to get information on soft overruns for that
allocation cycle, and the write can be an intelligent yield for when the
scheduling wheel wraps around, to soft-skip a cycle or something. It'll
depend on the semantics of the scheduling policy.

Your driver can be used, and extended, for many things that Linux
userspace doesn't have at this moment for proper RT programming, and I
suggest that you open up a discussion with Ingo and friends about it.

bill
Nicholas Miell
2006-07-29 23:35:51 UTC
Permalink
Post by Bill Huey (hui)
Post by Edgar Toernig
Post by Bill Huey (hui)
Well, this points out a serious problem with doing an mmap extension to
/dev/rtc. It would be better to have a page mapped by another device like
/dev/jiffy_counter, or something like that rather than to overload the
/dev/rtc with that functionality.
You mean something like this, /dev/itimer?
http://marc.theaimsgroup.com/?m=115412412427996
[CCing Steve and Ingo on this thread]
It's a different topic than what Keith needs, but this is useful for another
set of purposes. It's something that's really useful in the RT patch since
there isn't a decent API to get at high resolution timers in userspace. What
you've written is something that I articulated to Steve Rostedt over a dinner
at OLS and is badly needed in the -rt patches IMO. I suggest targeting that
for some kind of inclusion in Ingo Molnar's patchset.
Do you mind summarizing what's wrong with the existing interfaces for
those of us who didn't have the opportunity to join you for dinner at
OLS?
--
Nicholas Miell <***@comcast.net>
Bill Huey (hui)
2006-07-30 01:00:20 UTC
Permalink
Post by Nicholas Miell
Post by Bill Huey (hui)
[CCing Steve and Ingo on this thread]
It's a different topic than what Keith needs, but this is useful for another
set of purposes. It's something that's really useful in the RT patch since
there isn't a decent API to get at high resolution timers in userspace. What
you've written is something that I articulated to Steve Rostedt over a dinner
at OLS and is badly needed in the -rt patches IMO. I suggest targeting that
for some kind of inclusion in Ingo Molnar's patchset.
Do you mind summarizing what's wrong with the existing interfaces for
those of us who didn't have the opportunity to join you for dinner at
OLS?
Think edge-triggered versus level-triggered. Event interfaces in the Linux
kernel are sort of just that, edge-triggered events. What RT folks
generally want is control over scheduling policies over a particular time
period in relation to a scheduling policy. A general kernel event
interface isn't going to cut it for those purposes and wasn't designed to
deal with those cases in the first place.

bill
Nicholas Miell
2006-07-30 01:22:59 UTC
Permalink
Post by Bill Huey (hui)
Post by Nicholas Miell
Post by Bill Huey (hui)
[CCing Steve and Ingo on this thread]
It's a different topic than what Keith needs, but this is useful for another
set of purposes. It's something that's really useful in the RT patch since
there isn't a decent API to get at high resolution timers in userspace. What
you've written is something that I articulated to Steve Rostedt over a dinner
at OLS and is badly needed in the -rt patches IMO. I suggest targeting that
for some kind of inclusion in Ingo Molnar's patchset.
Do you mind summarizing what's wrong with the existing interfaces for
those of us who didn't have the opportunity to join you for dinner at
OLS?
Think edge-triggered versus level-triggered. Event interfaces in the Linux
kernel are sort of just that, edge-triggered events. What RT folks generally
want is control over scheduling policies over a particular time period in
relation to a scheduling policy. A general kernel event interface isn't
^ Did you mean to say timer here?
Post by Bill Huey (hui)
going to cut it for those purposes and wasn't designed to deal with those cases
in the first place.
So you're asking for an automatic (perhaps temporary) change in
scheduling policy when a particular timer expires (or perhaps on
occurrence of other types of events)?

I think Windows automatically boosts the priority of a thread when it
delivers an I/O completion notification, and I'm pretty sure that
Microsoft has a patent related to that.
--
Nicholas Miell <***@comcast.net>
Bill Huey (hui)
2006-07-30 01:39:36 UTC
Permalink
Post by Nicholas Miell
Post by Bill Huey (hui)
Think edge-triggered versus level-triggered. Event interfaces in the Linux
kernel are sort of just that, edge-triggered events. What RT folks generally
want is control over scheduling policies over a particular time period in
relation to a scheduling policy. A general kernel event interface isn't
^ Did you mean to say timer here?
No, I really meant scheduling.
Post by Nicholas Miell
Post by Bill Huey (hui)
going to cut it for those purposes and wasn't designed to deal with those cases
in the first place.
So you're asking for an automatic (perhaps temporary) change in
scheduling policy when a particular timer expires (or perhaps on
occurrence of other types of events)?
I think Windows automatically boosts the priority of a thread when it
delivers an I/O completion notification, and I'm pretty sure that
Microsoft has a patent related to that.
Na, different problem altogether. It's better that I shut up.

bill
Nicholas Miell
2006-07-30 02:02:07 UTC
Permalink
Post by Bill Huey (hui)
Post by Nicholas Miell
Post by Bill Huey (hui)
Think edge-triggered versus level-triggered. Event interfaces in the Linux
kernel are sort of just that, edge-triggered events. What RT folks generally
want is control over scheduling policies over a particular time period in
relation to a scheduling policy. A general kernel event interface isn't
^ Did you mean to say timer here?
No, I really meant scheduling.
OK, so what does control of a scheduling policy in relation to a
scheduling policy mean?
Post by Bill Huey (hui)
Post by Nicholas Miell
Post by Bill Huey (hui)
going to cut it for those purposes and wasn't designed to deal with those cases
in the first place.
So you're asking for an automatic (perhaps temporary) change in
scheduling policy when a particular timer expires (or perhaps on
occurrence of other types of events)?
I think Windows automatically boosts the priority of a thread when it
delivers an I/O completion notification, and I'm pretty sure that
Microsoft has a patent related to that.
Na, different problem altogether. It's better that I shut up.
I'm actually interested, and I imagine other people are too.
--
Nicholas Miell <***@comcast.net>
Theodore Tso
2006-07-30 14:33:42 UTC
Permalink
Post by Bill Huey (hui)
Post by Nicholas Miell
Post by Bill Huey (hui)
Think edge-triggered versus level-triggered. Event interfaces in the Linux
kernel are sort of just that, edge-triggered events. What RT folks generally
want is control over scheduling policies over a particular time period in
relation to a scheduling policy. A general kernel event interface isn't
^ Did you mean to say timer here?
No, I really meant scheduling.
Bill,

Do you mean frequency-based scheduling? This was mentioned, IIRC, in
Gallmeister's book (Programming for the Real World, a must-read for
those interested in Posix real-time interfaces) as a likely extension
to the SCHED_RR/SCHED_FIFO scheduling policies and future additions to
the struct sched_param used by sched_setparam() at some future point.


The basic idea here is that if you have some task which is cyclic in
nature, it might be useful to tell the scheduler that a particular
thread should be woken up at a specific cyclic time; that thread
promises it will only run for a certain amount of time, and before
that time expires, it will finish running. If it doesn't, this is
considered an overrun situation, and a number of different things can
happen at that point, from a signal which might or might not kill the
process to merely recording the fact that there was an overrun. It
would be possible to have soft and hard overrun limits, where you
record the number and total duration of soft overruns, and upon a
thread using up its promised time quantum plus the hard overrun
limit, it gets a signal.
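In sketch form, the extra sched_setparam() fields might look something
like this (purely illustrative; none of these fields exist today):

struct sched_param_fbs {		/* hypothetical extension */
	int sched_priority;
	struct timespec period;		/* wake the thread every 'period' */
	struct timespec budget;		/* promised maximum runtime per cycle */
	struct timespec soft_limit;	/* record overruns beyond this... */
	struct timespec hard_limit;	/* ...and signal beyond this */
};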

Since the scheduler knows when the cyclic tasks need to run, and how
much time they promise to take, in theory it might be able to do a
better job scheduling the threads, particularly if it knows that
certain threads can tolerate being scheduled earlier or later within
some time boundaries (which means even more fields in the struct
sched_param). At least, that's the theory. The exact semantics of
what would actually be useful to applications is, I believe, a little
unclear, and of course there is the question of whether there is
sufficient reason to try to do this as part of a system-wide
scheduler. Alternatively, it might be sufficient to do this sort of
thing at the application level across cooperating threads, in which
case it wouldn't be necessary to try to add this kind of complicated
scheduling gorp into the kernel.

In any case, I don't think this is particularly interesting to the X
folks, although there may very well be real-time applications that
would find this sort of thing useful.

Regards,

- Ted
Bill Huey (hui)
2006-07-30 22:20:52 UTC
Permalink
Post by Theodore Tso
Post by Bill Huey (hui)
No, I really meant scheduling.
Bill,
Do you mean frequency-based scheduling? This was mentioned, IIRC, in
Gallmeister's book (Programming for the Real World, a must-read for
those interested in Posix real-time interfaces) as a likely extension
to the SCHED_RR/SCHED_FIFO scheduling policies and future additions to
the struct sched_param used by sched_setparam() at some future point.
Yes, I did.
Post by Theodore Tso
sched_param). At least, that's the theory. The exact semantics of
what would actually be useful to applications is, I believe, a little
unclear, and of course there is the question of whether there is
It's really up to the RT application to decide what it really wants. The
role of the kernel is to give it what it has requested within reason.
Post by Theodore Tso
sufficient reason to try to do this as part of a system-wide
scheduler. Alternatively, it might be sufficient to do this sort of
In its basic form, yes. It's a complicated topic, and frequency-based
schedulers are only one type in a family of such schedulers. These kinds
of schedulers are still research-ish in nature, and there isn't yet a
real way of dealing with them effectively with regard to soft cycles.

The control parameters to these systems vary from algorithm to algorithm,
and they all have different control knobs outside of traditional Posix
APIs. People have written implementations based on EDF and such, but it
seems that folks could do a better job with scheduling decisions if you
had a thread-yield operation that was capable of telling the scheduler
policy what to do with the next cycle or chunk of time, especially for
softer periods that may give their own allocated cycle to another process
category. My suggestion was that a modified 'itimer' could cover the
semantic expression of these kinds of schedulers, and other kinds of CPU
bandwidth schedulers, as well as be a replacement for '/dev/rtc' if it
conformed to that device's API.

The 'rtc' case would be a "harder", with respect to time, expression of
those schedulers, since that driver doesn't understand soft execution
periods and its period of execution is strict. My terminology might need
to be updated or clarified, and I'm open to that from others.

A new 'itimer' device with an extended API could also synchronously listen
to certain interrupts and deliver them as latency-critical events. But
that's another big topic of discussion.
Post by Theodore Tso
thing at the application level across cooperating threads, in which
case it wouldn't be necessary to try to add this kind of complicated
scheduling gorp into the kernel.
Scheduling policies are limited in Linux and that's probably going to
have to change in the future because of the RT patch and Xen, etc... Xen
is going to need a gang scheduler (think sleeping on a spin lock in a guest
OS).
Post by Theodore Tso
In any case, I don't think this is particularly interesting to the X
folks, although there may very well be real-time applications that
would find this sort of thing useful.
Right, the original topic has shifted. It's more interesting to me now. :)

bill
Theodore Tso
2006-07-31 15:40:25 UTC
Permalink
Post by Bill Huey (hui)
Post by Theodore Tso
sched_param). At least, that's the theory. The exact semantics of
what would actually be useful to applications is, I believe, a little
unclear, and of course there is the question of whether there is
It's really up to the RT application to decide what it really wants. The
role of the kernel is to give it what it has requested within reason.
Yes, but what is somewhat unclear is what knobs/parameters should be
made available to the application so it can clearly express what it
wants (i.e., can the thread be woken up early? late? How much leeway
should be allowed in terms of early/late triggering of the thread?
How does the scheduling of these cyclic threads interact with
non-cyclic SCHED_FIFO/SCHED_RR threads at other priorities? etc.) As
you say, this has been a somewhat researchy subject, and everyone has
different control knobs....
Post by Bill Huey (hui)
Post by Theodore Tso
In any case, I don't think this is particularly interesting to the X
folks, although there may very well be real-time applications that
would find this sort of thing useful.
Right, the original topic has shifted. It's more interesting to me now. :)
Heh. OK, so what are your requirements for this sort of feature, and
which application writers would be able to contribute their wishlist
for frequency-based scheduling? I will say that the RTSJ document
does define Java interfaces for FBS, so that's one possible user of
such a feature. But I wouldn't want RTSJ to be the only thing
driving development of such a feature. (Also, I should mention that
it's not something we've been asked for up until now, so we haven't
paid much attention to it.)

- Ted
Edgar Toernig
2006-07-30 00:16:59 UTC
Permalink
Post by Bill Huey (hui)
Post by Edgar Toernig
You mean something like this, /dev/itimer?
http://marc.theaimsgroup.com/?m=115412412427996
[CCing Steve and Ingo on this thread]
It's a different topic than what Keith needs,
Hmm, actually, people with problems like Keith's are the target
audience, or at least were meant to be. See the mmap example
I posted in the original thread.
Post by Bill Huey (hui)
but this is useful for another set of purposes. It's something that's
really useful in the RT patch since there isn't a decent API to get at
high resolution timers in userspace.
The /dev/itimer wasn't meant for high resolution, only to be accurate
and reliable within the limits of the jiffy counter, and easy to use.
That doesn't mean it can't be improved to provide high resolution; only
that this wasn't the design goal. But I think the API is good enough to
provide high resolution at any time without changing userspace code.

(IMHO most people consider a resolution of 1 ms to be "high enough".)
Post by Bill Huey (hui)
If itimer can be abstracted a bit so it serves more generically as a
bidirectional communication pipe, not just to a timer (although it's good
for now) but possibly to bandwidth scheduler policies as a backend, then
you have the possibility of this driver being a real winner. The blocking
read can be a yield to get information on soft overruns for that
allocation cycle, and the write can be an intelligent yield for when the
scheduling wheel wraps around, to soft-skip a cycle or something. It'll
depend on the semantics of the scheduling policy.
Hm... I'm not sure what you mean. Sure, a blocking read may be a nice hint
to the scheduler because we know exactly how long we're gonna sleep. But
I think that a blocking read is used very seldom. Normally, the apps would
block via select/poll. And then the hints become looser - you only know
the latest time when the process definitely wants to run again.

Another scheduling hint could be the set interval. One could assume that
an app that sets an interval of 1/50th second does want to run regularly
every 1/50th second. But that may be hard to use for scheduling decisions,
especially when an app starts to use more than one timer.

Ciao, ET.
Bill Huey (hui)
2006-07-30 00:24:22 UTC
Permalink
Post by Edgar Toernig
Post by Bill Huey (hui)
It's a different topic than what Keith needs,
Hmm, actually, people with problems like Keith's are the target
audience, or at least were meant to be. See the mmap example
I posted in the original thread.
Post by Bill Huey (hui)
but this is useful for another set of purposes. It's something that's
really useful in the RT patch since there isn't a decent API to get at
high resolution timers in userspace.
The /dev/itimer wasn't meant for high resolution, only to be accurate
and reliable within the limits of the jiffy counter, and easy to use.
That doesn't mean it can't be improved to provide high resolution; only
that this wasn't the design goal. But I think the API is good enough to
provide high resolution at any time without changing userspace code.
(IMHO most people consider a resolution of 1 ms to be "high enough".)
Have you thought about making it an 'rtc' replacement and getting it to
conform to that API to whatever degree makes sense? Then it would be a
general replacement for 'rtc' if it could be opened multiple times (as
with generic event interfaces) with different timing scenarios per
thread.
Post by Edgar Toernig
Hm... I'm not sure what you mean. Sure, a blocking read may be a nice hint
to the scheduler because we know exactly how long we're gonna sleep. But
I think that a blocking read is used very seldom. Normally, the apps would
block via select/poll. And then the hints become looser - you only know
the latest time when the process definitely wants to run again.
Another scheduling hint could be the set interval. One could assume that
an app that sets an interval of 1/50th second does want to run regularly
every 1/50th second. But that may be hard to use for scheduling decisions,
especially when an app starts to use more than one timer.
Don't worry about what I just said, really. The fact that this driver
exists makes heavy modification of just about any sort possible.

bill
Thomas Gleixner
2006-07-29 14:02:50 UTC
Permalink
Post by Jim Gettys
Keith's the expert (who wrote the smart scheduler): I'd take a wild ass
guess that 10ms is good enough.
Maybe people can keep him on the cc list this time...
- Jim
I talked to Keith about this at OLS and we agreed that a coarse
counter (probably incremented along with jiffies) accessible via
a vsyscall would be enough. I'm looking into this in the next few days.

tglx
Martin J. Bligh
2006-07-26 13:17:57 UTC
Permalink
Post by Neil Horman
Agreed. How about we take the /dev/rtc patch now (since it's an added
feature that doesn't hurt anything if it's not used, as far as tickless
kernels go), and I'll start working on doing gettimeofday in the vdso for
arches other than x86_64. That will give the X guys what they wanted
until such time as all the other arches have a gettimeofday alternative
that doesn't require kernel traps.
The time lag involved in rolling X into a distro and releasing
it means that we don't really need short-term workarounds.
Introducing new userspace APIs is not something that
should be done casually.
john stultz
2006-08-02 03:54:19 UTC
Permalink
Post by Dave Airlie
I'm wondering why x86 doesn't have gettimeofday vDSO (does x86 have
proper vDSO support at all apart from sysenter?),
I know, I'm late to the party here. :)

Anyway, i386 doesn't have vDSO gettimeofday because it's always been too
messy to do. Now that the clocksource bits are in, we can start to
really work on it.

I just uploaded my C4 release of the timekeeping code here:
http://sr71.net/~jstultz/tod/broken-out/

If you grab the following patches:
linux-2.6.18-rc3_timeofday-vsyscall-support_C4.patch
linux-2.6.18-rc3_timeofday-i386-vsyscall_C4.patch

They should apply to the current -git tree and then you can use the
following test to see an LD_PRELOAD demo (as real support needs glibc
changes).

http://sr71.net/~jstultz/tod/vsyscall-gtod_test_C4.tar.bz2


Only lightly tested, so beware, and I've only added support so far for
the TSC (so don't be surprised if you don't see a performance
improvement if you're using a different clocksource).

thanks
-john
H. Peter Anvin
2006-08-02 04:26:23 UTC
Permalink
Post by john stultz
Only lightly tested, so beware, and I've only added support so far for
the TSC (so don't be surprised if you don't see a performance
improvement if you're using a different clocksource).
We should be able to use HPET in userspace, too.

-hpa
john stultz
2006-08-02 04:34:52 UTC
Permalink
Post by H. Peter Anvin
Post by john stultz
Only lightly tested, so beware, and I've only added support so far for
the TSC (so don't be surprised if you don't see a performance
improvement if you're using a different clocksource).
We should be able to use HPET in userspace, too.
Oh yes, HPET and Cyclone as well. It just requires mapping their mmio
page as user readable. I just haven't gotten to it yet. :)

-john

Segher Boessenkool
2006-07-25 23:26:07 UTC
Permalink
Post by Neil Horman
Post by Segher Boessenkool
But userland cannot know if there is a more efficient option to
use than this /dev/rtc way, without using VDSO/vsyscall.
Sure, but detecting if /dev/rtc via mmap is faster than gettimeofday
is an orthogonal issue to having the choice in the first place.
No it's not. Userland cannot detect things it doesn't know
about, and then when there is a great choice, it won't see it,
and use the 6000kW solution (or any other really bad thing)
instead.

Using the old legacy stuff when there's nothing better around
is a fine idea; please just implement an x86 VDSO that does just
that. x86 is what you care about IIUC. Don't saddle up non-x86
systems that just happen to have a legacy RTC around, and perhaps
x86 systems that don't sanely expose their better interfaces, with
this quite suboptimal solution for years to come.


Segher
Neil Horman
2006-07-26 00:10:42 UTC
Permalink
Post by Segher Boessenkool
Post by Neil Horman
Post by Segher Boessenkool
But userland cannot know if there is a more efficient option to
use than this /dev/rtc way, without using VDSO/vsyscall.
Sure, but detecting if /dev/rtc via mmap is faster than gettimeofday
is an orthogonal issue to having the choice in the first place.
No it's not. Userland cannot detect things it doesn't know
about, and then when there is a great choice, it won't see it,
and use the 6000kW solution (or any other really bad thing)
instead.
You're right, it won't be easy for an application to detect if gettimeofday uses
a vdso that is more lightweight than a regular syscall, but it can measure how
much cpu a periodic call to gettimeofday uses vs. how much cpu a periodic rtc
interrupt uses. It can use that information to make an informed decision about
which interface to use. Alternatively, a package can be built with sane
defaults in mind (always use RTC vs. always use gettimeofday).
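For the measurement side, something as dumb as this would do (rough
sketch; it just times a million gettimeofday() calls):

#include <stdio.h>
#include <sys/time.h>

int main(void)
{
	struct timeval start, end, tmp;
	long i, us;

	gettimeofday(&start, NULL);
	for (i = 0; i < 1000000; i++)
		gettimeofday(&tmp, NULL);
	gettimeofday(&end, NULL);

	us = (end.tv_sec - start.tv_sec) * 1000000L
	     + (end.tv_usec - start.tv_usec);
	printf("%ld us for 1M gettimeofday calls\n", us);
	return 0;
}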
Post by Segher Boessenkool
Using the old legacy stuff when there's nothing better around
is a fine idea; please just implement an x86 VDSO that does just
that. x86 is what you care about IIUC. Don't saddle up non-x86
systems that just happen to have a legacy RTC around, and perhaps
x86 systems that don't sanely expose their better interfaces, with
this quite suboptimal solution for years to come.
Yes, I intend to (I've got a steep learning curve, since I've not worked much
with glibc, and I've never implemented a vdso call before), but I think that's a
great idea. My point is, why not have both interfaces available? That way,
implementations which can't do any better via a vdso call can still get a
speedup through the legacy interface.

Neil
Post by Segher Boessenkool
Segher
Paul Mackerras
2006-07-25 20:03:59 UTC
Permalink
Post by H. Peter Anvin
Not really. This introduces a potentially very difficult support
user-visible interface. Consider a tickless kernel -- you might end up
taking tick interrupts ONLY to update this page, since you don't have
any way of knowing when userspace wants to look at it.
It's not that bad; if userspace is running, the cpu isn't idle, so
there isn't the motivation to go tickless on that cpu. In other
words, if every cpu has suspended ticks, then no cpu can be running
stuff that needs to look at this page.

Paul.
Segher Boessenkool
2006-07-25 23:27:46 UTC
Permalink
Post by Paul Mackerras
It's not that bad; if userspace is running, the cpu isn't idle, so
there isn't the motivation to go tickless on that cpu. In other
words, if every cpu has suspended ticks, then no cpu can be running
stuff that needs to look at this page.
If I read the patch correctly, none of those legacy RTC ticks
can ever be suspended though?


Segher
Neil Horman
2006-07-26 00:06:12 UTC
Permalink
Post by Segher Boessenkool
Post by Paul Mackerras
It's not that bad; if userspace is running, the cpu isn't idle, so
there isn't the motivation to go tickless on that cpu. In other
words, if every cpu has suspended ticks, then no cpu can be running
stuff that needs to look at this page.
If I read the patch correctly, none of those legacy RTC ticks
can ever be suspended though?
Of course they can. See rtc_do_ioctl, specifically the RTC_UIE_OFF and
RTC_PIE_OFF cases.
Neil
Post by Segher Boessenkool
Segher
Jim Gettys
2006-07-25 18:00:31 UTC
Permalink
Actually, it was Keith Packard who asked for this (and we've asked for
it before).

I will note that, if my memory serves me right, the first X driver we
ever did (1984) had this feature.

Regards,
- Jim

("X is an exercise in avoiding system calls.")

P.S. my name is spelled "Gettys".
--
Jim Gettys
One Laptop Per Child
Neil Horman
2006-07-25 18:17:33 UTC
Permalink
Post by Jim Gettys
Actually, it was Keith Packard who asked for this (and we've asked for
it before in the past).
I will note, that if my memory serves me right, the first X driver we
ever did (1984) had this feature.
Regards,
- Jim
("X is an exercise in avoiding system calls.")
P.S. my name is spelled "Gettys".
Sorry, my bad (fat fingers).
Neil
--
/***************************************************
*Neil Horman
*Software Engineer
*gpg keyid: 1024D / 0x92A74FA1 - http://pgp.mit.edu
***************************************************/
Brown, Len
2006-07-26 15:15:16 UTC
Permalink
Post by Theodore Tso
if the application doesn't need to
know exactly how many microseconds have gone by, but just whether or
not 150us has elapsed, why calculate the necessary time? (Especially
if it requires using some ACPI interface...)
Yes, ACPI is involved in the boot-time enumeration of various timers
and counters. But at run-time, the use of any and all of them
(including the PM_TIMER supplied by ACPI hardware itself) could/should
appear generic to kernel users, who should not have to directly call
any routine with an "acpi" in it.

I believe that this is true today, and can/should stay true.

-Len
Andi Kleen
2006-07-26 15:16:53 UTC
Permalink
Post by Neil Horman
At OLS last week, During Dave Jones Userspace Sucks presentation, Jim
Geddys and some of the Xorg guys noted that they would be able to stop using gettimeofday
so frequently, if they had some other way to get a millisecond resolution timer
in userspace,
No, no, it's wrong. They should use gettimeofday and the kernel's job
is to make it fast enough that they can.

Or rather they likely shouldn't use gettimeofday, but clock_gettime()
with CLOCK_MONOTONIC instead to be independent of someone setting the
clock back.

Memory mapped counters are generally not flexible enough and there
are lots of reasons why the kernel might need to do special things
for time keeping. Don't expose them.

-Andi
Jim Gettys
2006-07-26 17:25:09 UTC
Permalink
Post by Andi Kleen
Post by Neil Horman
At OLS last week, During Dave Jones Userspace Sucks presentation, Jim
Geddys and some of the Xorg guys noted that they would be able to stop using gettimeofday
so frequently, if they had some other way to get a millisecond resolution timer
in userspace,
I agree with Andi here.
Post by Andi Kleen
No, no, it's wrong. They should use gettimeofday and the kernel's job
is to make it fast enough that they can.
Exactly. On modern machines, doing a procedure call to get the time (as
opposed to a system trap) is, I suspect, very tolerable. And who knows,
maybe a smart compiler inlines the procedure so it optimizes to just a
few instructions.

If behind the scenes there is a mapped page that is used to convey this
information efficiently, that's fine.

But I don't think it should be the application programmer's
responsibility to know of hackish solutions of mmapping particular
devices on particular OS hardware or software platforms. That's a
symptom of the disease, rather than a clean solution.
Post by Andi Kleen
Or rather they likely shouldn't use gettimeofday, but clock_gettime()
with CLOCK_MONOTONIC instead to be independent of someone setting the
clock back.
Turns out we already have code to handle the turn back case, but
monotonically increasing time is generally appreciated ;-).
Post by Andi Kleen
Memory mapped counters are generally not flexible enough and there
are lots of reasons why the kernel might need to do special things
for time keeping. Don't expose them.
Yup. I agree entirely.
Post by Andi Kleen
-Andi
--
Jim Gettys
One Laptop Per Child
Paul Mackerras
2006-07-27 23:53:49 UTC
Permalink
Post by Andi Kleen
No, no, it's wrong. They should use gettimeofday and the kernel's job
is to make it fast enough that they can.
Not necessarily - maybe gettimeofday's seconds + microseconds
representation is awkward for them to use, and some other kernel
interface would be more efficient for them to use, while being as easy
or easier for the kernel to compute. Jim, was that your point?

Paul.
Jim Gettys
2006-07-28 03:29:48 UTC
Permalink
The only awkward thing about the current interfaces is that you have to
go from seconds and microseconds to milliseconds, but really only when
you present time to X clients, which requires a bit of 64-bit math...
It is true that since you have two values in the timeval structure, the
update might require some sort of locking, which could be a performance
loss; but there are simple solutions to that (e.g. a ring representation
where you rely on the store of an index value being atomic, increment
the index only after updating both values, and use just a memory barrier
rather than full locks). Those implementation tricks should be hidden
behind an interface, and not exposed to application programmers.
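Concretely, the ring trick is something like this (sketch only; new_time
stands for the fresh reading, and smp_wmb() is the kernel-side barrier):

#define RING_SLOTS 16

struct timeval ring[RING_SLOTS];
volatile unsigned int latest;		/* index of the newest valid slot */
struct timeval new_time;		/* the fresh reading, filled elsewhere */

/* writer: fill the next slot, then publish its index */
unsigned int next = (latest + 1) % RING_SLOTS;
ring[next] = new_time;
smp_wmb();				/* slot visible before the index */
latest = next;

/* reader: one atomic index load, then read that slot */
unsigned int i = latest;
struct timeval now = ring[i];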

In theory, that conversion to milliseconds only actually has to be done
if the time is (significantly) different.

I can't foresee that this is a big deal on (most of) today's machines.
Last I looked, the CPU runs at the same speed in kernel mode as in user
mode ;-).

On the other hand, the idea of a one-off, Linux-specific "oh, there is
this magic file you mmap, and then you can poke at a magic location"
strikes me as a hack, and Linux would be better off spending the same
effort to speed up the general interface (which might very well do this
mmap trick behind the scenes, as far as I'm concerned).

The difference is that one is a standard, well-known interface, which to
an application programmer has very well-defined semantics; the other, to
be honest, is a kludge, which may expose applications to too many details
of the hardware. For example, exact issues of cache coherency and
memory barriers differ between machines.
Regards,
- Jim


If it's to be a kludge, it might as well be an X driver kludge (which is
where we put it in the '80s).
Post by Paul Mackerras
Post by Andi Kleen
No, no, it's wrong. They should use gettimeofday and the kernel's job
is to make it fast enough that they can.
Not necessarily - maybe gettimeofday's seconds + microseconds
representation is awkward for them to use, and some other kernel
interface would be more efficient for them to use, while being as easy
or easier for the kernel to compute. Jim, was that your point?
Paul.
--
Jim Gettys
One Laptop Per Child
Neil Horman
2006-07-28 11:59:56 UTC
Permalink
Post by Jim Gettys
The only awkward thing about the current interfaces is that you have to
go from seconds and microseconds to milliseconds, but really only when
you present time to X clients, which requires a bit of 64-bit math...
It is true that since you have two values in the timeval structure, the
update might require some sort of locking, which could be a performance
loss; but there are simple solutions to that (e.g. a ring representation
where you rely on the store of an index value being atomic, increment
the index only after updating both values, and use just a memory barrier
rather than full locks). Those implementation tricks should be hidden
behind an interface, and not exposed to application programmers.
In theory, that conversion to milliseconds only actually has to be done
if the time is (significantly) different.
I can't forsee that this is a big deal on (most of) today's machines.
Last I looked, the CPU runs the same speed in kernel mode as user
mode ;-).
On the other hand, the idea of a one off Linux specific "oh, there is
this magic file you mmap, and then you can poke at a magic location",
strikes me as a one-off hack, and that Linux would be better off
spending the same effort to speed up the general interface (which might
very well do this mmap trick trick behind the scenes, as far as I'm
concerned).
The difference is one is a standard, well known interface, which to an
application programmer has very well defined semantics; the other, to be
honest, is a kludge, which may expose applications to too many details
of the hardware. For example, exact issues of cache coherency and
memory barriers differ between machines.
Regards,
- Jim
If it's to be a kludge, it might as well be a X driver kludge (which is
where we put it in the '80's).
So, setting aside for the moment any potential usefulness to X, what about the
same question in the general sense? Is this a useful interface to add to the
rtc driver in general, without consideration for what applications might use it?

Neil
Post by Jim Gettys
Post by Paul Mackerras
Post by Andi Kleen
No, no, it's wrong. They should use gettimeofday and the kernel's job
is to make it fast enough that they can.
Not necessarily - maybe gettimeofday's seconds + microseconds
representation is awkward for them to use, and some other kernel
interface would be more efficient for them to use, while being as easy
or easier for the kernel to compute. Jim, was that your point?
Paul.
--
Jim Gettys
One Laptop Per Child
--
/***************************************************
*Neil Horman
*Software Engineer
*gpg keyid: 1024D / 0x92A74FA1 - http://pgp.mit.edu
***************************************************/
l***@horizon.com
2006-07-29 06:56:12 UTC
Permalink
Post by Theodore Tso
If we had such an interface, then the application would look like:

	volatile int flag = 0;

	register_timeout(&time_val, &flag);
	while (work to do) {
		do_a_bit_of_work();
		if (flag)
			break;
	}
Finally, a note about tickless designs. Very often such applications
don't need a constantly ticking design. For example, the X server
only needs to have the memory location incremented while it is
processing events; if the laptop is idle, there's no reason to have
the RTC generating interrupts and incrementing memory locations.
Similarly, the Metronome garbage collector would only need to poll to
see if the timeout has expired while the garbage collector is running,
which is _not_ all of the time.
Yes, you could use ioctls to start and stop the RTC interrupt
handler, but that's just ugly, and points out that maybe the interface
should not be one of programming the RTC interrupt frequency directly,
but rather one of "increment this flag after X units of
(CPU/wallclock) time, and I don't care how it is implemented at the
hardware level."
Actually, unless you want the kernel to have to poll the timeout_flag
periodically, it's more like:

	volatile bool timeout_flag = false, armed_flag = false;

	register_timeout(&time_val, &timeout_flag);
	while (work to do) {
		if (!armed_flag) {
			rearm_timeout();
			armed_flag = true;
		}
		do_a_bit_of_work();
		if (timeout_flag) {
			armed_flag = false;
			timeout_flag = false;
			break;
		}
	}

Personally, I use setitimer() for this. You can maintain the flags in
software and be slightly lazy about disarming it. If you get a signal
while you shouldn't be armed, *then* disarm the timer in the kernel.
Likewise, when rearming, set the user-level armed flag and check
whether kernel-level rearming is required.

volatile bool timeout_flag = false, armed_flag = false, sys_armed_flag = false;

void
sigalrm(int sig)
{
	(void)sig;
	if (!armed_flag) {
		/* User level is disarmed: lazily disarm the kernel timer. */
		static const struct itimerval it_zero = {{0, 0}, {0, 0}};

		if (!sys_armed_flag)
			warn_unexpected_sigalrm();
		setitimer(ITIMER_REAL, &it_zero, 0);
		sys_armed_flag = false;
	} else if (timeout_flag)
		warn_gc_is_slow();
	else
		timeout_flag = true;
}

void
arm_timer(void)
{
	static const struct itimerval it_interval = { time_val, time_val };

	armed_flag = true;
	if (!sys_armed_flag) {
		setitimer(ITIMER_REAL, &it_interval, 0);
		sys_armed_flag = true;
	}
}

void
main_loop(void)
{
	signal(SIGALRM, sigalrm);

	while (work to do) {
		arm_timer();
		do_a_bit_of_work();
		if (timeout_flag) {
			gc();
			armed_flag = false;
			timeout_flag = false;
		}
	}
}

... where only do_a_bit_of_work can prompt the need for more gc() calls.
This really tries to minimize the number of system calls.
Robert Hancock
2006-07-29 18:29:46 UTC
Permalink
Post by Steven Rostedt
There's also something else that would be a nice addition to the kernel
API. A sleep and wakeup that is implemented without signals. Similar to
what the kernel does with wake_up. That way you can sleep till another
process/thread is done with what it was doing and wake up the other task
when done, without the use of signals. Or is there something that
already does this?
Between threads, this is what the POSIX pthreads API is for
(pthread_cond_wait and friends), which is implemented using futexes
these days.
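
For example, a minimal sketch of that pattern (names hypothetical,
error checking omitted):

	#include <pthread.h>

	static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
	static int done;

	/* Sleeper: block until the other thread says it has finished. */
	void wait_for_done(void)
	{
		pthread_mutex_lock(&lock);
		while (!done)	/* loop guards against spurious wakeups */
			pthread_cond_wait(&done_cond, &lock);
		pthread_mutex_unlock(&lock);
	}

	/* Waker: set the predicate, then wake the sleeper. */
	void mark_done(void)
	{
		pthread_mutex_lock(&lock);
		done = 1;
		pthread_cond_signal(&done_cond);
		pthread_mutex_unlock(&lock);
	}

No signals are involved; the kernel only gets a futex wait/wake under
the covers.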
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from ***@nospamshaw.ca
Home Page: http://www.roberthancock.com/