Discussion:
[PATCH] x86: ptrace: sign-extend eax with orig_eax>=0
Roland McGrath
2009-09-23 00:46:51 UTC
Permalink
The 32-bit ptrace syscall on a 64-bit kernel (32-bit debugger on
32-bit task) behaves differently than a native 32-bit kernel. When
setting a register state of orig_eax>=0 and eax=-ERESTART* when the
debugged task is NOT on its way out of a 32-bit syscall, the task will
fail to do the syscall restart logic that it should do.

Test case available at http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/erestartsys-trap.c?cvsroot=systemtap

This happens because the 32-bit ptrace syscall sets eax=0xffffffff
when it sets orig_eax>=0. The resuming task will not sign-extend this
for the -ERESTART* check because TS_COMPAT is not set. (So the task
thinks it is restarting after a 64-bit syscall, not a 32-bit one.)

The fix is to have 32-bit ptrace calls sign-extend eax when orig_eax>=0.
The long comment in the change explains the scenarios and caveats fully.

Reported-by: ***@redhat.com
Signed-off-by: Roland McGrath <***@redhat.com>
Reviewed-by: Oleg Nesterov <***@redhat.com>
---
arch/x86/kernel/ptrace.c | 55 +++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 54 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 8d7d5c9..ecb7a49 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1120,7 +1120,6 @@ static int putreg32(struct task_struct *child, unsigned regno, u32 value)
R32(edi, di);
R32(esi, si);
R32(ebp, bp);
- R32(eax, ax);
R32(eip, ip);
R32(esp, sp);

@@ -1130,6 +1129,60 @@ static int putreg32(struct task_struct *child, unsigned regno, u32 value)
* causes (long)orig_ax < 0 tests to fire correctly.
*/
regs->orig_ax = (long) (s32) value;
+
+ /*
+ * Whenever setting orig_eax to indicate a system call in
+ * progress, make sure an eax value set by the debugger gets
+ * sign-extended so that any ax == -ERESTART* tests fire
+ * correctly.
+ *
+ * When those tests (in handle_signal) are done directly
+ * after an actual 32-bit syscall, then TS_COMPAT is set and
+ * so syscall_get_error() does sign-extension. However, the
+ * debugger sometimes saves that state and then restores it
+ * later with the intent of picking up the old thread state
+ * that can be about to do syscall restart.
+ *
+ * When it's a 32-bit debugger, that truncates ax to 32 bits.
+ * If the debugger restores thread state and resumes after a
+ * ptrace stop when the child was not doing a new syscall, it
+ * will not have TS_COMPAT set to make syscall_get_error()
+ * notice and do the sign-extension.
+ *
+ * We can't have syscall_get_error() always sign-extend,
+ * since that's wrong for 64-bit syscalls. We want it to
+ * check TS_COMPAT rather than TIF_IA32 to avoid a false
+ * positive in the oddball case of a 32-bit task doing a
+ * syscall from a 64-bit code segment. In the "restored
+ * thread state" case, it has no way to know whether the
+ * restored state refers to a 32-bit or 64-bit syscall.
+ *
+ * So we can't win 'em all. We assume that if you are using
+ * a 32-bit debugger, you don't really care about arcane
+ * interference with a child trying to use 64-bit syscalls.
+ * (Just use a 64-bit debugger on it instead!) What we do
+ * here makes a 32-bit debugger fiddling a 32-bit task
+ * consistent with what happens on a native 32-bit kernel.
+ *
+ * NOTE! Since we have no similar logic in putreg(), we
+ * just expect a 64-bit debugger to save/restore the full
+ * 64 bits. If a 64-bit debugger were to treat a 32-bit
+ * task differently and save/restore only 32 bits per
+ * register, it would have to grok orig_eax >= 0 and know
+ * to sign-extend its saved eax when setting it as 64 bits.
+ */
+ if (regs->orig_ax >= 0)
+ regs->ax = (long) (s32) regs->ax;
+ break;
+
+ case offsetof(struct user32, regs.eax):
+ /*
+ * As above, for either order of setting both ax and orig_ax.
+ */
+ if (regs->orig_ax >= 0)
+ regs->ax = (long) (s32) value;
+ else
+ regs->ax = value;
break;

case offsetof(struct user32, regs.eflags):
Linus Torvalds
2009-09-23 01:31:16 UTC
Permalink
This post might be inappropriate. Click to display it.
Roland McGrath
2009-09-23 02:40:49 UTC
Permalink
This post might be inappropriate. Click to display it.
Roland McGrath
2009-09-23 06:18:34 UTC
Permalink
Here is the aforementioned other tack on this.

I said earlier that getreg() (i.e. 64-bit ptrace/core-dump fetches)
should sign-extend the low 32 bits of orig_ax up. But I've changed my
mind. It's true that today you can store 0xffffffff via either 64-bit
ptrace or 32-bit ptrace and then read back -1 via 64-bit ptrace. (This
wasn't always so, and so we can hope that no debugger really depends on
it.) What seems more important is that tracing and core dumps correctly
show the full orig_ax value incoming in %rax from userland, since
%rax=__NR_foo|(1UL<<32) behaves differently (i.e. -ENOSYS) than
%eax=_NR_foo in actual fact when user-mode does "syscall" with those
values. In a bogon case like that, you would like to have traces/dumps
tell you why the task is not making a proper syscall rather than lie
about what register bits it entered the kernel with.

Patches 1-3 change no ptrace-tests outcomes, i.e. don't regress on the
test cases that went with the original sign-extension changes. They
reintroduce e.g. the ability to blindly read and write back the whole
regset when at syscall-entry tracing with %rax=__NR_foo|(1UL<<32) and
have that fail with -ENOSYS as it would without tracing rather than
perturb the tracee to call sys_foo instead. (Not that this is useful.)

Patch 4 does Linus's fix for the outstanding bug. I've verified it works.


Thanks,
Roland
---
The following changes since commit 7fa07729e439a6184bd824746d06a49cca553f15:
Linus Torvalds (1):
Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/.../tip/linux-2.6-tip

are available in the git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland.git x86/orig_ax

Roland McGrath (4):
asm-generic: syscall_get_nr returns int
x86: syscall_get_nr returns int
x86: ptrace: do not sign-extend orig_ax on write
x86: ptrace: set TS_COMPAT when 32-bit ptrace sets orig_eax>=0

arch/x86/include/asm/syscall.h | 14 +++++++-------
arch/x86/kernel/ptrace.c | 21 ++++++++-------------
include/asm-generic/syscall.h | 8 ++++++--
3 files changed, 21 insertions(+), 22 deletions(-)
Roland McGrath
2009-09-23 06:19:49 UTC
Permalink
Only 32 bits of system call number are meaningful, so make the
specification for syscall_get_nr() be to return int, not long.

Signed-off-by: Roland McGrath <***@redhat.com>
---
include/asm-generic/syscall.h | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/syscall.h b/include/asm-generic/syscall.h
index ea8087b..5c122ae 100644
--- a/include/asm-generic/syscall.h
+++ b/include/asm-generic/syscall.h
@@ -1,7 +1,7 @@
/*
* Access to user system call parameters and results
*
- * Copyright (C) 2008 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2008-2009 Red Hat, Inc. All rights reserved.
*
* This copyrighted material is made available to anyone wishing to use,
* modify, copy, or redistribute it subject to the terms and conditions
@@ -32,9 +32,13 @@ struct pt_regs;
* If @task is not executing a system call, i.e. it's blocked
* inside the kernel for a fault or signal, returns -1.
*
+ * Note this returns int even on 64-bit machines. Only 32 bits of
+ * system call number can be meaningful. If the actual arch value
+ * is 64 bits, this truncates to 32 bits so 0xffffffff means -1.
+ *
* It's only valid to call this when @task is known to be blocked.
*/
-long syscall_get_nr(struct task_struct *task, struct pt_regs *regs);
+int syscall_get_nr(struct task_struct *task, struct pt_regs *regs);

/**
* syscall_rollback - roll back registers after an aborted system call
Roland McGrath
2009-09-23 06:20:15 UTC
Permalink
Make syscall_get_nr() return int, so we always sign-extend
the low 32 bits of orig_ax in checks.

Signed-off-by: Roland McGrath <***@redhat.com>
---
arch/x86/include/asm/syscall.h | 14 +++++++-------
1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index d82f39b..8d33bc5 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -1,7 +1,7 @@
/*
* Access to user system call parameters and results
*
- * Copyright (C) 2008 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2008-2009 Red Hat, Inc. All rights reserved.
*
* This copyrighted material is made available to anyone wishing to use,
* modify, copy, or redistribute it subject to the terms and conditions
@@ -16,13 +16,13 @@
#include <linux/sched.h>
#include <linux/err.h>

-static inline long syscall_get_nr(struct task_struct *task,
- struct pt_regs *regs)
+/*
+ * Only the low 32 bits of orig_ax are meaningful, so we return int.
+ * This importantly ignores the high bits on 64-bit, so comparisons
+ * sign-extend the low 32 bits.
+ */
+static inline int syscall_get_nr(struct task_struct *task, struct pt_regs *regs)
{
- /*
- * We always sign-extend a -1 value being set here,
- * so this is always either -1L or a syscall number.
- */
return regs->orig_ax;
}
Roland McGrath
2009-09-23 06:20:45 UTC
Permalink
The high 32 bits of orig_ax will be ignored when it matters,
so don't fiddle them when setting it.

Signed-off-by: Roland McGrath <***@redhat.com>
---
arch/x86/kernel/ptrace.c | 19 +------------------
1 files changed, 1 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 8d7d5c9..52222fa 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -325,16 +325,6 @@ static int putreg(struct task_struct *child,
return set_flags(child, value);

#ifdef CONFIG_X86_64
- /*
- * Orig_ax is really just a flag with small positive and
- * negative values, so make sure to always sign-extend it
- * from 32 bits so that it works correctly regardless of
- * whether we come from a 32-bit environment or not.
- */
- case offsetof(struct user_regs_struct, orig_ax):
- value = (long) (s32) value;
- break;
-
case offsetof(struct user_regs_struct,fs_base):
if (value >= TASK_SIZE_OF(child))
return -EIO;
@@ -1121,17 +1111,10 @@ static int putreg32(struct task_struct *child, unsigned regno, u32 value)
R32(esi, si);
R32(ebp, bp);
R32(eax, ax);
+ R32(orig_eax, orig_ax);
R32(eip, ip);
R32(esp, sp);

- case offsetof(struct user32, regs.orig_eax):
- /*
- * Sign-extend the value so that orig_eax = -1
- * causes (long)orig_ax < 0 tests to fire correctly.
- */
- regs->orig_ax = (long) (s32) value;
- break;
-
case offsetof(struct user32, regs.eflags):
return set_flags(child, value);
Roland McGrath
2009-09-23 06:21:15 UTC
Permalink
The 32-bit ptrace syscall on a 64-bit kernel (32-bit debugger on
32-bit task) behaves differently than a native 32-bit kernel. When
setting a register state of orig_eax>=0 and eax=-ERESTART* when the
debugged task is NOT on its way out of a 32-bit syscall, the task will
fail to do the syscall restart logic that it should do.

Test case available at http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/erestartsys-trap.c?cvsroot=systemtap

This happens because the 32-bit ptrace syscall sets eax=0xffffffff
when it sets orig_eax>=0. The resuming task will not sign-extend this
for the -ERESTART* check because TS_COMPAT is not set. (So the task
thinks it is restarting after a 64-bit syscall, not a 32-bit one.)

The fix is to have 32-bit ptrace calls set TS_COMPAT when setting
orig_eax>=0. This ensures that the 32-bit syscall restart logic
will apply when the child resumes.

Signed-off-by: Roland McGrath <***@redhat.com>
---
arch/x86/kernel/ptrace.c | 14 +++++++++++++-
1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 52222fa..7b058a2 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1111,10 +1111,22 @@ static int putreg32(struct task_struct *child, unsigned regno, u32 value)
R32(esi, si);
R32(ebp, bp);
R32(eax, ax);
- R32(orig_eax, orig_ax);
R32(eip, ip);
R32(esp, sp);

+ case offsetof(struct user32, regs.orig_eax):
+ /*
+ * A 32-bit debugger setting orig_eax means to restore
+ * the state of the task restarting a 32-bit syscall.
+ * Make sure we interpret the -ERESTART* codes correctly
+ * in case the task is not actually still sitting at the
+ * exit from a 32-bit syscall with TS_COMPAT still set.
+ */
+ regs->orig_ax = value;
+ if (syscall_get_nr(child, regs) >= 0)
+ task_thread_info(child)->status |= TS_COMPAT;
+ break;
+
case offsetof(struct user32, regs.eflags):
return set_flags(child, value);
Ingo Molnar
2009-09-23 15:41:18 UTC
Permalink
Post by Roland McGrath
Here is the aforementioned other tack on this.
I said earlier that getreg() (i.e. 64-bit ptrace/core-dump fetches)
should sign-extend the low 32 bits of orig_ax up. But I've changed my
mind. It's true that today you can store 0xffffffff via either 64-bit
ptrace or 32-bit ptrace and then read back -1 via 64-bit ptrace. (This
wasn't always so, and so we can hope that no debugger really depends on
it.) What seems more important is that tracing and core dumps correctly
show the full orig_ax value incoming in %rax from userland, since
%rax=__NR_foo|(1UL<<32) behaves differently (i.e. -ENOSYS) than
%eax=_NR_foo in actual fact when user-mode does "syscall" with those
values. In a bogon case like that, you would like to have traces/dumps
tell you why the task is not making a proper syscall rather than lie
about what register bits it entered the kernel with.
Patches 1-3 change no ptrace-tests outcomes, i.e. don't regress on the
test cases that went with the original sign-extension changes. They
reintroduce e.g. the ability to blindly read and write back the whole
regset when at syscall-entry tracing with %rax=__NR_foo|(1UL<<32) and
have that fail with -ENOSYS as it would without tracing rather than
perturb the tracee to call sys_foo instead. (Not that this is useful.)
Patch 4 does Linus's fix for the outstanding bug. I've verified it works.
Thanks,
Roland
---
Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/.../tip/linux-2.6-tip
git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland.git x86/orig_ax
asm-generic: syscall_get_nr returns int
x86: syscall_get_nr returns int
x86: ptrace: do not sign-extend orig_ax on write
x86: ptrace: set TS_COMPAT when 32-bit ptrace sets orig_eax>=0
arch/x86/include/asm/syscall.h | 14 +++++++-------
arch/x86/kernel/ptrace.c | 21 ++++++++-------------
include/asm-generic/syscall.h | 8 ++++++--
3 files changed, 21 insertions(+), 22 deletions(-)
Linus, this series looks good to me. Do you want to pull this directly
or should we test this for a few days in the x86 tree? (either solution
is fine to me)

Ingo
Linus Torvalds
2009-09-23 15:53:21 UTC
Permalink
Post by Ingo Molnar
Linus, this series looks good to me. Do you want to pull this directly
or should we test this for a few days in the x86 tree? (either solution
is fine to me)
I already pulled it, since I'm used to pulling ptrace fixes from Roland,

Linus
Ingo Molnar
2009-09-23 16:21:47 UTC
Permalink
Post by Linus Torvalds
Post by Ingo Molnar
Linus, this series looks good to me. Do you want to pull this
directly or should we test this for a few days in the x86 tree?
(either solution is fine to me)
I already pulled it, since I'm used to pulling ptrace fixes from Roland,
Ok, thanks!

Ingo

Loading...