Discussion:
[PATCHSET] freezer: fix various bugs and simplify implementation
Tejun Heo
2011-08-19 14:16:06 UTC
Permalink
Hello,

The freezer code has developed a number of convolutions and bugs.
It's now using five per-task flags - TIF_FREEZE, PF_FREEZING,
PF_NOFREEZE, PF_FROZEN, PF_FREEZER_SKIP and PF_FREEZER_NOSIG, and at
the same time has quite a few race conditions. PF_NOFREEZE
modifications can race against PM freezer, cgroup_freezer can race
against PM freezer, and so on.

This patchset tries to simplify the freezer implementation and fix the
various bugs. It makes the synchronization more straight forward and
replaces TIF_FREEZE with directly checking freeze conditions which are
in effect, which makes the whole thing much saner.

This patchset removes TIF_FREEZE and PF_FREEZING. Also,
PF_FREEZER_SKIP users are planned to move away from the flag and will
be removed. It contains the following 16 patches.

0001-freezer-fix-current-state-restoration-race-in-refrig.patch
0002-freezer-don-t-unnecessarily-set-PF_NOFREEZE-explicit.patch
0003-freezer-unexport-refrigerator-and-update-try_to_free.patch
0004-freezer-implement-and-use-kthread_freezable_should_s.patch
0005-freezer-rename-thaw_process-to-__thaw_task-and-simpl.patch
0006-freezer-make-exiting-tasks-properly-unfreezable.patch
0007-freezer-don-t-distinguish-nosig-tasks-on-thaw.patch
0008-freezer-use-dedicated-lock-instead-of-task_lock-memo.patch
0009-freezer-make-freezing-indicate-freeze-condition-in-e.patch
0010-freezer-fix-set_freezable-_with_signal-race.patch
0011-freezer-kill-PF_FREEZING.patch
0012-freezer-clean-up-freeze_processes-failure-path.patch
0013-cgroup_freezer-prepare-for-removal-of-TIF_FREEZE.patch
0014-freezer-make-freezing-test-freeze-conditions-in-effe.patch
0015-freezer-remove-now-unused-TIF_FREEZE.patch
0016-freezer-remove-should_send_signal-and-update-frozen.patch

This patchset is on top of the current linus#master (01b883358b "Merge
git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc") and
available in the following git branch.

git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git freezer

diffstat follows.

Documentation/power/freezing-of-tasks.txt | 14 +-
arch/alpha/include/asm/thread_info.h | 2
arch/arm/include/asm/thread_info.h | 2
arch/avr32/include/asm/thread_info.h | 2
arch/blackfin/include/asm/thread_info.h | 2
arch/cris/include/asm/thread_info.h | 2
arch/frv/include/asm/thread_info.h | 2
arch/h8300/include/asm/thread_info.h | 2
arch/ia64/include/asm/thread_info.h | 2
arch/m32r/include/asm/thread_info.h | 2
arch/m68k/include/asm/thread_info.h | 1
arch/microblaze/include/asm/thread_info.h | 2
arch/mips/include/asm/thread_info.h | 2
arch/mn10300/include/asm/thread_info.h | 2
arch/parisc/include/asm/thread_info.h | 2
arch/powerpc/include/asm/thread_info.h | 2
arch/s390/include/asm/thread_info.h | 2
arch/sh/include/asm/thread_info.h | 2
arch/sparc/include/asm/thread_info_32.h | 2
arch/sparc/include/asm/thread_info_64.h | 2
arch/um/include/asm/thread_info.h | 2
arch/unicore32/include/asm/thread_info.h | 2
arch/x86/include/asm/thread_info.h | 2
arch/xtensa/include/asm/thread_info.h | 2
drivers/bluetooth/btmrvl_main.c | 2
drivers/mfd/twl4030-irq.c | 3
drivers/mfd/twl6030-irq.c | 2
drivers/net/irda/stir4200.c | 2
drivers/platform/x86/thinkpad_acpi.c | 15 +-
drivers/staging/rts_pstor/rtsx.c | 2
fs/btrfs/async-thread.c | 2
fs/btrfs/disk-io.c | 8 -
fs/ext4/super.c | 3
fs/fs-writeback.c | 4
fs/gfs2/log.c | 4
fs/gfs2/quota.c | 4
fs/jbd/journal.c | 2
fs/jbd2/journal.c | 2
fs/jfs/jfs_logmgr.c | 2
fs/jfs/jfs_txnmgr.c | 4
fs/nilfs2/segment.c | 2
fs/xfs/linux-2.6/xfs_buf.c | 2
include/linux/freezer.h | 78 +++++--------
include/linux/kthread.h | 1
include/linux/sched.h | 3
kernel/cgroup_freezer.c | 51 +++-----
kernel/exit.c | 8 +
kernel/fork.c | 1
kernel/freezer.c | 179 +++++++++++++++++-------------
kernel/kthread.c | 25 ++++
kernel/power/hibernate.c | 15 --
kernel/power/process.c | 65 +++-------
kernel/power/user.c | 4
mm/backing-dev.c | 8 -
54 files changed, 247 insertions(+), 315 deletions(-)

Thanks.

--
tejun
Tejun Heo
2011-08-19 14:16:09 UTC
Permalink
There is no reason to export two functions for entering the
refrigerator. Calling refrigerator() instead of try_to_freeze()
doesn't save anything noticeable or removes any race condition.

* Rename refrigerator() to __refrigerator() and make it return bool
indicating whether it scheduled out for freezing.

* Update try_to_freeze() to return bool and relay the return value of
__refrigerator() if freezing().

* Convert all refrigerator() users to try_to_freeze().

* Update documentation accordingly.

* While at it, add might_sleep() to try_to_freeze().

Signed-off-by: Tejun Heo <***@kernel.org>
Cc: Samuel Ortiz <***@sortiz.org>
Cc: Chris Mason <***@oracle.com>
Cc: "Theodore Ts'o" <***@mit.edu>
Cc: Steven Whitehouse <***@redhat.com>
Cc: Andrew Morton <***@linux-foundation.org>
Cc: Jan Kara <***@suse.cz>
Cc: KONISHI Ryusuke <***@lab.ntt.co.jp>
Cc: Christoph Hellwig <***@infradead.org>
---
Documentation/power/freezing-of-tasks.txt | 12 ++++++------
drivers/net/irda/stir4200.c | 2 +-
fs/btrfs/async-thread.c | 2 +-
fs/btrfs/disk-io.c | 8 ++------
fs/ext4/super.c | 3 +--
fs/gfs2/log.c | 4 ++--
fs/gfs2/quota.c | 4 ++--
fs/jbd/journal.c | 2 +-
fs/jbd2/journal.c | 2 +-
fs/jfs/jfs_logmgr.c | 2 +-
fs/jfs/jfs_txnmgr.c | 4 ++--
fs/nilfs2/segment.c | 2 +-
fs/xfs/linux-2.6/xfs_buf.c | 2 +-
include/linux/freezer.h | 17 ++++++++---------
kernel/freezer.c | 10 +++++++---
15 files changed, 37 insertions(+), 39 deletions(-)

diff --git a/Documentation/power/freezing-of-tasks.txt b/Documentation/power/freezing-of-tasks.txt
index 710c965..cbea897 100644
--- a/Documentation/power/freezing-of-tasks.txt
+++ b/Documentation/power/freezing-of-tasks.txt
@@ -21,7 +21,7 @@ freeze_processes() (defined in kernel/power/process.c) is called. It executes
try_to_freeze_tasks() that sets TIF_FREEZE for all of the freezable tasks and
either wakes them up, if they are kernel threads, or sends fake signals to them,
if they are user space processes. A task that has TIF_FREEZE set, should react
-to it by calling the function called refrigerator() (defined in
+to it by calling the function called __refrigerator() (defined in
kernel/power/process.c), which sets the task's PF_FROZEN flag, changes its state
to TASK_UNINTERRUPTIBLE and makes it loop until PF_FROZEN is cleared for it.
Then, we say that the task is 'frozen' and therefore the set of functions
@@ -29,10 +29,10 @@ handling this mechanism is referred to as 'the freezer' (these functions are
defined in kernel/power/process.c and include/linux/freezer.h). User space
processes are generally frozen before kernel threads.

-It is not recommended to call refrigerator() directly. Instead, it is
-recommended to use the try_to_freeze() function (defined in
-include/linux/freezer.h), that checks the task's TIF_FREEZE flag and makes the
-task enter refrigerator() if the flag is set.
+__refrigerator() must not be called directly. Instead, use the
+try_to_freeze() function (defined in include/linux/freezer.h), that checks
+the task's TIF_FREEZE flag and makes the task enter __refrigerator() if the
+flag is set.

For user space processes try_to_freeze() is called automatically from the
signal-handling code, but the freezable kernel threads need to call it
@@ -61,7 +61,7 @@ wait_event_freezable() and wait_event_freezable_timeout() macros.
After the system memory state has been restored from a hibernation image and
devices have been reinitialized, the function thaw_processes() is called in
order to clear the PF_FROZEN flag for each frozen task. Then, the tasks that
-have been frozen leave refrigerator() and continue running.
+have been frozen leave __refrigerator() and continue running.

III. Which kernel threads are freezable?

diff --git a/drivers/net/irda/stir4200.c b/drivers/net/irda/stir4200.c
index 41c96b3..e880c79 100644
--- a/drivers/net/irda/stir4200.c
+++ b/drivers/net/irda/stir4200.c
@@ -750,7 +750,7 @@ static int stir_transmit_thread(void *arg)

write_reg(stir, REG_CTRL1, CTRL1_TXPWD|CTRL1_RXPWD);

- refrigerator();
+ try_to_freeze();

if (change_speed(stir, stir->speed))
break;
diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index 7ec1409..98ab240 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -340,7 +340,7 @@ again:
if (freezing(current)) {
worker->working = 0;
spin_unlock_irq(&worker->lock);
- refrigerator();
+ try_to_freeze();
} else {
spin_unlock_irq(&worker->lock);
if (!kthread_should_stop()) {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 07b3ac6..64b1c07 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1508,9 +1508,7 @@ static int cleaner_kthread(void *arg)
btrfs_run_defrag_inodes(root->fs_info);
}

- if (freezing(current)) {
- refrigerator();
- } else {
+ if (!try_to_freeze()) {
set_current_state(TASK_INTERRUPTIBLE);
if (!kthread_should_stop())
schedule();
@@ -1564,9 +1562,7 @@ sleep:
wake_up_process(root->fs_info->cleaner_kthread);
mutex_unlock(&root->fs_info->transaction_kthread_mutex);

- if (freezing(current)) {
- refrigerator();
- } else {
+ if (!try_to_freeze()) {
set_current_state(TASK_INTERRUPTIBLE);
if (!kthread_should_stop() &&
!btrfs_transaction_blocked(root->fs_info))
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 4687fea..c5e2744 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2854,8 +2854,7 @@ cont_thread:
}
mutex_unlock(&eli->li_list_mtx);

- if (freezing(current))
- refrigerator();
+ try_to_freeze();

cur = jiffies;
if ((time_after_eq(cur, next_wakeup)) ||
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 85c6292..b342c71 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -951,8 +951,8 @@ int gfs2_logd(void *data)
wake_up(&sdp->sd_log_waitq);

t = gfs2_tune_get(sdp, gt_logd_secs) * HZ;
- if (freezing(current))
- refrigerator();
+
+ try_to_freeze();

do {
prepare_to_wait(&sdp->sd_logd_waitq, &wait,
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 42e8d23..2ccaaac 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -1431,8 +1431,8 @@ int gfs2_quotad(void *data)
/* Check for & recover partially truncated inodes */
quotad_check_trunc_list(sdp);

- if (freezing(current))
- refrigerator();
+ try_to_freeze();
+
t = min(quotad_timeo, statfs_timeo);

prepare_to_wait(&sdp->sd_quota_wait, &wait, TASK_INTERRUPTIBLE);
diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c
index 9fe061f..908fdf0 100644
--- a/fs/jbd/journal.c
+++ b/fs/jbd/journal.c
@@ -166,7 +166,7 @@ loop:
*/
jbd_debug(1, "Now suspending kjournald\n");
spin_unlock(&journal->j_state_lock);
- refrigerator();
+ try_to_freeze();
spin_lock(&journal->j_state_lock);
} else {
/*
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index f24df13..874d1c4 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -173,7 +173,7 @@ loop:
*/
jbd_debug(1, "Now suspending kjournald2\n");
write_unlock(&journal->j_state_lock);
- refrigerator();
+ try_to_freeze();
write_lock(&journal->j_state_lock);
} else {
/*
diff --git a/fs/jfs/jfs_logmgr.c b/fs/jfs/jfs_logmgr.c
index 583636f..c8c2d35 100644
--- a/fs/jfs/jfs_logmgr.c
+++ b/fs/jfs/jfs_logmgr.c
@@ -2348,7 +2348,7 @@ int jfsIOWait(void *arg)

if (freezing(current)) {
spin_unlock_irq(&log_redrive_lock);
- refrigerator();
+ try_to_freeze();
} else {
set_current_state(TASK_INTERRUPTIBLE);
spin_unlock_irq(&log_redrive_lock);
diff --git a/fs/jfs/jfs_txnmgr.c b/fs/jfs/jfs_txnmgr.c
index af96060..bb8b661 100644
--- a/fs/jfs/jfs_txnmgr.c
+++ b/fs/jfs/jfs_txnmgr.c
@@ -2800,7 +2800,7 @@ int jfs_lazycommit(void *arg)

if (freezing(current)) {
LAZY_UNLOCK(flags);
- refrigerator();
+ try_to_freeze();
} else {
DECLARE_WAITQUEUE(wq, current);

@@ -2994,7 +2994,7 @@ int jfs_sync(void *arg)

if (freezing(current)) {
TXN_UNLOCK();
- refrigerator();
+ try_to_freeze();
} else {
set_current_state(TASK_INTERRUPTIBLE);
TXN_UNLOCK();
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index bb24ab6..0e72ad6 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -2470,7 +2470,7 @@ static int nilfs_segctor_thread(void *arg)

if (freezing(current)) {
spin_unlock(&sci->sc_state_lock);
- refrigerator();
+ try_to_freeze();
spin_lock(&sci->sc_state_lock);
} else {
DEFINE_WAIT(wait);
diff --git a/fs/xfs/linux-2.6/xfs_buf.c b/fs/xfs/linux-2.6/xfs_buf.c
index c57836d..9b6c223 100644
--- a/fs/xfs/linux-2.6/xfs_buf.c
+++ b/fs/xfs/linux-2.6/xfs_buf.c
@@ -1741,7 +1741,7 @@ xfsbufd(

if (unlikely(freezing(current))) {
set_bit(XBT_FORCE_SLEEP, &target->bt_flags);
- refrigerator();
+ try_to_freeze();
} else {
clear_bit(XBT_FORCE_SLEEP, &target->bt_flags);
}
diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index 1effc8b..3d73288 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -47,17 +47,16 @@ static inline bool should_send_signal(struct task_struct *p)
/* Takes and releases task alloc lock using task_lock() */
extern int thaw_process(struct task_struct *p);

-extern void refrigerator(void);
+extern bool __refrigerator(void);
extern int freeze_processes(void);
extern void thaw_processes(void);

-static inline int try_to_freeze(void)
+static inline bool try_to_freeze(void)
{
- if (freezing(current)) {
- refrigerator();
- return 1;
- } else
- return 0;
+ might_sleep();
+ if (likely(!freezing(current)))
+ return false;
+ return __refrigerator();
}

extern bool freeze_task(struct task_struct *p, bool sig_only);
@@ -170,11 +169,11 @@ static inline void set_freeze_flag(struct task_struct *p) {}
static inline void clear_freeze_flag(struct task_struct *p) {}
static inline int thaw_process(struct task_struct *p) { return 1; }

-static inline void refrigerator(void) {}
+static inline bool __refrigerator(void) { return false; }
static inline int freeze_processes(void) { BUG(); return 0; }
static inline void thaw_processes(void) {}

-static inline int try_to_freeze(void) { return 0; }
+static inline bool try_to_freeze(void) { return false; }

static inline void freezer_do_not_count(void) {}
static inline void freezer_count(void) {}
diff --git a/kernel/freezer.c b/kernel/freezer.c
index 575f863..4d59904 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -23,10 +23,11 @@ static inline void frozen_process(void)
}

/* Refrigerator is place where frozen processes are stored :-). */
-void refrigerator(void)
+bool __refrigerator(void)
{
/* Hmm, should we be allowed to suspend when there are realtime
processes around? */
+ bool was_frozen = false;
long save;

task_lock(current);
@@ -35,7 +36,7 @@ void refrigerator(void)
task_unlock(current);
} else {
task_unlock(current);
- return;
+ return was_frozen;
}
save = current->state;
pr_debug("%s entered refrigerator\n", current->comm);
@@ -51,6 +52,7 @@ void refrigerator(void)
set_current_state(TASK_UNINTERRUPTIBLE);
if (!frozen(current))
break;
+ was_frozen = true;
schedule();
}

@@ -65,8 +67,10 @@ void refrigerator(void)
* synchronization which depends on ordered task state change.
*/
set_current_state(save);
+
+ return was_frozen;
}
-EXPORT_SYMBOL(refrigerator);
+EXPORT_SYMBOL(__refrigerator);

static void fake_signal_wake_up(struct task_struct *p)
{
--
1.7.6
Tejun Heo
2011-08-19 14:16:07 UTC
Permalink
refrigerator() saves current->state before entering frozen state and
restores it before returning using __set_current_state(); however,
this is racy, for example, please consider the following sequence.

set_current_state(TASK_INTERRUPTIBLE);
try_to_sleep();
if (kthread_should_stop())
break;
schedule();

If kthread_stop() races with ->state restoration, the restoration can
restore ->state to TASK_INTERRUPTIBLE after kthread_stop() sets it to
TASK_RUNNING but kthread_should_stop() may still see zero
->should_stop because there's no memory barrier between restoring
TASK_INTERRUPTIBLE and kthread_should_stop() test.

This isn't restricted to kthread_should_stop(). current->state is
often used in memory barrier based synchronization and silently
restoring it w/o mb breaks them.

Use set_current_state() instead.

Signed-off-by: Tejun Heo <***@kernel.org>
---
kernel/freezer.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/kernel/freezer.c b/kernel/freezer.c
index 7b01de9..575f863 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -58,7 +58,13 @@ void refrigerator(void)
current->flags &= ~PF_FREEZING;

pr_debug("%s left refrigerator\n", current->comm);
- __set_current_state(save);
+
+ /*
+ * Restore saved task state before returning. The mb'd version
+ * needs to be used; otherwise, it might silently break
+ * synchronization which depends on ordered task state change.
+ */
+ set_current_state(save);
}
EXPORT_SYMBOL(refrigerator);
--
1.7.6
Oleg Nesterov
2011-08-19 15:52:22 UTC
Permalink
I'll try to read this series later.

Probably this doesn't matter since I didn't read the next patches, but
Post by Tejun Heo
refrigerator() saves current->state before entering frozen state and
restores it before returning using __set_current_state(); however,
this is racy,
Oh, yes. I even tried to ask for the explanation.
Post by Tejun Heo
set_current_state(TASK_INTERRUPTIBLE);
try_to_sleep();
if (kthread_should_stop())
break;
schedule();
Indeed, we can miss kthread->should_stop, and the patch fixes this
case.

But please look at, say, kauditd_thread(), it does

DECLARE_WAITQUEUE(wait, current);
set_current_state(TASK_INTERRUPTIBLE);
add_wait_queue(&kauditd_wait, &wait);

if (!skb_queue_len(&audit_skb_queue)) {
try_to_freeze();
schedule();
}

Now suppose that wake_up_interruptible(&kauditd_wait) happens, and
after that refrigerator() restores TASK_INTERRUPTIBLE.

Any reason refrigerator() should try to restore? Shouldn't we simply
change the rules? Yes, probably we will have to fix some users.

But it seems to me it is simply not possible to make this ->state
restoration correct.

Oleg.
Tejun Heo
2011-08-19 16:11:42 UTC
Permalink
Hello, Oleg.
Post by Oleg Nesterov
Indeed, we can miss kthread->should_stop, and the patch fixes this
case.
But please look at, say, kauditd_thread(), it does
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0DECLARE_WAITQUEUE(wait, current);
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0set_current_state(TASK_INTERRUPTIBLE);
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0add_wait_queue(&kauditd_wait, &wait);
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (!skb_queue_len(&audit_skb_queue)) =
{
Post by Oleg Nesterov
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0try_to_freeze();
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0schedule();
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0}
Now suppose that wake_up_interruptible(&kauditd_wait) happens, and
after that refrigerator() restores TASK_INTERRUPTIBLE.
Any reason refrigerator() should try to restore? Shouldn't we simply
change the rules? Yes, probably we will have to fix some users.
But it seems to me it is simply not possible to make this ->state
restoration correct.
Yes, it's broken, but it's not the only thing broken. There are race
conditions everywhere. Even without freezer, a lot of kthread users
use kthread_should_stop() incorrectly - they test them without setting
sleep state and freezable kthreads are worse as the two conditions can
race each other. I think kthread is just too difficult to use. We can
try to salvage the situation and make kthread more friendly but I
think the better solution would be just converting everyone to use
workqueue or the kthread_wq wrapper. So, yeah, the whole thing is
vastly broken. Let's fix things in baby steps.

Thanks.

--=20
tejun
Rafael J. Wysocki
2011-08-19 21:08:59 UTC
Permalink
Post by Tejun Heo
refrigerator() saves current->state before entering frozen state and
restores it before returning using __set_current_state(); however,
this is racy, for example, please consider the following sequence.
set_current_state(TASK_INTERRUPTIBLE);
try_to_sleep();
Did you mean try_to_freeze() here?
Post by Tejun Heo
if (kthread_should_stop())
break;
schedule();
If kthread_stop() races with ->state restoration, the restoration can
restore ->state to TASK_INTERRUPTIBLE after kthread_stop() sets it to
TASK_RUNNING but kthread_should_stop() may still see zero
->should_stop because there's no memory barrier between restoring
TASK_INTERRUPTIBLE and kthread_should_stop() test.
This isn't restricted to kthread_should_stop(). current->state is
often used in memory barrier based synchronization and silently
restoring it w/o mb breaks them.
Use set_current_state() instead.
---
kernel/freezer.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)
diff --git a/kernel/freezer.c b/kernel/freezer.c
index 7b01de9..575f863 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -58,7 +58,13 @@ void refrigerator(void)
current->flags &= ~PF_FREEZING;
pr_debug("%s left refrigerator\n", current->comm);
- __set_current_state(save);
+
+ /*
+ * Restore saved task state before returning. The mb'd version
+ * needs to be used; otherwise, it might silently break
+ * synchronization which depends on ordered task state change.
+ */
+ set_current_state(save);
I think the change is correct.
Post by Tejun Heo
}
EXPORT_SYMBOL(refrigerator);
Thanks,
Rafael
Tejun Heo
2011-08-20 08:13:56 UTC
Permalink
Hello, Rafael.
Post by Rafael J. Wysocki
Post by Tejun Heo
refrigerator() saves current->state before entering frozen state and
restores it before returning using __set_current_state(); however,
this is racy, for example, please consider the following sequence.
set_current_state(TASK_INTERRUPTIBLE);
try_to_sleep();
Did you mean try_to_freeze() here?
Heh, yeah, I'm constantly being confused among try_to_freeze(),
try_to_sleep() and try_to_wake_up(). Will update. ;)

Thanks.
--
tejun
Tejun Heo
2011-08-19 14:16:15 UTC
Permalink
Currently freezing (TIF_FREEZE) and frozen (PF_FROZEN) states are
interlocked - freezing is set to request freeze and when the task
actually freezes, it clears freezing and sets frozen.

This interlocking makes things more complex than necessary - freezing
doesn't mean there's freezing condition in effect and frozen doesn't
match the task actually entering and leaving frozen state (it's
cleared by the thawing task).

This patch makes freezing indicate that freeze condition is in effect.
A task enters and stays frozen if freezing. This makes PF_FROZEN
manipulation done only by the task itself and prevents wakeup from
__thaw_task() leaking outside of refrigerator.

The only place which needs to tell freezing && !frozen is
try_to_freeze_task() to whine about tasks which don't enter frozen.
It's updated to test the condition explicitly.

Signed-off-by: Tejun Heo <***@kernel.org>
---
kernel/freezer.c | 38 +++++++++++++++++++++-----------------
kernel/power/process.c | 3 ++-
2 files changed, 23 insertions(+), 18 deletions(-)

diff --git a/kernel/freezer.c b/kernel/freezer.c
index c60082d..d6165cd 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -22,14 +22,18 @@ bool __refrigerator(bool check_kthr_stop)
bool was_frozen = false;
long save;

+ /*
+ * Enter FROZEN. If NOFREEZE, schedule immediate thawing by
+ * clearing freezing.
+ */
spin_lock_irq(&freezer_lock);
if (!freezing(current)) {
spin_unlock_irq(&freezer_lock);
return was_frozen;
}
- if (!(current->flags & PF_NOFREEZE))
- current->flags |= PF_FROZEN;
- clear_freeze_flag(current);
+ if (current->flags & PF_NOFREEZE)
+ clear_freeze_flag(current);
+ current->flags |= PF_FROZEN;
spin_unlock_irq(&freezer_lock);

save = current->state;
@@ -44,7 +48,7 @@ bool __refrigerator(bool check_kthr_stop)

for (;;) {
set_current_state(TASK_UNINTERRUPTIBLE);
- if (!frozen(current) ||
+ if (!freezing(current) ||
(check_kthr_stop && kthread_should_stop()))
break;
was_frozen = true;
@@ -54,6 +58,11 @@ bool __refrigerator(bool check_kthr_stop)
/* Remove the accounting blocker */
current->flags &= ~PF_FREEZING;

+ /* leave FROZEN */
+ spin_lock_irq(&freezer_lock);
+ current->flags &= ~PF_FROZEN;
+ spin_unlock_irq(&freezer_lock);
+
pr_debug("%s left refrigerator\n", current->comm);

/*
@@ -137,24 +146,19 @@ void cancel_freezing(struct task_struct *p)
spin_unlock_irqrestore(&freezer_lock, flags);
}

-/*
- * Wake up a frozen task
- *
- * task_lock() is needed to prevent the race with refrigerator() which may
- * occur if the freezing of tasks fails. Namely, without the lock, if the
- * freezing of tasks failed, thaw_tasks() might have run before a task in
- * refrigerator() could call frozen_process(), in which case the task would be
- * frozen and no one would thaw it.
- */
void __thaw_task(struct task_struct *p)
{
unsigned long flags;

+ /*
+ * Clear freezing and kick @p if FROZEN. Clearing is guaranteed to
+ * be visible to @p as waking up implies wmb. Waking up inside
+ * freezer_lock also prevents wakeups from leaking outside
+ * refrigerator.
+ */
spin_lock_irqsave(&freezer_lock, flags);
- if (frozen(p)) {
- p->flags &= ~PF_FROZEN;
+ clear_freeze_flag(p);
+ if (frozen(p))
wake_up_process(p);
- } else
- clear_freeze_flag(p);
spin_unlock_irqrestore(&freezer_lock, flags);
}
diff --git a/kernel/power/process.c b/kernel/power/process.c
index 96d138e..f075c2f 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -118,7 +118,8 @@ static int try_to_freeze_tasks(bool sig_only)

read_lock(&tasklist_lock);
do_each_thread(g, p) {
- if (!wakeup && freezing(p) && !freezer_should_skip(p))
+ if (!wakeup && !freezer_should_skip(p) &&
+ freezing(p) && !frozen(p))
sched_show_task(p);
cancel_freezing(p);
} while_each_thread(g, p);
--
1.7.6
Oleg Nesterov
2011-08-28 17:56:01 UTC
Permalink
Post by Tejun Heo
This patch makes freezing indicate that freeze condition is in effect.
A task enters and stays frozen if freezing. This makes PF_FROZEN
manipulation done only by the task itself and prevents wakeup from
__thaw_task() leaking outside of refrigerator.
but this looks racy.
Post by Tejun Heo
@@ -44,7 +48,7 @@ bool __refrigerator(bool check_kthr_stop)
for (;;) {
set_current_state(TASK_UNINTERRUPTIBLE);
- if (!frozen(current) ||
+ if (!freezing(current) ||
(check_kthr_stop && kthread_should_stop()))
break;
was_frozen = true;
@@ -54,6 +58,11 @@ bool __refrigerator(bool check_kthr_stop)
/* Remove the accounting blocker */
current->flags &= ~PF_FREEZING;
+ /* leave FROZEN */
+ spin_lock_irq(&freezer_lock);
+ current->flags &= ~PF_FROZEN;
+ spin_unlock_irq(&freezer_lock);
What if freezing() is true again when we are going to clear
PF_FROZEN?

In this case the 2nd try_to_freeze_tasks() can see this task
as already frozen and return success while it is going to run.

Oleg.
Tejun Heo
2011-08-29 07:31:31 UTC
Permalink
Post by Oleg Nesterov
Post by Tejun Heo
+ /* leave FROZEN */
+ spin_lock_irq(&freezer_lock);
+ current->flags &= ~PF_FROZEN;
+ spin_unlock_irq(&freezer_lock);
What if freezing() is true again when we are going to clear
PF_FROZEN?
In this case the 2nd try_to_freeze_tasks() can see this task
as already frozen and return success while it is going to run.
Indeed, if we need to do if (frezing) goto retry; there. Will fix.

Thanks.
--
tejun
Oleg Nesterov
2011-08-29 17:44:40 UTC
Permalink
Post by Oleg Nesterov
Post by Tejun Heo
@@ -44,7 +48,7 @@ bool __refrigerator(bool check_kthr_stop)
for (;;) {
set_current_state(TASK_UNINTERRUPTIBLE);
- if (!frozen(current) ||
+ if (!freezing(current) ||
(check_kthr_stop && kthread_should_stop()))
break;
was_frozen = true;
@@ -54,6 +58,11 @@ bool __refrigerator(bool check_kthr_stop)
/* Remove the accounting blocker */
current->flags &= ~PF_FREEZING;
+ /* leave FROZEN */
+ spin_lock_irq(&freezer_lock);
+ current->flags &= ~PF_FROZEN;
+ spin_unlock_irq(&freezer_lock);
What if freezing() is true again when we are going to clear
PF_FROZEN?
And another problem, afaics....

So. With this change frozen(p) == T even after __thaw_task().
PF_FROZEN will be cleared eventually, but we can't know when.

IOW, we can't trust frozen() unless freezing() == T.

This means update_if_frozen() can hit BUG_ON(nfrozen > 0) if
the caller is freezer_write().

Oleg.
Tejun Heo
2011-08-19 14:16:14 UTC
Permalink
Freezer synchronizatino is needlessly complicated - it's by no means a
hot path and the priority is staying unintrusive and safe. This patch
makes it simply use a dedicated lock instead of piggy-backing on
task_lock() and playing with memory barriers.

On the failure path of try_to_freeze_tasks(), locking is moved from it
to cancel_freezing(). This makes the frozen() test racy but the race
here is a non-issue as the warning is printed for tasks which failed
to enter frozen for 20 seconds and race on PF_FROZEN at the last
moment doesn't change anything.

This simplifies freezer implementation and eases further changes
including some race fixes.

Signed-off-by: Tejun Heo <***@kernel.org>
---
kernel/freezer.c | 82 +++++++++++++++++++++---------------------------
kernel/power/process.c | 2 -
2 files changed, 36 insertions(+), 48 deletions(-)

diff --git a/kernel/freezer.c b/kernel/freezer.c
index f5db7fd..c60082d 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -11,17 +11,8 @@
#include <linux/freezer.h>
#include <linux/kthread.h>

-/*
- * freezing is complete, mark current process as frozen
- */
-static inline void frozen_process(void)
-{
- if (!unlikely(current->flags & PF_NOFREEZE)) {
- current->flags |= PF_FROZEN;
- smp_wmb();
- }
- clear_freeze_flag(current);
-}
+/* protects freezing and frozen transitions */
+static DEFINE_SPINLOCK(freezer_lock);

/* Refrigerator is place where frozen processes are stored :-). */
bool __refrigerator(bool check_kthr_stop)
@@ -31,14 +22,16 @@ bool __refrigerator(bool check_kthr_stop)
bool was_frozen = false;
long save;

- task_lock(current);
- if (freezing(current)) {
- frozen_process();
- task_unlock(current);
- } else {
- task_unlock(current);
+ spin_lock_irq(&freezer_lock);
+ if (!freezing(current)) {
+ spin_unlock_irq(&freezer_lock);
return was_frozen;
}
+ if (!(current->flags & PF_NOFREEZE))
+ current->flags |= PF_FROZEN;
+ clear_freeze_flag(current);
+ spin_unlock_irq(&freezer_lock);
+
save = current->state;
pr_debug("%s entered refrigerator\n", current->comm);

@@ -99,21 +92,18 @@ static void fake_signal_wake_up(struct task_struct *p)
*/
bool freeze_task(struct task_struct *p, bool sig_only)
{
- /*
- * We first check if the task is freezing and next if it has already
- * been frozen to avoid the race with frozen_process() which first marks
- * the task as frozen and next clears its TIF_FREEZE.
- */
- if (!freezing(p)) {
- smp_rmb();
- if (frozen(p))
- return false;
-
- if (!sig_only || should_send_signal(p))
- set_freeze_flag(p);
- else
- return false;
- }
+ unsigned long flags;
+ bool ret = false;
+
+ spin_lock_irqsave(&freezer_lock, flags);
+
+ if (sig_only && !should_send_signal(p))
+ goto out_unlock;
+
+ if (frozen(p))
+ goto out_unlock;
+
+ set_freeze_flag(p);

if (should_send_signal(p)) {
fake_signal_wake_up(p);
@@ -123,26 +113,28 @@ bool freeze_task(struct task_struct *p, bool sig_only)
* TASK_RUNNING transition can't race with task state
* testing in try_to_freeze_tasks().
*/
- } else if (sig_only) {
- return false;
} else {
wake_up_state(p, TASK_INTERRUPTIBLE);
}
-
- return true;
+ ret = true;
+out_unlock:
+ spin_unlock_irqrestore(&freezer_lock, flags);
+ return ret;
}

void cancel_freezing(struct task_struct *p)
{
unsigned long flags;

+ spin_lock_irqsave(&freezer_lock, flags);
if (freezing(p)) {
pr_debug(" clean up: %s\n", p->comm);
clear_freeze_flag(p);
- spin_lock_irqsave(&p->sighand->siglock, flags);
+ spin_lock(&p->sighand->siglock);
recalc_sigpending_and_wake(p);
- spin_unlock_irqrestore(&p->sighand->siglock, flags);
+ spin_unlock(&p->sighand->siglock);
}
+ spin_unlock_irqrestore(&freezer_lock, flags);
}

/*
@@ -156,15 +148,13 @@ void cancel_freezing(struct task_struct *p)
*/
void __thaw_task(struct task_struct *p)
{
- bool was_frozen;
+ unsigned long flags;

- task_lock(p);
- if ((was_frozen = frozen(p)))
+ spin_lock_irqsave(&freezer_lock, flags);
+ if (frozen(p)) {
p->flags &= ~PF_FROZEN;
- else
- clear_freeze_flag(p);
- task_unlock(p);
-
- if (was_frozen)
wake_up_process(p);
+ } else
+ clear_freeze_flag(p);
+ spin_unlock_irqrestore(&freezer_lock, flags);
}
diff --git a/kernel/power/process.c b/kernel/power/process.c
index 3eee548..96d138e 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -118,11 +118,9 @@ static int try_to_freeze_tasks(bool sig_only)

read_lock(&tasklist_lock);
do_each_thread(g, p) {
- task_lock(p);
if (!wakeup && freezing(p) && !freezer_should_skip(p))
sched_show_task(p);
cancel_freezing(p);
- task_unlock(p);
} while_each_thread(g, p);
read_unlock(&tasklist_lock);
} else {
--
1.7.6
Oleg Nesterov
2011-08-28 17:51:16 UTC
Permalink
Post by Tejun Heo
it's by no means a
hot path and the priority is staying unintrusive and safe. This patch
makes it simply use a dedicated lock
Agreed. but could you explain why it should be irq-safe? This is not
clear from the changelog.
Post by Tejun Heo
+ if (!(current->flags & PF_NOFREEZE))
+ current->flags |= PF_FROZEN;
it is not clear why do we check PF_NOFREEZE... but OK, iiuc you
remove this check later anyway.



Off-topic, but fake_signal_wake_up() is not safe if the caller
try_to_freeze_cgroup(). Unlike try_to_freeze_tasks() (which holds
tasklist) we can race with the exiting thread, ->sighand can be
NULL.

Oleg.
Oleg Nesterov
2011-08-28 18:21:24 UTC
Permalink
Post by Oleg Nesterov
Off-topic, but fake_signal_wake_up() is not safe if the caller
try_to_freeze_cgroup(). Unlike try_to_freeze_tasks() (which holds
tasklist) we can race with the exiting thread, ->sighand can be
NULL.
Although with PF_NOFREEZE in do_exit() from 6/16, this race is only
theoretical.

And, forgot to mention, __thaw_task() looks racy too, but i think
this recalc_sigpending_and_wake() can simply go away.

Oleg.
Tejun Heo
2011-08-29 07:20:54 UTC
Permalink
Hello,
Post by Oleg Nesterov
Post by Tejun Heo
it's by no means a
hot path and the priority is staying unintrusive and safe. This patch
makes it simply use a dedicated lock
Agreed. but could you explain why it should be irq-safe? This is not
clear from the changelog.
It doesn't need to be. cgroup_freezer assumes irq-safety so I didn't
want to change it. I was thinking about dropping irq-safety from all
of them later on.
Post by Oleg Nesterov
Off-topic, but fake_signal_wake_up() is not safe if the caller
try_to_freeze_cgroup(). Unlike try_to_freeze_tasks() (which holds
tasklist) we can race with the exiting thread, ->sighand can be
NULL.
Indeed, guess we'll need to grab tasklist_lock around
try_to_freeze_cgroup() too.

Thanks.
--
tejun
Tejun Heo
2011-08-19 14:16:17 UTC
Permalink
With the previous changes, there's no meaningful difference between
PF_FREEZING and PF_FROZEN. Remove PF_FREEZING and use PF_FROZEN
instead in task_contributes_to_load().

Signed-off-by: Tejun Heo <***@kernel.org>
---
include/linux/sched.h | 3 +--
kernel/freezer.c | 6 ------
2 files changed, 1 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4ac2c05..1bb3356 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -219,7 +219,7 @@ extern char ___assert_task_state[1 - 2*!!(
((task->state & (__TASK_STOPPED | __TASK_TRACED)) != 0)
#define task_contributes_to_load(task) \
((task->state & TASK_UNINTERRUPTIBLE) != 0 && \
- (task->flags & PF_FREEZING) == 0)
+ (task->flags & PF_FROZEN) == 0)

#define __set_task_state(tsk, state_value) \
do { (tsk)->state = (state_value); } while (0)
@@ -1769,7 +1769,6 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
#define PF_MEMALLOC 0x00000800 /* Allocating memory */
#define PF_NPROC_EXCEEDED 0x00001000 /* set_user noticed that RLIMIT_NPROC was exceeded */
#define PF_USED_MATH 0x00002000 /* if unset the fpu must be initialized before use */
-#define PF_FREEZING 0x00004000 /* freeze in progress. do not account to load */
#define PF_NOFREEZE 0x00008000 /* this thread should not be frozen */
#define PF_FROZEN 0x00010000 /* frozen for system suspend */
#define PF_FSTRANS 0x00020000 /* inside a filesystem transaction */
diff --git a/kernel/freezer.c b/kernel/freezer.c
index 501f1b7..82332bb 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -43,9 +43,6 @@ bool __refrigerator(bool check_kthr_stop)
recalc_sigpending(); /* We sent fake signal, clean it up */
spin_unlock_irq(&current->sighand->siglock);

- /* prevent accounting of that task to load */
- current->flags |= PF_FREEZING;
-
for (;;) {
set_current_state(TASK_UNINTERRUPTIBLE);
if (!freezing(current) ||
@@ -55,9 +52,6 @@ bool __refrigerator(bool check_kthr_stop)
schedule();
}

- /* Remove the accounting blocker */
- current->flags &= ~PF_FREEZING;
-
/* leave FROZEN */
spin_lock_irq(&freezer_lock);
current->flags &= ~PF_FROZEN;
--
1.7.6
Tejun Heo
2011-08-19 14:16:22 UTC
Permalink
should_send_signal() is only used in freezer.c. Exporting them only
increases chance of abuse. Open code the two users and remove it.

Update frozen() to return bool.

Signed-off-by: Tejun Heo <***@kernel.org>
---
include/linux/freezer.h | 9 ++-------
kernel/freezer.c | 2 +-
2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index 11b2bd9..122b5ce 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -15,7 +15,7 @@ extern bool pm_nosig_freezing; /* PM nosig freezing in effect */
/*
* Check if a process has been frozen
*/
-static inline int frozen(struct task_struct *p)
+static inline bool frozen(struct task_struct *p)
{
return p->flags & PF_FROZEN;
}
@@ -32,11 +32,6 @@ static inline bool freezing(struct task_struct *p)
return freezing_slow_path(p);
}

-static inline bool should_send_signal(struct task_struct *p)
-{
- return !(p->flags & PF_FREEZER_NOSIG);
-}
-
/* Takes and releases task alloc lock using task_lock() */
extern void __thaw_task(struct task_struct *t);

@@ -156,7 +151,7 @@ static inline bool set_freezable_with_signal(void)
__retval; \
})
#else /* !CONFIG_FREEZER */
-static inline int frozen(struct task_struct *p) { return 0; }
+static inline bool frozen(struct task_struct *p) { return false; }
static inline bool freezing(struct task_struct *p) { return false; }

static inline bool __refrigerator(bool check_kthr_stop) { return false; }
diff --git a/kernel/freezer.c b/kernel/freezer.c
index 995bddf..466ea6b 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -129,7 +129,7 @@ bool freeze_task(struct task_struct *p, bool sig_only)
return false;
}

- if (should_send_signal(p)) {
+ if (!(p->flags & PF_FREEZER_NOSIG)) {
fake_signal_wake_up(p);
/*
* fake_signal_wake_up() goes through p's scheduler
--
1.7.6
Tejun Heo
2011-08-19 14:16:21 UTC
Permalink
Signed-off-by: Tejun Heo <***@kernel.org>
Cc: Arnd Bergmann <***@arndb.de>
Cc: linux-***@vger.kernel.org
---
arch/alpha/include/asm/thread_info.h | 2 --
arch/arm/include/asm/thread_info.h | 2 --
arch/avr32/include/asm/thread_info.h | 2 --
arch/blackfin/include/asm/thread_info.h | 2 --
arch/cris/include/asm/thread_info.h | 2 --
arch/frv/include/asm/thread_info.h | 2 --
arch/h8300/include/asm/thread_info.h | 2 --
arch/ia64/include/asm/thread_info.h | 2 --
arch/m32r/include/asm/thread_info.h | 2 --
arch/m68k/include/asm/thread_info.h | 1 -
arch/microblaze/include/asm/thread_info.h | 2 --
arch/mips/include/asm/thread_info.h | 2 --
arch/mn10300/include/asm/thread_info.h | 2 --
arch/parisc/include/asm/thread_info.h | 2 --
arch/powerpc/include/asm/thread_info.h | 2 --
arch/s390/include/asm/thread_info.h | 2 --
arch/sh/include/asm/thread_info.h | 2 --
arch/sparc/include/asm/thread_info_32.h | 2 --
arch/sparc/include/asm/thread_info_64.h | 2 --
arch/um/include/asm/thread_info.h | 2 --
arch/unicore32/include/asm/thread_info.h | 2 --
arch/x86/include/asm/thread_info.h | 2 --
arch/xtensa/include/asm/thread_info.h | 2 --
23 files changed, 0 insertions(+), 45 deletions(-)

diff --git a/arch/alpha/include/asm/thread_info.h b/arch/alpha/include/asm/thread_info.h
index 6f32f9c..7e55102 100644
--- a/arch/alpha/include/asm/thread_info.h
+++ b/arch/alpha/include/asm/thread_info.h
@@ -79,7 +79,6 @@ register struct thread_info *__current_thread_info __asm__("$8");
#define TIF_UAC_SIGBUS 12
#define TIF_MEMDIE 13 /* is terminating due to OOM killer */
#define TIF_RESTORE_SIGMASK 14 /* restore signal mask in do_signal */
-#define TIF_FREEZE 16 /* is freezing for suspend */

#define _TIF_SYSCALL_TRACE (1<<TIF_SYSCALL_TRACE)
#define _TIF_SIGPENDING (1<<TIF_SIGPENDING)
@@ -87,7 +86,6 @@ register struct thread_info *__current_thread_info __asm__("$8");
#define _TIF_POLLING_NRFLAG (1<<TIF_POLLING_NRFLAG)
#define _TIF_RESTORE_SIGMASK (1<<TIF_RESTORE_SIGMASK)
#define _TIF_NOTIFY_RESUME (1<<TIF_NOTIFY_RESUME)
-#define _TIF_FREEZE (1<<TIF_FREEZE)

/* Work to do on interrupt/exception return. */
#define _TIF_WORK_MASK (_TIF_SIGPENDING | _TIF_NEED_RESCHED | \
diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
index 7b5cc8d..0f30c3a 100644
--- a/arch/arm/include/asm/thread_info.h
+++ b/arch/arm/include/asm/thread_info.h
@@ -142,7 +142,6 @@ extern void vfp_flush_hwstate(struct thread_info *);
#define TIF_POLLING_NRFLAG 16
#define TIF_USING_IWMMXT 17
#define TIF_MEMDIE 18 /* is terminating due to OOM killer */
-#define TIF_FREEZE 19
#define TIF_RESTORE_SIGMASK 20
#define TIF_SECCOMP 21

@@ -152,7 +151,6 @@ extern void vfp_flush_hwstate(struct thread_info *);
#define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
#define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
#define _TIF_USING_IWMMXT (1 << TIF_USING_IWMMXT)
-#define _TIF_FREEZE (1 << TIF_FREEZE)
#define _TIF_RESTORE_SIGMASK (1 << TIF_RESTORE_SIGMASK)
#define _TIF_SECCOMP (1 << TIF_SECCOMP)

diff --git a/arch/avr32/include/asm/thread_info.h b/arch/avr32/include/asm/thread_info.h
index 7a9c03d..e5deda4 100644
--- a/arch/avr32/include/asm/thread_info.h
+++ b/arch/avr32/include/asm/thread_info.h
@@ -85,7 +85,6 @@ static inline struct thread_info *current_thread_info(void)
#define TIF_RESTORE_SIGMASK 7 /* restore signal mask in do_signal */
#define TIF_CPU_GOING_TO_SLEEP 8 /* CPU is entering sleep 0 mode */
#define TIF_NOTIFY_RESUME 9 /* callback before returning to user */
-#define TIF_FREEZE 29
#define TIF_DEBUG 30 /* debugging enabled */
#define TIF_USERSPACE 31 /* true if FS sets userspace */

@@ -98,7 +97,6 @@ static inline struct thread_info *current_thread_info(void)
#define _TIF_RESTORE_SIGMASK (1 << TIF_RESTORE_SIGMASK)
#define _TIF_CPU_GOING_TO_SLEEP (1 << TIF_CPU_GOING_TO_SLEEP)
#define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
-#define _TIF_FREEZE (1 << TIF_FREEZE)

/* Note: The masks below must never span more than 16 bits! */

diff --git a/arch/blackfin/include/asm/thread_info.h b/arch/blackfin/include/asm/thread_info.h
index 02560fd..53ad100 100644
--- a/arch/blackfin/include/asm/thread_info.h
+++ b/arch/blackfin/include/asm/thread_info.h
@@ -100,7 +100,6 @@ static inline struct thread_info *current_thread_info(void)
TIF_NEED_RESCHED */
#define TIF_MEMDIE 4 /* is terminating due to OOM killer */
#define TIF_RESTORE_SIGMASK 5 /* restore signal mask in do_signal() */
-#define TIF_FREEZE 6 /* is freezing for suspend */
#define TIF_IRQ_SYNC 7 /* sync pipeline stage */
#define TIF_NOTIFY_RESUME 8 /* callback before returning to user */
#define TIF_SINGLESTEP 9
@@ -111,7 +110,6 @@ static inline struct thread_info *current_thread_info(void)
#define _TIF_NEED_RESCHED (1<<TIF_NEED_RESCHED)
#define _TIF_POLLING_NRFLAG (1<<TIF_POLLING_NRFLAG)
#define _TIF_RESTORE_SIGMASK (1<<TIF_RESTORE_SIGMASK)
-#define _TIF_FREEZE (1<<TIF_FREEZE)
#define _TIF_IRQ_SYNC (1<<TIF_IRQ_SYNC)
#define _TIF_NOTIFY_RESUME (1<<TIF_NOTIFY_RESUME)
#define _TIF_SINGLESTEP (1<<TIF_SINGLESTEP)
diff --git a/arch/cris/include/asm/thread_info.h b/arch/cris/include/asm/thread_info.h
index 332f19c..29b9288 100644
--- a/arch/cris/include/asm/thread_info.h
+++ b/arch/cris/include/asm/thread_info.h
@@ -86,7 +86,6 @@ struct thread_info {
#define TIF_RESTORE_SIGMASK 9 /* restore signal mask in do_signal() */
#define TIF_POLLING_NRFLAG 16 /* true if poll_idle() is polling TIF_NEED_RESCHED */
#define TIF_MEMDIE 17 /* is terminating due to OOM killer */
-#define TIF_FREEZE 18 /* is freezing for suspend */

#define _TIF_SYSCALL_TRACE (1<<TIF_SYSCALL_TRACE)
#define _TIF_NOTIFY_RESUME (1<<TIF_NOTIFY_RESUME)
@@ -94,7 +93,6 @@ struct thread_info {
#define _TIF_NEED_RESCHED (1<<TIF_NEED_RESCHED)
#define _TIF_RESTORE_SIGMASK (1<<TIF_RESTORE_SIGMASK)
#define _TIF_POLLING_NRFLAG (1<<TIF_POLLING_NRFLAG)
-#define _TIF_FREEZE (1<<TIF_FREEZE)

#define _TIF_WORK_MASK 0x0000FFFE /* work to do on interrupt/exception return */
#define _TIF_ALLWORK_MASK 0x0000FFFF /* work to do on any return to u-space */
diff --git a/arch/frv/include/asm/thread_info.h b/arch/frv/include/asm/thread_info.h
index cefbe73..92d83ea 100644
--- a/arch/frv/include/asm/thread_info.h
+++ b/arch/frv/include/asm/thread_info.h
@@ -111,7 +111,6 @@ register struct thread_info *__current_thread_info asm("gr15");
#define TIF_RESTORE_SIGMASK 5 /* restore signal mask in do_signal() */
#define TIF_POLLING_NRFLAG 16 /* true if poll_idle() is polling TIF_NEED_RESCHED */
#define TIF_MEMDIE 17 /* is terminating due to OOM killer */
-#define TIF_FREEZE 18 /* freezing for suspend */

#define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
#define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
@@ -120,7 +119,6 @@ register struct thread_info *__current_thread_info asm("gr15");
#define _TIF_SINGLESTEP (1 << TIF_SINGLESTEP)
#define _TIF_RESTORE_SIGMASK (1 << TIF_RESTORE_SIGMASK)
#define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
-#define _TIF_FREEZE (1 << TIF_FREEZE)

#define _TIF_WORK_MASK 0x0000FFFE /* work to do on interrupt/exception return */
#define _TIF_ALLWORK_MASK 0x0000FFFF /* work to do on any return to u-space */
diff --git a/arch/h8300/include/asm/thread_info.h b/arch/h8300/include/asm/thread_info.h
index d6f1784..9c126e0 100644
--- a/arch/h8300/include/asm/thread_info.h
+++ b/arch/h8300/include/asm/thread_info.h
@@ -90,7 +90,6 @@ static inline struct thread_info *current_thread_info(void)
#define TIF_MEMDIE 4 /* is terminating due to OOM killer */
#define TIF_RESTORE_SIGMASK 5 /* restore signal mask in do_signal() */
#define TIF_NOTIFY_RESUME 6 /* callback before returning to user */
-#define TIF_FREEZE 16 /* is freezing for suspend */

/* as above, but as bit values */
#define _TIF_SYSCALL_TRACE (1<<TIF_SYSCALL_TRACE)
@@ -99,7 +98,6 @@ static inline struct thread_info *current_thread_info(void)
#define _TIF_POLLING_NRFLAG (1<<TIF_POLLING_NRFLAG)
#define _TIF_RESTORE_SIGMASK (1<<TIF_RESTORE_SIGMASK)
#define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
-#define _TIF_FREEZE (1<<TIF_FREEZE)

#define _TIF_WORK_MASK 0x0000FFFE /* work to do on interrupt/exception return */

diff --git a/arch/ia64/include/asm/thread_info.h b/arch/ia64/include/asm/thread_info.h
index ff0cc84..e054bcc 100644
--- a/arch/ia64/include/asm/thread_info.h
+++ b/arch/ia64/include/asm/thread_info.h
@@ -113,7 +113,6 @@ struct thread_info {
#define TIF_MEMDIE 17 /* is terminating due to OOM killer */
#define TIF_MCA_INIT 18 /* this task is processing MCA or INIT */
#define TIF_DB_DISABLED 19 /* debug trap disabled for fsyscall */
-#define TIF_FREEZE 20 /* is freezing for suspend */
#define TIF_RESTORE_RSE 21 /* user RBS is newer than kernel RBS */

#define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
@@ -126,7 +125,6 @@ struct thread_info {
#define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
#define _TIF_MCA_INIT (1 << TIF_MCA_INIT)
#define _TIF_DB_DISABLED (1 << TIF_DB_DISABLED)
-#define _TIF_FREEZE (1 << TIF_FREEZE)
#define _TIF_RESTORE_RSE (1 << TIF_RESTORE_RSE)

/* "work to do on user-return" bits */
diff --git a/arch/m32r/include/asm/thread_info.h b/arch/m32r/include/asm/thread_info.h
index 0227dba..bf8fa3c 100644
--- a/arch/m32r/include/asm/thread_info.h
+++ b/arch/m32r/include/asm/thread_info.h
@@ -138,7 +138,6 @@ static inline unsigned int get_thread_fault_code(void)
#define TIF_USEDFPU 16 /* FPU was used by this task this quantum (SMP) */
#define TIF_POLLING_NRFLAG 17 /* true if poll_idle() is polling TIF_NEED_RESCHED */
#define TIF_MEMDIE 18 /* is terminating due to OOM killer */
-#define TIF_FREEZE 19 /* is freezing for suspend */

#define _TIF_SYSCALL_TRACE (1<<TIF_SYSCALL_TRACE)
#define _TIF_SIGPENDING (1<<TIF_SIGPENDING)
@@ -149,7 +148,6 @@ static inline unsigned int get_thread_fault_code(void)
#define _TIF_RESTORE_SIGMASK (1<<TIF_RESTORE_SIGMASK)
#define _TIF_USEDFPU (1<<TIF_USEDFPU)
#define _TIF_POLLING_NRFLAG (1<<TIF_POLLING_NRFLAG)
-#define _TIF_FREEZE (1<<TIF_FREEZE)

#define _TIF_WORK_MASK 0x0000FFFE /* work to do on interrupt/exception return */
#define _TIF_ALLWORK_MASK 0x0000FFFF /* work to do on any return to u-space */
diff --git a/arch/m68k/include/asm/thread_info.h b/arch/m68k/include/asm/thread_info.h
index 7909889..294df15 100644
--- a/arch/m68k/include/asm/thread_info.h
+++ b/arch/m68k/include/asm/thread_info.h
@@ -103,7 +103,6 @@ static inline struct thread_info *current_thread_info(void)
#define TIF_DELAYED_TRACE 14 /* single step a syscall */
#define TIF_SYSCALL_TRACE 15 /* syscall trace active */
#define TIF_MEMDIE 16 /* is terminating due to OOM killer */
-#define TIF_FREEZE 17 /* thread is freezing for suspend */
#define TIF_RESTORE_SIGMASK 18 /* restore signal mask in do_signal */

#endif /* _ASM_M68K_THREAD_INFO_H */
diff --git a/arch/microblaze/include/asm/thread_info.h b/arch/microblaze/include/asm/thread_info.h
index b73da2a..1a8ab6a 100644
--- a/arch/microblaze/include/asm/thread_info.h
+++ b/arch/microblaze/include/asm/thread_info.h
@@ -125,7 +125,6 @@ static inline struct thread_info *current_thread_info(void)
#define TIF_MEMDIE 6 /* is terminating due to OOM killer */
#define TIF_SYSCALL_AUDIT 9 /* syscall auditing active */
#define TIF_SECCOMP 10 /* secure computing */
-#define TIF_FREEZE 14 /* Freezing for suspend */

/* true if poll_idle() is polling TIF_NEED_RESCHED */
#define TIF_POLLING_NRFLAG 16
@@ -137,7 +136,6 @@ static inline struct thread_info *current_thread_info(void)
#define _TIF_SINGLESTEP (1 << TIF_SINGLESTEP)
#define _TIF_IRET (1 << TIF_IRET)
#define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
-#define _TIF_FREEZE (1 << TIF_FREEZE)
#define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT)
#define _TIF_SECCOMP (1 << TIF_SECCOMP)

diff --git a/arch/mips/include/asm/thread_info.h b/arch/mips/include/asm/thread_info.h
index 97f8bf6..0d85d8e 100644
--- a/arch/mips/include/asm/thread_info.h
+++ b/arch/mips/include/asm/thread_info.h
@@ -117,7 +117,6 @@ register struct thread_info *__current_thread_info __asm__("$28");
#define TIF_USEDFPU 16 /* FPU was used by this task this quantum (SMP) */
#define TIF_POLLING_NRFLAG 17 /* true if poll_idle() is polling TIF_NEED_RESCHED */
#define TIF_MEMDIE 18 /* is terminating due to OOM killer */
-#define TIF_FREEZE 19
#define TIF_FIXADE 20 /* Fix address errors in software */
#define TIF_LOGADE 21 /* Log address errors to syslog */
#define TIF_32BIT_REGS 22 /* also implies 16/32 fprs */
@@ -141,7 +140,6 @@ register struct thread_info *__current_thread_info __asm__("$28");
#define _TIF_RESTORE_SIGMASK (1<<TIF_RESTORE_SIGMASK)
#define _TIF_USEDFPU (1<<TIF_USEDFPU)
#define _TIF_POLLING_NRFLAG (1<<TIF_POLLING_NRFLAG)
-#define _TIF_FREEZE (1<<TIF_FREEZE)
#define _TIF_FIXADE (1<<TIF_FIXADE)
#define _TIF_LOGADE (1<<TIF_LOGADE)
#define _TIF_32BIT_REGS (1<<TIF_32BIT_REGS)
diff --git a/arch/mn10300/include/asm/thread_info.h b/arch/mn10300/include/asm/thread_info.h
index 87c2130..28cf521 100644
--- a/arch/mn10300/include/asm/thread_info.h
+++ b/arch/mn10300/include/asm/thread_info.h
@@ -165,7 +165,6 @@ extern void free_thread_info(struct thread_info *);
#define TIF_RESTORE_SIGMASK 5 /* restore signal mask in do_signal() */
#define TIF_POLLING_NRFLAG 16 /* true if poll_idle() is polling TIF_NEED_RESCHED */
#define TIF_MEMDIE 17 /* is terminating due to OOM killer */
-#define TIF_FREEZE 18 /* freezing for suspend */

#define _TIF_SYSCALL_TRACE +(1 << TIF_SYSCALL_TRACE)
#define _TIF_NOTIFY_RESUME +(1 << TIF_NOTIFY_RESUME)
@@ -174,7 +173,6 @@ extern void free_thread_info(struct thread_info *);
#define _TIF_SINGLESTEP +(1 << TIF_SINGLESTEP)
#define _TIF_RESTORE_SIGMASK +(1 << TIF_RESTORE_SIGMASK)
#define _TIF_POLLING_NRFLAG +(1 << TIF_POLLING_NRFLAG)
-#define _TIF_FREEZE +(1 << TIF_FREEZE)

#define _TIF_WORK_MASK 0x0000FFFE /* work to do on interrupt/exception return */
#define _TIF_ALLWORK_MASK 0x0000FFFF /* work to do on any return to u-space */
diff --git a/arch/parisc/include/asm/thread_info.h b/arch/parisc/include/asm/thread_info.h
index aa8de72..6d9c7c7 100644
--- a/arch/parisc/include/asm/thread_info.h
+++ b/arch/parisc/include/asm/thread_info.h
@@ -58,7 +58,6 @@ struct thread_info {
#define TIF_32BIT 4 /* 32 bit binary */
#define TIF_MEMDIE 5 /* is terminating due to OOM killer */
#define TIF_RESTORE_SIGMASK 6 /* restore saved signal mask */
-#define TIF_FREEZE 7 /* is freezing for suspend */
#define TIF_NOTIFY_RESUME 8 /* callback before returning to user */
#define TIF_SINGLESTEP 9 /* single stepping? */
#define TIF_BLOCKSTEP 10 /* branch stepping? */
@@ -69,7 +68,6 @@ struct thread_info {
#define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
#define _TIF_32BIT (1 << TIF_32BIT)
#define _TIF_RESTORE_SIGMASK (1 << TIF_RESTORE_SIGMASK)
-#define _TIF_FREEZE (1 << TIF_FREEZE)
#define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
#define _TIF_SINGLESTEP (1 << TIF_SINGLESTEP)
#define _TIF_BLOCKSTEP (1 << TIF_BLOCKSTEP)
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index 836f231..96471494 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -109,7 +109,6 @@ static inline struct thread_info *current_thread_info(void)
#define TIF_RESTOREALL 11 /* Restore all regs (implies NOERROR) */
#define TIF_NOERROR 12 /* Force successful syscall return */
#define TIF_NOTIFY_RESUME 13 /* callback before returning to user */
-#define TIF_FREEZE 14 /* Freezing for suspend */
#define TIF_SYSCALL_TRACEPOINT 15 /* syscall tracepoint instrumentation */
#define TIF_RUNLATCH 16 /* Is the runlatch enabled? */

@@ -127,7 +126,6 @@ static inline struct thread_info *current_thread_info(void)
#define _TIF_RESTOREALL (1<<TIF_RESTOREALL)
#define _TIF_NOERROR (1<<TIF_NOERROR)
#define _TIF_NOTIFY_RESUME (1<<TIF_NOTIFY_RESUME)
-#define _TIF_FREEZE (1<<TIF_FREEZE)
#define _TIF_SYSCALL_TRACEPOINT (1<<TIF_SYSCALL_TRACEPOINT)
#define _TIF_RUNLATCH (1<<TIF_RUNLATCH)
#define _TIF_SYSCALL_T_OR_A (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
diff --git a/arch/s390/include/asm/thread_info.h b/arch/s390/include/asm/thread_info.h
index 1a5dbb6..589708a 100644
--- a/arch/s390/include/asm/thread_info.h
+++ b/arch/s390/include/asm/thread_info.h
@@ -101,7 +101,6 @@ static inline struct thread_info *current_thread_info(void)
#define TIF_MEMDIE 18 /* is terminating due to OOM killer */
#define TIF_RESTORE_SIGMASK 19 /* restore signal mask in do_signal() */
#define TIF_SINGLE_STEP 20 /* This task is single stepped */
-#define TIF_FREEZE 21 /* thread is freezing for suspend */

#define _TIF_NOTIFY_RESUME (1<<TIF_NOTIFY_RESUME)
#define _TIF_RESTORE_SIGMASK (1<<TIF_RESTORE_SIGMASK)
@@ -118,7 +117,6 @@ static inline struct thread_info *current_thread_info(void)
#define _TIF_POLLING_NRFLAG (1<<TIF_POLLING_NRFLAG)
#define _TIF_31BIT (1<<TIF_31BIT)
#define _TIF_SINGLE_STEP (1<<TIF_FREEZE)
-#define _TIF_FREEZE (1<<TIF_FREEZE)

#ifdef CONFIG_64BIT
#define is_32bit_task() (test_thread_flag(TIF_31BIT))
diff --git a/arch/sh/include/asm/thread_info.h b/arch/sh/include/asm/thread_info.h
index ea2d508..20ee40a 100644
--- a/arch/sh/include/asm/thread_info.h
+++ b/arch/sh/include/asm/thread_info.h
@@ -122,7 +122,6 @@ extern void init_thread_xstate(void);
#define TIF_SYSCALL_TRACEPOINT 8 /* for ftrace syscall instrumentation */
#define TIF_POLLING_NRFLAG 17 /* true if poll_idle() is polling TIF_NEED_RESCHED */
#define TIF_MEMDIE 18 /* is terminating due to OOM killer */
-#define TIF_FREEZE 19 /* Freezing for suspend */

#define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
#define _TIF_SIGPENDING (1 << TIF_SIGPENDING)
@@ -133,7 +132,6 @@ extern void init_thread_xstate(void);
#define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
#define _TIF_SYSCALL_TRACEPOINT (1 << TIF_SYSCALL_TRACEPOINT)
#define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
-#define _TIF_FREEZE (1 << TIF_FREEZE)

/*
* _TIF_ALLWORK_MASK and _TIF_WORK_MASK need to fit within 2 bytes, or we
diff --git a/arch/sparc/include/asm/thread_info_32.h b/arch/sparc/include/asm/thread_info_32.h
index fa57532..5cc5888 100644
--- a/arch/sparc/include/asm/thread_info_32.h
+++ b/arch/sparc/include/asm/thread_info_32.h
@@ -133,7 +133,6 @@ BTFIXUPDEF_CALL(void, free_thread_info, struct thread_info *)
#define TIF_POLLING_NRFLAG 9 /* true if poll_idle() is polling
* TIF_NEED_RESCHED */
#define TIF_MEMDIE 10 /* is terminating due to OOM killer */
-#define TIF_FREEZE 11 /* is freezing for suspend */

/* as above, but as bit values */
#define _TIF_SYSCALL_TRACE (1<<TIF_SYSCALL_TRACE)
@@ -147,7 +146,6 @@ BTFIXUPDEF_CALL(void, free_thread_info, struct thread_info *)
#define _TIF_DO_NOTIFY_RESUME_MASK (_TIF_NOTIFY_RESUME | \
_TIF_SIGPENDING | \
_TIF_RESTORE_SIGMASK)
-#define _TIF_FREEZE (1<<TIF_FREEZE)

#endif /* __KERNEL__ */

diff --git a/arch/sparc/include/asm/thread_info_64.h b/arch/sparc/include/asm/thread_info_64.h
index 60d86be..01d057f 100644
--- a/arch/sparc/include/asm/thread_info_64.h
+++ b/arch/sparc/include/asm/thread_info_64.h
@@ -225,7 +225,6 @@ register struct thread_info *current_thread_info_reg asm("g6");
/* flag bit 12 is available */
#define TIF_MEMDIE 13 /* is terminating due to OOM killer */
#define TIF_POLLING_NRFLAG 14
-#define TIF_FREEZE 15 /* is freezing for suspend */

#define _TIF_SYSCALL_TRACE (1<<TIF_SYSCALL_TRACE)
#define _TIF_NOTIFY_RESUME (1<<TIF_NOTIFY_RESUME)
@@ -237,7 +236,6 @@ register struct thread_info *current_thread_info_reg asm("g6");
#define _TIF_SYSCALL_AUDIT (1<<TIF_SYSCALL_AUDIT)
#define _TIF_SYSCALL_TRACEPOINT (1<<TIF_SYSCALL_TRACEPOINT)
#define _TIF_POLLING_NRFLAG (1<<TIF_POLLING_NRFLAG)
-#define _TIF_FREEZE (1<<TIF_FREEZE)

#define _TIF_USER_WORK_MASK ((0xff << TI_FLAG_WSAVED_SHIFT) | \
_TIF_DO_NOTIFY_RESUME_MASK | \
diff --git a/arch/um/include/asm/thread_info.h b/arch/um/include/asm/thread_info.h
index 5bd1bad..200c4ab 100644
--- a/arch/um/include/asm/thread_info.h
+++ b/arch/um/include/asm/thread_info.h
@@ -71,7 +71,6 @@ static inline struct thread_info *current_thread_info(void)
#define TIF_MEMDIE 5 /* is terminating due to OOM killer */
#define TIF_SYSCALL_AUDIT 6
#define TIF_RESTORE_SIGMASK 7
-#define TIF_FREEZE 16 /* is freezing for suspend */

#define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
#define _TIF_SIGPENDING (1 << TIF_SIGPENDING)
@@ -80,6 +79,5 @@ static inline struct thread_info *current_thread_info(void)
#define _TIF_MEMDIE (1 << TIF_MEMDIE)
#define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT)
#define _TIF_RESTORE_SIGMASK (1 << TIF_RESTORE_SIGMASK)
-#define _TIF_FREEZE (1 << TIF_FREEZE)

#endif
diff --git a/arch/unicore32/include/asm/thread_info.h b/arch/unicore32/include/asm/thread_info.h
index c270e9e..89f7557 100644
--- a/arch/unicore32/include/asm/thread_info.h
+++ b/arch/unicore32/include/asm/thread_info.h
@@ -135,14 +135,12 @@ static inline struct thread_info *current_thread_info(void)
#define TIF_NOTIFY_RESUME 2 /* callback before returning to user */
#define TIF_SYSCALL_TRACE 8
#define TIF_MEMDIE 18
-#define TIF_FREEZE 19
#define TIF_RESTORE_SIGMASK 20

#define _TIF_SIGPENDING (1 << TIF_SIGPENDING)
#define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED)
#define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
#define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
-#define _TIF_FREEZE (1 << TIF_FREEZE)
#define _TIF_RESTORE_SIGMASK (1 << TIF_RESTORE_SIGMASK)

/*
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index a1fe5c1..32125af 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -90,7 +90,6 @@ struct thread_info {
#define TIF_MEMDIE 20 /* is terminating due to OOM killer */
#define TIF_DEBUG 21 /* uses debug registers */
#define TIF_IO_BITMAP 22 /* uses I/O bitmap */
-#define TIF_FREEZE 23 /* is freezing for suspend */
#define TIF_FORCED_TF 24 /* true if TF in eflags artificially */
#define TIF_BLOCKSTEP 25 /* set when we want DEBUGCTLMSR_BTF */
#define TIF_LAZY_MMU_UPDATES 27 /* task is updating the mmu lazily */
@@ -112,7 +111,6 @@ struct thread_info {
#define _TIF_FORK (1 << TIF_FORK)
#define _TIF_DEBUG (1 << TIF_DEBUG)
#define _TIF_IO_BITMAP (1 << TIF_IO_BITMAP)
-#define _TIF_FREEZE (1 << TIF_FREEZE)
#define _TIF_FORCED_TF (1 << TIF_FORCED_TF)
#define _TIF_BLOCKSTEP (1 << TIF_BLOCKSTEP)
#define _TIF_LAZY_MMU_UPDATES (1 << TIF_LAZY_MMU_UPDATES)
diff --git a/arch/xtensa/include/asm/thread_info.h b/arch/xtensa/include/asm/thread_info.h
index 7be8acc..6abbedd 100644
--- a/arch/xtensa/include/asm/thread_info.h
+++ b/arch/xtensa/include/asm/thread_info.h
@@ -132,7 +132,6 @@ static inline struct thread_info *current_thread_info(void)
#define TIF_MEMDIE 5 /* is terminating due to OOM killer */
#define TIF_RESTORE_SIGMASK 6 /* restore signal mask in do_signal() */
#define TIF_POLLING_NRFLAG 16 /* true if poll_idle() is polling TIF_NEED_RESCHED */
-#define TIF_FREEZE 17 /* is freezing for suspend */

#define _TIF_SYSCALL_TRACE (1<<TIF_SYSCALL_TRACE)
#define _TIF_SIGPENDING (1<<TIF_SIGPENDING)
@@ -141,7 +140,6 @@ static inline struct thread_info *current_thread_info(void)
#define _TIF_IRET (1<<TIF_IRET)
#define _TIF_POLLING_NRFLAG (1<<TIF_POLLING_NRFLAG)
#define _TIF_RESTORE_SIGMASK (1<<TIF_RESTORE_SIGMASK)
-#define _TIF_FREEZE (1<<TIF_FREEZE)

#define _TIF_WORK_MASK 0x0000FFFE /* work to do on interrupt/exception return */
#define _TIF_ALLWORK_MASK 0x0000FFFF /* work to do on any return to u-space */
--
1.7.6
Tejun Heo
2011-08-19 14:16:20 UTC
Permalink
Using TIF_FREEZE for freezing worked when there was only single
freezing condition (the system-wide one); however, now there also is
the cgroup_freezer and single bit flag is getting clumsy.
thaw_processes() is already testing whether cgroup freezing in in
effect to avoid thawing tasks which were frozen by both system and
cgroup freezers.

This is racy (nothing prevents race against cgroup freezing) and
fragile. A much simpler way is to test actual freeze conditions from
freezing() - ie. directly test whether system or cgroup freezing is in
effect.

This patch adds variables to indicate whether and what type of system
freezing is in effect and reimplements freezing() such that it
directly tests whether any of the two freezing conditions is active
and the task should freeze. On fast path, freezing() is still very
cheap, it only tests system_freezing_cnt.

This makes the clumsy dancing aroung TIF_FREEZE unnecessary and
freeze/thaw operations more usual - updating state variables for the
new state and nudging target tasks so that they notice the new state
and comply. As long as the nudging happens after state update, it's
race-free.

* This allows use of freezing() in freeze_task(). Replace the open
coded tests with freezing().

* p != current test is added to warning printing conditions in
try_to_freeze_tasks() failure path. This is necessary as freezing()
is now true for the task which initiated freezing too.

Signed-off-by: Tejun Heo <***@kernel.org>
---
include/linux/freezer.h | 33 +++++++++----------------
kernel/cgroup_freezer.c | 8 +++++-
kernel/fork.c | 1 -
kernel/freezer.c | 62 ++++++++++++++++++++++++++++++----------------
kernel/power/process.c | 15 ++++++++---
5 files changed, 70 insertions(+), 49 deletions(-)

diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index bcc0637..11b2bd9 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -5,8 +5,13 @@

#include <linux/sched.h>
#include <linux/wait.h>
+#include <linux/atomic.h>

#ifdef CONFIG_FREEZER
+extern atomic_t system_freezing_cnt; /* nr of freezing conds in effect */
+extern bool pm_freezing; /* PM freezing in effect */
+extern bool pm_nosig_freezing; /* PM nosig freezing in effect */
+
/*
* Check if a process has been frozen
*/
@@ -15,28 +20,16 @@ static inline int frozen(struct task_struct *p)
return p->flags & PF_FROZEN;
}

-/*
- * Check if there is a request to freeze a process
- */
-static inline int freezing(struct task_struct *p)
-{
- return test_tsk_thread_flag(p, TIF_FREEZE);
-}
+extern bool freezing_slow_path(struct task_struct *p);

/*
- * Request that a process be frozen
- */
-static inline void set_freeze_flag(struct task_struct *p)
-{
- set_tsk_thread_flag(p, TIF_FREEZE);
-}
-
-/*
- * Sometimes we may need to cancel the previous 'freeze' request
+ * Check if there is a request to freeze a process
*/
-static inline void clear_freeze_flag(struct task_struct *p)
+static inline bool freezing(struct task_struct *p)
{
- clear_tsk_thread_flag(p, TIF_FREEZE);
+ if (likely(!atomic_read(&system_freezing_cnt)))
+ return false;
+ return freezing_slow_path(p);
}

static inline bool should_send_signal(struct task_struct *p)
@@ -164,9 +157,7 @@ static inline bool set_freezable_with_signal(void)
})
#else /* !CONFIG_FREEZER */
static inline int frozen(struct task_struct *p) { return 0; }
-static inline int freezing(struct task_struct *p) { return 0; }
-static inline void set_freeze_flag(struct task_struct *p) {}
-static inline void clear_freeze_flag(struct task_struct *p) {}
+static inline bool freezing(struct task_struct *p) { return false; }

static inline bool __refrigerator(bool check_kthr_stop) { return false; }
static inline int freeze_processes(void) { BUG(); return 0; }
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 0478c35..4e82525 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -145,7 +145,11 @@ static struct cgroup_subsys_state *freezer_create(struct cgroup_subsys *ss,
static void freezer_destroy(struct cgroup_subsys *ss,
struct cgroup *cgroup)
{
- kfree(cgroup_freezer(cgroup));
+ struct freezer *freezer = cgroup_freezer(cgroup);
+
+ if (freezer->state != CGROUP_THAWED)
+ atomic_dec(&system_freezing_cnt);
+ kfree(freezer);
}

/*
@@ -311,9 +315,11 @@ static int freezer_change_state(struct cgroup *cgroup,

switch (goal_state) {
case CGROUP_THAWED:
+ atomic_dec(&system_freezing_cnt);
unfreeze_cgroup(cgroup, freezer);
break;
case CGROUP_FROZEN:
+ atomic_inc(&system_freezing_cnt);
retval = try_to_freeze_cgroup(cgroup, freezer);
break;
default:
diff --git a/kernel/fork.c b/kernel/fork.c
index 8e6b6f4..fa7beb3 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1000,7 +1000,6 @@ static void copy_flags(unsigned long clone_flags, struct task_struct *p)
new_flags |= PF_FORKNOEXEC;
new_flags |= PF_STARTING;
p->flags = new_flags;
- clear_freeze_flag(p);
}

SYSCALL_DEFINE1(set_tid_address, int __user *, tidptr)
diff --git a/kernel/freezer.c b/kernel/freezer.c
index 7506ef3..995bddf 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -11,9 +11,41 @@
#include <linux/freezer.h>
#include <linux/kthread.h>

+/* total number of freezing conditions in effect */
+atomic_t system_freezing_cnt = ATOMIC_INIT(0);
+EXPORT_SYMBOL(system_freezing_cnt);
+
+/* indicate whether PM freezing is in effect, protected by pm_mutex */
+bool pm_freezing;
+bool pm_nosig_freezing;
+
/* protects freezing and frozen transitions */
static DEFINE_SPINLOCK(freezer_lock);

+/**
+ * freezing_slow_path - slow path for testing whether a task needs to be frozen
+ * @p: task to be tested
+ *
+ * This function is called by freezing() if system_freezing_cnt isn't zero
+ * and tests whether @p needs to enter and stay in frozen state. Can be
+ * called under any context. The freezers are responsible for ensuring the
+ * target tasks see the updated state.
+ */
+bool freezing_slow_path(struct task_struct *p)
+{
+ if (p->flags & PF_NOFREEZE)
+ return false;
+
+ if (pm_nosig_freezing || cgroup_freezing(p))
+ return true;
+
+ if (pm_freezing && !(p->flags & PF_FREEZER_NOSIG))
+ return true;
+
+ return false;
+}
+EXPORT_SYMBOL(freezing_slow_path);
+
/* Refrigerator is place where frozen processes are stored :-). */
bool __refrigerator(bool check_kthr_stop)
{
@@ -23,16 +55,10 @@ bool __refrigerator(bool check_kthr_stop)
long save;

/*
- * Enter FROZEN. If NOFREEZE, schedule immediate thawing by
- * clearing freezing.
+ * No point in checking freezing() again - the caller already did.
+ * Proceed to enter FROZEN.
*/
spin_lock_irq(&freezer_lock);
- if (!freezing(current)) {
- spin_unlock_irq(&freezer_lock);
- return was_frozen;
- }
- if (current->flags & PF_NOFREEZE)
- clear_freeze_flag(current);
current->flags |= PF_FROZEN;
spin_unlock_irq(&freezer_lock);

@@ -96,18 +122,12 @@ static void fake_signal_wake_up(struct task_struct *p)
bool freeze_task(struct task_struct *p, bool sig_only)
{
unsigned long flags;
- bool ret = false;

spin_lock_irqsave(&freezer_lock, flags);
-
- if ((p->flags & PF_NOFREEZE) ||
- (sig_only && !should_send_signal(p)))
- goto out_unlock;
-
- if (frozen(p))
- goto out_unlock;
-
- set_freeze_flag(p);
+ if (!freezing(p) || frozen(p)) {
+ spin_unlock_irqrestore(&freezer_lock, flags);
+ return false;
+ }

if (should_send_signal(p)) {
fake_signal_wake_up(p);
@@ -120,10 +140,9 @@ bool freeze_task(struct task_struct *p, bool sig_only)
} else {
wake_up_state(p, TASK_INTERRUPTIBLE);
}
- ret = true;
-out_unlock:
+
spin_unlock_irqrestore(&freezer_lock, flags);
- return ret;
+ return true;
}

void __thaw_task(struct task_struct *p)
@@ -140,7 +159,6 @@ void __thaw_task(struct task_struct *p)
* avoid leaving dangling TIF_SIGPENDING behind.
*/
spin_lock_irqsave(&freezer_lock, flags);
- clear_freeze_flag(p);
if (frozen(p)) {
wake_up_process(p);
} else {
diff --git a/kernel/power/process.c b/kernel/power/process.c
index 1bbc363..fe06ddf 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -101,7 +101,7 @@ static int try_to_freeze_tasks(bool sig_only)
read_lock(&tasklist_lock);
do_each_thread(g, p) {
if (!wakeup && !freezer_should_skip(p) &&
- freezing(p) && !frozen(p))
+ p != current && freezing(p) && !frozen(p))
sched_show_task(p);
} while_each_thread(g, p);
read_unlock(&tasklist_lock);
@@ -120,13 +120,18 @@ int freeze_processes(void)
{
int error;

+ if (!pm_freezing)
+ atomic_inc(&system_freezing_cnt);
+
printk("Freezing user space processes ... ");
+ pm_freezing = true;
error = try_to_freeze_tasks(true);
if (error)
goto Exit;
printk("done.\n");

printk("Freezing remaining freezable tasks ... ");
+ pm_nosig_freezing = true;
error = try_to_freeze_tasks(false);
if (error)
goto Exit;
@@ -146,6 +151,11 @@ void thaw_processes(void)
{
struct task_struct *g, *p;

+ if (pm_freezing)
+ atomic_dec(&system_freezing_cnt);
+ pm_freezing = false;
+ pm_nosig_freezing = false;
+
oom_killer_enable();

printk("Restarting tasks ... ");
@@ -154,9 +164,6 @@ void thaw_processes(void)

read_lock(&tasklist_lock);
do_each_thread(g, p) {
- if (cgroup_freezing(p))
- continue;
-
__thaw_task(p);
} while_each_thread(g, p);
read_unlock(&tasklist_lock);
--
1.7.6
Paul Menage
2011-08-19 15:43:22 UTC
Permalink
Post by Tejun Heo
Using TIF_FREEZE for freezing worked when there was only single
freezing condition (the system-wide one); however, now there also is
the cgroup_freezer and single bit flag is getting clumsy.
thaw_processes() is already testing whether cgroup freezing in in
effect to avoid thawing tasks which were frozen by both system and
cgroup freezers.
This is racy (nothing prevents race against cgroup freezing) and
fragile. =A0A much simpler way is to test actual freeze conditions fr=
om
Post by Tejun Heo
freezing() - ie. directly test whether system or cgroup freezing is i=
n
Post by Tejun Heo
effect.
This patch adds variables to indicate whether and what type of system
freezing is in effect and reimplements freezing() such that it
directly tests whether any of the two freezing conditions is active
and the task should freeze. =A0On fast path, freezing() is still very
cheap, it only tests system_freezing_cnt.
This makes the clumsy dancing aroung TIF_FREEZE unnecessary and
freeze/thaw operations more usual - updating state variables for the
new state and nudging target tasks so that they notice the new state
and comply. =A0As long as the nudging happens after state update, it'=
s
Post by Tejun Heo
race-free.
* This allows use of freezing() in freeze_task(). =A0Replace the open
=A0coded tests with freezing().
* p !=3D current test is added to warning printing conditions in
=A0try_to_freeze_tasks() failure path. =A0This is necessary as freezi=
ng()
Post by Tejun Heo
=A0is now true for the task which initiated freezing too.
---
=A0include/linux/freezer.h | =A0 33 +++++++++----------------
=A0kernel/cgroup_freezer.c | =A0 =A08 +++++-
=A0kernel/fork.c =A0 =A0 =A0 =A0 =A0 | =A0 =A01 -
=A0kernel/freezer.c =A0 =A0 =A0 =A0| =A0 62 +++++++++++++++++++++++++=
+++++----------------
Post by Tejun Heo
=A0kernel/power/process.c =A0| =A0 15 ++++++++---
=A05 files changed, 70 insertions(+), 49 deletions(-)
diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index bcc0637..11b2bd9 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -5,8 +5,13 @@
=A0#include <linux/sched.h>
=A0#include <linux/wait.h>
+#include <linux/atomic.h>
=A0#ifdef CONFIG_FREEZER
+extern atomic_t system_freezing_cnt; =A0 /* nr of freezing conds in =
effect */
Post by Tejun Heo
+extern bool pm_freezing; =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* PM freezing =
in effect */
Post by Tejun Heo
+extern bool pm_nosig_freezing; =A0 =A0 =A0 =A0 /* PM nosig freezing =
in effect */
Post by Tejun Heo
+
=A0/*
=A0* Check if a process has been frozen
=A0*/
@@ -15,28 +20,16 @@ static inline int frozen(struct task_struct *p)
=A0 =A0 =A0 =A0return p->flags & PF_FROZEN;
=A0}
-/*
- * Check if there is a request to freeze a process
- */
-static inline int freezing(struct task_struct *p)
-{
- =A0 =A0 =A0 return test_tsk_thread_flag(p, TIF_FREEZE);
-}
+extern bool freezing_slow_path(struct task_struct *p);
=A0/*
- * Request that a process be frozen
- */
-static inline void set_freeze_flag(struct task_struct *p)
-{
- =A0 =A0 =A0 set_tsk_thread_flag(p, TIF_FREEZE);
-}
-
-/*
- * Sometimes we may need to cancel the previous 'freeze' request
+ * Check if there is a request to freeze a process
=A0*/
-static inline void clear_freeze_flag(struct task_struct *p)
+static inline bool freezing(struct task_struct *p)
=A0{
- =A0 =A0 =A0 clear_tsk_thread_flag(p, TIF_FREEZE);
+ =A0 =A0 =A0 if (likely(!atomic_read(&system_freezing_cnt)))
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 return false;
+ =A0 =A0 =A0 return freezing_slow_path(p);
=A0}
=A0static inline bool should_send_signal(struct task_struct *p)
@@ -164,9 +157,7 @@ static inline bool set_freezable_with_signal(void=
)
Post by Tejun Heo
=A0})
=A0#else /* !CONFIG_FREEZER */
=A0static inline int frozen(struct task_struct *p) { return 0; }
-static inline int freezing(struct task_struct *p) { return 0; }
-static inline void set_freeze_flag(struct task_struct *p) {}
-static inline void clear_freeze_flag(struct task_struct *p) {}
+static inline bool freezing(struct task_struct *p) { return false; }
=A0static inline bool __refrigerator(bool check_kthr_stop) { return f=
alse; }
Post by Tejun Heo
=A0static inline int freeze_processes(void) { BUG(); return 0; }
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 0478c35..4e82525 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -145,7 +145,11 @@ static struct cgroup_subsys_state *freezer_creat=
e(struct cgroup_subsys *ss,
Post by Tejun Heo
=A0static void freezer_destroy(struct cgroup_subsys *ss,
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct cgroup =
*cgroup)
Post by Tejun Heo
=A0{
- =A0 =A0 =A0 kfree(cgroup_freezer(cgroup));
+ =A0 =A0 =A0 struct freezer *freezer =3D cgroup_freezer(cgroup);
+
+ =A0 =A0 =A0 if (freezer->state !=3D CGROUP_THAWED)
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 atomic_dec(&system_freezing_cnt);
+ =A0 =A0 =A0 kfree(freezer);
=A0}
=A0/*
@@ -311,9 +315,11 @@ static int freezer_change_state(struct cgroup *c=
group,
Post by Tejun Heo
=A0 =A0 =A0 =A0switch (goal_state) {
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 atomic_dec(&system_freezing_cnt);
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0unfreeze_cgroup(cgroup, freezer);
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0break;
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 atomic_inc(&system_freezing_cnt);
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0retval =3D try_to_freeze_cgroup(cgroup=
, freezer);
Post by Tejun Heo
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0break;
diff --git a/kernel/fork.c b/kernel/fork.c
index 8e6b6f4..fa7beb3 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1000,7 +1000,6 @@ static void copy_flags(unsigned long clone_flag=
s, struct task_struct *p)
Post by Tejun Heo
=A0 =A0 =A0 =A0new_flags |=3D PF_FORKNOEXEC;
=A0 =A0 =A0 =A0new_flags |=3D PF_STARTING;
=A0 =A0 =A0 =A0p->flags =3D new_flags;
- =A0 =A0 =A0 clear_freeze_flag(p);
=A0}
=A0SYSCALL_DEFINE1(set_tid_address, int __user *, tidptr)
diff --git a/kernel/freezer.c b/kernel/freezer.c
index 7506ef3..995bddf 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -11,9 +11,41 @@
=A0#include <linux/freezer.h>
=A0#include <linux/kthread.h>
+/* total number of freezing conditions in effect */
+atomic_t system_freezing_cnt =3D ATOMIC_INIT(0);
+EXPORT_SYMBOL(system_freezing_cnt);
+
+/* indicate whether PM freezing is in effect, protected by pm_mutex =
*/
Post by Tejun Heo
+bool pm_freezing;
+bool pm_nosig_freezing;
+
=A0/* protects freezing and frozen transitions */
=A0static DEFINE_SPINLOCK(freezer_lock);
+/**
+ * freezing_slow_path - slow path for testing whether a task needs t=
o be frozen
Post by Tejun Heo
+ *
+ * This function is called by freezing() if system_freezing_cnt isn'=
t zero
Can be
Post by Tejun Heo
+ * called under any context. =A0The freezers are responsible for ens=
uring the
Post by Tejun Heo
+ * target tasks see the updated state.
+ */
+bool freezing_slow_path(struct task_struct *p)
+{
+ =A0 =A0 =A0 if (p->flags & PF_NOFREEZE)
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 return false;
+
+ =A0 =A0 =A0 if (pm_nosig_freezing || cgroup_freezing(p))
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 return true;
+
+ =A0 =A0 =A0 if (pm_freezing && !(p->flags & PF_FREEZER_NOSIG))
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 return true;
+
+ =A0 =A0 =A0 return false;
+}
+EXPORT_SYMBOL(freezing_slow_path);
+
=A0/* Refrigerator is place where frozen processes are stored :-). */
=A0bool __refrigerator(bool check_kthr_stop)
=A0{
@@ -23,16 +55,10 @@ bool __refrigerator(bool check_kthr_stop)
=A0 =A0 =A0 =A0long save;
=A0 =A0 =A0 =A0/*
- =A0 =A0 =A0 =A0* Enter FROZEN. =A0If NOFREEZE, schedule immediate t=
hawing by
Post by Tejun Heo
- =A0 =A0 =A0 =A0* clearing freezing.
+ =A0 =A0 =A0 =A0* No point in checking freezing() again - the caller=
already did.
Post by Tejun Heo
+ =A0 =A0 =A0 =A0* Proceed to enter FROZEN.
=A0 =A0 =A0 =A0 */
=A0 =A0 =A0 =A0spin_lock_irq(&freezer_lock);
- =A0 =A0 =A0 if (!freezing(current)) {
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 spin_unlock_irq(&freezer_lock);
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 return was_frozen;
- =A0 =A0 =A0 }
- =A0 =A0 =A0 if (current->flags & PF_NOFREEZE)
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 clear_freeze_flag(current);
=A0 =A0 =A0 =A0current->flags |=3D PF_FROZEN;
=A0 =A0 =A0 =A0spin_unlock_irq(&freezer_lock);
@@ -96,18 +122,12 @@ static void fake_signal_wake_up(struct task_stru=
ct *p)
Post by Tejun Heo
=A0bool freeze_task(struct task_struct *p, bool sig_only)
=A0{
=A0 =A0 =A0 =A0unsigned long flags;
- =A0 =A0 =A0 bool ret =3D false;
=A0 =A0 =A0 =A0spin_lock_irqsave(&freezer_lock, flags);
-
- =A0 =A0 =A0 if ((p->flags & PF_NOFREEZE) ||
- =A0 =A0 =A0 =A0 =A0 (sig_only && !should_send_signal(p)))
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto out_unlock;
-
- =A0 =A0 =A0 if (frozen(p))
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto out_unlock;
-
- =A0 =A0 =A0 set_freeze_flag(p);
+ =A0 =A0 =A0 if (!freezing(p) || frozen(p)) {
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 spin_unlock_irqrestore(&freezer_lock, f=
lags);
Post by Tejun Heo
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 return false;
+ =A0 =A0 =A0 }
=A0 =A0 =A0 =A0if (should_send_signal(p)) {
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0fake_signal_wake_up(p);
@@ -120,10 +140,9 @@ bool freeze_task(struct task_struct *p, bool sig=
_only)
Post by Tejun Heo
=A0 =A0 =A0 =A0} else {
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0wake_up_state(p, TASK_INTERRUPTIBLE);
=A0 =A0 =A0 =A0}
- =A0 =A0 =A0 ret =3D true;
+
=A0 =A0 =A0 =A0spin_unlock_irqrestore(&freezer_lock, flags);
- =A0 =A0 =A0 return ret;
+ =A0 =A0 =A0 return true;
=A0}
=A0void __thaw_task(struct task_struct *p)
@@ -140,7 +159,6 @@ void __thaw_task(struct task_struct *p)
=A0 =A0 =A0 =A0 * avoid leaving dangling TIF_SIGPENDING behind.
=A0 =A0 =A0 =A0 */
=A0 =A0 =A0 =A0spin_lock_irqsave(&freezer_lock, flags);
- =A0 =A0 =A0 clear_freeze_flag(p);
=A0 =A0 =A0 =A0if (frozen(p)) {
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0wake_up_process(p);
=A0 =A0 =A0 =A0} else {
diff --git a/kernel/power/process.c b/kernel/power/process.c
index 1bbc363..fe06ddf 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -101,7 +101,7 @@ static int try_to_freeze_tasks(bool sig_only)
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0read_lock(&tasklist_lock);
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0do_each_thread(g, p) {
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (!wakeup && !freeze=
r_should_skip(p) &&
Post by Tejun Heo
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 freezing(p) && =
!frozen(p))
Post by Tejun Heo
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 p !=3D current =
&& freezing(p) && !frozen(p))
Post by Tejun Heo
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0sched_=
show_task(p);
Post by Tejun Heo
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0} while_each_thread(g, p);
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0read_unlock(&tasklist_lock);
@@ -120,13 +120,18 @@ int freeze_processes(void)
=A0{
=A0 =A0 =A0 =A0int error;
+ =A0 =A0 =A0 if (!pm_freezing)
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 atomic_inc(&system_freezing_cnt);
+
=A0 =A0 =A0 =A0printk("Freezing user space processes ... ");
+ =A0 =A0 =A0 pm_freezing =3D true;
=A0 =A0 =A0 =A0error =3D try_to_freeze_tasks(true);
=A0 =A0 =A0 =A0if (error)
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0goto Exit;
=A0 =A0 =A0 =A0printk("done.\n");
=A0 =A0 =A0 =A0printk("Freezing remaining freezable tasks ... ");
+ =A0 =A0 =A0 pm_nosig_freezing =3D true;
=A0 =A0 =A0 =A0error =3D try_to_freeze_tasks(false);
=A0 =A0 =A0 =A0if (error)
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0goto Exit;
@@ -146,6 +151,11 @@ void thaw_processes(void)
=A0{
=A0 =A0 =A0 =A0struct task_struct *g, *p;
+ =A0 =A0 =A0 if (pm_freezing)
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 atomic_dec(&system_freezing_cnt);
+ =A0 =A0 =A0 pm_freezing =3D false;
+ =A0 =A0 =A0 pm_nosig_freezing =3D false;
+
=A0 =A0 =A0 =A0oom_killer_enable();
=A0 =A0 =A0 =A0printk("Restarting tasks ... ");
@@ -154,9 +164,6 @@ void thaw_processes(void)
=A0 =A0 =A0 =A0read_lock(&tasklist_lock);
=A0 =A0 =A0 =A0do_each_thread(g, p) {
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (cgroup_freezing(p))
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 continue;
-
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0__thaw_task(p);
=A0 =A0 =A0 =A0} while_each_thread(g, p);
=A0 =A0 =A0 =A0read_unlock(&tasklist_lock);
--
1.7.6
--
To unsubscribe from this list: send the line "unsubscribe linux-kerne=
l" in
Post by Tejun Heo
More majordomo info at =A0http://vger.kernel.org/majordomo-info.html
Please read the FAQ at =A0http://www.tux.org/lkml/
Oleg Nesterov
2011-08-29 15:49:44 UTC
Permalink
Post by Tejun Heo
@@ -311,9 +315,11 @@ static int freezer_change_state(struct cgroup *cgroup,
switch (goal_state) {
+ atomic_dec(&system_freezing_cnt);
unfreeze_cgroup(cgroup, freezer);
break;
+ atomic_inc(&system_freezing_cnt);
This is harmless, but afaics is not exactly right. CGROUP_FROZEN doesn't
need system_freezing_cnt != 0, everything is already frozen and we just
provoke freezing_slow_path() without any reason. Right?
Post by Tejun Heo
@@ -120,13 +120,18 @@ int freeze_processes(void)
{
int error;
+ if (!pm_freezing)
+ atomic_inc(&system_freezing_cnt);
+
printk("Freezing user space processes ... ");
+ pm_freezing = true;
and
Post by Tejun Heo
@@ -146,6 +151,11 @@ void thaw_processes(void)
{
struct task_struct *g, *p;
+ if (pm_freezing)
+ atomic_dec(&system_freezing_cnt);
+ pm_freezing = false;
I simply can't understand this... Why freeze_processes/thaw_processes
check pm_freezing?

IIUC, the calls to freeze/thaw should be serialized anyway (probably
pm_mutex ?). Otherwise this check can't help anyway. Say, _if_ it
is possible to call freeze_processes() with pm_freezing == T, then
the failure path or subsequent thaw_processes() will do the unbalanced
atomic_dec().

Oleg.
Oleg Nesterov
2011-08-29 15:56:40 UTC
Permalink
I'm afraid I wasn't clear....
Post by Oleg Nesterov
Post by Tejun Heo
@@ -311,9 +315,11 @@ static int freezer_change_state(struct cgroup *cgroup,
switch (goal_state) {
+ atomic_dec(&system_freezing_cnt);
unfreeze_cgroup(cgroup, freezer);
break;
+ atomic_inc(&system_freezing_cnt);
This is harmless, but afaics is not exactly right. CGROUP_FROZEN doesn't
need system_freezing_cnt != 0, everything is already frozen and we just
provoke freezing_slow_path() without any reason. Right?
Of course, this atomic_inc() is right, we are going to call
try_to_freeze_cgroup(). But probably it makes sense to do atomic_dec()
when freezer->state becomes CGROUP_FROZEN.

Oleg.
Oleg Nesterov
2011-08-29 16:30:47 UTC
Permalink
Post by Oleg Nesterov
I'm afraid I wasn't clear....
Post by Oleg Nesterov
Post by Tejun Heo
@@ -311,9 +315,11 @@ static int freezer_change_state(struct cgroup *cgroup,
switch (goal_state) {
+ atomic_dec(&system_freezing_cnt);
unfreeze_cgroup(cgroup, freezer);
break;
+ atomic_inc(&system_freezing_cnt);
This is harmless, but afaics is not exactly right. CGROUP_FROZEN doesn't
need system_freezing_cnt != 0, everything is already frozen and we just
provoke freezing_slow_path() without any reason. Right?
Of course, this atomic_inc() is right, we are going to call
try_to_freeze_cgroup(). But probably it makes sense to do atomic_dec()
when freezer->state becomes CGROUP_FROZEN.
Damn, I was wrong again.

No, we can't do this. __refrigerator() should handle the spurious
wakeups correctly, it checks checks freezing() in the main loop.

Oleg.
Oleg Nesterov
2011-08-29 16:17:50 UTC
Permalink
Post by Oleg Nesterov
Post by Tejun Heo
@@ -120,13 +120,18 @@ int freeze_processes(void)
{
int error;
+ if (!pm_freezing)
+ atomic_inc(&system_freezing_cnt);
+
printk("Freezing user space processes ... ");
+ pm_freezing = true;
and
Post by Tejun Heo
@@ -146,6 +151,11 @@ void thaw_processes(void)
{
struct task_struct *g, *p;
+ if (pm_freezing)
+ atomic_dec(&system_freezing_cnt);
+ pm_freezing = false;
I simply can't understand this... Why freeze_processes/thaw_processes
check pm_freezing?
Ah, wait, I seem to understand.
Post by Oleg Nesterov
IIUC, the calls to freeze/thaw should be serialized anyway
Yes, and that is why this should actually work, I think.

Sorry for noise...

Oleg.
Tejun Heo
2011-08-19 14:16:16 UTC
Permalink
A kthread doing set_freezable*() may race with on-going PM freeze and
the freezer might think all tasks are frozen while the new freezable
kthread is marrily proceeding to execute code paths which aren't
supposed to be executing during PM freeze.

This can be fixed by modifying and testing the related PF flags inside
freezer_lock and removing mostly unnecessary early tests from the
callers of freeze/thaw_task().

* Remove all unnecessary tests from kerne/power/process.c and add
PF_NOFREEZE test to freeze_task(), which is meaningful as it avoids
sending unsolicted wakeups to nofreeze tasks.

* Reimplement set_freezable[_with_signal]() using __set_freezable()
such that freezable PF flags are modified under freezer_lock and
try_to_freeze() is called afterwards. Combined with the above
change, this eliminates race condition against freezing.

Signed-off-by: Tejun Heo <***@kernel.org>
---
include/linux/freezer.h | 9 +++++----
kernel/freezer.c | 28 +++++++++++++++++++++++++++-
kernel/power/process.c | 16 +---------------
3 files changed, 33 insertions(+), 20 deletions(-)

diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index 80a455d..59a6e97 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -61,6 +61,7 @@ static inline bool try_to_freeze(void)

extern bool freeze_task(struct task_struct *p, bool sig_only);
extern void cancel_freezing(struct task_struct *p);
+extern bool __set_freezable(bool with_signal);

#ifdef CONFIG_CGROUP_FREEZER
extern int cgroup_freezing_or_frozen(struct task_struct *task);
@@ -118,18 +119,18 @@ static inline int freezer_should_skip(struct task_struct *p)
/*
* Tell the freezer that the current task should be frozen by it
*/
-static inline void set_freezable(void)
+static inline bool set_freezable(void)
{
- current->flags &= ~PF_NOFREEZE;
+ return __set_freezable(false);
}

/*
* Tell the freezer that the current task should be frozen by it and that it
* should send a fake signal to the task to freeze it.
*/
-static inline void set_freezable_with_signal(void)
+static inline bool set_freezable_with_signal(void)
{
- current->flags &= ~(PF_NOFREEZE | PF_FREEZER_NOSIG);
+ return __set_freezable(true);
}

/*
diff --git a/kernel/freezer.c b/kernel/freezer.c
index d6165cd..501f1b7 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -106,7 +106,8 @@ bool freeze_task(struct task_struct *p, bool sig_only)

spin_lock_irqsave(&freezer_lock, flags);

- if (sig_only && !should_send_signal(p))
+ if ((p->flags & PF_NOFREEZE) ||
+ (sig_only && !should_send_signal(p)))
goto out_unlock;

if (frozen(p))
@@ -162,3 +163,28 @@ void __thaw_task(struct task_struct *p)
wake_up_process(p);
spin_unlock_irqrestore(&freezer_lock, flags);
}
+
+/**
+ * __set_freezable - make %current freezable
+ * @with_signal: do we want %TIF_SIGPENDING for notification too?
+ *
+ * Mark %current freezable and enter refrigerator if necessary.
+ */
+bool __set_freezable(bool with_signal)
+{
+ might_sleep();
+
+ /*
+ * Modify flags while holding freezer_lock. This ensures the
+ * freezer notices that we aren't frozen yet or the freezing
+ * condition is visible to try_to_freeze() below.
+ */
+ spin_lock_irq(&freezer_lock);
+ current->flags &= ~PF_NOFREEZE;
+ if (with_signal)
+ current->flags &= ~PF_FREEZER_NOSIG;
+ spin_unlock_irq(&freezer_lock);
+
+ return try_to_freeze();
+}
+EXPORT_SYMBOL(__set_freezable);
diff --git a/kernel/power/process.c b/kernel/power/process.c
index f075c2f..7618909 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -22,14 +22,6 @@
*/
#define TIMEOUT (20 * HZ)

-static inline int freezable(struct task_struct * p)
-{
- if ((p == current) ||
- (p->flags & PF_NOFREEZE))
- return 0;
- return 1;
-}
-
static int try_to_freeze_tasks(bool sig_only)
{
struct task_struct *g, *p;
@@ -52,10 +44,7 @@ static int try_to_freeze_tasks(bool sig_only)
todo = 0;
read_lock(&tasklist_lock);
do_each_thread(g, p) {
- if (frozen(p) || !freezable(p))
- continue;
-
- if (!freeze_task(p, sig_only))
+ if (p == current || !freeze_task(p, sig_only))
continue;

/*
@@ -171,9 +160,6 @@ void thaw_processes(void)

read_lock(&tasklist_lock);
do_each_thread(g, p) {
- if (!freezable(p))
- continue;
-
if (cgroup_freezing_or_frozen(p))
continue;
--
1.7.6
Oleg Nesterov
2011-08-28 18:01:06 UTC
Permalink
Post by Tejun Heo
A kthread doing set_freezable*() may race with on-going PM freeze and
the freezer might think all tasks are frozen while the new freezable
kthread is marrily proceeding to execute code paths which aren't
supposed to be executing during PM freeze.
Yes,
Post by Tejun Heo
+bool __set_freezable(bool with_signal)
+{
+ might_sleep();
+
+ /*
+ * Modify flags while holding freezer_lock. This ensures the
+ * freezer notices that we aren't frozen yet or the freezing
+ * condition is visible to try_to_freeze() below.
+ */
+ spin_lock_irq(&freezer_lock);
+ current->flags &= ~PF_NOFREEZE;
+ if (with_signal)
+ current->flags &= ~PF_FREEZER_NOSIG;
+ spin_unlock_irq(&freezer_lock);
+
+ return try_to_freeze();
+}
You know, I was really, really puzzled by this change. Because it
"obviously can fix nothing", and try_to_freeze() makes no sense
(ignoring the very unlikely case when TIF_FREEZING was set right
after we drop freezer_lock).

But I guess this works along with 14/16 which removes TIF_FREEZING.

Oleg.
Tejun Heo
2011-08-29 07:38:28 UTC
Permalink
Post by Oleg Nesterov
You know, I was really, really puzzled by this change. Because it
"obviously can fix nothing", and try_to_freeze() makes no sense
(ignoring the very unlikely case when TIF_FREEZING was set right
after we drop freezer_lock).
But I guess this works along with 14/16 which removes TIF_FREEZING.
Ah, yeah, another case of bad sequencing. Sorry about that.

Thanks.
--
tejun
Tejun Heo
2011-08-19 14:16:19 UTC
Permalink
TIF_FREEZE will be removed soon and freezing() will directly test
whether any freezing condition is in effect. Make the following
changes in preparation.

* Rename cgroup_freezing_or_frozen() to cgroup_freezing() and make it
return bool.

* Make cgroup_freezing() access task_freezer() under rcu read lock
instead of task_lock(). This makes the state dereferencing racy
against task moving to another cgroup; however, it was already racy
without this change as ->state dereference wasn't synchronized.
This will be later dealt with using attach hooks.

* freezer->state is now set before trying to push tasks into the
target state.

Signed-off-by: Tejun Heo <***@kernel.org>
Cc: Paul Menage <***@google.com>
Cc: Li Zefan <***@cn.fujitsu.com>
---
include/linux/freezer.h | 6 +++---
kernel/cgroup_freezer.c | 36 ++++++++++++------------------------
kernel/power/process.c | 2 +-
3 files changed, 16 insertions(+), 28 deletions(-)

diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index 13559a8..bcc0637 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -63,11 +63,11 @@ extern bool freeze_task(struct task_struct *p, bool sig_only);
extern bool __set_freezable(bool with_signal);

#ifdef CONFIG_CGROUP_FREEZER
-extern int cgroup_freezing_or_frozen(struct task_struct *task);
+extern bool cgroup_freezing(struct task_struct *task);
#else /* !CONFIG_CGROUP_FREEZER */
-static inline int cgroup_freezing_or_frozen(struct task_struct *task)
+static inline bool cgroup_freezing(struct task_struct *task)
{
- return 0;
+ return false;
}
#endif /* !CONFIG_CGROUP_FREEZER */

diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index e7fa0ec..0478c35 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -48,19 +48,17 @@ static inline struct freezer *task_freezer(struct task_struct *task)
struct freezer, css);
}

-static inline int __cgroup_freezing_or_frozen(struct task_struct *task)
+bool cgroup_freezing(struct task_struct *task)
{
- enum freezer_state state = task_freezer(task)->state;
- return (state == CGROUP_FREEZING) || (state == CGROUP_FROZEN);
-}
+ enum freezer_state state;
+ bool ret;

-int cgroup_freezing_or_frozen(struct task_struct *task)
-{
- int result;
- task_lock(task);
- result = __cgroup_freezing_or_frozen(task);
- task_unlock(task);
- return result;
+ rcu_read_lock();
+ state = task_freezer(task)->state;
+ ret = state == CGROUP_FREEZING || state == CGROUP_FROZEN;
+ rcu_read_unlock();
+
+ return ret;
}

/*
@@ -102,9 +100,6 @@ struct cgroup_subsys freezer_subsys;
* freezer_can_attach():
* cgroup_mutex (held by caller of can_attach)
*
- * cgroup_freezing_or_frozen():
- * task->alloc_lock (to get task's cgroup)
- *
* freezer_fork() (preserving fork() performance means can't take cgroup_mutex):
* freezer->lock
* sighand->siglock (if the cgroup is freezing)
@@ -177,13 +172,7 @@ static int freezer_can_attach(struct cgroup_subsys *ss,

static int freezer_can_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
{
- rcu_read_lock();
- if (__cgroup_freezing_or_frozen(tsk)) {
- rcu_read_unlock();
- return -EBUSY;
- }
- rcu_read_unlock();
- return 0;
+ return cgroup_freezing(tsk) ? -EBUSY : 0;
}

static void freezer_fork(struct cgroup_subsys *ss, struct task_struct *task)
@@ -279,7 +268,6 @@ static int try_to_freeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
struct task_struct *task;
unsigned int num_cant_freeze_now = 0;

- freezer->state = CGROUP_FREEZING;
cgroup_iter_start(cgroup, &it);
while ((task = cgroup_iter_next(cgroup, &it))) {
if (!freeze_task(task, true))
@@ -303,8 +291,6 @@ static void unfreeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
while ((task = cgroup_iter_next(cgroup, &it)))
__thaw_task(task);
cgroup_iter_end(cgroup, &it);
-
- freezer->state = CGROUP_THAWED;
}

static int freezer_change_state(struct cgroup *cgroup,
@@ -321,6 +307,8 @@ static int freezer_change_state(struct cgroup *cgroup,
if (goal_state == freezer->state)
goto out;

+ freezer->state = goal_state;
+
switch (goal_state) {
case CGROUP_THAWED:
unfreeze_cgroup(cgroup, freezer);
diff --git a/kernel/power/process.c b/kernel/power/process.c
index f8012f2..1bbc363 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -154,7 +154,7 @@ void thaw_processes(void)

read_lock(&tasklist_lock);
do_each_thread(g, p) {
- if (cgroup_freezing_or_frozen(p))
+ if (cgroup_freezing(p))
continue;

__thaw_task(p);
--
1.7.6
Paul Menage
2011-08-19 15:40:52 UTC
Permalink
Post by Tejun Heo
TIF_FREEZE will be removed soon and freezing() will directly test
whether any freezing condition is in effect. =A0Make the following
changes in preparation.
* Rename cgroup_freezing_or_frozen() to cgroup_freezing() and make it
=A0return bool.
* Make cgroup_freezing() access task_freezer() under rcu read lock
=A0instead of task_lock(). =A0This makes the state dereferencing racy
=A0against task moving to another cgroup; however, it was already rac=
y
Post by Tejun Heo
=A0without this change as ->state dereference wasn't synchronized.
=A0This will be later dealt with using attach hooks.
* freezer->state is now set before trying to push tasks into the
=A0target state.
---
=A0include/linux/freezer.h | =A0 =A06 +++---
=A0kernel/cgroup_freezer.c | =A0 36 ++++++++++++---------------------=
---
Post by Tejun Heo
=A0kernel/power/process.c =A0| =A0 =A02 +-
=A03 files changed, 16 insertions(+), 28 deletions(-)
diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index 13559a8..bcc0637 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -63,11 +63,11 @@ extern bool freeze_task(struct task_struct *p, bo=
ol sig_only);
Post by Tejun Heo
=A0extern bool __set_freezable(bool with_signal);
=A0#ifdef CONFIG_CGROUP_FREEZER
-extern int cgroup_freezing_or_frozen(struct task_struct *task);
+extern bool cgroup_freezing(struct task_struct *task);
=A0#else /* !CONFIG_CGROUP_FREEZER */
-static inline int cgroup_freezing_or_frozen(struct task_struct *task=
)
Post by Tejun Heo
+static inline bool cgroup_freezing(struct task_struct *task)
=A0{
- =A0 =A0 =A0 return 0;
+ =A0 =A0 =A0 return false;
=A0}
=A0#endif /* !CONFIG_CGROUP_FREEZER */
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index e7fa0ec..0478c35 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -48,19 +48,17 @@ static inline struct freezer *task_freezer(struct=
task_struct *task)
Post by Tejun Heo
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct freezer=
, css);
Post by Tejun Heo
=A0}
-static inline int __cgroup_freezing_or_frozen(struct task_struct *ta=
sk)
Post by Tejun Heo
+bool cgroup_freezing(struct task_struct *task)
=A0{
- =A0 =A0 =A0 enum freezer_state state =3D task_freezer(task)->state;
- =A0 =A0 =A0 return (state =3D=3D CGROUP_FREEZING) || (state =3D=3D =
CGROUP_FROZEN);
Post by Tejun Heo
-}
+ =A0 =A0 =A0 enum freezer_state state;
+ =A0 =A0 =A0 bool ret;
-int cgroup_freezing_or_frozen(struct task_struct *task)
-{
- =A0 =A0 =A0 int result;
- =A0 =A0 =A0 task_lock(task);
- =A0 =A0 =A0 result =3D __cgroup_freezing_or_frozen(task);
- =A0 =A0 =A0 task_unlock(task);
- =A0 =A0 =A0 return result;
+ =A0 =A0 =A0 rcu_read_lock();
+ =A0 =A0 =A0 state =3D task_freezer(task)->state;
+ =A0 =A0 =A0 ret =3D state =3D=3D CGROUP_FREEZING || state =3D=3D CG=
ROUP_FROZEN;
Post by Tejun Heo
+ =A0 =A0 =A0 rcu_read_unlock();
+
+ =A0 =A0 =A0 return ret;
=A0}
=A0/*
@@ -102,9 +100,6 @@ struct cgroup_subsys freezer_subsys;
=A0* cgroup_mutex (held by caller of can_attach)
=A0*
- * task->alloc_lock (to get task's cgroup)
- *
=A0* freezer_fork() (preserving fork() performance means can't take c=
=A0* freezer->lock
=A0* =A0sighand->siglock (if the cgroup is freezing)
@@ -177,13 +172,7 @@ static int freezer_can_attach(struct cgroup_subs=
ys *ss,
Post by Tejun Heo
=A0static int freezer_can_attach_task(struct cgroup *cgrp, struct tas=
k_struct *tsk)
Post by Tejun Heo
=A0{
- =A0 =A0 =A0 rcu_read_lock();
- =A0 =A0 =A0 if (__cgroup_freezing_or_frozen(tsk)) {
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 rcu_read_unlock();
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 return -EBUSY;
- =A0 =A0 =A0 }
- =A0 =A0 =A0 rcu_read_unlock();
- =A0 =A0 =A0 return 0;
+ =A0 =A0 =A0 return cgroup_freezing(tsk) ? -EBUSY : 0;
=A0}
=A0static void freezer_fork(struct cgroup_subsys *ss, struct task_str=
uct *task)
Post by Tejun Heo
@@ -279,7 +268,6 @@ static int try_to_freeze_cgroup(struct cgroup *cg=
roup, struct freezer *freezer)
Post by Tejun Heo
=A0 =A0 =A0 =A0struct task_struct *task;
=A0 =A0 =A0 =A0unsigned int num_cant_freeze_now =3D 0;
- =A0 =A0 =A0 freezer->state =3D CGROUP_FREEZING;
=A0 =A0 =A0 =A0cgroup_iter_start(cgroup, &it);
=A0 =A0 =A0 =A0while ((task =3D cgroup_iter_next(cgroup, &it))) {
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (!freeze_task(task, true))
@@ -303,8 +291,6 @@ static void unfreeze_cgroup(struct cgroup *cgroup=
, struct freezer *freezer)
Post by Tejun Heo
=A0 =A0 =A0 =A0while ((task =3D cgroup_iter_next(cgroup, &it)))
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0__thaw_task(task);
=A0 =A0 =A0 =A0cgroup_iter_end(cgroup, &it);
-
- =A0 =A0 =A0 freezer->state =3D CGROUP_THAWED;
=A0}
=A0static int freezer_change_state(struct cgroup *cgroup,
@@ -321,6 +307,8 @@ static int freezer_change_state(struct cgroup *cg=
roup,
Post by Tejun Heo
=A0 =A0 =A0 =A0if (goal_state =3D=3D freezer->state)
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0goto out;
+ =A0 =A0 =A0 freezer->state =3D goal_state;
+
=A0 =A0 =A0 =A0switch (goal_state) {
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0unfreeze_cgroup(cgroup, freezer);
diff --git a/kernel/power/process.c b/kernel/power/process.c
index f8012f2..1bbc363 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -154,7 +154,7 @@ void thaw_processes(void)
=A0 =A0 =A0 =A0read_lock(&tasklist_lock);
=A0 =A0 =A0 =A0do_each_thread(g, p) {
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (cgroup_freezing_or_frozen(p))
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (cgroup_freezing(p))
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0continue;
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0__thaw_task(p);
--
1.7.6
--
To unsubscribe from this list: send the line "unsubscribe linux-kerne=
l" in
Post by Tejun Heo
More majordomo info at =A0http://vger.kernel.org/majordomo-info.html
Please read the FAQ at =A0http://www.tux.org/lkml/
Oleg Nesterov
2011-08-28 17:39:54 UTC
Permalink
Hi Tejun.

Well, yet another sorry for delay ;)
Post by Tejun Heo
@@ -279,7 +268,6 @@ static int try_to_freeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
struct task_struct *task;
unsigned int num_cant_freeze_now = 0;
- freezer->state = CGROUP_FREEZING;
...
Post by Tejun Heo
@@ -321,6 +307,8 @@ static int freezer_change_state(struct cgroup *cgroup,
if (goal_state == freezer->state)
goto out;
+ freezer->state = goal_state;
+
This doesn't look right at all... Unless I misssed something.

A user writes "FROZEN" into the control file, our goal is CGROUP_FROZEN.

But. freezer_change_state() should set CGROUP_FREEZING, not CGROUP_FROZEN.
This cgrp is not frozen yet, _FROZEN should be set by update_if_frozen(),
assuming that the user reads the state waiting until the operation
completes.

Oleg.
Tejun Heo
2011-08-29 06:30:26 UTC
Permalink
Hello,
Post by Oleg Nesterov
Post by Tejun Heo
@@ -321,6 +307,8 @@ static int freezer_change_state(struct cgroup *cgroup,
if (goal_state == freezer->state)
goto out;
+ freezer->state = goal_state;
+
This doesn't look right at all... Unless I misssed something.
A user writes "FROZEN" into the control file, our goal is CGROUP_FROZEN.
But. freezer_change_state() should set CGROUP_FREEZING, not CGROUP_FROZEN.
This cgrp is not frozen yet, _FROZEN should be set by update_if_frozen(),
assuming that the user reads the state waiting until the operation
completes.
Oops, right. That should have been CGROUP_FREEZING. Will fix.

Thanks.
--
tejun
Tejun Heo
2011-08-19 14:16:18 UTC
Permalink
freeze_processes() failure path is rather messy. Freezing is canceled
for workqueues and tasks which aren't frozen yet but frozen tasks are
left alone and should be thawed by the caller and of course some
callers (xen and kexec) didn't do it.

This patch updates __thaw_task() to handle cancelation correctly and
makes thaw_processes() call thaw_processes() on failure instead so
that the system is fully thawed on failure. Unnecessary
thaw_processes() calls are removed from kernel/power/hibernate.c and
user.c.

Signed-off-by: Tejun Heo <***@kernel.org>
---
include/linux/freezer.h | 1 -
kernel/freezer.c | 25 +++++++++----------------
kernel/power/hibernate.c | 15 ++-------------
kernel/power/process.c | 10 ++--------
kernel/power/user.c | 4 +---
5 files changed, 14 insertions(+), 41 deletions(-)

diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index 59a6e97..13559a8 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -60,7 +60,6 @@ static inline bool try_to_freeze(void)
}

extern bool freeze_task(struct task_struct *p, bool sig_only);
-extern void cancel_freezing(struct task_struct *p);
extern bool __set_freezable(bool with_signal);

#ifdef CONFIG_CGROUP_FREEZER
diff --git a/kernel/freezer.c b/kernel/freezer.c
index 82332bb..7506ef3 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -126,21 +126,6 @@ out_unlock:
return ret;
}

-void cancel_freezing(struct task_struct *p)
-{
- unsigned long flags;
-
- spin_lock_irqsave(&freezer_lock, flags);
- if (freezing(p)) {
- pr_debug(" clean up: %s\n", p->comm);
- clear_freeze_flag(p);
- spin_lock(&p->sighand->siglock);
- recalc_sigpending_and_wake(p);
- spin_unlock(&p->sighand->siglock);
- }
- spin_unlock_irqrestore(&freezer_lock, flags);
-}
-
void __thaw_task(struct task_struct *p)
{
unsigned long flags;
@@ -150,11 +135,19 @@ void __thaw_task(struct task_struct *p)
* be visible to @p as waking up implies wmb. Waking up inside
* freezer_lock also prevents wakeups from leaking outside
* refrigerator.
+ *
+ * If !FROZEN, @p hasn't reached refrigerator, recalc sigpending to
+ * avoid leaving dangling TIF_SIGPENDING behind.
*/
spin_lock_irqsave(&freezer_lock, flags);
clear_freeze_flag(p);
- if (frozen(p))
+ if (frozen(p)) {
wake_up_process(p);
+ } else {
+ spin_lock(&p->sighand->siglock);
+ recalc_sigpending_and_wake(p);
+ spin_unlock(&p->sighand->siglock);
+ }
spin_unlock_irqrestore(&freezer_lock, flags);
}

diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 8f7b1db..5fed34d 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -586,17 +586,6 @@ static void power_down(void)
while(1);
}

-static int prepare_processes(void)
-{
- int error = 0;
-
- if (freeze_processes()) {
- error = -EBUSY;
- thaw_processes();
- }
- return error;
-}
-
/**
* hibernate - Carry out system hibernation, including saving the image.
*/
@@ -629,7 +618,7 @@ int hibernate(void)
sys_sync();
printk("done.\n");

- error = prepare_processes();
+ error = freeze_processes();
if (error)
goto Finish;

@@ -776,7 +765,7 @@ static int software_resume(void)
goto close_finish;

pr_debug("PM: Preparing processes for restore.\n");
- error = prepare_processes();
+ error = freeze_processes();
if (error) {
swsusp_close(FMODE_READ);
goto Done;
diff --git a/kernel/power/process.c b/kernel/power/process.c
index 7618909..f8012f2 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -91,11 +91,6 @@ static int try_to_freeze_tasks(bool sig_only)
elapsed_csecs = elapsed_csecs64;

if (todo) {
- /* This does not unfreeze processes that are already frozen
- * (we have slightly ugly calling convention in that respect,
- * and caller must call thaw_processes() if something fails),
- * but it cleans up leftover PF_FREEZE requests.
- */
printk("\n");
printk(KERN_ERR "Freezing of tasks %s after %d.%02d seconds "
"(%d tasks refusing to freeze, wq_busy=%d):\n",
@@ -103,14 +98,11 @@ static int try_to_freeze_tasks(bool sig_only)
elapsed_csecs / 100, elapsed_csecs % 100,
todo - wq_busy, wq_busy);

- thaw_workqueues();
-
read_lock(&tasklist_lock);
do_each_thread(g, p) {
if (!wakeup && !freezer_should_skip(p) &&
freezing(p) && !frozen(p))
sched_show_task(p);
- cancel_freezing(p);
} while_each_thread(g, p);
read_unlock(&tasklist_lock);
} else {
@@ -142,6 +134,8 @@ int freeze_processes(void)

oom_killer_disable();
Exit:
+ if (error)
+ thaw_processes();
BUG_ON(in_atomic());
printk("\n");

diff --git a/kernel/power/user.c b/kernel/power/user.c
index 42ddbc6..afe7737 100644
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -256,10 +256,8 @@ static long snapshot_ioctl(struct file *filp, unsigned int cmd,
break;

error = freeze_processes();
- if (error) {
- thaw_processes();
+ if (error)
usermodehelper_enable();
- }
if (!error)
data->frozen = 1;
break;
--
1.7.6
Oleg Nesterov
2011-08-28 18:09:01 UTC
Permalink
Post by Tejun Heo
void __thaw_task(struct task_struct *p)
{
unsigned long flags;
@@ -150,11 +135,19 @@ void __thaw_task(struct task_struct *p)
* freezer_lock also prevents wakeups from leaking outside
* refrigerator.
+ *
+ * avoid leaving dangling TIF_SIGPENDING behind.
*/
spin_lock_irqsave(&freezer_lock, flags);
clear_freeze_flag(p);
- if (frozen(p))
+ if (frozen(p)) {
wake_up_process(p);
+ } else {
+ spin_lock(&p->sighand->siglock);
+ recalc_sigpending_and_wake(p);
This was copied from cancel_, yes. but what this
recalc_sigpending_and_wake() can or should do?

It looks simply pointless. It can't clear TIF_SIGPENDING,
it can only do the unneeded/spurious wake_up(INTERRUPTIBLE).

Oleg.
Tejun Heo
2011-08-29 07:28:00 UTC
Permalink
Hello,
Post by Oleg Nesterov
This was copied from cancel_, yes. but what this
recalc_sigpending_and_wake() can or should do?
It looks simply pointless. It can't clear TIF_SIGPENDING,
it can only do the unneeded/spurious wake_up(INTERRUPTIBLE).
Yeah, I just copied it over. It seems it got confused between
recalc_sigpending() and recalc_sigpending_and_wake() where the latter
can't clear TIF_SIGPENDING. The signallable freezable kthread seems
quite fragile. Once TIF_SIGPENDING is set, the only way for it to be
cleared for a kthread is going through refrigerator and if freezing
gets canceled before that happens, the kthread will have stuck
TIF_SIGPENDING failing interruptible sleeps. We'll probably should
move recalc_sigpending() to the top of refrigerator.

But that said, I don't see any in-kernel user of
set_freezable_with_signal(). Rafael, is there scheduled to be some
new user of this API? It would be awesome if we can drop it.

Thanks.
--
tejun
Rafael J. Wysocki
2011-08-29 07:40:37 UTC
Permalink
Post by Tejun Heo
Hello,
Post by Oleg Nesterov
This was copied from cancel_, yes. but what this
recalc_sigpending_and_wake() can or should do?
It looks simply pointless. It can't clear TIF_SIGPENDING,
it can only do the unneeded/spurious wake_up(INTERRUPTIBLE).
Yeah, I just copied it over. It seems it got confused between
recalc_sigpending() and recalc_sigpending_and_wake() where the latter
can't clear TIF_SIGPENDING. The signallable freezable kthread seems
quite fragile. Once TIF_SIGPENDING is set, the only way for it to be
cleared for a kthread is going through refrigerator and if freezing
gets canceled before that happens, the kthread will have stuck
TIF_SIGPENDING failing interruptible sleeps. We'll probably should
move recalc_sigpending() to the top of refrigerator.
But that said, I don't see any in-kernel user of
set_freezable_with_signal(). Rafael, is there scheduled to be some
new user of this API?
I'm not aware of any.
Post by Tejun Heo
It would be awesome if we can drop it.
Indeed.

Thanks,
Rafael
Tejun Heo
2011-08-19 14:16:13 UTC
Permalink
There's no point in thawing nosig tasks before others. There's no
ordering requirement between the two groups on thaw, which the staged
thawing can't guarantee anyway. Simplify thaw_processes() by removing
the distinction and collapsing thaw_tasks() into thaw_processes().
This will help further updates to freezer.

Signed-off-by: Tejun Heo <***@kernel.org>
---
kernel/power/process.c | 20 +++++++-------------
1 files changed, 7 insertions(+), 13 deletions(-)

diff --git a/kernel/power/process.c b/kernel/power/process.c
index ddfaba4..3eee548 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -160,34 +160,28 @@ int freeze_processes(void)
return error;
}

-static void thaw_tasks(bool nosig_only)
+void thaw_processes(void)
{
struct task_struct *g, *p;

+ oom_killer_enable();
+
+ printk("Restarting tasks ... ");
+
+ thaw_workqueues();
+
read_lock(&tasklist_lock);
do_each_thread(g, p) {
if (!freezable(p))
continue;

- if (nosig_only && should_send_signal(p))
- continue;
-
if (cgroup_freezing_or_frozen(p))
continue;

__thaw_task(p);
} while_each_thread(g, p);
read_unlock(&tasklist_lock);
-}
-
-void thaw_processes(void)
-{
- oom_killer_enable();

- printk("Restarting tasks ... ");
- thaw_workqueues();
- thaw_tasks(true);
- thaw_tasks(false);
schedule();
printk("done.\n");
}
--
1.7.6
Rafael J. Wysocki
2011-08-19 21:14:52 UTC
Permalink
Post by Tejun Heo
There's no point in thawing nosig tasks before others. There's no
ordering requirement between the two groups on thaw, which the staged
thawing can't guarantee anyway. Simplify thaw_processes() by removing
the distinction and collapsing thaw_tasks() into thaw_processes().
This will help further updates to freezer.
I'm not sure if I like this patch.

Right now there are no ordering requirements between the two groups
of processes, but if we decide to freeze filesystems on suspend,
we'll need to thaw them between nosig and sig I suppose.

Thanks,
Rafael
Post by Tejun Heo
---
kernel/power/process.c | 20 +++++++-------------
1 files changed, 7 insertions(+), 13 deletions(-)
diff --git a/kernel/power/process.c b/kernel/power/process.c
index ddfaba4..3eee548 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -160,34 +160,28 @@ int freeze_processes(void)
return error;
}
-static void thaw_tasks(bool nosig_only)
+void thaw_processes(void)
{
struct task_struct *g, *p;
+ oom_killer_enable();
+
+ printk("Restarting tasks ... ");
+
+ thaw_workqueues();
+
read_lock(&tasklist_lock);
do_each_thread(g, p) {
if (!freezable(p))
continue;
- if (nosig_only && should_send_signal(p))
- continue;
-
if (cgroup_freezing_or_frozen(p))
continue;
__thaw_task(p);
} while_each_thread(g, p);
read_unlock(&tasklist_lock);
-}
-
-void thaw_processes(void)
-{
- oom_killer_enable();
- printk("Restarting tasks ... ");
- thaw_workqueues();
- thaw_tasks(true);
- thaw_tasks(false);
schedule();
printk("done.\n");
}
Tejun Heo
2011-08-20 08:10:10 UTC
Permalink
Hello, Rafael.
Post by Rafael J. Wysocki
Post by Tejun Heo
There's no point in thawing nosig tasks before others. There's no
ordering requirement between the two groups on thaw, which the staged
thawing can't guarantee anyway. Simplify thaw_processes() by removing
the distinction and collapsing thaw_tasks() into thaw_processes().
This will help further updates to freezer.
I'm not sure if I like this patch.
Right now there are no ordering requirements between the two groups
of processes, but if we decide to freeze filesystems on suspend,
we'll need to thaw them between nosig and sig I suppose.
Hmmm... I'm not really following. How does staged wake up affect
thawing filesystems? Staged freezing makes sense as a crude way to
define dependency during freezing - ie. userland and freezable tasks
can't have dependency in their own groups but the former can depend on
the latter on the way to refrigerator.

However, during thawing, it doesn't make any difference regardless of
what was frozen when and how they depend on each other. They might as
well have cyclic dependency and waking them in any order wouldn't make
any difference. The task which dependes on another task to do
something would simply block until that task wakes up and resolves the
dependency; moreover, performing staged wakeups doesn't really
guarantee execution order. It's different from staged freezing in
that way - staged thawing doesn't have the synchronization phase
between the two stages. Tasks which were woken up earlier can easily
start executing after tasks which were woken up later.

The only guaranteed effect of staged wakeups is that tasks in the
earlier group would have had its ->state set to TASK_RUNNING before
the tasks of the second group. This again is a moot point because

* __refrigerator() restores task->state afterwards overwriting the
TASK_RUNNING once the task starts executing (in unknown order).
This is fundamentally broken and should be fixed so that task is
left in TASK_RUNNING when leaving the refrigerator.

* However, if you leave it at TASK_RUNNING, it doesn't make any
difference w.r.t. synchronization. The only way task->state can
participate in synchronization is through wake_up() - ie. through
other tasks setting its state to TASK_RUNNING, so if the
refrigerator leaves tast state at TASK_RUNNING on return, it can't
hinder any synchronization.

So, AFAICS, no matter which way it's looked at, it just doesn't make
any difference.

Thanks.
--
tejun
Rafael J. Wysocki
2011-08-20 08:39:28 UTC
Permalink
Post by Tejun Heo
Hello, Rafael.
Post by Rafael J. Wysocki
Post by Tejun Heo
There's no point in thawing nosig tasks before others. There's no
ordering requirement between the two groups on thaw, which the staged
thawing can't guarantee anyway. Simplify thaw_processes() by removing
the distinction and collapsing thaw_tasks() into thaw_processes().
This will help further updates to freezer.
I'm not sure if I like this patch.
Right now there are no ordering requirements between the two groups
of processes, but if we decide to freeze filesystems on suspend,
we'll need to thaw them between nosig and sig I suppose.
Hmmm... I'm not really following. How does staged wake up affect
thawing filesystems? Staged freezing makes sense as a crude way to
define dependency during freezing - ie. userland and freezable tasks
can't have dependency in their own groups but the former can depend on
the latter on the way to refrigerator.
However, during thawing, it doesn't make any difference regardless of
what was frozen when and how they depend on each other. They might as
well have cyclic dependency and waking them in any order wouldn't make
any difference. The task which dependes on another task to do
something would simply block until that task wakes up and resolves the
dependency; moreover, performing staged wakeups doesn't really
guarantee execution order. It's different from staged freezing in
that way - staged thawing doesn't have the synchronization phase
between the two stages. Tasks which were woken up earlier can easily
start executing after tasks which were woken up later.
The only guaranteed effect of staged wakeups is that tasks in the
earlier group would have had its ->state set to TASK_RUNNING before
the tasks of the second group. This again is a moot point because
* __refrigerator() restores task->state afterwards overwriting the
TASK_RUNNING once the task starts executing (in unknown order).
This is fundamentally broken and should be fixed so that task is
left in TASK_RUNNING when leaving the refrigerator.
* However, if you leave it at TASK_RUNNING, it doesn't make any
difference w.r.t. synchronization. The only way task->state can
participate in synchronization is through wake_up() - ie. through
other tasks setting its state to TASK_RUNNING, so if the
refrigerator leaves tast state at TASK_RUNNING on return, it can't
hinder any synchronization.
So, AFAICS, no matter which way it's looked at, it just doesn't make
any difference.
OK

Rafael
Tejun Heo
2011-08-19 14:16:10 UTC
Permalink
Writeback and thinkpad_acpi have been using thaw_process() to prevent
deadlock between the freezer and kthread_stop(); unfortunately, this
is inherently racy - nothing prevents freezing from happening between
thaw_process() and kthread_stop().

This patch implements kthread_freezable_should_stop() which enters
refrigerator if necessary but is guaranteed to return if
kthread_stop() is invoked. Both thaw_process() users are converted to
use the new function.

Note that this deadlock condition exists for many of freezable
kthreads. They need to be converted to use the new should_stop or
freezable workqueue.

Tested with synthetic test case.

Signed-off-by: Tejun Heo <***@kernel.org>
Cc: Jens Axboe <***@kernel.dk>
Cc: Henrique de Moraes Holschuh <ibm-***@hmh.eng.br>
---
drivers/platform/x86/thinkpad_acpi.c | 15 ++++++---------
fs/fs-writeback.c | 4 +---
include/linux/freezer.h | 6 +++---
include/linux/kthread.h | 1 +
kernel/freezer.c | 6 ++++--
kernel/kthread.c | 25 +++++++++++++++++++++++++
mm/backing-dev.c | 8 ++------
7 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/drivers/platform/x86/thinkpad_acpi.c b/drivers/platform/x86/thinkpad_acpi.c
index 7bd829f..34f167e 100644
--- a/drivers/platform/x86/thinkpad_acpi.c
+++ b/drivers/platform/x86/thinkpad_acpi.c
@@ -2456,8 +2456,9 @@ static int hotkey_kthread(void *data)
u32 poll_mask, event_mask;
unsigned int si, so;
unsigned long t;
- unsigned int change_detector, must_reset;
+ unsigned int change_detector;
unsigned int poll_freq;
+ bool was_frozen;

mutex_lock(&hotkey_thread_mutex);

@@ -2488,14 +2489,14 @@ static int hotkey_kthread(void *data)
t = 100; /* should never happen... */
}
t = msleep_interruptible(t);
- if (unlikely(kthread_should_stop()))
+ if (unlikely(kthread_freezable_should_stop(&was_frozen)))
break;
- must_reset = try_to_freeze();
- if (t > 0 && !must_reset)
+
+ if (t > 0 && !was_frozen)
continue;

mutex_lock(&hotkey_thread_data_mutex);
- if (must_reset || hotkey_config_change != change_detector) {
+ if (was_frozen || hotkey_config_change != change_detector) {
/* forget old state on thaw or config change */
si = so;
t = 0;
@@ -2528,10 +2529,6 @@ exit:
static void hotkey_poll_stop_sync(void)
{
if (tpacpi_hotkey_task) {
- if (frozen(tpacpi_hotkey_task) ||
- freezing(tpacpi_hotkey_task))
- thaw_process(tpacpi_hotkey_task);
-
kthread_stop(tpacpi_hotkey_task);
tpacpi_hotkey_task = NULL;
mutex_lock(&hotkey_thread_mutex);
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 04cf3b9..b36edb8 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -922,7 +922,7 @@ int bdi_writeback_thread(void *data)

trace_writeback_thread_start(bdi);

- while (!kthread_should_stop()) {
+ while (!kthread_freezable_should_stop(NULL)) {
/*
* Remove own delayed wake-up timer, since we are already awake
* and we'll take care of the preriodic write-back.
@@ -952,8 +952,6 @@ int bdi_writeback_thread(void *data)
*/
schedule();
}
-
- try_to_freeze();
}

/* Flush any work that raced with us exiting */
diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index 3d73288..8ee31e5 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -47,7 +47,7 @@ static inline bool should_send_signal(struct task_struct *p)
/* Takes and releases task alloc lock using task_lock() */
extern int thaw_process(struct task_struct *p);

-extern bool __refrigerator(void);
+extern bool __refrigerator(bool check_kthr_stop);
extern int freeze_processes(void);
extern void thaw_processes(void);

@@ -56,7 +56,7 @@ static inline bool try_to_freeze(void)
might_sleep();
if (likely(!freezing(current)))
return false;
- return __refrigerator();
+ return __refrigerator(false);
}

extern bool freeze_task(struct task_struct *p, bool sig_only);
@@ -169,7 +169,7 @@ static inline void set_freeze_flag(struct task_struct *p) {}
static inline void clear_freeze_flag(struct task_struct *p) {}
static inline int thaw_process(struct task_struct *p) { return 1; }

-static inline bool __refrigerator(void) { return false; }
+static inline bool __refrigerator(bool check_kthr_stop) { return false; }
static inline int freeze_processes(void) { BUG(); return 0; }
static inline void thaw_processes(void) {}

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 1e923e5..6c1903d 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -35,6 +35,7 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
void kthread_bind(struct task_struct *k, unsigned int cpu);
int kthread_stop(struct task_struct *k);
int kthread_should_stop(void);
+bool kthread_freezable_should_stop(bool *was_frozen);
void *kthread_data(struct task_struct *k);

int kthreadd(void *unused);
diff --git a/kernel/freezer.c b/kernel/freezer.c
index 4d59904..656492c 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -9,6 +9,7 @@
#include <linux/module.h>
#include <linux/syscalls.h>
#include <linux/freezer.h>
+#include <linux/kthread.h>

/*
* freezing is complete, mark current process as frozen
@@ -23,7 +24,7 @@ static inline void frozen_process(void)
}

/* Refrigerator is place where frozen processes are stored :-). */
-bool __refrigerator(void)
+bool __refrigerator(bool check_kthr_stop)
{
/* Hmm, should we be allowed to suspend when there are realtime
processes around? */
@@ -50,7 +51,8 @@ bool __refrigerator(void)

for (;;) {
set_current_state(TASK_UNINTERRUPTIBLE);
- if (!frozen(current))
+ if (!frozen(current) ||
+ (check_kthr_stop && kthread_should_stop()))
break;
was_frozen = true;
schedule();
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 4ba7ccc..a6cbeea 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -59,6 +59,31 @@ int kthread_should_stop(void)
EXPORT_SYMBOL(kthread_should_stop);

/**
+ * kthread_freezable_should_stop - should this freezable kthread return now?
+ * @was_frozen: optional out parameter, indicates whether %current was frozen
+ *
+ * kthread_should_stop() for freezable kthreads, which will enter
+ * refrigerator if necessary. This function is safe from kthread_stop() /
+ * freezer deadlock and freezable kthreads should use this function instead
+ * of calling try_to_freeze() directly.
+ */
+bool kthread_freezable_should_stop(bool *was_frozen)
+{
+ bool frozen = false;
+
+ might_sleep();
+
+ if (unlikely(freezing(current)))
+ frozen = __refrigerator(true);
+
+ if (was_frozen)
+ *was_frozen = frozen;
+
+ return kthread_should_stop();
+}
+EXPORT_SYMBOL_GPL(kthread_freezable_should_stop);
+
+/**
* kthread_data - return data value specified on kthread creation
* @task: kthread task in question
*
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index d6edf8d..d73a5aa 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -586,14 +586,10 @@ static void bdi_wb_shutdown(struct backing_dev_info *bdi)

/*
* Finally, kill the kernel thread. We don't need to be RCU
- * safe anymore, since the bdi is gone from visibility. Force
- * unfreeze of the thread before calling kthread_stop(), otherwise
- * it would never exet if it is currently stuck in the refrigerator.
+ * safe anymore, since the bdi is gone from visibility.
*/
- if (bdi->wb.task) {
- thaw_process(bdi->wb.task);
+ if (bdi->wb.task)
kthread_stop(bdi->wb.task);
- }
}

/*
--
1.7.6
Henrique de Moraes Holschuh
2011-08-19 20:07:12 UTC
Permalink
Post by Tejun Heo
Writeback and thinkpad_acpi have been using thaw_process() to prevent
deadlock between the freezer and kthread_stop(); unfortunately, this
is inherently racy - nothing prevents freezing from happening between
thaw_process() and kthread_stop().
This patch implements kthread_freezable_should_stop() which enters
refrigerator if necessary but is guaranteed to return if
kthread_stop() is invoked. Both thaw_process() users are converted to
use the new function.
Note that this deadlock condition exists for many of freezable
kthreads. They need to be converted to use the new should_stop or
freezable workqueue.
Tested with synthetic test case.
For the thinkpad-acpi bits:
Acked-by: Henrique de Moraes Holschuh <***@hmh.eng.br>
--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
Oleg Nesterov
2011-08-21 19:14:21 UTC
Permalink
Post by Tejun Heo
+bool kthread_freezable_should_stop(bool *was_frozen)
+{
+ bool frozen = false;
+
+ might_sleep();
+
+ if (unlikely(freezing(current)))
+ frozen = __refrigerator(true);
+
+ if (was_frozen)
+ *was_frozen = frozen;
+
+ return kthread_should_stop();
+}
Imho, nice interface...

So, the caller can not miss the kthread_stop() request and freeze.
And the change in __refrigerator() means that kthread_should_stop()
acts as thaw_process() (in some sense).

But. can't __refrigerator() race with thaw_process() in this case?
Post by Tejun Heo
+bool __refrigerator(bool check_kthr_stop)
{
/* Hmm, should we be allowed to suspend when there are realtime
processes around? */
@@ -50,7 +51,8 @@ bool __refrigerator(void)
for (;;) {
set_current_state(TASK_UNINTERRUPTIBLE);
- if (!frozen(current))
+ if (!frozen(current) ||
+ (check_kthr_stop && kthread_should_stop()))
break;
OK, but then we do

current->flags &= ~PF_FREEZING;

since PF_FROZEN wasn't cleared this can race with

p->flags &= ~PF_FROZEN;

No?

Oleg.
Tejun Heo
2011-08-22 09:53:57 UTC
Permalink
Hey, Oleg.
Post by Oleg Nesterov
But. can't __refrigerator() race with thaw_process() in this case?
Post by Tejun Heo
+bool __refrigerator(bool check_kthr_stop)
{
/* Hmm, should we be allowed to suspend when there are realtime
processes around? */
@@ -50,7 +51,8 @@ bool __refrigerator(void)
for (;;) {
set_current_state(TASK_UNINTERRUPTIBLE);
- if (!frozen(current))
+ if (!frozen(current) ||
+ (check_kthr_stop && kthread_should_stop()))
break;
OK, but then we do
current->flags &= ~PF_FREEZING;
since PF_FROZEN wasn't cleared this can race with
p->flags &= ~PF_FROZEN;
No?
Indeed, I removed PF_FREEZING and moved PF_FROZEN to __refrigerator()
in later patches, so it's already gone. Patches could have been
sequenced better, I suppose. But, sequencing fixes for different
races was already pretty challenging. :)

Thanks.
--
tejun
Oleg Nesterov
2011-08-23 15:42:39 UTC
Permalink
Hi Tejun,
Post by Tejun Heo
Post by Oleg Nesterov
OK, but then we do
current->flags &= ~PF_FREEZING;
since PF_FROZEN wasn't cleared this can race with
p->flags &= ~PF_FROZEN;
No?
Indeed, I removed PF_FREEZING and moved PF_FROZEN to __refrigerator()
in later patches, so it's already gone.
OK, thanks.
Post by Tejun Heo
Patches could have been
sequenced better, I suppose. But, sequencing fixes for different
races was already pretty challenging. :)
... and it also makes sense to try to read the whole seried before
asking the question.

although this is not simple. Especially because currently I can't
read it faster than one patch per day ;)

Oleg.
Tejun Heo
2011-08-19 14:16:12 UTC
Permalink
There's no point in freezing an exiting task. The current code
seemingly tries that by calling clear_freeze_flag() from exit_mm() but
it's racy as freeze might happen afterwards.

This patch removes the racy clear_freeze_flag() makes do_exit() set
PF_NOFREEZE after PTRACE_EVENT_EXIT, after which freezing doesn't make
sense.

Signed-off-by: Tejun Heo <***@kernel.org>
---
kernel/exit.c | 8 ++++++--
kernel/power/process.c | 3 +--
2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 2913b35..ac58259 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -679,8 +679,6 @@ static void exit_mm(struct task_struct * tsk)
tsk->mm = NULL;
up_read(&mm->mmap_sem);
enter_lazy_tlb(mm, current);
- /* We don't want this task to be frozen prematurely */
- clear_freeze_flag(tsk);
if (tsk->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
atomic_dec(&mm->oom_disable_count);
task_unlock(tsk);
@@ -915,6 +913,12 @@ NORET_TYPE void do_exit(long code)

ptrace_event(PTRACE_EVENT_EXIT, code);

+ /*
+ * With ptrace notification done, there's no point in freezing from
+ * here on. Disallow freezing.
+ */
+ current->flags |= PF_NOFREEZE;
+
validate_creds_for_do_exit(tsk);

/*
diff --git a/kernel/power/process.c b/kernel/power/process.c
index bec09c3..ddfaba4 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -25,8 +25,7 @@
static inline int freezable(struct task_struct * p)
{
if ((p == current) ||
- (p->flags & PF_NOFREEZE) ||
- (p->exit_state != 0))
+ (p->flags & PF_NOFREEZE))
return 0;
return 1;
}
--
1.7.6
Oleg Nesterov
2011-08-23 15:52:21 UTC
Permalink
Hi.

Not a comment, but the question. Probably falls into the "read the
whole series" category too.
Post by Tejun Heo
There's no point in freezing an exiting task.
This is not clear to me. Probably this is fine, I do not know what
the callers of freeze_processes() actually expect.
Post by Tejun Heo
@@ -915,6 +913,12 @@ NORET_TYPE void do_exit(long code)
ptrace_event(PTRACE_EVENT_EXIT, code);
+ /*
+ * With ptrace notification done, there's no point in freezing from
+ * here on. Disallow freezing.
+ */
+ current->flags |= PF_NOFREEZE;
OK, but what PF_NOFREEZE actually means?

Apart from "dont try to freeze" it means "no need to freeze", yes?

IOW, try_to_freeze_tasks() can succeed even if we have a lot of
exitinig task which can make some activity, say, disk i/o. Is this
correct?

Once again, I do not understand the problem-space at all, jus I am
curious.

Thanks,

Oleg.
Tejun Heo
2011-08-23 19:44:51 UTC
Permalink
Hello,
Post by Oleg Nesterov
Post by Tejun Heo
@@ -915,6 +913,12 @@ NORET_TYPE void do_exit(long code)
ptrace_event(PTRACE_EVENT_EXIT, code);
+ /*
+ * With ptrace notification done, there's no point in freezing from
+ * here on. Disallow freezing.
+ */
+ current->flags |= PF_NOFREEZE;
OK, but what PF_NOFREEZE actually means?
Apart from "dont try to freeze" it means "no need to freeze", yes?
Yes.
Post by Oleg Nesterov
IOW, try_to_freeze_tasks() can succeed even if we have a lot of
exitinig task which can make some activity, say, disk i/o. Is this
correct?
Hmmm... can it cause disk IOs after that point? I skimmed through and
couldn't spot one (the original code made simliar assumption albeit a
bit later).

Thanks.
--
tejun
Oleg Nesterov
2011-08-24 14:14:31 UTC
Permalink
Post by Tejun Heo
Hello,
Post by Oleg Nesterov
Post by Tejun Heo
@@ -915,6 +913,12 @@ NORET_TYPE void do_exit(long code)
ptrace_event(PTRACE_EVENT_EXIT, code);
+ /*
+ * With ptrace notification done, there's no point in freezing from
+ * here on. Disallow freezing.
+ */
+ current->flags |= PF_NOFREEZE;
OK, but what PF_NOFREEZE actually means?
Apart from "dont try to freeze" it means "no need to freeze", yes?
Yes.
Post by Oleg Nesterov
IOW, try_to_freeze_tasks() can succeed even if we have a lot of
exitinig task which can make some activity, say, disk i/o. Is this
correct?
Hmmm... can it cause disk IOs after that point? I skimmed through and
couldn't spot one
I am not sure. But, say, exit_files(). We can't know what f_op->flush()
f_op->release() can do in general. Even without i/o the exiting task can
do a lot of different things.
Post by Tejun Heo
(the original code made simliar assumption albeit a
bit later).
Yes, and this looks "safer". I think exit_mm()->clear_freeze_flag() was
simply unneeded, but exit_state != 0 in freezable() looks understandable.
do_each_thread() (in general) can't see the threads with exit_state != 0
anyway, but we should skip zombies.


Let me repeat, I do not know what the callers of try_to_freeze_tasks()
actually need, probably this is fine.

Oleg.
Tejun Heo
2011-08-25 15:59:57 UTC
Permalink
Hello, Oleg.
Post by Oleg Nesterov
Post by Tejun Heo
Hmmm... can it cause disk IOs after that point? I skimmed through and
couldn't spot one
I am not sure. But, say, exit_files(). We can't know what f_op->flush()
f_op->release() can do in general. Even without i/o the exiting task can
do a lot of different things.
Yeah, what 'freeze' should do is a bit vague on the edges, I think.
The freezer can't really halt the whole system operation including
IOs. Tasks aren't the only source which can kick those off. There
are other asynchronous sources, timers, IRQs and bottom halves. For
actual disk IOs to at least SCSI/IDE devices, freezing isn't even
necessary. Queue processing is suspended via device suspend
mechanism. There were several drivers which didn't have proper device
suspend mechanism implemented (maybe we should implement generic queue
quiescing in block layer) but I don't know whether they are still like
that.

Anyways, freezing has never been a full solution to quiescing the
system prior to entering hibernation. It does make sense for userland
processes but depending on it for quiescing kernel activities
including IO can't be reliable. It should really be achieved by
device suspend (and maybe some midlayer suspending too) and I think
we're already pretty close to that. Rafael, can you please enlighten
us on the subject?

Another freezer user is the cgroup, for this, allowing killing to
finish is quite useful as it can be used for runaway cgroup control.

Thanks.
--
tejun
Oleg Nesterov
2011-08-25 16:56:50 UTC
Permalink
Post by Tejun Heo
Yeah, what 'freeze' should do is a bit vague on the edges, I think.
The freezer can't really halt the whole system operation including
IOs. Tasks aren't the only source which can kick those off. There
are other asynchronous sources,
Of course.

But still I can't understand why it is better to consider the exiting
task as "frozen" from the very beginning, right after PTRACE_EVENT_EXIT.
do_exit() does a lot of misc things, and this patch simply makes it
"invisible" to the freezer. This looks "unsafe" even if this is fine
for suspend/etc.

To me, try_to_freeze_tasks() should succed when all threads either
sleep in refrigerator(), or ->state = TASK_DEAD (the final schedule()
was called). Until then try_to_freeze_tasks() should retry.

But since we can't see the threads after exit_notify (in general),
the current ->exit_state check looks reasonable.

But again, again, I won't argue.
Post by Tejun Heo
Rafael, can you please enlighten
us on the subject?
Please ;)
Post by Tejun Heo
Another freezer user is the cgroup,
Yes. And I don't understand this case too. I mean, the fact we ignore
the exiting tasks.

Oleg.
Rafael J. Wysocki
2011-08-25 21:01:41 UTC
Permalink
Post by Oleg Nesterov
Post by Tejun Heo
Yeah, what 'freeze' should do is a bit vague on the edges, I think.
The freezer can't really halt the whole system operation including
IOs. Tasks aren't the only source which can kick those off. There
are other asynchronous sources,
Of course.
But still I can't understand why it is better to consider the exiting
task as "frozen" from the very beginning, right after PTRACE_EVENT_EXIT.
do_exit() does a lot of misc things, and this patch simply makes it
"invisible" to the freezer. This looks "unsafe" even if this is fine
for suspend/etc.
Suspend needs the freezer to ensure that processes (user space mostly)
won't interact with drivers in any way while devices are being suspended,
so if a process is in a state in which it won't talk to any driver
and make changes to filesystems any more, it's irrelevant from the
suspend's point of view.
Post by Oleg Nesterov
To me, try_to_freeze_tasks() should succed when all threads either
sleep in refrigerator(), or ->state = TASK_DEAD (the final schedule()
was called). Until then try_to_freeze_tasks() should retry.
But since we can't see the threads after exit_notify (in general),
the current ->exit_state check looks reasonable.
But again, again, I won't argue.
Post by Tejun Heo
Rafael, can you please enlighten us on the subject?
Please ;)
Is the above sufficient?

Rafael
Tejun Heo
2011-08-25 21:54:44 UTC
Permalink
Hello,
Post by Rafael J. Wysocki
Post by Oleg Nesterov
But still I can't understand why it is better to consider the exiting
task as "frozen" from the very beginning, right after PTRACE_EVENT_EXIT.
do_exit() does a lot of misc things, and this patch simply makes it
"invisible" to the freezer. This looks "unsafe" even if this is fine
for suspend/etc.
Suspend needs the freezer to ensure that processes (user space mostly)
won't interact with drivers in any way while devices are being suspended,
so if a process is in a state in which it won't talk to any driver
and make changes to filesystems any more, it's irrelevant from the
suspend's point of view.
Hmm... but freezer can't really achieve that. There are other sources
of activities. But anyways, yeah, I think Oleg is right here and we
should be setting NOFREEZE on transition to EXIT_DEAD.
Post by Rafael J. Wysocki
Post by Oleg Nesterov
Post by Tejun Heo
Rafael, can you please enlighten us on the subject?
Please ;)
Is the above sufficient?
One thing I'm curious about is how many drivers do we have left which
depend on freezer as opposed to implementing proper quiescing
mechanism using PM hooks? Are there still a lot left?

Thanks.
--
tejun
Tejun Heo
2011-08-25 21:52:03 UTC
Permalink
Hello, Oleg.
Post by Oleg Nesterov
But still I can't understand why it is better to consider the exiting
task as "frozen" from the very beginning, right after PTRACE_EVENT_EXIT.
do_exit() does a lot of misc things, and this patch simply makes it
"invisible" to the freezer. This looks "unsafe" even if this is fine
for suspend/etc.
To me, try_to_freeze_tasks() should succed when all threads either
sleep in refrigerator(), or ->state = TASK_DEAD (the final schedule()
was called). Until then try_to_freeze_tasks() should retry.
Hmmm... yeah, maybe moving it right after setting ->state would be
safer. Care to send a patch?
Post by Oleg Nesterov
But since we can't see the threads after exit_notify (in general),
the current ->exit_state check looks reasonable.
I prefer setting NOFREEZE after point of no return from exit path.
The intention is clearer that way, I think.
Post by Oleg Nesterov
Post by Tejun Heo
Another freezer user is the cgroup,
Yes. And I don't understand this case too. I mean, the fact we ignore
the exiting tasks.
Because it's basically a mass (or meta) job control mechanism and it's
nicer to be able to kill processes after freezing berserk cgroups.

Thanks.
--
tejun
Matt Helsley
2011-08-24 22:34:12 UTC
Permalink
Post by Tejun Heo
There's no point in freezing an exiting task. The current code
seemingly tries that by calling clear_freeze_flag() from exit_mm() but
it's racy as freeze might happen afterwards.
This patch removes the racy clear_freeze_flag() makes do_exit() set
PF_NOFREEZE after PTRACE_EVENT_EXIT, after which freezing doesn't make
sense.
---
kernel/exit.c | 8 ++++++--
kernel/power/process.c | 3 +--
2 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/kernel/exit.c b/kernel/exit.c
index 2913b35..ac58259 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -679,8 +679,6 @@ static void exit_mm(struct task_struct * tsk)
tsk->mm = NULL;
up_read(&mm->mmap_sem);
enter_lazy_tlb(mm, current);
- /* We don't want this task to be frozen prematurely */
- clear_freeze_flag(tsk);
This patch doesn't look quite right. Perhaps that's because I'm
unclear why any of the freezer flags matter in the do_exit() path. How
can the task enter the refrigerator in do_exit()? Isn't it true that
we don't enter the syscall return path and thus can't freeze
anyway (the last thing it should do is schedule())? Or is this another
kthread quirk?

If we can't enter the refrigerator then why fiddle with these flags
here at all? That might explain why this race never introduced
problems.
Post by Tejun Heo
if (tsk->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
atomic_dec(&mm->oom_disable_count);
task_unlock(tsk);
@@ -915,6 +913,12 @@ NORET_TYPE void do_exit(long code)
ptrace_event(PTRACE_EVENT_EXIT, code);
+ /*
+ * With ptrace notification done, there's no point in freezing from
+ * here on. Disallow freezing.
+ */
+ current->flags |= PF_NOFREEZE;
+
validate_creds_for_do_exit(tsk);
OK, now I'm assuming my reasoning above is wrong; that it's possible to
enter the refrigerator in do_exit().

Couldn't TIF_FREEZE already be set? Setting PF_NOFREEZE only ensures
the flag won't get set "after" this point. To fix this I think we'd need
something more like:

current->flags |= PF_NOFREEZE;
smp_wmb();
clear_freeze_flag(tsk);

Otherwise TIF_FREEZE might not get cleared and, despite PF_NOFREEZE,
we could somehow enter the refrigerator. Also, I'm not sure the
smp_wmb() is sufficient since it will only affect the order the writes
appear and does not affect whether other cpus will see PF_NOFREEZE
before we clear the TIF_FREEZE flag. Perhaps we need task_lock() here.
Post by Tejun Heo
/*
diff --git a/kernel/power/process.c b/kernel/power/process.c
index bec09c3..ddfaba4 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -25,8 +25,7 @@
static inline int freezable(struct task_struct * p)
{
if ((p == current) ||
- (p->flags & PF_NOFREEZE) ||
- (p->exit_state != 0))
+ (p->flags & PF_NOFREEZE))
This patch also fiddles with the timing of adjusting or checking three
different pieces of task state:

exit_state
flags (PF_NOFREEZE)
thread flags (TIF_FREEZE)

each of which get set/cleared at different times from the new
line adding PF_NOFREEZE to the flags. So I'm a little nervous that
we might be missing some important details.

Cheers,
-Matt Helsley
Oleg Nesterov
2011-08-25 15:25:33 UTC
Permalink
Post by Matt Helsley
Post by Tejun Heo
There's no point in freezing an exiting task. The current code
seemingly tries that by calling clear_freeze_flag() from exit_mm() but
it's racy as freeze might happen afterwards.
This patch removes the racy clear_freeze_flag() makes do_exit() set
PF_NOFREEZE after PTRACE_EVENT_EXIT, after which freezing doesn't make
sense.
---
kernel/exit.c | 8 ++++++--
kernel/power/process.c | 3 +--
2 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/kernel/exit.c b/kernel/exit.c
index 2913b35..ac58259 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -679,8 +679,6 @@ static void exit_mm(struct task_struct * tsk)
tsk->mm = NULL;
up_read(&mm->mmap_sem);
enter_lazy_tlb(mm, current);
- /* We don't want this task to be frozen prematurely */
- clear_freeze_flag(tsk);
This patch doesn't look quite right. Perhaps that's because I'm
unclear why any of the freezer flags matter in the do_exit() path. How
can the task enter the refrigerator in do_exit()?
it can't.
Post by Matt Helsley
If we can't enter the refrigerator then why fiddle with these flags
here at all? That might explain why this race never introduced
problems.
One problem is, try_to_freeze_tasks() shouldn't fail if we have a zombie.
Post by Matt Helsley
Couldn't TIF_FREEZE already be set?
Yes,
Post by Matt Helsley
Setting PF_NOFREEZE only ensures
the flag won't get set "after" this point. To fix this I think we'd need
current->flags |= PF_NOFREEZE;
smp_wmb();
clear_freeze_flag(tsk);
We don't really need this, we are not going to freeze.




However. I thought about this too. In the long term I think we do need
some cleanups to avoid the wrong TIF_SIGPENDING... but currently this
is off-topic.

Oleg.
Tejun Heo
2011-08-25 16:11:19 UTC
Permalink
Hello, Matt.
Post by Matt Helsley
Post by Tejun Heo
- /* We don't want this task to be frozen prematurely */
- clear_freeze_flag(tsk);
This patch doesn't look quite right. Perhaps that's because I'm
unclear why any of the freezer flags matter in the do_exit() path. How
can the task enter the refrigerator in do_exit()? Isn't it true that
we don't enter the syscall return path and thus can't freeze
anyway (the last thing it should do is schedule())? Or is this another
kthread quirk?
There are paths where try_to_freeze() is called explicitly. None of
those is called from exit path AFAICT but it is a theoretical
possibility and I think it isn't a terrible idea to explicitly declare
an exiting task unfreezable.
Post by Matt Helsley
Couldn't TIF_FREEZE already be set? Setting PF_NOFREEZE only ensures
the flag won't get set "after" this point. To fix this I think we'd need
current->flags |= PF_NOFREEZE;
smp_wmb();
clear_freeze_flag(tsk);
TIF_FREEZE being set doesn't matter all that much. Ultimately,
PF_NOFREEZE is observed by the task itself. freezing() is false - it
either doesn't enter refrigerator at all or exits immediately without
scheduling out.
Post by Matt Helsley
This patch also fiddles with the timing of adjusting or checking three
exit_state
flags (PF_NOFREEZE)
thread flags (TIF_FREEZE)
each of which get set/cleared at different times from the new
line adding PF_NOFREEZE to the flags. So I'm a little nervous that
we might be missing some important details.
Sure, that's one of the problems I wanted to clarify with this patch
series. The rules are quite simple now.

* Whether a task enters and stays inside freezer is determined by
freezing().

* Related PF_ flags are manipulated only by the task itself and thus
the latest state is always visible when the task tests freezing().

* Freezer trying to freeze or thaw a task updates states and
freezing() result reflects the change considering both the task's
freezing settings and the freezing conditions in effect. Freezer is
responsible for ensuring the target task goes through at least one
freezing() test after the state is updated.

So, no race condition.

Thanks.
--
tejun
Tejun Heo
2011-08-19 14:16:11 UTC
Permalink
thaw_process() now has only internal users - system and cgroup
freezers. Remove the unnecessary return value, rename, unexport and
collapse __thaw_process() into it. This will help further updates to
the freezer code.

Signed-off-by: Tejun Heo <***@kernel.org>
Cc: Paul Menage <***@google.com>
---
include/linux/freezer.h | 3 +--
kernel/cgroup_freezer.c | 7 +++----
kernel/freezer.c | 30 +++++++++++-------------------
kernel/power/process.c | 2 +-
4 files changed, 16 insertions(+), 26 deletions(-)

diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index 8ee31e5..80a455d 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -45,7 +45,7 @@ static inline bool should_send_signal(struct task_struct *p)
}

/* Takes and releases task alloc lock using task_lock() */
-extern int thaw_process(struct task_struct *p);
+extern void __thaw_task(struct task_struct *t);

extern bool __refrigerator(bool check_kthr_stop);
extern int freeze_processes(void);
@@ -167,7 +167,6 @@ static inline int frozen(struct task_struct *p) { return 0; }
static inline int freezing(struct task_struct *p) { return 0; }
static inline void set_freeze_flag(struct task_struct *p) {}
static inline void clear_freeze_flag(struct task_struct *p) {}
-static inline int thaw_process(struct task_struct *p) { return 1; }

static inline bool __refrigerator(bool check_kthr_stop) { return false; }
static inline int freeze_processes(void) { BUG(); return 0; }
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index e691818..e7fa0ec 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -130,7 +130,7 @@ struct cgroup_subsys freezer_subsys;
* write_lock css_set_lock (cgroup iterator start)
* task->alloc_lock
* read_lock css_set_lock (cgroup iterator start)
- * task->alloc_lock (inside thaw_process(), prevents race with refrigerator())
+ * task->alloc_lock (inside __thaw_task(), prevents race with refrigerator())
* sighand->siglock
*/
static struct cgroup_subsys_state *freezer_create(struct cgroup_subsys *ss,
@@ -300,9 +300,8 @@ static void unfreeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
struct task_struct *task;

cgroup_iter_start(cgroup, &it);
- while ((task = cgroup_iter_next(cgroup, &it))) {
- thaw_process(task);
- }
+ while ((task = cgroup_iter_next(cgroup, &it)))
+ __thaw_task(task);
cgroup_iter_end(cgroup, &it);

freezer->state = CGROUP_THAWED;
diff --git a/kernel/freezer.c b/kernel/freezer.c
index 656492c..f5db7fd 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -145,18 +145,8 @@ void cancel_freezing(struct task_struct *p)
}
}

-static int __thaw_process(struct task_struct *p)
-{
- if (frozen(p)) {
- p->flags &= ~PF_FROZEN;
- return 1;
- }
- clear_freeze_flag(p);
- return 0;
-}
-
/*
- * Wake up a frozen process
+ * Wake up a frozen task
*
* task_lock() is needed to prevent the race with refrigerator() which may
* occur if the freezing of tasks fails. Namely, without the lock, if the
@@ -164,15 +154,17 @@ static int __thaw_process(struct task_struct *p)
* refrigerator() could call frozen_process(), in which case the task would be
* frozen and no one would thaw it.
*/
-int thaw_process(struct task_struct *p)
+void __thaw_task(struct task_struct *p)
{
+ bool was_frozen;
+
task_lock(p);
- if (__thaw_process(p) == 1) {
- task_unlock(p);
- wake_up_process(p);
- return 1;
- }
+ if ((was_frozen = frozen(p)))
+ p->flags &= ~PF_FROZEN;
+ else
+ clear_freeze_flag(p);
task_unlock(p);
- return 0;
+
+ if (was_frozen)
+ wake_up_process(p);
}
-EXPORT_SYMBOL(thaw_process);
diff --git a/kernel/power/process.c b/kernel/power/process.c
index 0cf3a27..bec09c3 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -176,7 +176,7 @@ static void thaw_tasks(bool nosig_only)
if (cgroup_freezing_or_frozen(p))
continue;

- thaw_process(p);
+ __thaw_task(p);
} while_each_thread(g, p);
read_unlock(&tasklist_lock);
}
--
1.7.6
Paul Menage
2011-08-19 15:37:20 UTC
Permalink
Post by Tejun Heo
thaw_process() now has only internal users - system and cgroup
freezers. =A0Remove the unnecessary return value, rename, unexport an=
d
Post by Tejun Heo
collapse __thaw_process() into it. =A0This will help further updates =
to
Post by Tejun Heo
the freezer code.
Probably better to have Matt Helsley look at this?

Paul
Post by Tejun Heo
---
=A0include/linux/freezer.h | =A0 =A03 +--
=A0kernel/cgroup_freezer.c | =A0 =A07 +++----
=A0kernel/freezer.c =A0 =A0 =A0 =A0| =A0 30 +++++++++++--------------=
-----
Post by Tejun Heo
=A0kernel/power/process.c =A0| =A0 =A02 +-
=A04 files changed, 16 insertions(+), 26 deletions(-)
diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index 8ee31e5..80a455d 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -45,7 +45,7 @@ static inline bool should_send_signal(struct task_s=
truct *p)
Post by Tejun Heo
=A0}
=A0/* Takes and releases task alloc lock using task_lock() */
-extern int thaw_process(struct task_struct *p);
+extern void __thaw_task(struct task_struct *t);
=A0extern bool __refrigerator(bool check_kthr_stop);
=A0extern int freeze_processes(void);
@@ -167,7 +167,6 @@ static inline int frozen(struct task_struct *p) {=
return 0; }
Post by Tejun Heo
=A0static inline int freezing(struct task_struct *p) { return 0; }
=A0static inline void set_freeze_flag(struct task_struct *p) {}
=A0static inline void clear_freeze_flag(struct task_struct *p) {}
-static inline int thaw_process(struct task_struct *p) { return 1; }
=A0static inline bool __refrigerator(bool check_kthr_stop) { return f=
alse; }
Post by Tejun Heo
=A0static inline int freeze_processes(void) { BUG(); return 0; }
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index e691818..e7fa0ec 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -130,7 +130,7 @@ struct cgroup_subsys freezer_subsys;
=A0* =A0 write_lock css_set_lock (cgroup iterator start)
=A0* =A0 =A0task->alloc_lock
=A0* =A0 read_lock css_set_lock (cgroup iterator start)
- * =A0 =A0task->alloc_lock (inside thaw_process(), prevents race wit=
h refrigerator())
Post by Tejun Heo
+ * =A0 =A0task->alloc_lock (inside __thaw_task(), prevents race with=
refrigerator())
Post by Tejun Heo
=A0* =A0 =A0 sighand->siglock
=A0*/
=A0static struct cgroup_subsys_state *freezer_create(struct cgroup_su=
bsys *ss,
Post by Tejun Heo
@@ -300,9 +300,8 @@ static void unfreeze_cgroup(struct cgroup *cgroup=
, struct freezer *freezer)
Post by Tejun Heo
=A0 =A0 =A0 =A0struct task_struct *task;
=A0 =A0 =A0 =A0cgroup_iter_start(cgroup, &it);
- =A0 =A0 =A0 while ((task =3D cgroup_iter_next(cgroup, &it))) {
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 thaw_process(task);
- =A0 =A0 =A0 }
+ =A0 =A0 =A0 while ((task =3D cgroup_iter_next(cgroup, &it)))
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 __thaw_task(task);
=A0 =A0 =A0 =A0cgroup_iter_end(cgroup, &it);
=A0 =A0 =A0 =A0freezer->state =3D CGROUP_THAWED;
diff --git a/kernel/freezer.c b/kernel/freezer.c
index 656492c..f5db7fd 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -145,18 +145,8 @@ void cancel_freezing(struct task_struct *p)
=A0 =A0 =A0 =A0}
=A0}
-static int __thaw_process(struct task_struct *p)
-{
- =A0 =A0 =A0 if (frozen(p)) {
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 p->flags &=3D ~PF_FROZEN;
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 return 1;
- =A0 =A0 =A0 }
- =A0 =A0 =A0 clear_freeze_flag(p);
- =A0 =A0 =A0 return 0;
-}
-
=A0/*
- * Wake up a frozen process
+ * Wake up a frozen task
=A0*
=A0* task_lock() is needed to prevent the race with refrigerator() wh=
ich may
Post by Tejun Heo
=A0* occur if the freezing of tasks fails. =A0Namely, without the loc=
k, if the
Post by Tejun Heo
@@ -164,15 +154,17 @@ static int __thaw_process(struct task_struct *p=
)
Post by Tejun Heo
=A0* refrigerator() could call frozen_process(), in which case the ta=
sk would be
Post by Tejun Heo
=A0* frozen and no one would thaw it.
=A0*/
-int thaw_process(struct task_struct *p)
+void __thaw_task(struct task_struct *p)
=A0{
+ =A0 =A0 =A0 bool was_frozen;
+
=A0 =A0 =A0 =A0task_lock(p);
- =A0 =A0 =A0 if (__thaw_process(p) =3D=3D 1) {
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 task_unlock(p);
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 wake_up_process(p);
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 return 1;
- =A0 =A0 =A0 }
+ =A0 =A0 =A0 if ((was_frozen =3D frozen(p)))
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 p->flags &=3D ~PF_FROZEN;
+ =A0 =A0 =A0 else
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 clear_freeze_flag(p);
=A0 =A0 =A0 =A0task_unlock(p);
- =A0 =A0 =A0 return 0;
+
+ =A0 =A0 =A0 if (was_frozen)
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 wake_up_process(p);
=A0}
-EXPORT_SYMBOL(thaw_process);
diff --git a/kernel/power/process.c b/kernel/power/process.c
index 0cf3a27..bec09c3 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -176,7 +176,7 @@ static void thaw_tasks(bool nosig_only)
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (cgroup_freezing_or_frozen(p))
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0continue;
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 thaw_process(p);
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 __thaw_task(p);
=A0 =A0 =A0 =A0} while_each_thread(g, p);
=A0 =A0 =A0 =A0read_unlock(&tasklist_lock);
=A0}
--
1.7.6
--
To unsubscribe from this list: send the line "unsubscribe linux-kerne=
l" in
Post by Tejun Heo
More majordomo info at =A0http://vger.kernel.org/majordomo-info.html
Please read the FAQ at =A0http://www.tux.org/lkml/
Matt Helsley
2011-08-24 02:28:31 UTC
Permalink
Post by Tejun Heo
thaw_process() now has only internal users - system and cgroup
freezers. Remove the unnecessary return value, rename, unexport and
collapse __thaw_process() into it. This will help further updates to
the freezer code.
<snip>
Post by Tejun Heo
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index e691818..e7fa0ec 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
<snip (Looks fine)>
Post by Tejun Heo
diff --git a/kernel/freezer.c b/kernel/freezer.c
index 656492c..f5db7fd 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
<snip>
Post by Tejun Heo
@@ -164,15 +154,17 @@ static int __thaw_process(struct task_struct *p)
* refrigerator() could call frozen_process(), in which case the task would be
* frozen and no one would thaw it.
*/
-int thaw_process(struct task_struct *p)
+void __thaw_task(struct task_struct *p)
{
+ bool was_frozen;
+
task_lock(p);
- if (__thaw_process(p) == 1) {
- task_unlock(p);
- wake_up_process(p);
- return 1;
- }
+ if ((was_frozen = frozen(p)))
Style nit: I think this form is preferable:

was_frozen = frozen(p);
if (was_frozen)
...

However I know Rafael already pulled this so I don't
know if it's worth bothering.

Cheers,
-Matt Helsley
Tejun Heo
2011-08-19 14:16:08 UTC
Permalink
Some drivers set PF_NOFREEZE in their kthread functions which is
completely unnecessary and racy - some part of freezer code doesn't
consider cases where PF_NOFREEZE is set asynchronous to freezer
operations.

In general, there's no reason to allow setting PF_NOFREEZE explicitly.
Remove them and change the documentation to note that setting
PF_NOFREEZE directly isn't allowed.

Signed-off-by: Tejun Heo <***@kernel.org>
Cc: Marcel Holtmann <***@holtmann.org>
Cc: "Gustavo F. Padovan" <***@profusion.mobi>
Cc: Samuel Ortiz <***@linux.intel.com>
Cc: wwang <***@realsil.com.cn>
---
Documentation/power/freezing-of-tasks.txt | 2 +-
drivers/bluetooth/btmrvl_main.c | 2 --
drivers/mfd/twl4030-irq.c | 3 ---
drivers/mfd/twl6030-irq.c | 2 --
drivers/staging/rts_pstor/rtsx.c | 2 --
5 files changed, 1 insertions(+), 10 deletions(-)

diff --git a/Documentation/power/freezing-of-tasks.txt b/Documentation/power/freezing-of-tasks.txt
index 38b5724..710c965 100644
--- a/Documentation/power/freezing-of-tasks.txt
+++ b/Documentation/power/freezing-of-tasks.txt
@@ -67,7 +67,7 @@ III. Which kernel threads are freezable?

Kernel threads are not freezable by default. However, a kernel thread may clear
PF_NOFREEZE for itself by calling set_freezable() (the resetting of PF_NOFREEZE
-directly is strongly discouraged). From this point it is regarded as freezable
+directly is not allowed). From this point it is regarded as freezable
and must call try_to_freeze() in a suitable place.

IV. Why do we do that?
diff --git a/drivers/bluetooth/btmrvl_main.c b/drivers/bluetooth/btmrvl_main.c
index 548d1d9..57312d4 100644
--- a/drivers/bluetooth/btmrvl_main.c
+++ b/drivers/bluetooth/btmrvl_main.c
@@ -473,8 +473,6 @@ static int btmrvl_service_main_thread(void *data)

init_waitqueue_entry(&wait, current);

- current->flags |= PF_NOFREEZE;
-
for (;;) {
add_wait_queue(&thread->wait_q, &wait);

diff --git a/drivers/mfd/twl4030-irq.c b/drivers/mfd/twl4030-irq.c
index 8a7ee31..6aae831 100644
--- a/drivers/mfd/twl4030-irq.c
+++ b/drivers/mfd/twl4030-irq.c
@@ -289,9 +289,6 @@ static int twl4030_irq_thread(void *data)
static unsigned i2c_errors;
static const unsigned max_i2c_errors = 100;

-
- current->flags |= PF_NOFREEZE;
-
while (!kthread_should_stop()) {
int ret;
int module_irq;
diff --git a/drivers/mfd/twl6030-irq.c b/drivers/mfd/twl6030-irq.c
index eb3b5f8..6fd7795 100644
--- a/drivers/mfd/twl6030-irq.c
+++ b/drivers/mfd/twl6030-irq.c
@@ -96,8 +96,6 @@ static int twl6030_irq_thread(void *data)
static const unsigned max_i2c_errors = 100;
int ret;

- current->flags |= PF_NOFREEZE;
-
while (!kthread_should_stop()) {
int i;
union {
diff --git a/drivers/staging/rts_pstor/rtsx.c b/drivers/staging/rts_pstor/rtsx.c
index 16c73fb..a1b9093 100644
--- a/drivers/staging/rts_pstor/rtsx.c
+++ b/drivers/staging/rts_pstor/rtsx.c
@@ -466,8 +466,6 @@ static int rtsx_control_thread(void *__dev)
struct rtsx_chip *chip = dev->chip;
struct Scsi_Host *host = rtsx_to_host(dev);

- current->flags |= PF_NOFREEZE;
-
for (;;) {
if (wait_for_completion_interruptible(&dev->cmnd_ready))
break;
--
1.7.6
Gustavo Padovan
2011-08-19 16:43:06 UTC
Permalink
Hi Tejun,
Post by Tejun Heo
Some drivers set PF_NOFREEZE in their kthread functions which is
completely unnecessary and racy - some part of freezer code doesn't
consider cases where PF_NOFREEZE is set asynchronous to freezer
operations.
In general, there's no reason to allow setting PF_NOFREEZE explicitly.
Remove them and change the documentation to note that setting
PF_NOFREEZE directly isn't allowed.
---
Documentation/power/freezing-of-tasks.txt | 2 +-
drivers/bluetooth/btmrvl_main.c | 2 --
drivers/mfd/twl4030-irq.c | 3 ---
drivers/mfd/twl6030-irq.c | 2 --
drivers/staging/rts_pstor/rtsx.c | 2 --
5 files changed, 1 insertions(+), 10 deletions(-)
Acked-by: Gustavo F. Padovan <***@profusion.mobi>

Gustavo
Samuel Ortiz
2011-08-22 15:05:23 UTC
Permalink
Hi Tejun,
Post by Tejun Heo
Some drivers set PF_NOFREEZE in their kthread functions which is
completely unnecessary and racy - some part of freezer code doesn't
consider cases where PF_NOFREEZE is set asynchronous to freezer
operations.
In general, there's no reason to allow setting PF_NOFREEZE explicitly.
Remove them and change the documentation to note that setting
PF_NOFREEZE directly isn't allowed.
For the mfd parts:

Acked-by: Samuel Ortiz <***@linux.intel.com>

Cheers,
Samuel.
--
Intel Open Source Technology Centre
http://oss.intel.com/
Tejun Heo
2011-08-19 14:23:32 UTC
Permalink
Ummm... got mass delivery failure notifications for ***@google.com.
cc'ing the other cgroup maintainer. Li Zefan, do you know what's
wrong with the address? If it has changed, it would probably be a
good idea to update MAINTAINERS. The original thread can be found at

http://thread.gmane.org/gmane.linux.kernel/1181594

Thank you.
--
tejun
Paul Menage
2011-08-19 15:34:44 UTC
Permalink
=2E
cc'ing the other cgroup maintainer. =A0Li Zefan, do you know what's
wrong with the address?
Yep, I left Google at the end of last week. There's a patch in -mm to
update my address to ***@paulmenage.org.

Paul
Tejun Heo
2011-08-19 16:25:56 UTC
Permalink
Hello, Matt. Paul suggested pinging you about the patchset.

=A0http://thread.gmane.org/gmane.linux.kernel/1181594

It's fixes/simplifications for freezer (w/ more to come).

Thanks.

--=20
tejun
Matt Helsley
2011-08-24 01:10:43 UTC
Permalink
Post by Tejun Heo
Hello, Matt. Paul suggested pinging you about the patchset.
=20
=C2=A0http://thread.gmane.org/gmane.linux.kernel/1181594
=20
It's fixes/simplifications for freezer (w/ more to come).
=20
Thanks.
OK, I'll give it a look. Thanks!

Cheers,
-Matt
Rafael J. Wysocki
2011-08-19 21:00:55 UTC
Permalink
Post by Tejun Heo
Hello,
The freezer code has developed a number of convolutions and bugs.
It's now using five per-task flags - TIF_FREEZE, PF_FREEZING,
PF_NOFREEZE, PF_FROZEN, PF_FREEZER_SKIP and PF_FREEZER_NOSIG, and at
the same time has quite a few race conditions. PF_NOFREEZE
modifications can race against PM freezer, cgroup_freezer can race
against PM freezer, and so on.
This patchset tries to simplify the freezer implementation and fix the
various bugs. It makes the synchronization more straight forward and
replaces TIF_FREEZE with directly checking freeze conditions which are
in effect, which makes the whole thing much saner.
This patchset removes TIF_FREEZE and PF_FREEZING. Also,
PF_FREEZER_SKIP users are planned to move away from the flag and will
be removed. It contains the following 16 patches.
0001-freezer-fix-current-state-restoration-race-in-refrig.patch
0002-freezer-don-t-unnecessarily-set-PF_NOFREEZE-explicit.patch
0003-freezer-unexport-refrigerator-and-update-try_to_free.patch
0004-freezer-implement-and-use-kthread_freezable_should_s.patch
0005-freezer-rename-thaw_process-to-__thaw_task-and-simpl.patch
0006-freezer-make-exiting-tasks-properly-unfreezable.patch
0007-freezer-don-t-distinguish-nosig-tasks-on-thaw.patch
0008-freezer-use-dedicated-lock-instead-of-task_lock-memo.patch
0009-freezer-make-freezing-indicate-freeze-condition-in-e.patch
0010-freezer-fix-set_freezable-_with_signal-race.patch
0011-freezer-kill-PF_FREEZING.patch
0012-freezer-clean-up-freeze_processes-failure-path.patch
0013-cgroup_freezer-prepare-for-removal-of-TIF_FREEZE.patch
0014-freezer-make-freezing-test-freeze-conditions-in-effe.patch
0015-freezer-remove-now-unused-TIF_FREEZE.patch
0016-freezer-remove-should_send_signal-and-update-frozen.patch
This patchset is on top of the current linus#master (01b883358b "Merge
git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc") and
available in the following git branch.
git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git freezer
Good work, thanks for doing this. I'd like this to go through my
tree if you don't mind, so please let me know when the patchset is
ready for me to pull from your branch.

Also please CC linux-pm on PM-related patches.

Thanks,
Rafael
Tejun Heo
2011-08-20 08:14:53 UTC
Permalink
Hey,
Post by Rafael J. Wysocki
Post by Tejun Heo
git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git freezer
Good work, thanks for doing this. I'd like this to go through my
tree if you don't mind, so please let me know when the patchset is
ready for me to pull from your branch.
Will update the patchset w/ acked-by's and send pull request soon.
Post by Rafael J. Wysocki
Also please CC linux-pm on PM-related patches.
Will do so.

Thanks.
--
tejun
Srivatsa S. Bhat
2011-09-05 08:52:42 UTC
Permalink
Post by Tejun Heo
Hello,
The freezer code has developed a number of convolutions and bugs.
It's now using five per-task flags - TIF_FREEZE, PF_FREEZING,
PF_NOFREEZE, PF_FROZEN, PF_FREEZER_SKIP and PF_FREEZER_NOSIG, and at
the same time has quite a few race conditions. PF_NOFREEZE
modifications can race against PM freezer, cgroup_freezer can race
against PM freezer, and so on.
This patchset tries to simplify the freezer implementation and fix the
various bugs. It makes the synchronization more straight forward and
replaces TIF_FREEZE with directly checking freeze conditions which are
in effect, which makes the whole thing much saner.
This patchset removes TIF_FREEZE and PF_FREEZING. Also,
PF_FREEZER_SKIP users are planned to move away from the flag and will
be removed. It contains the following 16 patches.
0001-freezer-fix-current-state-restoration-race-in-refrig.patch
0002-freezer-don-t-unnecessarily-set-PF_NOFREEZE-explicit.patch
0003-freezer-unexport-refrigerator-and-update-try_to_free.patch
0004-freezer-implement-and-use-kthread_freezable_should_s.patch
0005-freezer-rename-thaw_process-to-__thaw_task-and-simpl.patch
0006-freezer-make-exiting-tasks-properly-unfreezable.patch
0007-freezer-don-t-distinguish-nosig-tasks-on-thaw.patch
0008-freezer-use-dedicated-lock-instead-of-task_lock-memo.patch
0009-freezer-make-freezing-indicate-freeze-condition-in-e.patch
0010-freezer-fix-set_freezable-_with_signal-race.patch
0011-freezer-kill-PF_FREEZING.patch
0012-freezer-clean-up-freeze_processes-failure-path.patch
0013-cgroup_freezer-prepare-for-removal-of-TIF_FREEZE.patch
0014-freezer-make-freezing-test-freeze-conditions-in-effe.patch
0015-freezer-remove-now-unused-TIF_FREEZE.patch
0016-freezer-remove-should_send_signal-and-update-frozen.patch
This patchset is on top of the current linus#master (01b883358b "Merge
git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc") and
available in the following git branch.
git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git freezer
Hi,

I was testing out Tejun's above mentioned freezer patchset in different scenarios.
While running CPU hot-plug stress test and kernel compilation in the background
and simultaneously testing the suspend infrastructure using the pm_test framework
(at the freezer level), after a few minutes, it reported failure to freeze
tasks within 20 seconds.

This could be a CPU hotplug issue too, since a "possible circular locking dependency
detected" warning was encountered, some time before task freezing failure was hit.

Here is a an excerpt of the log:

Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.058681] Freezing of tasks failed after 20.01 seconds (2 tasks refusing to freeze, wq_busy=0):
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.067717] invert_cpu_stat D 0000000000000000 5304 20435 17329 0x00000084
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.074901] ffff8801f367bab8 0000000000000046 ffff8801f367bfd8 00000000001d3a00
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.082551] ffff8801f367a010 00000000001d3a00 00000000001d3a00 00000000001d3a00
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.090199] ffff8801f367bfd8 00000000001d3a00 ffff880414cc6840 ffff8801f36783c0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.097847] Call Trace:
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.100383] [<ffffffff81532de5>] schedule_timeout+0x235/0x320
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.106308] [<ffffffff810a8630>] ? __lock_acquired+0x280/0x2f0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.112315] [<ffffffff8153292c>] ? wait_for_common+0x3c/0x170
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.118227] [<ffffffff81532a03>] ? wait_for_common+0x113/0x170
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.124224] [<ffffffff81532a0b>] wait_for_common+0x11b/0x170
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.130052] [<ffffffff81064de0>] ? try_to_wake_up+0x300/0x300
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.135969] [<ffffffff8107d64a>] ? mod_timer+0x15a/0x2c0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.141445] [<ffffffff81532b3d>] wait_for_completion+0x1d/0x20
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.147445] [<ffffffff81364486>] _request_firmware+0x156/0x2c0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.153447] [<ffffffff81364686>] request_firmware+0x16/0x20
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.159190] [<ffffffffa01f0da0>] request_microcode_fw+0x70/0xf0 [microcode]
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.166318] [<ffffffffa01f0390>] microcode_init_cpu+0xc0/0x100 [microcode]
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.173354] [<ffffffffa01f14b4>] mc_cpu_callback+0x7c/0x11f [microcode]
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.180137] [<ffffffff815393a4>] notifier_call_chain+0x94/0xd0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.186135] [<ffffffff8109770e>] __raw_notifier_call_chain+0xe/0x10
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.192572] [<ffffffff8106d000>] __cpu_notify+0x20/0x40
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.197965] [<ffffffff8152cf5b>] _cpu_up+0xc7/0x10e
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.203011] [<ffffffff8152d07b>] cpu_up+0xd9/0xec
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.207884] [<ffffffff8151e599>] store_online+0x99/0xd0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.213273] [<ffffffff81355eb0>] sysdev_store+0x20/0x30
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.218666] [<ffffffff811f3096>] sysfs_write_file+0xe6/0x170
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.224495] [<ffffffff8117ee50>] vfs_write+0xd0/0x1a0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.229713] [<ffffffff8117f024>] sys_write+0x54/0xa0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.234846] [<ffffffff8153df02>] system_call_fastpath+0x16/0x1b
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.240957] bash D 0000000000000000 5784 23638 17550 0x00000084
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.248136] ffff88046068bd88 0000000000000046 ffff88046068bfd8 00000000001d3a00
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.255780] ffff88046068a010 00000000001d3a00 00000000001d3a00 00000000001d3a00
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.263423] ffff88046068bfd8 00000000001d3a00 ffff8801f1592180 ffff88046d59a4c0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.271072] Call Trace:
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.273601] [<ffffffff81533653>] __mutex_lock_common+0x193/0x3f0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.279778] [<ffffffff810315f7>] ? cpu_hotplug_driver_lock+0x17/0x20
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.286292] [<ffffffff810315f7>] ? cpu_hotplug_driver_lock+0x17/0x20
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.292812] [<ffffffff815339d7>] mutex_lock_nested+0x37/0x50
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.298634] [<ffffffff810315f7>] cpu_hotplug_driver_lock+0x17/0x20
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.304980] [<ffffffff8151e532>] store_online+0x32/0xd0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.310371] [<ffffffff81355eb0>] sysdev_store+0x20/0x30
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.315766] [<ffffffff811f3096>] sysfs_write_file+0xe6/0x170
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.321591] [<ffffffff8117ee50>] vfs_write+0xd0/0x1a0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.326808] [<ffffffff8117f024>] sys_write+0x54/0xa0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.331938] [<ffffffff8153df02>] system_call_fastpath+0x16/0x1b
Jun 9 11:29:40 istl-vmc-blade9 firmware.sh[23918]: Cannot find firmware file 'intel-ucode/06-2c-02'
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.338039] Restarting tasks ...


I have attached the config file and the log with this mail.
--
Regards,
Srivatsa S. Bhat <***@linux.vnet.ibm.com>
Linux Technology Center,
IBM India Systems and Technology Lab
Tejun Heo
2011-09-05 14:15:12 UTC
Permalink
Hello,
Post by Srivatsa S. Bhat
This could be a CPU hotplug issue too, since a "possible circular locking dependency
detected" warning was encountered, some time before task freezing failure was hit.
That one seems like a separate issue from the freezing failure. As
the lockdep warning involves SLAB, I'm cc'ing SLAB maintainers.
Christoph and Pekka, please skip to the lockdep warning.
Post by Srivatsa S. Bhat
invert_cpu_stat D 0000000000000000 5304 20435 17329 0x00000084
ffff8801f367bab8 0000000000000046 ffff8801f367bfd8 00000000001d3a00
ffff8801f367a010 00000000001d3a00 00000000001d3a00 00000000001d3a00
ffff8801f367bfd8 00000000001d3a00 ffff880414cc6840 ffff8801f36783c0
[<ffffffff81532de5>] schedule_timeout+0x235/0x320
[<ffffffff81532a0b>] wait_for_common+0x11b/0x170
[<ffffffff81532b3d>] wait_for_completion+0x1d/0x20
[<ffffffff81364486>] _request_firmware+0x156/0x2c0
[<ffffffff81364686>] request_firmware+0x16/0x20
[<ffffffffa01f0da0>] request_microcode_fw+0x70/0xf0 [microcode]
[<ffffffffa01f0390>] microcode_init_cpu+0xc0/0x100 [microcode]
[<ffffffffa01f14b4>] mc_cpu_callback+0x7c/0x11f [microcode]
[<ffffffff815393a4>] notifier_call_chain+0x94/0xd0
[<ffffffff8109770e>] __raw_notifier_call_chain+0xe/0x10
[<ffffffff8106d000>] __cpu_notify+0x20/0x40
[<ffffffff8152cf5b>] _cpu_up+0xc7/0x10e
[<ffffffff8152d07b>] cpu_up+0xd9/0xec
[<ffffffff8151e599>] store_online+0x99/0xd0
[<ffffffff81355eb0>] sysdev_store+0x20/0x30
[<ffffffff811f3096>] sysfs_write_file+0xe6/0x170
[<ffffffff8117ee50>] vfs_write+0xd0/0x1a0
[<ffffffff8117f024>] sys_write+0x54/0xa0
[<ffffffff8153df02>] system_call_fastpath+0x16/0x1b
So, this task is trying to bring a CPU up, which triggers firmware
helper to load microcode. Firmware class currently sleeps
non-interruptibly to wait for firmware load to complete, which is
performed by another userland task. Now, the PM freezer doesn't
assume that there will be non-freezable wait dependencies among
userland tasks. It only knows two levels - userland and kernel tasks
- and assumes that the former group may have non-freezable wait
dependency on the latter but there's no such dependency among each
group itself. If there's such dependency, PM freezer may fail, which
is what happened here.

ie. the firmware loader userland process got frozen first.
invert_cpu_stat trying to bring up CPU was waiting for the firmware
loader to finish in non-interruptible sleep, so the freezer couldn't
proceed.
Post by Srivatsa S. Bhat
bash D 0000000000000000 5784 23638 17550 0x00000084
ffff88046068bd88 0000000000000046 ffff88046068bfd8 00000000001d3a00
ffff88046068a010 00000000001d3a00 00000000001d3a00 00000000001d3a00
ffff88046068bfd8 00000000001d3a00 ffff8801f1592180 ffff88046d59a4c0
[<ffffffff81533653>] __mutex_lock_common+0x193/0x3f0
[<ffffffff810315f7>] ? cpu_hotplug_driver_lock+0x17/0x20
[<ffffffff810315f7>] ? cpu_hotplug_driver_lock+0x17/0x20
[<ffffffff815339d7>] mutex_lock_nested+0x37/0x50
[<ffffffff810315f7>] cpu_hotplug_driver_lock+0x17/0x20
[<ffffffff8151e532>] store_online+0x32/0xd0
[<ffffffff81355eb0>] sysdev_store+0x20/0x30
[<ffffffff811f3096>] sysfs_write_file+0xe6/0x170
[<ffffffff8117ee50>] vfs_write+0xd0/0x1a0
[<ffffffff8117f024>] sys_write+0x54/0xa0
[<ffffffff8153df02>] system_call_fastpath+0x16/0x1b
This one is just blocked on the first process.

I don't know. Maybe we can improve firmware loader such that it can
handle freezing while waiting for completion
(wait_for_completion_freezable?).

And, here's the lockdep warning. Christoph, Pekka, any ideas?
Post by Srivatsa S. Bhat
=======================================================
[ INFO: possible circular locking dependency detected ]
3.1.0-rc2 #1
-------------------------------------------------------
(alc_key){..-...}, at: [<ffffffff81168fa9>] kmem_cache_free+0x1a9/0x240
(&(&parent->list_lock)->rlock){-.-...}, at: [<ffffffff8152e610>] cpuup_canceled+0x82/0x1a3
which lock already depends on the new lock.
[<ffffffff810ab50c>] validate_chain+0x6cc/0x7d0
[<ffffffff810ab914>] __lock_acquire+0x304/0x500
[<ffffffff810ac1d2>] lock_acquire+0xa2/0x130
[<ffffffff815349b6>] _raw_spin_lock+0x36/0x70
[<ffffffff81169294>] __drain_alien_cache+0x64/0xa0
[<ffffffff811698cb>] kfree+0x1db/0x2a0
[<ffffffff81169a21>] free_alien_cache+0x91/0xa0
[<ffffffff8152e9b9>] cpuup_prepare+0x168/0x1a9
[<ffffffff8152ea2f>] cpuup_callback+0x35/0xc5
[<ffffffff815393a4>] notifier_call_chain+0x94/0xd0
[<ffffffff8109770e>] __raw_notifier_call_chain+0xe/0x10
[<ffffffff8106d000>] __cpu_notify+0x20/0x40
[<ffffffff8152cf02>] _cpu_up+0x6e/0x10e
[<ffffffff8152d07b>] cpu_up+0xd9/0xec
[<ffffffff81e21bd6>] smp_init+0x41/0x96
[<ffffffff81e03791>] kernel_init+0x1ef/0x2a6
[<ffffffff81540184>] kernel_thread_helper+0x4/0x10
[<ffffffff810aae18>] check_prev_add+0x528/0x550
[<ffffffff810ab50c>] validate_chain+0x6cc/0x7d0
[<ffffffff810ab914>] __lock_acquire+0x304/0x500
[<ffffffff810ac1d2>] lock_acquire+0xa2/0x130
[<ffffffff815349b6>] _raw_spin_lock+0x36/0x70
[<ffffffff81168fa9>] kmem_cache_free+0x1a9/0x240
[<ffffffff81169094>] slab_destroy+0x54/0x80
[<ffffffff8116911d>] free_block+0x5d/0x170
[<ffffffff8152e62f>] cpuup_canceled+0xa1/0x1a3
[<ffffffff8152ea96>] cpuup_callback+0x9c/0xc5
[<ffffffff815393a4>] notifier_call_chain+0x94/0xd0
[<ffffffff8109770e>] __raw_notifier_call_chain+0xe/0x10
[<ffffffff8106d000>] __cpu_notify+0x20/0x40
[<ffffffff8106d035>] cpu_notify_nofail+0x15/0x30
[<ffffffff8151beed>] _cpu_down+0x12d/0x2b0
[<ffffffff8151c0a6>] cpu_down+0x36/0x50
[<ffffffff8151e571>] store_online+0x71/0xd0
[<ffffffff81355eb0>] sysdev_store+0x20/0x30
[<ffffffff811f3096>] sysfs_write_file+0xe6/0x170
[<ffffffff8117ee50>] vfs_write+0xd0/0x1a0
[<ffffffff8117f024>] sys_write+0x54/0xa0
[<ffffffff8153df02>] system_call_fastpath+0x16/0x1b
CPU0 CPU1
---- ----
lock(&(&parent->list_lock)->rlock);
lock(alc_key);
lock(&(&parent->list_lock)->rlock);
lock(alc_key);
Thanks.
--
tejun
Tejun Heo
2011-09-06 05:08:31 UTC
Permalink
Hello, again.
Post by Tejun Heo
Post by Srivatsa S. Bhat
invert_cpu_stat D 0000000000000000 5304 20435 17329 0x00000084
ffff8801f367bab8 0000000000000046 ffff8801f367bfd8 00000000001d3a00
ffff8801f367a010 00000000001d3a00 00000000001d3a00 00000000001d3a00
ffff8801f367bfd8 00000000001d3a00 ffff880414cc6840 ffff8801f36783c0
[<ffffffff81532de5>] schedule_timeout+0x235/0x320
[<ffffffff81532a0b>] wait_for_common+0x11b/0x170
[<ffffffff81532b3d>] wait_for_completion+0x1d/0x20
[<ffffffff81364486>] _request_firmware+0x156/0x2c0
[<ffffffff81364686>] request_firmware+0x16/0x20
[<ffffffffa01f0da0>] request_microcode_fw+0x70/0xf0 [microcode]
[<ffffffffa01f0390>] microcode_init_cpu+0xc0/0x100 [microcode]
[<ffffffffa01f14b4>] mc_cpu_callback+0x7c/0x11f [microcode]
[<ffffffff815393a4>] notifier_call_chain+0x94/0xd0
[<ffffffff8109770e>] __raw_notifier_call_chain+0xe/0x10
[<ffffffff8106d000>] __cpu_notify+0x20/0x40
[<ffffffff8152cf5b>] _cpu_up+0xc7/0x10e
[<ffffffff8152d07b>] cpu_up+0xd9/0xec
[<ffffffff8151e599>] store_online+0x99/0xd0
[<ffffffff81355eb0>] sysdev_store+0x20/0x30
[<ffffffff811f3096>] sysfs_write_file+0xe6/0x170
[<ffffffff8117ee50>] vfs_write+0xd0/0x1a0
[<ffffffff8117f024>] sys_write+0x54/0xa0
[<ffffffff8153df02>] system_call_fastpath+0x16/0x1b
So, this task is trying to bring a CPU up, which triggers firmware
helper to load microcode. Firmware class currently sleeps
non-interruptibly to wait for firmware load to complete, which is
performed by another userland task. Now, the PM freezer doesn't
assume that there will be non-freezable wait dependencies among
userland tasks. It only knows two levels - userland and kernel tasks
- and assumes that the former group may have non-freezable wait
dependency on the latter but there's no such dependency among each
group itself. If there's such dependency, PM freezer may fail, which
is what happened here.
ie. the firmware loader userland process got frozen first.
invert_cpu_stat trying to bring up CPU was waiting for the firmware
loader to finish in non-interruptible sleep, so the freezer couldn't
proceed.
Hmmm... I went through the code again and usermodehelper_disable()
seems to be there to prevent deadlocks like this. usermode helpers
are drained & plugged before freezing is tried. Rafael, the above
shouldn't be happening, right?

Thanks.
--
tejun
Rafael J. Wysocki
2011-09-06 06:01:09 UTC
Permalink
Post by Tejun Heo
Hello, again.
Post by Tejun Heo
Post by Srivatsa S. Bhat
invert_cpu_stat D 0000000000000000 5304 20435 17329 0x00000084
ffff8801f367bab8 0000000000000046 ffff8801f367bfd8 00000000001d3a00
ffff8801f367a010 00000000001d3a00 00000000001d3a00 00000000001d3a00
ffff8801f367bfd8 00000000001d3a00 ffff880414cc6840 ffff8801f36783c0
[<ffffffff81532de5>] schedule_timeout+0x235/0x320
[<ffffffff81532a0b>] wait_for_common+0x11b/0x170
[<ffffffff81532b3d>] wait_for_completion+0x1d/0x20
[<ffffffff81364486>] _request_firmware+0x156/0x2c0
[<ffffffff81364686>] request_firmware+0x16/0x20
[<ffffffffa01f0da0>] request_microcode_fw+0x70/0xf0 [microcode]
[<ffffffffa01f0390>] microcode_init_cpu+0xc0/0x100 [microcode]
[<ffffffffa01f14b4>] mc_cpu_callback+0x7c/0x11f [microcode]
[<ffffffff815393a4>] notifier_call_chain+0x94/0xd0
[<ffffffff8109770e>] __raw_notifier_call_chain+0xe/0x10
[<ffffffff8106d000>] __cpu_notify+0x20/0x40
[<ffffffff8152cf5b>] _cpu_up+0xc7/0x10e
[<ffffffff8152d07b>] cpu_up+0xd9/0xec
[<ffffffff8151e599>] store_online+0x99/0xd0
[<ffffffff81355eb0>] sysdev_store+0x20/0x30
[<ffffffff811f3096>] sysfs_write_file+0xe6/0x170
[<ffffffff8117ee50>] vfs_write+0xd0/0x1a0
[<ffffffff8117f024>] sys_write+0x54/0xa0
[<ffffffff8153df02>] system_call_fastpath+0x16/0x1b
So, this task is trying to bring a CPU up, which triggers firmware
helper to load microcode. Firmware class currently sleeps
non-interruptibly to wait for firmware load to complete, which is
performed by another userland task. Now, the PM freezer doesn't
assume that there will be non-freezable wait dependencies among
userland tasks. It only knows two levels - userland and kernel tasks
- and assumes that the former group may have non-freezable wait
dependency on the latter but there's no such dependency among each
group itself. If there's such dependency, PM freezer may fail, which
is what happened here.
ie. the firmware loader userland process got frozen first.
invert_cpu_stat trying to bring up CPU was waiting for the firmware
loader to finish in non-interruptible sleep, so the freezer couldn't
proceed.
Hmmm... I went through the code again and usermodehelper_disable()
seems to be there to prevent deadlocks like this. usermode helpers
are drained & plugged before freezing is tried. Rafael, the above
shouldn't be happening, right?
No, it shouldn't in theory, but I'm not sure any more after the recent
modifications of firmware loading related to the initialization. I'll have
a closer look tomorrow.

Thanks,
Rafael
Srivatsa S. Bhat
2011-10-02 19:13:42 UTC
Permalink
Post by Rafael J. Wysocki
Post by Tejun Heo
Hello, again.
Post by Tejun Heo
Post by Srivatsa S. Bhat
invert_cpu_stat D 0000000000000000 5304 20435 17329 0x00000084
ffff8801f367bab8 0000000000000046 ffff8801f367bfd8 00000000001d3a00
ffff8801f367a010 00000000001d3a00 00000000001d3a00 00000000001d3a00
ffff8801f367bfd8 00000000001d3a00 ffff880414cc6840 ffff8801f36783c0
[<ffffffff81532de5>] schedule_timeout+0x235/0x320
[<ffffffff81532a0b>] wait_for_common+0x11b/0x170
[<ffffffff81532b3d>] wait_for_completion+0x1d/0x20
[<ffffffff81364486>] _request_firmware+0x156/0x2c0
[<ffffffff81364686>] request_firmware+0x16/0x20
[<ffffffffa01f0da0>] request_microcode_fw+0x70/0xf0 [microcode]
[<ffffffffa01f0390>] microcode_init_cpu+0xc0/0x100 [microcode]
[<ffffffffa01f14b4>] mc_cpu_callback+0x7c/0x11f [microcode]
[<ffffffff815393a4>] notifier_call_chain+0x94/0xd0
[<ffffffff8109770e>] __raw_notifier_call_chain+0xe/0x10
[<ffffffff8106d000>] __cpu_notify+0x20/0x40
[<ffffffff8152cf5b>] _cpu_up+0xc7/0x10e
[<ffffffff8152d07b>] cpu_up+0xd9/0xec
[<ffffffff8151e599>] store_online+0x99/0xd0
[<ffffffff81355eb0>] sysdev_store+0x20/0x30
[<ffffffff811f3096>] sysfs_write_file+0xe6/0x170
[<ffffffff8117ee50>] vfs_write+0xd0/0x1a0
[<ffffffff8117f024>] sys_write+0x54/0xa0
[<ffffffff8153df02>] system_call_fastpath+0x16/0x1b
So, this task is trying to bring a CPU up, which triggers firmware
helper to load microcode. Firmware class currently sleeps
non-interruptibly to wait for firmware load to complete, which is
performed by another userland task. Now, the PM freezer doesn't
assume that there will be non-freezable wait dependencies among
userland tasks. It only knows two levels - userland and kernel tasks
- and assumes that the former group may have non-freezable wait
dependency on the latter but there's no such dependency among each
group itself. If there's such dependency, PM freezer may fail, which
is what happened here.
ie. the firmware loader userland process got frozen first.
invert_cpu_stat trying to bring up CPU was waiting for the firmware
loader to finish in non-interruptible sleep, so the freezer couldn't
proceed.
Hmmm... I went through the code again and usermodehelper_disable()
seems to be there to prevent deadlocks like this. usermode helpers
are drained & plugged before freezing is tried. Rafael, the above
shouldn't be happening, right?
No, it shouldn't in theory, but I'm not sure any more after the recent
modifications of firmware loading related to the initialization. I'll have
a closer look tomorrow.
Hi,
I have posted a fix for this bug at https://lkml.org/lkml/2011/10/2/142
With my fix, the numerous "WARNING"s at drivers/base/firmware_class.c
disappear and the task freezing failures are fixed too.
I have tested this for about 10-12 hours (much more time than what was
necessary to reproduce the bug earlier).
--
Regards,
Srivatsa S. Bhat <***@linux.vnet.ibm.com>
Linux Technology Center,
IBM India Systems and Technology Lab
Rafael J. Wysocki
2011-10-02 19:33:59 UTC
Permalink
Post by Srivatsa S. Bhat
Post by Rafael J. Wysocki
Post by Tejun Heo
Hello, again.
Post by Tejun Heo
Post by Srivatsa S. Bhat
invert_cpu_stat D 0000000000000000 5304 20435 17329 0x00000084
ffff8801f367bab8 0000000000000046 ffff8801f367bfd8 00000000001d3a00
ffff8801f367a010 00000000001d3a00 00000000001d3a00 00000000001d3a00
ffff8801f367bfd8 00000000001d3a00 ffff880414cc6840 ffff8801f36783c0
[<ffffffff81532de5>] schedule_timeout+0x235/0x320
[<ffffffff81532a0b>] wait_for_common+0x11b/0x170
[<ffffffff81532b3d>] wait_for_completion+0x1d/0x20
[<ffffffff81364486>] _request_firmware+0x156/0x2c0
[<ffffffff81364686>] request_firmware+0x16/0x20
[<ffffffffa01f0da0>] request_microcode_fw+0x70/0xf0 [microcode]
[<ffffffffa01f0390>] microcode_init_cpu+0xc0/0x100 [microcode]
[<ffffffffa01f14b4>] mc_cpu_callback+0x7c/0x11f [microcode]
[<ffffffff815393a4>] notifier_call_chain+0x94/0xd0
[<ffffffff8109770e>] __raw_notifier_call_chain+0xe/0x10
[<ffffffff8106d000>] __cpu_notify+0x20/0x40
[<ffffffff8152cf5b>] _cpu_up+0xc7/0x10e
[<ffffffff8152d07b>] cpu_up+0xd9/0xec
[<ffffffff8151e599>] store_online+0x99/0xd0
[<ffffffff81355eb0>] sysdev_store+0x20/0x30
[<ffffffff811f3096>] sysfs_write_file+0xe6/0x170
[<ffffffff8117ee50>] vfs_write+0xd0/0x1a0
[<ffffffff8117f024>] sys_write+0x54/0xa0
[<ffffffff8153df02>] system_call_fastpath+0x16/0x1b
So, this task is trying to bring a CPU up, which triggers firmware
helper to load microcode. Firmware class currently sleeps
non-interruptibly to wait for firmware load to complete, which is
performed by another userland task. Now, the PM freezer doesn't
assume that there will be non-freezable wait dependencies among
userland tasks. It only knows two levels - userland and kernel tasks
- and assumes that the former group may have non-freezable wait
dependency on the latter but there's no such dependency among each
group itself. If there's such dependency, PM freezer may fail, which
is what happened here.
ie. the firmware loader userland process got frozen first.
invert_cpu_stat trying to bring up CPU was waiting for the firmware
loader to finish in non-interruptible sleep, so the freezer couldn't
proceed.
Hmmm... I went through the code again and usermodehelper_disable()
seems to be there to prevent deadlocks like this. usermode helpers
are drained & plugged before freezing is tried. Rafael, the above
shouldn't be happening, right?
No, it shouldn't in theory, but I'm not sure any more after the recent
modifications of firmware loading related to the initialization. I'll have
a closer look tomorrow.
Hi,
I have posted a fix for this bug at https://lkml.org/lkml/2011/10/2/142
With my fix, the numerous "WARNING"s at drivers/base/firmware_class.c
disappear and the task freezing failures are fixed too.
I have tested this for about 10-12 hours (much more time than what was
necessary to reproduce the bug earlier).
I saw the fix, thanks for it,
Rafael

Srivatsa S. Bhat
2011-09-05 06:49:06 UTC
Permalink
Post by Rafael J. Wysocki
Post by Tejun Heo
Hello,
The freezer code has developed a number of convolutions and bugs.
It's now using five per-task flags - TIF_FREEZE, PF_FREEZING,
PF_NOFREEZE, PF_FROZEN, PF_FREEZER_SKIP and PF_FREEZER_NOSIG, and at
the same time has quite a few race conditions. PF_NOFREEZE
modifications can race against PM freezer, cgroup_freezer can race
against PM freezer, and so on.
This patchset tries to simplify the freezer implementation and fix the
various bugs. It makes the synchronization more straight forward and
replaces TIF_FREEZE with directly checking freeze conditions which are
in effect, which makes the whole thing much saner.
This patchset removes TIF_FREEZE and PF_FREEZING. Also,
PF_FREEZER_SKIP users are planned to move away from the flag and will
be removed. It contains the following 16 patches.
0001-freezer-fix-current-state-restoration-race-in-refrig.patch
0002-freezer-don-t-unnecessarily-set-PF_NOFREEZE-explicit.patch
0003-freezer-unexport-refrigerator-and-update-try_to_free.patch
0004-freezer-implement-and-use-kthread_freezable_should_s.patch
0005-freezer-rename-thaw_process-to-__thaw_task-and-simpl.patch
0006-freezer-make-exiting-tasks-properly-unfreezable.patch
0007-freezer-don-t-distinguish-nosig-tasks-on-thaw.patch
0008-freezer-use-dedicated-lock-instead-of-task_lock-memo.patch
0009-freezer-make-freezing-indicate-freeze-condition-in-e.patch
0010-freezer-fix-set_freezable-_with_signal-race.patch
0011-freezer-kill-PF_FREEZING.patch
0012-freezer-clean-up-freeze_processes-failure-path.patch
0013-cgroup_freezer-prepare-for-removal-of-TIF_FREEZE.patch
0014-freezer-make-freezing-test-freeze-conditions-in-effe.patch
0015-freezer-remove-now-unused-TIF_FREEZE.patch
0016-freezer-remove-should_send_signal-and-update-frozen.patch
This patchset is on top of the current linus#master (01b883358b "Merge
git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc") and
available in the following git branch.
git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git freezer
Good work, thanks for doing this. I'd like this to go through my
tree if you don't mind, so please let me know when the patchset is
ready for me to pull from your branch.
Also please CC linux-pm on PM-related patches.
Hi,

I was testing out Tejun's above mentioned freezer patchset in different scenarios.
While running CPU hot-plug stress test and kernel compilation in the background
and simultaneously testing the suspend infrastructure using the pm_test framework
(at the freezer level), after a few minutes, it reported failure to freeze
tasks within 20 seconds.

This could be a CPU hotplug issue too, since a "possible circular locking dependency
detected" warning was encountered, some time before task freezing failure was hit.

Here is a an excerpt of the log:

Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.058681] Freezing of tasks failed after 20.01 seconds (2 tasks refusing to freeze, wq_busy=0):
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.067717] invert_cpu_stat D 0000000000000000 5304 20435 17329 0x00000084
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.074901] ffff8801f367bab8 0000000000000046 ffff8801f367bfd8 00000000001d3a00
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.082551] ffff8801f367a010 00000000001d3a00 00000000001d3a00 00000000001d3a00
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.090199] ffff8801f367bfd8 00000000001d3a00 ffff880414cc6840 ffff8801f36783c0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.097847] Call Trace:
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.100383] [<ffffffff81532de5>] schedule_timeout+0x235/0x320
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.106308] [<ffffffff810a8630>] ? __lock_acquired+0x280/0x2f0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.112315] [<ffffffff8153292c>] ? wait_for_common+0x3c/0x170
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.118227] [<ffffffff81532a03>] ? wait_for_common+0x113/0x170
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.124224] [<ffffffff81532a0b>] wait_for_common+0x11b/0x170
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.130052] [<ffffffff81064de0>] ? try_to_wake_up+0x300/0x300
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.135969] [<ffffffff8107d64a>] ? mod_timer+0x15a/0x2c0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.141445] [<ffffffff81532b3d>] wait_for_completion+0x1d/0x20
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.147445] [<ffffffff81364486>] _request_firmware+0x156/0x2c0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.153447] [<ffffffff81364686>] request_firmware+0x16/0x20
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.159190] [<ffffffffa01f0da0>] request_microcode_fw+0x70/0xf0 [microcode]
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.166318] [<ffffffffa01f0390>] microcode_init_cpu+0xc0/0x100 [microcode]
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.173354] [<ffffffffa01f14b4>] mc_cpu_callback+0x7c/0x11f [microcode]
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.180137] [<ffffffff815393a4>] notifier_call_chain+0x94/0xd0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.186135] [<ffffffff8109770e>] __raw_notifier_call_chain+0xe/0x10
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.192572] [<ffffffff8106d000>] __cpu_notify+0x20/0x40
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.197965] [<ffffffff8152cf5b>] _cpu_up+0xc7/0x10e
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.203011] [<ffffffff8152d07b>] cpu_up+0xd9/0xec
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.207884] [<ffffffff8151e599>] store_online+0x99/0xd0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.213273] [<ffffffff81355eb0>] sysdev_store+0x20/0x30
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.218666] [<ffffffff811f3096>] sysfs_write_file+0xe6/0x170
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.224495] [<ffffffff8117ee50>] vfs_write+0xd0/0x1a0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.229713] [<ffffffff8117f024>] sys_write+0x54/0xa0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.234846] [<ffffffff8153df02>] system_call_fastpath+0x16/0x1b
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.240957] bash D 0000000000000000 5784 23638 17550 0x00000084
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.248136] ffff88046068bd88 0000000000000046 ffff88046068bfd8 00000000001d3a00
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.255780] ffff88046068a010 00000000001d3a00 00000000001d3a00 00000000001d3a00
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.263423] ffff88046068bfd8 00000000001d3a00 ffff8801f1592180 ffff88046d59a4c0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.271072] Call Trace:
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.273601] [<ffffffff81533653>] __mutex_lock_common+0x193/0x3f0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.279778] [<ffffffff810315f7>] ? cpu_hotplug_driver_lock+0x17/0x20
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.286292] [<ffffffff810315f7>] ? cpu_hotplug_driver_lock+0x17/0x20
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.292812] [<ffffffff815339d7>] mutex_lock_nested+0x37/0x50
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.298634] [<ffffffff810315f7>] cpu_hotplug_driver_lock+0x17/0x20
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.304980] [<ffffffff8151e532>] store_online+0x32/0xd0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.310371] [<ffffffff81355eb0>] sysdev_store+0x20/0x30
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.315766] [<ffffffff811f3096>] sysfs_write_file+0xe6/0x170
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.321591] [<ffffffff8117ee50>] vfs_write+0xd0/0x1a0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.326808] [<ffffffff8117f024>] sys_write+0x54/0xa0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.331938] [<ffffffff8153df02>] system_call_fastpath+0x16/0x1b
Jun 9 11:29:40 istl-vmc-blade9 firmware.sh[23918]: Cannot find firmware file 'intel-ucode/06-2c-02'
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.338039] Restarting tasks ...


I have attached the config file and the detailed log with this mail.
--
Regards,
Srivatsa S. Bhat <***@linux.vnet.ibm.com>
Linux Technology Center,
IBM India Systems and Technology Lab
Loading...