The recent perl transition has turned the use of defined(@array) into an
error. The defined is simply not required. Remove it to quiet the
error.
BugLink: http://bugs.launchpad.net/bugs/1549677
Signed-off-by: Andy Whitcroft <apw@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1473584
When the audit=1 kernel parameter is absent and auditd is not running,
AUDIT_USER_AVC messages are being silently discarded.
AUDIT_USER_AVC messages should be sent to userspace using printk(), as
mentioned in the commit message of 4a4cd633 ("AUDIT: Optimise the
audit-disabled case for discarding user messages").
When audit_enabled is 0, audit_receive_msg() discards all user messages
except for AUDIT_USER_AVC messages. However, audit_log_common_recv_msg()
refuses to allocate an audit_buffer if audit_enabled is 0. The fix is to
special case AUDIT_USER_AVC messages in both functions.
It looks like commit 50397bd1 ("[AUDIT] clean up audit_receive_msg()")
introduced this bug.
Cc: <stable@kernel.org> # v2.6.25+
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: linux-audit@redhat.com
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
(cherry picked from commit 0868a5e150bc4c47e7a003367cd755811eb41e0b)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
CVE-2014-3153
commit 54a217887a7b658e2650c3feff22756ab80c7339 upstream.
The current implementation of lookup_pi_state has ambigous handling of
the TID value 0 in the user space futex. We can get into the kernel
even if the TID value is 0, because either there is a stale waiters bit
or the owner died bit is set or we are called from the requeue_pi path
or from user space just for fun.
The current code avoids an explicit sanity check for pid = 0 in case
that kernel internal state (waiters) are found for the user space
address. This can lead to state leakage and worse under some
circumstances.
Handle the cases explicit:
Waiter | pi_state | pi->owner | uTID | uODIED | ?
[1] NULL | --- | --- | 0 | 0/1 | Valid
[2] NULL | --- | --- | >0 | 0/1 | Valid
[3] Found | NULL | -- | Any | 0/1 | Invalid
[4] Found | Found | NULL | 0 | 1 | Valid
[5] Found | Found | NULL | >0 | 1 | Invalid
[6] Found | Found | task | 0 | 1 | Valid
[7] Found | Found | NULL | Any | 0 | Invalid
[8] Found | Found | task | ==taskTID | 0/1 | Valid
[9] Found | Found | task | 0 | 0 | Invalid
[10] Found | Found | task | !=taskTID | 0/1 | Invalid
[1] Indicates that the kernel can acquire the futex atomically. We
came came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit.
[2] Valid, if TID does not belong to a kernel thread. If no matching
thread is found then it indicates that the owner TID has died.
[3] Invalid. The waiter is queued on a non PI futex
[4] Valid state after exit_robust_list(), which sets the user space
value to FUTEX_WAITERS | FUTEX_OWNER_DIED.
[5] The user space value got manipulated between exit_robust_list()
and exit_pi_state_list()
[6] Valid state after exit_pi_state_list() which sets the new owner in
the pi_state but cannot access the user space value.
[7] pi_state->owner can only be NULL when the OWNER_DIED bit is set.
[8] Owner and user space value match
[9] There is no transient state which sets the user space TID to 0
except exit_robust_list(), but this is indicated by the
FUTEX_OWNER_DIED bit. See [4]
[10] There is no transient state which leaves owner and user space
TID out of sync.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Drewry <wad@chromium.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
CVE-2014-3153
commit 13fbca4c6ecd96ec1a1cfa2e4f2ce191fe928a5e upstream.
If the owner died bit is set at futex_unlock_pi, we currently do not
cleanup the user space futex. So the owner TID of the current owner
(the unlocker) persists. That's observable inconsistant state,
especially when the ownership of the pi state got transferred.
Clean it up unconditionally.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Drewry <wad@chromium.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
CVE-2014-3153
commit b3eaa9fc5cd0a4d74b18f6b8dc617aeaf1873270 upstream.
We need to protect the atomic acquisition in the kernel against rogue
user space which sets the user space futex to 0, so the kernel side
acquisition succeeds while there is existing state in the kernel
associated to the real owner.
Verify whether the futex has waiters associated with kernel state. If
it has, return -EINVAL. The state is corrupted already, so no point in
cleaning it up. Subsequent calls will fail as well. Not our problem.
[ tglx: Use futex_top_waiter() and explain why we do not need to try
restoring the already corrupted user space state. ]
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
CVE-2014-3153
commit e9c243a5a6de0be8e584c604d353412584b592f8 upstream.
If uaddr == uaddr2, then we have broken the rule of only requeueing from
a non-pi futex to a pi futex with this call. If we attempt this, then
dangling pointers may be left for rt_waiter resulting in an exploitable
condition.
This change brings futex_requeue() in line with futex_wait_requeue_pi()
which performs the same check as per commit 6f7b0a2a5c0f ("futex: Forbid
uaddr == uaddr2 in futex_wait_requeue_pi()")
[ tglx: Compare the resulting keys as well, as uaddrs might be
different depending on the mapping ]
Fixes CVE-2014-3153.
Reported-by: Pinkie Pie
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
CVE-2014-3153
commit f12d5bfceb7e1f9051563381ec047f7f13956c3c upstream.
The hugepage code had the exact same bug that regular pages had in
commit 7485d0d375 ("futexes: Remove rw parameter from
get_futex_key()").
The regular page case was fixed by commit 9ea71503a8 ("futex: Fix
regression with read only mappings"), but the transparent hugepage case
(added in a5b338f2b0: "thp: update futex compound knowledge") case
remained broken.
Found by Dave Jones and his trinity tool.
Reported-and-tested-by: Dave Jones <davej@fedoraproject.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
CVE-2014-3153
commit aa10990e028cac3d5e255711fb9fb47e00700e35 upstream.
Dave Jones reported a bug with futex_lock_pi() that his trinity test
exposed. Sometime between queue_me() and taking the q.lock_ptr, the
lock_ptr became NULL, resulting in a crash.
While futex_wake() is careful to not call wake_futex() on futex_q's with
a pi_state or an rt_waiter (which are either waiting for a
futex_unlock_pi() or a PI futex_requeue()), futex_wake_op() and
futex_requeue() do not perform the same test.
Update futex_wake_op() and futex_requeue() to test for q.pi_state and
q.rt_waiter and abort with -EINVAL if detected. To ensure any future
breakage is caught, add a WARN() to wake_futex() if the same condition
is true.
This fix has seen 3 hours of testing with "trinity -c futex" on an
x86_64 VM with 4 CPUS.
[akpm@linux-foundation.org: tidy up the WARN()]
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Reported-by: Dave Jones <davej@redat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
CVE-2014-3153
commit 59fa6245192159ab5e1e17b8e31f15afa9cff4bf upstream.
Siddhesh analyzed a failure in the take over of pi futexes in case the
owner died and provided a workaround.
See: http://sourceware.org/bugzilla/show_bug.cgi?id=14076
The detailed problem analysis shows:
Futex F is initialized with PTHREAD_PRIO_INHERIT and
PTHREAD_MUTEX_ROBUST_NP attributes.
T1 lock_futex_pi(F);
T2 lock_futex_pi(F);
--> T2 blocks on the futex and creates pi_state which is associated
to T1.
T1 exits
--> exit_robust_list() runs
--> Futex F userspace value TID field is set to 0 and
FUTEX_OWNER_DIED bit is set.
T3 lock_futex_pi(F);
--> Succeeds due to the check for F's userspace TID field == 0
--> Claims ownership of the futex and sets its own TID into the
userspace TID field of futex F
--> returns to user space
T1 --> exit_pi_state_list()
--> Transfers pi_state to waiter T2 and wakes T2 via
rt_mutex_unlock(&pi_state->mutex)
T2 --> acquires pi_state->mutex and gains real ownership of the
pi_state
--> Claims ownership of the futex and sets its own TID into the
userspace TID field of futex F
--> returns to user space
T3 --> observes inconsistent state
This problem is independent of UP/SMP, preemptible/non preemptible
kernels, or process shared vs. private. The only difference is that
certain configurations are more likely to expose it.
So as Siddhesh correctly analyzed the following check in
futex_lock_pi_atomic() is the culprit:
if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
We check the userspace value for a TID value of 0 and take over the
futex unconditionally if that's true.
AFAICT this check is there as it is correct for a different corner
case of futexes: the WAITERS bit became stale.
Now the proposed change
- if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
+ if (unlikely(ownerdied ||
+ !(curval & (FUTEX_TID_MASK | FUTEX_WAITERS)))) {
solves the problem, but it's not obvious why and it wreckages the
"stale WAITERS bit" case.
What happens is, that due to the WAITERS bit being set (T2 is blocked
on that futex) it enforces T3 to go through lookup_pi_state(), which
in the above case returns an existing pi_state and therefor forces T3
to legitimately fight with T2 over the ownership of the pi_state (via
pi_state->mutex). Probelm solved!
Though that does not work for the "WAITERS bit is stale" problem
because if lookup_pi_state() does not find existing pi_state it
returns -ERSCH (due to TID == 0) which causes futex_lock_pi() to
return -ESRCH to user space because the OWNER_DIED bit is not set.
Now there is a different solution to that problem. Do not look at the
user space value at all and enforce a lookup of possibly available
pi_state. If pi_state can be found, then the new incoming locker T3
blocks on that pi_state and legitimately races with T2 to acquire the
rt_mutex and the pi_state and therefor the proper ownership of the
user space futex.
lookup_pi_state() has the correct order of checks. It first tries to
find a pi_state associated with the user space futex and only if that
fails it checks for futex TID value = 0. If no pi_state is available
nothing can create new state at that point because this happens with
the hash bucket lock held.
So the above scenario changes to:
T1 lock_futex_pi(F);
T2 lock_futex_pi(F);
--> T2 blocks on the futex and creates pi_state which is associated
to T1.
T1 exits
--> exit_robust_list() runs
--> Futex F userspace value TID field is set to 0 and
FUTEX_OWNER_DIED bit is set.
T3 lock_futex_pi(F);
--> Finds pi_state and blocks on pi_state->rt_mutex
T1 --> exit_pi_state_list()
--> Transfers pi_state to waiter T2 and wakes it via
rt_mutex_unlock(&pi_state->mutex)
T2 --> acquires pi_state->mutex and gains ownership of the pi_state
--> Claims ownership of the futex and sets its own TID into the
userspace TID field of futex F
--> returns to user space
This covers all gazillion points on which T3 might come in between
T1's exit_robust_list() clearing the TID field and T2 fixing it up. It
also solves the "WAITERS bit stale" problem by forcing the take over.
Another benefit of changing the code this way is that it makes it less
dependent on untrusted user space values and therefor minimizes the
possible wreckage which might be inflicted.
As usual after staring for too long at the futex code my brain hurts
so much that I really want to ditch that whole optimization of
avoiding the syscall for the non contended case for PI futexes and rip
out the maze of corner case handling code. Unfortunately we can't as
user space relies on that existing behaviour, but at least thinking
about it helps me to preserve my mental sanity. Maybe we should
nevertheless :)
Reported-and-tested-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1210232138540.2756@ionos
Acked-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
CVE-2014-3153
commit 6f7b0a2a5c0fb03be7c25bd1745baa50582348ef upstream.
If uaddr == uaddr2, then we have broken the rule of only requeueing
from a non-pi futex to a pi futex with this call. If we attempt this,
as the trinity test suite manages to do, we miss early wakeups as
q.key is equal to key2 (because they are the same uaddr). We will then
attempt to dereference the pi_mutex (which would exist had the futex_q
been properly requeued to a pi futex) and trigger a NULL pointer
dereference.
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Cc: Dave Jones <davej@redhat.com>
Link: http://lkml.kernel.org/r/ad82bfe7f7d130247fbe2b5b4275654807774227.1342809673.git.dvhart@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
CVE-2013-0871
commit 9067ac85d533651b98c2ff903182a20cbb361fcb upstream.
wake_up_process() should never wakeup a TASK_STOPPED/TRACED task.
Change it to use TASK_NORMAL and add the WARN_ON().
TASK_ALL has no other users, probably can be killed.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
CVE-2013-0871
commit 9899d11f654474d2d54ea52ceaa2a1f4db3abd68 upstream.
putreg() assumes that the tracee is not running and pt_regs_access() can
safely play with its stack. However a killed tracee can return from
ptrace_stop() to the low-level asm code and do RESTORE_REST, this means
that debugger can actually read/modify the kernel stack until the tracee
does SAVE_REST again.
set_task_blockstep() can race with SIGKILL too and in some sense this
race is even worse, the very fact the tracee can be woken up breaks the
logic.
As Linus suggested we can clear TASK_WAKEKILL around the arch_ptrace()
call, this ensures that nobody can ever wakeup the tracee while the
debugger looks at it. Not only this fixes the mentioned problems, we
can do some cleanups/simplifications in arch_ptrace() paths.
Probably ptrace_unfreeze_traced() needs more callers, for example it
makes sense to make the tracee killable for oom-killer before
access_process_vm().
While at it, add the comment into may_ptrace_stop() to explain why
ptrace_stop() still can't rely on SIGKILL and signal_pending_state().
Reported-by: Salman Qazi <sqazi@google.com>
Reported-by: Suleiman Souhlal <suleiman@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
CVE-2013-0871
commit 910ffdb18a6408e14febbb6e4b6840fd2c928c82 upstream.
Cleanup and preparation for the next change.
signal_wake_up(resume => true) is overused. None of ptrace/jctl callers
actually want to wakeup a TASK_WAKEKILL task, but they can't specify the
necessary mask.
Turn signal_wake_up() into signal_wake_up_state(state), reintroduce
signal_wake_up() as a trivial helper, and add ptrace_signal_wake_up()
which adds __TASK_TRACED.
This way ptrace_signal_wake_up() can work "inside" ptrace_request()
even if the tracee doesn't have the TASK_WAKEKILL bit set.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
CVE-2012-0957
commit 31fd84b95eb211d5db460a1dda85e004800a7b52 upstream.
The min/max call needed to have explicit types on some architectures
(e.g. mn10300). Use clamp_t instead to avoid the warning:
kernel/sys.c: In function 'override_release':
kernel/sys.c:1287:10: warning: comparison of distinct pointer types lacks a cast [enabled by default]
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1266546
commit d049f74f2dbe71354d43d393ac3a188947811348 upstream.
The get_dumpable() return value is not boolean. Most users of the
function actually want to be testing for non-SUID_DUMP_USER(1) rather than
SUID_DUMP_DISABLE(0). The SUID_DUMP_ROOT(2) is also considered a
protected state. Almost all places did this correctly, excepting the two
places fixed in this patch.
Wrong logic:
if (dumpable == SUID_DUMP_DISABLE) { /* be protective */ }
or
if (dumpable == 0) { /* be protective */ }
or
if (!dumpable) { /* be protective */ }
Correct logic:
if (dumpable != SUID_DUMP_USER) { /* be protective */ }
or
if (dumpable != 1) { /* be protective */ }
Without this patch, if the system had set the sysctl fs/suid_dumpable=2, a
user was able to ptrace attach to processes that had dropped privileges to
that user. (This may have been partially mitigated if Yama was enabled.)
The macros have been moved into the file that declares get/set_dumpable(),
which means things like the ia64 code can see them too.
CVE-2013-2929
Reported-by: Vasily Kulikov <segoon@openwall.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[tyhicks: Backported to 3.4]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Userspace processes often have multiple allocators that each do
anonymous mmaps to get memory. When examining memory usage of
individual processes or systems as a whole, it is useful to be
able to break down the various heaps that were allocated by
each layer and examine their size, RSS, and physical memory
usage.
This patch adds a user pointer to the shared union in
vm_area_struct that points to a null terminated string inside
the user process containing a name for the vma. vmas that
point to the same address will be merged, but vmas that
point to equivalent strings at different addresses will
not be merged.
Userspace can set the name for a region of memory by calling
prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
Setting the name to NULL clears it.
The names of named anonymous vmas are shown in /proc/pid/maps
as [anon:<name>] and in /proc/pid/smaps in a new "Name" field
that is only present for named vmas. If the userspace pointer
is no longer valid all or part of the name will be replaced
with "<fault>".
The idea to store a userspace pointer to reduce the complexity
within mm (at the expense of the complexity of reading
/proc/pid/mem) came from Dave Hansen. This results in no
runtime overhead in the mm subsystem other than comparing
the anon_name pointers when considering vma merging. The pointer
is stored in a union with fieds that are only used on file-backed
mappings, so it does not increase memory usage.
Change-Id: Ie2ffc0967d4ffe7ee4c70781313c7b00cf7e3092
Signed-off-by: Colin Cross <ccross@android.com>
Device rebooted by kernel panic but no panic message.
This helps to get the panic messages in this case.
Bug: 10184128
Change-Id: I24545b1ce2f9e3e239795089d8564c2cc5f89fe7
Signed-off-by: shihhao.feng <shihhao.feng@lge.com>
Signed-off-by: Devin Kim <dojip.kim@lge.com>
commit c5857ccf293968348e5eb4ebedc68074de3dcda6 upstream.
With the new interrupt sampling system, we are no longer using the
timer_rand_state structure in the irq descriptor, so we can stop
initializing it now.
[ Merged in fixes from Sedat to find some last missing references to
rand_initialize_irq() ]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ed Tam <etam@google.com>
commit 775f4b297b780601e61787b766f306ed3e1d23eb upstream.
We've been moving away from add_interrupt_randomness() for various
reasons: it's too expensive to do on every interrupt, and flooding the
CPU with interrupts could theoretically cause bogus floods of entropy
from a somewhat externally controllable source.
This solves both problems by limiting the actual randomness addition
to just once a second or after 64 interrupts, whicever comes first.
During that time, the interrupt cycle data is buffered up in a per-cpu
pool. Also, we make sure the the nonblocking pool used by urandom is
initialized before we start feeding the normal input pool. This
assures that /dev/urandom is returning unpredictable data as soon as
possible.
(Based on an original patch by Linus, but significantly modified by
tytso.)
Tested-by: Eric Wustrow <ewust@umich.edu>
Reported-by: Eric Wustrow <ewust@umich.edu>
Reported-by: Nadia Heninger <nadiah@cs.ucsd.edu>
Reported-by: Zakir Durumeric <zakir@umich.edu>
Reported-by: J. Alex Halderman <jhalderm@umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ed Tam <etam@google.com>
Avoid waking up every thread sleeping in a sigtimedwait call during
suspend and resume by calling a freezable blocking call. Previous
patches modified the freezer to avoid sending wakeups to threads
that are blocked in freezable blocking calls.
This call was selected to be converted to a freezable call because
it doesn't hold any locks or release any resources when interrupted
that might be needed by another freezing task or a kernel driver
during suspend, and is a common site where idle userspace tasks are
blocked.
Change-Id: Ic27469b60a67d50cdc0d0c78975951a99c25adcd
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Avoid waking up every thread sleeping in a nanosleep call during
suspend and resume by calling a freezable blocking call. Previous
patches modified the freezer to avoid sending wakeups to threads
that are blocked in freezable blocking calls.
This call was selected to be converted to a freezable call because
it doesn't hold any locks or release any resources when interrupted
that might be needed by another freezing task or a kernel driver
during suspend, and is a common site where idle userspace tasks are
blocked.
Change-Id: I93383201d4dd62130cd9a9153842d303fc2e2986
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Avoid waking up every thread sleeping in a futex_wait call during
suspend and resume by calling a freezable blocking call. Previous
patches modified the freezer to avoid sending wakeups to threads
that are blocked in freezable blocking calls.
This call was selected to be converted to a freezable call because
it doesn't hold any locks or release any resources when interrupted
that might be needed by another freezing task or a kernel driver
during suspend, and is a common site where idle userspace tasks are
blocked.
Change-Id: I9ccab9c2d201adb66c85432801cdcf43fc91e94f
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Android goes through suspend/resume very often (every few seconds when
on a busy wifi network with the screen off), and a significant portion
of the energy used to go in and out of suspend is spent in the
freezer. If a task has called freezer_do_not_count(), don't bother
waking it up. If it happens to wake up later it will call
freezer_count() and immediately enter the refrigerator.
Combined with patches to convert freezable helpers to use
freezer_do_not_count() and convert common sites where idle userspace
tasks are blocked to use the freezable helpers, this reduces the
time and energy required to suspend and resume.
Change-Id: I6ba019d24273619849af757a413271da3261d7db
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
All tasks can easily be frozen in under 10 ms, switch to using
an initial 1 ms sleep followed by exponential backoff until
8 ms. Also convert the printed time to ms instead of centiseconds.
Change-Id: I7b198b16eefb623c2b0fc45dce50d9bca320afdc
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The only existing caller to debug_check_no_locks_held calls it
with 'current' as the task, and the freezer needs to call
debug_check_no_locks_held but doesn't already have a current
task pointer, so remove the argument. It is already assuming
that the current task is relevant by dumping the current stack
trace as part of the warning.
This was originally part of 6aa9707099c (lockdep: check that
no locks held at freeze time) which was reverted in
dbf520a9d7d4.
Change-Id: Idbaf1332ce6c80dc49c1d31c324c7fbf210657c5
Original-author: Mandeep Singh Baines <msb@chromium.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ARM disables interrupts in do_signal, which triggers a warning in
try_to_freeze, see details at https://lkml.org/lkml/2011/8/23/221.
To prevent the warnings, add try_to_freeze_nowarn and call it from
do_signal.
Change-Id: If7482de21c386adc705fa1ac4ecb8c7ece5bb356
Signed-off-by: Colin Cross <ccross@android.com>
The current code makes the assumption that a cpu_base lock cannot
be held if a CPU is offline, which is tragically wrong.
If a hrtimer is not queued, then it will not be migrated by
migrate_hrtimers() when a CPU is offlined. Therefore, the
hrtimer's cpu_base may still point to a CPU which has
subsequently gone offline if the timer wasn't enqueued at the
time the CPU went down.
Normally this wouldn't be a problem, but a cpu_base's lock is
blindly reinitialized each time a CPU is brought up. If a CPU is
brought online during the period that another thread is
performing a hrtimer operation on a stale hrtimer, then the lock
will be reinitialized under its feet, and a SPIN_BUG() like the
following will be observed:
[ 448.212369] BUG: spinlock already unlocked on CPU#3,
swapper/3/0
[ 448.217368] lock: 0xc47b1a40, .magic: dead4ead, .owner:
swapper/3/0, .owner_cpu: -1
[ 448.217398] [<c0014398>] (unwind_backtrace+0x0/0x120) from
[<c0269ab0>] (do_raw_spin_unlock+0x44/0xdc)
[ 448.217419] [<c0269ab0>] (do_raw_spin_unlock+0x44/0xdc) from
[<c071bbb0>] (_raw_spin_unlock_irqrestore+0xc/0x38)
[ 448.217438] [<c071bbb0>]
(_raw_spin_unlock_irqrestore+0xc/0x38) from [<c00aa498>]
(hrtimer_try_to_cancel+0xf8/0x10c)
[ 448.217455] [<c00aa498>] (hrtimer_try_to_cancel+0xf8/0x10c)
from [<c00aa4bc>] (hrtimer_cancel+0x10/0x20)
[ 448.217475] [<c00aa4bc>] (hrtimer_cancel+0x10/0x20) from
[<c00e6934>] (rcu_idle_exit_common+0x5c/0xe0)
[ 448.217494] [<c00e6934>] (rcu_idle_exit_common+0x5c/0xe0) from
[<c00e6ae8>] (rcu_idle_exit+0x9c/0xb4)
[ 448.217514] [<c00e6ae8>] (rcu_idle_exit+0x9c/0xb4) from
[<c000f2dc>] (cpu_idle+0x98/0xf0)
[ 448.217529] [<c000f2dc>] (cpu_idle+0x98/0xf0) from
[<000081ac>] (0x81ac)
CRs-Fixed: 460670
Change-Id: I588f6a864054abe6d4c249aad815c6d2f5a54077
Signed-off-by: Michael Bohan <mbohan@codeaurora.org>
Signed-off-by: Ajay Dudani <adudani@codeaurora.org>
An inactive timer's base can refer to a offline cpu's base.
In the current code, cpu_base's lock is blindly reinitialized
each time a CPU is brought up. If a CPU is brought online
during the period that another thread is trying to modify an
inactive timer on that CPU with holding its timer base lock,
then the lock will be reinitialized under its feet. This leads
to following SPIN_BUG().
<0> BUG: spinlock already unlocked on CPU#3, kworker/u:3/1466
<0> lock: 0xe3ebe000, .magic: dead4ead, .owner: kworker/u:3/1466, .owner_cpu: 1
<4> [<c0013dc4>] (unwind_backtrace+0x0/0x11c) from [<c026e794>] (do_raw_spin_unlock+0x40/0xcc)
<4> [<c026e794>] (do_raw_spin_unlock+0x40/0xcc) from [<c076c160>] (_raw_spin_unlock+0x8/0x30)
<4> [<c076c160>] (_raw_spin_unlock+0x8/0x30) from [<c009b858>] (mod_timer+0x294/0x310)
<4> [<c009b858>] (mod_timer+0x294/0x310) from [<c00a5e04>] (queue_delayed_work_on+0x104/0x120)
<4> [<c00a5e04>] (queue_delayed_work_on+0x104/0x120) from [<c04eae00>] (sdhci_msm_bus_voting+0x88/0x9c)
<4> [<c04eae00>] (sdhci_msm_bus_voting+0x88/0x9c) from [<c04d8780>] (sdhci_disable+0x40/0x48)
<4> [<c04d8780>] (sdhci_disable+0x40/0x48) from [<c04bf300>] (mmc_release_host+0x4c/0xb0)
<4> [<c04bf300>] (mmc_release_host+0x4c/0xb0) from [<c04c7aac>] (mmc_sd_detect+0x90/0xfc)
<4> [<c04c7aac>] (mmc_sd_detect+0x90/0xfc) from [<c04c2504>] (mmc_rescan+0x7c/0x2c4)
<4> [<c04c2504>] (mmc_rescan+0x7c/0x2c4) from [<c00a6a7c>] (process_one_work+0x27c/0x484)
<4> [<c00a6a7c>] (process_one_work+0x27c/0x484) from [<c00a6e94>] (worker_thread+0x210/0x3b0)
<4> [<c00a6e94>] (worker_thread+0x210/0x3b0) from [<c00aad9c>] (kthread+0x80/0x8c)
<4> [<c00aad9c>] (kthread+0x80/0x8c) from [<c000ea80>] (kernel_thread_exit+0x0/0x8)
As an example, this particular crash occurred when CPU #3 is executing
mod_timer() on an inactive timer whose base is refered to offlined CPU #2.
The code locked the timer_base corresponding to CPU #2. Before it could
proceed, CPU #2 came online and reinitialized the spinlock corresponding
to its base. Thus now CPU #3 held a lock which was reinitialized. When
CPU #3 finally ended up unlocking the old cpu_base corresponding to CPU #2,
we hit the above SPIN_BUG().
CPU #0 CPU #3 CPU #2
------ ------- -------
..... ...... <Offline>
mod_timer()
lock_timer_base
spin_lock_irqsave(&base->lock)
cpu_up(2) ..... ......
init_timers_cpu()
.... ..... spin_lock_init(&base->lock)
..... spin_unlock_irqrestore(&base->lock) ......
<spin_bug>
Allocation of per_cpu timer vector bases is done only once under
"tvec_base_done[]" check. In the current code, spinlock_initialization
of base->lock isn't under this check. When a CPU is up each time the base
lock is reinitialized. Move base spinlock initialization under the check.
Change-Id: I73b48440fffb227a60af9180e318c851048530dd
Signed-off-by: Tirupathi Reddy <tirupath@codeaurora.org>
Signed-off-by: Ajay Dudani <adudani@codeaurora.org>
Commit 88a7e37d265 (sched: provide per cpu-cgroup option to
notify on migrations) added a notifier call when a task is moved
to a different CPU. Unfortunately the two call sites in the RT
sched class where this occurs happens with a runqueue lock held.
This can result in a deadlock if the notifier call attempts to do
something like wake up a task.
Fortunately the benefit of 88a7e37d265 comes mainly from notifying
on migration of non-RT tasks, so we can simply ignore the movements
of RT tasks.
CRs-Fixed: 491370
Change-Id: I8849d826bf1eeaf85a6f6ad872acb475247c5926
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Mekala Natarajan <mekalan@codeaurora.org>
On systems where CPUs may run asynchronously, task migrations
between CPUs running at grossly different speeds can cause
problems.
This change provides a mechanism to notify a subsystem
in the kernel if a task in a particular cgroup migrates to a
different CPU. Other subsystems (such as cpufreq) may then
register for this notifier to take appropriate action when
such a task is migrated.
The cgroup attribute to set for this behavior is
"notify_on_migrate" .
Change-Id: Ie1868249e53ef901b89c837fdc33b0ad0c0a4590
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Mekala Natarajan <mekalan@codeaurora.org>
This patch corrects a buffer overflow in kernels from 3.0 to 3.4 when calling
log_prefix() function from call_console_drivers().
This bug existed in previous releases but has been revealed with commit
162a7e7500 (2.6.39 => 3.0) that made changes
about how to allocate memory for early printk buffer (use of memblock_alloc).
It disappears with commit 7ff9554bb578ba02166071d2d487b7fc7d860d62 (3.4 => 3.5)
that does a refactoring of printk buffer management.
In log_prefix(), the access to "p[0]", "p[1]", "p[2]" or
"simple_strtoul(&p[1], &endp, 10)" may cause a buffer overflow as this
function is called from call_console_drivers by passing "&LOG_BUF(cur_index)"
where the index must be masked to do not exceed the buffer's boundary.
The trick is to prepare in call_console_drivers() a buffer with the necessary
data (PRI field of syslog message) to be safely evaluated in log_prefix().
This patch can be applied to stable kernel branches 3.0.y, 3.2.y and 3.4.y.
Without this patch, one can freeze a server running this loop from shell :
$ export DUMMY=`cat /dev/urandom | tr -dc '12345AZERTYUIOPQSDFGHJKLMWXCVBNazertyuiopqsdfghjklmwxcvbn' | head -c255`
$ while true do ; echo $DUMMY > /dev/kmsg ; done
The "server freeze" depends on where memblock_alloc does allocate printk buffer :
if the buffer overflow is inside another kernel allocation the problem may not
be revealed, else the server may hangs up.
Signed-off-by: Alexandre SIMON <Alexandre.Simon@univ-lorraine.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the new signal handlers are set up, the location of sa_restorer is
not cleared, leaking a parent process's address space location to
children. This allows for a potential bypass of the parent's ASLR by
examining the sa_restorer value returned when calling sigaction().
Based on what should be considered "secret" about addresses, it only
matters across the exec not the fork (since the VMAs haven't changed
until the exec). But since exec sets SIG_DFL and keeps sa_restorer,
this is where it should be fixed.
Given the few uses of sa_restorer, a "set" function was not written
since this would be the only use. Instead, we use
__ARCH_HAS_SA_RESTORER, as already done in other places.
Example of the leak before applying this patch:
$ cat /proc/$$/maps
...
7fb9f3083000-7fb9f3238000 r-xp 00000000 fd:01 404469 .../libc-2.15.so
...
$ ./leak
...
7f278bc74000-7f278be29000 r-xp 00000000 fd:01 404469 .../libc-2.15.so
...
1 0 (nil) 0x7fb9f30b94a0
2 4000000 (nil) 0x7f278bcaa4a0
3 4000000 (nil) 0x7f278bcaa4a0
4 0 (nil) 0x7fb9f30b94a0
...
[akpm@linux-foundation.org: use SA_RESTORER for backportability]
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Emese Revfy <re.emese@gmail.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Julien Tinnes <jln@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
wake_up_process() should never wakeup a TASK_STOPPED/TRACED task.
Change it to use TASK_NORMAL and add the WARN_ON().
TASK_ALL has no other users, probably can be killed.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>