'ret' in __ftrace_function_set_filter() may be used uninitialized
if 're_count' is zero. Fix this to avoid a compiler warning.
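A minimal sketch of the fix, assuming the declaration at the top of
__ftrace_function_set_filter() (the chosen error value is an assumption):

    /* Initialize ret so the re_count == 0 path returns a defined error. */
    int i, re_cnt, ret = -EINVAL;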
Change-Id: I0a257159141d86d92573c28d233a3653e89c48ea
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
(cherry picked from commit 212806984abf1c19d56f5c0c0e72e38a318851e1)
kernel/time/alarmtimer.c conflicts with drivers/rtc/alarm.c;
disable it for now.
Change-Id: I6cdb3b885828d45836a54971adf16143039b0a0e
Signed-off-by: Colin Cross <ccross@android.com>
(cherry picked from commit abbb445f65bbb139202fde5a66f9a249977058c9)
This is a squash of 2 older commits on kernel/sched.c
commit 099aa69b9cfb6f4c5b56dd1d1d06ce9ef92cf2d5
Author: Steve Muckle <smuckle@codeaurora.org>
Date: Tue Feb 28 14:07:39 2012 -0800
kernel: reduce sleep duration in wait_task_inactive
Sleeping for an entire tick adds unnecessary latency to
hotplugging a cpu (cpu_up).
Change-Id: Iab323a79f4048bc9101ecfd368e0f275827ed4ab
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
commit 52984e96358c55f89947c6de6e63d70261479f67
Author: Jeff Ohlstein <johlstei@codeaurora.org>
Date: Wed Jun 23 12:59:04 2010 -0700
sched: Extend completion api to allow io_wait time tracking
Adds a function wait_for_completion_io which behaves like
wait_for_completion, except it calls io_schedule instead of
schedule. This indicates that the process waiting on the
completion is waiting on an io event, and keeps statistics
accordingly.
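One plausible shape for the new API, mirroring wait_for_completion() but
sleeping via io_schedule_timeout() (a sketch; the actual implementation
may differ):

    #include <linux/completion.h>
    #include <linux/sched.h>
    #include <linux/wait.h>

    static long __sched
    do_wait_for_common_io(struct completion *x, long timeout)
    {
        if (!x->done) {
            DECLARE_WAITQUEUE(wait, current);

            __add_wait_queue_tail_exclusive(&x->wait, &wait);
            do {
                __set_current_state(TASK_UNINTERRUPTIBLE);
                spin_unlock_irq(&x->wait.lock);
                /* The only difference from wait_for_completion():
                 * account the sleep as iowait. */
                timeout = io_schedule_timeout(timeout);
                spin_lock_irq(&x->wait.lock);
            } while (!x->done && timeout);
            __remove_wait_queue(&x->wait, &wait);
            if (!x->done)
                return timeout;
        }
        x->done--;
        return timeout ?: 1;
    }

    void __sched wait_for_completion_io(struct completion *x)
    {
        spin_lock_irq(&x->wait.lock);
        do_wait_for_common_io(x, MAX_SCHEDULE_TIMEOUT);
        spin_unlock_irq(&x->wait.lock);
    }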
Change-Id: I2514d62ff7f26441782a4cbebc4a18c07bb5ad74
Signed-off-by: Jeff Ohlstein <johlstei@codeaurora.org>
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
Enabling SCHED_HRTICK currently results in rq->lock recursion and a hard
hang at bootup. Essentially, try_to_wake_up() grabs rq->lock and tries
arming a hrtimer via hrtimer_restart(), which deep down tries waking up
ksoftirqd; this leads to a recursive call to try_to_wake_up() and thus
an attempt to take rq->lock recursively.
This is fixed by having the scheduler queue the hrtimer via
__hrtimer_start_range_ns(), which avoids waking up ksoftirqd.
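A sketch of the queueing call in hrtick_start(); the final wakeup=0
argument is what keeps ksoftirqd out of the picture:

    /* Arm the rq hrtimer without waking ksoftirqd, so try_to_wake_up()
     * is never re-entered while rq->lock is held. */
    __hrtimer_start_range_ns(&rq->hrtick_timer, ns_to_ktime(delay), 0,
                             HRTIMER_MODE_REL_PINNED, 0);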
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Change-Id: I11a13be1d9db3a749614ccf3d4f5fb7bf6f18fa1
(cherry picked from commit 4ca1d04ea0bdc225cc7db302172f3375a63f44de)
The stm_log call in tracing_mark_write was logging the complete
internal buffer data structure instead of only the data portion.
Change the call to log only the data.
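Roughly, the change looks like this (stm_log() and
OST_ENTITY_TRACE_MARKER are from the MSM CoreSight STM support; the
exact names and signature here are assumptions):

    /* Before: logged the whole ring-buffer entry, header included. */
    stm_log(OST_ENTITY_TRACE_MARKER, entry, size);
    /* After: log only the user-supplied marker payload. */
    stm_log(OST_ENTITY_TRACE_MARKER, entry->buf, cnt);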
Change-Id: I33e800cd9b1dc1d27d519c74db0cf5bb6ef6e3f5
Signed-off-by: Pushkar Joshi <pushkarj@codeaurora.org>
(cherry picked from commit 949ddf3099ab51ea34bc16b44c0aec7fafd4105d)
trace_printk, when configured to process printk strings before
storing them in the internal ring buffer, currently also logs the
complete internal buffer data structure to the STM. It should
instead log only the string output obtained after processing the
printk format and arguments. Change the stm_log call to log only
this relevant data.
Change-Id: Ia33109f95fb84fa1606247a861deeaedd2f95d3f
Signed-off-by: Pushkar Joshi <pushkarj@codeaurora.org>
(cherry picked from commit aaa6da531fe2cbf685791d217f3ed6bd08392a43)
Dup ftrace event traffic (including writes to trace_marker file from
userspace) to STM. Also dup printk traffic to STM. This allows Linux
tracing and log data to be correlated with other data transported over
STM.
Change-Id: Ieb0b856447f7667eb0005a6a884211dc46f50217
Signed-off-by: Pratik Patel <pratikp@codeaurora.org>
(cherry picked from commit 8e1e6b65fe92a0fa7bdb787fc7d9c5c0eae3d654)
Conflicts:
include/linux/coresight-stm.h
kernel/printk.c
It is sometimes useful to profile how long CPU frequency switches
take, and traces have already been added for this purpose. Make
use of these and the trace_stat framework to generate statistical
histograms of frequency switch times in the following format:
# cat /sys/kernel/debug/tracing/trace_stat/cpu_freq_switch
CPU  START_KHZ   END_KHZ  COUNT  AVG_US  MIN_US  MAX_US
  |          |         |      |       |       |       |
  0     384000   1512000      3    2787    1648    3418
  0     486000    384000      1    1129    1129    1129
  0    1458000    384000      1    3174    3174    3174
  0    1512000    384000      1    3265    3265    3265
  0    1512000    486000      1    3235    3235    3235
  0    1512000   1458000      1     213     213     213
  0    1512000   1512000      1       0       0       0
Profiling is disabled by default (since it does incur some
overhead). It can be enabled or re-disabled by echoing 1 or 0
to /sys/kernel/debug/tracing/cpu_freq_switch_profile_enabled
Change-Id: I3ef7f9d681b7bd13bcaa031003b10312afe1aefe
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
(cherry picked from commit a629fb0b67b57cc6759da51b9c12750758736c80)
The timer_start event now shows whether the timer is deferrable in
the case of a low-res timer. The debug_activate function now passes
the deferrable flag when calling the trace_timer_start event. The
irq_handler_entry event now includes the ISR function in the trace
event.
Change-Id: Ia2eeb4fa0fae34b301964144dad8bcef7632487c
Signed-off-by: Badhri Jagan Sridharan <badhris@codeaurora.org>
(cherry picked from commit a2cd6eaf5deaa40098eb6b692797519bc173381e)
It can happen that the scheduler tick stops on cpu 0 but keeps
running on some other cpu. Make the cpu in charge of updating
the jiffies also update the rq_stats.
Change-Id: Idb1a8132bd96500c68c516b4a99663965cec28e1
Signed-off-by: Amar Singhal <asinghal@codeaurora.org>
(cherry picked from commit f10f2a8bad44078c11378d9a0da025bc4a8e0f15)
Recalculating the sleep length each time it is called allows us to account
for the fact that the amount of time we can sleep for might change after
tick_nohz_stop_sched_tick is called in idle. The prime example of this
is an idle notifier that cancels timers as we are entering idle.
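A sketch of the recalculation (per-cpu accessor details are approximate
for this kernel version):

    ktime_t tick_nohz_get_sleep_length(void)
    {
        struct clock_event_device *dev =
            __get_cpu_var(tick_cpu_device).evtdev;

        /* Recompute from the current next_event instead of returning
         * the value cached when the tick was stopped; an idle notifier
         * may have cancelled timers since then. */
        return ktime_sub(dev->next_event, ktime_get());
    }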
Change-Id: I92871efc7befb3fee2a816da16145ba9da334a9e
Signed-off-by: Jeff Ohlstein <johlstei@codeaurora.org>
(cherry picked from commit 9feb87d70208e2236d24ef0ac2fa4d0e28e7d335)
With this change, we do the average run queue statistics calculation
in the scheduler tick itself. This helps avoid extra timers that
would otherwise be needed for the same purpose. Doing the calculation
in the scheduler tick also avoids the bias that could result from
doing it in a workqueue.
Change-Id: I854d90acc05cc7a7226487be5555976826d8c837
Signed-off-by: Amar Singhal <asinghal@codeaurora.org>
(cherry picked from commit f49d99bc4168c7937655bb09989cc72525163b40)
During board initialization, read the shared memory item
SMEM_POWER_ON_STATUS_INFO and place it in procfs at
/proc/sys/kernel/boot_reason.
The data item is an integer with a bit set to identify the reason
the device was powered on. The values of this data item are defined
in Documentation/arm/msm/boot.txt; the following is the data from
that documentation file.
power_on_status values set by the PMIC for power on event:
----------------------------------------------------------
0x01 -- keyboard power on
0x02 -- RTC alarm
0x04 -- cable power on
0x08 -- SMPL
0x10 -- Watch Dog timeout
0x20 -- USB charger
0x40 -- Wall charger
0xFF -- error reading power_on_status value
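A sketch of the plumbing (smem_alloc() is the MSM SMEM accessor;
boot_reason is the integer exposed through kernel/sysctl.c as
kernel.boot_reason):

    unsigned int boot_reason;

    static void __init boot_reason_init(void)
    {
        unsigned int *status = smem_alloc(SMEM_POWER_ON_STATUS_INFO,
                                          sizeof(*status));

        /* 0xFF -- error reading power_on_status value */
        boot_reason = status ? *status : 0xFF;
    }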
This change is a response to a customer request described in
JIRA KERNEL-518
Change-Id: I59e665f92e6e29f7dfef4380314f676a2d92c94b
Signed-off-by: Rick Adams <rgadams@codeaurora.org>
(cherry picked from commit 9512d7e26abc9d23a1771533c5300605d70dfaa7)
Conflicts:
arch/arm/include/asm/processor.h
arch/arm/mach-msm/board-msm7x30.c
kernel/sysctl.c
The PF_WAKE_UP_IDLE per-task flag made it impossible to enable
the old behavior of SD_SHARE_PKG_RESOURCES, where every task
migrates to an idle CPU on wakeup.
The sched_wake_to_idle sysctl value, when made nonzero, will cause
all tasks to migrate to an idle CPU if one is available when the
task is woken up. This is regardless of how PF_WAKE_UP_IDLE is
configured for tasks in the system. Similar to PF_WAKE_UP_IDLE,
the SD_SHARE_PKG_RESOURCES scheduler domain flag must be enabled
for the sysctl value to have an effect.
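A sketch of the wakeup-path predicate this adds
(sysctl_sched_wake_to_idle is the new knob; the surrounding
select_task_rq_fair() logic is unchanged):

    unsigned int sysctl_sched_wake_to_idle;

    static inline bool wake_to_idle(struct task_struct *p)
    {
        /* Global override, or per-task opt-in on either the waker or
         * the task being woken. */
        return sysctl_sched_wake_to_idle ||
               (current->flags & PF_WAKE_UP_IDLE) ||
               (p->flags & PF_WAKE_UP_IDLE);
    }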
Change-Id: I23bed846d26502c7aed600bfcf1c13053a7e5f61
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
(cherry picked from commit 9d5b38dc0025d19df5b756b16024b4269e73f282)
Conflicts:
kernel/sched/fair.c
Certain workloads may benefit from the SD_SHARE_PKG_RESOURCES behavior
of waking their tasks up on idle CPUs. The feature has too much of a
negative impact on other workloads however to apply globally. The
PF_WAKE_UP_IDLE flag tells the scheduler to wake up tasks that have this
flag set, or tasks woken by tasks with this flag set, on an idle CPU
if one is available.
Change-Id: I20b28faf35029f9395e9d9f5ddd57ce2de795039
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Add code to calculate the run queue depth and the iowait depth of
each cpu.
The scheduler calls in to sched_update_nr_prod whenever there
is a runqueue change. This function maintains the runqueue average
and the iowait of that cpu in that time interval.
Whoever wants to know the runqueue average is expected to call
sched_get_nr_running_avg periodically to get the accumulated
runqueue and iowait averages for all the cpus.
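A sketch of the bookkeeping behind this pair of calls (field names are
illustrative):

    struct nr_stats {
        spinlock_t lock;
        u64 last_time;
        u64 nr_prod_sum;      /* integral of nr_running over time */
        u64 iowait_prod_sum;  /* integral of nr_iowait over time */
        unsigned long nr;
    };
    static DEFINE_PER_CPU(struct nr_stats, nr_stats);

    /* Called by the scheduler on every runqueue change. */
    void sched_update_nr_prod(int cpu, unsigned long nr_running, bool inc)
    {
        struct nr_stats *s = &per_cpu(nr_stats, cpu);
        u64 now = sched_clock();
        unsigned long flags;

        spin_lock_irqsave(&s->lock, flags);
        s->nr_prod_sum += s->nr * (now - s->last_time);
        s->iowait_prod_sum += nr_iowait_cpu(cpu) * (now - s->last_time);
        s->last_time = now;
        s->nr = nr_running;
        spin_unlock_irqrestore(&s->lock, flags);
    }

sched_get_nr_running_avg() would then divide each accumulated sum by
the elapsed window and reset it.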
Change-Id: Id8cb2ecf0ed479f090a83ccb72dd59c53fa73e0c
Signed-off-by: Jeff Ohlstein <johlstei@codeaurora.org>
(cherry picked from commit 0299fcaaad80e2c0ac9aa583c95107f6edc27750)
This functionality is currently not available outside of
kernel/resource.c
It is needed in order to find the memory resource corresponding
to removable memory so that it can be cleanly removed.
Change-Id: Iedc785d0df5023c16c60bf2881e5602d45f2b809
Signed-off-by: Larry Bassel <lbassel@codeaurora.org>
(cherry picked from commit 00d3c81438b3e3f827ae720afb17a2e79a604e1e)
Each grace period is supposed to have at least one callback waiting
for that grace period to complete. However, if CONFIG_NO_HZ=n, an
extra callback-free grace period is no big problem -- it will chew up
a tiny bit of CPU time, but it will complete normally. In contrast,
CONFIG_NO_HZ=y kernels have the potential for all the CPUs to go to
sleep indefinitely, in turn indefinitely delaying completion of the
callback-free grace period. Given that nothing is waiting on this grace
period, this is also not a problem.
That is, unless RCU CPU stall warnings are also enabled, as they are
in recent kernels. In this case, if a CPU wakes up after at least one
minute of inactivity, an RCU CPU stall warning will result. The reason
that no one noticed until quite recently is that most systems have enough
OS noise that they will never remain absolutely idle for a full minute.
But there are some embedded systems with cut-down userspace configurations
that consistently get into this situation.
All this raises the question of exactly how a callback-free grace
period gets started in the first place. This can happen because CPUs
do not necessarily agree on which grace period is in progress. If a
CPU believes that the grace period that just completed is still
ongoing, it will conclude that it has callbacks that need to wait for
another grace period, never mind that the grace period they were
waiting for has just completed. This CPU can therefore erroneously
decide to start a new grace period. Note that this can happen in
TREE_RCU and TREE_PREEMPT_RCU even on a single-CPU system: Deadlock
considerations mean that the CPU that detected the end of the grace
period is not necessarily officially informed of this fact for some time.
Once this CPU notices that the earlier grace period completed, it will
invoke its callbacks. It then won't have any callbacks left. If no
other CPU has any callbacks, we now have a callback-free grace period.
This commit therefore makes CPUs check more carefully before starting a
new grace period. This new check relies on an array of tail pointers
into each CPU's list of callbacks. If the CPU is up to date on which
grace periods have completed, it checks to see if any callbacks follow
the RCU_DONE_TAIL segment, otherwise it checks to see if any callbacks
follow the RCU_WAIT_TAIL segment. The reason that this works is that
the RCU_WAIT_TAIL segment will be promoted to the RCU_DONE_TAIL segment
as soon as the CPU is officially notified that the old grace period
has ended.
This change is to cpu_needs_another_gp(), which is called in a number
of places. The only one that really matters is in rcu_start_gp(), where
the root rcu_node structure's ->lock is held, which prevents any
other CPU from starting or completing a grace period, so that the
comparison that determines whether the CPU is missing the completion
of a grace period is stable.
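The strengthened check is compact; a sketch along the lines of the
upstream fix this cherry-pick is based on:

    static int
    cpu_needs_another_gp(struct rcu_state *rsp, struct rcu_data *rdp)
    {
        /* Index RCU_DONE_TAIL if this CPU has seen the end of the old
         * grace period, RCU_WAIT_TAIL (== RCU_DONE_TAIL + 1) if not. */
        return *rdp->nxttail[RCU_DONE_TAIL +
                     (ACCESS_ONCE(rsp->completed) != rdp->completed)] &&
               !rcu_gp_in_progress(rsp);
    }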
Change-Id: Ic77e2bbe865059aa6f363db284473e9adcf8797e
Reported-by: Becky Bruce <bgillbruce@gmail.com>
Reported-by: Subodh Nijsure <snijsure@grid-net.com>
Reported-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Paul Walmsley <paul@pwsan.com> # OMAP3730, OMAP4430
Cc: stable@vger.kernel.org
Signed-off-by: Jin Hong <jinh@codeaurora.org>
(cherry picked from commit a389ade9175fa41dd1513a8ba049f0d3b80ad372)
The flushing of the console in the CPU_ONLINE callback on a hotplug
is rather slow, taking up to 11 ms to complete. Defer this to lower
hotplug latency.
Change-Id: If96089fe2c6cd6024d1a9e2bac35296aea75cf9e
Signed-off-by: Jeff Ohlstein <johlstei@codeaurora.org>
(cherry picked from commit 6517f202c7d255a5d315d869605b506516e13a05)
Conflicts:
kernel/printk.c
Log printk events to the uncached buffer. If a reset occurs
and printk data is still present in the cache, information will
be lost. This doesn't store the actual printk, but it at least
gives information on who the last caller was.
Change-Id: If1bd414a44b19b45c157d23a7ddde8f16b30e780
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Trevor Bourget <tbourget@codeaurora.org>
(cherry picked from commit aa9b60bb87a69ba2e0472fbfb04a8c056690936f)
Conflicts:
arch/arm/mach-msm/include/mach/msm_rtb.h
arch/arm/mach-msm/msm_rtb.c
The CPU hotplug take_cpu_down path is invoked with preemption
disabled via stop_machine. This causes a "Scheduling while
atomic" BUG when there is contention for the console semaphore.
The solution is to defer the console flush until it can be done
without violating scheduling constraints.
Change-Id: I2d0d58576a4db308ee40850a18a6bb9784ca4e4b
Signed-off-by: Michael Bohan <mbohan@codeaurora.org>
(cherry picked from commit f6d11b2eb9c110d0801aa40b1bfdb8194a5e3e3a)
panic_lock is meant to ensure that panic processing takes
place only on one cpu; if any of the other cpus encounter
a panic, they will spin waiting to be shut down.
However, this causes a regression in this scenario:
1. Cpu 0 encounters a panic and acquires the panic_lock
and proceeds with the panic processing.
2. There is an interrupt on cpu 0 that also encounters
an error condition and invokes panic.
3. This second invocation fails to acquire the panic_lock
and enters the infinite while loop in panic_smp_self_stop.
Thus all panic processing is stopped, and the cpu is stuck
for eternity in the while(1) inside panic_smp_self_stop.
To address this, disable local interrupts with
local_irq_disable before acquiring the panic_lock. This will
prevent interrupt handlers from executing during the panic
processing, thus avoiding this particular problem.
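A sketch of the change at the top of panic():

    /* Disable local interrupts first: an IRQ handler on this CPU can
     * then no longer re-enter panic() and spin forever in
     * panic_smp_self_stop(). */
    local_irq_disable();

    if (!spin_trylock(&panic_lock))
        panic_smp_self_stop();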
Change-Id: Ibf70e96343d35587571968bbc39062e28b7d3c0a
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
(cherry picked from commit dd58afef43357f265e803c317bbaa91f8c440663)
Using %ps in a printk format will sometimes fail silently and
print the empty string if the address passed in does not match a
symbol that kallsyms knows about. But using %pS will fall back to
printing the full address if kallsyms can't find the symbol. Make
%ps act the same as %pS by falling back to printing the address.
While we're here also make %ps print the module that a symbol
comes from so that it matches what %pS already does. Take this
simple function for example (in a module):
    static void test_printk(void)
    {
        int test;

        pr_info("with pS: %pS\n", &test);
        pr_info("with ps: %ps\n", &test);
    }
Before this patch:
with pS: 0xdff7df44
with ps:
After this patch:
with pS: 0xdff7df44
with ps: 0xdff7df44
Change-Id: Id03d74b079d40fe24b07a978909faedc741e281a
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
(cherry picked from commit 364da7c6dda2d9f41cb4ab715da204bc9923f3e2)
During suspend, interrupts are disabled in order from 0 to NR_IRQ;
during resume, interrupts should be enabled in the reverse order.
Enabling parent or summary interrupts before enabling child interrupts
causes the handler of the child interrupt to run even before it is
enabled. Usually the genirq handler does the correct thing of masking
the interrupt and additionally marking it IRQ_PENDING if it is an edge
triggered interrupt. However the nested handler (handle_nested_irq())
simply ignores the interrupt, causing it to be lost.
Not calling the action of an interrupt, especially one marked for
wakeup, causes the system to incorrectly go back to suspend
immediately.
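A sketch of the reverse walk during resume (helper names follow
kernel/irq/pm.c; details are approximate):

    static void resume_irqs_reverse(void)
    {
        int irq;

        /* Children before parents: walk descriptors high-to-low. */
        for (irq = nr_irqs - 1; irq >= 0; irq--) {
            struct irq_desc *desc = irq_to_desc(irq);
            unsigned long flags;

            if (!desc)
                continue;
            raw_spin_lock_irqsave(&desc->lock, flags);
            __enable_irq(desc, irq, true);
            raw_spin_unlock_irqrestore(&desc->lock, flags);
        }
    }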
Change-Id: Ica30c10a975a4a7b41b97b4f21250dac80335b2b
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
(cherry picked from commit 6dfcdc120d05d041e38668d15fd041fb7803986d)
The function allows us to set the pending bit for an irq. It is used
mainly by the MPM to set the pending flag for the irq that was
responsible for waking up the MSM.
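A sketch of the helper:

    void irq_set_pending(unsigned int irq)
    {
        struct irq_desc *desc = irq_to_desc(irq);
        unsigned long flags;

        if (!desc)
            return;
        raw_spin_lock_irqsave(&desc->lock, flags);
        desc->istate |= IRQS_PENDING;  /* replayed by check_irq_resend() */
        raw_spin_unlock_irqrestore(&desc->lock, flags);
    }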
Change-Id: Icc72c2a51a37df11a610f69fffda9d59aff2ac2a
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
This reverts commit ffdcd796e23c86d2cfeb25cb2d140f11d5fd6411.
This feature is replaced by passing 'earlyprintk' on the
kernel command line.
Change-Id: I2d4f2812e39b1c7afc061f106863b63710762fa7
Signed-off-by: Stepan Moskovchenko <stepanm@codeaurora.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
When an interrupt is freed, the shutdown or the disable callback
is called for that interrupt. These calls might not be implemented,
or even if they are, might not mask the interrupt.
Explicitly mask the interrupt when it is freed. If not masked, the
interrupt could trigger, set the pending bit in the irq controller
and cause unnecessary wakeup or exits from idle power collapse.
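A minimal sketch of the idea in __free_irq(), once the last action is
removed (mask_irq() is the internal helper in kernel/irq/chip.c):

    if (!desc->action) {
        irq_shutdown(desc);
        /* Don't trust shutdown/disable callbacks: mask explicitly so a
         * late trigger can't latch a pending bit in the controller. */
        mask_irq(desc);
    }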
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
Conflicts:
kernel/irq/manage.c
Some drivers need to know what the status of the interrupt line is.
This is especially true for drivers that register a handler with
IRQF_TRIGGER_RISING | IRQF_TRIGGER_FALLING and in the handler they
need to know which edge transition it was invoked for. Provide a way
for these handlers to read the logical status of the line after their
handler was invoked. If the line reads high it was called for a
rising edge and if the line reads low it was called for a falling edge.
The irq_read_line callback in the chip allows the controller to provide
the real time status of this line. Controllers that can read the status
of an interrupt line should implement this by doing necessary
hardware reads and return the logical state of the line.
Interrupt controllers based on the slow bus architecture should conduct
the transaction in this callback. The genirq code will call the chip's
bus lock prior to calling irq_read_line. Since the transaction
completes before irq_read_line returns, the chip need not do any
transactions in the bus unlock call.
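A sketch of the genirq entry point (locking mirrors the other chip
callbacks):

    int irq_read_line(unsigned int irq)
    {
        struct irq_desc *desc = irq_to_desc(irq);
        int val;

        if (!desc || !desc->irq_data.chip->irq_read_line)
            return -EINVAL;

        chip_bus_lock(desc);        /* slow-bus chips transact here */
        raw_spin_lock(&desc->lock);
        val = desc->irq_data.chip->irq_read_line(&desc->irq_data);
        raw_spin_unlock(&desc->lock);
        chip_bus_sync_unlock(desc);
        return val;
    }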
Change-Id: I3c8746706530bba14a373c671d22ee963b84dfab
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
(cherry picked from commit ed3e47cb88b61859da3c221f22b509ebe0433218)
Conflicts:
include/linux/interrupt.h
The IRQS_PENDING flag is meant to record an edge interrupt trigger event
when that interrupt is disabled.
When an edge triggered interrupt is enabled, check_irq_resend() retriggers
that irq and resets the flag to zero if set. Note that check_irq_resend()
only does this for edge triggered interrupts.
For level triggered interrupts it is expected that the interrupt remains
active and doesn't need this PENDING flag assistance from software for
re-triggering it.
However, handle_fasteoi_irq flow handler sets the PENDING flag even for
a disabled level interrupt. This causes an adverse effect if that level
interrupt is marked wakeup. The suspend code sees the pending flag on a
wakeup interrupt and aborts suspend whereas check_irq_resend does not reset
it to 0 (as it is a level interrupt). The end result is that the PENDING
flag on this level triggered wakeup interrupt never clears and the system
keeps aborting suspend.
Fix this by setting IRQS_PENDING flag only for edge interrupts in the
handle_fasteoi_irq.
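A sketch of the handle_fasteoi_irq() change (irq_settings_is_level()
is the existing genirq predicate):

    if (unlikely(irqd_irq_disabled(&desc->irq_data))) {
        /* Only edge interrupts need software resend assistance; a
         * level interrupt stays asserted in hardware, and a stale
         * IRQS_PENDING would keep aborting suspend. */
        if (!irq_settings_is_level(desc))
            desc->istate |= IRQS_PENDING;
        mask_irq(desc);
        goto out;
    }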
CRs-Fixed: 314344
Change-Id: I775d40f434f9309fd9672bae372b0f0fb5b91627
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
When lazy disabling is implemented and an interrupt is disabled the
genirq code ends up marking it as IRQ_DISABLED in the descriptor.
The interrupt stays enabled in the controller. If the interrupt
fires after disabling, the flow handlers namely handle_level_irq and
handle_edge_irq mask the interrupt in the controller.
This is not the case with handle_nested_irq. The interrupt stays
enabled in the controller, and if it is a level interrupt it keeps
firing, only to be ignored by handle_nested_irq.
Update handle_nested_irq to mask such an interrupt.
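A sketch of the handle_nested_irq() change:

    action = desc->action;
    if (unlikely(!action || irqd_irq_disabled(&desc->irq_data))) {
        desc->istate |= IRQS_PENDING;
        mask_irq(desc);  /* keep a lazily-disabled level irq quiet */
        goto out_unlock;
    }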
CRs-Fixed: 300931
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
Conflicts:
kernel/irq/chip.c
On non-developer devices kgdb prevents CONFIG_PANIC_TIMEOUT from
rebooting the device after a panic. Add module parameters
debug_core.break_on_exception and debug_core.break_on_panic to
allow skipping debug on exceptions and panics, respectively. Both
default to true to preserve existing behavior.
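A sketch of the knobs in kernel/debug/debug_core.c (the panic notifier
shown is abbreviated):

    static bool break_on_exception = true;
    static bool break_on_panic = true;
    module_param(break_on_exception, bool, 0644);
    module_param(break_on_panic, bool, 0644);

    static int kgdb_panic_event(struct notifier_block *self,
                                unsigned long val, void *data)
    {
        if (!break_on_panic)
            return NOTIFY_DONE;  /* let CONFIG_PANIC_TIMEOUT reboot */
        kgdb_breakpoint();
        return NOTIFY_DONE;
    }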
Change-Id: I75dce7263e96cee069a9750920cce83dc6f98e8c
Signed-off-by: Colin Cross <ccross@android.com>
task_tick_rt has an optimization to only reschedule SCHED_RR tasks
if they were the only element on their rq. However, with cgroups
a SCHED_RR task could be the only element on its per-cgroup rq but
still be competing with other SCHED_RR tasks in its parent's
cgroup. In this case, the SCHED_RR task in the child cgroup would
never yield at the end of its timeslice. If the child cgroup
rt_runtime_us was the same as the parent cgroup rt_runtime_us,
the task in the parent cgroup would starve completely.
Modify task_tick_rt to check that the task is the only task on its
rq, and that each of the scheduling entities of its ancestors is
also the only entity on its rq.
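A sketch of the end of task_tick_rt() after this change (RR_TIMESLICE
is DEF_TIMESLICE on older kernels):

    static void task_tick_rt(struct rq *rq, struct task_struct *p, int queued)
    {
        struct sched_rt_entity *rt_se = &p->rt;

        update_curr_rt(rq);
        watchdog(rq, p);

        if (p->policy != SCHED_RR)
            return;
        if (--p->rt.time_slice)
            return;
        p->rt.time_slice = RR_TIMESLICE;

        /* Requeue and reschedule unless this task is alone on its rq
         * at every level of the group hierarchy. */
        for_each_sched_rt_entity(rt_se) {
            if (rt_se->run_list.prev != rt_se->run_list.next) {
                requeue_task_rt(rq, p, 0);
                set_tsk_need_resched(p);
                return;
            }
        }
    }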
Change-Id: I4f5b118517f85db3570923eb2f5e4c933ece9247
Signed-off-by: Colin Cross <ccross@android.com>
Pull perf, x86 and scheduler updates from Ingo Molnar.
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tracing: Do not enable function event with enable
perf stat: handle ENXIO error for perf_event_open
perf: Turn off compiler warnings for flex and bison generated files
perf stat: Fix case where guest/host monitoring is not supported by kernel
perf build-id: Fix filename size calculation
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, kvm: KVM paravirt kernels don't check for CPUID being unavailable
x86: Fix section annotation of acpi_map_cpu2node()
x86/microcode: Ensure that module is only loaded on supported Intel CPUs
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Fix KVM and ia64 boot crash due to sched_groups circular linked list assumption
Export handle_edge_irq() and irq_to_desc() to modules to allow them to
do things such as
__irq_set_handler_locked(...., handle_edge_irq);
This fixes
ERROR: "handle_edge_irq" [drivers/gpio/gpio-pch.ko] undefined!
ERROR: "irq_to_desc" [drivers/gpio/gpio-pch.ko] undefined!
when gpio-pch is being built as a module.
This was introduced by commit df9541a60a ("gpio: pch9: Use proper flow
type handlers") that added
__irq_set_handler_locked(d->irq, handle_edge_irq);
but handle_edge_irq() was not exported for modules (and inlined
__irq_set_handler_locked() requires irq_to_desc() exported as well)
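The fix itself is just the two exports (GPL-only, matching the other
genirq exports):

    EXPORT_SYMBOL_GPL(handle_edge_irq);  /* kernel/irq/chip.c */
    EXPORT_SYMBOL_GPL(irq_to_desc);      /* kernel/irq/irqdesc.c */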
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adding the function tracing event to perf caused a side effect
that produces the following warning when enabling all
events in ftrace:
# echo 1 > /sys/kernel/debug/tracing/events/enable
[console]
event trace: Could not enable event function
This is because when enabling all events via the debugfs system
it ignores events that do not have a ->reg() function assigned.
This was to skip over the ftrace internal events (as they are
not TRACE_EVENTs). But as the ftrace function event now has
a ->reg() function attached to it for use with perf, it is no
longer ignored.
Worse yet, this ->reg() function is being called when it should
not be. It returns an error and causes the above warning to
be printed.
By adding a new event_call flag (TRACE_EVENT_FL_IGNORE_ENABLE)
and having all ftrace internal event structures set it, writing to
events/enable will no longer incorrectly try to enable the function
event, and the warning goes away.
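A sketch of the check in the enable-all path (__ftrace_set_clr_event()
in kernel/trace/trace_events.c; the match/sub/event name filtering is
elided):

    list_for_each_entry(call, &ftrace_events, list) {
        if (!call->name || !call->class || !call->class->reg)
            continue;
        /* Skip ftrace internal events such as the function event:
         * they must not be enabled by a blanket "enable". */
        if (call->flags & TRACE_EVENT_FL_IGNORE_ENABLE)
            continue;
        ftrace_event_enable_disable(call, set);
    }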
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
compat_sys_sigprocmask reads a smaller signal mask from userspace than
sigprocmask accepts for setting. So the high word of blocked.sig[0]
will be cleared, releasing any potentially blocked RT signal.
This was discovered via userspace code that relies on get/setcontext.
glibc's i386 versions of those functions use sigprocmask instead of
rt_sigprocmask to save/restore the signal mask and caused RT signal
unblocking this way.
As suggested by Linus, this replaces the sys_sigprocmask based compat
version with one that open-codes the required logic, including the merge
of the existing blocked set with the new one provided on SIG_SETMASK.
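A sketch of the open-coded replacement, close to the upstream fix:

    asmlinkage long compat_sys_sigprocmask(int how,
                                           compat_old_sigset_t __user *nset,
                                           compat_old_sigset_t __user *oset)
    {
        old_sigset_t old_set, new_set;
        sigset_t new_blocked;

        old_set = current->blocked.sig[0];

        if (nset) {
            if (get_user(new_set, nset))
                return -EFAULT;
            new_set &= ~(sigmask(SIGKILL) | sigmask(SIGSTOP));

            new_blocked = current->blocked;

            switch (how) {
            case SIG_BLOCK:
                sigaddsetmask(&new_blocked, new_set);
                break;
            case SIG_UNBLOCK:
                sigdelsetmask(&new_blocked, new_set);
                break;
            case SIG_SETMASK:
                /* The merge: replace only the low word, keeping any
                 * blocked RT signals in the high words intact. */
                new_blocked.sig[0] = new_set;
                break;
            default:
                return -EINVAL;
            }

            set_current_blocked(&new_blocked);
        }

        if (oset && put_user(old_set, oset))
            return -EFAULT;

        return 0;
    }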
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If one cpu fails to boot, the boot cpu gives up waiting for it, and
then another cpu is booted, the kernel might crash with the
following OOPS:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: [<ffffffff812c3630>] __bitmap_weight+0x30/0x80
Call Trace:
[<ffffffff8108b9b6>] build_sched_domains+0x7b6/0xa50
The crash happens in init_sched_groups_power(), which expects
sched_groups to be a circular linked list. However that is not
always true: the sched_groups preallocated in __sdt_alloc are
initialized in build_sched_groups, which may exit early
    if (cpu != cpumask_first(sched_domain_span(sd)))
        return 0;
without initializing the sd->groups->next field.
Fix the bug by initializing the next field right after the
sched_group is allocated.
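A sketch of the fix in __sdt_alloc():

    sg = kzalloc_node(sizeof(struct sched_group) + cpumask_size(),
                      GFP_KERNEL, cpu_to_node(j));
    if (!sg)
        return -ENOMEM;

    /* Keep the list circular even when build_sched_groups()
     * bails out early for this CPU. */
    sg->next = sg;

    *per_cpu_ptr(sdd->sg, j) = sg;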
Also-Reported-by: Jiang Liu <liuj97@gmail.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Cc: a.p.zijlstra@chello.nl
Cc: pjt@google.com
Cc: seto.hidetoshi@jp.fujitsu.com
Link: http://lkml.kernel.org/r/1336559908-32533-1-git-send-email-imammedo@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull power management fixes from Rafael J. Wysocki:
"Fix for an issue causing hibernation to hang on systems with highmem
(that practically means i386) due to broken memory management (bug
introduced in 3.2, so -stable material) and PM documentation update
making the freezer documentation follow the code again after some
recent updates."
* tag 'pm-for-3.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / Freezer / Docs: Update documentation about freezing of tasks
PM / Hibernate: fix the number of pages used for hibernate/thaw buffering
Pull perf fixes from Ingo Molnar.
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix perf_event_for_each() to use sibling
perf symbols: Read plt symbols from proper symtab_type binary
tracing: Fix stacktrace of latency tracers (irqsoff and friends)
perf tools: Add 'G' and 'H' modifiers to event parsing
tracing: Fix regression with tracing_on
perf tools: Drop CROSS_COMPILE from flex and bison calls
perf report: Fix crash showing warning related to kernel maps
tracing: Fix build breakage without CONFIG_PERF_EVENTS (again)
Pull build fixes for less mainstream architectures from Paul Gortmaker:
"These are fixes for frv(1), blackfin(2), powerpc(1) and xtensa(4).
Fortunately the touches are nearly all specific to files just used by
the arch in question. The two touches to shared/common files
[kernel/irq/debug.h and drivers/pci/Makefile] are trivial to assess as
no risk to anyone.
Half of them relate to xtensa directly. It was only when I fixed the
last xtensa issue that I realized that the arch has been broken for a
significant time, and isn't a specific v3.4 regression. So if you
wanted, we could leave xtensa lying bleeding in the street for a
couple more weeks and queue those for 3.5. But given they are no risk
to anyone outside of xtensa, I figured to just leave them in.
If you are OK with taking the xtensa fixes, then please pull to get:
- one last implicit include uncovered by system.h that is in a file
specific to just one powerpc defconfig. (I'd sync'd with BenH).
- fix an oversight in the PCI makefile where shared code wasn't being
compiled for ARCH=frv
- fix a missing include for GPIO in blackfin framebuffer.
- audit and tag endif in blackfin ezkit board file, in order to find
and fix the misplaced endif masking a block of code.
- fix irq/debug.h choice of temporary macro names to be more internal
so they don't conflict with names used by xtensa.
- fix a reference to an undeclared local var in xtensa's signal.c
- fix an implicit bug.h usage in xtensa's asm/io.h uncovered by my
removing bug.h from kernel.h
- fix xtensa to properly indicate it is using asm-generic/hardirq.h
in order to resolve the link error - undefined ack_bad_irq
The xtensa still fails final link as my latest binutils does something
evil when ld forward-relocates unlikely() blocks, but in theory people
who have older/valid toolchains could now use the thing."
* 'for-v3.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
xtensa: fix build fail on undefined ack_bad_irq
blackfin: fix ifdef fustercluck in mach-bf538/boards/ezkit.c
blackfin: fix compile error in bfin-lq035q1-fb.c
pci: frv architecture needs generic setup-bus infrastructure
irq: hide debug macros so they don't collide with others.
xtensa: fix build error in xtensa/include/asm/io.h
xtensa: fix build failure in xtensa/kernel/signal.c
powerpc: fix system.h fallout in sysdev/scom.c [chroma_defconfig]
In perf_event_for_each() we call a function on an event, and then
iterate over the siblings of the event.
However we don't call the function on the siblings, we call it
repeatedly on the original event - it seems "obvious" that we should
be calling it with sibling as the argument.
It looks like this broke in commit 75f937f24b ("Fix ctx->mutex
vs counter->mutex inversion").
The only effect of the bug is that the PERF_IOC_FLAG_GROUP parameter
to the ioctls doesn't work.
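The fix is a one-word change in the iteration; a sketch:

    static void perf_event_for_each(struct perf_event *event,
                                    void (*func)(struct perf_event *))
    {
        struct perf_event_context *ctx = event->ctx;
        struct perf_event *sibling;

        WARN_ON_ONCE(ctx->parent_ctx);
        mutex_lock(&ctx->mutex);
        event = event->group_leader;

        perf_event_for_each_child(event, func);
        list_for_each_entry(sibling, &event->sibling_list, group_entry)
            perf_event_for_each_child(sibling, func);  /* was: event */
        mutex_unlock(&ctx->mutex);
    }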
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1334109253-31329-1-git-send-email-michael@ellerman.id.au
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Under extreme memory pressure, percpu allocation might fail. We
hit this when the system goes to suspend-to-RAM, causing a kworker
panic:
EIP: [<c124411a>] build_sched_domains+0x23a/0xad0
Kernel panic - not syncing: Fatal exception
Pid: 3026, comm: kworker/u:3
3.0.8-137473-gf42fbef #1
Call Trace:
[<c18cc4f2>] panic+0x66/0x16c
[...]
[<c1244c37>] partition_sched_domains+0x287/0x4b0
[<c12a77be>] cpuset_update_active_cpus+0x1fe/0x210
[<c123712d>] cpuset_cpu_inactive+0x1d/0x30
[...]
With this fix applied, build_sched_domains() will return -ENOMEM and
the suspend attempt fails.
Signed-off-by: he, bo <bo.he@intel.com>
Reviewed-by: Zhang, Yanmin <yanmin.zhang@intel.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1335355161.5892.17.camel@hebo
[ So, we fail to deallocate a CPU because we cannot allocate RAM :-/
I don't like that kind of sad behavior but nevertheless it should
not crash under high memory load. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commits 367456c756 ("sched: Ditch per cgroup task lists for
load-balancing") and 5d6523ebd ("sched: Fix load-balance wreckage")
left some more wreckage.
By setting loop_max unconditionally to ->nr_running load-balancing
could take a lot of time on very long runqueues (hackbench!). So keep
the sysctl as max limit of the amount of tasks we'll iterate.
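A sketch of the loop_max clamp in load_balance():

    /* Examine at most sysctl_sched_nr_migrate tasks per pass instead
     * of the entire (possibly huge) runqueue. */
    env.loop_max = min_t(unsigned long, sysctl_sched_nr_migrate,
                         busiest->nr_running);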
Furthermore, the min load filter for migration completely fails with
cgroups since inequality in per-cpu state can easily lead to such
small loads :/
Furthermore, the change to add new tasks to the tail of the queue
instead of the head seems to have some effect... not quite sure I
understand why.
Combined, these fixes solve the huge hackbench regression reported
by Tim when hackbench is run in a cgroup.
Reported-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1335365763.28150.267.camel@twins
[ got rid of the CONFIG_PREEMPT tuning and made small readability edits ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>