Commit Graph

300369 Commits

Author SHA1 Message Date
Michael Bohan
82d5b8dc2a printk: Don't take console semaphore in atomic context
The CPU HOTPLUG take_cpu_down path is invokved with preemption
disabled via stop_machine. This causes a "Scheduling while
atomic" BUG when there is contention for the console semaphore.
The solution is to defer the console flush until it's not in
scheduling violation.

Change-Id: I2d0d58576a4db308ee40850a18a6bb9784ca4e4b
Signed-off-by: Michael Bohan <mbohan@codeaurora.org>
(cherry picked from commit f6d11b2eb9c110d0801aa40b1bfdb8194a5e3e3a)
2013-02-08 15:14:23 -08:00
Vikram Mulukutla
8041e1099b panic: Fix a possible deadlock in panic()
panic_lock is meant to ensure that panic processing takes
place only on one cpu; if any of the other cpus encounter
a panic, they will spin waiting to be shut down.

However, this causes a regression in this scenario:

1. Cpu 0 encounters a panic and acquires the panic_lock
   and proceeds with the panic processing.
2. There is an interrupt on cpu 0 that also encounters
   an error condition and invokes panic.
3. This second invocation fails to acquire the panic_lock
   and enters the infinite while loop in panic_smp_self_stop.

Thus all panic processing is stopped, and the cpu is stuck
for eternity in the while(1) inside panic_smp_self_stop.

To address this, disable local interrupts with
local_irq_disable before acquiring the panic_lock. This will
prevent interrupt handlers from executing during the panic
processing, thus avoiding this particular problem.

Change-Id: Ibf70e96343d35587571968bbc39062e28b7d3c0a
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
(cherry picked from commit dd58afef43357f265e803c317bbaa91f8c440663)
2013-02-08 15:14:23 -08:00
Pushkar Joshi
715363e77e tracing: have trace_printk process arguments before storing
trace_printk records its arguments in the internal ring buffer for
them to be processed later. The mapping between the format string
for the particular instance of a trace_printk call and the
arguments is maintained using the instruction pointer internally
in the kernel. However, this makes it harder for an external tool
to process this data, which is necessary when the same data
is logged to the STM and needs to be processed externally.

Change-Id: Ifa2ae5fe1f1fd62c85fac432a581737450781f88
Signed-off-by: Pushkar Joshi <pushkarj@codeaurora.org>
(cherry picked from commit c86e25f0f775d7effaf192ded6fb7a6885694d7b)
2013-02-08 15:14:22 -08:00
Stephen Boyd
2d168136d4 vsprintf: Fix %ps on non symbols when using kallsyms
Using %ps in a printk format will sometimes fail silently and
print the empty string if the address passed in does not match a
symbol that kallsyms knows about. But using %pS will fall back to
printing the full address if kallsyms can't find the symbol. Make
%ps act the same as %pS by falling back to printing the address.

While we're here also make %ps print the module that a symbol
comes from so that it matches what %pS already does. Take this
simple function for example (in a module):

	static void test_printk(void)
	{
		int test;
		pr_info("with pS: %pS\n", &test);
		pr_info("with ps: %ps\n", &test);
	}

Before this patch:

 with pS: 0xdff7df44
 with ps:

After this patch:

 with pS: 0xdff7df44
 with ps: 0xdff7df44

Change-Id: Id03d74b079d40fe24b07a978909faedc741e281a
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
(cherry picked from commit 364da7c6dda2d9f41cb4ab715da204bc9923f3e2)
2013-02-08 15:14:22 -08:00
Abhijeet Dharmapurikar
445abc7500 genirq: pm: Fix the enable ordering in resume
In suspend interrupts are disabled from 0 to NR_IRQ, in resume interrupts
should be enabled in reverse order.

Enabling parent or summary interrupts before enabling child interrupts
causes the handler of the child interrupt to run even before it is
enabled. Usually the genirq handler does the correct thing of masking
the interrupt and additionally marking the interrupt IRQ_PENDING if its
an edge triggered interrupt. However the nested handler
(handle_nested_irq()) simply ignores the interrupt causing a loss of it.

Not calling the action of an interrupt, especially if it marked wakeup,
causes the system to incorrectly go back to suspend immediately.

Change-Id: Ica30c10a975a4a7b41b97b4f21250dac80335b2b
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
(cherry picked from commit 6dfcdc120d05d041e38668d15fd041fb7803986d)
2013-02-08 15:14:17 -08:00
Rohit Vaswani
ee004b5221 genirq: Add irq_set_pending
The function allows us to set the pending bit for an irq
Used by the MPM mainly to set the pending flag for the irq
that was responsible for waking up the MSM.

Change-Id: Icc72c2a51a37df11a610f69fffda9d59aff2ac2a
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
2013-02-08 15:04:34 -08:00
Abhijeet Dharmapurikar
aeef1962d0 genirq: explicitly mask a freed irq
When an interrupt is freed, the shutdown or the disable callback
is called for that interrupt. These calls might not be implemented
or even if they were, might not mask the interrupt.

Explicitly mask the interrupt when it is freed. If not masked, the
interrupt could trigger, set the pending bit in the irq controller
and cause unnecessary wakeup or exits from idle power collapse.

Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>

Conflicts:

	kernel/irq/manage.c
2013-01-09 15:40:55 -08:00
Abhijeet Dharmapurikar
18f1fe25ad genirq: implement read_irq_line for interrupt lines
Some drivers need to know what the status of the interrupt line is.
This is especially true for drivers that register a handler with
IRQF_TRIGGER_RISING | IRQF_TRIGGER_FALLING and in the handler they
need to know which edge transition it was invoked for. Provide a way
for these handlers to read the logical status of the line after their
handler was invoked. If the line reads high it was called for a
rising edge and if the line reads low it was called for a falling edge.

The irq_read_line callback in the chip allows the controller to provide
the real time status of this line. Controllers that can read the status
of an interrupt line should implement this by doing necessary
hardware reads and return the logical state of the line.

Interrupt controllers based on the slow bus architecture should conduct
the transaction in this callback. The genirq code will call the chip's
bus lock prior to calling irq_read_line. Obviously since the transaction
would be completed before returning from irq_read_line it need not do
any transactions in the bus unlock call.

Change-Id: I3c8746706530bba14a373c671d22ee963b84dfab
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
(cherry picked from commit ed3e47cb88b61859da3c221f22b509ebe0433218)

Conflicts:

	include/linux/interrupt.h
2013-01-09 15:40:54 -08:00
Abhijeet Dharmapurikar
c9cf77ff7e genirq: chip: set pending only for edge interrupts
The IRQS_PENDING flag is meant to record an edge interrupt trigger event
when that interrupt is disabled.

When an edge triggered interrupt is enabled, check_irq_resend() retriggers
that irq and resets the flag to zero if set. Note that check_irq_resend()
only does this for edge triggered interrupts.

For level triggered interrupts it is expected that the interrupt remains
active and doesn't need this PENDING flag assistance from software for
re-triggering it.

However, handle_fasteoi_irq flow handler sets the PENDING flag even for
a disabled level interrupt. This causes an adverse effect if that level
interrupt is marked wakeup. The suspend code sees the pending flag on a
wakeup interrupt and aborts suspend whereas check_irq_resend does not reset
it to 0 (as it is a level interrupt). The end result is that the PENDING
flag on this level triggered wakeup interrupt never clears and the system
 keeps aborting suspend.

Fix this by setting IRQS_PENDING flag only for edge interrupts in the
handle_fasteoi_irq.

CRs-Fixed: 314344
Change-Id: I775d40f434f9309fd9672bae372b0f0fb5b91627
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
2013-01-09 15:40:54 -08:00
Abhijeet Dharmapurikar
68cfba07e7 genirq: fix handle_nested_irq for lazy disable
When lazy disabling is implemented and an interrupt is disabled the
genirq code ends up marking it as IRQ_DISABLED in the descriptor.
The interrupt stays enabled in the controller.  If the interrupt
fires after disabling, the flow handlers namely handle_level_irq and
handle_edge_irq mask the interrupt in the controller.

This is not the case with handle_nested_irq. The interrupt stays enabled in
the controller and if it were a level interrupt it keeps firing only to be
ignored by handle_nested_irq.

Update handle_nested_irq to mask such an interrupt.

CRs-Fixed: 300931
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>

Conflicts:

	kernel/irq/chip.c
2013-01-09 15:40:53 -08:00
Linus Torvalds
76e10d158e Linux 3.4 2012-05-20 15:29:13 -07:00
Linus Torvalds
d6c7797367 Merge tag 'parisc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6
Pull PA-RISC fixes from James Bottomley:
 "This is a set of three bug fixes that gets parisc running again on
  systems with PA1.1 processors.

  Two fix regressions introduced in 2.6.39 and one fixes a prefetch bug
  that only affects PA7300LC processors.  We also have another pending
  fix to do with the sectional arrangement of vmlinux.lds, but there's a
  query on it during testing on one particular system type, so I'll hold
  off sending it in for now."

* tag 'parisc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6:
  [PARISC] fix panic on prefetch(NULL) on PA7300LC
  [PARISC] fix crash in flush_icache_page_asm on PA1.1
  [PARISC] fix PA1.1 oops on boot
2012-05-19 15:30:15 -07:00
Linus Torvalds
5d1204582e Merge branch 'x86/ld-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 linker bug workarounds from Peter Anvin.

GNU ld-2.22.52.0.[12] (*) has an unfortunate bug where it incorrectly
turns certain relocation entries absolute.  Section-relative symbols
that are part of otherwise empty sections are silently changed them to
absolute.  We rely on section-relative symbols staying section-relative,
and actually have several sections in the linker script solely for this
purpose.

See for example

   http://sourceware.org/bugzilla/show_bug.cgi?id=14052

We could just black-list the buggy linker, but it appears that it got
shipped in at least F17, and possibly other distros too, so it's sadly
not some rare unusual case.

This backports the workaround from the x86/trampoline branch, and as
Peter says: "This is not a minimal fix, not at all, but it is a tested
code base."

* 'x86/ld-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, relocs: When printing an error, say relative or absolute
  x86, relocs: Workaround for binutils 2.22.52.0.1 section bug
  x86, realmode: 16-bit real-mode code support for relocs tool

(*) That's a manly release numbering system. Stupid, sure. But manly.
2012-05-19 15:28:22 -07:00
Linus Torvalds
14e931a264 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fixes from Jens Axboe:
 "A few small, but important fixes.  Most of them are marked for stable
  as well

   - Fix failure to release a semaphore on error path in mtip32xx.
   - Fix crashable condition in bio_get_nr_vecs().
   - Don't mark end-of-disk buffers as mapped, limit it to i_size.
   - Fix for build problem with CONFIG_BLOCK=n on arm at least.
   - Fix for a buffer overlow on UUID partition printing.
   - Trivial removal of unused variables in dac960."

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: fix buffer overflow when printing partition UUIDs
  Fix blkdev.h build errors when BLOCK=n
  bio allocation failure due to bio_get_nr_vecs()
  block: don't mark buffers beyond end of disk as mapped
  mtip32xx: release the semaphore on an error path
  dac960: Remove unused variables from DAC960_CreateProcEntries()
2012-05-19 10:12:17 -07:00
Linus Torvalds
a2ae978756 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull one more networking bug-fix from David Miller:
 "One last straggler.

  Eric Dumazet's pktgen unload oops fix was not entirely complete, but
  all the cases should be handled properly now....  fingers crossed."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  pktgen: fix module unload for good
2012-05-19 10:10:59 -07:00
Hugh Dickins
62ade86ab6 memcg,thp: fix res_counter:96 regression
Occasionally, testing memcg's move_charge_at_immigrate on rc7 shows
a flurry of hundreds of warnings at kernel/res_counter.c:96, where
res_counter_uncharge_locked() does WARN_ON(counter->usage < val).

The first trace of each flurry implicates __mem_cgroup_cancel_charge()
of mc.precharge, and an audit of mc.precharge handling points to
mem_cgroup_move_charge_pte_range()'s THP handling in commit 12724850e8
("memcg: avoid THP split in task migration").

Checking !mc.precharge is good everywhere else, when a single page is to
be charged; but here the "mc.precharge -= HPAGE_PMD_NR" likely to
follow, is liable to result in underflow (a lot can change since the
precharge was estimated).

Simply check against HPAGE_PMD_NR: there's probably a better
alternative, trying precharge for more, splitting if unsuccessful; but
this one-liner is safer for now - no kernel/res_counter.c:96 warnings
seen in 26 hours.

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-19 10:10:27 -07:00
H. Peter Anvin
24ab82bd9b x86, relocs: When printing an error, say relative or absolute
When the relocs tool throws an error, let the error message say if it
is an absolute or relative symbol.  This should make it a lot more
clear what action the programmer needs to take and should help us find
the reason if additional symbol bugs show up.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
2012-05-18 19:50:02 -07:00
H. Peter Anvin
a3e854d95a x86, relocs: Workaround for binutils 2.22.52.0.1 section bug
GNU ld 2.22.52.0.1 has a bug that it blindly changes symbols from
section-relative to absolute if they are in a section of zero length.
This turns the symbols __init_begin and __init_end into absolute
symbols.  Let the relocs program know that those should be treated as
relative symbols.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
2012-05-18 19:50:00 -07:00
H. Peter Anvin
6520fe5564 x86, realmode: 16-bit real-mode code support for relocs tool
A new option is added to the relocs tool called '--realmode'.
This option causes the generation of 16-bit segment relocations
and 32-bit linear relocations for the real-mode code. When
the real-mode code is moved to the low-memory during kernel
initialization, these relocation entries can be used to
relocate the code properly.

In the assembly code 16-bit segment relocations must be relative
to the 'real_mode_seg' absolute symbol. Linear relocations must be
relative to a symbol prefixed with 'pa_'.

16-bit segment relocation is used to load cs:ip in 16-bit code.
Linear relocations are used in the 32-bit code for relocatable
data references. They are declared in the linker script of the
real-mode code.

The relocs tool is moved to arch/x86/tools/relocs.c, and added new
target archscripts that can be used to build scripts needed building
an architecture.  be compiled before building the arch/x86 tree.

[ hpa: accelerating this because it detects invalid absolute
  relocations, a serious bug in binutils 2.22.52.0.x which currently
  produces bad kernels. ]

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1336501366-28617-2-git-send-email-jarkko.sakkinen@intel.com
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>
2012-05-18 19:49:40 -07:00
Linus Torvalds
b1dab2f040 Merge tag 'dm-3.4-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm
Pull a dm fix from Alasdair G Kergon:
 "A fix to the thin provisioning userspace interface."

* tag 'dm-3.4-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
  dm thin: fix table output when pool target disables discard passdown internally
2012-05-18 18:22:45 -07:00
Mike Snitzer
f402693d06 dm thin: fix table output when pool target disables discard passdown internally
When the thin pool target clears the discard_passdown parameter
internally, it incorrectly changes the table line reported to userspace.
This breaks dumb string comparisons on these table lines in generic
userspace device-mapper library code and leads to tables being reloaded
repeatedly when nothing is actually meant to be changing.

This patch corrects this by no longer changing the table line when
discard passdown was disabled.

We can still tell when discard passdown is overridden by looking for the
message "Discard unsupported by data device (sdX): Disabling discard passdown."

This automatic detection is also moved from the 'load' to the 'resume'
so that it is re-evaluated should the properties of underlying devices
change.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-05-19 01:01:01 +01:00
Linus Torvalds
2f05af8b59 Merge tag 'md-3.4-fixes' of git://neil.brown.name/md
Pull one more md bugfix from NeilBrown:
 "Fix bug in recent fix to RAID10.

  Without this patch, recovery will crash"

* tag 'md-3.4-fixes' of git://neil.brown.name/md:
  md/raid10: fix transcription error in calc_sectors conversion.
2012-05-18 16:19:59 -07:00
Linus Torvalds
8394edf371 Merge branch 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Pull tile tree bugfix from Chris Metcalf:
 "This fixes a security vulnerability (and correctness bug) in tilegx"

* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  tilegx: enable SYSCALL_WRAPPERS support
2012-05-18 16:16:42 -07:00
NeilBrown
b0d634d568 md/raid10: fix transcription error in calc_sectors conversion.
The old code was
		sector_div(stride, fc);
the new code was
		sector_dir(size, conf->near_copies);

'size' is right (the stride various wasn't really needed), but
'fc' means 'far_copies', and that is an important difference.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-05-19 09:01:13 +10:00
Linus Torvalds
73f1f5dd3e Merge branch 'akpm' (Andrew's patch-bomb)
Merge misc fixes from Andrew Morton.

* emailed from Andrew Morton <akpm@linux-foundation.org>: (4 patches)
  frv: delete incorrect task prototypes causing compile fail
  slub: missing test for partial pages flush work in flush_all()
  fs, proc: fix ABBA deadlock in case of execution attempt of map_files/ entries
  drivers/rtc/rtc-pl031.c: configure correct wday for 2000-01-01
2012-05-18 15:56:25 -07:00
Linus Torvalds
30a08bf2d3 proc: move fd symlink i_mode calculations into tid_fd_revalidate()
Instead of doing the i_mode calculations at proc_fd_instantiate() time,
move them into tid_fd_revalidate(), which is where the other inode state
(notably uid/gid information) is updated too.

Otherwise we'll end up with stale i_mode information if an fd is re-used
while the dentry still hangs around.  Not that anything really *cares*
(symlink permissions don't really matter), but Tetsuo Handa noticed that
the owner read/write bits don't always match the state of the
readability of the file descriptor, and we _used_ to get this right a
long time ago in a galaxy far, far away.

Besides, aside from fixing an ugly detail (that has apparently been this
way since commit 61a2878402: "proc: Remove the hard coded inode
numbers" in 2006), this removes more lines of code than it adds.  And it
just makes sense to update i_mode in the same place we update i_uid/gid.

Al Viro correctly points out that we could just do the inode fill in the
inode iops ->getattr() function instead.  However, that does require
somewhat slightly more invasive changes, and adds yet *another* lookup
of the file descriptor.  We need to do the revalidate() for other
reasons anyway, and have the file descriptor handy, so we might as well
fill in the information at this point.

Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-18 14:06:17 -07:00
Eric Dumazet
d4b1133558 pktgen: fix module unload for good
commit c57b546840 (pktgen: fix crash at module unload) did a very poor
job with list primitives.

1) list_splice() arguments were in the wrong order

2) list_splice(list, head) has undefined behavior if head is not
initialized.

3) We should use the list_splice_init() variant to clear pktgen_threads
list.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-18 13:54:33 -04:00
Chris Metcalf
e6d9668e11 tilegx: enable SYSCALL_WRAPPERS support
Some discussion with the glibc mailing lists revealed that this was
necessary for 64-bit platforms with MIPS-like sign-extension rules
for 32-bit values.  The original symptom was that passing (uid_t)-1 to
setreuid() was failing in programs linked -pthread because of the "setxid"
mechanism for passing setxid-type function arguments to the syscall code.
SYSCALL_WRAPPERS handles ensuring that all syscall arguments end up with
proper sign-extension and is thus the appropriate fix for this problem.

On other platforms (s390, powerpc, sparc64, and mips) this was fixed
in 2.6.28.6.  The general issue is tracked as CVE-2009-0029.

Cc: <stable@vger.kernel.org>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-05-18 13:33:24 -04:00
Linus Torvalds
3d9944978e Merge tag 'linus-mce-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull a machine check recovery fix from Tony Luck.

I really don't like how the MCE code does some of the things it does,
but this does seem to be an improvement.

* tag 'linus-mce-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  x86/mce: Only restart instruction after machine check recovery if it is safe
2012-05-18 09:42:20 -07:00
Paul Gortmaker
93c2d656c7 frv: delete incorrect task prototypes causing compile fail
Commit 41101809a865 ("fork: Provide weak arch_release_[task_struct|
thread_info] functions") in -tip highlights a problem in the frv arch,
where it has needles prototypes for alloc_task_struct_node and
free_task_struct.  This now shows up as:

  kernel/fork.c:120:66: error: static declaration of 'alloc_task_struct_node' follows non-static declaration
  kernel/fork.c:127:51: error: static declaration of 'free_task_struct' follows non-static declaration

since that commit turned them into real functions.  Since arch/frv does
does not define define __HAVE_ARCH_TASK_STRUCT_ALLOCATOR (i.e.  it just
uses the generic ones) it shouldn't list these at all.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-17 18:00:51 -07:00
majianpeng
02e1a9cd1e slub: missing test for partial pages flush work in flush_all()
I found some kernel messages such as:

    SLUB raid5-md127: kmem_cache_destroy called for cache that still has objects.
    Pid: 6143, comm: mdadm Tainted: G           O 3.4.0-rc6+        #75
    Call Trace:
    kmem_cache_destroy+0x328/0x400
    free_conf+0x2d/0xf0 [raid456]
    stop+0x41/0x60 [raid456]
    md_stop+0x1a/0x60 [md_mod]
    do_md_stop+0x74/0x470 [md_mod]
    md_ioctl+0xff/0x11f0 [md_mod]
    blkdev_ioctl+0xd8/0x7a0
    block_ioctl+0x3b/0x40
    do_vfs_ioctl+0x96/0x560
    sys_ioctl+0x91/0xa0
    system_call_fastpath+0x16/0x1b

Then using kmemleak I found these messages:

    unreferenced object 0xffff8800b6db7380 (size 112):
      comm "mdadm", pid 5783, jiffies 4294810749 (age 90.589s)
      hex dump (first 32 bytes):
        01 01 db b6 ad 4e ad de ff ff ff ff ff ff ff ff  .....N..........
        ff ff ff ff ff ff ff ff 98 40 4a 82 ff ff ff ff  .........@J.....
      backtrace:
        kmemleak_alloc+0x21/0x50
        kmem_cache_alloc+0xeb/0x1b0
        kmem_cache_open+0x2f1/0x430
        kmem_cache_create+0x158/0x320
        setup_conf+0x649/0x770 [raid456]
        run+0x68b/0x840 [raid456]
        md_run+0x529/0x940 [md_mod]
        do_md_run+0x18/0xc0 [md_mod]
        md_ioctl+0xba8/0x11f0 [md_mod]
        blkdev_ioctl+0xd8/0x7a0
        block_ioctl+0x3b/0x40
        do_vfs_ioctl+0x96/0x560
        sys_ioctl+0x91/0xa0
        system_call_fastpath+0x16/0x1b

This bug was introduced by commit a8364d5555 ("slub: only IPI CPUs that
have per cpu obj to flush"), which did not include checks for per cpu
partial pages being present on a cpu.

Signed-off-by: majianpeng <majianpeng@gmail.com>
Cc: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Tested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-17 18:00:51 -07:00
Cyrill Gorcunov
eb94cd96e0 fs, proc: fix ABBA deadlock in case of execution attempt of map_files/ entries
map_files/ entries are never supposed to be executed, still curious
minds might try to run them, which leads to the following deadlock

  ======================================================
  [ INFO: possible circular locking dependency detected ]
  3.4.0-rc4-24406-g841e6a6 #121 Not tainted
  -------------------------------------------------------
  bash/1556 is trying to acquire lock:
   (&sb->s_type->i_mutex_key#8){+.+.+.}, at: do_lookup+0x267/0x2b1

  but task is already holding lock:
   (&sig->cred_guard_mutex){+.+.+.}, at: prepare_bprm_creds+0x2d/0x69

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #1 (&sig->cred_guard_mutex){+.+.+.}:
         validate_chain+0x444/0x4f4
         __lock_acquire+0x387/0x3f8
         lock_acquire+0x12b/0x158
         __mutex_lock_common+0x56/0x3a9
         mutex_lock_killable_nested+0x40/0x45
         lock_trace+0x24/0x59
         proc_map_files_lookup+0x5a/0x165
         __lookup_hash+0x52/0x73
         do_lookup+0x276/0x2b1
         walk_component+0x3d/0x114
         do_last+0xfc/0x540
         path_openat+0xd3/0x306
         do_filp_open+0x3d/0x89
         do_sys_open+0x74/0x106
         sys_open+0x21/0x23
         tracesys+0xdd/0xe2

  -> #0 (&sb->s_type->i_mutex_key#8){+.+.+.}:
         check_prev_add+0x6a/0x1ef
         validate_chain+0x444/0x4f4
         __lock_acquire+0x387/0x3f8
         lock_acquire+0x12b/0x158
         __mutex_lock_common+0x56/0x3a9
         mutex_lock_nested+0x40/0x45
         do_lookup+0x267/0x2b1
         walk_component+0x3d/0x114
         link_path_walk+0x1f9/0x48f
         path_openat+0xb6/0x306
         do_filp_open+0x3d/0x89
         open_exec+0x25/0xa0
         do_execve_common+0xea/0x2f9
         do_execve+0x43/0x45
         sys_execve+0x43/0x5a
         stub_execve+0x6c/0xc0

This is because prepare_bprm_creds grabs task->signal->cred_guard_mutex
and when do_lookup happens we try to grab task->signal->cred_guard_mutex
again in lock_trace.

Fix it using plain ptrace_may_access() helper in proc_map_files_lookup()
and in proc_map_files_readdir() instead of lock_trace(), the caller must
be CAP_SYS_ADMIN granted anyway.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-17 18:00:51 -07:00
Rajkumar Kasirajan
c0a5f4a05a drivers/rtc/rtc-pl031.c: configure correct wday for 2000-01-01
The reset date of the ST Micro version of PL031 is 2000-01-01.  The
correct weekday for 2000-01-01 is saturday, but pl031 is initialized to
sunday.  This may lead to alarm malfunction, so configure the correct
wday if RTC_DR indicates reset.

Signed-off-by: Rajkumar Kasirajan <rajkumar.kasirajan@stericsson.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Mattias Wallin <mattias.wallin@stericsson.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-17 18:00:51 -07:00
Linus Torvalds
42ea7d7f2a Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm
Pull ARM fixes from Russell King:
 "Small set of fixes again."

* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: 7419/1: vfp: fix VFP flushing regression on sigreturn path
  ARM: 7418/1: LPAE: fix access flag setup in mem_type_table
  ARM: prevent VM_GROWSDOWN mmaps extending below FIRST_USER_ADDRESS
  ARM: 7417/1: vfp: ensure preemption is disabled when enabling VFP access
2012-05-17 16:52:29 -07:00
Linus Torvalds
39c2028531 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull two networking fixes from David S. Miller:

1) Thanks to Willy Tarreau and Eric Dumazet, we've unlocked a bug that's
   been present in do_tcp_sendpages() since that function was written in
   2002.

   When we block to wait for memory we have to unconditionally try and
   push out pending TCP data, otherwise we can block for an unreasonably
   long amount of time.

2) Fix deadlock in e1000, fixes kernel bugzilla 43132

   From Tushar Dave.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  e1000: Prevent reset task killing itself.
  tcp: do_tcp_sendpages() must try to push data out on oom conditions
2012-05-17 16:30:26 -07:00
Rafael J. Wysocki
5c7dd710f6 ACPI / PCI / PM: Fix device PM regression related to D3hot/D3cold
Commit 1cc0c998fd ("ACPI: Fix D3hot v D3cold confusion") introduced a
bug in __acpi_bus_set_power() and changed the behavior of
acpi_pci_set_power_state() in such a way that it generally doesn't work
as expected if PCI_D3hot is passed to it as the second argument.

First off, if ACPI_STATE_D3 (equal to ACPI_STATE_D3_COLD) is passed to
__acpi_bus_set_power() and the explicit_set flag is set for the D3cold
state, the function will try to execute AML method called "_PS4", which
doesn't exist.

Fix this by adding a check to ensure that the name of the AML method
to execute for transitions to ACPI_STATE_D3_COLD is correct in
__acpi_bus_set_power().  Also make sure that the explicit_set flag
for ACPI_STATE_D3_COLD will be set if _PS3 is present and modify
acpi_power_transition() to avoid accessing power resources for
ACPI_STATE_D3_COLD, because they don't exist.

Second, if PCI_D3hot is passed to acpi_pci_set_power_state() as the
target state, the function will request a transition to
ACPI_STATE_D3_HOT instead of ACPI_STATE_D3.  However,
ACPI_STATE_D3_HOT is now only marked as supported if the _PR3 AML
method is defined for the given device, which is rare.  This causes
problems to happen on systems where devices were successfully put
into ACPI D3 by pci_set_power_state(PCI_D3hot) which doesn't work
now.  In particular, some unused graphics adapters are not turned
off as a result.

To fix this issue restore the old behavior of
acpi_pci_set_power_state(), which is to request a transition to
ACPI_STATE_D3 (equal to ACPI_STATE_D3_COLD) if either PCI_D3hot or
PCI_D3cold is passed to it as the argument.

This approach is not ideal, because generally power should not
be removed from devices if PCI_D3hot is the target power state,
but since this behavior is relied on, we have no choice but to
restore it at the moment and spend more time on designing a
better solution in the future.

References: https://bugzilla.kernel.org/show_bug.cgi?id=43228
Reported-by: rocko <rockorequin@hotmail.com>
Reported-by: Cristian Rodríguez <crrodriguez@opensuse.org>
Reported-and-tested-by: Peter <lekensteyn@gmail.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-17 16:16:16 -07:00
Tushar Dave
8ce6909f77 e1000: Prevent reset task killing itself.
Killing reset task while adapter is resetting causes deadlock.
Only kill reset task if adapter is not resetting.
Ref bug #43132 on bugzilla.kernel.org

CC: stable@vger.kernel.org
Signed-off-by: Tushar Dave <tushar.n.dave@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-17 18:32:41 -04:00
Willy Tarreau
bad115cfe5 tcp: do_tcp_sendpages() must try to push data out on oom conditions
Since recent changes on TCP splicing (starting with commits 2f533844
"tcp: allow splice() to build full TSO packets" and 35f9c09f "tcp:
tcp_sendpages() should call tcp_push() once"), I started seeing
massive stalls when forwarding traffic between two sockets using
splice() when pipe buffers were larger than socket buffers.

Latest changes (net: netdev_alloc_skb() use build_skb()) made the
problem even more apparent.

The reason seems to be that if do_tcp_sendpages() fails on out of memory
condition without being able to send at least one byte, tcp_push() is not
called and the buffers cannot be flushed.

After applying the attached patch, I cannot reproduce the stalls at all
and the data rate it perfectly stable and steady under any condition
which previously caused the problem to be permanent.

The issue seems to have been there since before the kernel migrated to
git, which makes me think that the stalls I occasionally experienced
with tux during stress-tests years ago were probably related to the
same issue.

This issue was first encountered on 3.0.31 and 3.2.17, so please backport
to -stable.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Acked-by: Eric Dumazet <edumazet@google.com>
Cc: <stable@vger.kernel.org>
2012-05-17 18:31:43 -04:00
Linus Torvalds
eea036475d Merge branch '3.4-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull two more target-core updates from Nicholas Bellinger:
 "The first patch addresses a SPC-2 reservations RELEASE bug in a
  special (iscsi specific) multi-ISID setup case that was allowing the
  same initiator to be able to incorrect release it's own reservation on
  a different SCSI path with enforce_pr_isid=1 operation.  This bug was
  caught by Bernhard Kohl.

  The second patch is to address a bug with FILEIO backends where the
  incorrect number of blocks for READ_CAPACITY was being reported after
  an underlying device-mapper block_device size change.  This patch uses
  now i_size_read() in fd_get_blocks() for FILEIO backends with an
  underlying block_device, instead of trying to determine this value at
  setup time during fd_create_virtdevice().  (hch CC'ed)

  Both are CC'ed to stable."

* '3.4-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  target: Fix bug in handling of FILEIO + block_device resize ops
  target: Fix SPC-2 RELEASE bug for multi-session iSCSI client setups
2012-05-17 13:25:17 -07:00
Nicholas Bellinger
cd9323fd68 target: Fix bug in handling of FILEIO + block_device resize ops
This patch fixes a bug in the handling of FILEIO w/ underlying block_device
resize operations where the original fd_dev->fd_dev_size was incorrectly being
used in fd_get_blocks() for READ_CAPACITY response payloads.

This patch avoids using fd_dev->fd_dev_size for FILEIO devices with
an underlying block_device, and instead changes fd_get_blocks() to
get the sector count directly from i_size_read() as recommended by hch.

Reported-by: Christoph Hellwig <hch@lst.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-05-17 12:02:43 -07:00
Linus Torvalds
1be5f0b757 Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma
Pull slave-dmaengine fixes fromVinod Koul:
 "fixes of cylic dma usages in slave dma drivers"

* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: fix cyclic dma usage
  dmaengine: pl330: dont complete descriptor for cyclic dma
2012-05-17 09:57:13 -07:00
Linus Torvalds
42e8b9c001 Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull last minute virtio fixes from Michael S. Tsirkin:
 "Here are a couple of last minute virtio fixes for 3.4.  Hope it's not
  too late yes - I might have tried too hard to make sure the fix is
  well tested.

  Fixes are by Amit and myself.  One fixes module removal and one
  suspend of a VM, the last one the handling of out of memory condition.

  They are thus very low risk as most people never hit these paths, but
  do fix very annoying problems for people that do use the feature.

  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio_net: invoke softirqs after __napi_schedule
  virtio: balloon: let host know of updated balloon size before module removal
  virtio: console: tell host of open ports after resume from s3/s4
2012-05-17 09:55:58 -07:00
Linus Torvalds
674ff51776 Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM: SoC fixes from Olof Johansson:
 "I will stop trying to predict when we're done with fixes for a
  release.

  Here's another small batch of three patches for arm-soc:

   - A fix for a boot time WARN_ON() due to irq domain conversion on
     PRIMA2
   - Fix for a regression in Tegra SMP spinup code due to swapped
     register offsets
   - Fixed config dependency for mv_cesa crypto driver to avoid build
     breakage"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: PRIMA2: fix irq domain size and IRQ mask of internal interrupt controller
  crypto: mv_cesa requires on CRYPTO_HASH to build
  ARM: tegra: Fix flow controller accesses
2012-05-17 09:46:07 -07:00
Linus Torvalds
36a1987cd8 Merge tag 'md-3.4-fixes' of git://neil.brown.name/md
Pull two md fixes from NeilBrown:
 "One fixes a bug in the new raid10 resize code so is relevant to 3.4
  only.

  The other fixes a bug in the use of md by dm-raid, so is relevant to
  any kernel with dm-raid support"

* tag 'md-3.4-fixes' of git://neil.brown.name/md:
  MD: Add del_timer_sync to mddev_suspend (fix nasty panic)
  md/raid10: set dev_sectors properly when resizing devices in array.
2012-05-17 09:44:35 -07:00
Linus Torvalds
31ae98359d Merge branches 'perf-urgent-for-linus', 'x86-urgent-for-linus' and 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf, x86 and scheduler updates from Ingo Molnar.

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tracing: Do not enable function event with enable
  perf stat: handle ENXIO error for perf_event_open
  perf: Turn off compiler warnings for flex and bison generated files
  perf stat: Fix case where guest/host monitoring is not supported by kernel
  perf build-id: Fix filename size calculation

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, kvm: KVM paravirt kernels don't check for CPUID being unavailable
  x86: Fix section annotation of acpi_map_cpu2node()
  x86/microcode: Ensure that module is only loaded on supported Intel CPUs

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix KVM and ia64 boot crash due to sched_groups circular linked list assumption
2012-05-17 09:35:17 -07:00
Will Deacon
56cb248428 ARM: 7419/1: vfp: fix VFP flushing regression on sigreturn path
Commit ff9a184c ("ARM: 7400/1: vfp: clear fpscr length and stride bits
on entry to sig handler") flushes the VFP state prior to entering a
signal handler so that a VFP operation inside the handler will trap and
force a restore of ABI-compliant registers. Reflushing and disabling VFP
on the sigreturn path is predicated on the saved thread state indicating
that VFP was used by the handler -- however for SMP platforms this is
only set on context-switch, making the check unreliable and causing VFP
register corruption in userspace since the register values are not
necessarily those restored from the sigframe.

This patch unconditionally flushes the VFP state after a signal handler.
Since we already perform the flush before the handler and the flushing
itself happens lazily, the redundant flush when VFP is not used by the
handler is essentially a nop.

Reported-by: Jon Medhurst <tixy@linaro.org>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-05-17 14:48:56 +01:00
Vitaly Andrianov
1a3abcf41f ARM: 7418/1: LPAE: fix access flag setup in mem_type_table
A zero value for prot_sect in the memory types table implies that
section mappings should never be created for the memory type in question.
This is checked for in alloc_init_section().

With LPAE, we set a bit to mask access flag faults for kernel mappings.
This breaks the aforementioned (!prot_sect) check in alloc_init_section().

This patch fixes this bug by first checking for a non-zero
prot_sect before setting the PMD_SECT_AF flag.

Signed-off-by: Vitaly Andrianov <vitalya@ti.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-05-17 14:48:56 +01:00
Michael S. Tsirkin
ec13ee8014 virtio_net: invoke softirqs after __napi_schedule
__napi_schedule might raise softirq but nothing
causes do_softirq to trigger, so it does not in fact
run. As a result,
the error message "NOHZ: local_softirq_pending 08"
sometimes occurs during boot of a KVM guest when the network service is
started and we are oom:

  ...
  Bringing up loopback interface:  [  OK  ]
  Bringing up interface eth0:
  Determining IP information for eth0...NOHZ: local_softirq_pending 08
   done.
  [  OK  ]
  ...

Further, receive queue processing might get delayed
indefinitely until some interrupt triggers:
virtio_net expected napi to be run immediately.

One way to cause do_softirq to be executed is by
invoking local_bh_enable(). As __napi_schedule is
normally called from bh or irq context, this
seems to make sense: disable bh before __napi_schedule
and enable afterwards.

In fact it's a very complicated way of calling do_softirq(),
and works since this function is only used when we are not
in interrupt context.  It's not hot at all, in any ideal scenario.

Reported-by: Ulrich Obergfell <uobergfe@redhat.com>
Tested-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
2012-05-17 12:16:38 +03:00
Amit Shah
b8ae0eb320 virtio: balloon: let host know of updated balloon size before module removal
When the balloon module is removed, we deflate the balloon, reclaiming
all the pages that were given to the host.  However, we don't update the
config values for the new balloon size, resulting in the host showing
outdated balloon values.

The size update is done after each leak and fill operation, only the
module removal case was left out.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-05-17 12:14:34 +03:00
Amit Shah
fa8b66ccd2 virtio: console: tell host of open ports after resume from s3/s4
If a port was open before going into one of the sleep states, the port
can continue normal operation after restore.  However, the host has to
be told that the guest side of the connection is open to restore
pre-suspend state.

This wasn't noticed so far due to a bug in qemu that was fixed recently
(which marked the guest-side connection as always open).

CC: stable@vger.kernel.org   # Only for 3.3

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-05-17 12:14:33 +03:00