Commit Graph

26706 Commits

Author SHA1 Message Date
David S. Miller
823dcd2506 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net 2011-08-20 10:39:12 -07:00
Alan Stern
5b1b0b812a PM / Runtime: Add macro to test for runtime PM events
This patch (as1482) adds a macro for testing whether or not a
pm_message value represents an autosuspend or autoresume (i.e., a
runtime PM) event.  Encapsulating this notion seems preferable to
open-coding the test all over the place.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-08-19 23:49:48 +02:00
Linus Torvalds
5ccc38740a Merge branch 'for-linus' of git://git.kernel.dk/linux-block
* 'for-linus' of git://git.kernel.dk/linux-block: (23 commits)
  Revert "cfq: Remove special treatment for metadata rqs."
  block: fix flush machinery for stacking drivers with differring flush flags
  block: improve rq_affinity placement
  blktrace: add FLUSH/FUA support
  Move some REQ flags to the common bio/request area
  allow blk_flush_policy to return REQ_FSEQ_DATA independent of *FLUSH
  xen/blkback: Make description more obvious.
  cfq-iosched: Add documentation about idling
  block: Make rq_affinity = 1 work as expected
  block: swim3: fix unterminated of_device_id table
  block/genhd.c: remove useless cast in diskstats_show()
  drivers/cdrom/cdrom.c: relax check on dvd manufacturer value
  drivers/block/drbd/drbd_nl.c: use bitmap_parse instead of __bitmap_parse
  bsg-lib: add module.h include
  cfq-iosched: Reduce linked group count upon group destruction
  blk-throttle: correctly determine sync bio
  loop: fix deadlock when sysfs and LOOP_CLR_FD race against each other
  loop: add BLK_DEV_LOOP_MIN_COUNT=%i to allow distros 0 pre-allocated loop devices
  loop: add management interface for on-demand device allocation
  loop: replace linked list of allocated devices with an idr index
  ...
2011-08-19 10:47:07 -07:00
Eric Dumazet
11fd165c68 sunrpc: use better NUMA affinities
Use NUMA aware allocations to reduce latencies and increase throughput.

sunrpc kthreads can use kthread_create_on_node() if pool_mode is
"percpu" or "pernode", and svc_prepare_thread()/svc_init_buffer() can
also take into account NUMA node affinity for memory allocations.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: "J. Bruce Fields" <bfields@fieldses.org>
CC: Neil Brown <neilb@suse.de>
CC: David Miller <davem@davemloft.net>
Reviewed-by: Greg Banks <gnb@fastmail.fm>
[bfields@redhat.com: fix up caller nfs41_callback_up]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-19 13:25:36 -04:00
J. Bruce Fields
778fc546f7 locks: fix tracking of inprogress lease breaks
We currently use a bit in fl_flags to record whether a lease is being
broken, and set fl_type to the type (RDLCK or UNLCK) that it will
eventually have.  This means that once the lease break starts, we forget
what the lease's type *used* to be.  Breaking a read lease will then
result in blocking read opens, even though there's no conflict--because
the lease type is now F_UNLCK and we can no longer tell whether it was
previously a read or write lease.

So, instead keep fl_type as the original type (the type which we
enforce), and keep track of whether we're unlocking or merely
downgrading by replacing the single FL_INPROGRESS flag by
FL_UNLOCK_PENDING and FL_DOWNGRADE_PENDING flags.

To get this right we also need to track separate downgrade and break
times, to handle the case where a write-leased file gets conflicting
opens first for read, then later for write.

(I first considered just eliminating the downgrade behavior
completely--nfsv4 doesn't need it, and nobody as far as I can tell
actually uses it currently--but Jeremy Allison tells me that Windows
oplocks do behave this way, so Samba will probably use this some day.)

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-19 13:25:34 -04:00
J. Bruce Fields
710b721696 locks: move F_INPROGRESS from fl_type to fl_flags field
F_INPROGRESS isn't exposed to userspace.  To me it makes more sense in
fl_flags....

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-19 13:25:34 -04:00
Linus Torvalds
0c3bef6128 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: OF: Don't crash when bridge parent is NULL.
  PCI: export pcie_bus_configure_settings symbol
  PCI: code and comments cleanup
  PCI: make cardbus-bridge resources optional
  PCI: make SRIOV resources optional
  PCI : ability to relocate assigned pci-resources
  PCI: honor child buses add_size in hot plug configuration
  PCI: Set PCI-E Max Payload Size on fabric
2011-08-19 10:02:37 -07:00
Christoph Lameter
49e2258586 slub: per cpu cache for partial pages
Allow filling out the rest of the kmem_cache_cpu cacheline with pointers to
partial pages. The partial page list is used in slab_free() to avoid
per node lock taking.

In __slab_alloc() we can then take multiple partial pages off the per
node partial list in one go reducing node lock pressure.

We can also use the per cpu partial list in slab_alloc() to avoid scanning
partial lists for pages with free objects.

The main effect of a per cpu partial list is that the per node list_lock
is taken for batches of partial pages instead of individual ones.

Potential future enhancements:

1. The pickup from the partial list could be perhaps be done without disabling
   interrupts with some work. The free path already puts the page into the
   per cpu partial list without disabling interrupts.

2. __slab_free() may have some code paths that could use optimization.

Performance:

				Before		After
./hackbench 100 process 200000
				Time: 1953.047	1564.614
./hackbench 100 process 20000
				Time: 207.176   156.940
./hackbench 100 process 20000
				Time: 204.468	156.940
./hackbench 100 process 20000
				Time: 204.879	158.772
./hackbench 10 process 20000
				Time: 20.153	15.853
./hackbench 10 process 20000
				Time: 20.153	15.986
./hackbench 10 process 20000
				Time: 19.363	16.111
./hackbench 1 process 20000
				Time: 2.518	2.307
./hackbench 1 process 20000
				Time: 2.258	2.339
./hackbench 1 process 20000
				Time: 2.864	2.163

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2011-08-19 19:34:27 +03:00
Wu Fengguang
bb0822954a squeeze max-pause area and drop pass-good area
Revert the pass-good area introduced in ffd1f609ab ("writeback:
introduce max-pause and pass-good dirty limits") and make the max-pause
area smaller and safe.

This fixes ~30% performance regression in the ext3 data=writeback
fio_mmap_randwrite_64k/fio_mmap_randrw_64k test cases, where there are
12 JBOD disks, on each disk runs 8 concurrent tasks doing reads+writes.

Using deadline scheduler also has a regression, but not that big as CFQ,
so this suggests we have some write starvation.

The test logs show that

- the disks are sometimes under utilized

- global dirty pages sometimes rush high to the pass-good area for
  several hundred seconds, while in the mean time some bdi dirty pages
  drop to very low value (bdi_dirty << bdi_thresh).  Then suddenly the
  global dirty pages dropped under global dirty threshold and bdi_dirty
  rush very high (for example, 2 times higher than bdi_thresh). During
  which time balance_dirty_pages() is not called at all.

So the problems are

1) The random writes progress so slow that they break the assumption of
   the max-pause logic that "8 pages per 200ms is typically more than
   enough to curb heavy dirtiers".

2) The max-pause logic ignored task_bdi_thresh and thus opens the possibility
   for some bdi's to over dirty pages, leading to (bdi_dirty >> bdi_thresh)
   and then (bdi_thresh >> bdi_dirty) for others.

3) The higher max-pause/pass-good thresholds somehow leads to the bad
   swing of dirty pages.

The fix is to allow the task to slightly dirty over task_bdi_thresh, but
no way to exceed bdi_dirty and/or global dirty_thresh.

Tests show that it fixed the JBOD regression completely (both behavior
and performance), while still being able to cut down large pause times
in balance_dirty_pages() for single-disk cases.

Reported-by: Li Shaohua <shaohua.li@intel.com>
Tested-by: Li Shaohua <shaohua.li@intel.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-08-19 22:42:07 +08:00
Changli Gao
4ca2462e94 net: add the comment for skb->l4_rxhash
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-19 07:26:44 -07:00
Jiri Pirko
b81693d914 net: remove ndo_set_multicast_list callback
Remove no longer used operation.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-17 20:22:03 -07:00
Jiri Pirko
01789349ee net: introduce IFF_UNICAST_FLT private flag
Use IFF_UNICAST_FTL to find out if driver handles unicast address
filtering. In case it does not, promisc mode is entered.

Patch also fixes following drivers:
stmmac, niu: support uc filtering and yet it propagated
	ndo_set_multicast_list
bna, benet, pxa168_eth, ks8851, ks8851_mll, ksz884x : has set
	ndo_set_rx_mode but do not support uc filtering

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-17 20:21:27 -07:00
Tom Herbert
bdeab99191 rps: Add flag to skb to indicate rxhash is based on L4 tuple
The l4_rxhash flag was added to the skb structure to indicate
that the rxhash value was computed over the 4 tuple for the
packet which includes the port information in the encapsulated
transport packet.  This is used by the stack to preserve the
rxhash value in __skb_rx_tunnel.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-17 20:06:03 -07:00
Linus Torvalds
b4fd4ae6c6 Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / Domains: Fix build for CONFIG_PM_RUNTIME unset
2011-08-17 13:15:25 -07:00
Ian Campbell
aa462abe8a mm: fix __page_to_pfn for a const struct page argument
This allows the cast in lowmem_page_address (introduced as a warning
fixup to 33dd4e0ec9 "mm: make some struct page's const") to be
removed.

Propagate const'ness to page_to_section() as well since it is required
by __page_to_pfn.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-17 13:00:20 -07:00
Ian Campbell
f991879473 mm: make HASHED_PAGE_VIRTUAL page_address' struct page argument const.
Followup to 33dd4e0ec9 "mm: make some struct page's const" which missed the
HASHED_PAGE_VIRTUAL case.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-17 13:00:20 -07:00
Linus Torvalds
6cac952960 Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rtc: Limit RTC PIE frequency
  rtc: Fix hrtimer deadlock
  rtc: Handle errors correctly in rtc_irq_set_state()

Fixup trivial conflicts in drivers/rtc/interface.c due to slightly
trivially versions of the same patch coming in two different ways.
2011-08-17 10:28:33 -07:00
Linus Torvalds
950d0a10d1 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  irq: Track the owner of irq descriptor
  irq: Always set IRQF_ONESHOT if no primary handler is specified
  genirq: Fix wrong bit operation
2011-08-17 10:23:50 -07:00
Lukas Czerner
fbc854027c ext3: remove deprecated oldalloc
For a long time now orlov is the default block allocator in the ext3. It
performs better than the old one and no one seems to claim otherwise so
we can safely drop it and make oldalloc and orlov mount option
deprecated.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-08-17 11:42:19 +02:00
Ben Hutchings
a27fc96b03 ethtool: Note common alternate exit condition for interrupt coalescing
Many implementations ignore the value of max_frames and do not
treat usecs == 0 as special.  Document this as deprecated.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-16 16:36:15 -07:00
Ben Hutchings
acf42a2a57 ethtool: Explicitly state the exit condition for interrupt coalescing
Also explicitly state how to disable interrupt coalescing.

Remove the now-redundant text from field descriptions.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-16 16:36:15 -07:00
Ben Hutchings
2d7c79390e ethtool: Correct description of 'max_coalesced_frames' fields
The current descriptions state that these fields specify 'How many
packets to delay ... after a packet ...' which implies that the
hardware should wait for (max_coalesced_frames + 1) completions before
generating an interrupt.  It is also stated that setting both this
field and the corresponding 'coalesce_usecs' field to 0 is invalid.
Together, this implies that the hardware must always be configured
to delay a completion IRQ for at least 1 usec or 1 more completion.

I believe that the addition of 1 is not intended, and David Miller
confirms that the original implementation (in tg3) does not do this.
Clarify the descriptions of these fields to avoid this interpretation.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-16 16:36:14 -07:00
Ben Hutchings
725b361b29 ethtool: Specify what kind of coalescing struct ethtool_coalesce covers
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-16 16:36:14 -07:00
Ben Hutchings
5f308d195e ethtool: Reformat struct ethtool_coalesce comments into kernel-doc format
This reorders and duplicates some wording, but should make no
substantive changes.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-16 16:36:14 -07:00
Don Zickus
abd4d5587b pstore: change mutex locking to spin_locks
pstore was using mutex locking to protect read/write access to the
backend plug-ins.  This causes problems when pstore is executed in
an NMI context through panic() -> kmsg_dump().

This patch changes the mutex to a spin_lock_irqsave then also checks to
see if we are in an NMI context.  If we are in an NMI and can't get the
lock, just print a message stating that and blow by the locking.

All this is probably a hack around the bigger locking problem but it
solves my current situation of trying to sleep in an NMI context.

Tested by loading the lkdtm module and executing a HARDLOCKUP which
will cause the machine to panic inside the nmi handler.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2011-08-16 11:55:58 -07:00
Mark Brown
1ddc07d0f1 ASoC: Add WM8958 noise gate support
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-08-17 00:48:47 +09:00
Vinod Koul
a16e470caa dmaengine: remove struct scatterlist for header
Commit 90b44f8 introduces dmaengine_prep_slave_single API which adds
scatterlist.h in dmaengine.h, so defining struct scatterlist is not required

Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
2011-08-16 16:41:41 +05:30
David S. Miller
19e2f6fe96 net: Fix sungem_phy sharing.
Since sungem_phy is used by multiple, unrelated, drivers make it
build as a real module under drivers/net.

depmod will pick up the symbol dependency and make sure sungem_phy.ko
gets loaded any time sungem.ko or spider_net.ko is loaded.

Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-16 00:16:49 -07:00
Mimi Zohar
1e39f384bb evm: fix build problems
- Make the previously missing security_old_inode_init_security() stub
  function definition static inline.

- The stub security_inode_init_security() function previously returned
  -EOPNOTSUPP and relied on the callers to change it to 0.  The stub
  security/security_old_inode_init_security() functions now return 0.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2011-08-16 09:23:44 +10:00
Jeff Moyer
4853abaae7 block: fix flush machinery for stacking drivers with differring flush flags
Commit ae1b153962, block: reimplement
FLUSH/FUA to support merge, introduced a performance regression when
running any sort of fsyncing workload using dm-multipath and certain
storage (in our case, an HP EVA).  The test I ran was fs_mark, and it
dropped from ~800 files/sec on ext4 to ~100 files/sec.  It turns out
that dm-multipath always advertised flush+fua support, and passed
commands on down the stack, where those flags used to get stripped off.
The above commit changed that behavior:

static inline struct request *__elv_next_request(struct request_queue *q)
{
        struct request *rq;

        while (1) {
-               while (!list_empty(&q->queue_head)) {
+               if (!list_empty(&q->queue_head)) {
                        rq = list_entry_rq(q->queue_head.next);
-                       if (!(rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) ||
-                           (rq->cmd_flags & REQ_FLUSH_SEQ))
-                               return rq;
-                       rq = blk_do_flush(q, rq);
-                       if (rq)
-                               return rq;
+                       return rq;
                }

Note that previously, a command would come in here, have
REQ_FLUSH|REQ_FUA set, and then get handed off to blk_do_flush:

struct request *blk_do_flush(struct request_queue *q, struct request *rq)
{
        unsigned int fflags = q->flush_flags; /* may change, cache it */
        bool has_flush = fflags & REQ_FLUSH, has_fua = fflags & REQ_FUA;
        bool do_preflush = has_flush && (rq->cmd_flags & REQ_FLUSH);
        bool do_postflush = has_flush && !has_fua && (rq->cmd_flags &
        REQ_FUA);
        unsigned skip = 0;
...
        if (blk_rq_sectors(rq) && !do_preflush && !do_postflush) {
                rq->cmd_flags &= ~REQ_FLUSH;
		if (!has_fua)
			rq->cmd_flags &= ~REQ_FUA;
	        return rq;
	}

So, the flush machinery was bypassed in such cases (q->flush_flags == 0
&& rq->cmd_flags & (REQ_FLUSH|REQ_FUA)).

Now, however, we don't get into the flush machinery at all.  Instead,
__elv_next_request just hands a request with flush and fua bits set to
the scsi_request_fn, even if the underlying request_queue does not
support flush or fua.

The agreed upon approach is to fix the flush machinery to allow
stacking.  While this isn't used in practice (since there is only one
request-based dm target, and that target will now reflect the flush
flags of the underlying device), it does future-proof the solution, and
make it function as designed.

In order to make this work, I had to add a field to the struct request,
inside the flush structure (to store the original req->end_io).  Shaohua
had suggested overloading the union with rb_node and completion_data,
but the completion data is used by device mapper and can also be used by
other drivers.  So, I didn't see a way around the additional field.

I tested this patch on an HP EVA with both ext4 and xfs, and it recovers
the lost performance.  Comments and other testers, as always, are
appreciated.

Cheers,
Jeff

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-15 21:37:25 +02:00
David S. Miller
2bb698412d net: Move sungem_phy.h under include/linux
Fixes build failures of the spider_net driver because it tries
to use a convoluted path to include this header.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-14 22:52:04 -07:00
Linus Torvalds
97c24d1d45 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
  mmc: remove unused "ddr" parameter in struct mmc_ios
  mmc: dw_mmc: Fix DDR mode support.
  mmc: core: use defined R1_STATE_PRG macro for card status
  mmc: sdhci: use f_max instead of host->clock for timeouts
  mmc: sdhci: move timeout_clk calculation farther down
  mmc: sdhci: check host->clock before using it as a denominator
  mmc: Revert "mmc: sdhci: Fix SDHCI_QUIRK_TIMEOUT_USES_SDCLK"
  mmc: tmio: eliminate unused variable 'mmc' warning
  mmc: esdhc-imx: fix card interrupt loss on freescale eSDHC
  mmc: sdhci-s3c: Fix build for header change
  mmc: dw_mmc: Fix mask in IDMAC_SET_BUFFER1_SIZE macro
  mmc: cb710: fix possible pci_dev leak in cb710_pci_configure()
  mmc: core: Detect eMMC v4.5 ext_csd entries
  mmc: mmc_test: avoid stalled file in debugfs
  mmc: sdhci-s3c: add BROKEN_ADMA_ZEROLEN_DESC quirk
  mmc: sdhci: pxav3: controller needs 32 bit ADMA addressing
  mmc: sdhci: fix retuning timer wrongly deleted in sdhci_tasklet_finish
2011-08-14 12:28:15 -07:00
Rafael J. Wysocki
17f2ae7f67 PM / Domains: Fix build for CONFIG_PM_RUNTIME unset
Function genpd_queue_power_off_work() is not defined for
CONFIG_PM_RUNTIME, so pm_genpd_poweroff_unused() causes a build
error to happen in that case.  Fix the problem by making
pm_genpd_poweroff_unused() depend on CONFIG_PM_RUNTIME too.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-08-14 13:34:31 +02:00
Paul Turner
ec12cb7f31 sched: Accumulate per-cfs_rq cpu usage and charge against bandwidth
Account bandwidth usage on the cfs_rq level versus the task_groups to which
they belong.  Whether we are tracking bandwidth on a given cfs_rq is maintained
under cfs_rq->runtime_enabled.

cfs_rq's which belong to a bandwidth constrained task_group have their runtime
accounted via the update_curr() path, which withdraws bandwidth from the global
pool as desired.  Updates involving the global pool are currently protected
under cfs_bandwidth->lock, local runtime is protected by rq->lock.

This patch only assigns and tracks quota, no action is taken in the case that
cfs_rq->runtime_used exceeds cfs_rq->runtime_assigned.

Signed-off-by: Paul Turner <pjt@google.com>
Signed-off-by: Nikhil Rao <ncrao@google.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110721184757.179386821@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-08-14 12:03:26 +02:00
Jaehoon Chung
7fd781e8f9 mmc: remove unused "ddr" parameter in struct mmc_ios
"mmc: dw_mmc: Fix DDR mode support" removed the last user.

Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
2011-08-13 14:50:32 -04:00
Frank Blaschka
4dc83dfd3e if_ether: add new Ethernet Protocol ID for af_iucv
Add a new ethertype for af_iucv over s/390 HiperSockets transport. Since
HiperSockets is not a real ethernet hw this is not an officially registered ID.

Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-13 01:10:16 -07:00
Linus Torvalds
73e0881d31 Merge branch 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6
* 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6:
  dt: add empty of_get_property for non-dt
2011-08-12 21:59:09 -07:00
Jouni Malinen
9946ecfb51 nl80211/cfg80211: Add extra IE configuration to AP mode setup
The NL80211_CMD_NEW_BEACON command is, in practice, requesting AP mode
operations to be started. Add new attributes to provide extra IEs
(e.g., WPS IE, P2P IE) for drivers that build Beacon, Probe Response,
and (Re)Association Response frames internally (likely in firmware).

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-08-12 13:45:04 -04:00
Jouni Malinen
5fb628e910 nl80211/cfg80211: Add crypto settings into NEW_BEACON
This removes need from drivers to parse the beacon tail/head data
to figure out what crypto settings are to be used in AP mode in case
the Beacon and Probe Response frames are fully constructed in the
driver/firmware.

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-08-12 13:45:04 -04:00
Jouni Malinen
32e9de846b nl80211/cfg80211: Allow SSID to be specified in new beacon command
This makes it easier for drivers that generate Beacon and Probe Response
frames internally (in firmware most likely) in AP mode.

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-08-12 13:45:03 -04:00
Linus Torvalds
ce8a84ef1e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
  e1000e: increase driver version number
  e1000e: alternate MAC address update
  e1000e: do not disable receiver on 82574/82583
  e1000e: alternate MAC address does not work on device id 0x1060
  PCnet: Fix section mismatch
  bnx2x: disable dcb on 578xx since not supported yet
  bnx2x: properly clean indirect addresses
  bnx2x: prevent race between undi_unload and load flows
  bnx2x: fix select_queue when FCoE is disabled
  bnx2x: init FCOE FP only once
  ipv4: some rt_iif -> rt_route_iif conversions
  net/bridge/netfilter/ebtables.c: use available error handling code
  net/netlabel/netlabel_kapi.c: add missing cleanup code
  net/irda: sh_sir: tidyup compile warning
  net/irda: sh_sir: add missing header
  net/irda: sh_irda: add missing header
  slcan: ldisc generated skbs are received in softirq context
  scm: Capture the full credentials of the scm sender
  tcp: initialize variable ecn_ok in syncookies path
  drivers/net/wireless/wl1251: add missing kfree
  ...
2011-08-12 06:43:53 -07:00
Mark Brown
c18eee3181 ASoC: Add bitfield definitions for WM8958 MICBIAS registers
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Liam Girdwood <lrg@ti.com>
2011-08-12 14:23:04 +09:00
Vasiliy Kulikov
72fa59970f move RLIMIT_NPROC check from set_user() to do_execve_common()
The patch http://lkml.org/lkml/2003/7/13/226 introduced an RLIMIT_NPROC
check in set_user() to check for NPROC exceeding via setuid() and
similar functions.

Before the check there was a possibility to greatly exceed the allowed
number of processes by an unprivileged user if the program relied on
rlimit only.  But the check created new security threat: many poorly
written programs simply don't check setuid() return code and believe it
cannot fail if executed with root privileges.  So, the check is removed
in this patch because of too often privilege escalations related to
buggy programs.

The NPROC can still be enforced in the common code flow of daemons
spawning user processes.  Most of daemons do fork()+setuid()+execve().
The check introduced in execve() (1) enforces the same limit as in
setuid() and (2) doesn't create similar security issues.

Neil Brown suggested to track what specific process has exceeded the
limit by setting PF_NPROC_EXCEEDED process flag.  With the change only
this process would fail on execve(), and other processes' execve()
behaviour is not changed.

Solar Designer suggested to re-check whether NPROC limit is still
exceeded at the moment of execve().  If the process was sleeping for
days between set*uid() and execve(), and the NPROC counter step down
under the limit, the defered execve() failure because NPROC limit was
exceeded days ago would be unexpected.  If the limit is not exceeded
anymore, we clear the flag on successful calls to execve() and fork().

The flag is also cleared on successful calls to set_user() as the limit
was exceeded for the previous user, not the current one.

Similar check was introduced in -ow patches (without the process flag).

v3 - clear PF_NPROC_EXCEEDED on successful calls to set_user().

Reviewed-by: James Morris <jmorris@namei.org>
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-11 11:24:42 -07:00
Namhyung Kim
c09c47caed blktrace: add FLUSH/FUA support
Add FLUSH/FUA support to blktrace. As FLUSH precedes WRITE and/or
FUA follows WRITE, use the same 'F' flag for both cases and
distinguish them by their (relative) position. The end results
look like (other flags might be shown also):

 - WRITE:            W
 - WRITE_FLUSH:      FW
 - WRITE_FUA:        WF
 - WRITE_FLUSH_FUA:  FWF

Note that we reuse TC_BARRIER due to lack of bit space of act_mask
so that the older versions of blktrace tools will report flush
requests as barriers from now on.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-11 10:36:05 +02:00
Matthew Wilcox
8e4bf84474 Move some REQ flags to the common bio/request area
REQ_SECURE, REQ_FLUSH and REQ_FUA may all be set on a bio as well as
on a request, so relocate them to the shared part of the enum.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-11 10:36:03 +02:00
Mimi Zohar
5a4730ba95 evm: fix evm_inode_init_security return code
evm_inode_init_security() should return 0, when EVM is not enabled.
(Returning an error is a remnant of evm_inode_post_init_security.)

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2011-08-11 17:42:41 +10:00
Mimi Zohar
e1c9b23adb evm: building without EVM enabled fixes
- Missing 'inline' on evm_inode_setattr() definition.
Introduced by commit 817b54aa45 ("evm: add evm_inode_setattr to prevent
updating an invalid security.evm").

- Missing security_old_inode_init_security() stub function definition.
Caused by commit 9d8f13ba3f ("security: new security_inode_init_security
API adds function callback").

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2011-08-11 17:42:07 +10:00
Mathieu Desnoyers
b75ef8b44b Tracepoint: Dissociate from module mutex
Copy the information needed from struct module into a local module list
held within tracepoint.c from within the module coming/going notifier.

This vastly simplifies locking of tracepoint registration /
unregistration, because we don't have to take the module mutex to
register and unregister tracepoints anymore. Steven Rostedt ran into
dependency problems related to modules mutex vs kprobes mutex vs ftrace
mutex vs tracepoint mutex that seems to be hard to fix without removing
this dependency between tracepoint and module mutex. (note: it should be
investigated whether kprobes could benefit of being dissociated from the
modules mutex too.)

This also fixes module handling of tracepoint list iterators, because it
was expecting the list to be sorted by pointer address. Given we have
control on our own list now, it's OK to sort this list which has
tracepoints as its only purpose. The reason why this sorting is required
is to handle the fact that seq files (and any read() operation from
user-space) cannot hold the tracepoint mutex across multiple calls, so
list entries may vanish between calls. With sorting, the tracepoint
iterator becomes usable even if the list don't contain the exact item
pointed to by the iterator anymore.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Jason Baron <jbaron@redhat.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Link: http://lkml.kernel.org/r/20110810191839.GC8525@Krystal
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-08-10 20:38:14 -04:00
John Stultz
9082c465a5 alarmtimers: Add try_to_cancel functionality
There's a number of edge cases when cancelling a alarm, so
to be sure we accurately do so, introduce try_to_cancel, which
returns proper failure errors if it cannot. Also modify cancel
to spin until the alarm is properly disabled.

CC: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-08-10 14:55:29 -07:00
John Stultz
a28cde81ab alarmtimers: Add more refined alarm state tracking
In order to allow for functionality like try_to_cancel, add
more refined  state tracking (similar to hrtimers).

CC: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-08-10 14:55:27 -07:00