ubports_kernel_google_msm

Author	SHA1	Message	Date
ted_lin	d892eeb1c6	mm: workaround for widevine playback failed This change fix widevine playback issue for ION failing to secure mm heap after flashing image into device or do factory reset. Bug: 9051476 Reference from: commit 15962c2598f71e0fed921ae441fede5df2bb9e1c Author: Laura Abbott <lauraa@codeaurora.org> Date: Fri Apr 5 12:39:18 2013 -0700 mm: Retry original migrate type if CMA failed Currently, __rmqueue_cma will disregard the original migrate type and only try MIGRATE_CMA for allocations. If the MIGRATE_CMA allocation fails, the fallback types of the original migrate type are used. Note that in this current path we never try to actually allocate from the original migrate type. If the only pages left in the system are the original migrate type, we will fail the allocation since we never actually try the original migrate type. This may lead to infinite looping since the system still (correctly) calculates there are pages available for allocation and will keep trying to allocate pages. Fix this degenerate case by allocating from the original migrate type if the MIGRATE_CMA allocation fails. Change-Id: I62ab293dc694955eaf88e790131a8565395ba8cb CRs-Fixed: 470615 Signed-off-by: Laura Abbott <lauraa@codeaurora.org> Signed-off-by: ted_lin <ted_lin@asus.com>	2013-05-22 07:57:36 +00:00
Heesub Shin	262de8b688	cma: redirect page allocation to CMA CMA pages are designed to be used as fallback for movable allocations and cannot be used for non-movable allocations. If CMA pages are utilized poorly, non-movable allocations may end up getting starved if all regular movable pages are allocated and the only pages left are CMA. Always using CMA pages first creates unacceptable performance problems. As a midway alternative, use CMA pages for certain userspace allocations. The userspace pages can be migrated or dropped quickly which giving decent utilization. Change-Id: I6165dda01b705309eebabc6dfa67146b7a95c174 CRs-Fixed: 452508 [lauraa@codeaurora.org: Missing CONFIG_CMA guards, add commit text] Signed-off-by: Laura Abbott <lauraa@codeaurora.org>	2013-03-15 17:08:42 -07:00
Laura Abbott	2390206ce0	Revert "mm: cma: on movable allocations try MIGRATE_CMA first" This reverts commit b5662d64fa5ee483b985b351dec993402422fee3. Using CMA pages first creates good utilization but has some unfortunate side effects. Many movable allocations come from the filesystem layer which can hold on to pages for long periods of time which causes high allocation times (~200ms) and high rates of failure. Revert this patch and use alternate allocation strategies to get better utilization. Change-Id: I917e137d5fb292c9f8282506f71a799a6451ccfa CRs-Fixed: 452508 Signed-off-by: Laura Abbott <lauraa@codeaurora.org>	2013-03-15 17:08:41 -07:00
Laura Abbott	ee57020c2b	mm: Don't put CMA pages on per cpu lists CMA allocations rely on being able to migrate pages out quickly to fulfill the allocations. Most use cases for movable allocations meet this requirement. File system allocations may take an unaccpetably long time to migrate, which creates delays from CMA. Prevent CMA pages from ending up on the per-cpu lists to avoid code paths grabbing CMA pages on the fast path. CMA pages can still be allocated as a fallback under tight memory pressure. CRs-Fixed: 452508 Change-Id: I79a28f697275a2a1870caabae53c8ea345b4b47d Signed-off-by: Laura Abbott <lauraa@codeaurora.org>	2013-03-15 17:08:40 -07:00
Bartlomiej Zolnierkiewicz	deba57a4a7	cma: fix watermark checking * Add ALLOC_CMA alloc flag and pass it to [__]zone_watermark_ok() (from Minchan Kim). * During watermark check decrease available free pages number by free CMA pages number if necessary (unmovable allocations cannot use pages from CMA areas). CRs-Fixed: 446321 Change-Id: Ibd069b028eb80b70701c1b81cb28a503d8265be0 Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [lauraa@codeaurora.org: context fixups] Signed-off-by: Laura Abbott <lauraa@codeaurora.org>	2013-03-15 17:06:38 -07:00
Rabin Vincent	204a535e11	mm: show migration types in show_mem This is useful to diagnose the reason for page allocation failure for cases where there appear to be several free pages. Example, with this alloc_pages(GFP_ATOMIC) failure: swapper/0: page allocation failure: order:0, mode:0x0 ... Mem-info: Normal per-cpu: CPU 0: hi: 90, btch: 15 usd: 48 CPU 1: hi: 90, btch: 15 usd: 21 active_anon:0 inactive_anon:0 isolated_anon:0 active_file:0 inactive_file:84 isolated_file:0 unevictable:0 dirty:0 writeback:0 unstable:0 free:4026 slab_reclaimable:75 slab_unreclaimable:484 mapped:0 shmem:0 pagetables:0 bounce:0 Normal free:16104kB min:2296kB low:2868kB high:3444kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:336kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:331776kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:300kB slab_unreclaimable:1936kB kernel_stack:328kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 Before the patch, it's hard (for me, at least) to say why all these free chunks weren't considered for allocation: Normal: 04kB 08kB 016kB 032kB 064kB 0128kB 1256kB 1512kB 11024kB 12048kB 34096kB = 16128kB After the patch, it's obvious that the reason is that all of these are in the MIGRATE_CMA (C) freelist: Normal: 04kB 08kB 016kB 032kB 064kB 0128kB 1256kB (C) 1512kB (C) 11024kB (C) 12048kB (C) 34096kB (C) = 16128kB Change-Id: Ic5fe77d762e0c03715bfb917774e7c4f03ac43f5 Signed-off-by: Rabin Vincent <rabin.vincent@stericsson.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Laura Abbott <lauraa@codeaurora.org>	2013-03-07 15:25:06 -08:00
Michal Nazarewicz	a1357d5f1b	mm: cma: on movable allocations try MIGRATE_CMA first It has been observed that system tends to keep a lot of CMA free pages even in very high memory pressure use cases. The CMA fallback for movable pages is used very rarely, only when system is completely pruned from MOVABLE pages. This means that the out-of-memory is triggered for unmovable allocations even when there are many CMA pages available. This problem was not observed previously since movable pages were used as a fallback for unmovable allocations. To avoid such situation this commit changes the allocation order so that on movable allocations the MIGRATE_CMA pageblocks are used first. This change means that the MIGRATE_CMA can be removed from fallback path of the MIGRATE_MOVABLE type. This means that the __rmqueue_fallback() function will never deal with CMA pages and thus all the checks around MIGRATE_CMA can be removed from that function. Change-Id: Ie13312d62a6af12d7aa78b4283ed25535a6d49fd CRs-Fixed: 435287 Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Laura Abbott <lauraa@codeaurora.org>	2013-03-07 15:25:05 -08:00
Liam Mark	bd6b0d88ed	android/lowmemorykiller: Selectively count free CMA pages In certain memory configurations there can be a large number of CMA pages which are not suitable to satisfy certain memory requests. This large number of unsuitable pages can cause the lowmemorykiller to not kill any tasks because the lowmemorykiller counts all free pages. In order to ensure the lowmemorykiller properly evaluates the free memory only count the free CMA pages if they are suitable for satisfying the memory request. Change-Id: I7f06d53e2d8cfe7439e5561fe6e5209ce73b1c90 CRs-fixed: 437016 Signed-off-by: Liam Mark <lmark@codeaurora.org>	2013-03-07 15:24:17 -08:00
Larry Bassel	bad999e743	mm: make counts of CMA free pages correct Both patches needed, second patch (among other things) fixes a bug in the first. commit 2139cbe627b8910ded55148f87ee10f7485408ed Author: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Date: Mon Oct 8 16:32:00 2012 -0700 cma: fix counting of isolated pages Isolated free pages shouldn't be accounted to NR_FREE_PAGES counter. Fix it by properly decreasing/increasing NR_FREE_PAGES counter in set_migratetype_isolate()/unset_migratetype_isolate() and removing counter adjustment for isolated pages from free_one_page() and split_free_page(). Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [lbassel@codeaurora.org: backport from 3.7, small changes needed] Signed-off-by: Larry Bassel <lbassel@codeaurora.org> commit d1ce749a0db12202b711d1aba1d29e823034648d Author: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Date: Mon Oct 8 16:32:02 2012 -0700 cma: count free CMA pages Add NR_FREE_CMA_PAGES counter to be later used for checking watermark in __zone_watermark_ok(). For simplicity and to avoid #ifdef hell make this counter always available (not only when CONFIG_CMA=y). [akpm@linux-foundation.org: use conventional migratetype naming] Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [lbassel@codeaurora.org: backport from 3.7, small changes needed] Signed-off-by: Larry Bassel <lbassel@codeaurora.org> Change-Id: I7d4f5fe0b6931192706337e0b730f43e7cccd031 Signed-off-by: Larry Bassel <lbassel@codeaurora.org> Signed-off-by: Neha Pandey <nehap@codeaurora.org>	2013-03-07 15:23:58 -08:00
Laura Abbott	3c2b534580	mm: Use aligned zone start for pfn_to_bitidx calculation The current calculation in pfn_to_bitidx assumes that (pfn - zone->zone_start_pfn) >> pageblock_order will return the same bit for all pfn in a pageblock. If zone_start_pfn is not aligned to pageblock_nr_pages, this may not always be correct. Consider the following with pageblock order = 10, zone start 2MB: pfn \| pfn - zone start \| (pfn - zone start) >> page block order ---------------------------------------------------------------- 0x26000 \| 0x25e00 \| 0x97 0x26100 \| 0x25f00 \| 0x97 0x26200 \| 0x26000 \| 0x98 0x26300 \| 0x26100 \| 0x98 This means that calling {get,set}_pageblock_migratetype on a single page will not set the migratetype for the full block. Fix this by rounding down zone_start_pfn when doing the bitidx calculation. Change-Id: I13e2f53f50db294f38ec86138c17c6fe29f0ee82 Signed-off-by: Laura Abbott <lauraa@codeaurora.org> Signed-off-by: Mitchel Humpherys <mitchelh@codeaurora.org>	2013-03-07 15:23:38 -08:00
Minchan Kim	360ffdd941	cma: fix migration mode __alloc_contig_migrate_range calls migrate_pages with wrong argument for migrate_mode. Fix it. Change-Id: I84697cf7c6aef6253e9ee7e5b3028c946b95e253 Cc: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Laura Abbott <lauraa@codeaurora.org> Signed-off-by: Mitchel Humpherys <mitchelh@codeaurora.org>	2013-03-07 15:23:37 -08:00
woojoong.lee	200d9f6ddf	cma : use migrate_prep() instead of migrate_prep_local() __alloc_contig_migrate_range() should use all possible ways to get all the pages migrated from the given memory range, so pruning per-cpu lru lists for all CPUs is required, regadless the cost of such operation. Otherwise some pages which got stuck at per-cpu lru list might get missed by migration procedure causing the contiguous allocation to fail. Change-Id: I70cc0864c57dd49e89f57797122a3fd0f300647a Signed-off-by: woojoong.lee <woojoong.lee@samsung.com> Reviewed-on: http://165.213.202.130:8080/43063 Tested-by: System S/W SCM <scm.systemsw@samsung.com> Reviewed-by: daeho jeong <daeho.jeong@samsung.com> Reviewed-by: Jeong-Ho Kim <jammer@samsung.com> Tested-by: Jeong-Ho Kim <jammer@samsung.com> [lauraa@codeaurora.org: Applied to correct file] Signed-off-by: Laura Abbott <lauraa@codeaurora.org> Signed-off-by: Mitchel Humpherys <mitchelh@codeaurora.org>	2013-03-07 15:23:36 -08:00
Laura Abbott	9fa56006b4	mm: Add is_cma_pageblock definition Bring back the is_cma_pageblock definition for determining if a page is CMA or not. Change-Id: I39fd546e22e240b752244832c79514f109c8e84b Signed-off-by: Laura Abbott <lauraa@codeaurora.org> Signed-off-by: Mitchel Humpherys <mitchelh@codeaurora.org>	2013-03-07 15:23:36 -08:00
Liam Mark	39832ef865	mm: split_free_page ignore memory watermarks for CMA Memory watermarks were sometimes preventing CMA allocations in low memory. Change-Id: I550ec987cbd6bc6dadd72b4a764df20cd0758479 Signed-off-by: Liam Mark <lmark@codeaurora.org> Signed-off-by: Mitchel Humpherys <mitchelh@codeaurora.org>	2013-03-07 15:23:36 -08:00
Rabin Vincent	a71dc49243	mm: cma: don't replace lowmem pages with highmem The filesystem layer expects pages in the block device's mapping to not be in highmem (the mapping's gfp mask is set in bdget()), but CMA can currently replace lowmem pages with highmem pages, leading to crashes in filesystem code such as the one below: Unable to handle kernel NULL pointer dereference at virtual address 00000400 pgd = c0c98000 [00000400] pgd=00c91831, pte=00000000, *ppte=00000000 Internal error: Oops: 817 [#1] PREEMPT SMP ARM CPU: 0 Not tainted (3.5.0-rc5+ #80) PC is at __memzero+0x24/0x80 ... Process fsstress (pid: 323, stack limit = 0xc0cbc2f0) Backtrace: [<c010e3f0>] (ext4_getblk+0x0/0x180) from [<c010e58c>] (ext4_bread+0x1c/0x98) [<c010e570>] (ext4_bread+0x0/0x98) from [<c0117944>] (ext4_mkdir+0x160/0x3bc) r4:c15337f0 [<c01177e4>] (ext4_mkdir+0x0/0x3bc) from [<c00c29e0>] (vfs_mkdir+0x8c/0x98) [<c00c2954>] (vfs_mkdir+0x0/0x98) from [<c00c2a60>] (sys_mkdirat+0x74/0xac) r6:00000000 r5:c152eb40 r4:000001ff r3:c14b43f0 [<c00c29ec>] (sys_mkdirat+0x0/0xac) from [<c00c2ab8>] (sys_mkdir+0x20/0x24) r6:beccdcf0 r5:00074000 r4:beccdbbc [<c00c2a98>] (sys_mkdir+0x0/0x24) from [<c000e3c0>] (ret_fast_syscall+0x0/0x30) Fix this by replacing only highmem pages with highmem. Change-Id: I6af2d509af48b5a586037be14bd3593b3f269d95 Reported-by: Laura Abbott <lauraa@codeaurora.org> Signed-off-by: Rabin Vincent <rabin@rab.in> Acked-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Laura Abbott <lauraa@codeaurora.org>	2013-02-27 18:18:02 -08:00
Marek Szyprowski	3d46ca5672	mm: trigger page reclaim in alloc_contig_range() to stabilise watermarks alloc_contig_range() performs memory allocation so it also should keep track on keeping the correct level of memory watermarks. This commit adds a call to *_slowpath style reclaim to grab enough pages to make sure that the final collection of contiguous pages from freelists will not starve the system. Change-Id: I2d68d9ac2cfcd32ca6f515fc7e44e8d9d850dff1 Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> CC: Michal Nazarewicz <mina86@mina86.com> Tested-by: Rob Clark <rob.clark@linaro.org> Tested-by: Ohad Ben-Cohen <ohad@wizery.com> Tested-by: Benjamin Gaignard <benjamin.gaignard@linaro.org> Tested-by: Robert Nelson <robertcnelson@gmail.com> Tested-by: Barry Song <Baohua.Song@csr.com> Signed-off-by: Laura Abbott <lauraa@codeaurora.org>	2013-02-27 18:14:15 -08:00
Marek Szyprowski	b68303ab8d	mm: extract reclaim code from __alloc_pages_direct_reclaim() This patch extracts common reclaim code from __alloc_pages_direct_reclaim() function to separate function: __perform_reclaim() which can be later used by alloc_contig_range(). Change-Id: Ia9d8b82018d91dc669488955b20f69f1cba43147 Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Cc: Michal Nazarewicz <mina86@mina86.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Tested-by: Rob Clark <rob.clark@linaro.org> Tested-by: Ohad Ben-Cohen <ohad@wizery.com> Tested-by: Benjamin Gaignard <benjamin.gaignard@linaro.org> Tested-by: Robert Nelson <robertcnelson@gmail.com> Tested-by: Barry Song <Baohua.Song@csr.com> Signed-off-by: Laura Abbott <lauraa@codeaurora.org>	2013-02-27 18:14:03 -08:00
Mel Gorman	2c063ac1c8	mm: Serialize access to min_free_kbytes There is a race between the min_free_kbytes sysctl, memory hotplug and transparent hugepage support enablement. Memory hotplug uses a zonelists_mutex to avoid a race when building zonelists. Reuse it to serialise watermark updates. Change-Id: I31786592a8cc03e579ee01d99d7eba76e926263f [a.p.zijlstra@chello.nl: Older patch fixed the race with spinlock] Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: Barry Song <Baohua.Song@csr.com> Signed-off-by: Laura Abbott <lauraa@codeaurora.org>	2013-02-27 18:14:03 -08:00
Michal Nazarewicz	944e4004e2	mm: page_isolation: MIGRATE_CMA isolation functions added This commit changes various functions that change pages and pageblocks migrate type between MIGRATE_ISOLATE and MIGRATE_MOVABLE in such a way as to allow to work with MIGRATE_CMA migrate type. Change-Id: Ib3a0b04cae49396b206a39bfced470e218ab1f90 Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: Rob Clark <rob.clark@linaro.org> Tested-by: Ohad Ben-Cohen <ohad@wizery.com> Tested-by: Benjamin Gaignard <benjamin.gaignard@linaro.org> Tested-by: Robert Nelson <robertcnelson@gmail.com> Tested-by: Barry Song <Baohua.Song@csr.com> Signed-off-by: Laura Abbott <lauraa@codeaurora.org>	2013-02-27 18:14:02 -08:00
Michal Nazarewicz	269a4c9264	mm: mmzone: MIGRATE_CMA migration type added The MIGRATE_CMA migration type has two main characteristics: (i) only movable pages can be allocated from MIGRATE_CMA pageblocks and (ii) page allocator will never change migration type of MIGRATE_CMA pageblocks. This guarantees (to some degree) that page in a MIGRATE_CMA page block can always be migrated somewhere else (unless there's no memory left in the system). It is designed to be used for allocating big chunks (eg. 10MiB) of physically contiguous memory. Once driver requests contiguous memory, pages from MIGRATE_CMA pageblocks may be migrated away to create a contiguous block. To minimise number of migrations, MIGRATE_CMA migration type is the last type tried when page allocator falls back to other migration types when requested. Change-Id: I2bb0954de8be4f212b03dea0e5a508048684bda2 Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: Rob Clark <rob.clark@linaro.org> Tested-by: Ohad Ben-Cohen <ohad@wizery.com> Tested-by: Benjamin Gaignard <benjamin.gaignard@linaro.org> Tested-by: Robert Nelson <robertcnelson@gmail.com> Tested-by: Barry Song <Baohua.Song@csr.com> Signed-off-by: Laura Abbott <lauraa@codeaurora.org>	2013-02-27 18:14:01 -08:00
Michal Nazarewicz	629b2bf987	mm: page_alloc: change fallbacks array handling This commit adds a row for MIGRATE_ISOLATE type to the fallbacks array which was missing from it. It also, changes the array traversal logic a little making MIGRATE_RESERVE an end marker. The letter change, removes the implicit MIGRATE_UNMOVABLE from the end of each row which was read by __rmqueue_fallback() function. Change-Id: Icdbbebb9eece2468c0b963964be9a4c579cbc775 Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: Rob Clark <rob.clark@linaro.org> Tested-by: Ohad Ben-Cohen <ohad@wizery.com> Tested-by: Benjamin Gaignard <benjamin.gaignard@linaro.org> Tested-by: Robert Nelson <robertcnelson@gmail.com> Tested-by: Barry Song <Baohua.Song@csr.com> Signed-off-by: Laura Abbott <lauraa@codeaurora.org>	2013-02-27 18:13:59 -08:00
Michal Nazarewicz	8b32931307	mm: page_alloc: introduce alloc_contig_range() This commit adds the alloc_contig_range() function which tries to allocate given range of pages. It tries to migrate all already allocated pages that fall in the range thus freeing them. Once all pages in the range are freed they are removed from the buddy system thus allocated for the caller to use. Change-Id: I659b133b1c9991568bfb6bd09c7792e15f2a2bfb Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: Rob Clark <rob.clark@linaro.org> Tested-by: Ohad Ben-Cohen <ohad@wizery.com> Tested-by: Benjamin Gaignard <benjamin.gaignard@linaro.org> Tested-by: Robert Nelson <robertcnelson@gmail.com> Tested-by: Barry Song <Baohua.Song@csr.com> Signed-off-by: Laura Abbott <lauraa@codeaurora.org>	2013-02-27 18:13:59 -08:00
Michal Nazarewicz	aa8a31b992	mm: page_alloc: remove trailing whitespace Change-Id: I1f112fa3be958d1f9d24ebd076ef4ddcf91fe868 Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Laura Abbott <lauraa@codeaurora.org>	2013-02-27 18:13:56 -08:00
Shashank Mittal	48ac59797a	mm: Fix a compiler warning. Fix compiler warning for a variable not initialized. Change-Id: Ieedeb1cfb5a22eb5f671e6bfd1361315347a49af Signed-off-by: Shashank Mittal <mittals@codeaurora.org>	2013-02-27 18:11:30 -08:00
Stephen Boyd	84d1c1a3a3	Merge branch 'goog/googly' (early part) into goog/msm-soc-3.4 Fix NR_IPI to be 7 instead of 6 because both googly and core add an IPI. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Conflicts: arch/arm/Kconfig arch/arm/common/Makefile arch/arm/include/asm/hardware/cache-l2x0.h arch/arm/mm/cache-l2x0.c arch/arm/mm/mmu.c include/linux/wakelock.h kernel/power/Kconfig kernel/power/Makefile kernel/power/main.c kernel/power/power.h	2013-02-25 11:25:46 -08:00
Larry Bassel	514cd9bbbc	mm: use required fixed size of movable zone if FIX_MOVABLE_ZONE If FIX_MOVABLE_ZONE is enabled, we want a specific size and location of ZONE_MOVABLE. Change-Id: I0b858c7310cd328e1118abc9d5fe6f364bb4ffad Signed-off-by: Larry Bassel <lbassel@codeaurora.org> (cherry picked from commit e6436864f939e77226c8e971610539649ed5f869)	2013-02-20 02:43:58 -08:00
Jack Cheung	f6e34be773	mm: Add total_unmovable_pages global variable Vmalloc will exit if the amount it needs to allocate is greater than totalram_pages. Vmalloc cannot allocate from the movable zone, so pages in the movable zone should not be counted. This change adds a new global variable: total_unmovable_pages. It is calculated in init.c, based on totalram_pages minus the pages in the movable zone. Vmalloc now looks at this new global instead of totalram_pages. total_unmovable_pages can be modified during memory_hotplug. If the zone you are offlining/onlining is unmovable, then you modify it similar to totalram_pages. If the zone is movable, then no change is needed. Change-Id: Ie55c41051e9ad4b921eb04ecbb4798a8bd2344d6 Signed-off-by: Jack Cheung <jackc@codeaurora.org> (cherry picked from commit 59f9f1c9ae463a3d4499cd9353619f8b1993371b) Conflicts: arch/arm/mm/init.c mm/memory_hotplug.c mm/page_alloc.c mm/vmalloc.c	2013-02-20 02:43:58 -08:00
Jack Cheung	18e44d3eaf	mm: Cast lowmem_reserve to long z->lowmem_reserve[classzone_idx] is an unsigned long but free_pages and min are longs. If free_pages is negative, the function will incorrectly return true because it will treat the negative long as a large, positive unsigned long. This change casts z->lowmem_reserve to a long and fixes a typo in the comment. Change-Id: Icada1fa5ca650fbcdb0656f637adbb98f223eec5 Signed-off-by: Jack Cheung <jackc@codeaurora.org> (cherry picked from commit 9f41da81017657a194a4e145bab337f13a4d7fd9)	2013-02-20 02:43:57 -08:00
Colin Cross	ec0b571c19	Merge commit 'v3.4-rc7' into android-3.4	2012-05-14 16:41:02 -07:00
Hugh Dickins	1b76b02f15	mm: raise MemFree by reverting percpu_pagelist_fraction to 0 Why is there less MemFree than there used to be? It perturbed a test, so I've just been bisecting linux-next, and now find the offender went upstream yesterday. Commit `93278814d3` "mm: fix division by 0 in percpu_pagelist_fraction()" mistakenly initialized percpu_pagelist_fraction to the sysctl's minimum 8, which leaves 1/8th of memory on percpu lists (on each cpu??); but most of us expect it to be left unset at 0 (and it's not then used as a divisor). MemTotal: 8061476kB 8061476kB 8061476kB 8061476kB 8061476kB 8061476kB Repetitive test with percpu_pagelist_fraction 8: MemFree: 6948420kB 6237172kB 6949696kB 6840692kB 6949048kB 6862984kB Same test with percpu_pagelist_fraction back to 0: MemFree: 7945000kB 7944908kB 7948568kB 7949060kB 7948796kB 7948812kB Signed-off-by: Hugh Dickins <hughd@google.com> [ We really should fix the crazy sysctl interface too, but that's a separate thing - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-11 09:23:39 -07:00
Sasha Levin	93278814d3	mm: fix division by 0 in percpu_pagelist_fraction() percpu_pagelist_fraction_sysctl_handler() has only considered -EINVAL as a possible error from proc_dointvec_minmax(). If any other error is returned, it would proceed to divide by zero since percpu_pagelist_fraction wasn't getting initialized at any point. For example, writing 0 bytes into the proc file would trigger the issue. Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Reviewed-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-10 15:06:44 -07:00
Arve Hjønnevåg	cde8949795	mm: Add min_free_order_shift tunable. By default the kernel tries to keep half as much memory free at each order as it does for one order below. This can be too agressive when running without swap. Change-Id: I5efc1a0b50f41ff3ac71e92d2efd175dedd54ead Signed-off-by: Arve Hjønnevåg <arve@android.com>	2012-04-09 13:53:12 -07:00
Gilad Ben-Yossef	74046494ea	mm: only IPI CPUs to drain local pages if they exist Calculate a cpumask of CPUs with per-cpu pages in any zone and only send an IPI requesting CPUs to drain these pages to the buddy allocator if they actually have pages when asked to flush. This patch saves 85%+ of IPIs asking to drain per-cpu pages in case of severe memory pressure that leads to OOM since in these cases multiple, possibly concurrent, allocation requests end up in the direct reclaim code path so when the per-cpu pages end up reclaimed on first allocation failure for most of the proceeding allocation attempts until the memory pressure is off (possibly via the OOM killer) there are no per-cpu pages on most CPUs (and there can easily be hundreds of them). This also has the side effect of shortening the average latency of direct reclaim by 1 or more order of magnitude since waiting for all the CPUs to ACK the IPI takes a long time. Tested by running "hackbench 400" on a 8 CPU x86 VM and observing the difference between the number of direct reclaim attempts that end up in drain_all_pages() and those were more then 1/2 of the online CPU had any per-cpu page in them, using the vmstat counters introduced in the next patch in the series and using proc/interrupts. In the test sceanrio, this was seen to save around 3600 global IPIs after trigerring an OOM on a concurrent workload: $ cat /proc/vmstat \| tail -n 2 pcp_global_drain 0 pcp_global_ipi_saved 0 $ cat /proc/interrupts \| grep CAL CAL: 1 2 1 2 2 2 2 2 Function call interrupts $ hackbench 400 [OOM messages snipped] $ cat /proc/vmstat \| tail -n 2 pcp_global_drain 3647 pcp_global_ipi_saved 3642 $ cat /proc/interrupts \| grep CAL CAL: 6 13 6 3 3 3 1 2 7 Function call interrupts Please note that if the global drain is removed from the direct reclaim path as a patch from Mel Gorman currently suggests this should be replaced with an on_each_cpu_cond invocation. Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Pekka Enberg <penberg@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Acked-by: Michal Nazarewicz <mina86@mina86.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-03-28 17:14:35 -07:00
David Rientjes	29fd66d289	mm, coredump: fail allocations when coredumping instead of oom killing The size of coredump files is limited by RLIMIT_CORE, however, allocating large amounts of memory results in three negative consequences: - the coredumping process may be chosen for oom kill and quickly deplete all memory reserves in oom conditions preventing further progress from being made or tasks from exiting, - the coredumping process may cause other processes to be oom killed without fault of their own as the result of a SIGSEGV, for example, in the coredumping process, or - the coredumping process may result in a livelock while writing to the dump file if it needs memory to allocate while other threads are in the exit path waiting on the coredumper to complete. This is fixed by implying __GFP_NORETRY in the page allocator for coredumping processes when reclaim has failed so the allocations fail and the process continues to exit. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-03-28 17:14:35 -07:00
Kautuk Consul	b224ef856b	page_alloc: remove unused find_zone_movable_pfns_for_nodes() argument find_zone_movable_pfns_for_nodes() does not use its argument. Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-03-21 17:55:00 -07:00
Kautuk Consul	8d13bddd11	page_alloc.c: remove add_from_early_node_map() add_from_early_node_map() is unused. Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-03-21 17:55:00 -07:00
Mel Gorman	cc9a6c8776	cpuset: mm: reduce large amounts of memory barrier related damage v3 Commit `c0ff7453bb` ("cpuset,mm: fix no node to alloc memory when changing cpuset's mems") wins a super prize for the largest number of memory barriers entered into fast paths for one commit. [get\|put]_mems_allowed is incredibly heavy with pairs of full memory barriers inserted into a number of hot paths. This was detected while investigating at large page allocator slowdown introduced some time after 2.6.32. The largest portion of this overhead was shown by oprofile to be at an mfence introduced by this commit into the page allocator hot path. For extra style points, the commit introduced the use of yield() in an implementation of what looks like a spinning mutex. This patch replaces the full memory barriers on both read and write sides with a sequence counter with just read barriers on the fast path side. This is much cheaper on some architectures, including x86. The main bulk of the patch is the retry logic if the nodemask changes in a manner that can cause a false failure. While updating the nodemask, a check is made to see if a false failure is a risk. If it is, the sequence number gets bumped and parallel allocators will briefly stall while the nodemask update takes place. In a page fault test microbenchmark, oprofile samples from __alloc_pages_nodemask went from 4.53% of all samples to 1.15%. The actual results were 3.3.0-rc3 3.3.0-rc3 rc3-vanilla nobarrier-v2r1 Clients 1 UserTime 0.07 ( 0.00%) 0.08 (-14.19%) Clients 2 UserTime 0.07 ( 0.00%) 0.07 ( 2.72%) Clients 4 UserTime 0.08 ( 0.00%) 0.07 ( 3.29%) Clients 1 SysTime 0.70 ( 0.00%) 0.65 ( 6.65%) Clients 2 SysTime 0.85 ( 0.00%) 0.82 ( 3.65%) Clients 4 SysTime 1.41 ( 0.00%) 1.41 ( 0.32%) Clients 1 WallTime 0.77 ( 0.00%) 0.74 ( 4.19%) Clients 2 WallTime 0.47 ( 0.00%) 0.45 ( 3.73%) Clients 4 WallTime 0.38 ( 0.00%) 0.37 ( 1.58%) Clients 1 Flt/sec/cpu 497620.28 ( 0.00%) 520294.53 ( 4.56%) Clients 2 Flt/sec/cpu 414639.05 ( 0.00%) 429882.01 ( 3.68%) Clients 4 Flt/sec/cpu 257959.16 ( 0.00%) 258761.48 ( 0.31%) Clients 1 Flt/sec 495161.39 ( 0.00%) 517292.87 ( 4.47%) Clients 2 Flt/sec 820325.95 ( 0.00%) 850289.77 ( 3.65%) Clients 4 Flt/sec 1020068.93 ( 0.00%) 1022674.06 ( 0.26%) MMTests Statistics: duration Sys Time Running Test (seconds) 135.68 132.17 User+Sys Time Running Test (seconds) 164.2 160.13 Total Elapsed Time (seconds) 123.46 120.87 The overall improvement is small but the System CPU time is much improved and roughly in correlation to what oprofile reported (these performance figures are without profiling so skew is expected). The actual number of page faults is noticeably improved. For benchmarks like kernel builds, the overall benefit is marginal but the system CPU time is slightly reduced. To test the actual bug the commit fixed I opened two terminals. The first ran within a cpuset and continually ran a small program that faulted 100M of anonymous data. In a second window, the nodemask of the cpuset was continually randomised in a loop. Without the commit, the program would fail every so often (usually within 10 seconds) and obviously with the commit everything worked fine. With this patch applied, it also worked fine so the fix should be functionally equivalent. Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Miao Xie <miaox@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-03-21 17:54:59 -07:00
Konstantin Khlebnikov	f0cb3c76ae	mm: drain percpu lru add/rotate page-vectors on cpu hot-unplug This cpu hotplug hook was accidentally removed in commit `00a62ce91e` ("mm: fix Committed_AS underflow on large NR_CPUS environment") The visible effect of this accident: some pages are borrowed in per-cpu page-vectors. Truncate can deal with it, but these pages cannot be reused while this cpu is offline. So this is like a temporary memory leak. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Eric B Munson <ebmunson@us.ibm.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-03-21 17:54:58 -07:00
David Rientjes	08ab9b10d4	mm, oom: force oom kill on sysrq+f The oom killer chooses not to kill a thread if: - an eligible thread has already been oom killed and has yet to exit, and - an eligible thread is exiting but has yet to free all its memory and is not the thread attempting to currently allocate memory. SysRq+F manually invokes the global oom killer to kill a memory-hogging task. This is normally done as a last resort to free memory when no progress is being made or to test the oom killer itself. For both uses, we always want to kill a thread and never defer. This patch causes SysRq+F to always kill an eligible thread and can be used to force a kill even if another oom killed thread has failed to exit. Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Pekka Enberg <penberg@kernel.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-03-21 17:54:58 -07:00
Rik van Riel	aff622495c	vmscan: only defer compaction for failed order and higher Currently a failed order-9 (transparent hugepage) compaction can lead to memory compaction being temporarily disabled for a memory zone. Even if we only need compaction for an order 2 allocation, eg. for jumbo frames networking. The fix is relatively straightforward: keep track of the highest order at which compaction is succeeding, and only defer compaction for orders at which compaction is failing. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hillf Danton <dhillf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-03-21 17:54:56 -07:00
Dimitri Sivanich	074b85175a	vfs: fix panic in __d_lookup() with high dentry hashtable counts When the number of dentry cache hash table entries gets too high (2147483648 entries), as happens by default on a 16TB system, use of a signed integer in the dcache_init() initialization loop prevents the dentry_hashtable from getting initialized, causing a panic in __d_lookup(). Fix this in dcache_init() and similar areas. Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-02-13 20:45:38 -05:00
Michal Hocko	656a070629	mm: __count_immobile_pages(): make sure the node is online page_zone() requires an online node otherwise we are accessing NULL NODE_DATA. This is not an issue at the moment because node_zones are located at the structure beginning but this might change in the future so better be careful about that. Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-23 08:38:47 -08:00
Michal Hocko	687875fb7d	mm: fix NULL ptr dereference in __count_immobile_pages Fix the following NULL ptr dereference caused by cat /sys/devices/system/memory/memory0/removable Pid: 13979, comm: sed Not tainted 3.0.13-0.5-default #1 IBM BladeCenter LS21 -[7971PAM]-/Server Blade RIP: __count_immobile_pages+0x4/0x100 Process sed (pid: 13979, threadinfo ffff880221c36000, task ffff88022e788480) Call Trace: is_pageblock_removable_nolock+0x34/0x40 is_mem_section_removable+0x74/0xf0 show_mem_removable+0x41/0x70 sysfs_read_file+0xfe/0x1c0 vfs_read+0xc7/0x130 sys_read+0x53/0xa0 system_call_fastpath+0x16/0x1b We are crashing because we are trying to dereference NULL zone which came from pfn=0 (struct page ffffea0000000000). According to the boot log this page is marked reserved: e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved) and early_node_map confirms that: early_node_map[3] active PFN ranges 1: 0x00000010 -> 0x0000009c 1: 0x00000100 -> 0x000bffa3 1: 0x00100000 -> 0x00240000 The problem is that memory_present works in PAGE_SECTION_MASK aligned blocks so the reserved range sneaks into the the section as well. This also means that free_area_init_node will not take care of those reserved pages and they stay uninitialized. When we try to read the removable status we walk through all available sections and hope that the zone is valid for all pages in the section. But this is not true in this case as the zone and nid are not initialized. We have only one node in this particular case and it is marked as node=1 (rather than 0) and that made the problem visible because page_to_nid will return 0 and there are no zones on the node. Let's check that the zone is valid and that the given pfn falls into its boundaries and mark the section not removable. This might cause some false positives, probably, but we do not have any sane way to find out whether the page is reserved by the platform or it is just not used for whatever other reasons. Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: Mel Gorman <mgorman@suse.de> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-23 08:38:47 -08:00
Hugh Dickins	4111304dab	mm: enum lru_list lru Mostly we use "enum lru_list lru": change those few "l"s to "lru"s. Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:10 -08:00
Mel Gorman	66199712e9	mm: page allocator: do not call direct reclaim for THP allocations while compaction is deferred If compaction is deferred, direct reclaim is used to try to free enough pages for the allocation to succeed. For small high-orders, this has a reasonable chance of success. However, if the caller has specified __GFP_NO_KSWAPD to limit the disruption to the system, it makes more sense to fail the allocation rather than stall the caller in direct reclaim. This patch skips direct reclaim if compaction is deferred and the caller specifies __GFP_NO_KSWAPD. Async compaction only considers a subset of pages so it is possible for compaction to be deferred prematurely and not enter direct reclaim even in cases where it should. To compensate for this, this patch also defers compaction only if sync compaction failed. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: Rik van Riel<riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Andy Isaacson <adi@hexapodia.org> Cc: Nai Xia <nai.xia@gmail.com> Cc: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:09 -08:00
Bob Liu	d0048b0e59	page_alloc: break early in check_for_regular_memory() If there is a zone below ZONE_NORMAL has present_pages, we can set node state to N_NORMAL_MEMORY, no need to loop to end. Signed-off-by: Bob Liu <lliubbo@gmail.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:07 -08:00
Johannes Weiner	6290df5458	mm: collect LRU list heads into struct lruvec Having a unified structure with a LRU list set for both global zones and per-memcg zones allows to keep that code simple which deals with LRU lists and does not care about the container itself. Once the per-memcg LRU lists directly link struct pages, the isolation function and all other list manipulations are shared between the memcg case and the global LRU case. Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Ying Han <yinghan@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:05 -08:00
Johannes Weiner	c3993076f8	mm: page_alloc: generalize order handling in __free_pages_bootmem() __free_pages_bootmem() used to special-case higher-order frees to save individual page checking with free_pages_bulk(). Nowadays, both zero order and non-zero order frees use free_pages(), which checks each individual page anyway, and so there is little point in making the distinction anymore. The higher-order loop will work just fine for zero order pages. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:44 -08:00
Michal Hocko	df0a6daa01	mm: fix off-by-two in __zone_watermark_ok() Commit `88f5acf88a` ("mm: page allocator: adjust the per-cpu counter threshold when memory is low") changed the form how free_pages is calculated but it forgot that we used to do free_pages - ((1 << order) - 1) so we ended up with off-by-two when calculating free_pages. Reported-by: Wang Sheng-Hui <shhuiw@gmail.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:44 -08:00
Johannes Weiner	a756cf5908	mm: try to distribute dirty pages fairly across zones The maximum number of dirty pages that exist in the system at any time is determined by a number of pages considered dirtyable and a user-configured percentage of those, or an absolute number in bytes. This number of dirtyable pages is the sum of memory provided by all the zones in the system minus their lowmem reserves and high watermarks, so that the system can retain a healthy number of free pages without having to reclaim dirty pages. But there is a flaw in that we have a zoned page allocator which does not care about the global state but rather the state of individual memory zones. And right now there is nothing that prevents one zone from filling up with dirty pages while other zones are spared, which frequently leads to situations where kswapd, in order to restore the watermark of free pages, does indeed have to write pages from that zone's LRU list. This can interfere so badly with IO from the flusher threads that major filesystems (btrfs, xfs, ext4) mostly ignore write requests from reclaim already, taking away the VM's only possibility to keep such a zone balanced, aside from hoping the flushers will soon clean pages from that zone. Enter per-zone dirty limits. They are to a zone's dirtyable memory what the global limit is to the global amount of dirtyable memory, and try to make sure that no single zone receives more than its fair share of the globally allowed dirty pages in the first place. As the number of pages considered dirtyable excludes the zones' lowmem reserves and high watermarks, the maximum number of dirty pages in a zone is such that the zone can always be balanced without requiring page cleaning. As this is a placement decision in the page allocator and pages are dirtied only after the allocation, this patch allows allocators to pass __GFP_WRITE when they know in advance that the page will be written to and become dirty soon. The page allocator will then attempt to allocate from the first zone of the zonelist - which on NUMA is determined by the task's NUMA memory policy - that has not exceeded its dirty limit. At first glance, it would appear that the diversion to lower zones can increase pressure on them, but this is not the case. With a full high zone, allocations will be diverted to lower zones eventually, so it is more of a shift in timing of the lower zone allocations. Workloads that previously could fit their dirty pages completely in the higher zone may be forced to allocate from lower zones, but the amount of pages that "spill over" are limited themselves by the lower zones' dirty constraints, and thus unlikely to become a problem. For now, the problem of unfair dirty page distribution remains for NUMA configurations where the zones allowed for allocation are in sum not big enough to trigger the global dirty limits, wake up the flusher threads and remedy the situation. Because of this, an allocation that could not succeed on any of the considered zones is allowed to ignore the dirty limits before going into direct reclaim or even failing the allocation, until a future patch changes the global dirty throttling and flusher thread activation so that they take individual zone states into account. Test results 15M DMA + 3246M DMA32 + 504 Normal = 3765M memory 40% dirty ratio 16G USB thumb drive 10 runs of dd if=/dev/zero of=disk/zeroes bs=32k count=$((10 << 15)) seconds nr_vmscan_write (stddev) min\| median\| max xfs vanilla: 549.747( 3.492) 0.000\| 0.000\| 0.000 patched: 550.996( 3.802) 0.000\| 0.000\| 0.000 fuse-ntfs vanilla: 1183.094(53.178) 54349.000\| 59341.000\| 65163.000 patched: 558.049(17.914) 0.000\| 0.000\| 43.000 btrfs vanilla: 573.679(14.015) 156657.000\| 460178.000\| 606926.000 patched: 563.365(11.368) 0.000\| 0.000\| 1362.000 ext4 vanilla: 561.197(15.782) 0.000\|2725438.000\|4143837.000 patched: 568.806(17.496) 0.000\| 0.000\| 0.000 Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Michal Hocko <mhocko@suse.cz> Tested-by: Wu Fengguang <fengguang.wu@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Jan Kara <jack@suse.cz> Cc: Shaohua Li <shaohua.li@intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:43 -08:00

1 2 3 4 5 ...

684 Commits