User memory needs to be zeroed before it is given to the user. To do
this, the kernel maps the page, memsets it to zero and then unmaps it.
Because the page has been mapped into the kernel, we must flush the
dcache to ensure cache coherency between the kernel and user mappings.

Originally, the page_alloc loop used __GFP_ZERO (which does a map,
memset and unmap for each individual page) and then additionally
called flush_dcache_page() for each page, which hurt performance badly.

It is far more efficient, especially for large allocations (> 1MB), to
allocate the pages without __GFP_ZERO and then vmap the entire
allocation, memset it to zero, flush the cache and unmap it. This is
slightly slower for very small allocations, but only by a few
microseconds, which is well within the margin of acceptability.
Overall, the new scheme is faster than the default for all sizes
greater than 16K, and is almost 4X faster for 2MB and 4MB allocations,
which are common for textures and very large buffer objects.

The downside is that if there isn't enough vmalloc room for the
allocation, we are forced to fall back to a slow page-by-page
memset/flush, but this should happen rarely (if at all) and is only
included for completeness.

CRs-Fixed: 372638
Change-Id: Ic0dedbadf3e27dcddf0f068594a40c00d64b495e
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>