[PATCH 0/5] Further compaction tuning
Vlastimil Babka
2014-10-07 15:33:34 UTC
Based on next-20141007. Patch 5 needs "mm: introduce single zone pcplists
drain" from https://lkml.org/lkml/2014/10/2/375

OK, time to reset the "days without a compaction series" counter back to 0.
This series is mostly what was postponed in the previous one (but sadly not
all of it), along with some smaller changes.

Patch 1 tries to solve the mismatch between the watermark checks done by
compaction and by allocations, by adding the missing pieces to
compact_control. This mainly allows Patch 2 to simplify the handling of
deferred compaction. The change in Patch 2 was suggested by Joonsoo when
reviewing the previous series, but was not possible without Patch 1.
Patch 3 is a rather cosmetic change to deferred compaction.

Patch 4 removes probably the last occurrence of the compaction scanners
rescanning some pages when restarted in the middle of the zone.

Patch 5 is a posthumous child of the patch "mm, compaction: try to capture
the just-created high-order freepage", which was dropped from the previous
series. Thanks to Joonsoo's objections we could find out that the
improvements from the capture patch were mainly due to better lru_add cache
and pcplists draining. The remaining delta in success rates between this
patch and page capture was due to the different (questionable) watermark
checking in the capture mechanism. So this patch brings most of the
improvements without the questionable parts and the complexity that capture
had.

Vlastimil Babka (5):
mm, compaction: pass classzone_idx and alloc_flags to watermark
checking
mm, compaction: simplify deferred compaction
mm, compaction: defer only on COMPACT_COMPLETE
mm, compaction: always update cached scanner positions
mm, compaction: more focused lru and pcplists draining

include/linux/compaction.h | 10 +++---
mm/compaction.c | 89 +++++++++++++++++++++++++++++-----------------
mm/internal.h | 7 ++--
mm/page_alloc.c | 15 +-------
mm/vmscan.c | 12 +++----
5 files changed, 71 insertions(+), 62 deletions(-)
--
1.8.4.5

Vlastimil Babka
2014-10-07 15:33:37 UTC
Deferred compaction is employed to avoid compacting a zone where sync direct
compaction has recently failed. As such, it makes sense to defer only when
a full zone was scanned, which is when compact_zone returns with
COMPACT_COMPLETE. It is less useful to defer when compact_zone returns with
apparent success (COMPACT_PARTIAL) followed by a watermark check failure,
which can happen due to parallel allocation activity. It also does not make
much sense to defer compaction that was skipped entirely (COMPACT_SKIPPED)
for being unsuitable in the first place.

This patch therefore makes deferred compaction trigger only when
COMPACT_COMPLETE is returned from compact_zone(). Results of the
stress-highalloc benchmark show the difference is within measurement error,
so the issue is rather cosmetic.
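
For reference, the deferral itself is an exponential per-zone backoff; this
patch only changes when defer_compaction() gets called. A simplified sketch
of the existing bookkeeping (not a literal copy of mm/compaction.c; helper
and field names are recalled from memory and may differ in detail):

/* Called when sync direct compaction of a zone has failed */
void defer_compaction(struct zone *zone, int order)
{
        zone->compact_considered = 0;
        zone->compact_defer_shift++;

        /* Remember the lowest order that failed */
        if (order < zone->compact_order_failed)
                zone->compact_order_failed = order;

        /* Cap the backoff at skipping 1 << 6 == 64 attempts */
        if (zone->compact_defer_shift > COMPACT_MAX_DEFER_SHIFT)
                zone->compact_defer_shift = COMPACT_MAX_DEFER_SHIFT;
}

/* Returns true if compaction of this zone should be skipped for now */
bool compaction_deferred(struct zone *zone, int order)
{
        unsigned long defer_limit = 1UL << zone->compact_defer_shift;

        if (order < zone->compact_order_failed)
                return false;

        /* Avoid possible overflow */
        if (++zone->compact_considered > defer_limit)
                zone->compact_considered = defer_limit;

        return zone->compact_considered < defer_limit;
}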

Signed-off-by: Vlastimil Babka <***@suse.cz>
Cc: Minchan Kim <***@kernel.org>
Cc: Mel Gorman <***@suse.de>
Cc: Joonsoo Kim <***@lge.com>
Cc: Michal Nazarewicz <***@mina86.com>
Cc: Naoya Horiguchi <n-***@ah.jp.nec.com>
Cc: Christoph Lameter <***@linux.com>
Cc: Rik van Riel <***@redhat.com>
Cc: David Rientjes <***@google.com>
---
mm/compaction.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 4b3e0bd..9107588 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1340,7 +1340,7 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
goto break_loop;
}

- if (mode != MIGRATE_ASYNC) {
+ if (mode != MIGRATE_ASYNC && status == COMPACT_COMPLETE) {
/*
* We think that allocation won't succeed in this zone
* so we defer compaction there. If it ends up
--
1.8.4.5

Rik van Riel
2014-10-20 15:18:38 UTC
Post by Vlastimil Babka
Deferred compaction is employed to avoid compacting a zone where sync direct
compaction has recently failed. As such, it makes sense to defer only when
a full zone was scanned, which is when compact_zone returns with
COMPACT_COMPLETE. It is less useful to defer when compact_zone returns with
apparent success (COMPACT_PARTIAL) followed by a watermark check failure,
which can happen due to parallel allocation activity. It also does not make
much sense to defer compaction that was skipped entirely (COMPACT_SKIPPED)
for being unsuitable in the first place.
This patch therefore makes deferred compaction trigger only when
COMPACT_COMPLETE is returned from compact_zone(). Results of the
stress-highalloc benchmark show the difference is within measurement error,
so the issue is rather cosmetic.
Acked-by: Rik van Riel <***@redhat.com>

Vlastimil Babka
2014-10-07 15:33:36 UTC
Since commit ("mm, compaction: defer each zone individually instead of
preferred zone"), compaction is deferred for each zone where sync direct
compaction fails, and reset where it succeeds. However, it was observed
that for the DMA zone, compaction often appeared to succeed while the
subsequent allocation attempt would not, due to a different outcome of the
watermark check. In order to properly defer compaction in this zone, the
candidate zone has to be passed back to __alloc_pages_direct_compact() and
compaction deferred in that zone after the allocation attempt fails.

The main source of the mismatch between the watermark checks in compaction
and allocation was the lack of alloc_flags and classzone_idx values in
compaction, which has been fixed by the previous patch. With that problem
fixed, we can simplify the code by removing the candidate_zone parameter
and the deferring done in __alloc_pages_direct_compact().

After this patch, the compaction activity during the stress-highalloc
benchmark is still somewhat increased, but it is negligible compared to the
increase that occurred without the better watermark checking. This suggests
that it is still possible to apparently succeed in compaction but fail to
allocate, possibly due to parallel allocation activity.
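
To illustrate, the per-zone flow in try_to_compact_pages() after this patch
boils down to roughly the following (a heavily simplified sketch;
contention handling and the defer-reset bookkeeping are left out):

for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx, nodemask) {
        int status;

        /* Skip zones still in their deferral backoff period */
        if (compaction_deferred(zone, order))
                continue;

        status = compact_zone_order(zone, order, gfp_mask, mode,
                        &zone_contended, alloc_flags, classzone_idx);
        rc = max(status, rc);

        /* The same check the subsequent allocation attempt will repeat */
        if (zone_watermark_ok(zone, order, low_wmark_pages(zone),
                        classzone_idx, alloc_flags))
                /* Expect the allocation to succeed, so don't defer */
                break;

        /*
         * Compaction ran but the watermark check still fails, so the
         * allocation is unlikely to succeed: defer this zone right here,
         * with no candidate_zone round trip through the caller.
         */
        if (mode != MIGRATE_ASYNC)
                defer_compaction(zone, order);
}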

Suggested-by: Joonsoo Kim <***@lge.com>
Signed-off-by: Vlastimil Babka <***@suse.cz>
Cc: Minchan Kim <***@kernel.org>
Cc: Mel Gorman <***@suse.de>
Cc: Joonsoo Kim <***@lge.com>
Cc: Michal Nazarewicz <***@mina86.com>
Cc: Naoya Horiguchi <n-***@ah.jp.nec.com>
Cc: Christoph Lameter <***@linux.com>
Cc: Rik van Riel <***@redhat.com>
Cc: David Rientjes <***@google.com>
---
include/linux/compaction.h | 6 ++----
mm/compaction.c | 5 +----
mm/page_alloc.c | 12 +-----------
3 files changed, 4 insertions(+), 19 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index d896765..58c293d 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -33,8 +33,7 @@ extern int fragmentation_index(struct zone *zone, unsigned int order);
extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
int order, gfp_t gfp_mask, nodemask_t *mask,
enum migrate_mode mode, int *contended,
- int alloc_flags, int classzone_idx,
- struct zone **candidate_zone);
+ int alloc_flags, int classzone_idx);
extern void compact_pgdat(pg_data_t *pgdat, int order);
extern void reset_isolation_suitable(pg_data_t *pgdat);
extern unsigned long compaction_suitable(struct zone *zone, int order,
@@ -105,8 +104,7 @@ static inline bool compaction_restarting(struct zone *zone, int order)
static inline unsigned long try_to_compact_pages(struct zonelist *zonelist,
int order, gfp_t gfp_mask, nodemask_t *nodemask,
enum migrate_mode mode, int *contended,
- int alloc_flags, int classzone_idx,
- struct zone **candidate_zone)
+ int alloc_flags, int classzone_idx);
{
return COMPACT_CONTINUE;
}
diff --git a/mm/compaction.c b/mm/compaction.c
index dba8891..4b3e0bd 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1276,15 +1276,13 @@ int sysctl_extfrag_threshold = 500;
* @mode: The migration mode for async, sync light, or sync migration
* @contended: Return value that determines if compaction was aborted due to
* need_resched() or lock contention
- * @candidate_zone: Return the zone where we think allocation should succeed
*
* This is the main entry point for direct page compaction.
*/
unsigned long try_to_compact_pages(struct zonelist *zonelist,
int order, gfp_t gfp_mask, nodemask_t *nodemask,
enum migrate_mode mode, int *contended,
- int alloc_flags, int classzone_idx,
- struct zone **candidate_zone)
+ int alloc_flags, int classzone_idx)
{
enum zone_type high_zoneidx = gfp_zone(gfp_mask);
int may_enter_fs = gfp_mask & __GFP_FS;
@@ -1321,7 +1319,6 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
/* If a normal allocation would succeed, stop compacting */
if (zone_watermark_ok(zone, order, low_wmark_pages(zone),
classzone_idx, alloc_flags)) {
- *candidate_zone = zone;
/*
* We think the allocation will succeed in this zone,
* but it is not certain, hence the false. The caller
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8d143a0..5a4506f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2328,7 +2328,6 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
int classzone_idx, int migratetype, enum migrate_mode mode,
int *contended_compaction, bool *deferred_compaction)
{
- struct zone *last_compact_zone = NULL;
unsigned long compact_result;
struct page *page;

@@ -2339,8 +2338,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
compact_result = try_to_compact_pages(zonelist, order, gfp_mask,
nodemask, mode,
contended_compaction,
- alloc_flags, classzone_idx,
- &last_compact_zone);
+ alloc_flags, classzone_idx);
current->flags &= ~PF_MEMALLOC;

switch (compact_result) {
@@ -2378,14 +2376,6 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
}

/*
- * last_compact_zone is where try_to_compact_pages thought allocation
- * should succeed, so it did not defer compaction. But here we know
- * that it didn't succeed, so we do the defer.
- */
- if (last_compact_zone && mode != MIGRATE_ASYNC)
- defer_compaction(last_compact_zone, order);
-
- /*
* It's bad if compaction run occurs and fails. The most likely reason
* is that pages exist, but not enough to satisfy watermarks.
*/
--
1.8.4.5

Andrew Morton
2014-10-15 22:32:12 UTC
Post by Vlastimil Babka
Since commit ("mm, compaction: defer each zone individually instead of
preferred zone"), compaction is deferred for each zone where sync direct
compaction fails, and reset where it succeeds. However, it was observed
that for the DMA zone, compaction often appeared to succeed while the
subsequent allocation attempt would not, due to a different outcome of the
watermark check. In order to properly defer compaction in this zone, the
candidate zone has to be passed back to __alloc_pages_direct_compact() and
compaction deferred in that zone after the allocation attempt fails.
The main source of the mismatch between the watermark checks in compaction
and allocation was the lack of alloc_flags and classzone_idx values in
compaction, which has been fixed by the previous patch. With that problem
fixed, we can simplify the code by removing the candidate_zone parameter
and the deferring done in __alloc_pages_direct_compact().
After this patch, the compaction activity during the stress-highalloc
benchmark is still somewhat increased, but it is negligible compared to the
increase that occurred without the better watermark checking. This suggests
that it is still possible to apparently succeed in compaction but fail to
allocate, possibly due to parallel allocation activity.
...
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -33,8 +33,7 @@ extern int fragmentation_index(struct zone *zone, unsigned int order);
extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
int order, gfp_t gfp_mask, nodemask_t *mask,
enum migrate_mode mode, int *contended,
- int alloc_flags, int classzone_idx,
- struct zone **candidate_zone);
+ int alloc_flags, int classzone_idx);
extern void compact_pgdat(pg_data_t *pgdat, int order);
extern void reset_isolation_suitable(pg_data_t *pgdat);
extern unsigned long compaction_suitable(struct zone *zone, int order,
@@ -105,8 +104,7 @@ static inline bool compaction_restarting(struct zone *zone, int order)
static inline unsigned long try_to_compact_pages(struct zonelist *zonelist,
int order, gfp_t gfp_mask, nodemask_t *nodemask,
enum migrate_mode mode, int *contended,
- int alloc_flags, int classzone_idx,
- struct zone **candidate_zone)
+ int alloc_flags, int classzone_idx);
{
return COMPACT_CONTINUE;
}
--- a/include/linux/compaction.h~mm-compaction-simplify-deferred-compaction-fix
+++ a/include/linux/compaction.h
@@ -104,7 +104,7 @@ static inline bool compaction_restarting
static inline unsigned long try_to_compact_pages(struct zonelist *zonelist,
int order, gfp_t gfp_mask, nodemask_t *nodemask,
enum migrate_mode mode, int *contended,
- int alloc_flags, int classzone_idx);
+ int alloc_flags, int classzone_idx)
{
return COMPACT_CONTINUE;
}

It clearly wasn't tested with this config. Please do so and let us
know the result?

Vlastimil Babka
2014-10-16 15:11:22 UTC
Post by Andrew Morton
Post by Vlastimil Babka
@@ -105,8 +104,7 @@ static inline bool compaction_restarting(struct zone *zone, int order)
static inline unsigned long try_to_compact_pages(struct zonelist *zonelist,
int order, gfp_t gfp_mask, nodemask_t *nodemask,
enum migrate_mode mode, int *contended,
- int alloc_flags, int classzone_idx,
- struct zone **candidate_zone)
+ int alloc_flags, int classzone_idx);
{
return COMPACT_CONTINUE;
}
--- a/include/linux/compaction.h~mm-compaction-simplify-deferred-compaction-fix
+++ a/include/linux/compaction.h
@@ -104,7 +104,7 @@ static inline bool compaction_restarting
static inline unsigned long try_to_compact_pages(struct zonelist *zonelist,
int order, gfp_t gfp_mask, nodemask_t *nodemask,
enum migrate_mode mode, int *contended,
- int alloc_flags, int classzone_idx);
+ int alloc_flags, int classzone_idx)
{
return COMPACT_CONTINUE;
}
It clearly wasn't tested with this config. Please do so and let us
know the result?
Sorry, I forgot. Hopefully this will get better next time, since I learned
about the undertaker/vampyr tool [1] today.

Your patch does fix the compilation, thanks. Boot + stress-highalloc tests
are now running through the series, but I don't expect any surprises - the
series is basically a no-op with CONFIG_COMPACTION disabled.

[1] http://lwn.net/Articles/616098/

Vlastimil Babka
2014-10-07 15:33:38 UTC
Compaction caches the migration and free scanner positions between
compaction invocations, so that the whole zone eventually gets scanned and
there is no bias towards the initial scanner positions at the beginning/end
of the zone.

The cached positions are continuously updated as the scanners progress, and
the updating stops as soon as a page is successfully isolated. The
reasoning behind this is that a pageblock where isolation succeeded is
likely to succeed again in the near future, so it should be worth
revisiting.

However, the downside is that potentially many pages are rescanned without
successful isolation. At worst, there might be a page where isolation from
the LRU succeeds but migration fails (potentially always). Upon
encountering such a page, the cached position would stop being updated for
no good reason. It might have been useful to let such a page be rescanned
with sync compaction after async compaction failed, but this is now handled
by caching scanner positions for async and sync mode separately since
commit 35979ef33931 ("mm, compaction: add per-zone migration pfn cache for
async compaction").

After this patch, the cached positions are updated unconditionally. In the
stress-highalloc benchmark, this has decreased the number of scanned pages
by a few percent, without affecting allocation success rates.
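
For completeness, the cached positions are consumed at the start of
compact_zone() roughly as follows (a simplified sketch from memory; the
real code may differ in detail):

unsigned long start_pfn = zone->zone_start_pfn;
unsigned long end_pfn = zone_end_pfn(zone);
const bool sync = cc->mode != MIGRATE_ASYNC;

/* Resume the migration scanner from the per-mode cached position... */
cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
/* ...and the free scanner, which scans backwards, from its own cache */
cc->free_pfn = zone->compact_cached_free_pfn;

/* Reset cached positions that no longer fall inside the zone */
if (cc->free_pfn < start_pfn || cc->free_pfn > end_pfn) {
        cc->free_pfn = end_pfn & ~(pageblock_nr_pages - 1);
        zone->compact_cached_free_pfn = cc->free_pfn;
}
if (cc->migrate_pfn < start_pfn || cc->migrate_pfn > end_pfn) {
        cc->migrate_pfn = start_pfn;
        zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
        zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
}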

Signed-off-by: Vlastimil Babka <***@suse.cz>
Cc: Minchan Kim <***@kernel.org>
Cc: Mel Gorman <***@suse.de>
Cc: Joonsoo Kim <***@lge.com>
Cc: Michal Nazarewicz <***@mina86.com>
Cc: Naoya Horiguchi <n-***@ah.jp.nec.com>
Cc: Christoph Lameter <***@linux.com>
Cc: Rik van Riel <***@redhat.com>
Cc: David Rientjes <***@google.com>
---
mm/compaction.c | 14 --------------
mm/internal.h | 5 -----
2 files changed, 19 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 9107588..8fa888d 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -195,16 +195,12 @@ static void update_pageblock_skip(struct compact_control *cc,

/* Update where async and sync compaction should restart */
if (migrate_scanner) {
- if (cc->finished_update_migrate)
- return;
if (pfn > zone->compact_cached_migrate_pfn[0])
zone->compact_cached_migrate_pfn[0] = pfn;
if (cc->mode != MIGRATE_ASYNC &&
pfn > zone->compact_cached_migrate_pfn[1])
zone->compact_cached_migrate_pfn[1] = pfn;
} else {
- if (cc->finished_update_free)
- return;
if (pfn < zone->compact_cached_free_pfn)
zone->compact_cached_free_pfn = pfn;
}
@@ -705,7 +701,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
del_page_from_lru_list(page, lruvec, page_lru(page));

isolate_success:
- cc->finished_update_migrate = true;
list_add(&page->lru, migratelist);
cc->nr_migratepages++;
nr_isolated++;
@@ -876,15 +871,6 @@ static void isolate_freepages(struct compact_control *cc)
block_start_pfn - pageblock_nr_pages;

/*
- * Set a flag that we successfully isolated in this pageblock.
- * In the next loop iteration, zone->compact_cached_free_pfn
- * will not be updated and thus it will effectively contain the
- * highest pageblock we isolated pages from.
- */
- if (isolated)
- cc->finished_update_free = true;
-
- /*
* isolate_freepages_block() might have aborted due to async
* compaction being contended
*/
diff --git a/mm/internal.h b/mm/internal.h
index 3cc9b0a..4928beb 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -136,11 +136,6 @@ struct compact_control {
unsigned long migrate_pfn; /* isolate_migratepages search base */
enum migrate_mode mode; /* Async or sync migration mode */
bool ignore_skip_hint; /* Scan blocks even if marked skip */
- bool finished_update_free; /* True when the zone cached pfns are
- * no longer being updated
- */
- bool finished_update_migrate;
-
int order; /* order a direct compactor needs */
const gfp_t gfp_mask; /* gfp mask of a direct compactor */
const int alloc_flags; /* alloc flags of a direct compactor */
--
1.8.4.5

Rik van Riel
2014-10-20 15:26:56 UTC
Post by Vlastimil Babka
Compaction caches the migration and free scanner positions between
compaction invocations, so that the whole zone eventually gets scanned and
there is no bias towards the initial scanner positions at the beginning/end
of the zone.
The cached positions are continuously updated as the scanners progress, and
the updating stops as soon as a page is successfully isolated. The
reasoning behind this is that a pageblock where isolation succeeded is
likely to succeed again in the near future, so it should be worth
revisiting.
However, the downside is that potentially many pages are rescanned without
successful isolation. At worst, there might be a page where isolation from
the LRU succeeds but migration fails (potentially always). Upon
encountering such a page, the cached position would stop being updated for
no good reason. It might have been useful to let such a page be rescanned
with sync compaction after async compaction failed, but this is now handled
by caching scanner positions for async and sync mode separately since
commit 35979ef33931 ("mm, compaction: add per-zone migration pfn cache for
async compaction").
After this patch, the cached positions are updated unconditionally. In the
stress-highalloc benchmark, this has decreased the number of scanned pages
by a few percent, without affecting allocation success rates.
Acked-by: Rik van Riel <***@redhat.com>

Vlastimil Babka
2014-10-07 15:33:35 UTC
Compaction relies on zone watermark checks for decisions such as whether
it's worth starting compaction in compaction_suitable(), or whether
compaction should stop in compact_finished(). The watermark checks take
classzone_idx and alloc_flags parameters, which are related to the memory
allocation request. But from the context of compaction they are currently
passed as 0, including in direct compaction, which is invoked to satisfy an
allocation request and could therefore know the proper values.

The lack of proper values can lead to a mismatch between decisions taken
during compaction and decisions related to the allocation request. A
missing classzone_idx value means that lowmem_reserve is not taken into
account. This has manifested (during recent changes to deferred compaction)
when the DMA zone was used as a fallback for the preferred Normal zone:
compaction_suitable() without the proper classzone_idx would think that the
watermarks are already satisfied, but the watermark check in
get_page_from_freelist() would fail. Because of this problem, deferring
compaction has extra complexity that can be removed in the following patch.

The issue (not confirmed in practice) with missing alloc_flags is opposite
in nature. For allocations that include ALLOC_HIGH, ALLOC_HARDER or
ALLOC_CMA in alloc_flags (the last covers all MOVABLE allocations on
CMA-enabled systems), the watermark checking in compaction with 0 passed is
stricter than in get_page_from_freelist(). In these cases compaction might
run for longer than is really needed.
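
To illustrate both effects, the core of the allocator's watermark check
looks roughly like this (a simplified sketch of __zone_watermark_ok(); the
per-order free list walk and some adjustments are omitted):

long min = mark;
long free_pages = zone_page_state(z, NR_FREE_PAGES) - ((1 << order) - 1);

/* ALLOC_HIGH and ALLOC_HARDER relax the watermark... */
if (alloc_flags & ALLOC_HIGH)
        min -= min / 2;
if (alloc_flags & ALLOC_HARDER)
        min -= min / 4;

#ifdef CONFIG_CMA
/* ...and without ALLOC_CMA, free CMA pages must not be counted */
if (!(alloc_flags & ALLOC_CMA))
        free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);
#endif

/*
 * classzone_idx selects the lowmem_reserve protecting this zone from
 * allocations that could have used a higher zone, e.g. DMA protected
 * against allocations preferring Normal. Compaction passing 0 applies
 * the reserve as if the lowest zone were the preferred one, which for
 * the zone itself is typically no reserve at all.
 */
if (free_pages <= min + z->lowmem_reserve[classzone_idx])
        return false;

return true;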

This patch fixes these problems by adding alloc_flags and classzone_idx to
struct compact_control and to the related functions involved in direct
compaction and watermark checking. All other callers of
compaction_suitable() are adjusted to pass proper values where those are
known; currently this is limited to classzone_idx, which is sometimes known
in the kswapd context. However, the direct reclaim callers
should_continue_reclaim() and compaction_ready() do not currently know the
proper values, so the coordination between reclaim and compaction may still
not be as accurate as it could be. This can be fixed later, if it's shown
to be an issue.

The effect of this patch should be slightly better high-order allocation
success rates and/or less compaction overhead, depending on the type of
allocations and presence of CMA. It allows simplifying deferred compaction
code in a followup patch.

When testing with stress-highalloc, there was some slight improvement (which
might be just due to variance) in success rates of non-THP-like allocations.

Signed-off-by: Vlastimil Babka <***@suse.cz>
Cc: Minchan Kim <***@kernel.org>
Cc: Mel Gorman <***@suse.de>
Cc: Joonsoo Kim <***@lge.com>
Cc: Michal Nazarewicz <***@mina86.com>
Cc: Naoya Horiguchi <n-***@ah.jp.nec.com>
Cc: Christoph Lameter <***@linux.com>
Cc: Rik van Riel <***@redhat.com>
Cc: David Rientjes <***@google.com>
---
include/linux/compaction.h | 8 ++++++--
mm/compaction.c | 29 +++++++++++++++--------------
mm/internal.h | 2 ++
mm/page_alloc.c | 1 +
mm/vmscan.c | 12 ++++++------
5 files changed, 30 insertions(+), 22 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 60bdf8d..d896765 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -33,10 +33,12 @@ extern int fragmentation_index(struct zone *zone, unsigned int order);
extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
int order, gfp_t gfp_mask, nodemask_t *mask,
enum migrate_mode mode, int *contended,
+ int alloc_flags, int classzone_idx,
struct zone **candidate_zone);
extern void compact_pgdat(pg_data_t *pgdat, int order);
extern void reset_isolation_suitable(pg_data_t *pgdat);
-extern unsigned long compaction_suitable(struct zone *zone, int order);
+extern unsigned long compaction_suitable(struct zone *zone, int order,
+ int alloc_flags, int classzone_idx);

/* Do not skip compaction more than 64 times */
#define COMPACT_MAX_DEFER_SHIFT 6
@@ -103,6 +105,7 @@ static inline bool compaction_restarting(struct zone *zone, int order)
static inline unsigned long try_to_compact_pages(struct zonelist *zonelist,
int order, gfp_t gfp_mask, nodemask_t *nodemask,
enum migrate_mode mode, int *contended,
+ int alloc_flags, int classzone_idx,
struct zone **candidate_zone)
{
return COMPACT_CONTINUE;
@@ -116,7 +119,8 @@ static inline void reset_isolation_suitable(pg_data_t *pgdat)
{
}

-static inline unsigned long compaction_suitable(struct zone *zone, int order)
+static inline unsigned long compaction_suitable(struct zone *zone, int order,
+ int alloc_flags, int classzone_idx)
{
return COMPACT_SKIPPED;
}
diff --git a/mm/compaction.c b/mm/compaction.c
index edba18a..dba8891 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1069,9 +1069,9 @@ static int compact_finished(struct zone *zone, struct compact_control *cc,

/* Compaction run is not finished if the watermark is not met */
watermark = low_wmark_pages(zone);
- watermark += (1 << cc->order);

- if (!zone_watermark_ok(zone, cc->order, watermark, 0, 0))
+ if (!zone_watermark_ok(zone, cc->order, watermark, cc->classzone_idx,
+ cc->alloc_flags))
return COMPACT_CONTINUE;

/* Direct compactor: Is a suitable page free? */
@@ -1097,7 +1097,8 @@ static int compact_finished(struct zone *zone, struct compact_control *cc,
* COMPACT_PARTIAL - If the allocation would succeed without compaction
* COMPACT_CONTINUE - If compaction should run now
*/
-unsigned long compaction_suitable(struct zone *zone, int order)
+unsigned long compaction_suitable(struct zone *zone, int order,
+ int alloc_flags, int classzone_idx)
{
int fragindex;
unsigned long watermark;
@@ -1134,7 +1135,7 @@ unsigned long compaction_suitable(struct zone *zone, int order)
return COMPACT_SKIPPED;

if (fragindex == -1000 && zone_watermark_ok(zone, order, watermark,
- 0, 0))
+ classzone_idx, alloc_flags))
return COMPACT_PARTIAL;

return COMPACT_CONTINUE;
@@ -1148,7 +1149,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
const int migratetype = gfpflags_to_migratetype(cc->gfp_mask);
const bool sync = cc->mode != MIGRATE_ASYNC;

- ret = compaction_suitable(zone, cc->order);
+ ret = compaction_suitable(zone, cc->order, cc->alloc_flags,
+ cc->classzone_idx);
switch (ret) {
case COMPACT_PARTIAL:
case COMPACT_SKIPPED:
@@ -1237,7 +1239,8 @@ out:
}

static unsigned long compact_zone_order(struct zone *zone, int order,
- gfp_t gfp_mask, enum migrate_mode mode, int *contended)
+ gfp_t gfp_mask, enum migrate_mode mode, int *contended,
+ int alloc_flags, int classzone_idx)
{
unsigned long ret;
struct compact_control cc = {
@@ -1247,6 +1250,8 @@ static unsigned long compact_zone_order(struct zone *zone, int order,
.gfp_mask = gfp_mask,
.zone = zone,
.mode = mode,
+ .alloc_flags = alloc_flags,
+ .classzone_idx = classzone_idx,
};
INIT_LIST_HEAD(&cc.freepages);
INIT_LIST_HEAD(&cc.migratepages);
@@ -1278,6 +1283,7 @@ int sysctl_extfrag_threshold = 500;
unsigned long try_to_compact_pages(struct zonelist *zonelist,
int order, gfp_t gfp_mask, nodemask_t *nodemask,
enum migrate_mode mode, int *contended,
+ int alloc_flags, int classzone_idx,
struct zone **candidate_zone)
{
enum zone_type high_zoneidx = gfp_zone(gfp_mask);
@@ -1286,7 +1292,6 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
struct zoneref *z;
struct zone *zone;
int rc = COMPACT_DEFERRED;
- int alloc_flags = 0;
int all_zones_contended = COMPACT_CONTENDED_LOCK; /* init for &= op */

*contended = COMPACT_CONTENDED_NONE;
@@ -1295,10 +1300,6 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
if (!order || !may_enter_fs || !may_perform_io)
return COMPACT_SKIPPED;

-#ifdef CONFIG_CMA
- if (gfpflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
- alloc_flags |= ALLOC_CMA;
-#endif
/* Compact each zone in the list */
for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx,
nodemask) {
@@ -1309,7 +1310,7 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
continue;

status = compact_zone_order(zone, order, gfp_mask, mode,
- &zone_contended);
+ &zone_contended, alloc_flags, classzone_idx);
rc = max(status, rc);
/*
* It takes at least one zone that wasn't lock contended
@@ -1318,8 +1319,8 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
all_zones_contended &= zone_contended;

/* If a normal allocation would succeed, stop compacting */
- if (zone_watermark_ok(zone, order, low_wmark_pages(zone), 0,
- alloc_flags)) {
+ if (zone_watermark_ok(zone, order, low_wmark_pages(zone),
+ classzone_idx, alloc_flags)) {
*candidate_zone = zone;
/*
* We think the allocation will succeed in this zone,
diff --git a/mm/internal.h b/mm/internal.h
index 8293040..3cc9b0a 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -143,6 +143,8 @@ struct compact_control {

int order; /* order a direct compactor needs */
const gfp_t gfp_mask; /* gfp mask of a direct compactor */
+ const int alloc_flags; /* alloc flags of a direct compactor */
+ const int classzone_idx; /* zone index of a direct compactor */
struct zone *zone;
int contended; /* Signal need_sched() or lock
* contention detected during
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e758159..8d143a0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2339,6 +2339,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
compact_result = try_to_compact_pages(zonelist, order, gfp_mask,
nodemask, mode,
contended_compaction,
+ alloc_flags, classzone_idx,
&last_compact_zone);
current->flags &= ~PF_MEMALLOC;

diff --git a/mm/vmscan.c b/mm/vmscan.c
index dcb4707..19ba76d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2249,7 +2249,7 @@ static inline bool should_continue_reclaim(struct zone *zone,
return true;

/* If compaction would go ahead or the allocation would succeed, stop */
- switch (compaction_suitable(zone, sc->order)) {
+ switch (compaction_suitable(zone, sc->order, 0, 0)) {
case COMPACT_PARTIAL:
case COMPACT_CONTINUE:
return false;
@@ -2346,7 +2346,7 @@ static inline bool compaction_ready(struct zone *zone, int order)
* If compaction is not ready to start and allocation is not likely
* to succeed without it, then keep reclaiming.
*/
- if (compaction_suitable(zone, order) == COMPACT_SKIPPED)
+ if (compaction_suitable(zone, order, 0, 0) == COMPACT_SKIPPED)
return false;

return watermark_ok;
@@ -2824,8 +2824,8 @@ static bool zone_balanced(struct zone *zone, int order,
balance_gap, classzone_idx, 0))
return false;

- if (IS_ENABLED(CONFIG_COMPACTION) && order &&
- compaction_suitable(zone, order) == COMPACT_SKIPPED)
+ if (IS_ENABLED(CONFIG_COMPACTION) && order && compaction_suitable(zone,
+ order, 0, classzone_idx) == COMPACT_SKIPPED)
return false;

return true;
@@ -2952,8 +2952,8 @@ static bool kswapd_shrink_zone(struct zone *zone,
* from memory. Do not reclaim more than needed for compaction.
*/
if (IS_ENABLED(CONFIG_COMPACTION) && sc->order &&
- compaction_suitable(zone, sc->order) !=
- COMPACT_SKIPPED)
+ compaction_suitable(zone, sc->order, 0, classzone_idx)
+ != COMPACT_SKIPPED)
testorder = 0;

/*
--
1.8.4.5

Rik van Riel
2014-10-20 15:45:14 UTC
Post by Vlastimil Babka
Compaction relies on zone watermark checks for decisions such as whether
it's worth starting compaction in compaction_suitable(), or whether
compaction should stop in compact_finished(). The watermark checks take
classzone_idx and alloc_flags parameters, which are related to the memory
allocation request. But from the context of compaction they are currently
passed as 0, including in direct compaction, which is invoked to satisfy an
allocation request and could therefore know the proper values.
The lack of proper values can lead to a mismatch between decisions taken
during compaction and decisions related to the allocation request. A
missing classzone_idx value means that lowmem_reserve is not taken into
account. This has manifested (during recent changes to deferred compaction)
when the DMA zone was used as a fallback for the preferred Normal zone:
compaction_suitable() without the proper classzone_idx would think that the
watermarks are already satisfied, but the watermark check in
get_page_from_freelist() would fail. Because of this problem, deferring
compaction has extra complexity that can be removed in the following patch.
The issue (not confirmed in practice) with missing alloc_flags is opposite
in nature. For allocations that include ALLOC_HIGH, ALLOC_HARDER or
ALLOC_CMA in alloc_flags (the last covers all MOVABLE allocations on
CMA-enabled systems), the watermark checking in compaction with 0 passed is
stricter than in get_page_from_freelist(). In these cases compaction might
run for longer than is really needed.
This patch fixes these problems by adding alloc_flags and classzone_idx to
struct compact_control and to the related functions involved in direct
compaction and watermark checking. All other callers of
compaction_suitable() are adjusted to pass proper values where those are
known; currently this is limited to classzone_idx, which is sometimes known
in the kswapd context. However, the direct reclaim callers
should_continue_reclaim() and compaction_ready() do not currently know the
proper values, so the coordination between reclaim and compaction may still
not be as accurate as it could be. This can be fixed later, if it's shown
to be an issue.
The effect of this patch should be slightly better high-order allocation
success rates and/or less compaction overhead, depending on the type of
allocations and presence of CMA. It allows simplifying deferred compaction
code in a followup patch.
When testing with stress-highalloc, there was some slight improvement (which
might be just due to variance) in success rates of non-THP-like allocations.
Acked-by: Rik van Riel <***@redhat.com>

Vlastimil Babka
2014-10-07 15:33:39 UTC
The goal of memory compaction is to create high-order freepages through
page migration. Page migration, however, puts pages on the per-cpu lru_add
cache, which is later flushed to the per-cpu pcplists, and only after the
pcplists are drained can the pages actually merge. This can happen either
implicitly, when the per-cpu caches fill up through further freeing, or
explicitly.
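
A freed migration source page therefore has to pass through two per-cpu
caches before it can merge. A minimal sketch of the explicit flush, using
the same calls as the hunk below:

int cpu = get_cpu();

/*
 * The lru_add pagevec may hold the last reference to pages freed by
 * migration; draining it releases them onto this CPU's pcplists...
 */
lru_add_drain_cpu(cpu);
/*
 * ...and draining this zone's pcplists hands them to the buddy
 * allocator, where they can finally merge into high-order pages.
 */
drain_local_pages(zone);
put_cpu();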

During direct compaction, it is useful to do the draining explicitly so
that pages merge as soon as possible, compaction can detect success
immediately, and the latency impact is kept at a minimum. However, the
current implementation is far from ideal. Draining is done only in
__alloc_pages_direct_compact(), after all zones have already been
compacted, so the decisions to continue or stop compaction in individual
zones were made without the last batch of migrations being merged. The
draining of the lru_add cache before the pcplists is also missing.

This patch moves the draining for direct compaction into compact_zone(). It
adds the missing lru_add cache draining and uses the newly introduced
single zone pcplists draining to reduce overhead and avoid impact on
unrelated zones. Draining is only performed when it can actually lead to
merging of a page of the desired order (passed by cc->order). This means it
is only done when migration occurred in the previously scanned cc->order
aligned block(s) and the migration scanner is now pointing to the next
cc->order aligned block.

The patch has been tested with the stress-highalloc benchmark from mmtests.
Although the overall allocation success rates of the benchmark were not
affected, the number of detected compaction successes has doubled. This
suggests that allocations previously succeeded due to implicit merging
caused by background activity, making a later allocation attempt succeed
immediately, but without attributing the success to compaction. Since
stress-highalloc always tries to allocate almost the whole of memory, it
cannot show the improvement in its reported success rate metric. However,
after this patch compaction should detect success and terminate earlier,
reducing direct compaction latencies in a real scenario.

Signed-off-by: Vlastimil Babka <***@suse.cz>
Cc: Minchan Kim <***@kernel.org>
Cc: Mel Gorman <***@suse.de>
Cc: Joonsoo Kim <***@lge.com>
Cc: Michal Nazarewicz <***@mina86.com>
Cc: Naoya Horiguchi <n-***@ah.jp.nec.com>
Cc: Christoph Lameter <***@linux.com>
Cc: Rik van Riel <***@redhat.com>
Cc: David Rientjes <***@google.com>
---
mm/compaction.c | 41 ++++++++++++++++++++++++++++++++++++++++-
mm/page_alloc.c | 4 ----
2 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 8fa888d..41b49d7 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1179,6 +1179,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
while ((ret = compact_finished(zone, cc, migratetype)) ==
COMPACT_CONTINUE) {
int err;
+ unsigned long last_migrated_pfn = 0;

switch (isolate_migratepages(zone, cc)) {
case ISOLATE_ABORT:
@@ -1187,7 +1188,12 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
cc->nr_migratepages = 0;
goto out;
case ISOLATE_NONE:
- continue;
+ /*
+ * We haven't isolated and migrated anything, but
+ * there might still be unflushed migrations from
+ * previous cc->order aligned block.
+ */
+ goto check_drain;
case ISOLATE_SUCCESS:
;
}
@@ -1212,6 +1218,39 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
goto out;
}
}
+
+ /*
+ * Record where we have freed pages by migration and not yet
+ * flushed them to buddy allocator. Subtract 1, because often
+ * we finish a pageblock and migrate_pfn points to the first
+ * page* of the next one. In that case we want the drain below
+ * to happen immediately.
+ */
+ if (!last_migrated_pfn)
+ last_migrated_pfn = cc->migrate_pfn - 1;
+
+check_drain:
+ /*
+ * Have we moved away from the previous cc->order aligned block
+ * where we migrated from? If yes, flush the pages that were
+ * freed, so that they can merge and compact_finished() can
+ * detect immediately if allocation should succeed.
+ */
+ if (cc->order > 0 && last_migrated_pfn) {
+ int cpu;
+ unsigned long current_block_start =
+ cc->migrate_pfn & ~((1UL << cc->order) - 1);
+
+ if (last_migrated_pfn < current_block_start) {
+ cpu = get_cpu();
+ lru_add_drain_cpu(cpu);
+ drain_local_pages(zone);
+ put_cpu();
+ /* No more flushing until we migrate again */
+ last_migrated_pfn = 0;
+ }
+ }
+
}

out:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5a4506f..14fac43 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2357,10 +2357,6 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
*/
count_vm_event(COMPACTSTALL);

- /* Page migration frees to the PCP lists but we want merging */
- drain_pages(get_cpu());
- put_cpu();
-
page = get_page_from_freelist(gfp_mask, nodemask,
order, zonelist, high_zoneidx,
alloc_flags & ~ALLOC_NO_WATERMARKS,
--
1.8.4.5

Rik van Riel
2014-10-20 15:44:18 UTC
Post by Vlastimil Babka
The goal of memory compaction is to create high-order freepages through
page migration. Page migration, however, puts pages on the per-cpu lru_add
cache, which is later flushed to the per-cpu pcplists, and only after the
pcplists are drained can the pages actually merge. This can happen either
implicitly, when the per-cpu caches fill up through further freeing, or
explicitly.
During direct compaction, it is useful to do the draining explicitly so
that pages merge as soon as possible, compaction can detect success
immediately, and the latency impact is kept at a minimum. However, the
current implementation is far from ideal. Draining is done only in
__alloc_pages_direct_compact(), after all zones have already been
compacted, so the decisions to continue or stop compaction in individual
zones were made without the last batch of migrations being merged. The
draining of the lru_add cache before the pcplists is also missing.
This patch moves the draining for direct compaction into compact_zone(). It
adds the missing lru_add cache draining and uses the newly introduced
single zone pcplists draining to reduce overhead and avoid impact on
unrelated zones. Draining is only performed when it can actually lead to
merging of a page of the desired order (passed by cc->order). This means it
is only done when migration occurred in the previously scanned cc->order
aligned block(s) and the migration scanner is now pointing to the next
cc->order aligned block.
Acked-by: Rik van Riel <***@redhat.com>
