From 6bdddd7980eac0cc883945d823986f24682ca47a Mon Sep 17 00:00:00 2001
From: George Dunlap <george.dunlap@citrix.com>
Date: Thu, 10 Oct 2019 17:57:49 +0100
Subject: [PATCH 02/11] x86/mm: Don't re-set PGT_pinned on a partially
de-validated page
When unpinning pagetables, if an operation is interrupted,
relinquish_memory() re-sets PGT_pinned so that the un-pin will
pickedup again when the hypercall restarts.
This is appropriate when put_page_and_type_preemptible() returns
-EINTR, which indicates that the page is back in its initial state
(i.e., completely validated). However, for -ERESTART, this leads to a
state where a page has both PGT_pinned and PGT_partial set.
This happens to work at the moment, although it's not really a
"canonical" state; but in subsequent patches, where we need to make a
distinction in handling between PGT_validated and PGT_partial pages,
this causes issues.
Move to a "canonical" state by:
- Only re-setting PGT_pinned on -EINTR
- Re-dropping the refcount held by PGT_pinned on -ERESTART
In the latter case, the PGT_partial bit will be cleared further down
with the rest of the other PGT_partial pages.
While here, clean up some trainling whitespace.
This is part of XSA-299.
Reported-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
xen/arch/x86/domain.c | 31 ++++++++++++++++++++++++++++---
1 file changed, 28 insertions(+), 3 deletions(-)
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 29f892c04c..8fbecbb169 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -112,7 +112,7 @@ static void play_dead(void)
* this case, heap corruption or #PF can occur (when heap debugging is
* enabled). For example, even printk() can involve tasklet scheduling,
* which touches per-cpu vars.
- *
+ *
* Consider very carefully when adding code to *dead_idle. Most hypervisor
* subsystems are unsafe to call.
*/
@@ -1838,9 +1838,34 @@ static int relinquish_memory(
break;
case -ERESTART:
case -EINTR:
+ /*
+ * -EINTR means PGT_validated has been re-set; re-set
+ * PGT_pinned again so that it gets picked up next time
+ * around.
+ *
+ * -ERESTART, OTOH, means PGT_partial is set instead. Put
+ * it back on the list, but don't set PGT_pinned; the
+ * section below will finish off de-validation. But we do
+ * need to drop the general ref associated with
+ * PGT_pinned, since put_page_and_type_preemptible()
+ * didn't do it.
+ *
+ * NB we can do an ASSERT for PGT_validated, since we
+ * "own" the type ref; but theoretically, the PGT_partial
+ * could be cleared by someone else.
+ */
+ if ( ret == -EINTR )
+ {
+ ASSERT(page->u.inuse.type_info & PGT_validated);
+ set_bit(_PGT_pinned, &page->u.inuse.type_info);
+ }
+ else
+ put_page(page);
+
ret = -ERESTART;
+
+ /* Put the page back on the list and drop the ref we grabbed above */
page_list_add(page, list);
- set_bit(_PGT_pinned, &page->u.inuse.type_info);
put_page(page);
goto out;
default:
@@ -2062,7 +2087,7 @@ void vcpu_kick(struct vcpu *v)
* pending flag. These values may fluctuate (after all, we hold no
* locks) but the key insight is that each change will cause
* evtchn_upcall_pending to be polled.
- *
+ *
* NB2. We save the running flag across the unblock to avoid a needless
* IPI for domains that we IPI'd to unblock.
*/
--
2.23.0