From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Daniel Axtens <dja@axtens.net>
Date: Wed, 15 Apr 2020 23:28:29 +1000
Subject: [PATCH] ieee1275: claim more memory

On powerpc-ieee1275, we are running out of memory trying to verify
anything. This is because:

 - We have to load an entire file into memory to verify it. This is
   extremely difficult to change with appended signatures.
 - We only have 32MB of heap.
 - Distro kernels are now often around 30MB.

So we want to claim more memory from OpenFirmware for our heap.

There are some complications:

 - The grub mm code isn't the only thing that will make claims on
   memory from OpenFirmware:

    * PFW/SLOF will have claimed some for their own use.

    * The ieee1275 loader will try to find other bits of memory that we
      haven't claimed to place the kernel and initrd when we go to boot.

    * Once we load Linux, it will also try to claim memory. It claims
      memory without any reference to /memory/available; it just starts
      at min(top of RMO, 768MB) and works down. So we need to avoid this
      area. See arch/powerpc/kernel/prom_init.c as of v5.11.

 - The smallest amount of memory a ppc64 KVM guest can have is 256MB.
   It doesn't work with distro kernels but can work with custom kernels.
   We should maintain support for that. (ppc32 can boot with even less,
   and we shouldn't break that either.)

 - Even if a VM has more memory, the memory OpenFirmware makes available
   as Real Memory Area can be restricted. A freshly created LPAR on a
   PowerVM machine is likely to have only 256MB available to OpenFirmware
   even if it has many gigabytes of memory allocated.

EFI systems will attempt to allocate 1/4th of the available memory,
clamped to between 1M and 1600M. That seems like a good sort of
approach; we just need to figure out if 1/4 is the right fraction
for us.

We don't know in advance how big the kernel and initrd are going to be,
which makes figuring out how much memory we can take a bit tricky.

To figure out how much memory we should leave unused, I looked at:

 - an Ubuntu 20.04.1 ppc64le pseries KVM guest:
    vmlinux: ~30MB
    initrd:  ~50MB

 - a RHEL8.2 ppc64le pseries KVM guest:
    vmlinux: ~30MB
    initrd:  ~30MB

Ubuntu VMs struggle to boot with just 256MB under SLOF.
RHEL likewise has a higher minimum supported memory figure.
So let's first consider a distro kernel and 512MB of addressable memory.
(This is the default case for anything booting under PFW.) Say we lose
131MB to PFW (based on some tests). This leaves us 381MB. 1/4 of 381MB
is ~95MB. That should be enough to verify a 30MB vmlinux and should
leave plenty of space to load Linux and the initrd.

If we consider 256MB of RMA under PFW, we have just 125MB remaining. 1/4
of that is a smidge under 32MB, which gives us very poor odds of verifying
a distro-sized kernel. However, if we need 80MB just to put the kernel
and initrd in memory, we can't claim any more than 45MB anyway. So 1/4
will do. We'll come back to this later.
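
As a rough check of the arithmetic above, here is a minimal sketch in C.
It is not code from this patch: heap_budget and available_after_firmware
are hypothetical helpers, and the 131MB firmware overhead is only the
measured figure quoted above, not a fixed constant.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/* Hypothetical helper (not in the patch): memory left for GRUB, the
   kernel and the initrd once the firmware has taken its share. */
static uint64_t
available_after_firmware (uint64_t rma, uint64_t firmware)
{
  assert (firmware <= rma);
  return rma - firmware;
}

/* Hypothetical helper (not in the patch): one quarter of the available
   memory, clamped to an upper limit. */
static uint64_t
heap_budget (uint64_t available, uint64_t cap)
{
  uint64_t budget = available / 4;

  return budget > cap ? cap : budget;
}
```

With a 512MB RMA and 131MB lost to firmware this yields a budget of
about 95MB; with a 256MB RMA it yields about 31MB, matching the figures
discussed above.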

grub is always built as a 32-bit binary, even if it's loading a ppc64
kernel. So we can't address memory beyond 4GB. Taking 1/4 of at most
4GB gives a natural cap of 1GB for powerpc-ieee1275.
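
To illustrate where the 1GB figure comes from, the sketch below counts
only the part of each memory region that lies under the 32-bit limit and
then takes a quarter. The names struct region, usable_len and
quarter_of_addressable are hypothetical, and the sketch uses a 64-bit
accumulator for simplicity, so treat it as an illustration under those
assumptions and not as the code in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Highest address a 32-bit binary can reach. */
#define ADDR_LIMIT 0xffffffffULL

/* Hypothetical description of one available memory region. */
struct region
{
  uint64_t addr;
  uint64_t len;
};

/* Bytes of the region that lie below ADDR_LIMIT. */
static uint64_t
usable_len (struct region r)
{
  if (r.addr > ADDR_LIMIT)
    return 0;
  if (r.addr + r.len > ADDR_LIMIT)
    return ADDR_LIMIT - r.addr;
  return r.len;
}

/* One quarter of all memory reachable below ADDR_LIMIT. */
static uint64_t
quarter_of_addressable (const struct region *map, size_t n)
{
  uint64_t total = 0;
  size_t i;

  assert (map != NULL || n == 0);
  for (i = 0; i < n; i++)
    total += usable_len (map[i]);
  return total / 4;
}
```

However much memory the machine has, the counted total stays below 4GB,
so the quarter stays below 1GB.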

Also apply this 1/4 approach to i386-ieee1275, but keep the 32MB cap.

make check still works for both i386 and powerpc and I've booted
powerpc grub with this change under SLOF and PFW.

Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 grub-core/kern/ieee1275/init.c | 81 +++++++++++++++++++++++++++++++++---------
 docs/grub-dev.texi             |  6 ++--
 2 files changed, 69 insertions(+), 18 deletions(-)

diff --git a/grub-core/kern/ieee1275/init.c b/grub-core/kern/ieee1275/init.c
index 0dcd114ce54..c61d91a0285 100644
--- a/grub-core/kern/ieee1275/init.c
+++ b/grub-core/kern/ieee1275/init.c
@@ -46,11 +46,12 @@
 #endif
 #include <grub/lockdown.h>
 
-/* The maximum heap size we're going to claim */
+/* The maximum heap size we're going to claim. Not used by sparc.
+   We allocate 1/4 of the available memory under 4G, up to this limit. */
 #ifdef __i386__
 #define HEAP_MAX_SIZE		(unsigned long) (64 * 1024 * 1024)
-#else
-#define HEAP_MAX_SIZE		(unsigned long) (32 * 1024 * 1024)
+#else // __powerpc__
+#define HEAP_MAX_SIZE		(unsigned long) (1 * 1024 * 1024 * 1024)
 #endif
 
 extern char _end[];
@@ -147,16 +148,45 @@ grub_claim_heap (void)
 				 + GRUB_KERNEL_MACHINE_STACK_SIZE), 0x200000);
 }
 #else
-/* Helper for grub_claim_heap.  */
+/* Helper for grub_claim_heap on powerpc. */
+static int
+heap_size (grub_uint64_t addr, grub_uint64_t len, grub_memory_type_t type,
+	   void *data)
+{
+  grub_uint32_t total = *(grub_uint32_t *)data;
+
+  if (type != GRUB_MEMORY_AVAILABLE)
+    return 0;
+
+  /* Do not consider memory beyond 4GB */
+  if (addr > 0xffffffffUL)
+    return 0;
+
+  if (addr + len > 0xffffffffUL)
+    len = 0xffffffffUL - addr;
+
+  total += len;
+  *(grub_uint32_t *)data = total;
+
+  return 0;
+}
+
 static int
 heap_init (grub_uint64_t addr, grub_uint64_t len, grub_memory_type_t type,
 	   void *data)
 {
-  unsigned long *total = data;
+  grub_uint32_t total = *(grub_uint32_t *)data;
 
   if (type != GRUB_MEMORY_AVAILABLE)
     return 0;
 
+  /* Do not consider memory beyond 4GB */
+  if (addr > 0xffffffffUL)
+    return 0;
+
+  if (addr + len > 0xffffffffUL)
+    len = 0xffffffffUL - addr;
+
   if (grub_ieee1275_test_flag (GRUB_IEEE1275_FLAG_NO_PRE1_5M_CLAIM))
     {
       if (addr + len <= 0x180000)
@@ -170,10 +200,6 @@ heap_init (grub_uint64_t addr, grub_uint64_t len, grub_memory_type_t type,
     }
   len -= 1; /* Required for some firmware.  */
 
-  /* Never exceed HEAP_MAX_SIZE  */
-  if (*total + len > HEAP_MAX_SIZE)
-    len = HEAP_MAX_SIZE - *total;
-
   /* In theory, firmware should already prevent this from happening by not
      listing our own image in /memory/available.  The check below is intended
      as a safeguard in case that doesn't happen.  However, it doesn't protect
@@ -185,6 +211,18 @@ heap_init (grub_uint64_t addr, grub_uint64_t len, grub_memory_type_t type,
       len = 0;
     }
 
+  /* If this block contains 0x30000000 (768MB), do not claim below that.
+     Linux likes to claim memory at min(RMO top, 768MB) and works down
+     without reference to /memory/available. */
+  if ((addr < 0x30000000) && ((addr + len) > 0x30000000))
+    {
+      len = len - (0x30000000 - addr);
+      addr = 0x30000000;
+    }
+
+  if (len > total)
+    len = total;
+
   if (len)
     {
       grub_err_t err;
@@ -193,10 +231,12 @@ heap_init (grub_uint64_t addr, grub_uint64_t len, grub_memory_type_t type,
       if (err)
 	return err;
       grub_mm_init_region ((void *) (grub_addr_t) addr, len);
+      total -= len;
     }
 
-  *total += len;
-  if (*total >= HEAP_MAX_SIZE)
+  *(grub_uint32_t *)data = total;
+
+  if (total == 0)
     return 1;
 
   return 0;
@@ -205,13 +245,22 @@ heap_init (grub_uint64_t addr, grub_uint64_t len, grub_memory_type_t type,
 static void 
 grub_claim_heap (void)
 {
-  unsigned long total = 0;
+  grub_uint32_t total = 0;
 
   if (grub_ieee1275_test_flag (GRUB_IEEE1275_FLAG_FORCE_CLAIM))
-    heap_init (GRUB_IEEE1275_STATIC_HEAP_START, GRUB_IEEE1275_STATIC_HEAP_LEN,
-	       1, &total);
-  else
-    grub_machine_mmap_iterate (heap_init, &total);
+    {
+      heap_init (GRUB_IEEE1275_STATIC_HEAP_START, GRUB_IEEE1275_STATIC_HEAP_LEN,
+		 1, &total);
+      return;
+    }
+
+  grub_machine_mmap_iterate (heap_size, &total);
+
+  total = total / 4;
+  if (total > HEAP_MAX_SIZE)
+    total = HEAP_MAX_SIZE;
+
+  grub_machine_mmap_iterate (heap_init, &total);
 }
 #endif
 
diff --git a/docs/grub-dev.texi b/docs/grub-dev.texi
index 19f708ee662..90083772c8a 100644
--- a/docs/grub-dev.texi
+++ b/docs/grub-dev.texi
@@ -1047,7 +1047,9 @@ space is limited to 4GiB. GRUB allocates pages from EFI for its heap, at most
 1.6 GiB.
 
 On i386-ieee1275 and powerpc-ieee1275 GRUB uses same stack as IEEE1275.
-It allocates at most 32MiB for its heap.
+
+On i386-ieee1275, GRUB allocates at most 32MiB for its heap. On
+powerpc-ieee1275, GRUB allocates up to 1GiB.
 
 On sparc64-ieee1275 stack is 256KiB and heap is 2MiB.
 
@@ -1075,7 +1077,7 @@ In short:
 @item i386-qemu               @tab 60 KiB  @tab < 4 GiB
 @item *-efi                   @tab ?       @tab < 1.6 GiB
 @item i386-ieee1275           @tab ?       @tab < 32 MiB
-@item powerpc-ieee1275        @tab ?       @tab < 32 MiB
+@item powerpc-ieee1275        @tab ?       @tab < 1 GiB
 @item sparc64-ieee1275        @tab 256KiB  @tab 2 MiB
 @item arm-uboot               @tab 256KiB  @tab 2 MiB
 @item mips(el)-qemu_mips      @tab 2MiB    @tab 253 MiB