From 6114ffe50f212f8df22a18f7f170dce649613a08 Mon Sep 17 00:00:00 2001
From: Tom Stellard
Date: May 04 2018 04:23:03 +0000
Subject: 5.0.2 Release


---

diff --git a/.gitignore b/.gitignore
index c7607cd..cc9c618 100644
--- a/.gitignore
+++ b/.gitignore
@@ -34,3 +34,4 @@
 /llvm-4.0.1.src.tar.xz
 /llvm-5.0.0.src.tar.xz
 /llvm-5.0.1.src.tar.xz
+/llvm-5.0.2.src.tar.xz
diff --git a/0001-Merging-r323155.patch b/0001-Merging-r323155.patch
deleted file mode 100644
index 3b74f8c..0000000
--- a/0001-Merging-r323155.patch
+++ /dev/null
@@ -1,2005 +0,0 @@
-From 54049c8506baee2ef9a1a075b523038f8c463015 Mon Sep 17 00:00:00 2001
-From: Reid Kleckner
-Date: Thu, 1 Feb 2018 21:28:26 +0000
-Subject: [PATCH] Merging r323155:
-
-------------------------------------------------------------------------
-r323155 | chandlerc | 2018-01-22 14:05:25 -0800 (Mon, 22 Jan 2018) | 133
-lines
-
-Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", which is one of the two halves of Spectre.
-
-Summary:
-First, we need to explain the core of the vulnerability. Note that this
-is a very incomplete description; please see the Project Zero blog post
-for details:
-https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
-
-The basis for branch target injection is to direct speculative execution
-of the processor to some "gadget" of executable code by poisoning the
-prediction of indirect branches with the address of that gadget. The
-gadget in turn contains an operation that provides a side channel for
-reading data. Most commonly, this will look like a load of secret data
-followed by a branch on the loaded value and then a load of some
-predictable cache line. The attacker then uses timing of the processor's
-cache to determine which direction the branch took *in the speculative
-execution*, and in turn what one bit of the loaded value was. Due to the
-nature of these timing side channels and the branch predictor on Intel
-processors, this allows an attacker to leak data only accessible to
-a privileged domain (like the kernel) back into an unprivileged domain.
-
-The goal is simple: avoid generating code which contains an indirect
-branch that could have its prediction poisoned by an attacker. In many
-cases, the compiler can simply use directed conditional branches and
-a small search tree. LLVM already has support for lowering switches in
-this way and the first step of this patch is to disable jump-table
-lowering of switches and introduce a pass to rewrite explicit indirectbr
-sequences into a switch over integers.
-
-However, there is no fully general alternative to indirect calls. We
-introduce a new construct we call a "retpoline" to implement indirect
-calls in a non-speculatable way. It can be thought of loosely as
-a trampoline for indirect calls which uses the RET instruction on x86.
-Further, we arrange for a specific call->ret sequence which ensures the
-processor predicts the return to go to a controlled, known location. The
-retpoline then "smashes" the return address pushed onto the stack by the
-call with the desired target of the original indirect call. The result
-is a predicted return to the next instruction after a call (which can be
-used to trap speculative execution within an infinite loop) and an
-actual indirect branch to an arbitrary address.
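-
-A minimal sketch of that lowering on a plain C++ indirect call (the file
-name is a hypothetical example; the register choices follow the x86-64
-tests added later in this patch):
-```
-// example.cpp -- build with: clang++ -O2 -mretpoline -c example.cpp
-using Handler = int (*)(int);
-
-int dispatch(Handler h, int v) {
-  // Without -mretpoline this becomes an ordinary indirect branch such as
-  // `jmpq *%rax`. With -mretpoline the target is instead moved into the
-  // guaranteed scratch register and routed through the thunk:
-  //   movq %rdi, %r11
-  //   jmp  __llvm_retpoline_r11   # TAILCALL
-  return h(v);
-}
-```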
-
-On 64-bit x86 ABIs, this is especially easy to do in the compiler by
-using a guaranteed scratch register to pass the target into this device.
-For 32-bit ABIs there isn't a guaranteed scratch register and so several
-different retpoline variants are introduced to use a scratch register if
-one is available in the calling convention and to otherwise use direct
-stack push/pop sequences to pass the target address.
-
-This "retpoline" mitigation is fully described in the following blog
-post: https://support.google.com/faqs/answer/7625886
-
-We also support a target feature that disables emission of the retpoline
-thunk by the compiler to allow for custom thunks if users want them.
-These are particularly useful in environments like kernels that
-routinely do hot-patching on boot and want to hot-patch their thunk to
-different code sequences. They can write this custom thunk (see the
-sketch below) and use `-mretpoline-external-thunk` *in addition* to
-`-mretpoline`. In this case, on x86-64 the thunk names must be:
-```
-  __llvm_external_retpoline_r11
-```
-or on 32-bit:
-```
-  __llvm_external_retpoline_eax
-  __llvm_external_retpoline_ecx
-  __llvm_external_retpoline_edx
-  __llvm_external_retpoline_push
-```
-And the target of the retpoline is passed in the named register, or, in
-the case of the `push` suffix, on the top of the stack via a `pushl`
-instruction.
-
-There is one other important source of indirect branches in x86 ELF
-binaries: the PLT. These patches also include support for LLD to
-generate PLT entries that perform a retpoline-style indirection.
-
-The only other indirect branches remaining that we are aware of are from
-precompiled runtimes (such as crt0.o and similar). The ones we have
-found are not really attackable, and so we have not focused on them
-here, but eventually these runtimes should also be replicated for
-retpoline-ed configurations for completeness.
-
-For kernels or other freestanding or fully static executables, the
-compiler switch `-mretpoline` is sufficient to fully mitigate this
-particular attack. For dynamic executables, you must compile *all*
-libraries with `-mretpoline` and additionally link the dynamic
-executable and all shared libraries with LLD and pass `-z retpolineplt`
-(or use similar functionality from some other linker). We strongly
-recommend also using `-z now` as non-lazy binding allows the
-retpoline-mitigated PLT to be substantially smaller.
-
-When manually applying transformations similar to `-mretpoline` to the
-Linux kernel, we observed very small performance hits to applications
-running typical workloads, and relatively minor hits (approximately 2%)
-even for extremely syscall-heavy applications. This is largely due to
-the small number of indirect branches that occur in performance
-sensitive paths of the kernel.
-
-When using these patches on statically linked applications, especially
-C++ applications, you should expect to see a much more dramatic
-performance hit. For microbenchmarks that are switch-, indirect-, or
-virtual-call heavy we have seen overheads ranging from 10% to 50%.
-
-However, real-world workloads exhibit substantially lower performance
-impact. Notably, techniques such as PGO and ThinLTO dramatically reduce
-the impact of hot indirect calls (by speculatively promoting them to
-direct calls) and allow optimized search trees to be used to lower
-switches.
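-
-A minimal sketch of such a custom external thunk, written as top-level
-GNU asm in a C++ file (the file name and assembler directives are
-assumptions; the sequence mirrors the compiler's own r11 thunk shown in
-X86RetpolineThunks.cpp below):
-```
-// custom_thunk.cpp -- link into binaries whose objects were built with
-// -mretpoline -mretpoline-external-thunk (x86-64).
-asm(R"(
-        .text
-        .globl  __llvm_external_retpoline_r11
-__llvm_external_retpoline_r11:
-        callq   1f
-2:      pause
-        lfence
-        jmp     2b
-        .align  16
-1:      movq    %r11, (%rsp)    # overwrite the predicted return address
-        retq
-)");
-```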
If you need to deploy these techniques in C++ applications, we -*strongly* recommend that you ensure all hot call targets are statically -linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well -tuned servers using all of these techniques saw 5% - 10% overhead from -the use of retpoline. - -We will add detailed documentation covering these components in -subsequent patches, but wanted to make the core functionality available -as soon as possible. Happy for more code review, but we'd really like to -get these patches landed and backported ASAP for obvious reasons. We're -planning to backport this to both 6.0 and 5.0 release streams and get -a 5.0 release with just this cherry picked ASAP for distros and vendors. - -This patch is the work of a number of people over the past month: Eric, Reid, -Rui, and myself. I'm mailing it out as a single commit due to the time -sensitive nature of landing this and the need to backport it. Huge thanks to -everyone who helped out here, and everyone at Intel who helped out in -discussions about how to craft this. Also, credit goes to Paul Turner (at -Google, but not an LLVM contributor) for much of the underlying retpoline -design. - -Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer - -Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits - -Differential Revision: https://reviews.llvm.org/D41723 ------------------------------------------------------------------------- - - -git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@324007 91177308-0d34-0410-b5e6-96231b3b80d8 ---- - include/llvm/CodeGen/Passes.h | 3 + - include/llvm/CodeGen/TargetPassConfig.h | 7 + - include/llvm/InitializePasses.h | 1 + - include/llvm/Target/TargetLowering.h | 2 +- - include/llvm/Target/TargetSubtargetInfo.h | 3 + - lib/CodeGen/CMakeLists.txt | 1 + - lib/CodeGen/CodeGen.cpp | 1 + - lib/CodeGen/IndirectBrExpandPass.cpp | 221 ++++++++++++++++++ - lib/CodeGen/TargetPassConfig.cpp | 3 + - lib/CodeGen/TargetSubtargetInfo.cpp | 4 + - lib/Target/X86/CMakeLists.txt | 1 + - lib/Target/X86/X86.h | 4 + - lib/Target/X86/X86.td | 21 ++ - lib/Target/X86/X86AsmPrinter.h | 1 + - lib/Target/X86/X86FastISel.cpp | 4 + - lib/Target/X86/X86FrameLowering.cpp | 9 + - lib/Target/X86/X86ISelDAGToDAG.cpp | 6 +- - lib/Target/X86/X86ISelLowering.cpp | 123 ++++++++++ - lib/Target/X86/X86ISelLowering.h | 6 + - lib/Target/X86/X86InstrCompiler.td | 16 +- - lib/Target/X86/X86InstrControl.td | 31 ++- - lib/Target/X86/X86InstrInfo.td | 2 + - lib/Target/X86/X86MCInstLower.cpp | 8 + - lib/Target/X86/X86RetpolineThunks.cpp | 276 +++++++++++++++++++++++ - lib/Target/X86/X86Subtarget.cpp | 2 + - lib/Target/X86/X86Subtarget.h | 14 ++ - lib/Target/X86/X86TargetMachine.cpp | 10 + - test/CodeGen/X86/O0-pipeline.ll | 3 + - test/CodeGen/X86/retpoline-external.ll | 166 ++++++++++++++ - test/CodeGen/X86/retpoline.ll | 363 ++++++++++++++++++++++++++++++ - test/Transforms/IndirectBrExpand/basic.ll | 63 ++++++ - tools/opt/opt.cpp | 1 + - 32 files changed, 1364 insertions(+), 12 deletions(-) - create mode 100644 lib/CodeGen/IndirectBrExpandPass.cpp - create mode 100644 lib/Target/X86/X86RetpolineThunks.cpp - create mode 100644 test/CodeGen/X86/retpoline-external.ll - create mode 100644 test/CodeGen/X86/retpoline.ll - create mode 100644 test/Transforms/IndirectBrExpand/basic.ll - -diff --git a/include/llvm/CodeGen/Passes.h b/include/llvm/CodeGen/Passes.h -index 96cfce5..7bfe30b 100644 ---- a/include/llvm/CodeGen/Passes.h -+++ b/include/llvm/CodeGen/Passes.h -@@ 
-420,6 +420,9 @@ namespace llvm { - /// shuffles. - FunctionPass *createExpandReductionsPass(); - -+ // This pass expands indirectbr instructions. -+ FunctionPass *createIndirectBrExpandPass(); -+ - } // End llvm namespace - - #endif -diff --git a/include/llvm/CodeGen/TargetPassConfig.h b/include/llvm/CodeGen/TargetPassConfig.h -index aaf0ab5..195ddff 100644 ---- a/include/llvm/CodeGen/TargetPassConfig.h -+++ b/include/llvm/CodeGen/TargetPassConfig.h -@@ -406,6 +406,13 @@ protected: - /// immediately before machine code is emitted. - virtual void addPreEmitPass() { } - -+ /// Targets may add passes immediately before machine code is emitted in this -+ /// callback. This is called even later than `addPreEmitPass`. -+ // FIXME: Rename `addPreEmitPass` to something more sensible given its actual -+ // position and remove the `2` suffix here as this callback is what -+ // `addPreEmitPass` *should* be but in reality isn't. -+ virtual void addPreEmitPass2() {} -+ - /// Utilities for targets to add passes to the pass manager. - /// - -diff --git a/include/llvm/InitializePasses.h b/include/llvm/InitializePasses.h -index 39ac464..2718c52 100644 ---- a/include/llvm/InitializePasses.h -+++ b/include/llvm/InitializePasses.h -@@ -157,6 +157,7 @@ void initializeIVUsersWrapperPassPass(PassRegistry&); - void initializeIfConverterPass(PassRegistry&); - void initializeImplicitNullChecksPass(PassRegistry&); - void initializeIndVarSimplifyLegacyPassPass(PassRegistry&); -+void initializeIndirectBrExpandPassPass(PassRegistry&); - void initializeInductiveRangeCheckEliminationPass(PassRegistry&); - void initializeInferAddressSpacesPass(PassRegistry&); - void initializeInferFunctionAttrsLegacyPassPass(PassRegistry&); -diff --git a/include/llvm/Target/TargetLowering.h b/include/llvm/Target/TargetLowering.h -index 23711d6..da6d1c4 100644 ---- a/include/llvm/Target/TargetLowering.h -+++ b/include/llvm/Target/TargetLowering.h -@@ -799,7 +799,7 @@ public: - } - - /// Return true if lowering to a jump table is allowed. -- bool areJTsAllowed(const Function *Fn) const { -+ virtual bool areJTsAllowed(const Function *Fn) const { - if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true") - return false; - -diff --git a/include/llvm/Target/TargetSubtargetInfo.h b/include/llvm/Target/TargetSubtargetInfo.h -index 9440c56..c561a88 100644 ---- a/include/llvm/Target/TargetSubtargetInfo.h -+++ b/include/llvm/Target/TargetSubtargetInfo.h -@@ -172,6 +172,9 @@ public: - /// \brief True if the subtarget should run the atomic expansion pass. - virtual bool enableAtomicExpand() const; - -+ /// True if the subtarget should run the indirectbr expansion pass. -+ virtual bool enableIndirectBrExpand() const; -+ - /// \brief Override generic scheduling policy within a region. 
- /// - /// This is a convenient way for targets that don't provide any custom -diff --git a/lib/CodeGen/CMakeLists.txt b/lib/CodeGen/CMakeLists.txt -index 7f3c6da..7c118a6 100644 ---- a/lib/CodeGen/CMakeLists.txt -+++ b/lib/CodeGen/CMakeLists.txt -@@ -34,6 +34,7 @@ add_llvm_library(LLVMCodeGen - GlobalMerge.cpp - IfConversion.cpp - ImplicitNullChecks.cpp -+ IndirectBrExpandPass.cpp - InlineSpiller.cpp - InterferenceCache.cpp - InterleavedAccessPass.cpp -diff --git a/lib/CodeGen/CodeGen.cpp b/lib/CodeGen/CodeGen.cpp -index b7fd45a..8074f47 100644 ---- a/lib/CodeGen/CodeGen.cpp -+++ b/lib/CodeGen/CodeGen.cpp -@@ -39,6 +39,7 @@ void llvm::initializeCodeGen(PassRegistry &Registry) { - initializeGCModuleInfoPass(Registry); - initializeIfConverterPass(Registry); - initializeImplicitNullChecksPass(Registry); -+ initializeIndirectBrExpandPassPass(Registry); - initializeInterleavedAccessPass(Registry); - initializeLiveDebugValuesPass(Registry); - initializeLiveDebugVariablesPass(Registry); -diff --git a/lib/CodeGen/IndirectBrExpandPass.cpp b/lib/CodeGen/IndirectBrExpandPass.cpp -new file mode 100644 -index 0000000..3adcda9 ---- /dev/null -+++ b/lib/CodeGen/IndirectBrExpandPass.cpp -@@ -0,0 +1,221 @@ -+//===- IndirectBrExpandPass.cpp - Expand indirectbr to switch -------------===// -+// -+// The LLVM Compiler Infrastructure -+// -+// This file is distributed under the University of Illinois Open Source -+// License. See LICENSE.TXT for details. -+// -+//===----------------------------------------------------------------------===// -+/// \file -+/// -+/// Implements an expansion pass to turn `indirectbr` instructions in the IR -+/// into `switch` instructions. This works by enumerating the basic blocks in -+/// a dense range of integers, replacing each `blockaddr` constant with the -+/// corresponding integer constant, and then building a switch that maps from -+/// the integers to the actual blocks. All of the indirectbr instructions in the -+/// function are redirected to this common switch. -+/// -+/// While this is generically useful if a target is unable to codegen -+/// `indirectbr` natively, it is primarily useful when there is some desire to -+/// get the builtin non-jump-table lowering of a switch even when the input -+/// source contained an explicit indirect branch construct. -+/// -+/// Note that it doesn't make any sense to enable this pass unless a target also -+/// disables jump-table lowering of switches. Doing that is likely to pessimize -+/// the code. 
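-+///
-+/// For example, the `indirectbr` produced for a GNU C computed goto with two
-+/// escaping labels:
-+///
-+///   indirectbr i8* %target, [label %bb1, label %bb2]
-+///
-+/// becomes, roughly, once each `blockaddress` constant has been rewritten to
-+/// its small integer index:
-+///
-+///   %idx = ptrtoint i8* %target to i64
-+///   switch i64 %idx, label %bb1 [ i64 2, label %bb2 ]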
-+///
-+//===----------------------------------------------------------------------===//
-+
-+#include "llvm/ADT/STLExtras.h"
-+#include "llvm/ADT/Sequence.h"
-+#include "llvm/ADT/SmallVector.h"
-+#include "llvm/CodeGen/TargetPassConfig.h"
-+#include "llvm/Target/TargetSubtargetInfo.h"
-+#include "llvm/IR/BasicBlock.h"
-+#include "llvm/IR/Function.h"
-+#include "llvm/IR/IRBuilder.h"
-+#include "llvm/IR/InstIterator.h"
-+#include "llvm/IR/Instruction.h"
-+#include "llvm/IR/Instructions.h"
-+#include "llvm/Pass.h"
-+#include "llvm/Support/Debug.h"
-+#include "llvm/Support/ErrorHandling.h"
-+#include "llvm/Support/raw_ostream.h"
-+#include "llvm/Target/TargetMachine.h"
-+
-+using namespace llvm;
-+
-+#define DEBUG_TYPE "indirectbr-expand"
-+
-+namespace {
-+
-+class IndirectBrExpandPass : public FunctionPass {
-+  const TargetLowering *TLI = nullptr;
-+
-+public:
-+  static char ID; // Pass identification, replacement for typeid
-+
-+  IndirectBrExpandPass() : FunctionPass(ID) {
-+    initializeIndirectBrExpandPassPass(*PassRegistry::getPassRegistry());
-+  }
-+
-+  bool runOnFunction(Function &F) override;
-+};
-+
-+} // end anonymous namespace
-+
-+char IndirectBrExpandPass::ID = 0;
-+
-+INITIALIZE_PASS(IndirectBrExpandPass, DEBUG_TYPE,
-+                "Expand indirectbr instructions", false, false)
-+
-+FunctionPass *llvm::createIndirectBrExpandPass() {
-+  return new IndirectBrExpandPass();
-+}
-+
-+bool IndirectBrExpandPass::runOnFunction(Function &F) {
-+  auto &DL = F.getParent()->getDataLayout();
-+  auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();
-+  if (!TPC)
-+    return false;
-+
-+  auto &TM = TPC->getTM<TargetMachine>();
-+  auto &STI = *TM.getSubtargetImpl(F);
-+  if (!STI.enableIndirectBrExpand())
-+    return false;
-+  TLI = STI.getTargetLowering();
-+
-+  SmallVector<IndirectBrInst *, 1> IndirectBrs;
-+
-+  // Set of all potential successors for indirectbr instructions.
-+  SmallPtrSet<BasicBlock *, 4> IndirectBrSuccs;
-+
-+  // Build a list of indirectbrs that we want to rewrite.
-+  for (BasicBlock &BB : F)
-+    if (auto *IBr = dyn_cast<IndirectBrInst>(BB.getTerminator())) {
-+      // Handle the degenerate case of no successors by replacing the indirectbr
-+      // with unreachable as there is no successor available.
-+      if (IBr->getNumSuccessors() == 0) {
-+        (void)new UnreachableInst(F.getContext(), IBr);
-+        IBr->eraseFromParent();
-+        continue;
-+      }
-+
-+      IndirectBrs.push_back(IBr);
-+      for (BasicBlock *SuccBB : IBr->successors())
-+        IndirectBrSuccs.insert(SuccBB);
-+    }
-+
-+  if (IndirectBrs.empty())
-+    return false;
-+
-+  // If we need to replace any indirectbrs we need to establish integer
-+  // constants that will correspond to each of the basic blocks in the function
-+  // whose address escapes. We do that here and rewrite all the blockaddress
-+  // constants to just be those integer constants cast to a pointer type.
-+  SmallVector<BasicBlock *, 4> BBs;
-+
-+  for (BasicBlock &BB : F) {
-+    // Skip blocks that aren't successors to an indirectbr we're going to
-+    // rewrite.
-+    if (!IndirectBrSuccs.count(&BB))
-+      continue;
-+
-+    auto IsBlockAddressUse = [&](const Use &U) {
-+      return isa<BlockAddress>(U.getUser());
-+    };
-+    auto BlockAddressUseIt = llvm::find_if(BB.uses(), IsBlockAddressUse);
-+    if (BlockAddressUseIt == BB.use_end())
-+      continue;
-+
-+    assert(std::find_if(std::next(BlockAddressUseIt), BB.use_end(),
-+                        IsBlockAddressUse) == BB.use_end() &&
-+           "There should only ever be a single blockaddress use because it is "
-+           "a constant and should be uniqued.");
-+
-+    auto *BA = cast<BlockAddress>(BlockAddressUseIt->getUser());
-+
-+    // Skip if the constant was formed but ended up not being used (due to DCE
-+    // or whatever).
-+    if (!BA->isConstantUsed())
-+      continue;
-+
-+    // Compute the index we want to use for this basic block. We can't use zero
-+    // because null can be compared with block addresses.
-+    int BBIndex = BBs.size() + 1;
-+    BBs.push_back(&BB);
-+
-+    auto *ITy = cast<IntegerType>(DL.getIntPtrType(BA->getType()));
-+    ConstantInt *BBIndexC = ConstantInt::get(ITy, BBIndex);
-+
-+    // Now rewrite the blockaddress to an integer constant based on the index.
-+    // FIXME: We could potentially preserve the uses as arguments to inline asm.
-+    // This would allow some uses such as diagnostic information in crashes to
-+    // have higher quality even when this transform is enabled, but would break
-+    // users that round-trip blockaddresses through inline assembly and then
-+    // back into an indirectbr.
-+    BA->replaceAllUsesWith(ConstantExpr::getIntToPtr(BBIndexC, BA->getType()));
-+  }
-+
-+  if (BBs.empty()) {
-+    // There are no blocks whose address is taken, so any indirectbr instruction
-+    // cannot get a valid input and we can replace all of them with unreachable.
-+    for (auto *IBr : IndirectBrs) {
-+      (void)new UnreachableInst(F.getContext(), IBr);
-+      IBr->eraseFromParent();
-+    }
-+    return true;
-+  }
-+
-+  BasicBlock *SwitchBB;
-+  Value *SwitchValue;
-+
-+  // Compute a common integer type across all the indirectbr instructions.
-+  IntegerType *CommonITy = nullptr;
-+  for (auto *IBr : IndirectBrs) {
-+    auto *ITy =
-+        cast<IntegerType>(DL.getIntPtrType(IBr->getAddress()->getType()));
-+    if (!CommonITy || ITy->getBitWidth() > CommonITy->getBitWidth())
-+      CommonITy = ITy;
-+  }
-+
-+  auto GetSwitchValue = [DL, CommonITy](IndirectBrInst *IBr) {
-+    return CastInst::CreatePointerCast(
-+        IBr->getAddress(), CommonITy,
-+        Twine(IBr->getAddress()->getName()) + ".switch_cast", IBr);
-+  };
-+
-+  if (IndirectBrs.size() == 1) {
-+    // If we only have one indirectbr, we can just directly replace it within
-+    // its block.
-+    SwitchBB = IndirectBrs[0]->getParent();
-+    SwitchValue = GetSwitchValue(IndirectBrs[0]);
-+    IndirectBrs[0]->eraseFromParent();
-+  } else {
-+    // Otherwise we need to create a new block to hold the switch across BBs,
-+    // jump to that block instead of each indirectbr, and phi together the
-+    // values for the switch.
-+    SwitchBB = BasicBlock::Create(F.getContext(), "switch_bb", &F);
-+    auto *SwitchPN = PHINode::Create(CommonITy, IndirectBrs.size(),
-+                                     "switch_value_phi", SwitchBB);
-+    SwitchValue = SwitchPN;
-+
-+    // Now replace the indirectbr instructions with direct branches to the
-+    // switch block and fill out the PHI operands.
-+    for (auto *IBr : IndirectBrs) {
-+      SwitchPN->addIncoming(GetSwitchValue(IBr), IBr->getParent());
-+      BranchInst::Create(SwitchBB, IBr);
-+      IBr->eraseFromParent();
-+    }
-+  }
-+
-+  // Now build the switch in the block. The block will have no terminator
-+  // already.
-+  auto *SI = SwitchInst::Create(SwitchValue, BBs[0], BBs.size(), SwitchBB);
-+
-+  // Add a case for each block.
-+  for (int i : llvm::seq<int>(1, BBs.size()))
-+    SI->addCase(ConstantInt::get(CommonITy, i + 1), BBs[i]);
-+
-+  return true;
-+}
-diff --git a/lib/CodeGen/TargetPassConfig.cpp b/lib/CodeGen/TargetPassConfig.cpp
-index 817e58c..624520c 100644
---- a/lib/CodeGen/TargetPassConfig.cpp
-+++ b/lib/CodeGen/TargetPassConfig.cpp
-@@ -790,6 +790,9 @@ void TargetPassConfig::addMachinePasses() {
-   if (EnableMachineOutliner)
-     PM->add(createMachineOutlinerPass());
- 
-+  // Add passes that directly emit MI after all other MI passes.
-+ addPreEmitPass2(); -+ - AddingMachinePasses = false; - } - -diff --git a/lib/CodeGen/TargetSubtargetInfo.cpp b/lib/CodeGen/TargetSubtargetInfo.cpp -index f6d5bc8..d02e39f 100644 ---- a/lib/CodeGen/TargetSubtargetInfo.cpp -+++ b/lib/CodeGen/TargetSubtargetInfo.cpp -@@ -37,6 +37,10 @@ bool TargetSubtargetInfo::enableAtomicExpand() const { - return true; - } - -+bool TargetSubtargetInfo::enableIndirectBrExpand() const { -+ return false; -+} -+ - bool TargetSubtargetInfo::enableMachineScheduler() const { - return false; - } -diff --git a/lib/Target/X86/CMakeLists.txt b/lib/Target/X86/CMakeLists.txt -index 6e08d4c..ae58dbd 100644 ---- a/lib/Target/X86/CMakeLists.txt -+++ b/lib/Target/X86/CMakeLists.txt -@@ -57,6 +57,7 @@ set(sources - X86OptimizeLEAs.cpp - X86PadShortFunction.cpp - X86RegisterInfo.cpp -+ X86RetpolineThunks.cpp - X86SelectionDAGInfo.cpp - X86ShuffleDecodeConstantPool.cpp - X86Subtarget.cpp -diff --git a/lib/Target/X86/X86.h b/lib/Target/X86/X86.h -index 91201d1..25e4b89 100644 ---- a/lib/Target/X86/X86.h -+++ b/lib/Target/X86/X86.h -@@ -22,6 +22,7 @@ namespace llvm { - class FunctionPass; - class ImmutablePass; - class InstructionSelector; -+class ModulePass; - class PassRegistry; - class X86RegisterBankInfo; - class X86Subtarget; -@@ -98,6 +99,9 @@ void initializeFixupBWInstPassPass(PassRegistry &); - /// encoding when possible in order to reduce code size. - FunctionPass *createX86EvexToVexInsts(); - -+/// This pass creates the thunks for the retpoline feature. -+ModulePass *createX86RetpolineThunksPass(); -+ - InstructionSelector *createX86InstructionSelector(const X86TargetMachine &TM, - X86Subtarget &, - X86RegisterBankInfo &); -diff --git a/lib/Target/X86/X86.td b/lib/Target/X86/X86.td -index 54eabea..62543b0 100644 ---- a/lib/Target/X86/X86.td -+++ b/lib/Target/X86/X86.td -@@ -290,6 +290,27 @@ def FeatureERMSB - "ermsb", "HasERMSB", "true", - "REP MOVS/STOS are fast">; - -+// Enable mitigation of some aspects of speculative execution related -+// vulnerabilities by removing speculatable indirect branches. This disables -+// jump-table formation, rewrites explicit `indirectbr` instructions into -+// `switch` instructions, and uses a special construct called a "retpoline" to -+// prevent speculation of the remaining indirect branches (indirect calls and -+// tail calls). -+def FeatureRetpoline -+ : SubtargetFeature<"retpoline", "UseRetpoline", "true", -+ "Remove speculation of indirect branches from the " -+ "generated code, either by avoiding them entirely or " -+ "lowering them with a speculation blocking construct.">; -+ -+// Rely on external thunks for the emitted retpoline calls. This allows users -+// to provide their own custom thunk definitions in highly specialized -+// environments such as a kernel that does boot-time hot patching. -+def FeatureRetpolineExternalThunk -+ : SubtargetFeature< -+ "retpoline-external-thunk", "UseRetpolineExternalThunk", "true", -+ "Enable retpoline, but with an externally provided thunk.", -+ [FeatureRetpoline]>; -+ - //===----------------------------------------------------------------------===// - // X86 processors supported. 
- //===----------------------------------------------------------------------===// -diff --git a/lib/Target/X86/X86AsmPrinter.h b/lib/Target/X86/X86AsmPrinter.h -index d7c3b74..3a31bfa 100644 ---- a/lib/Target/X86/X86AsmPrinter.h -+++ b/lib/Target/X86/X86AsmPrinter.h -@@ -30,6 +30,7 @@ class LLVM_LIBRARY_VISIBILITY X86AsmPrinter : public AsmPrinter { - StackMaps SM; - FaultMaps FM; - std::unique_ptr CodeEmitter; -+ bool NeedsRetpoline = false; - - // This utility class tracks the length of a stackmap instruction's 'shadow'. - // It is used by the X86AsmPrinter to ensure that the stackmap shadow -diff --git a/lib/Target/X86/X86FastISel.cpp b/lib/Target/X86/X86FastISel.cpp -index 527e5d5..71f30ba 100644 ---- a/lib/Target/X86/X86FastISel.cpp -+++ b/lib/Target/X86/X86FastISel.cpp -@@ -3161,6 +3161,10 @@ bool X86FastISel::fastLowerCall(CallLoweringInfo &CLI) { - (CalledFn && CalledFn->hasFnAttribute("no_caller_saved_registers"))) - return false; - -+ // Functions using retpoline should use SDISel for calls. -+ if (Subtarget->useRetpoline()) -+ return false; -+ - // Handle only C, fastcc, and webkit_js calling conventions for now. - switch (CC) { - default: return false; -diff --git a/lib/Target/X86/X86FrameLowering.cpp b/lib/Target/X86/X86FrameLowering.cpp -index f294e81..710ffa9 100644 ---- a/lib/Target/X86/X86FrameLowering.cpp -+++ b/lib/Target/X86/X86FrameLowering.cpp -@@ -742,6 +742,11 @@ void X86FrameLowering::emitStackProbeCall(MachineFunction &MF, - bool InProlog) const { - bool IsLargeCodeModel = MF.getTarget().getCodeModel() == CodeModel::Large; - -+ // FIXME: Add retpoline support and remove this. -+ if (Is64Bit && IsLargeCodeModel && STI.useRetpoline()) -+ report_fatal_error("Emitting stack probe calls on 64-bit with the large " -+ "code model and retpoline not yet implemented."); -+ - unsigned CallOp; - if (Is64Bit) - CallOp = IsLargeCodeModel ? X86::CALL64r : X86::CALL64pcrel32; -@@ -2337,6 +2342,10 @@ void X86FrameLowering::adjustForSegmentedStacks( - // This solution is not perfect, as it assumes that the .rodata section - // is laid out within 2^31 bytes of each function body, but this seems - // to be sufficient for JIT. -+ // FIXME: Add retpoline support and remove the error here.. -+ if (STI.useRetpoline()) -+ report_fatal_error("Emitting morestack calls on 64-bit with the large " -+ "code model and retpoline not yet implemented."); - BuildMI(allocMBB, DL, TII.get(X86::CALL64m)) - .addReg(X86::RIP) - .addImm(0) -diff --git a/lib/Target/X86/X86ISelDAGToDAG.cpp b/lib/Target/X86/X86ISelDAGToDAG.cpp -index 8f24f98..41d1a31 100644 ---- a/lib/Target/X86/X86ISelDAGToDAG.cpp -+++ b/lib/Target/X86/X86ISelDAGToDAG.cpp -@@ -550,11 +550,11 @@ void X86DAGToDAGISel::PreprocessISelDAG() { - SDNode *N = &*I++; // Preincrement iterator to avoid invalidation issues. - - if (OptLevel != CodeGenOpt::None && -- // Only does this when target favors doesn't favor register indirect -- // call. -+ // Only do this when the target can fold the load into the call or -+ // jmp. -+ !Subtarget->useRetpoline() && - ((N->getOpcode() == X86ISD::CALL && !Subtarget->callRegIndirect()) || - (N->getOpcode() == X86ISD::TC_RETURN && -- // Only does this if load can be folded into TC_RETURN. 
- (Subtarget->is64Bit() || - !getTargetMachine().isPositionIndependent())))) { - /// Also try moving call address load from outside callseq_start to just -diff --git a/lib/Target/X86/X86ISelLowering.cpp b/lib/Target/X86/X86ISelLowering.cpp -index 607bc45..2c2294d 100644 ---- a/lib/Target/X86/X86ISelLowering.cpp -+++ b/lib/Target/X86/X86ISelLowering.cpp -@@ -24994,6 +24994,15 @@ X86TargetLowering::isVectorClearMaskLegal(const SmallVectorImpl &Mask, - return isShuffleMaskLegal(Mask, VT); - } - -+bool X86TargetLowering::areJTsAllowed(const Function *Fn) const { -+ // If the subtarget is using retpolines, we need to not generate jump tables. -+ if (Subtarget.useRetpoline()) -+ return false; -+ -+ // Otherwise, fallback on the generic logic. -+ return TargetLowering::areJTsAllowed(Fn); -+} -+ - //===----------------------------------------------------------------------===// - // X86 Scheduler Hooks - //===----------------------------------------------------------------------===// -@@ -26225,6 +26234,115 @@ X86TargetLowering::EmitLoweredTLSCall(MachineInstr &MI, - return BB; - } - -+static unsigned getOpcodeForRetpoline(unsigned RPOpc) { -+ switch (RPOpc) { -+ case X86::RETPOLINE_CALL32: -+ return X86::CALLpcrel32; -+ case X86::RETPOLINE_CALL64: -+ return X86::CALL64pcrel32; -+ case X86::RETPOLINE_TCRETURN32: -+ return X86::TCRETURNdi; -+ case X86::RETPOLINE_TCRETURN64: -+ return X86::TCRETURNdi64; -+ } -+ llvm_unreachable("not retpoline opcode"); -+} -+ -+static const char *getRetpolineSymbol(const X86Subtarget &Subtarget, -+ unsigned Reg) { -+ switch (Reg) { -+ case 0: -+ assert(!Subtarget.is64Bit() && "R11 should always be available on x64"); -+ return Subtarget.useRetpolineExternalThunk() -+ ? "__llvm_external_retpoline_push" -+ : "__llvm_retpoline_push"; -+ case X86::EAX: -+ return Subtarget.useRetpolineExternalThunk() -+ ? "__llvm_external_retpoline_eax" -+ : "__llvm_retpoline_eax"; -+ case X86::ECX: -+ return Subtarget.useRetpolineExternalThunk() -+ ? "__llvm_external_retpoline_ecx" -+ : "__llvm_retpoline_ecx"; -+ case X86::EDX: -+ return Subtarget.useRetpolineExternalThunk() -+ ? "__llvm_external_retpoline_edx" -+ : "__llvm_retpoline_edx"; -+ case X86::R11: -+ return Subtarget.useRetpolineExternalThunk() -+ ? "__llvm_external_retpoline_r11" -+ : "__llvm_retpoline_r11"; -+ } -+ llvm_unreachable("unexpected reg for retpoline"); -+} -+ -+MachineBasicBlock * -+X86TargetLowering::EmitLoweredRetpoline(MachineInstr &MI, -+ MachineBasicBlock *BB) const { -+ // Copy the virtual register into the R11 physical register and -+ // call the retpoline thunk. -+ DebugLoc DL = MI.getDebugLoc(); -+ const X86InstrInfo *TII = Subtarget.getInstrInfo(); -+ unsigned CalleeVReg = MI.getOperand(0).getReg(); -+ unsigned Opc = getOpcodeForRetpoline(MI.getOpcode()); -+ -+ // Find an available scratch register to hold the callee. On 64-bit, we can -+ // just use R11, but we scan for uses anyway to ensure we don't generate -+ // incorrect code. On 32-bit, we use one of EAX, ECX, or EDX that isn't -+ // already a register use operand to the call to hold the callee. If none -+ // are available, push the callee instead. This is less efficient, but is -+ // necessary for functions using 3 regparms. Such function calls are -+ // (currently) not eligible for tail call optimization, because there is no -+ // scratch register available to hold the address of the callee. 
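-+  // For example, a 32-bit call through a pointer to a function declared with
-+  // __attribute__((regparm(3))) passes its arguments in EAX, ECX, and EDX,
-+  // leaving none of the scratch registers scanned below free for the callee.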
-+  SmallVector<unsigned, 3> AvailableRegs;
-+  if (Subtarget.is64Bit())
-+    AvailableRegs.push_back(X86::R11);
-+  else
-+    AvailableRegs.append({X86::EAX, X86::ECX, X86::EDX});
-+
-+  // Zero out any registers that are already used.
-+  for (const auto &MO : MI.operands()) {
-+    if (MO.isReg() && MO.isUse())
-+      for (unsigned &Reg : AvailableRegs)
-+        if (Reg == MO.getReg())
-+          Reg = 0;
-+  }
-+
-+  // Choose the first remaining non-zero available register.
-+  unsigned AvailableReg = 0;
-+  for (unsigned MaybeReg : AvailableRegs) {
-+    if (MaybeReg) {
-+      AvailableReg = MaybeReg;
-+      break;
-+    }
-+  }
-+
-+  const char *Symbol = getRetpolineSymbol(Subtarget, AvailableReg);
-+
-+  if (AvailableReg == 0) {
-+    // No register available. Use PUSH. This must not be a tailcall, and this
-+    // must not be x64.
-+    if (Subtarget.is64Bit())
-+      report_fatal_error(
-+          "Cannot make an indirect call on x86-64 using both retpoline and a "
-+          "calling convention that preserves r11");
-+    if (Opc != X86::CALLpcrel32)
-+      report_fatal_error("Cannot make an indirect tail call on x86 using "
-+                         "retpoline without a preserved register");
-+    BuildMI(*BB, MI, DL, TII->get(X86::PUSH32r)).addReg(CalleeVReg);
-+    MI.getOperand(0).ChangeToES(Symbol);
-+    MI.setDesc(TII->get(Opc));
-+  } else {
-+    BuildMI(*BB, MI, DL, TII->get(TargetOpcode::COPY), AvailableReg)
-+        .addReg(CalleeVReg);
-+    MI.getOperand(0).ChangeToES(Symbol);
-+    MI.setDesc(TII->get(Opc));
-+    MachineInstrBuilder(*BB->getParent(), &MI)
-+        .addReg(AvailableReg, RegState::Implicit | RegState::Kill);
-+  }
-+  return BB;
-+}
-+
- MachineBasicBlock *
- X86TargetLowering::emitEHSjLjSetJmp(MachineInstr &MI,
-                                     MachineBasicBlock *MBB) const {
-@@ -26689,6 +26807,11 @@ X86TargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
-   case X86::TLS_base_addr32:
-   case X86::TLS_base_addr64:
-     return EmitLoweredTLSAddr(MI, BB);
-+  case X86::RETPOLINE_CALL32:
-+  case X86::RETPOLINE_CALL64:
-+  case X86::RETPOLINE_TCRETURN32:
-+  case X86::RETPOLINE_TCRETURN64:
-+    return EmitLoweredRetpoline(MI, BB);
-   case X86::CATCHRET:
-     return EmitLoweredCatchRet(MI, BB);
-   case X86::CATCHPAD:
-diff --git a/lib/Target/X86/X86ISelLowering.h b/lib/Target/X86/X86ISelLowering.h
-index dbbc2bb..7eeb153 100644
---- a/lib/Target/X86/X86ISelLowering.h
-+++ b/lib/Target/X86/X86ISelLowering.h
-@@ -986,6 +986,9 @@ namespace llvm {
-     bool isVectorClearMaskLegal(const SmallVectorImpl<int> &Mask,
-                                 EVT VT) const override;
- 
-+    /// Returns true if lowering to a jump table is allowed.
-+    bool areJTsAllowed(const Function *Fn) const override;
-+
-     /// If true, then instruction selection should
-     /// seek to shrink the FP constant of the specified type to a smaller type
-     /// in order to save space and / or reduce runtime.
-@@ -1289,6 +1292,9 @@ namespace llvm { - MachineBasicBlock *EmitLoweredTLSCall(MachineInstr &MI, - MachineBasicBlock *BB) const; - -+ MachineBasicBlock *EmitLoweredRetpoline(MachineInstr &MI, -+ MachineBasicBlock *BB) const; -+ - MachineBasicBlock *emitEHSjLjSetJmp(MachineInstr &MI, - MachineBasicBlock *MBB) const; - -diff --git a/lib/Target/X86/X86InstrCompiler.td b/lib/Target/X86/X86InstrCompiler.td -index d003d02..296ea69 100644 ---- a/lib/Target/X86/X86InstrCompiler.td -+++ b/lib/Target/X86/X86InstrCompiler.td -@@ -1106,14 +1106,14 @@ def X86tcret_6regs : PatFrag<(ops node:$ptr, node:$off), - - def : Pat<(X86tcret ptr_rc_tailcall:$dst, imm:$off), - (TCRETURNri ptr_rc_tailcall:$dst, imm:$off)>, -- Requires<[Not64BitMode]>; -+ Requires<[Not64BitMode, NotUseRetpoline]>; - - // FIXME: This is disabled for 32-bit PIC mode because the global base - // register which is part of the address mode may be assigned a - // callee-saved register. - def : Pat<(X86tcret (load addr:$dst), imm:$off), - (TCRETURNmi addr:$dst, imm:$off)>, -- Requires<[Not64BitMode, IsNotPIC]>; -+ Requires<[Not64BitMode, IsNotPIC, NotUseRetpoline]>; - - def : Pat<(X86tcret (i32 tglobaladdr:$dst), imm:$off), - (TCRETURNdi tglobaladdr:$dst, imm:$off)>, -@@ -1125,13 +1125,21 @@ def : Pat<(X86tcret (i32 texternalsym:$dst), imm:$off), - - def : Pat<(X86tcret ptr_rc_tailcall:$dst, imm:$off), - (TCRETURNri64 ptr_rc_tailcall:$dst, imm:$off)>, -- Requires<[In64BitMode]>; -+ Requires<[In64BitMode, NotUseRetpoline]>; - - // Don't fold loads into X86tcret requiring more than 6 regs. - // There wouldn't be enough scratch registers for base+index. - def : Pat<(X86tcret_6regs (load addr:$dst), imm:$off), - (TCRETURNmi64 addr:$dst, imm:$off)>, -- Requires<[In64BitMode]>; -+ Requires<[In64BitMode, NotUseRetpoline]>; -+ -+def : Pat<(X86tcret ptr_rc_tailcall:$dst, imm:$off), -+ (RETPOLINE_TCRETURN64 ptr_rc_tailcall:$dst, imm:$off)>, -+ Requires<[In64BitMode, UseRetpoline]>; -+ -+def : Pat<(X86tcret ptr_rc_tailcall:$dst, imm:$off), -+ (RETPOLINE_TCRETURN32 ptr_rc_tailcall:$dst, imm:$off)>, -+ Requires<[Not64BitMode, UseRetpoline]>; - - def : Pat<(X86tcret (i64 tglobaladdr:$dst), imm:$off), - (TCRETURNdi64 tglobaladdr:$dst, imm:$off)>, -diff --git a/lib/Target/X86/X86InstrControl.td b/lib/Target/X86/X86InstrControl.td -index 4ea223e..f139364 100644 ---- a/lib/Target/X86/X86InstrControl.td -+++ b/lib/Target/X86/X86InstrControl.td -@@ -211,11 +211,12 @@ let isCall = 1 in - Sched<[WriteJumpLd]>; - def CALL32r : I<0xFF, MRM2r, (outs), (ins GR32:$dst), - "call{l}\t{*}$dst", [(X86call GR32:$dst)], IIC_CALL_RI>, -- OpSize32, Requires<[Not64BitMode]>, Sched<[WriteJump]>; -+ OpSize32, Requires<[Not64BitMode,NotUseRetpoline]>, -+ Sched<[WriteJump]>; - def CALL32m : I<0xFF, MRM2m, (outs), (ins i32mem:$dst), - "call{l}\t{*}$dst", [(X86call (loadi32 addr:$dst))], - IIC_CALL_MEM>, OpSize32, -- Requires<[Not64BitMode,FavorMemIndirectCall]>, -+ Requires<[Not64BitMode,FavorMemIndirectCall,NotUseRetpoline]>, - Sched<[WriteJumpLd]>; - - let Predicates = [Not64BitMode] in { -@@ -298,11 +299,12 @@ let isCall = 1, Uses = [RSP], SchedRW = [WriteJump] in { - def CALL64r : I<0xFF, MRM2r, (outs), (ins GR64:$dst), - "call{q}\t{*}$dst", [(X86call GR64:$dst)], - IIC_CALL_RI>, -- Requires<[In64BitMode]>; -+ Requires<[In64BitMode,NotUseRetpoline]>; - def CALL64m : I<0xFF, MRM2m, (outs), (ins i64mem:$dst), - "call{q}\t{*}$dst", [(X86call (loadi64 addr:$dst))], - IIC_CALL_MEM>, -- Requires<[In64BitMode,FavorMemIndirectCall]>; -+ Requires<[In64BitMode,FavorMemIndirectCall, -+ 
NotUseRetpoline]>; - - def FARCALL64 : RI<0xFF, MRM3m, (outs), (ins opaque80mem:$dst), - "lcall{q}\t{*}$dst", [], IIC_CALL_FAR_MEM>; -@@ -341,6 +343,27 @@ let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1, - } - } - -+let isPseudo = 1, isCall = 1, isCodeGenOnly = 1, -+ Uses = [RSP], -+ usesCustomInserter = 1, -+ SchedRW = [WriteJump] in { -+ def RETPOLINE_CALL32 : -+ PseudoI<(outs), (ins GR32:$dst), [(X86call GR32:$dst)]>, -+ Requires<[Not64BitMode,UseRetpoline]>; -+ -+ def RETPOLINE_CALL64 : -+ PseudoI<(outs), (ins GR64:$dst), [(X86call GR64:$dst)]>, -+ Requires<[In64BitMode,UseRetpoline]>; -+ -+ // Retpoline variant of indirect tail calls. -+ let isTerminator = 1, isReturn = 1, isBarrier = 1 in { -+ def RETPOLINE_TCRETURN64 : -+ PseudoI<(outs), (ins GR64:$dst, i32imm:$offset), []>; -+ def RETPOLINE_TCRETURN32 : -+ PseudoI<(outs), (ins GR32:$dst, i32imm:$offset), []>; -+ } -+} -+ - // Conditional tail calls are similar to the above, but they are branches - // rather than barriers, and they use EFLAGS. - let isCall = 1, isTerminator = 1, isReturn = 1, isBranch = 1, -diff --git a/lib/Target/X86/X86InstrInfo.td b/lib/Target/X86/X86InstrInfo.td -index fab70e9..0ba2d3a 100644 ---- a/lib/Target/X86/X86InstrInfo.td -+++ b/lib/Target/X86/X86InstrInfo.td -@@ -917,6 +917,8 @@ def HasFastLZCNT : Predicate<"Subtarget->hasFastLZCNT()">; - def HasFastSHLDRotate : Predicate<"Subtarget->hasFastSHLDRotate()">; - def HasERMSB : Predicate<"Subtarget->hasERMSB()">; - def HasMFence : Predicate<"Subtarget->hasMFence()">; -+def UseRetpoline : Predicate<"Subtarget->useRetpoline()">; -+def NotUseRetpoline : Predicate<"!Subtarget->useRetpoline()">; - - //===----------------------------------------------------------------------===// - // X86 Instruction Format Definitions. -diff --git a/lib/Target/X86/X86MCInstLower.cpp b/lib/Target/X86/X86MCInstLower.cpp -index fd2837b..a1dc5f9 100644 ---- a/lib/Target/X86/X86MCInstLower.cpp -+++ b/lib/Target/X86/X86MCInstLower.cpp -@@ -874,6 +874,10 @@ void X86AsmPrinter::LowerSTATEPOINT(const MachineInstr &MI, - // address is to far away. (TODO: support non-relative addressing) - break; - case MachineOperand::MO_Register: -+ // FIXME: Add retpoline support and remove this. -+ if (Subtarget->useRetpoline()) -+ report_fatal_error("Lowering register statepoints with retpoline not " -+ "yet implemented."); - CallTargetMCOp = MCOperand::createReg(CallTarget.getReg()); - CallOpcode = X86::CALL64r; - break; -@@ -1028,6 +1032,10 @@ void X86AsmPrinter::LowerPATCHPOINT(const MachineInstr &MI, - - EmitAndCountInstruction( - MCInstBuilder(X86::MOV64ri).addReg(ScratchReg).addOperand(CalleeMCOp)); -+ // FIXME: Add retpoline support and remove this. -+ if (Subtarget->useRetpoline()) -+ report_fatal_error( -+ "Lowering patchpoint with retpoline not yet implemented."); - EmitAndCountInstruction(MCInstBuilder(X86::CALL64r).addReg(ScratchReg)); - } - -diff --git a/lib/Target/X86/X86RetpolineThunks.cpp b/lib/Target/X86/X86RetpolineThunks.cpp -new file mode 100644 -index 0000000..6b4bc8a ---- /dev/null -+++ b/lib/Target/X86/X86RetpolineThunks.cpp -@@ -0,0 +1,276 @@ -+//======- X86RetpolineThunks.cpp - Construct retpoline thunks for x86 --=====// -+// -+// The LLVM Compiler Infrastructure -+// -+// This file is distributed under the University of Illinois Open Source -+// License. See LICENSE.TXT for details. -+// -+//===----------------------------------------------------------------------===// -+/// \file -+/// -+/// Pass that injects an MI thunk implementing a "retpoline". 
-+/// This is a RET-implemented trampoline that is used to lower indirect calls
-+/// in a way that prevents speculation on some x86 processors and can be used
-+/// to mitigate security vulnerabilities due to targeted speculative execution
-+/// and side channels such as CVE-2017-5715.
-+///
-+/// TODO(chandlerc): All of this code could use better comments and
-+/// documentation.
-+///
-+//===----------------------------------------------------------------------===//
-+
-+#include "X86.h"
-+#include "X86InstrBuilder.h"
-+#include "X86Subtarget.h"
-+#include "llvm/CodeGen/MachineFunction.h"
-+#include "llvm/CodeGen/MachineInstrBuilder.h"
-+#include "llvm/CodeGen/MachineModuleInfo.h"
-+#include "llvm/CodeGen/Passes.h"
-+#include "llvm/CodeGen/TargetPassConfig.h"
-+#include "llvm/IR/IRBuilder.h"
-+#include "llvm/IR/Instructions.h"
-+#include "llvm/IR/Module.h"
-+#include "llvm/Support/CommandLine.h"
-+#include "llvm/Support/Debug.h"
-+#include "llvm/Support/raw_ostream.h"
-+
-+using namespace llvm;
-+
-+#define DEBUG_TYPE "x86-retpoline-thunks"
-+
-+namespace {
-+class X86RetpolineThunks : public ModulePass {
-+public:
-+  static char ID;
-+
-+  X86RetpolineThunks() : ModulePass(ID) {}
-+
-+  StringRef getPassName() const override { return "X86 Retpoline Thunks"; }
-+
-+  bool runOnModule(Module &M) override;
-+
-+  void getAnalysisUsage(AnalysisUsage &AU) const override {
-+    AU.addRequired<MachineModuleInfo>();
-+    AU.addPreserved<MachineModuleInfo>();
-+  }
-+
-+private:
-+  MachineModuleInfo *MMI;
-+  const TargetMachine *TM;
-+  bool Is64Bit;
-+  const X86Subtarget *STI;
-+  const X86InstrInfo *TII;
-+
-+  Function *createThunkFunction(Module &M, StringRef Name);
-+  void insertRegReturnAddrClobber(MachineBasicBlock &MBB, unsigned Reg);
-+  void insert32BitPushReturnAddrClobber(MachineBasicBlock &MBB);
-+  void createThunk(Module &M, StringRef NameSuffix,
-+                   Optional<unsigned> Reg = None);
-+};
-+
-+} // end anonymous namespace
-+
-+ModulePass *llvm::createX86RetpolineThunksPass() {
-+  return new X86RetpolineThunks();
-+}
-+
-+char X86RetpolineThunks::ID = 0;
-+
-+bool X86RetpolineThunks::runOnModule(Module &M) {
-+  DEBUG(dbgs() << getPassName() << '\n');
-+
-+  auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();
-+  assert(TPC && "X86-specific target pass should not be run without a target "
-+                "pass config!");
-+
-+  MMI = &getAnalysis<MachineModuleInfo>();
-+  TM = &TPC->getTM<TargetMachine>();
-+  Is64Bit = TM->getTargetTriple().getArch() == Triple::x86_64;
-+
-+  // Only add a thunk if we have at least one function that has the retpoline
-+  // feature enabled in its subtarget.
-+  // FIXME: Conditionalize on indirect calls so we don't emit a thunk when
-+  // nothing will end up calling it.
-+  // FIXME: It's a little silly to look at every function just to enumerate
-+  // the subtargets, but eventually we'll want to look at them for indirect
-+  // calls, so maybe this is OK.
-+  if (!llvm::any_of(M, [&](const Function &F) {
-+        // Save the subtarget we find for use in emitting the subsequent
-+        // thunk.
-+        STI = &TM->getSubtarget<X86Subtarget>(F);
-+        return STI->useRetpoline() && !STI->useRetpolineExternalThunk();
-+      }))
-+    return false;
-+
-+  // If we have a relevant subtarget, get the instr info as well.
-+ TII = STI->getInstrInfo(); -+ -+ if (Is64Bit) { -+ // __llvm_retpoline_r11: -+ // callq .Lr11_call_target -+ // .Lr11_capture_spec: -+ // pause -+ // lfence -+ // jmp .Lr11_capture_spec -+ // .align 16 -+ // .Lr11_call_target: -+ // movq %r11, (%rsp) -+ // retq -+ -+ createThunk(M, "r11", X86::R11); -+ } else { -+ // For 32-bit targets we need to emit a collection of thunks for various -+ // possible scratch registers as well as a fallback that is used when -+ // there are no scratch registers and assumes the retpoline target has -+ // been pushed. -+ // __llvm_retpoline_eax: -+ // calll .Leax_call_target -+ // .Leax_capture_spec: -+ // pause -+ // jmp .Leax_capture_spec -+ // .align 16 -+ // .Leax_call_target: -+ // movl %eax, (%esp) # Clobber return addr -+ // retl -+ // -+ // __llvm_retpoline_ecx: -+ // ... # Same setup -+ // movl %ecx, (%esp) -+ // retl -+ // -+ // __llvm_retpoline_edx: -+ // ... # Same setup -+ // movl %edx, (%esp) -+ // retl -+ // -+ // This last one is a bit more special and so needs a little extra -+ // handling. -+ // __llvm_retpoline_push: -+ // calll .Lpush_call_target -+ // .Lpush_capture_spec: -+ // pause -+ // lfence -+ // jmp .Lpush_capture_spec -+ // .align 16 -+ // .Lpush_call_target: -+ // # Clear pause_loop return address. -+ // addl $4, %esp -+ // # Top of stack words are: Callee, RA. Exchange Callee and RA. -+ // pushl 4(%esp) # Push callee -+ // pushl 4(%esp) # Push RA -+ // popl 8(%esp) # Pop RA to final RA -+ // popl (%esp) # Pop callee to next top of stack -+ // retl # Ret to callee -+ createThunk(M, "eax", X86::EAX); -+ createThunk(M, "ecx", X86::ECX); -+ createThunk(M, "edx", X86::EDX); -+ createThunk(M, "push"); -+ } -+ -+ return true; -+} -+ -+Function *X86RetpolineThunks::createThunkFunction(Module &M, StringRef Name) { -+ LLVMContext &Ctx = M.getContext(); -+ auto Type = FunctionType::get(Type::getVoidTy(Ctx), false); -+ Function *F = -+ Function::Create(Type, GlobalValue::LinkOnceODRLinkage, Name, &M); -+ F->setVisibility(GlobalValue::HiddenVisibility); -+ F->setComdat(M.getOrInsertComdat(Name)); -+ -+ // Add Attributes so that we don't create a frame, unwind information, or -+ // inline. -+ AttrBuilder B; -+ B.addAttribute(llvm::Attribute::NoUnwind); -+ B.addAttribute(llvm::Attribute::Naked); -+ F->addAttributes(llvm::AttributeList::FunctionIndex, B); -+ -+ // Populate our function a bit so that we can verify. -+ BasicBlock *Entry = BasicBlock::Create(Ctx, "entry", F); -+ IRBuilder<> Builder(Entry); -+ -+ Builder.CreateRetVoid(); -+ return F; -+} -+ -+void X86RetpolineThunks::insertRegReturnAddrClobber(MachineBasicBlock &MBB, -+ unsigned Reg) { -+ const unsigned MovOpc = Is64Bit ? X86::MOV64mr : X86::MOV32mr; -+ const unsigned SPReg = Is64Bit ? X86::RSP : X86::ESP; -+ addRegOffset(BuildMI(&MBB, DebugLoc(), TII->get(MovOpc)), SPReg, false, 0) -+ .addReg(Reg); -+} -+void X86RetpolineThunks::insert32BitPushReturnAddrClobber( -+ MachineBasicBlock &MBB) { -+ // The instruction sequence we use to replace the return address without -+ // a scratch register is somewhat complicated: -+ // # Clear capture_spec from return address. -+ // addl $4, %esp -+ // # Top of stack words are: Callee, RA. Exchange Callee and RA. 
-+ // pushl 4(%esp) # Push callee -+ // pushl 4(%esp) # Push RA -+ // popl 8(%esp) # Pop RA to final RA -+ // popl (%esp) # Pop callee to next top of stack -+ // retl # Ret to callee -+ BuildMI(&MBB, DebugLoc(), TII->get(X86::ADD32ri), X86::ESP) -+ .addReg(X86::ESP) -+ .addImm(4); -+ addRegOffset(BuildMI(&MBB, DebugLoc(), TII->get(X86::PUSH32rmm)), X86::ESP, -+ false, 4); -+ addRegOffset(BuildMI(&MBB, DebugLoc(), TII->get(X86::PUSH32rmm)), X86::ESP, -+ false, 4); -+ addRegOffset(BuildMI(&MBB, DebugLoc(), TII->get(X86::POP32rmm)), X86::ESP, -+ false, 8); -+ addRegOffset(BuildMI(&MBB, DebugLoc(), TII->get(X86::POP32rmm)), X86::ESP, -+ false, 0); -+} -+ -+void X86RetpolineThunks::createThunk(Module &M, StringRef NameSuffix, -+ Optional Reg) { -+ Function &F = -+ *createThunkFunction(M, (Twine("__llvm_retpoline_") + NameSuffix).str()); -+ MachineFunction &MF = MMI->getOrCreateMachineFunction(F); -+ -+ // Set MF properties. We never use vregs... -+ MF.getProperties().set(MachineFunctionProperties::Property::NoVRegs); -+ -+ BasicBlock &OrigEntryBB = F.getEntryBlock(); -+ MachineBasicBlock *Entry = MF.CreateMachineBasicBlock(&OrigEntryBB); -+ MachineBasicBlock *CaptureSpec = MF.CreateMachineBasicBlock(&OrigEntryBB); -+ MachineBasicBlock *CallTarget = MF.CreateMachineBasicBlock(&OrigEntryBB); -+ -+ MF.push_back(Entry); -+ MF.push_back(CaptureSpec); -+ MF.push_back(CallTarget); -+ -+ const unsigned CallOpc = Is64Bit ? X86::CALL64pcrel32 : X86::CALLpcrel32; -+ const unsigned RetOpc = Is64Bit ? X86::RETQ : X86::RETL; -+ -+ BuildMI(Entry, DebugLoc(), TII->get(CallOpc)).addMBB(CallTarget); -+ Entry->addSuccessor(CallTarget); -+ Entry->addSuccessor(CaptureSpec); -+ CallTarget->setHasAddressTaken(); -+ -+ // In the capture loop for speculation, we want to stop the processor from -+ // speculating as fast as possible. On Intel processors, the PAUSE instruction -+ // will block speculation without consuming any execution resources. On AMD -+ // processors, the PAUSE instruction is (essentially) a nop, so we also use an -+ // LFENCE instruction which they have advised will stop speculation as well -+ // with minimal resource utilization. We still end the capture with a jump to -+ // form an infinite loop to fully guarantee that no matter what implementation -+ // of the x86 ISA, speculating this code path never escapes. 
-+ BuildMI(CaptureSpec, DebugLoc(), TII->get(X86::PAUSE)); -+ BuildMI(CaptureSpec, DebugLoc(), TII->get(X86::LFENCE)); -+ BuildMI(CaptureSpec, DebugLoc(), TII->get(X86::JMP_1)).addMBB(CaptureSpec); -+ CaptureSpec->setHasAddressTaken(); -+ CaptureSpec->addSuccessor(CaptureSpec); -+ -+ CallTarget->setAlignment(4); -+ if (Reg) { -+ insertRegReturnAddrClobber(*CallTarget, *Reg); -+ } else { -+ assert(!Is64Bit && "We only support non-reg thunks on 32-bit x86!"); -+ insert32BitPushReturnAddrClobber(*CallTarget); -+ } -+ BuildMI(CallTarget, DebugLoc(), TII->get(RetOpc)); -+} -diff --git a/lib/Target/X86/X86Subtarget.cpp b/lib/Target/X86/X86Subtarget.cpp -index 24845be..0180090 100644 ---- a/lib/Target/X86/X86Subtarget.cpp -+++ b/lib/Target/X86/X86Subtarget.cpp -@@ -315,6 +315,8 @@ void X86Subtarget::initializeEnvironment() { - HasCLFLUSHOPT = false; - HasCLWB = false; - IsBTMemSlow = false; -+ UseRetpoline = false; -+ UseRetpolineExternalThunk = false; - IsPMULLDSlow = false; - IsSHLDSlow = false; - IsUAMem16Slow = false; -diff --git a/lib/Target/X86/X86Subtarget.h b/lib/Target/X86/X86Subtarget.h -index 427a000..614f833 100644 ---- a/lib/Target/X86/X86Subtarget.h -+++ b/lib/Target/X86/X86Subtarget.h -@@ -297,6 +297,14 @@ protected: - /// Processor supports Cache Line Write Back instruction - bool HasCLWB; - -+ /// Use a retpoline thunk rather than indirect calls to block speculative -+ /// execution. -+ bool UseRetpoline; -+ -+ /// When using a retpoline thunk, call an externally provided thunk rather -+ /// than emitting one inside the compiler. -+ bool UseRetpolineExternalThunk; -+ - /// Use software floating point for code generation. - bool UseSoftFloat; - -@@ -506,6 +514,8 @@ public: - bool hasPKU() const { return HasPKU; } - bool hasMPX() const { return HasMPX; } - bool hasCLFLUSHOPT() const { return HasCLFLUSHOPT; } -+ bool useRetpoline() const { return UseRetpoline; } -+ bool useRetpolineExternalThunk() const { return UseRetpolineExternalThunk; } - - bool isXRaySupported() const override { return is64Bit(); } - -@@ -639,6 +649,10 @@ public: - /// compiler runtime or math libraries. - bool hasSinCos() const; - -+ /// If we are using retpolines, we need to expand indirectbr to avoid it -+ /// lowering to an actual indirect jump. -+ bool enableIndirectBrExpand() const override { return useRetpoline(); } -+ - /// Enable the MachineScheduler pass for all X86 subtargets. - bool enableMachineScheduler() const override { return true; } - -diff --git a/lib/Target/X86/X86TargetMachine.cpp b/lib/Target/X86/X86TargetMachine.cpp -index 08c2cda..939e447 100644 ---- a/lib/Target/X86/X86TargetMachine.cpp -+++ b/lib/Target/X86/X86TargetMachine.cpp -@@ -305,6 +305,7 @@ public: - void addPreRegAlloc() override; - void addPostRegAlloc() override; - void addPreEmitPass() override; -+ void addPreEmitPass2() override; - void addPreSched2() override; - }; - -@@ -334,6 +335,11 @@ void X86PassConfig::addIRPasses() { - - if (TM->getOptLevel() != CodeGenOpt::None) - addPass(createInterleavedAccessPass()); -+ -+ // Add passes that handle indirect branch removal and insertion of a retpoline -+ // thunk. These will be a no-op unless a function subtarget has the retpoline -+ // feature enabled. 
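-+  // (Only the IR-level indirectbr expansion is added here; the thunk bodies
-+  // themselves are emitted by createX86RetpolineThunksPass from
-+  // addPreEmitPass2 below.)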
-+ addPass(createIndirectBrExpandPass()); - } - - bool X86PassConfig::addInstSelector() { -@@ -418,3 +424,7 @@ void X86PassConfig::addPreEmitPass() { - addPass(createX86EvexToVexInsts()); - } - } -+ -+void X86PassConfig::addPreEmitPass2() { -+ addPass(createX86RetpolineThunksPass()); -+} -diff --git a/test/CodeGen/X86/O0-pipeline.ll b/test/CodeGen/X86/O0-pipeline.ll -index 5e375cc..f9bd66f 100644 ---- a/test/CodeGen/X86/O0-pipeline.ll -+++ b/test/CodeGen/X86/O0-pipeline.ll -@@ -25,6 +25,7 @@ - ; CHECK-NEXT: Inserts calls to mcount-like functions - ; CHECK-NEXT: Scalarize Masked Memory Intrinsics - ; CHECK-NEXT: Expand reduction intrinsics -+; CHECK-NEXT: Expand indirectbr instructions - ; CHECK-NEXT: Rewrite Symbols - ; CHECK-NEXT: FunctionPass Manager - ; CHECK-NEXT: Dominator Tree Construction -@@ -55,6 +56,8 @@ - ; CHECK-NEXT: Machine Natural Loop Construction - ; CHECK-NEXT: Insert XRay ops - ; CHECK-NEXT: Implement the 'patchable-function' attribute -+; CHECK-NEXT: X86 Retpoline Thunks -+; CHECK-NEXT: FunctionPass Manager - ; CHECK-NEXT: Lazy Machine Block Frequency Analysis - ; CHECK-NEXT: Machine Optimization Remark Emitter - ; CHECK-NEXT: MachineDominator Tree Construction -diff --git a/test/CodeGen/X86/retpoline-external.ll b/test/CodeGen/X86/retpoline-external.ll -new file mode 100644 -index 0000000..66d32ba ---- /dev/null -+++ b/test/CodeGen/X86/retpoline-external.ll -@@ -0,0 +1,166 @@ -+; RUN: llc -mtriple=x86_64-unknown < %s | FileCheck %s --implicit-check-not="jmp.*\*" --implicit-check-not="call.*\*" --check-prefix=X64 -+; RUN: llc -mtriple=x86_64-unknown -O0 < %s | FileCheck %s --implicit-check-not="jmp.*\*" --implicit-check-not="call.*\*" --check-prefix=X64FAST -+ -+; RUN: llc -mtriple=i686-unknown < %s | FileCheck %s --implicit-check-not="jmp.*\*" --implicit-check-not="call.*\*" --check-prefix=X86 -+; RUN: llc -mtriple=i686-unknown -O0 < %s | FileCheck %s --implicit-check-not="jmp.*\*" --implicit-check-not="call.*\*" --check-prefix=X86FAST -+ -+declare void @bar(i32) -+ -+; Test a simple indirect call and tail call. 
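-+; (Attribute group #0, defined at the end of this file, enables
-+; "+retpoline-external-thunk", so every indirect branch below is expected to
-+; go through an __llvm_external_retpoline_* symbol, and the final checks
-+; verify that no thunk bodies are emitted into this module.)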
-+define void @icall_reg(void (i32)* %fp, i32 %x) #0 { -+entry: -+ tail call void @bar(i32 %x) -+ tail call void %fp(i32 %x) -+ tail call void @bar(i32 %x) -+ tail call void %fp(i32 %x) -+ ret void -+} -+ -+; X64-LABEL: icall_reg: -+; X64-DAG: movq %rdi, %[[fp:[^ ]*]] -+; X64-DAG: movl %esi, %[[x:[^ ]*]] -+; X64: movl %[[x]], %edi -+; X64: callq bar -+; X64-DAG: movl %[[x]], %edi -+; X64-DAG: movq %[[fp]], %r11 -+; X64: callq __llvm_external_retpoline_r11 -+; X64: movl %[[x]], %edi -+; X64: callq bar -+; X64-DAG: movl %[[x]], %edi -+; X64-DAG: movq %[[fp]], %r11 -+; X64: jmp __llvm_external_retpoline_r11 # TAILCALL -+ -+; X64FAST-LABEL: icall_reg: -+; X64FAST: callq bar -+; X64FAST: callq __llvm_external_retpoline_r11 -+; X64FAST: callq bar -+; X64FAST: jmp __llvm_external_retpoline_r11 # TAILCALL -+ -+; X86-LABEL: icall_reg: -+; X86-DAG: movl 12(%esp), %[[fp:[^ ]*]] -+; X86-DAG: movl 16(%esp), %[[x:[^ ]*]] -+; X86: pushl %[[x]] -+; X86: calll bar -+; X86: movl %[[fp]], %eax -+; X86: pushl %[[x]] -+; X86: calll __llvm_external_retpoline_eax -+; X86: pushl %[[x]] -+; X86: calll bar -+; X86: movl %[[fp]], %eax -+; X86: pushl %[[x]] -+; X86: calll __llvm_external_retpoline_eax -+; X86-NOT: # TAILCALL -+ -+; X86FAST-LABEL: icall_reg: -+; X86FAST: calll bar -+; X86FAST: calll __llvm_external_retpoline_eax -+; X86FAST: calll bar -+; X86FAST: calll __llvm_external_retpoline_eax -+ -+ -+@global_fp = external global void (i32)* -+ -+; Test an indirect call through a global variable. -+define void @icall_global_fp(i32 %x, void (i32)** %fpp) #0 { -+ %fp1 = load void (i32)*, void (i32)** @global_fp -+ call void %fp1(i32 %x) -+ %fp2 = load void (i32)*, void (i32)** @global_fp -+ tail call void %fp2(i32 %x) -+ ret void -+} -+ -+; X64-LABEL: icall_global_fp: -+; X64-DAG: movl %edi, %[[x:[^ ]*]] -+; X64-DAG: movq global_fp(%rip), %r11 -+; X64: callq __llvm_external_retpoline_r11 -+; X64-DAG: movl %[[x]], %edi -+; X64-DAG: movq global_fp(%rip), %r11 -+; X64: jmp __llvm_external_retpoline_r11 # TAILCALL -+ -+; X64FAST-LABEL: icall_global_fp: -+; X64FAST: movq global_fp(%rip), %r11 -+; X64FAST: callq __llvm_external_retpoline_r11 -+; X64FAST: movq global_fp(%rip), %r11 -+; X64FAST: jmp __llvm_external_retpoline_r11 # TAILCALL -+ -+; X86-LABEL: icall_global_fp: -+; X86: movl global_fp, %eax -+; X86: pushl 4(%esp) -+; X86: calll __llvm_external_retpoline_eax -+; X86: addl $4, %esp -+; X86: movl global_fp, %eax -+; X86: jmp __llvm_external_retpoline_eax # TAILCALL -+ -+; X86FAST-LABEL: icall_global_fp: -+; X86FAST: calll __llvm_external_retpoline_eax -+; X86FAST: jmp __llvm_external_retpoline_eax # TAILCALL -+ -+ -+%struct.Foo = type { void (%struct.Foo*)** } -+ -+; Test an indirect call through a vtable. 
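-+; (The virtual function pointer is loaded once and called twice, so the checks
-+; cover both the plain call and the tail call through the thunk.)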
-+define void @vcall(%struct.Foo* %obj) #0 { -+ %vptr_field = getelementptr %struct.Foo, %struct.Foo* %obj, i32 0, i32 0 -+ %vptr = load void (%struct.Foo*)**, void (%struct.Foo*)*** %vptr_field -+ %vslot = getelementptr void(%struct.Foo*)*, void(%struct.Foo*)** %vptr, i32 1 -+ %fp = load void(%struct.Foo*)*, void(%struct.Foo*)** %vslot -+ tail call void %fp(%struct.Foo* %obj) -+ tail call void %fp(%struct.Foo* %obj) -+ ret void -+} -+ -+; X64-LABEL: vcall: -+; X64: movq %rdi, %[[obj:[^ ]*]] -+; X64: movq (%[[obj]]), %[[vptr:[^ ]*]] -+; X64: movq 8(%[[vptr]]), %[[fp:[^ ]*]] -+; X64: movq %[[fp]], %r11 -+; X64: callq __llvm_external_retpoline_r11 -+; X64-DAG: movq %[[obj]], %rdi -+; X64-DAG: movq %[[fp]], %r11 -+; X64: jmp __llvm_external_retpoline_r11 # TAILCALL -+ -+; X64FAST-LABEL: vcall: -+; X64FAST: callq __llvm_external_retpoline_r11 -+; X64FAST: jmp __llvm_external_retpoline_r11 # TAILCALL -+ -+; X86-LABEL: vcall: -+; X86: movl 8(%esp), %[[obj:[^ ]*]] -+; X86: movl (%[[obj]]), %[[vptr:[^ ]*]] -+; X86: movl 4(%[[vptr]]), %[[fp:[^ ]*]] -+; X86: movl %[[fp]], %eax -+; X86: pushl %[[obj]] -+; X86: calll __llvm_external_retpoline_eax -+; X86: addl $4, %esp -+; X86: movl %[[fp]], %eax -+; X86: jmp __llvm_external_retpoline_eax # TAILCALL -+ -+; X86FAST-LABEL: vcall: -+; X86FAST: calll __llvm_external_retpoline_eax -+; X86FAST: jmp __llvm_external_retpoline_eax # TAILCALL -+ -+ -+declare void @direct_callee() -+ -+define void @direct_tail() #0 { -+ tail call void @direct_callee() -+ ret void -+} -+ -+; X64-LABEL: direct_tail: -+; X64: jmp direct_callee # TAILCALL -+; X64FAST-LABEL: direct_tail: -+; X64FAST: jmp direct_callee # TAILCALL -+; X86-LABEL: direct_tail: -+; X86: jmp direct_callee # TAILCALL -+; X86FAST-LABEL: direct_tail: -+; X86FAST: jmp direct_callee # TAILCALL -+ -+ -+; Lastly check that no thunks were emitted. -+; X64-NOT: __{{.*}}_retpoline_{{.*}}: -+; X64FAST-NOT: __{{.*}}_retpoline_{{.*}}: -+; X86-NOT: __{{.*}}_retpoline_{{.*}}: -+; X86FAST-NOT: __{{.*}}_retpoline_{{.*}}: -+ -+ -+attributes #0 = { "target-features"="+retpoline-external-thunk" } -diff --git a/test/CodeGen/X86/retpoline.ll b/test/CodeGen/X86/retpoline.ll -new file mode 100644 -index 0000000..b0d4c85 ---- /dev/null -+++ b/test/CodeGen/X86/retpoline.ll -@@ -0,0 +1,363 @@ -+; RUN: llc -mtriple=x86_64-unknown < %s | FileCheck %s --implicit-check-not="jmp.*\*" --implicit-check-not="call.*\*" --check-prefix=X64 -+; RUN: llc -mtriple=x86_64-unknown -O0 < %s | FileCheck %s --implicit-check-not="jmp.*\*" --implicit-check-not="call.*\*" --check-prefix=X64FAST -+ -+; RUN: llc -mtriple=i686-unknown < %s | FileCheck %s --implicit-check-not="jmp.*\*" --implicit-check-not="call.*\*" --check-prefix=X86 -+; RUN: llc -mtriple=i686-unknown -O0 < %s | FileCheck %s --implicit-check-not="jmp.*\*" --implicit-check-not="call.*\*" --check-prefix=X86FAST -+ -+declare void @bar(i32) -+ -+; Test a simple indirect call and tail call. 
-+define void @icall_reg(void (i32)* %fp, i32 %x) #0 { -+entry: -+ tail call void @bar(i32 %x) -+ tail call void %fp(i32 %x) -+ tail call void @bar(i32 %x) -+ tail call void %fp(i32 %x) -+ ret void -+} -+ -+; X64-LABEL: icall_reg: -+; X64-DAG: movq %rdi, %[[fp:[^ ]*]] -+; X64-DAG: movl %esi, %[[x:[^ ]*]] -+; X64: movl %[[x]], %edi -+; X64: callq bar -+; X64-DAG: movl %[[x]], %edi -+; X64-DAG: movq %[[fp]], %r11 -+; X64: callq __llvm_retpoline_r11 -+; X64: movl %[[x]], %edi -+; X64: callq bar -+; X64-DAG: movl %[[x]], %edi -+; X64-DAG: movq %[[fp]], %r11 -+; X64: jmp __llvm_retpoline_r11 # TAILCALL -+ -+; X64FAST-LABEL: icall_reg: -+; X64FAST: callq bar -+; X64FAST: callq __llvm_retpoline_r11 -+; X64FAST: callq bar -+; X64FAST: jmp __llvm_retpoline_r11 # TAILCALL -+ -+; X86-LABEL: icall_reg: -+; X86-DAG: movl 12(%esp), %[[fp:[^ ]*]] -+; X86-DAG: movl 16(%esp), %[[x:[^ ]*]] -+; X86: pushl %[[x]] -+; X86: calll bar -+; X86: movl %[[fp]], %eax -+; X86: pushl %[[x]] -+; X86: calll __llvm_retpoline_eax -+; X86: pushl %[[x]] -+; X86: calll bar -+; X86: movl %[[fp]], %eax -+; X86: pushl %[[x]] -+; X86: calll __llvm_retpoline_eax -+; X86-NOT: # TAILCALL -+ -+; X86FAST-LABEL: icall_reg: -+; X86FAST: calll bar -+; X86FAST: calll __llvm_retpoline_eax -+; X86FAST: calll bar -+; X86FAST: calll __llvm_retpoline_eax -+ -+ -+@global_fp = external global void (i32)* -+ -+; Test an indirect call through a global variable. -+define void @icall_global_fp(i32 %x, void (i32)** %fpp) #0 { -+ %fp1 = load void (i32)*, void (i32)** @global_fp -+ call void %fp1(i32 %x) -+ %fp2 = load void (i32)*, void (i32)** @global_fp -+ tail call void %fp2(i32 %x) -+ ret void -+} -+ -+; X64-LABEL: icall_global_fp: -+; X64-DAG: movl %edi, %[[x:[^ ]*]] -+; X64-DAG: movq global_fp(%rip), %r11 -+; X64: callq __llvm_retpoline_r11 -+; X64-DAG: movl %[[x]], %edi -+; X64-DAG: movq global_fp(%rip), %r11 -+; X64: jmp __llvm_retpoline_r11 # TAILCALL -+ -+; X64FAST-LABEL: icall_global_fp: -+; X64FAST: movq global_fp(%rip), %r11 -+; X64FAST: callq __llvm_retpoline_r11 -+; X64FAST: movq global_fp(%rip), %r11 -+; X64FAST: jmp __llvm_retpoline_r11 # TAILCALL -+ -+; X86-LABEL: icall_global_fp: -+; X86: movl global_fp, %eax -+; X86: pushl 4(%esp) -+; X86: calll __llvm_retpoline_eax -+; X86: addl $4, %esp -+; X86: movl global_fp, %eax -+; X86: jmp __llvm_retpoline_eax # TAILCALL -+ -+; X86FAST-LABEL: icall_global_fp: -+; X86FAST: calll __llvm_retpoline_eax -+; X86FAST: jmp __llvm_retpoline_eax # TAILCALL -+ -+ -+%struct.Foo = type { void (%struct.Foo*)** } -+ -+; Test an indirect call through a vtable. 
-+define void @vcall(%struct.Foo* %obj) #0 { -+ %vptr_field = getelementptr %struct.Foo, %struct.Foo* %obj, i32 0, i32 0 -+ %vptr = load void (%struct.Foo*)**, void (%struct.Foo*)*** %vptr_field -+ %vslot = getelementptr void(%struct.Foo*)*, void(%struct.Foo*)** %vptr, i32 1 -+ %fp = load void(%struct.Foo*)*, void(%struct.Foo*)** %vslot -+ tail call void %fp(%struct.Foo* %obj) -+ tail call void %fp(%struct.Foo* %obj) -+ ret void -+} -+ -+; X64-LABEL: vcall: -+; X64: movq %rdi, %[[obj:[^ ]*]] -+; X64: movq (%[[obj]]), %[[vptr:[^ ]*]] -+; X64: movq 8(%[[vptr]]), %[[fp:[^ ]*]] -+; X64: movq %[[fp]], %r11 -+; X64: callq __llvm_retpoline_r11 -+; X64-DAG: movq %[[obj]], %rdi -+; X64-DAG: movq %[[fp]], %r11 -+; X64: jmp __llvm_retpoline_r11 # TAILCALL -+ -+; X64FAST-LABEL: vcall: -+; X64FAST: callq __llvm_retpoline_r11 -+; X64FAST: jmp __llvm_retpoline_r11 # TAILCALL -+ -+; X86-LABEL: vcall: -+; X86: movl 8(%esp), %[[obj:[^ ]*]] -+; X86: movl (%[[obj]]), %[[vptr:[^ ]*]] -+; X86: movl 4(%[[vptr]]), %[[fp:[^ ]*]] -+; X86: movl %[[fp]], %eax -+; X86: pushl %[[obj]] -+; X86: calll __llvm_retpoline_eax -+; X86: addl $4, %esp -+; X86: movl %[[fp]], %eax -+; X86: jmp __llvm_retpoline_eax # TAILCALL -+ -+; X86FAST-LABEL: vcall: -+; X86FAST: calll __llvm_retpoline_eax -+; X86FAST: jmp __llvm_retpoline_eax # TAILCALL -+ -+ -+declare void @direct_callee() -+ -+define void @direct_tail() #0 { -+ tail call void @direct_callee() -+ ret void -+} -+ -+; X64-LABEL: direct_tail: -+; X64: jmp direct_callee # TAILCALL -+; X64FAST-LABEL: direct_tail: -+; X64FAST: jmp direct_callee # TAILCALL -+; X86-LABEL: direct_tail: -+; X86: jmp direct_callee # TAILCALL -+; X86FAST-LABEL: direct_tail: -+; X86FAST: jmp direct_callee # TAILCALL -+ -+ -+declare void @nonlazybind_callee() #1 -+ -+define void @nonlazybind_caller() #0 { -+ call void @nonlazybind_callee() -+ tail call void @nonlazybind_callee() -+ ret void -+} -+ -+; nonlazybind wasn't implemented in LLVM 5.0, so this looks the same as direct. -+; X64-LABEL: nonlazybind_caller: -+; X64: callq nonlazybind_callee -+; X64: jmp nonlazybind_callee # TAILCALL -+; X64FAST-LABEL: nonlazybind_caller: -+; X64FAST: callq nonlazybind_callee -+; X64FAST: jmp nonlazybind_callee # TAILCALL -+; X86-LABEL: nonlazybind_caller: -+; X86: calll nonlazybind_callee -+; X86: jmp nonlazybind_callee # TAILCALL -+; X86FAST-LABEL: nonlazybind_caller: -+; X86FAST: calll nonlazybind_callee -+; X86FAST: jmp nonlazybind_callee # TAILCALL -+ -+ -+@indirectbr_rewrite.targets = constant [10 x i8*] [i8* blockaddress(@indirectbr_rewrite, %bb0), -+ i8* blockaddress(@indirectbr_rewrite, %bb1), -+ i8* blockaddress(@indirectbr_rewrite, %bb2), -+ i8* blockaddress(@indirectbr_rewrite, %bb3), -+ i8* blockaddress(@indirectbr_rewrite, %bb4), -+ i8* blockaddress(@indirectbr_rewrite, %bb5), -+ i8* blockaddress(@indirectbr_rewrite, %bb6), -+ i8* blockaddress(@indirectbr_rewrite, %bb7), -+ i8* blockaddress(@indirectbr_rewrite, %bb8), -+ i8* blockaddress(@indirectbr_rewrite, %bb9)] -+ -+; Check that when retpolines are enabled a function with indirectbr gets -+; rewritten to use switch, and that in turn doesn't get lowered as a jump -+; table. 
-+define void @indirectbr_rewrite(i64* readonly %p, i64* %sink) #0 { -+; X64-LABEL: indirectbr_rewrite: -+; X64-NOT: jmpq -+; X86-LABEL: indirectbr_rewrite: -+; X86-NOT: jmpl -+entry: -+ %i0 = load i64, i64* %p -+ %target.i0 = getelementptr [10 x i8*], [10 x i8*]* @indirectbr_rewrite.targets, i64 0, i64 %i0 -+ %target0 = load i8*, i8** %target.i0 -+ indirectbr i8* %target0, [label %bb1, label %bb3] -+ -+bb0: -+ store volatile i64 0, i64* %sink -+ br label %latch -+ -+bb1: -+ store volatile i64 1, i64* %sink -+ br label %latch -+ -+bb2: -+ store volatile i64 2, i64* %sink -+ br label %latch -+ -+bb3: -+ store volatile i64 3, i64* %sink -+ br label %latch -+ -+bb4: -+ store volatile i64 4, i64* %sink -+ br label %latch -+ -+bb5: -+ store volatile i64 5, i64* %sink -+ br label %latch -+ -+bb6: -+ store volatile i64 6, i64* %sink -+ br label %latch -+ -+bb7: -+ store volatile i64 7, i64* %sink -+ br label %latch -+ -+bb8: -+ store volatile i64 8, i64* %sink -+ br label %latch -+ -+bb9: -+ store volatile i64 9, i64* %sink -+ br label %latch -+ -+latch: -+ %i.next = load i64, i64* %p -+ %target.i.next = getelementptr [10 x i8*], [10 x i8*]* @indirectbr_rewrite.targets, i64 0, i64 %i.next -+ %target.next = load i8*, i8** %target.i.next -+ ; Potentially hit a full 10 successors here so that even if we rewrite as -+ ; a switch it will try to be lowered with a jump table. -+ indirectbr i8* %target.next, [label %bb0, -+ label %bb1, -+ label %bb2, -+ label %bb3, -+ label %bb4, -+ label %bb5, -+ label %bb6, -+ label %bb7, -+ label %bb8, -+ label %bb9] -+} -+ -+; Lastly check that the necessary thunks were emitted. -+; -+; X64-LABEL: .section .text.__llvm_retpoline_r11,{{.*}},__llvm_retpoline_r11,comdat -+; X64-NEXT: .hidden __llvm_retpoline_r11 -+; X64-NEXT: .weak __llvm_retpoline_r11 -+; X64: __llvm_retpoline_r11: -+; X64-NEXT: # {{.*}} # %entry -+; X64-NEXT: callq [[CALL_TARGET:.*]] -+; X64-NEXT: [[CAPTURE_SPEC:.*]]: # Block address taken -+; X64-NEXT: # %entry -+; X64-NEXT: # =>This Inner Loop Header: Depth=1 -+; X64-NEXT: pause -+; X64-NEXT: lfence -+; X64-NEXT: jmp [[CAPTURE_SPEC]] -+; X64-NEXT: .p2align 4, 0x90 -+; X64-NEXT: [[CALL_TARGET]]: # Block address taken -+; X64-NEXT: # %entry -+; X64-NEXT: movq %r11, (%rsp) -+; X64-NEXT: retq -+; -+; X86-LABEL: .section .text.__llvm_retpoline_eax,{{.*}},__llvm_retpoline_eax,comdat -+; X86-NEXT: .hidden __llvm_retpoline_eax -+; X86-NEXT: .weak __llvm_retpoline_eax -+; X86: __llvm_retpoline_eax: -+; X86-NEXT: # {{.*}} # %entry -+; X86-NEXT: calll [[CALL_TARGET:.*]] -+; X86-NEXT: [[CAPTURE_SPEC:.*]]: # Block address taken -+; X86-NEXT: # %entry -+; X86-NEXT: # =>This Inner Loop Header: Depth=1 -+; X86-NEXT: pause -+; X86-NEXT: lfence -+; X86-NEXT: jmp [[CAPTURE_SPEC]] -+; X86-NEXT: .p2align 4, 0x90 -+; X86-NEXT: [[CALL_TARGET]]: # Block address taken -+; X86-NEXT: # %entry -+; X86-NEXT: movl %eax, (%esp) -+; X86-NEXT: retl -+; -+; X86-LABEL: .section .text.__llvm_retpoline_ecx,{{.*}},__llvm_retpoline_ecx,comdat -+; X86-NEXT: .hidden __llvm_retpoline_ecx -+; X86-NEXT: .weak __llvm_retpoline_ecx -+; X86: __llvm_retpoline_ecx: -+; X86-NEXT: # {{.*}} # %entry -+; X86-NEXT: calll [[CALL_TARGET:.*]] -+; X86-NEXT: [[CAPTURE_SPEC:.*]]: # Block address taken -+; X86-NEXT: # %entry -+; X86-NEXT: # =>This Inner Loop Header: Depth=1 -+; X86-NEXT: pause -+; X86-NEXT: lfence -+; X86-NEXT: jmp [[CAPTURE_SPEC]] -+; X86-NEXT: .p2align 4, 0x90 -+; X86-NEXT: [[CALL_TARGET]]: # Block address taken -+; X86-NEXT: # %entry -+; X86-NEXT: movl %ecx, (%esp) -+; X86-NEXT: retl -+; 
-+; X86-LABEL: .section .text.__llvm_retpoline_edx,{{.*}},__llvm_retpoline_edx,comdat -+; X86-NEXT: .hidden __llvm_retpoline_edx -+; X86-NEXT: .weak __llvm_retpoline_edx -+; X86: __llvm_retpoline_edx: -+; X86-NEXT: # {{.*}} # %entry -+; X86-NEXT: calll [[CALL_TARGET:.*]] -+; X86-NEXT: [[CAPTURE_SPEC:.*]]: # Block address taken -+; X86-NEXT: # %entry -+; X86-NEXT: # =>This Inner Loop Header: Depth=1 -+; X86-NEXT: pause -+; X86-NEXT: lfence -+; X86-NEXT: jmp [[CAPTURE_SPEC]] -+; X86-NEXT: .p2align 4, 0x90 -+; X86-NEXT: [[CALL_TARGET]]: # Block address taken -+; X86-NEXT: # %entry -+; X86-NEXT: movl %edx, (%esp) -+; X86-NEXT: retl -+; -+; X86-LABEL: .section .text.__llvm_retpoline_push,{{.*}},__llvm_retpoline_push,comdat -+; X86-NEXT: .hidden __llvm_retpoline_push -+; X86-NEXT: .weak __llvm_retpoline_push -+; X86: __llvm_retpoline_push: -+; X86-NEXT: # {{.*}} # %entry -+; X86-NEXT: calll [[CALL_TARGET:.*]] -+; X86-NEXT: [[CAPTURE_SPEC:.*]]: # Block address taken -+; X86-NEXT: # %entry -+; X86-NEXT: # =>This Inner Loop Header: Depth=1 -+; X86-NEXT: pause -+; X86-NEXT: lfence -+; X86-NEXT: jmp [[CAPTURE_SPEC]] -+; X86-NEXT: .p2align 4, 0x90 -+; X86-NEXT: [[CALL_TARGET]]: # Block address taken -+; X86-NEXT: # %entry -+; X86-NEXT: addl $4, %esp -+; X86-NEXT: pushl 4(%esp) -+; X86-NEXT: pushl 4(%esp) -+; X86-NEXT: popl 8(%esp) -+; X86-NEXT: popl (%esp) -+; X86-NEXT: retl -+ -+ -+attributes #0 = { "target-features"="+retpoline" } -+attributes #1 = { nonlazybind } -diff --git a/test/Transforms/IndirectBrExpand/basic.ll b/test/Transforms/IndirectBrExpand/basic.ll -new file mode 100644 -index 0000000..d0319c6 ---- /dev/null -+++ b/test/Transforms/IndirectBrExpand/basic.ll -@@ -0,0 +1,63 @@ -+; RUN: opt < %s -indirectbr-expand -S | FileCheck %s -+; -+; REQUIRES: x86-registered-target -+ -+target triple = "x86_64-unknown-linux-gnu" -+ -+@test1.targets = constant [4 x i8*] [i8* blockaddress(@test1, %bb0), -+ i8* blockaddress(@test1, %bb1), -+ i8* blockaddress(@test1, %bb2), -+ i8* blockaddress(@test1, %bb3)] -+; CHECK-LABEL: @test1.targets = constant [4 x i8*] -+; CHECK: [i8* inttoptr (i64 1 to i8*), -+; CHECK: i8* inttoptr (i64 2 to i8*), -+; CHECK: i8* inttoptr (i64 3 to i8*), -+; CHECK: i8* blockaddress(@test1, %bb3)] -+ -+define void @test1(i64* readonly %p, i64* %sink) #0 { -+; CHECK-LABEL: define void @test1( -+entry: -+ %i0 = load i64, i64* %p -+ %target.i0 = getelementptr [4 x i8*], [4 x i8*]* @test1.targets, i64 0, i64 %i0 -+ %target0 = load i8*, i8** %target.i0 -+ ; Only a subset of blocks are viable successors here. -+ indirectbr i8* %target0, [label %bb0, label %bb1] -+; CHECK-NOT: indirectbr -+; CHECK: %[[ENTRY_V:.*]] = ptrtoint i8* %{{.*}} to i64 -+; CHECK-NEXT: br label %[[SWITCH_BB:.*]] -+ -+bb0: -+ store volatile i64 0, i64* %sink -+ br label %latch -+ -+bb1: -+ store volatile i64 1, i64* %sink -+ br label %latch -+ -+bb2: -+ store volatile i64 2, i64* %sink -+ br label %latch -+ -+bb3: -+ store volatile i64 3, i64* %sink -+ br label %latch -+ -+latch: -+ %i.next = load i64, i64* %p -+ %target.i.next = getelementptr [4 x i8*], [4 x i8*]* @test1.targets, i64 0, i64 %i.next -+ %target.next = load i8*, i8** %target.i.next -+ ; A different subset of blocks are viable successors here. 
-+ indirectbr i8* %target.next, [label %bb1, label %bb2] -+; CHECK-NOT: indirectbr -+; CHECK: %[[LATCH_V:.*]] = ptrtoint i8* %{{.*}} to i64 -+; CHECK-NEXT: br label %[[SWITCH_BB]] -+; -+; CHECK: [[SWITCH_BB]]: -+; CHECK-NEXT: %[[V:.*]] = phi i64 [ %[[ENTRY_V]], %entry ], [ %[[LATCH_V]], %latch ] -+; CHECK-NEXT: switch i64 %[[V]], label %bb0 [ -+; CHECK-NEXT: i64 2, label %bb1 -+; CHECK-NEXT: i64 3, label %bb2 -+; CHECK-NEXT: ] -+} -+ -+attributes #0 = { "target-features"="+retpoline" } -diff --git a/tools/opt/opt.cpp b/tools/opt/opt.cpp -index 24cce58..1c4a599 100644 ---- a/tools/opt/opt.cpp -+++ b/tools/opt/opt.cpp -@@ -401,6 +401,7 @@ int main(int argc, char **argv) { - initializeSjLjEHPreparePass(Registry); - initializePreISelIntrinsicLoweringLegacyPassPass(Registry); - initializeGlobalMergePass(Registry); -+ initializeIndirectBrExpandPassPass(Registry); - initializeInterleavedAccessPass(Registry); - initializeCountingFunctionInserterPass(Registry); - initializeUnreachableBlockElimLegacyPassPass(Registry); --- -1.8.3.1 - diff --git a/0001-Merging-r323915.patch b/0001-Merging-r323915.patch deleted file mode 100644 index d076c67..0000000 --- a/0001-Merging-r323915.patch +++ /dev/null @@ -1,287 +0,0 @@ -From b4b2cc0cca3595185683aa7aa4d29c4a151a679e Mon Sep 17 00:00:00 2001 -From: Reid Kleckner -Date: Thu, 1 Feb 2018 21:31:35 +0000 -Subject: [PATCH] Merging r323915: - ------------------------------------------------------------------------ - r323915 | chandlerc | 2018-01-31 12:56:37 -0800 (Wed, 31 Jan 2018) | 17 lines - -[x86] Make the retpoline thunk insertion a machine function pass. - -Summary: -This removes the need for a machine module pass using some deeply -questionable hacks. This should address PR36123 which is a case where in -full LTO the memory usage of a machine module pass actually ended up -being significant. - -We should revert this on trunk as soon as we understand and fix the -memory usage issue, but we should include this in any backports of -retpolines themselves. - -Reviewers: echristo, MatzeB - -Subscribers: sanjoy, mcrosier, mehdi_amini, hiraditya, llvm-commits - -Differential Revision: https://reviews.llvm.org/D42726 ------------------------------------------------------------------------- - - -git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@324009 91177308-0d34-0410-b5e6-96231b3b80d8 ---- - lib/Target/X86/X86.h | 2 +- - lib/Target/X86/X86RetpolineThunks.cpp | 135 +++++++++++++++++++++------------- - test/CodeGen/X86/O0-pipeline.ll | 3 +- - 3 files changed, 87 insertions(+), 53 deletions(-) - -diff --git a/lib/Target/X86/X86.h b/lib/Target/X86/X86.h -index 25e4b89..2e3ace2 100644 ---- a/lib/Target/X86/X86.h -+++ b/lib/Target/X86/X86.h -@@ -100,7 +100,7 @@ void initializeFixupBWInstPassPass(PassRegistry &); - FunctionPass *createX86EvexToVexInsts(); - - /// This pass creates the thunks for the retpoline feature. 
--ModulePass *createX86RetpolineThunksPass(); -+FunctionPass *createX86RetpolineThunksPass(); - - InstructionSelector *createX86InstructionSelector(const X86TargetMachine &TM, - X86Subtarget &, -diff --git a/lib/Target/X86/X86RetpolineThunks.cpp b/lib/Target/X86/X86RetpolineThunks.cpp -index 6b4bc8a..223fa57 100644 ---- a/lib/Target/X86/X86RetpolineThunks.cpp -+++ b/lib/Target/X86/X86RetpolineThunks.cpp -@@ -38,18 +38,27 @@ using namespace llvm; - - #define DEBUG_TYPE "x86-retpoline-thunks" - -+static const char ThunkNamePrefix[] = "__llvm_retpoline_"; -+static const char R11ThunkName[] = "__llvm_retpoline_r11"; -+static const char EAXThunkName[] = "__llvm_retpoline_eax"; -+static const char ECXThunkName[] = "__llvm_retpoline_ecx"; -+static const char EDXThunkName[] = "__llvm_retpoline_edx"; -+static const char PushThunkName[] = "__llvm_retpoline_push"; -+ - namespace { --class X86RetpolineThunks : public ModulePass { -+class X86RetpolineThunks : public MachineFunctionPass { - public: - static char ID; - -- X86RetpolineThunks() : ModulePass(ID) {} -+ X86RetpolineThunks() : MachineFunctionPass(ID) {} - - StringRef getPassName() const override { return "X86 Retpoline Thunks"; } - -- bool runOnModule(Module &M) override; -+ bool doInitialization(Module &M) override; -+ bool runOnMachineFunction(MachineFunction &F) override; - - void getAnalysisUsage(AnalysisUsage &AU) const override { -+ MachineFunctionPass::getAnalysisUsage(AU); - AU.addRequired(); - AU.addPreserved(); - } -@@ -61,51 +70,74 @@ private: - const X86Subtarget *STI; - const X86InstrInfo *TII; - -- Function *createThunkFunction(Module &M, StringRef Name); -+ bool InsertedThunks; -+ -+ void createThunkFunction(Module &M, StringRef Name); - void insertRegReturnAddrClobber(MachineBasicBlock &MBB, unsigned Reg); - void insert32BitPushReturnAddrClobber(MachineBasicBlock &MBB); -- void createThunk(Module &M, StringRef NameSuffix, -- Optional Reg = None); -+ void populateThunk(MachineFunction &MF, Optional Reg = None); - }; - - } // end anonymous namespace - --ModulePass *llvm::createX86RetpolineThunksPass() { -+FunctionPass *llvm::createX86RetpolineThunksPass() { - return new X86RetpolineThunks(); - } - - char X86RetpolineThunks::ID = 0; - --bool X86RetpolineThunks::runOnModule(Module &M) { -- DEBUG(dbgs() << getPassName() << '\n'); -+bool X86RetpolineThunks::doInitialization(Module &M) { -+ InsertedThunks = false; -+ return false; -+} - -- auto *TPC = getAnalysisIfAvailable(); -- assert(TPC && "X86-specific target pass should not be run without a target " -- "pass config!"); -+bool X86RetpolineThunks::runOnMachineFunction(MachineFunction &MF) { -+ DEBUG(dbgs() << getPassName() << '\n'); - -- MMI = &getAnalysis(); -- TM = &TPC->getTM(); -+ TM = &MF.getTarget();; -+ STI = &MF.getSubtarget(); -+ TII = STI->getInstrInfo(); - Is64Bit = TM->getTargetTriple().getArch() == Triple::x86_64; - -- // Only add a thunk if we have at least one function that has the retpoline -- // feature enabled in its subtarget. -- // FIXME: Conditionalize on indirect calls so we don't emit a thunk when -- // nothing will end up calling it. -- // FIXME: It's a little silly to look at every function just to enumerate -- // the subtargets, but eventually we'll want to look at them for indirect -- // calls, so maybe this is OK. -- if (!llvm::any_of(M, [&](const Function &F) { -- // Save the subtarget we find for use in emitting the subsequent -- // thunk. 
-- STI = &TM->getSubtarget(F); -- return STI->useRetpoline() && !STI->useRetpolineExternalThunk(); -- })) -- return false; -- -- // If we have a relevant subtarget, get the instr info as well. -- TII = STI->getInstrInfo(); -+ MMI = &getAnalysis(); -+ Module &M = const_cast(*MMI->getModule()); -+ -+ // If this function is not a thunk, check to see if we need to insert -+ // a thunk. -+ if (!MF.getName().startswith(ThunkNamePrefix)) { -+ // If we've already inserted a thunk, nothing else to do. -+ if (InsertedThunks) -+ return false; -+ -+ // Only add a thunk if one of the functions has the retpoline feature -+ // enabled in its subtarget, and doesn't enable external thunks. -+ // FIXME: Conditionalize on indirect calls so we don't emit a thunk when -+ // nothing will end up calling it. -+ // FIXME: It's a little silly to look at every function just to enumerate -+ // the subtargets, but eventually we'll want to look at them for indirect -+ // calls, so maybe this is OK. -+ if (!STI->useRetpoline() || STI->useRetpolineExternalThunk()) -+ return false; -+ -+ // Otherwise, we need to insert the thunk. -+ // WARNING: This is not really a well behaving thing to do in a function -+ // pass. We extract the module and insert a new function (and machine -+ // function) directly into the module. -+ if (Is64Bit) -+ createThunkFunction(M, R11ThunkName); -+ else -+ for (StringRef Name : -+ {EAXThunkName, ECXThunkName, EDXThunkName, PushThunkName}) -+ createThunkFunction(M, Name); -+ InsertedThunks = true; -+ return true; -+ } - -+ // If this *is* a thunk function, we need to populate it with the correct MI. - if (Is64Bit) { -+ assert(MF.getName() == "__llvm_retpoline_r11" && -+ "Should only have an r11 thunk on 64-bit targets"); -+ - // __llvm_retpoline_r11: - // callq .Lr11_call_target - // .Lr11_capture_spec: -@@ -116,8 +148,7 @@ bool X86RetpolineThunks::runOnModule(Module &M) { - // .Lr11_call_target: - // movq %r11, (%rsp) - // retq -- -- createThunk(M, "r11", X86::R11); -+ populateThunk(MF, X86::R11); - } else { - // For 32-bit targets we need to emit a collection of thunks for various - // possible scratch registers as well as a fallback that is used when -@@ -161,16 +192,25 @@ bool X86RetpolineThunks::runOnModule(Module &M) { - // popl 8(%esp) # Pop RA to final RA - // popl (%esp) # Pop callee to next top of stack - // retl # Ret to callee -- createThunk(M, "eax", X86::EAX); -- createThunk(M, "ecx", X86::ECX); -- createThunk(M, "edx", X86::EDX); -- createThunk(M, "push"); -+ if (MF.getName() == EAXThunkName) -+ populateThunk(MF, X86::EAX); -+ else if (MF.getName() == ECXThunkName) -+ populateThunk(MF, X86::ECX); -+ else if (MF.getName() == EDXThunkName) -+ populateThunk(MF, X86::EDX); -+ else if (MF.getName() == PushThunkName) -+ populateThunk(MF); -+ else -+ llvm_unreachable("Invalid thunk name on x86-32!"); - } - - return true; - } - --Function *X86RetpolineThunks::createThunkFunction(Module &M, StringRef Name) { -+void X86RetpolineThunks::createThunkFunction(Module &M, StringRef Name) { -+ assert(Name.startswith(ThunkNamePrefix) && -+ "Created a thunk with an unexpected prefix!"); -+ - LLVMContext &Ctx = M.getContext(); - auto Type = FunctionType::get(Type::getVoidTy(Ctx), false); - Function *F = -@@ -190,7 +230,6 @@ Function *X86RetpolineThunks::createThunkFunction(Module &M, StringRef Name) { - IRBuilder<> Builder(Entry); - - Builder.CreateRetVoid(); -- return F; - } - - void X86RetpolineThunks::insertRegReturnAddrClobber(MachineBasicBlock &MBB, -@@ -200,6 +239,7 @@ void 
X86RetpolineThunks::insertRegReturnAddrClobber(MachineBasicBlock &MBB, - addRegOffset(BuildMI(&MBB, DebugLoc(), TII->get(MovOpc)), SPReg, false, 0) - .addReg(Reg); - } -+ - void X86RetpolineThunks::insert32BitPushReturnAddrClobber( - MachineBasicBlock &MBB) { - // The instruction sequence we use to replace the return address without -@@ -225,21 +265,16 @@ void X86RetpolineThunks::insert32BitPushReturnAddrClobber( - false, 0); - } - --void X86RetpolineThunks::createThunk(Module &M, StringRef NameSuffix, -- Optional Reg) { -- Function &F = -- *createThunkFunction(M, (Twine("__llvm_retpoline_") + NameSuffix).str()); -- MachineFunction &MF = MMI->getOrCreateMachineFunction(F); -- -+void X86RetpolineThunks::populateThunk(MachineFunction &MF, -+ Optional Reg) { - // Set MF properties. We never use vregs... - MF.getProperties().set(MachineFunctionProperties::Property::NoVRegs); - -- BasicBlock &OrigEntryBB = F.getEntryBlock(); -- MachineBasicBlock *Entry = MF.CreateMachineBasicBlock(&OrigEntryBB); -- MachineBasicBlock *CaptureSpec = MF.CreateMachineBasicBlock(&OrigEntryBB); -- MachineBasicBlock *CallTarget = MF.CreateMachineBasicBlock(&OrigEntryBB); -+ MachineBasicBlock *Entry = &MF.front(); -+ Entry->clear(); - -- MF.push_back(Entry); -+ MachineBasicBlock *CaptureSpec = MF.CreateMachineBasicBlock(Entry->getBasicBlock()); -+ MachineBasicBlock *CallTarget = MF.CreateMachineBasicBlock(Entry->getBasicBlock()); - MF.push_back(CaptureSpec); - MF.push_back(CallTarget); - -diff --git a/test/CodeGen/X86/O0-pipeline.ll b/test/CodeGen/X86/O0-pipeline.ll -index f9bd66f..123dcf6 100644 ---- a/test/CodeGen/X86/O0-pipeline.ll -+++ b/test/CodeGen/X86/O0-pipeline.ll -@@ -56,8 +56,7 @@ - ; CHECK-NEXT: Machine Natural Loop Construction - ; CHECK-NEXT: Insert XRay ops - ; CHECK-NEXT: Implement the 'patchable-function' attribute --; CHECK-NEXT: X86 Retpoline Thunks --; CHECK-NEXT: FunctionPass Manager -+; CHECK-NEXT: X86 Retpoline Thunks - ; CHECK-NEXT: Lazy Machine Block Frequency Analysis - ; CHECK-NEXT: Machine Optimization Remark Emitter - ; CHECK-NEXT: MachineDominator Tree Construction --- -1.8.3.1 - diff --git a/0001-Merging-r324449.patch b/0001-Merging-r324449.patch deleted file mode 100644 index 864962a..0000000 --- a/0001-Merging-r324449.patch +++ /dev/null @@ -1,237 +0,0 @@ -From 4e5fddc22a28e0e59d6409a98fb22eba32d0eae7 Mon Sep 17 00:00:00 2001 -From: Reid Kleckner -Date: Wed, 14 Feb 2018 00:32:26 +0000 -Subject: [PATCH 1/4] Merging r324449: - ------------------------------------------------------------------------ - r324449 | chandlerc | 2018-02-06 22:16:24 -0800 (Tue, 06 Feb 2018) | 15 lines - -[x86/retpoline] Make the external thunk names exactly match the names -that happened to end up in GCC. - -This is really unfortunate, as the names don't have much rhyme or reason -to them. Originally in the discussions it seemed fine to rely on aliases -to map different names to whatever external thunk code developers wished -to use but there are practical problems with that in the kernel it turns -out. And since we're discovering this practical problems late and since -GCC has already shipped a release with one set of names, we are forced, -yet again, to blindly match what is there. - -Somewhat rushing this patch out for the Linux kernel folks to test and -so we can get it patched into our releases. 
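(Illustrative sketch, not part of the quoted patch: given the GCC-compatible naming this change adopts, an environment built with `-mretpoline-external-thunk` must supply the thunk bodies itself. Assuming the call/pause/lfence capture sequence documented in X86RetpolineThunks.cpp, a minimal hand-written 64-bit thunk could look like the following; a kernel would typically keep this in a .S file and hot-patch the body at boot.)

```c++
// Hypothetical hand-written external thunk; the symbol name matches the
// GCC-compatible name this patch makes LLVM emit calls to.
asm(".globl __x86_indirect_thunk_r11\n"
    "__x86_indirect_thunk_r11:\n"
    "  callq 1f\n"           // push RA; predicted return lands at 2:
    "2:\n"
    "  pause\n"              // trap speculative execution in this loop
    "  lfence\n"
    "  jmp 2b\n"
    "  .p2align 4\n"
    "1:\n"
    "  movq %r11, (%rsp)\n"  // smash the RA with the real call target
    "  retq\n");             // architectural ret goes to *%r11
```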
- -Differential Revision: https://reviews.llvm.org/D42998 ------------------------------------------------------------------------- - - -git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@325088 91177308-0d34-0410-b5e6-96231b3b80d8 ---- - lib/Target/X86/X86ISelLowering.cpp | 59 +++++++++++++++++++++++++--------- - test/CodeGen/X86/retpoline-external.ll | 48 +++++++++++++-------------- - 2 files changed, 68 insertions(+), 39 deletions(-) - -diff --git a/lib/Target/X86/X86ISelLowering.cpp b/lib/Target/X86/X86ISelLowering.cpp -index 2c2294d..9aa3023 100644 ---- a/lib/Target/X86/X86ISelLowering.cpp -+++ b/lib/Target/X86/X86ISelLowering.cpp -@@ -26250,28 +26250,57 @@ static unsigned getOpcodeForRetpoline(unsigned RPOpc) { - - static const char *getRetpolineSymbol(const X86Subtarget &Subtarget, - unsigned Reg) { -+ if (Subtarget.useRetpolineExternalThunk()) { -+ // When using an external thunk for retpolines, we pick names that match the -+ // names GCC happens to use as well. This helps simplify the implementation -+ // of the thunks for kernels where they have no easy ability to create -+ // aliases and are doing non-trivial configuration of the thunk's body. For -+ // example, the Linux kernel will do boot-time hot patching of the thunk -+ // bodies and cannot easily export aliases of these to loaded modules. -+ // -+ // Note that at any point in the future, we may need to change the semantics -+ // of how we implement retpolines and at that time will likely change the -+ // name of the called thunk. Essentially, there is no hard guarantee that -+ // LLVM will generate calls to specific thunks, we merely make a best-effort -+ // attempt to help out kernels and other systems where duplicating the -+ // thunks is costly. -+ switch (Reg) { -+ case 0: -+ assert(!Subtarget.is64Bit() && "R11 should always be available on x64"); -+ return "__x86_indirect_thunk"; -+ case X86::EAX: -+ assert(!Subtarget.is64Bit() && "Should not be using a 32-bit thunk!"); -+ return "__x86_indirect_thunk_eax"; -+ case X86::ECX: -+ assert(!Subtarget.is64Bit() && "Should not be using a 32-bit thunk!"); -+ return "__x86_indirect_thunk_ecx"; -+ case X86::EDX: -+ assert(!Subtarget.is64Bit() && "Should not be using a 32-bit thunk!"); -+ return "__x86_indirect_thunk_edx"; -+ case X86::R11: -+ assert(Subtarget.is64Bit() && "Should not be using a 64-bit thunk!"); -+ return "__x86_indirect_thunk_r11"; -+ } -+ llvm_unreachable("unexpected reg for retpoline"); -+ } -+ -+ // When targeting an internal COMDAT thunk use an LLVM-specific name. - switch (Reg) { - case 0: - assert(!Subtarget.is64Bit() && "R11 should always be available on x64"); -- return Subtarget.useRetpolineExternalThunk() -- ? "__llvm_external_retpoline_push" -- : "__llvm_retpoline_push"; -+ return "__llvm_retpoline_push"; - case X86::EAX: -- return Subtarget.useRetpolineExternalThunk() -- ? "__llvm_external_retpoline_eax" -- : "__llvm_retpoline_eax"; -+ assert(!Subtarget.is64Bit() && "Should not be using a 32-bit thunk!"); -+ return "__llvm_retpoline_eax"; - case X86::ECX: -- return Subtarget.useRetpolineExternalThunk() -- ? "__llvm_external_retpoline_ecx" -- : "__llvm_retpoline_ecx"; -+ assert(!Subtarget.is64Bit() && "Should not be using a 32-bit thunk!"); -+ return "__llvm_retpoline_ecx"; - case X86::EDX: -- return Subtarget.useRetpolineExternalThunk() -- ? 
"__llvm_external_retpoline_edx" -- : "__llvm_retpoline_edx"; -+ assert(!Subtarget.is64Bit() && "Should not be using a 32-bit thunk!"); -+ return "__llvm_retpoline_edx"; - case X86::R11: -- return Subtarget.useRetpolineExternalThunk() -- ? "__llvm_external_retpoline_r11" -- : "__llvm_retpoline_r11"; -+ assert(Subtarget.is64Bit() && "Should not be using a 64-bit thunk!"); -+ return "__llvm_retpoline_r11"; - } - llvm_unreachable("unexpected reg for retpoline"); - } -diff --git a/test/CodeGen/X86/retpoline-external.ll b/test/CodeGen/X86/retpoline-external.ll -index 66d32ba..2f21bb2 100644 ---- a/test/CodeGen/X86/retpoline-external.ll -+++ b/test/CodeGen/X86/retpoline-external.ll -@@ -23,18 +23,18 @@ entry: - ; X64: callq bar - ; X64-DAG: movl %[[x]], %edi - ; X64-DAG: movq %[[fp]], %r11 --; X64: callq __llvm_external_retpoline_r11 -+; X64: callq __x86_indirect_thunk_r11 - ; X64: movl %[[x]], %edi - ; X64: callq bar - ; X64-DAG: movl %[[x]], %edi - ; X64-DAG: movq %[[fp]], %r11 --; X64: jmp __llvm_external_retpoline_r11 # TAILCALL -+; X64: jmp __x86_indirect_thunk_r11 # TAILCALL - - ; X64FAST-LABEL: icall_reg: - ; X64FAST: callq bar --; X64FAST: callq __llvm_external_retpoline_r11 -+; X64FAST: callq __x86_indirect_thunk_r11 - ; X64FAST: callq bar --; X64FAST: jmp __llvm_external_retpoline_r11 # TAILCALL -+; X64FAST: jmp __x86_indirect_thunk_r11 # TAILCALL - - ; X86-LABEL: icall_reg: - ; X86-DAG: movl 12(%esp), %[[fp:[^ ]*]] -@@ -43,19 +43,19 @@ entry: - ; X86: calll bar - ; X86: movl %[[fp]], %eax - ; X86: pushl %[[x]] --; X86: calll __llvm_external_retpoline_eax -+; X86: calll __x86_indirect_thunk_eax - ; X86: pushl %[[x]] - ; X86: calll bar - ; X86: movl %[[fp]], %eax - ; X86: pushl %[[x]] --; X86: calll __llvm_external_retpoline_eax -+; X86: calll __x86_indirect_thunk_eax - ; X86-NOT: # TAILCALL - - ; X86FAST-LABEL: icall_reg: - ; X86FAST: calll bar --; X86FAST: calll __llvm_external_retpoline_eax -+; X86FAST: calll __x86_indirect_thunk_eax - ; X86FAST: calll bar --; X86FAST: calll __llvm_external_retpoline_eax -+; X86FAST: calll __x86_indirect_thunk_eax - - - @global_fp = external global void (i32)* -@@ -72,28 +72,28 @@ define void @icall_global_fp(i32 %x, void (i32)** %fpp) #0 { - ; X64-LABEL: icall_global_fp: - ; X64-DAG: movl %edi, %[[x:[^ ]*]] - ; X64-DAG: movq global_fp(%rip), %r11 --; X64: callq __llvm_external_retpoline_r11 -+; X64: callq __x86_indirect_thunk_r11 - ; X64-DAG: movl %[[x]], %edi - ; X64-DAG: movq global_fp(%rip), %r11 --; X64: jmp __llvm_external_retpoline_r11 # TAILCALL -+; X64: jmp __x86_indirect_thunk_r11 # TAILCALL - - ; X64FAST-LABEL: icall_global_fp: - ; X64FAST: movq global_fp(%rip), %r11 --; X64FAST: callq __llvm_external_retpoline_r11 -+; X64FAST: callq __x86_indirect_thunk_r11 - ; X64FAST: movq global_fp(%rip), %r11 --; X64FAST: jmp __llvm_external_retpoline_r11 # TAILCALL -+; X64FAST: jmp __x86_indirect_thunk_r11 # TAILCALL - - ; X86-LABEL: icall_global_fp: - ; X86: movl global_fp, %eax - ; X86: pushl 4(%esp) --; X86: calll __llvm_external_retpoline_eax -+; X86: calll __x86_indirect_thunk_eax - ; X86: addl $4, %esp - ; X86: movl global_fp, %eax --; X86: jmp __llvm_external_retpoline_eax # TAILCALL -+; X86: jmp __x86_indirect_thunk_eax # TAILCALL - - ; X86FAST-LABEL: icall_global_fp: --; X86FAST: calll __llvm_external_retpoline_eax --; X86FAST: jmp __llvm_external_retpoline_eax # TAILCALL -+; X86FAST: calll __x86_indirect_thunk_eax -+; X86FAST: jmp __x86_indirect_thunk_eax # TAILCALL - - - %struct.Foo = type { void (%struct.Foo*)** } -@@ -114,14 +114,14 @@ 
define void @vcall(%struct.Foo* %obj) #0 { - ; X64: movq (%[[obj]]), %[[vptr:[^ ]*]] - ; X64: movq 8(%[[vptr]]), %[[fp:[^ ]*]] - ; X64: movq %[[fp]], %r11 --; X64: callq __llvm_external_retpoline_r11 -+; X64: callq __x86_indirect_thunk_r11 - ; X64-DAG: movq %[[obj]], %rdi - ; X64-DAG: movq %[[fp]], %r11 --; X64: jmp __llvm_external_retpoline_r11 # TAILCALL -+; X64: jmp __x86_indirect_thunk_r11 # TAILCALL - - ; X64FAST-LABEL: vcall: --; X64FAST: callq __llvm_external_retpoline_r11 --; X64FAST: jmp __llvm_external_retpoline_r11 # TAILCALL -+; X64FAST: callq __x86_indirect_thunk_r11 -+; X64FAST: jmp __x86_indirect_thunk_r11 # TAILCALL - - ; X86-LABEL: vcall: - ; X86: movl 8(%esp), %[[obj:[^ ]*]] -@@ -129,14 +129,14 @@ define void @vcall(%struct.Foo* %obj) #0 { - ; X86: movl 4(%[[vptr]]), %[[fp:[^ ]*]] - ; X86: movl %[[fp]], %eax - ; X86: pushl %[[obj]] --; X86: calll __llvm_external_retpoline_eax -+; X86: calll __x86_indirect_thunk_eax - ; X86: addl $4, %esp - ; X86: movl %[[fp]], %eax --; X86: jmp __llvm_external_retpoline_eax # TAILCALL -+; X86: jmp __x86_indirect_thunk_eax # TAILCALL - - ; X86FAST-LABEL: vcall: --; X86FAST: calll __llvm_external_retpoline_eax --; X86FAST: jmp __llvm_external_retpoline_eax # TAILCALL -+; X86FAST: calll __x86_indirect_thunk_eax -+; X86FAST: jmp __x86_indirect_thunk_eax # TAILCALL - - - declare void @direct_callee() --- -1.8.3.1 - diff --git a/0002-Merging-r324645.patch b/0002-Merging-r324645.patch deleted file mode 100644 index 381f18e..0000000 --- a/0002-Merging-r324645.patch +++ /dev/null @@ -1,88 +0,0 @@ -From 8f5f7f9cb15387ddb010894c17e788b3116fe26d Mon Sep 17 00:00:00 2001 -From: Reid Kleckner -Date: Wed, 14 Feb 2018 00:33:00 +0000 -Subject: [PATCH 2/4] Merging r324645: - ------------------------------------------------------------------------ - r324645 | dwmw2 | 2018-02-08 12:06:05 -0800 (Thu, 08 Feb 2018) | 5 lines - -[X86] Support 'V' register operand modifier - -This allows the register name to be printed without the leading '%'. -This can be used for emitting calls to the retpoline thunks from inline -asm. ------------------------------------------------------------------------- - - -git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@325089 91177308-0d34-0410-b5e6-96231b3b80d8 ---- - lib/Target/X86/X86AsmPrinter.cpp | 11 ++++++++++- - test/CodeGen/X86/inline-asm-modifier-V.ll | 14 ++++++++++++++ - 2 files changed, 24 insertions(+), 1 deletion(-) - create mode 100644 test/CodeGen/X86/inline-asm-modifier-V.ll - -diff --git a/lib/Target/X86/X86AsmPrinter.cpp b/lib/Target/X86/X86AsmPrinter.cpp -index dc15aea..8c7ddd9 100644 ---- a/lib/Target/X86/X86AsmPrinter.cpp -+++ b/lib/Target/X86/X86AsmPrinter.cpp -@@ -344,6 +344,8 @@ static void printIntelMemReference(X86AsmPrinter &P, const MachineInstr *MI, - static bool printAsmMRegister(X86AsmPrinter &P, const MachineOperand &MO, - char Mode, raw_ostream &O) { - unsigned Reg = MO.getReg(); -+ bool EmitPercent = true; -+ - switch (Mode) { - default: return true; // Unknown mode. - case 'b': // Print QImode register -@@ -358,6 +360,9 @@ static bool printAsmMRegister(X86AsmPrinter &P, const MachineOperand &MO, - case 'k': // Print SImode register - Reg = getX86SubSuperRegister(Reg, 32); - break; -+ case 'V': -+ EmitPercent = false; -+ LLVM_FALLTHROUGH; - case 'q': - // Print 64-bit register names if 64-bit integer registers are available. - // Otherwise, print 32-bit register names. 
-@@ -365,7 +370,10 @@ static bool printAsmMRegister(X86AsmPrinter &P, const MachineOperand &MO, - break; - } - -- O << '%' << X86ATTInstPrinter::getRegisterName(Reg); -+ if (EmitPercent) -+ O << '%'; -+ -+ O << X86ATTInstPrinter::getRegisterName(Reg); - return false; - } - -@@ -438,6 +446,7 @@ bool X86AsmPrinter::PrintAsmOperand(const MachineInstr *MI, unsigned OpNo, - case 'w': // Print HImode register - case 'k': // Print SImode register - case 'q': // Print DImode register -+ case 'V': // Print native register without '%' - if (MO.isReg()) - return printAsmMRegister(*this, MO, ExtraCode[0], O); - printOperand(*this, MI, OpNo, O); -diff --git a/test/CodeGen/X86/inline-asm-modifier-V.ll b/test/CodeGen/X86/inline-asm-modifier-V.ll -new file mode 100644 -index 0000000..5a7f3fd ---- /dev/null -+++ b/test/CodeGen/X86/inline-asm-modifier-V.ll -@@ -0,0 +1,14 @@ -+; RUN: llc < %s -mtriple=i686-- -no-integrated-as | FileCheck -check-prefix=X86 %s -+; RUN: llc < %s -mtriple=x86_64-- -no-integrated-as | FileCheck -check-prefix=X64 %s -+ -+; If the target does not have 64-bit integer registers, emit 32-bit register -+; names. -+ -+; X86: call __x86_indirect_thunk_e{{[abcd]}}x -+; X64: call __x86_indirect_thunk_r -+ -+define void @q_modifier(i32* %p) { -+entry: -+ tail call void asm sideeffect "call __x86_indirect_thunk_${0:V}", "r,~{dirflag},~{fpsr},~{flags}"(i32* %p) -+ ret void -+} --- -1.8.3.1 - diff --git a/0003-Merging-r325049.patch b/0003-Merging-r325049.patch deleted file mode 100644 index b207dc7..0000000 --- a/0003-Merging-r325049.patch +++ /dev/null @@ -1,308 +0,0 @@ -From 4594a6164d5ae9252825e23a95aa6f2fce304d6e Mon Sep 17 00:00:00 2001 -From: Reid Kleckner -Date: Wed, 14 Feb 2018 00:34:13 +0000 -Subject: [PATCH 3/4] Merging r325049: - ------------------------------------------------------------------------ - r325049 | rnk | 2018-02-13 12:47:49 -0800 (Tue, 13 Feb 2018) | 17 lines - -[X86] Use EDI for retpoline when no scratch regs are left - -Summary: -Instead of solving the hard problem of how to pass the callee to the indirect -jump thunk without a register, just use a CSR. At a call boundary, there's -nothing stopping us from using a CSR to hold the callee as long as we save and -restore it in the prologue. - -Also, add tests for this mregparm=3 case. I wrote execution tests for -__llvm_retpoline_push, but they never got committed as lit tests, either -because I never rewrote them or because they got lost in merge conflicts. 
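(Illustrative sketch, not part of the quoted patch: a C-level reproduction of the mregparm=3 situation this change addresses, mirroring the retpoline-regparm.ll test added below. The `regparm(3)` attribute stands in for the kernel's global `-mregparm=3`; the function and type names are made up for the example.)

```c++
// With regparm(3), EAX, ECX and EDX all carry arguments, so none of the
// usual 32-bit scratch registers can hold the callee. After this change
// the lowering falls back to the callee-saved EDI instead of a push
// sequence, at the cost of saving/restoring EDI and losing TCO.
typedef void (*fp3_t)(int, int, int) __attribute__((regparm(3)));

void call_all_inreg(fp3_t fp) {
  fp(1, 2, 3); // with -m32 -mretpoline: calll __llvm_retpoline_edi
}
```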
- -Reviewers: chandlerc, dwmw2 - -Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits - -Differential Revision: https://reviews.llvm.org/D43214 ------------------------------------------------------------------------- - - -git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@325090 91177308-0d34-0410-b5e6-96231b3b80d8 ---- - lib/Target/X86/X86ISelLowering.cpp | 50 +++++++++++++---------------------- - lib/Target/X86/X86RetpolineThunks.cpp | 42 ++++++++--------------------- - test/CodeGen/X86/retpoline-regparm.ll | 42 +++++++++++++++++++++++++++++ - test/CodeGen/X86/retpoline.ll | 14 ++++------ - 4 files changed, 76 insertions(+), 72 deletions(-) - create mode 100644 test/CodeGen/X86/retpoline-regparm.ll - -diff --git a/lib/Target/X86/X86ISelLowering.cpp b/lib/Target/X86/X86ISelLowering.cpp -index 9aa3023..59a9832 100644 ---- a/lib/Target/X86/X86ISelLowering.cpp -+++ b/lib/Target/X86/X86ISelLowering.cpp -@@ -26265,9 +26265,6 @@ static const char *getRetpolineSymbol(const X86Subtarget &Subtarget, - // attempt to help out kernels and other systems where duplicating the - // thunks is costly. - switch (Reg) { -- case 0: -- assert(!Subtarget.is64Bit() && "R11 should always be available on x64"); -- return "__x86_indirect_thunk"; - case X86::EAX: - assert(!Subtarget.is64Bit() && "Should not be using a 32-bit thunk!"); - return "__x86_indirect_thunk_eax"; -@@ -26277,6 +26274,9 @@ static const char *getRetpolineSymbol(const X86Subtarget &Subtarget, - case X86::EDX: - assert(!Subtarget.is64Bit() && "Should not be using a 32-bit thunk!"); - return "__x86_indirect_thunk_edx"; -+ case X86::EDI: -+ assert(!Subtarget.is64Bit() && "Should not be using a 32-bit thunk!"); -+ return "__x86_indirect_thunk_edi"; - case X86::R11: - assert(Subtarget.is64Bit() && "Should not be using a 64-bit thunk!"); - return "__x86_indirect_thunk_r11"; -@@ -26286,9 +26286,6 @@ static const char *getRetpolineSymbol(const X86Subtarget &Subtarget, - - // When targeting an internal COMDAT thunk use an LLVM-specific name. - switch (Reg) { -- case 0: -- assert(!Subtarget.is64Bit() && "R11 should always be available on x64"); -- return "__llvm_retpoline_push"; - case X86::EAX: - assert(!Subtarget.is64Bit() && "Should not be using a 32-bit thunk!"); - return "__llvm_retpoline_eax"; -@@ -26298,6 +26295,9 @@ static const char *getRetpolineSymbol(const X86Subtarget &Subtarget, - case X86::EDX: - assert(!Subtarget.is64Bit() && "Should not be using a 32-bit thunk!"); - return "__llvm_retpoline_edx"; -+ case X86::EDI: -+ assert(!Subtarget.is64Bit() && "Should not be using a 32-bit thunk!"); -+ return "__llvm_retpoline_edi"; - case X86::R11: - assert(Subtarget.is64Bit() && "Should not be using a 64-bit thunk!"); - return "__llvm_retpoline_r11"; -@@ -26319,15 +26319,13 @@ X86TargetLowering::EmitLoweredRetpoline(MachineInstr &MI, - // just use R11, but we scan for uses anyway to ensure we don't generate - // incorrect code. On 32-bit, we use one of EAX, ECX, or EDX that isn't - // already a register use operand to the call to hold the callee. If none -- // are available, push the callee instead. This is less efficient, but is -- // necessary for functions using 3 regparms. Such function calls are -- // (currently) not eligible for tail call optimization, because there is no -- // scratch register available to hold the address of the callee. -+ // are available, use EDI instead. EDI is chosen because EBX is the PIC base -+ // register and ESI is the base pointer to realigned stack frames with VLAs. 
- SmallVector AvailableRegs; - if (Subtarget.is64Bit()) - AvailableRegs.push_back(X86::R11); - else -- AvailableRegs.append({X86::EAX, X86::ECX, X86::EDX}); -+ AvailableRegs.append({X86::EAX, X86::ECX, X86::EDX, X86::EDI}); - - // Zero out any registers that are already used. - for (const auto &MO : MI.operands()) { -@@ -26345,30 +26343,18 @@ X86TargetLowering::EmitLoweredRetpoline(MachineInstr &MI, - break; - } - } -+ if (!AvailableReg) -+ report_fatal_error("calling convention incompatible with retpoline, no " -+ "available registers"); - - const char *Symbol = getRetpolineSymbol(Subtarget, AvailableReg); - -- if (AvailableReg == 0) { -- // No register available. Use PUSH. This must not be a tailcall, and this -- // must not be x64. -- if (Subtarget.is64Bit()) -- report_fatal_error( -- "Cannot make an indirect call on x86-64 using both retpoline and a " -- "calling convention that preservers r11"); -- if (Opc != X86::CALLpcrel32) -- report_fatal_error("Cannot make an indirect tail call on x86 using " -- "retpoline without a preserved register"); -- BuildMI(*BB, MI, DL, TII->get(X86::PUSH32r)).addReg(CalleeVReg); -- MI.getOperand(0).ChangeToES(Symbol); -- MI.setDesc(TII->get(Opc)); -- } else { -- BuildMI(*BB, MI, DL, TII->get(TargetOpcode::COPY), AvailableReg) -- .addReg(CalleeVReg); -- MI.getOperand(0).ChangeToES(Symbol); -- MI.setDesc(TII->get(Opc)); -- MachineInstrBuilder(*BB->getParent(), &MI) -- .addReg(AvailableReg, RegState::Implicit | RegState::Kill); -- } -+ BuildMI(*BB, MI, DL, TII->get(TargetOpcode::COPY), AvailableReg) -+ .addReg(CalleeVReg); -+ MI.getOperand(0).ChangeToES(Symbol); -+ MI.setDesc(TII->get(Opc)); -+ MachineInstrBuilder(*BB->getParent(), &MI) -+ .addReg(AvailableReg, RegState::Implicit | RegState::Kill); - return BB; - } - -diff --git a/lib/Target/X86/X86RetpolineThunks.cpp b/lib/Target/X86/X86RetpolineThunks.cpp -index 223fa57..59ace3f 100644 ---- a/lib/Target/X86/X86RetpolineThunks.cpp -+++ b/lib/Target/X86/X86RetpolineThunks.cpp -@@ -43,7 +43,7 @@ static const char R11ThunkName[] = "__llvm_retpoline_r11"; - static const char EAXThunkName[] = "__llvm_retpoline_eax"; - static const char ECXThunkName[] = "__llvm_retpoline_ecx"; - static const char EDXThunkName[] = "__llvm_retpoline_edx"; --static const char PushThunkName[] = "__llvm_retpoline_push"; -+static const char EDIThunkName[] = "__llvm_retpoline_edi"; - - namespace { - class X86RetpolineThunks : public MachineFunctionPass { -@@ -127,7 +127,7 @@ bool X86RetpolineThunks::runOnMachineFunction(MachineFunction &MF) { - createThunkFunction(M, R11ThunkName); - else - for (StringRef Name : -- {EAXThunkName, ECXThunkName, EDXThunkName, PushThunkName}) -+ {EAXThunkName, ECXThunkName, EDXThunkName, EDIThunkName}) - createThunkFunction(M, Name); - InsertedThunks = true; - return true; -@@ -151,9 +151,8 @@ bool X86RetpolineThunks::runOnMachineFunction(MachineFunction &MF) { - populateThunk(MF, X86::R11); - } else { - // For 32-bit targets we need to emit a collection of thunks for various -- // possible scratch registers as well as a fallback that is used when -- // there are no scratch registers and assumes the retpoline target has -- // been pushed. -+ // possible scratch registers as well as a fallback that uses EDI, which is -+ // normally callee saved. 
- // __llvm_retpoline_eax: - // calll .Leax_call_target - // .Leax_capture_spec: -@@ -174,32 +173,18 @@ bool X86RetpolineThunks::runOnMachineFunction(MachineFunction &MF) { - // movl %edx, (%esp) - // retl - // -- // This last one is a bit more special and so needs a little extra -- // handling. -- // __llvm_retpoline_push: -- // calll .Lpush_call_target -- // .Lpush_capture_spec: -- // pause -- // lfence -- // jmp .Lpush_capture_spec -- // .align 16 -- // .Lpush_call_target: -- // # Clear pause_loop return address. -- // addl $4, %esp -- // # Top of stack words are: Callee, RA. Exchange Callee and RA. -- // pushl 4(%esp) # Push callee -- // pushl 4(%esp) # Push RA -- // popl 8(%esp) # Pop RA to final RA -- // popl (%esp) # Pop callee to next top of stack -- // retl # Ret to callee -+ // __llvm_retpoline_edi: -+ // ... # Same setup -+ // movl %edi, (%esp) -+ // retl - if (MF.getName() == EAXThunkName) - populateThunk(MF, X86::EAX); - else if (MF.getName() == ECXThunkName) - populateThunk(MF, X86::ECX); - else if (MF.getName() == EDXThunkName) - populateThunk(MF, X86::EDX); -- else if (MF.getName() == PushThunkName) -- populateThunk(MF); -+ else if (MF.getName() == EDIThunkName) -+ populateThunk(MF, X86::EDI); - else - llvm_unreachable("Invalid thunk name on x86-32!"); - } -@@ -301,11 +286,6 @@ void X86RetpolineThunks::populateThunk(MachineFunction &MF, - CaptureSpec->addSuccessor(CaptureSpec); - - CallTarget->setAlignment(4); -- if (Reg) { -- insertRegReturnAddrClobber(*CallTarget, *Reg); -- } else { -- assert(!Is64Bit && "We only support non-reg thunks on 32-bit x86!"); -- insert32BitPushReturnAddrClobber(*CallTarget); -- } -+ insertRegReturnAddrClobber(*CallTarget, *Reg); - BuildMI(CallTarget, DebugLoc(), TII->get(RetOpc)); - } -diff --git a/test/CodeGen/X86/retpoline-regparm.ll b/test/CodeGen/X86/retpoline-regparm.ll -new file mode 100644 -index 0000000..13b3274 ---- /dev/null -+++ b/test/CodeGen/X86/retpoline-regparm.ll -@@ -0,0 +1,42 @@ -+; RUN: llc -mtriple=i686-linux < %s | FileCheck --implicit-check-not="jmp.*\*" --implicit-check-not="call.*\*" %s -+ -+; Test 32-bit retpoline when -mregparm=3 is used. This case is interesting -+; because there are no available scratch registers. The Linux kernel builds -+; with -mregparm=3, so we need to support it. TCO should fail because we need -+; to restore EDI. -+ -+define void @call_edi(void (i32, i32, i32)* %fp) #0 { -+entry: -+ tail call void %fp(i32 inreg 0, i32 inreg 0, i32 inreg 0) -+ ret void -+} -+ -+; CHECK-LABEL: call_edi: -+; EDI is used, so it must be saved. 
-+; CHECK: pushl %edi -+; CHECK-DAG: xorl %eax, %eax -+; CHECK-DAG: xorl %edx, %edx -+; CHECK-DAG: xorl %ecx, %ecx -+; CHECK-DAG: movl {{.*}}, %edi -+; CHECK: calll __llvm_retpoline_edi -+; CHECK: popl %edi -+; CHECK: retl -+ -+define void @edi_external(void (i32, i32, i32)* %fp) #1 { -+entry: -+ tail call void %fp(i32 inreg 0, i32 inreg 0, i32 inreg 0) -+ ret void -+} -+ -+; CHECK-LABEL: edi_external: -+; CHECK: pushl %edi -+; CHECK-DAG: xorl %eax, %eax -+; CHECK-DAG: xorl %edx, %edx -+; CHECK-DAG: xorl %ecx, %ecx -+; CHECK-DAG: movl {{.*}}, %edi -+; CHECK: calll __x86_indirect_thunk_edi -+; CHECK: popl %edi -+; CHECK: retl -+ -+attributes #0 = { "target-features"="+retpoline" } -+attributes #1 = { "target-features"="+retpoline-external-thunk" } -diff --git a/test/CodeGen/X86/retpoline.ll b/test/CodeGen/X86/retpoline.ll -index b0d4c85..562386e 100644 ---- a/test/CodeGen/X86/retpoline.ll -+++ b/test/CodeGen/X86/retpoline.ll -@@ -336,10 +336,10 @@ latch: - ; X86-NEXT: movl %edx, (%esp) - ; X86-NEXT: retl - ; --; X86-LABEL: .section .text.__llvm_retpoline_push,{{.*}},__llvm_retpoline_push,comdat --; X86-NEXT: .hidden __llvm_retpoline_push --; X86-NEXT: .weak __llvm_retpoline_push --; X86: __llvm_retpoline_push: -+; X86-LABEL: .section .text.__llvm_retpoline_edi,{{.*}},__llvm_retpoline_edi,comdat -+; X86-NEXT: .hidden __llvm_retpoline_edi -+; X86-NEXT: .weak __llvm_retpoline_edi -+; X86: __llvm_retpoline_edi: - ; X86-NEXT: # {{.*}} # %entry - ; X86-NEXT: calll [[CALL_TARGET:.*]] - ; X86-NEXT: [[CAPTURE_SPEC:.*]]: # Block address taken -@@ -351,11 +351,7 @@ latch: - ; X86-NEXT: .p2align 4, 0x90 - ; X86-NEXT: [[CALL_TARGET]]: # Block address taken - ; X86-NEXT: # %entry --; X86-NEXT: addl $4, %esp --; X86-NEXT: pushl 4(%esp) --; X86-NEXT: pushl 4(%esp) --; X86-NEXT: popl 8(%esp) --; X86-NEXT: popl (%esp) -+; X86-NEXT: movl %edi, (%esp) - ; X86-NEXT: retl - - --- -1.8.3.1 - diff --git a/0004-Merging-r325085.patch b/0004-Merging-r325085.patch deleted file mode 100644 index 6b5bd85..0000000 --- a/0004-Merging-r325085.patch +++ /dev/null @@ -1,65 +0,0 @@ -From de9a0f9c449d4b13c70eff8c9a3023948dc21cb7 Mon Sep 17 00:00:00 2001 -From: Reid Kleckner -Date: Wed, 14 Feb 2018 00:34:35 +0000 -Subject: [PATCH 4/4] Merging r325085: - ------------------------------------------------------------------------ - r325085 | rnk | 2018-02-13 16:24:29 -0800 (Tue, 13 Feb 2018) | 3 lines - -[X86] Remove dead code from retpoline thunk generation - -Follow-up to r325049 ------------------------------------------------------------------------- - - -git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@325091 91177308-0d34-0410-b5e6-96231b3b80d8 ---- - lib/Target/X86/X86RetpolineThunks.cpp | 26 -------------------------- - 1 file changed, 26 deletions(-) - -diff --git a/lib/Target/X86/X86RetpolineThunks.cpp b/lib/Target/X86/X86RetpolineThunks.cpp -index 59ace3f..d03826b 100644 ---- a/lib/Target/X86/X86RetpolineThunks.cpp -+++ b/lib/Target/X86/X86RetpolineThunks.cpp -@@ -74,7 +74,6 @@ private: - - void createThunkFunction(Module &M, StringRef Name); - void insertRegReturnAddrClobber(MachineBasicBlock &MBB, unsigned Reg); -- void insert32BitPushReturnAddrClobber(MachineBasicBlock &MBB); - void populateThunk(MachineFunction &MF, Optional Reg = None); - }; - -@@ -225,31 +224,6 @@ void X86RetpolineThunks::insertRegReturnAddrClobber(MachineBasicBlock &MBB, - .addReg(Reg); - } - --void X86RetpolineThunks::insert32BitPushReturnAddrClobber( -- MachineBasicBlock &MBB) { -- // The instruction sequence we use to 
replace the return address without -- // a scratch register is somewhat complicated: -- // # Clear capture_spec from return address. -- // addl $4, %esp -- // # Top of stack words are: Callee, RA. Exchange Callee and RA. -- // pushl 4(%esp) # Push callee -- // pushl 4(%esp) # Push RA -- // popl 8(%esp) # Pop RA to final RA -- // popl (%esp) # Pop callee to next top of stack -- // retl # Ret to callee -- BuildMI(&MBB, DebugLoc(), TII->get(X86::ADD32ri), X86::ESP) -- .addReg(X86::ESP) -- .addImm(4); -- addRegOffset(BuildMI(&MBB, DebugLoc(), TII->get(X86::PUSH32rmm)), X86::ESP, -- false, 4); -- addRegOffset(BuildMI(&MBB, DebugLoc(), TII->get(X86::PUSH32rmm)), X86::ESP, -- false, 4); -- addRegOffset(BuildMI(&MBB, DebugLoc(), TII->get(X86::POP32rmm)), X86::ESP, -- false, 8); -- addRegOffset(BuildMI(&MBB, DebugLoc(), TII->get(X86::POP32rmm)), X86::ESP, -- false, 0); --} -- - void X86RetpolineThunks::populateThunk(MachineFunction &MF, - Optional Reg) { - // Set MF properties. We never use vregs... --- -1.8.3.1 - diff --git a/llvm.spec b/llvm.spec index 932de46..9b9c582 100644 --- a/llvm.spec +++ b/llvm.spec @@ -8,11 +8,11 @@ %global llvm_bindir %{_libdir}/%{name} %global maj_ver 5 %global min_ver 0 -%global patch_ver 1 +%global patch_ver 2 Name: llvm Version: %{maj_ver}.%{min_ver}.%{patch_ver} -Release: 6%{?dist} +Release: 1%{?dist} Summary: The Low Level Virtual Machine License: NCSA @@ -28,12 +28,6 @@ Patch4: 0001-Revert-Add-a-linker-script-to-version-LLVM-symbols.patch Patch5: 0001-PowerPC-Don-t-use-xscvdpspn-on-the-P7.patch Patch6: 0001-Ignore-all-duplicate-frame-index-expression.patch Patch7: 0002-Reinstantiate-old-bad-deduplication-logic-that-was-r.patch -Patch8: 0001-Merging-r323155.patch -Patch9: 0001-Merging-r323915.patch -Patch10: 0001-Merging-r324449.patch -Patch11: 0002-Merging-r324645.patch -Patch12: 0003-Merging-r325049.patch -Patch13: 0004-Merging-r325085.patch Patch14: 0001-PPC-Avoid-non-simple-MVT-in-STBRX-optimization.patch BuildRequires: cmake @@ -218,6 +212,9 @@ fi %{_libdir}/cmake/llvm/LLVMStaticExports.cmake %changelog +* Thu May 03 2018 Tom Stellard - 5.0.2-1 +- 5.0.2 Release + * Tue Mar 27 2018 Tom Stellard - 5.0.1-6 - Re-enable arm tests that used to hang diff --git a/sources b/sources index 16fb09f..daea6fc 100644 --- a/sources +++ b/sources @@ -1 +1 @@ -SHA512 (llvm-5.0.1.src.tar.xz) = bee1d45fca15ce725b1f2b1339b13eb6f750a3a321cfd099075477ec25835a8ca55b5366172c4aad46592dfd8afe372349ecf264f581463d017f9cee2d63c1cb +SHA512 (llvm-5.0.2.src.tar.xz) = 3588be5ed969c3f7f6f16f56a12a6af2814d3d3c960d4a36ffebb0446cc75f19220bccee7fc605f9b01f5d5c188a905a046193cc12dec42dd5922048b5c27fe1