author     Dimitry Andric <dim@FreeBSD.org>    2018-09-11 10:09:45 +0000
committer  Dimitry Andric <dim@FreeBSD.org>    2018-09-11 10:09:45 +0000
commit     36272db3cad448211389168cced4baac39a1a0d1 (patch)
tree       e570502f0d6730e432657fc86304fa02a2de80fa
parent     8568f9cb5af587ccee4088af3e2d617b3c30d403 (diff)
Vendor import of llvm release_70 branch r341916
(tags: vendor/llvm/llvm-release_700-r342383, vendor/llvm/llvm-release_70-r341916)
Notes:
svn path=/vendor/llvm/dist-release_70/; revision=338575
svn path=/vendor/llvm/llvm-release_700-r342383/; revision=338716; tag=vendor/llvm/llvm-release_700-r342383
33 files changed, 648 insertions, 159 deletions
diff --git a/docs/ReleaseNotes.rst b/docs/ReleaseNotes.rst
index 7dce9e2d60dd..158f0978bbbf 100644
--- a/docs/ReleaseNotes.rst
+++ b/docs/ReleaseNotes.rst
@@ -5,11 +5,6 @@ LLVM 7.0.0 Release Notes
 .. contents::
     :local:
 
-.. warning::
-   These are in-progress notes for the upcoming LLVM 7 release.
-   Release notes for previous releases can be found on
-   `the Download Page <http://releases.llvm.org/download.html>`_.
-
 Introduction
 ============
 
@@ -18,38 +13,27 @@ This document contains the release notes for the LLVM Compiler
 Infrastructure, release 7.0.0.  Here we describe the status of LLVM, including
 major improvements from the previous release, improvements in various
 subprojects of LLVM, and some of the current users of the code.  All LLVM
 releases may be downloaded
-from the `LLVM releases web site <http://llvm.org/releases/>`_.
+from the `LLVM releases web site <https://llvm.org/releases/>`_.
 
 For more information about LLVM, including information about the latest
-release, please check out the `main LLVM web site <http://llvm.org/>`_.  If you
+release, please check out the `main LLVM web site <https://llvm.org/>`_.  If you
 have questions or comments, the `LLVM Developer's Mailing List
-<http://lists.llvm.org/mailman/listinfo/llvm-dev>`_ is a good place to send
+<https://lists.llvm.org/mailman/listinfo/llvm-dev>`_ is a good place to send
 them.
 
-Note that if you are reading this file from a Subversion checkout or the main
-LLVM web page, this document applies to the *next* release, not the current
-one.  To see the release notes for a specific release, please see the `releases
-page <http://llvm.org/releases/>`_.
-
 Non-comprehensive list of changes in this release
 =================================================
-.. NOTE
-   For small 1-3 sentence descriptions, just add an entry at the end of
-   this list. If your description won't fit comfortably in one bullet
-   point (e.g. maybe you would like to give an example of the
-   functionality, or simply have a lot to talk about), see the `NOTE` below
-   for adding a new subsection.
 
 * The Windows installer no longer includes a Visual Studio integration.
   Instead, a new
-  `LLVM Compiler Toolchain Visual Studio extension <https://marketplace.visualstudio.com/items?itemName=LLVMExtensions.llvm-toolchain>`
-  is available on the Visual Studio Marketplace. The new integration includes
-  support for Visual Studio 2017.
+  `LLVM Compiler Toolchain Visual Studio extension <https://marketplace.visualstudio.com/items?itemName=LLVMExtensions.llvm-toolchain>`_
+  is available on the Visual Studio Marketplace. The new integration
+  supports Visual Studio 2017.
 
 * Libraries have been renamed from 7.0 to 7. This change also impacts
   downstream libraries like lldb.
 
-* The LoopInstSimplify pass (-loop-instsimplify) has been removed.
+* The LoopInstSimplify pass (``-loop-instsimplify``) has been removed.
 
 * Symbols starting with ``?`` are no longer mangled by LLVM when using the
   Windows ``x`` or ``w`` IR mangling schemes.
@@ -64,16 +48,13 @@ Non-comprehensive list of changes in this release
   information available in LLVM to statically predict the performance of
   machine code for a specific CPU.
 
-* The optimization flag to merge constants (-fmerge-all-constants) is no longer
-  applied by default.
-
 * Optimization of floating-point casts is improved. This may cause surprising
-  results for code that is relying on the undefined behavior of overflowing
+  results for code that is relying on the undefined behavior of overflowing
   casts.
 
   The optimization can be disabled by specifying a function attribute:
-  "strict-float-cast-overflow"="false". This attribute may be created by the
+  ``"strict-float-cast-overflow"="false"``. This attribute may be created by the
   clang option ``-fno-strict-float-cast-overflow``.
 
-  Code sanitizers can be used to detect affected patterns. The option for
-  detecting this problem alone is "-fsanitize=float-cast-overflow":
+  Code sanitizers can be used to detect affected patterns. The clang option for
+  detecting this problem alone is ``-fsanitize=float-cast-overflow``:
 
   .. code-block:: c
 
@@ -86,7 +67,7 @@ Non-comprehensive list of changes in this release
 
   .. code-block:: bash
 
-    clang -O1 ftrunc.c -fsanitize=float-cast-overflow ; ./a.out
+    clang -O1 ftrunc.c -fsanitize=float-cast-overflow ; ./a.out
     ftrunc.c:5:15: runtime error: 4.29497e+09 is outside the range of representable values of type 'int'
     junk in the ftrunc: 0.000000
 
@@ -104,19 +85,20 @@ Non-comprehensive list of changes in this release
 
     git grep -l 'DEBUG' | xargs perl -pi -e 's/\bDEBUG\s?\(/LLVM_DEBUG(/g'
     git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
 
-* Early support for UBsan, X-Ray instrumentation and libFuzzer (x86 and x86_64) for OpenBSD. Support for MSan
-  (x86_64), X-Ray instrumentation and libFuzzer (x86 and x86_64) for FreeBSD.
+* Early support for UBsan, X-Ray instrumentation and libFuzzer (x86 and x86_64)
+  for OpenBSD. Support for MSan (x86_64), X-Ray instrumentation and libFuzzer
+  (x86 and x86_64) for FreeBSD.
 
 * ``SmallVector<T, 0>`` shrank from ``sizeof(void*) * 4 + sizeof(T)`` to
   ``sizeof(void*) + sizeof(unsigned) * 2``, smaller than ``std::vector<T>`` on
   64-bit platforms.  The maximum capacity is now restricted to ``UINT32_MAX``.
   Since SmallVector doesn't have the exception-safety pessimizations some
-  implementations saddle std::vector with and is better at using ``realloc``,
-  it's now a better choice even on the heap (although when TinyPtrVector works,
-  it's even smaller).
+  implementations saddle ``std::vector`` with and is better at using ``realloc``,
+  it's now a better choice even on the heap (although when ``TinyPtrVector`` works,
+  that's even smaller).
 
 * Preliminary/experimental support for DWARF v5 debugging information,
-  including the new .debug_names accelerator table. DWARF emitted at ``-O0``
+  including the new ``.debug_names`` accelerator table. DWARF emitted at ``-O0``
   should be fully DWARF v5 compliant. Type units and split DWARF are known
   not to be compliant, and higher optimization levels will still emit some
   information in v4 format.
@@ -129,30 +111,24 @@ Non-comprehensive list of changes in this release
   but it can now handle leftover C declarations in preprocessor output, if
   given output from a preprocessor run externally.)
 
-* CodeView debug info can now be emitted MinGW configurations, if requested.
+* CodeView debug info can now be emitted for MinGW configurations, if requested.
 
-* Note..
+* The :program:`opt` tool now supports the ``-load-pass-plugin`` option for
+  loading pass plugins for the new PassManager.
 
-.. NOTE
-   If you would like to document a larger change, then you can add a
-   subsection about it right here. You can copy the following boilerplate
-   and un-indent it (the indentation causes it to be inside this comment).
+* Support for profiling JITed code with perf.
 
-   Special New Feature
-   -------------------
-
-   Makes programs 10x faster by doing Special New Thing.
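For downstream code tracking the DEBUG to LLVM_DEBUG rename above, the new
spelling looks like this minimal sketch (the "example-pass" debug category and
the helper function are hypothetical, not part of this patch); LLVM_DEBUG comes
from llvm/Support/Debug.h and, like the old macro, compiles away in NDEBUG
builds and only fires under -debug / -debug-only=<category>:

    #include "llvm/Support/Debug.h"
    #include "llvm/Support/raw_ostream.h"

    #define DEBUG_TYPE "example-pass" // hypothetical -debug-only= category

    static void noteVisit(unsigned N) {
      // Pre-LLVM 7 spelling was: DEBUG(dbgs() << "visiting " << N << "\n");
      LLVM_DEBUG(llvm::dbgs() << "visiting " << N << "\n");
    }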
 
 Changes to the LLVM IR
 ----------------------
 
-* The signatures for the builtins @llvm.memcpy, @llvm.memmove, and @llvm.memset
-  have changed. Alignment is no longer an argument, and are instead conveyed as
-  parameter attributes.
+* The signatures for the builtins ``@llvm.memcpy``, ``@llvm.memmove``, and
+  ``@llvm.memset`` have changed. Alignment is no longer an argument, and is
+  instead conveyed as parameter attributes.
 
-* invariant.group.barrier has been renamed to launder.invariant.group.
+* ``invariant.group.barrier`` has been renamed to ``launder.invariant.group``.
 
-* invariant.group metadata can now refer only empty metadata nodes.
+* ``invariant.group`` metadata can now refer only to empty metadata nodes.
 
 Changes to the AArch64 Target
 -----------------------------
@@ -160,10 +136,13 @@ Changes to the AArch64 Target
 * The ``.inst`` assembler directive is now usable on both COFF and Mach-O
   targets, in addition to ELF.
 
-* Support for most remaining COFF relocations have been added.
+* Support for most remaining COFF relocations has been added.
 
 * Support for TLS on Windows has been added.
 
+* Assembler and disassembler support for the ARM Scalable Vector Extension has
+  been added.
+
 Changes to the ARM Target
 -------------------------
@@ -187,13 +166,74 @@ Changes to the Hexagon Target
 Changes to the MIPS Target
 --------------------------
 
-  During this release ...
+During this release the MIPS target has:
+
+* Added support for Virtualization, Global INValidate ASE,
+  and CRC ASE instructions.
+
+* Introduced definitions of ``[d]rem``, ``[d]remu``,
+  and microMIPSR6 ``ll/sc`` instructions.
+
+* Shrink-wrapping is now supported and enabled by default (except for ``-O0``).
+
+* Extended size reduction pass by the LWP and SWP instructions.
+
+* Gained initial support of GlobalISel instruction selection framework.
+
+* Updated the P5600 scheduler model not to use instruction itineraries.
+
+* Added disassembly support for comparison and fused (negative) multiply
+  ``add/sub`` instructions.
+
+* Improved the selection of multiple instructions.
+
+* Load/store ``lb``, ``sb``, ``ld``, ``sd``, ``lld``, ... instructions
+  now support 32/64-bit offsets.
+
+* Added support for ``y``, ``M``, and ``L`` inline assembler operand codes.
+
+* Extended list of relocations supported by the ``.reloc`` directive.
+
+* Fixed using a wrong register class for creating an emergency
+  spill slot for mips3 / n64 ABI.
+
+* MIPS relocation types were generated for microMIPS code.
+
+* Corrected definitions of multiple instructions (``lwp``, ``swp``, ``ctc2``,
+  ``cfc2``, ``sync``, ``synci``, ``cvt.d.w``, ...).
+
+* Fixed atomic operations at ``-O0`` level.
+
+* Fixed local dynamic TLS with Sym64.
 
 Changes to the PowerPC Target
 -----------------------------
 
-  During this release ...
+During this release the PowerPC target has:
+
+* Replaced the list scheduler for post register allocation with the machine scheduler.
+
+* Added support for ``coldcc`` calling convention.
+
+* Added support for ``symbol@high`` and ``symbol@higha`` symbol modifiers.
+
+* Added support for quad-precision floating point type (``__float128``) under the llvm option ``-enable-ppc-quad-precision``.
+
+* Added dump function to ``LatencyPriorityQueue``.
+
+* Completed the Power9 scheduler model.
+
+* Optimized TLS code generation.
+
+* Improved MachineLICM for hoisting constant stores.
+
+* Improved code generation to reduce register use by using more register
+  immediate instructions.
+
+* Improved code generation to better exploit rotate-and-mask instructions.
+
+* Fixed the bug in dynamic loader for JIT which crashed NNVM.
+
+* Numerous bug fixes and code cleanups.
 
 Changes to the SystemZ Target
 -----------------------------
@@ -226,57 +266,61 @@ Changes to the X86 Target
   environments - in MSVC environments, long doubles are the same size as
   normal doubles.)
 
-Changes to the AMDGPU Target
------------------------------
-
-  During this release ...
-
-Changes to the AVR Target
------------------------------
-
-  During this release ...
-
 Changes to the OCaml bindings
 -----------------------------
 
-* Remove ``add_bb_vectorize``.
+* Removed ``add_bb_vectorize``.
 
 Changes to the C API
 --------------------
 
-* Remove ``LLVMAddBBVectorizePass``. The implementation was removed and the C
+* Removed ``LLVMAddBBVectorizePass``. The implementation was removed and the C
   interface was made a deprecated no-op in LLVM 5. Use
   ``LLVMAddSLPVectorizePass`` instead to get the supported SLP vectorizer.
 
+* Expanded the OrcJIT APIs so they can register event listeners like debuggers
+  and profilers.
+
 Changes to the DAG infrastructure
 ---------------------------------
 
-* ADDC/ADDE/SUBC/SUBE are now deprecated and will default to expand. Backends
-  that wish to continue to use these opcodes should explicitely request so
+* ``ADDC``/``ADDE``/``SUBC``/``SUBE`` are now deprecated and will default to expand. Backends
+  that wish to continue to use these opcodes should explicitly request to do so
   using ``setOperationAction`` in their ``TargetLowering``. New backends
-  should use UADDO/ADDCARRY/USUBO/SUBCARRY instead of the deprecated opcodes.
+  should use ``UADDO``/``ADDCARRY``/``USUBO``/``SUBCARRY`` instead of the deprecated opcodes.
 
-* The SETCCE opcode has now been removed in favor of SETCCCARRY.
+* The ``SETCCE`` opcode has now been removed in favor of ``SETCCCARRY``.
 
+* TableGen now supports multi-alternative pattern fragments via the ``PatFrags``
+  class. ``PatFrag`` is now derived from ``PatFrags``, which may require minor
+  changes to backends that directly access ``PatFrag`` members.
 
-* TableGen now supports multi-alternative pattern fragments via the PatFrags
-  class. PatFrag is now derived from PatFrags, which may require minor
-  changes to backends that directly access PatFrag members.
 
 External Open Source Projects Using LLVM 7
 ==========================================
 
-* A project...
+Zig Programming Language
+------------------------
+
+`Zig <https://ziglang.org>`_ is an open-source programming language designed
+for robustness, optimality, and clarity. Zig is an alternative to C, providing
+high level features such as generics, compile time function execution, partial
+evaluation, and LLVM-based coroutines, while exposing low level LLVM IR
+features such as aliases and intrinsics. Zig uses Clang to provide automatic
+import of .h symbols - even inline functions and macros. Zig uses LLD combined
+with lazily building compiler-rt to provide out-of-the-box cross-compiling for
+all supported targets.
 
 Additional Information
 ======================
 
 A wide variety of additional information is available on the `LLVM web page
-<http://llvm.org/>`_, in particular in the `documentation
-<http://llvm.org/docs/>`_ section. The web page also contains versions of the
+<https://llvm.org/>`_, in particular in the `documentation
+<https://llvm.org/docs/>`_ section. The web page also contains versions of the
 API documentation which is up-to-date with the Subversion version of the
 source code.
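Out-of-tree clients of the C API that still called the removed entry point can
switch to the SLP vectorizer as the note above suggests. A hedged sketch
against the LLVM 7 C headers (pass-manager setup elided; the function name
addVectorizer is illustrative, not from this patch):

    #include "llvm-c/Core.h"
    #include "llvm-c/Transforms/Vectorize.h"

    static void addVectorizer(LLVMPassManagerRef PM) {
      // LLVMAddBBVectorizePass(PM); // removed in LLVM 7; a deprecated no-op since LLVM 5
      LLVMAddSLPVectorizePass(PM);   // the supported SLP vectorizer
    }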
 You can access versions of these documents specific to this release by going
 into the ``llvm/docs/`` directory in the LLVM tree.
 
 If you have any questions or comments about LLVM, please feel free to contact
-us via the `mailing lists <http://llvm.org/docs/#maillist>`_.
+us via the `mailing lists <https://llvm.org/docs/#mailing-lists>`_.
diff --git a/docs/index.rst b/docs/index.rst
index 2173f94459dd..4e8f10dfbe62 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -1,11 +1,6 @@
 Overview
 ========
 
-.. warning::
-
-   If you are using a released version of LLVM, see `the download page
-   <http://llvm.org/releases/>`_ to find your documentation.
-
 The LLVM compiler infrastructure supports a wide range of projects, from
 industrial strength compilers to specialized JIT applications to small
 research projects.
diff --git a/lib/MC/MCParser/AsmParser.cpp b/lib/MC/MCParser/AsmParser.cpp
index 501a1cccf60e..d88c6f76826f 100644
--- a/lib/MC/MCParser/AsmParser.cpp
+++ b/lib/MC/MCParser/AsmParser.cpp
@@ -3348,17 +3348,17 @@ bool AsmParser::parseDirectiveFile(SMLoc DirectiveLoc) {
     }
   }
 
-  // In case there is a -g option as well as debug info from directive .file,
-  // we turn off the -g option, directly use the existing debug info instead.
-  // Also reset any implicit ".file 0" for the assembler source.
-  if (Ctx.getGenDwarfForAssembly()) {
-    Ctx.getMCDwarfLineTable(0).resetRootFile();
-    Ctx.setGenDwarfForAssembly(false);
-  }
-
   if (FileNumber == -1)
     getStreamer().EmitFileDirective(Filename);
   else {
+    // In case there is a -g option as well as debug info from directive .file,
+    // we turn off the -g option, directly use the existing debug info instead.
+    // Also reset any implicit ".file 0" for the assembler source.
+    if (Ctx.getGenDwarfForAssembly()) {
+      Ctx.getMCDwarfLineTable(0).resetRootFile();
+      Ctx.setGenDwarfForAssembly(false);
+    }
+
     MD5::MD5Result *CKMem = nullptr;
     if (HasMD5) {
       CKMem = (MD5::MD5Result *)Ctx.allocate(sizeof(MD5::MD5Result), 1);
diff --git a/lib/Support/Unix/Path.inc b/lib/Support/Unix/Path.inc
index 7ad57d892ff1..b4279d4fcc0c 100644
--- a/lib/Support/Unix/Path.inc
+++ b/lib/Support/Unix/Path.inc
@@ -769,8 +769,10 @@ std::error_code openFile(const Twine &Name, int &ResultFD,
   SmallString<128> Storage;
   StringRef P = Name.toNullTerminatedStringRef(Storage);
-  if ((ResultFD = sys::RetryAfterSignal(-1, ::open, P.begin(), OpenFlags, Mode)) <
-      0)
+  // Call ::open in a lambda to avoid overload resolution in RetryAfterSignal
+  // when open is overloaded, such as in Bionic.
+  auto Open = [&]() { return ::open(P.begin(), OpenFlags, Mode); };
+  if ((ResultFD = sys::RetryAfterSignal(-1, Open)) < 0)
     return std::error_code(errno, std::generic_category());
 #ifndef O_CLOEXEC
   if (!(Flags & OF_ChildInherit)) {
diff --git a/lib/Support/Unix/Process.inc b/lib/Support/Unix/Process.inc
index fa515d44f3f2..3185f45a3a61 100644
--- a/lib/Support/Unix/Process.inc
+++ b/lib/Support/Unix/Process.inc
@@ -211,7 +211,10 @@ std::error_code Process::FixupStandardFileDescriptors() {
     assert(errno == EBADF && "expected errno to have EBADF at this point!");
 
     if (NullFD < 0) {
-      if ((NullFD = RetryAfterSignal(-1, ::open, "/dev/null", O_RDWR)) < 0)
+      // Call ::open in a lambda to avoid overload resolution in
+      // RetryAfterSignal when open is overloaded, such as in Bionic.
+ auto Open = [&]() { return ::open("/dev/null", O_RDWR); }; + if ((NullFD = RetryAfterSignal(-1, Open)) < 0) return std::error_code(errno, std::generic_category()); } diff --git a/lib/Target/AMDGPU/AMDGPU.h b/lib/Target/AMDGPU/AMDGPU.h index 796766d94622..2b49c2ea88e1 100644 --- a/lib/Target/AMDGPU/AMDGPU.h +++ b/lib/Target/AMDGPU/AMDGPU.h @@ -229,7 +229,7 @@ struct AMDGPUAS { enum : unsigned { // The maximum value for flat, generic, local, private, constant and region. - MAX_COMMON_ADDRESS = 5, + MAX_AMDGPU_ADDRESS = 6, GLOBAL_ADDRESS = 1, ///< Address space for global memory (RAT0, VTX0). CONSTANT_ADDRESS = 4, ///< Address space for constant memory (VTX2) diff --git a/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp b/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp index ef4b69d09d9f..974fbcb87191 100644 --- a/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp +++ b/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp @@ -50,47 +50,51 @@ void AMDGPUAAWrapperPass::getAnalysisUsage(AnalysisUsage &AU) const { AMDGPUAAResult::ASAliasRulesTy::ASAliasRulesTy(AMDGPUAS AS_, Triple::ArchType Arch_) : Arch(Arch_), AS(AS_) { // These arrarys are indexed by address space value - // enum elements 0 ... to 5 - static const AliasResult ASAliasRulesPrivIsZero[6][6] = { - /* Private Global Constant Group Flat Region*/ - /* Private */ {MayAlias, NoAlias , NoAlias , NoAlias , MayAlias, NoAlias}, - /* Global */ {NoAlias , MayAlias, NoAlias , NoAlias , MayAlias, NoAlias}, - /* Constant */ {NoAlias , NoAlias , MayAlias, NoAlias , MayAlias, NoAlias}, - /* Group */ {NoAlias , NoAlias , NoAlias , MayAlias, MayAlias, NoAlias}, - /* Flat */ {MayAlias, MayAlias, MayAlias, MayAlias, MayAlias, MayAlias}, - /* Region */ {NoAlias , NoAlias , NoAlias , NoAlias , MayAlias, MayAlias} + // enum elements 0 ... to 6 + static const AliasResult ASAliasRulesPrivIsZero[7][7] = { + /* Private Global Constant Group Flat Region Constant 32-bit */ + /* Private */ {MayAlias, NoAlias , NoAlias , NoAlias , MayAlias, NoAlias , NoAlias}, + /* Global */ {NoAlias , MayAlias, MayAlias, NoAlias , MayAlias, NoAlias , MayAlias}, + /* Constant */ {NoAlias , MayAlias, MayAlias, NoAlias , MayAlias, NoAlias , MayAlias}, + /* Group */ {NoAlias , NoAlias , NoAlias , MayAlias, MayAlias, NoAlias , NoAlias}, + /* Flat */ {MayAlias, MayAlias, MayAlias, MayAlias, MayAlias, MayAlias, MayAlias}, + /* Region */ {NoAlias , NoAlias , NoAlias , NoAlias , MayAlias, MayAlias, NoAlias}, + /* Constant 32-bit */ {NoAlias , MayAlias, MayAlias, NoAlias , MayAlias, NoAlias , MayAlias} }; - static const AliasResult ASAliasRulesGenIsZero[6][6] = { - /* Flat Global Region Group Constant Private */ - /* Flat */ {MayAlias, MayAlias, MayAlias, MayAlias, MayAlias, MayAlias}, - /* Global */ {MayAlias, MayAlias, NoAlias , NoAlias , NoAlias , NoAlias}, - /* Constant */ {MayAlias, NoAlias , MayAlias, NoAlias , NoAlias, NoAlias}, - /* Group */ {MayAlias, NoAlias , NoAlias , MayAlias, NoAlias , NoAlias}, - /* Region */ {MayAlias, NoAlias , NoAlias , NoAlias, MayAlias, NoAlias}, - /* Private */ {MayAlias, NoAlias , NoAlias , NoAlias , NoAlias , MayAlias} + static const AliasResult ASAliasRulesGenIsZero[7][7] = { + /* Flat Global Region Group Constant Private Constant 32-bit */ + /* Flat */ {MayAlias, MayAlias, MayAlias, MayAlias, MayAlias, MayAlias, MayAlias}, + /* Global */ {MayAlias, MayAlias, NoAlias , NoAlias , MayAlias, NoAlias , MayAlias}, + /* Region */ {MayAlias, NoAlias , NoAlias , NoAlias, MayAlias, NoAlias , MayAlias}, + /* Group */ {MayAlias, NoAlias , NoAlias , MayAlias, NoAlias , NoAlias , NoAlias}, 
+ /* Constant */ {MayAlias, MayAlias, MayAlias, NoAlias , NoAlias, NoAlias , MayAlias}, + /* Private */ {MayAlias, NoAlias , NoAlias , NoAlias , NoAlias , MayAlias, NoAlias}, + /* Constant 32-bit */ {MayAlias, MayAlias, MayAlias, NoAlias , MayAlias, NoAlias , NoAlias} }; - assert(AS.MAX_COMMON_ADDRESS <= 5); + static_assert(AMDGPUAS::MAX_AMDGPU_ADDRESS <= 6, "Addr space out of range"); if (AS.FLAT_ADDRESS == 0) { - assert(AS.GLOBAL_ADDRESS == 1 && - AS.REGION_ADDRESS == 2 && - AS.LOCAL_ADDRESS == 3 && - AS.CONSTANT_ADDRESS == 4 && - AS.PRIVATE_ADDRESS == 5); + assert(AS.GLOBAL_ADDRESS == 1 && + AS.REGION_ADDRESS == 2 && + AS.LOCAL_ADDRESS == 3 && + AS.CONSTANT_ADDRESS == 4 && + AS.PRIVATE_ADDRESS == 5 && + AS.CONSTANT_ADDRESS_32BIT == 6); ASAliasRules = &ASAliasRulesGenIsZero; } else { - assert(AS.PRIVATE_ADDRESS == 0 && - AS.GLOBAL_ADDRESS == 1 && - AS.CONSTANT_ADDRESS == 2 && - AS.LOCAL_ADDRESS == 3 && - AS.FLAT_ADDRESS == 4 && - AS.REGION_ADDRESS == 5); + assert(AS.PRIVATE_ADDRESS == 0 && + AS.GLOBAL_ADDRESS == 1 && + AS.CONSTANT_ADDRESS == 2 && + AS.LOCAL_ADDRESS == 3 && + AS.FLAT_ADDRESS == 4 && + AS.REGION_ADDRESS == 5 && + AS.CONSTANT_ADDRESS_32BIT == 6); ASAliasRules = &ASAliasRulesPrivIsZero; } } AliasResult AMDGPUAAResult::ASAliasRulesTy::getAliasResult(unsigned AS1, unsigned AS2) const { - if (AS1 > AS.MAX_COMMON_ADDRESS || AS2 > AS.MAX_COMMON_ADDRESS) { + if (AS1 > AS.MAX_AMDGPU_ADDRESS || AS2 > AS.MAX_AMDGPU_ADDRESS) { if (Arch == Triple::amdgcn) report_fatal_error("Pointer address space out of range"); return AS1 == AS2 ? MayAlias : NoAlias; diff --git a/lib/Target/AMDGPU/AMDGPUAliasAnalysis.h b/lib/Target/AMDGPU/AMDGPUAliasAnalysis.h index 645a38af753c..09ad51d5e42f 100644 --- a/lib/Target/AMDGPU/AMDGPUAliasAnalysis.h +++ b/lib/Target/AMDGPU/AMDGPUAliasAnalysis.h @@ -63,7 +63,7 @@ private: private: Triple::ArchType Arch; AMDGPUAS AS; - const AliasResult (*ASAliasRules)[6][6]; + const AliasResult (*ASAliasRules)[7][7]; } ASAliasRules; }; diff --git a/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp b/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp index f25f4d4693ea..7cb0e12a6809 100644 --- a/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp +++ b/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp @@ -1451,7 +1451,11 @@ bool AMDGPUDAGToDAGISel::SelectSMRD(SDValue Addr, SDValue &SBase, SDValue &Offset, bool &Imm) const { SDLoc SL(Addr); - if (CurDAG->isBaseWithConstantOffset(Addr)) { + // A 32-bit (address + offset) should not cause unsigned 32-bit integer + // wraparound, because s_load instructions perform the addition in 64 bits. 
+ if ((Addr.getValueType() != MVT::i32 || + Addr->getFlags().hasNoUnsignedWrap()) && + CurDAG->isBaseWithConstantOffset(Addr)) { SDValue N0 = Addr.getOperand(0); SDValue N1 = Addr.getOperand(1); diff --git a/lib/Target/ARM/ARMFrameLowering.cpp b/lib/Target/ARM/ARMFrameLowering.cpp index a8c75702d7b5..56ad7a0f0446 100644 --- a/lib/Target/ARM/ARMFrameLowering.cpp +++ b/lib/Target/ARM/ARMFrameLowering.cpp @@ -1514,6 +1514,7 @@ static unsigned estimateRSStackSizeLimit(MachineFunction &MF, break; case ARMII::AddrMode5: case ARMII::AddrModeT2_i8s4: + case ARMII::AddrModeT2_ldrex: Limit = std::min(Limit, ((1U << 8) - 1) * 4); break; case ARMII::AddrModeT2_i12: diff --git a/lib/Target/ARM/ARMInstrFormats.td b/lib/Target/ARM/ARMInstrFormats.td index 70aded247f65..1d3b1414f090 100644 --- a/lib/Target/ARM/ARMInstrFormats.td +++ b/lib/Target/ARM/ARMInstrFormats.td @@ -109,6 +109,7 @@ def AddrModeT2_pc : AddrMode<14>; def AddrModeT2_i8s4 : AddrMode<15>; def AddrMode_i12 : AddrMode<16>; def AddrMode5FP16 : AddrMode<17>; +def AddrModeT2_ldrex : AddrMode<18>; // Load / store index mode. class IndexMode<bits<2> val> { diff --git a/lib/Target/ARM/ARMInstrThumb2.td b/lib/Target/ARM/ARMInstrThumb2.td index c7133b6483ef..f67075fbf9fd 100644 --- a/lib/Target/ARM/ARMInstrThumb2.td +++ b/lib/Target/ARM/ARMInstrThumb2.td @@ -3267,7 +3267,7 @@ def t2LDREXH : T2I_ldrex<0b0101, (outs rGPR:$Rt), (ins addr_offset_none:$addr), [(set rGPR:$Rt, (ldrex_2 addr_offset_none:$addr))]>, Requires<[IsThumb, HasV8MBaseline]>; def t2LDREX : Thumb2I<(outs rGPR:$Rt), (ins t2addrmode_imm0_1020s4:$addr), - AddrModeNone, 4, NoItinerary, + AddrModeT2_ldrex, 4, NoItinerary, "ldrex", "\t$Rt, $addr", "", [(set rGPR:$Rt, (ldrex_4 t2addrmode_imm0_1020s4:$addr))]>, Requires<[IsThumb, HasV8MBaseline]> { @@ -3346,7 +3346,7 @@ def t2STREXH : T2I_strex<0b0101, (outs rGPR:$Rd), def t2STREX : Thumb2I<(outs rGPR:$Rd), (ins rGPR:$Rt, t2addrmode_imm0_1020s4:$addr), - AddrModeNone, 4, NoItinerary, + AddrModeT2_ldrex, 4, NoItinerary, "strex", "\t$Rd, $Rt, $addr", "", [(set rGPR:$Rd, (strex_4 rGPR:$Rt, t2addrmode_imm0_1020s4:$addr))]>, diff --git a/lib/Target/ARM/MCTargetDesc/ARMBaseInfo.h b/lib/Target/ARM/MCTargetDesc/ARMBaseInfo.h index b918006fe9e3..beeb5dec4baf 100644 --- a/lib/Target/ARM/MCTargetDesc/ARMBaseInfo.h +++ b/lib/Target/ARM/MCTargetDesc/ARMBaseInfo.h @@ -201,7 +201,8 @@ namespace ARMII { AddrModeT2_pc = 14, // +/- i12 for pc relative data AddrModeT2_i8s4 = 15, // i8 * 4 AddrMode_i12 = 16, - AddrMode5FP16 = 17 // i8 * 2 + AddrMode5FP16 = 17, // i8 * 2 + AddrModeT2_ldrex = 18, // i8 * 4, with unscaled offset in MCInst }; inline static const char *AddrModeToString(AddrMode addrmode) { @@ -224,6 +225,7 @@ namespace ARMII { case AddrModeT2_pc: return "AddrModeT2_pc"; case AddrModeT2_i8s4: return "AddrModeT2_i8s4"; case AddrMode_i12: return "AddrMode_i12"; + case AddrModeT2_ldrex:return "AddrModeT2_ldrex"; } } diff --git a/lib/Target/ARM/Thumb2InstrInfo.cpp b/lib/Target/ARM/Thumb2InstrInfo.cpp index d5f0ba9ee485..1a91a7030657 100644 --- a/lib/Target/ARM/Thumb2InstrInfo.cpp +++ b/lib/Target/ARM/Thumb2InstrInfo.cpp @@ -621,6 +621,11 @@ bool llvm::rewriteT2FrameIndex(MachineInstr &MI, unsigned FrameRegIdx, // MCInst operand expects already scaled value. 
Scale = 1; assert((Offset & 3) == 0 && "Can't encode this offset!"); + } else if (AddrMode == ARMII::AddrModeT2_ldrex) { + Offset += MI.getOperand(FrameRegIdx + 1).getImm() * 4; + NumBits = 8; // 8 bits scaled by 4 + Scale = 4; + assert((Offset & 3) == 0 && "Can't encode this offset!"); } else { llvm_unreachable("Unsupported addressing mode!"); } diff --git a/lib/Target/BPF/MCTargetDesc/BPFAsmBackend.cpp b/lib/Target/BPF/MCTargetDesc/BPFAsmBackend.cpp index 6c255e9ef780..1822d8688fa2 100644 --- a/lib/Target/BPF/MCTargetDesc/BPFAsmBackend.cpp +++ b/lib/Target/BPF/MCTargetDesc/BPFAsmBackend.cpp @@ -10,6 +10,8 @@ #include "MCTargetDesc/BPFMCTargetDesc.h" #include "llvm/ADT/StringRef.h" #include "llvm/MC/MCAsmBackend.h" +#include "llvm/MC/MCAssembler.h" +#include "llvm/MC/MCContext.h" #include "llvm/MC/MCFixup.h" #include "llvm/MC/MCObjectWriter.h" #include "llvm/Support/EndianStream.h" @@ -71,7 +73,12 @@ void BPFAsmBackend::applyFixup(const MCAssembler &Asm, const MCFixup &Fixup, bool IsResolved, const MCSubtargetInfo *STI) const { if (Fixup.getKind() == FK_SecRel_4 || Fixup.getKind() == FK_SecRel_8) { - assert(Value == 0); + if (Value) { + MCContext &Ctx = Asm.getContext(); + Ctx.reportError(Fixup.getLoc(), + "Unsupported relocation: try to compile with -O2 or above, " + "or check your static variable usage"); + } } else if (Fixup.getKind() == FK_Data_4) { support::endian::write<uint32_t>(&Data[Fixup.getOffset()], Value, Endian); } else if (Fixup.getKind() == FK_Data_8) { diff --git a/lib/Target/X86/AsmParser/X86AsmParser.cpp b/lib/Target/X86/AsmParser/X86AsmParser.cpp index b02e4d80fbba..8b7b250e1a09 100644 --- a/lib/Target/X86/AsmParser/X86AsmParser.cpp +++ b/lib/Target/X86/AsmParser/X86AsmParser.cpp @@ -1054,7 +1054,7 @@ static bool CheckBaseRegAndIndexRegAndScale(unsigned BaseReg, unsigned IndexReg, // RIP/EIP-relative addressing is only supported in 64-bit mode. if (!Is64BitMode && BaseReg != 0 && (BaseReg == X86::RIP || BaseReg == X86::EIP)) { - ErrMsg = "RIP-relative addressing requires 64-bit mode"; + ErrMsg = "IP-relative addressing requires 64-bit mode"; return true; } @@ -1099,7 +1099,7 @@ bool X86AsmParser::ParseRegister(unsigned &RegNo, // checked. // FIXME: Check AH, CH, DH, BH cannot be used in an instruction requiring a // REX prefix. - if (RegNo == X86::RIZ || RegNo == X86::RIP || RegNo == X86::EIP || + if (RegNo == X86::RIZ || RegNo == X86::RIP || X86MCRegisterClasses[X86::GR64RegClassID].contains(RegNo) || X86II::isX86_64NonExtLowByteReg(RegNo) || X86II::isX86_64ExtendedReg(RegNo)) diff --git a/lib/Transforms/Scalar/LoopSink.cpp b/lib/Transforms/Scalar/LoopSink.cpp index 760177c9c5e9..7d62349d4719 100644 --- a/lib/Transforms/Scalar/LoopSink.cpp +++ b/lib/Transforms/Scalar/LoopSink.cpp @@ -152,6 +152,14 @@ findBBsToSinkInto(const Loop &L, const SmallPtrSetImpl<BasicBlock *> &UseBBs, } } + // Can't sink into blocks that have no valid insertion point. + for (BasicBlock *BB : BBsToSinkInto) { + if (BB->getFirstInsertionPt() == BB->end()) { + BBsToSinkInto.clear(); + break; + } + } + // If the total frequency of BBsToSinkInto is larger than preheader frequency, // do not sink. 
if (adjustedSumFreq(BBsToSinkInto, BFI) > diff --git a/lib/Transforms/Scalar/SROA.cpp b/lib/Transforms/Scalar/SROA.cpp index de16b608f752..bf482bf5272e 100644 --- a/lib/Transforms/Scalar/SROA.cpp +++ b/lib/Transforms/Scalar/SROA.cpp @@ -3046,6 +3046,42 @@ private: return true; } + void fixLoadStoreAlign(Instruction &Root) { + // This algorithm implements the same visitor loop as + // hasUnsafePHIOrSelectUse, and fixes the alignment of each load + // or store found. + SmallPtrSet<Instruction *, 4> Visited; + SmallVector<Instruction *, 4> Uses; + Visited.insert(&Root); + Uses.push_back(&Root); + do { + Instruction *I = Uses.pop_back_val(); + + if (LoadInst *LI = dyn_cast<LoadInst>(I)) { + unsigned LoadAlign = LI->getAlignment(); + if (!LoadAlign) + LoadAlign = DL.getABITypeAlignment(LI->getType()); + LI->setAlignment(std::min(LoadAlign, getSliceAlign())); + continue; + } + if (StoreInst *SI = dyn_cast<StoreInst>(I)) { + unsigned StoreAlign = SI->getAlignment(); + if (!StoreAlign) { + Value *Op = SI->getOperand(0); + StoreAlign = DL.getABITypeAlignment(Op->getType()); + } + SI->setAlignment(std::min(StoreAlign, getSliceAlign())); + continue; + } + + assert(isa<BitCastInst>(I) || isa<PHINode>(I) || + isa<SelectInst>(I) || isa<GetElementPtrInst>(I)); + for (User *U : I->users()) + if (Visited.insert(cast<Instruction>(U)).second) + Uses.push_back(cast<Instruction>(U)); + } while (!Uses.empty()); + } + bool visitPHINode(PHINode &PN) { LLVM_DEBUG(dbgs() << " original: " << PN << "\n"); assert(BeginOffset >= NewAllocaBeginOffset && "PHIs are unsplittable"); @@ -3069,6 +3105,9 @@ private: LLVM_DEBUG(dbgs() << " to: " << PN << "\n"); deleteIfTriviallyDead(OldPtr); + // Fix the alignment of any loads or stores using this PHI node. + fixLoadStoreAlign(PN); + // PHIs can't be promoted on their own, but often can be speculated. We // check the speculation outside of the rewriter so that we see the // fully-rewritten alloca. @@ -3093,6 +3132,9 @@ private: LLVM_DEBUG(dbgs() << " to: " << SI << "\n"); deleteIfTriviallyDead(OldPtr); + // Fix the alignment of any loads or stores using this select. + fixLoadStoreAlign(SI); + // Selects can't be promoted on their own, but often can be speculated. We // check the speculation outside of the rewriter so that we see the // fully-rewritten alloca. diff --git a/lib/Transforms/Utils/CloneFunction.cpp b/lib/Transforms/Utils/CloneFunction.cpp index 807360340055..9ae60962a631 100644 --- a/lib/Transforms/Utils/CloneFunction.cpp +++ b/lib/Transforms/Utils/CloneFunction.cpp @@ -636,6 +636,22 @@ void llvm::CloneAndPruneIntoFromInst(Function *NewFunc, const Function *OldFunc, Function::iterator Begin = cast<BasicBlock>(VMap[StartingBB])->getIterator(); Function::iterator I = Begin; while (I != NewFunc->end()) { + // We need to simplify conditional branches and switches with a constant + // operand. We try to prune these out when cloning, but if the + // simplification required looking through PHI nodes, those are only + // available after forming the full basic block. That may leave some here, + // and we still want to prune the dead code as early as possible. + // + // Do the folding before we check if the block is dead since we want code + // like + // bb: + // br i1 undef, label %bb, label %bb + // to be simplified to + // bb: + // br label %bb + // before we call I->getSinglePredecessor(). + ConstantFoldTerminator(&*I); + // Check if this block has become dead during inlining or other // simplifications. 
Note that the first block will appear dead, as it has // not yet been wired up properly. @@ -646,13 +662,6 @@ void llvm::CloneAndPruneIntoFromInst(Function *NewFunc, const Function *OldFunc, continue; } - // We need to simplify conditional branches and switches with a constant - // operand. We try to prune these out when cloning, but if the - // simplification required looking through PHI nodes, those are only - // available after forming the full basic block. That may leave some here, - // and we still want to prune the dead code as early as possible. - ConstantFoldTerminator(&*I); - BranchInst *BI = dyn_cast<BranchInst>(I->getTerminator()); if (!BI || BI->isConditional()) { ++I; continue; } diff --git a/lib/Transforms/Vectorize/LoopVectorize.cpp b/lib/Transforms/Vectorize/LoopVectorize.cpp index 859d0c92ca5a..1c7d0a63a5ca 100644 --- a/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -4510,6 +4510,13 @@ void LoopVectorizationCostModel::collectLoopUniforms(unsigned VF) { for (auto OV : I->operand_values()) { if (isOutOfScope(OV)) continue; + // First order recurrence Phi's should typically be considered + // non-uniform. + auto *OP = dyn_cast<PHINode>(OV); + if (OP && Legal->isFirstOrderRecurrence(OP)) + continue; + // If all the users of the operand are uniform, then add the + // operand into the uniform worklist. auto *OI = cast<Instruction>(OV); if (llvm::all_of(OI->users(), [&](User *U) -> bool { auto *J = cast<Instruction>(U); diff --git a/test/CodeGen/AMDGPU/amdgpu-alias-analysis.ll b/test/CodeGen/AMDGPU/amdgpu-alias-analysis.ll index 51d96498c53e..35614634e763 100644 --- a/test/CodeGen/AMDGPU/amdgpu-alias-analysis.ll +++ b/test/CodeGen/AMDGPU/amdgpu-alias-analysis.ll @@ -7,3 +7,27 @@ define void @test(i8 addrspace(5)* %p, i8 addrspace(1)* %p1) { ret void } +; CHECK: MayAlias: i8 addrspace(1)* %p1, i8 addrspace(4)* %p + +define void @test_constant_vs_global(i8 addrspace(4)* %p, i8 addrspace(1)* %p1) { + ret void +} + +; CHECK: MayAlias: i8 addrspace(1)* %p, i8 addrspace(4)* %p1 + +define void @test_global_vs_constant(i8 addrspace(1)* %p, i8 addrspace(4)* %p1) { + ret void +} + +; CHECK: MayAlias: i8 addrspace(1)* %p1, i8 addrspace(6)* %p + +define void @test_constant_32bit_vs_global(i8 addrspace(6)* %p, i8 addrspace(1)* %p1) { + ret void +} + +; CHECK: MayAlias: i8 addrspace(4)* %p1, i8 addrspace(6)* %p + +define void @test_constant_32bit_vs_constant(i8 addrspace(6)* %p, i8 addrspace(4)* %p1) { + ret void +} + diff --git a/test/CodeGen/AMDGPU/constant-address-space-32bit.ll b/test/CodeGen/AMDGPU/constant-address-space-32bit.ll index 4522497a85e8..040bcbc01827 100644 --- a/test/CodeGen/AMDGPU/constant-address-space-32bit.ll +++ b/test/CodeGen/AMDGPU/constant-address-space-32bit.ll @@ -12,7 +12,7 @@ ; VIGFX9-DAG: s_load_dword s{{[0-9]}}, s[0:1], 0x0 ; VIGFX9-DAG: s_load_dword s{{[0-9]}}, s[2:3], 0x8 define amdgpu_vs float @load_i32(i32 addrspace(6)* inreg %p0, i32 addrspace(6)* inreg %p1) #0 { - %gep1 = getelementptr i32, i32 addrspace(6)* %p1, i64 2 + %gep1 = getelementptr inbounds i32, i32 addrspace(6)* %p1, i32 2 %r0 = load i32, i32 addrspace(6)* %p0 %r1 = load i32, i32 addrspace(6)* %gep1 %r = add i32 %r0, %r1 @@ -29,7 +29,7 @@ define amdgpu_vs float @load_i32(i32 addrspace(6)* inreg %p0, i32 addrspace(6)* ; VIGFX9-DAG: s_load_dwordx2 s[{{.*}}], s[0:1], 0x0 ; VIGFX9-DAG: s_load_dwordx2 s[{{.*}}], s[2:3], 0x10 define amdgpu_vs <2 x float> @load_v2i32(<2 x i32> addrspace(6)* inreg %p0, <2 x i32> addrspace(6)* inreg %p1) #0 { - %gep1 = 
getelementptr <2 x i32>, <2 x i32> addrspace(6)* %p1, i64 2 + %gep1 = getelementptr inbounds <2 x i32>, <2 x i32> addrspace(6)* %p1, i32 2 %r0 = load <2 x i32>, <2 x i32> addrspace(6)* %p0 %r1 = load <2 x i32>, <2 x i32> addrspace(6)* %gep1 %r = add <2 x i32> %r0, %r1 @@ -46,7 +46,7 @@ define amdgpu_vs <2 x float> @load_v2i32(<2 x i32> addrspace(6)* inreg %p0, <2 x ; VIGFX9-DAG: s_load_dwordx4 s[{{.*}}], s[0:1], 0x0 ; VIGFX9-DAG: s_load_dwordx4 s[{{.*}}], s[2:3], 0x20 define amdgpu_vs <4 x float> @load_v4i32(<4 x i32> addrspace(6)* inreg %p0, <4 x i32> addrspace(6)* inreg %p1) #0 { - %gep1 = getelementptr <4 x i32>, <4 x i32> addrspace(6)* %p1, i64 2 + %gep1 = getelementptr inbounds <4 x i32>, <4 x i32> addrspace(6)* %p1, i32 2 %r0 = load <4 x i32>, <4 x i32> addrspace(6)* %p0 %r1 = load <4 x i32>, <4 x i32> addrspace(6)* %gep1 %r = add <4 x i32> %r0, %r1 @@ -63,7 +63,7 @@ define amdgpu_vs <4 x float> @load_v4i32(<4 x i32> addrspace(6)* inreg %p0, <4 x ; VIGFX9-DAG: s_load_dwordx8 s[{{.*}}], s[0:1], 0x0 ; VIGFX9-DAG: s_load_dwordx8 s[{{.*}}], s[2:3], 0x40 define amdgpu_vs <8 x float> @load_v8i32(<8 x i32> addrspace(6)* inreg %p0, <8 x i32> addrspace(6)* inreg %p1) #0 { - %gep1 = getelementptr <8 x i32>, <8 x i32> addrspace(6)* %p1, i64 2 + %gep1 = getelementptr inbounds <8 x i32>, <8 x i32> addrspace(6)* %p1, i32 2 %r0 = load <8 x i32>, <8 x i32> addrspace(6)* %p0 %r1 = load <8 x i32>, <8 x i32> addrspace(6)* %gep1 %r = add <8 x i32> %r0, %r1 @@ -80,7 +80,7 @@ define amdgpu_vs <8 x float> @load_v8i32(<8 x i32> addrspace(6)* inreg %p0, <8 x ; VIGFX9-DAG: s_load_dwordx16 s[{{.*}}], s[0:1], 0x0 ; VIGFX9-DAG: s_load_dwordx16 s[{{.*}}], s[2:3], 0x80 define amdgpu_vs <16 x float> @load_v16i32(<16 x i32> addrspace(6)* inreg %p0, <16 x i32> addrspace(6)* inreg %p1) #0 { - %gep1 = getelementptr <16 x i32>, <16 x i32> addrspace(6)* %p1, i64 2 + %gep1 = getelementptr inbounds <16 x i32>, <16 x i32> addrspace(6)* %p1, i32 2 %r0 = load <16 x i32>, <16 x i32> addrspace(6)* %p0 %r1 = load <16 x i32>, <16 x i32> addrspace(6)* %gep1 %r = add <16 x i32> %r0, %r1 @@ -97,7 +97,7 @@ define amdgpu_vs <16 x float> @load_v16i32(<16 x i32> addrspace(6)* inreg %p0, < ; VIGFX9-DAG: s_load_dword s{{[0-9]}}, s[0:1], 0x0 ; VIGFX9-DAG: s_load_dword s{{[0-9]}}, s[2:3], 0x8 define amdgpu_vs float @load_float(float addrspace(6)* inreg %p0, float addrspace(6)* inreg %p1) #0 { - %gep1 = getelementptr float, float addrspace(6)* %p1, i64 2 + %gep1 = getelementptr inbounds float, float addrspace(6)* %p1, i32 2 %r0 = load float, float addrspace(6)* %p0 %r1 = load float, float addrspace(6)* %gep1 %r = fadd float %r0, %r1 @@ -113,7 +113,7 @@ define amdgpu_vs float @load_float(float addrspace(6)* inreg %p0, float addrspac ; VIGFX9-DAG: s_load_dwordx2 s[{{.*}}], s[0:1], 0x0 ; VIGFX9-DAG: s_load_dwordx2 s[{{.*}}], s[2:3], 0x10 define amdgpu_vs <2 x float> @load_v2float(<2 x float> addrspace(6)* inreg %p0, <2 x float> addrspace(6)* inreg %p1) #0 { - %gep1 = getelementptr <2 x float>, <2 x float> addrspace(6)* %p1, i64 2 + %gep1 = getelementptr inbounds <2 x float>, <2 x float> addrspace(6)* %p1, i32 2 %r0 = load <2 x float>, <2 x float> addrspace(6)* %p0 %r1 = load <2 x float>, <2 x float> addrspace(6)* %gep1 %r = fadd <2 x float> %r0, %r1 @@ -129,7 +129,7 @@ define amdgpu_vs <2 x float> @load_v2float(<2 x float> addrspace(6)* inreg %p0, ; VIGFX9-DAG: s_load_dwordx4 s[{{.*}}], s[0:1], 0x0 ; VIGFX9-DAG: s_load_dwordx4 s[{{.*}}], s[2:3], 0x20 define amdgpu_vs <4 x float> @load_v4float(<4 x float> addrspace(6)* inreg %p0, <4 x float> 
addrspace(6)* inreg %p1) #0 { - %gep1 = getelementptr <4 x float>, <4 x float> addrspace(6)* %p1, i64 2 + %gep1 = getelementptr inbounds <4 x float>, <4 x float> addrspace(6)* %p1, i32 2 %r0 = load <4 x float>, <4 x float> addrspace(6)* %p0 %r1 = load <4 x float>, <4 x float> addrspace(6)* %gep1 %r = fadd <4 x float> %r0, %r1 @@ -145,7 +145,7 @@ define amdgpu_vs <4 x float> @load_v4float(<4 x float> addrspace(6)* inreg %p0, ; VIGFX9-DAG: s_load_dwordx8 s[{{.*}}], s[0:1], 0x0 ; VIGFX9-DAG: s_load_dwordx8 s[{{.*}}], s[2:3], 0x40 define amdgpu_vs <8 x float> @load_v8float(<8 x float> addrspace(6)* inreg %p0, <8 x float> addrspace(6)* inreg %p1) #0 { - %gep1 = getelementptr <8 x float>, <8 x float> addrspace(6)* %p1, i64 2 + %gep1 = getelementptr inbounds <8 x float>, <8 x float> addrspace(6)* %p1, i32 2 %r0 = load <8 x float>, <8 x float> addrspace(6)* %p0 %r1 = load <8 x float>, <8 x float> addrspace(6)* %gep1 %r = fadd <8 x float> %r0, %r1 @@ -161,7 +161,7 @@ define amdgpu_vs <8 x float> @load_v8float(<8 x float> addrspace(6)* inreg %p0, ; VIGFX9-DAG: s_load_dwordx16 s[{{.*}}], s[0:1], 0x0 ; VIGFX9-DAG: s_load_dwordx16 s[{{.*}}], s[2:3], 0x80 define amdgpu_vs <16 x float> @load_v16float(<16 x float> addrspace(6)* inreg %p0, <16 x float> addrspace(6)* inreg %p1) #0 { - %gep1 = getelementptr <16 x float>, <16 x float> addrspace(6)* %p1, i64 2 + %gep1 = getelementptr inbounds <16 x float>, <16 x float> addrspace(6)* %p1, i32 2 %r0 = load <16 x float>, <16 x float> addrspace(6)* %p0 %r1 = load <16 x float>, <16 x float> addrspace(6)* %gep1 %r = fadd <16 x float> %r0, %r1 @@ -212,12 +212,12 @@ main_body: %22 = call nsz float @llvm.amdgcn.interp.mov(i32 2, i32 0, i32 0, i32 %5) #8 %23 = bitcast float %22 to i32 %24 = shl i32 %23, 1 - %25 = getelementptr [0 x <8 x i32>], [0 x <8 x i32>] addrspace(6)* %1, i32 0, i32 %24, !amdgpu.uniform !0 + %25 = getelementptr inbounds [0 x <8 x i32>], [0 x <8 x i32>] addrspace(6)* %1, i32 0, i32 %24, !amdgpu.uniform !0 %26 = load <8 x i32>, <8 x i32> addrspace(6)* %25, align 32, !invariant.load !0 %27 = shl i32 %23, 2 %28 = or i32 %27, 3 %29 = bitcast [0 x <8 x i32>] addrspace(6)* %1 to [0 x <4 x i32>] addrspace(6)* - %30 = getelementptr [0 x <4 x i32>], [0 x <4 x i32>] addrspace(6)* %29, i32 0, i32 %28, !amdgpu.uniform !0 + %30 = getelementptr inbounds [0 x <4 x i32>], [0 x <4 x i32>] addrspace(6)* %29, i32 0, i32 %28, !amdgpu.uniform !0 %31 = load <4 x i32>, <4 x i32> addrspace(6)* %30, align 16, !invariant.load !0 %32 = call nsz <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float 0.0, <8 x i32> %26, <4 x i32> %31, i1 0, i32 0, i32 0) #8 %33 = extractelement <4 x float> %32, i32 0 @@ -246,12 +246,12 @@ main_body: %22 = call nsz float @llvm.amdgcn.interp.mov(i32 2, i32 0, i32 0, i32 %5) #8 %23 = bitcast float %22 to i32 %24 = shl i32 %23, 1 - %25 = getelementptr [0 x <8 x i32>], [0 x <8 x i32>] addrspace(6)* %1, i32 0, i32 %24 + %25 = getelementptr inbounds [0 x <8 x i32>], [0 x <8 x i32>] addrspace(6)* %1, i32 0, i32 %24 %26 = load <8 x i32>, <8 x i32> addrspace(6)* %25, align 32, !invariant.load !0 %27 = shl i32 %23, 2 %28 = or i32 %27, 3 %29 = bitcast [0 x <8 x i32>] addrspace(6)* %1 to [0 x <4 x i32>] addrspace(6)* - %30 = getelementptr [0 x <4 x i32>], [0 x <4 x i32>] addrspace(6)* %29, i32 0, i32 %28 + %30 = getelementptr inbounds [0 x <4 x i32>], [0 x <4 x i32>] addrspace(6)* %29, i32 0, i32 %28 %31 = load <4 x i32>, <4 x i32> addrspace(6)* %30, align 16, !invariant.load !0 %32 = call nsz <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 
15, float 0.0, <8 x i32> %26, <4 x i32> %31, i1 0, i32 0, i32 0) #8 %33 = extractelement <4 x float> %32, i32 0 @@ -268,6 +268,17 @@ main_body: ret <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> %43 } +; GCN-LABEL: {{^}}load_addr_no_fold: +; GCN-DAG: s_add_i32 s0, s0, 4 +; GCN-DAG: s_mov_b32 s1, 0 +; GCN: s_load_dword s{{[0-9]}}, s[0:1], 0x0 +define amdgpu_vs float @load_addr_no_fold(i32 addrspace(6)* inreg noalias %p0) #0 { + %gep1 = getelementptr i32, i32 addrspace(6)* %p0, i32 1 + %r1 = load i32, i32 addrspace(6)* %gep1 + %r2 = bitcast i32 %r1 to float + ret float %r2 +} + ; Function Attrs: nounwind readnone speculatable declare float @llvm.amdgcn.interp.mov(i32, i32, i32, i32) #6 diff --git a/test/CodeGen/ARM/ldrex-frame-size.ll b/test/CodeGen/ARM/ldrex-frame-size.ll new file mode 100644 index 000000000000..595540578a00 --- /dev/null +++ b/test/CodeGen/ARM/ldrex-frame-size.ll @@ -0,0 +1,36 @@ +; RUN: llc -mtriple=thumbv7-linux-gnueabi -o - %s | FileCheck %s + +; This alloca is just large enough that FrameLowering decides it needs a frame +; to guarantee access, based on the range of ldrex. + +; The actual alloca size is a bit of black magic, unfortunately: the real +; maximum accessible is 1020, but FrameLowering adds 16 bytes to its estimated +; stack size just because so the alloca is not actually the what the limit gets +; compared to. The important point is that we don't go up to ~4096, which is the +; default with no strange instructions. +define void @test_large_frame() { +; CHECK-LABEL: test_large_frame: +; CHECK: push +; CHECK: sub.w sp, sp, #1004 + + %ptr = alloca i32, i32 251 + + %addr = getelementptr i32, i32* %ptr, i32 1 + call i32 @llvm.arm.ldrex.p0i32(i32* %addr) + ret void +} + +; This alloca is just is just the other side of the limit, so no frame +define void @test_small_frame() { +; CHECK-LABEL: test_small_frame: +; CHECK-NOT: push +; CHECK: sub.w sp, sp, #1000 + + %ptr = alloca i32, i32 250 + + %addr = getelementptr i32, i32* %ptr, i32 1 + call i32 @llvm.arm.ldrex.p0i32(i32* %addr) + ret void +} + +declare i32 @llvm.arm.ldrex.p0i32(i32*) diff --git a/test/CodeGen/ARM/ldstrex.ll b/test/CodeGen/ARM/ldstrex.ll index 59349f72a8fe..73afa0e27469 100644 --- a/test/CodeGen/ARM/ldstrex.ll +++ b/test/CodeGen/ARM/ldstrex.ll @@ -142,6 +142,91 @@ define void @excl_addrmode() { ret void } +define void @test_excl_addrmode_folded() { +; CHECK-LABEL: test_excl_addrmode_folded: + %local = alloca i8, i32 4096 + + %local.0 = getelementptr i8, i8* %local, i32 4 + %local32.0 = bitcast i8* %local.0 to i32* + call i32 @llvm.arm.ldrex.p0i32(i32* %local32.0) + call i32 @llvm.arm.strex.p0i32(i32 0, i32* %local32.0) +; CHECK-T2ADDRMODE: ldrex {{r[0-9]+}}, [sp, #4] +; CHECK-T2ADDRMODE: strex {{r[0-9]+}}, {{r[0-9]+}}, [sp, #4] + + %local.1 = getelementptr i8, i8* %local, i32 1020 + %local32.1 = bitcast i8* %local.1 to i32* + call i32 @llvm.arm.ldrex.p0i32(i32* %local32.1) + call i32 @llvm.arm.strex.p0i32(i32 0, i32* %local32.1) +; CHECK-T2ADDRMODE: ldrex {{r[0-9]+}}, [sp, #1020] +; CHECK-T2ADDRMODE: strex {{r[0-9]+}}, {{r[0-9]+}}, [sp, #1020] + + ret void +} + +define void @test_excl_addrmode_range() { +; CHECK-LABEL: test_excl_addrmode_range: + %local = alloca i8, i32 4096 + + %local.0 = getelementptr i8, i8* %local, i32 1024 + %local32.0 = bitcast i8* %local.0 to i32* + call i32 @llvm.arm.ldrex.p0i32(i32* %local32.0) + call i32 @llvm.arm.strex.p0i32(i32 0, i32* %local32.0) +; CHECK-T2ADDRMODE: mov 
r[[TMP:[0-9]+]], sp +; CHECK-T2ADDRMODE: add.w r[[ADDR:[0-9]+]], r[[TMP]], #1024 +; CHECK-T2ADDRMODE: ldrex {{r[0-9]+}}, [r[[ADDR]]] +; CHECK-T2ADDRMODE: strex {{r[0-9]+}}, {{r[0-9]+}}, [r[[ADDR]]] + + ret void +} + +define void @test_excl_addrmode_align() { +; CHECK-LABEL: test_excl_addrmode_align: + %local = alloca i8, i32 4096 + + %local.0 = getelementptr i8, i8* %local, i32 2 + %local32.0 = bitcast i8* %local.0 to i32* + call i32 @llvm.arm.ldrex.p0i32(i32* %local32.0) + call i32 @llvm.arm.strex.p0i32(i32 0, i32* %local32.0) +; CHECK-T2ADDRMODE: mov r[[ADDR:[0-9]+]], sp +; CHECK-T2ADDRMODE: adds r[[ADDR:[0-9]+]], #2 +; CHECK-T2ADDRMODE: ldrex {{r[0-9]+}}, [r[[ADDR]]] +; CHECK-T2ADDRMODE: strex {{r[0-9]+}}, {{r[0-9]+}}, [r[[ADDR]]] + + ret void +} + +define void @test_excl_addrmode_sign() { +; CHECK-LABEL: test_excl_addrmode_sign: + %local = alloca i8, i32 4096 + + %local.0 = getelementptr i8, i8* %local, i32 -4 + %local32.0 = bitcast i8* %local.0 to i32* + call i32 @llvm.arm.ldrex.p0i32(i32* %local32.0) + call i32 @llvm.arm.strex.p0i32(i32 0, i32* %local32.0) +; CHECK-T2ADDRMODE: mov r[[ADDR:[0-9]+]], sp +; CHECK-T2ADDRMODE: subs r[[ADDR:[0-9]+]], #4 +; CHECK-T2ADDRMODE: ldrex {{r[0-9]+}}, [r[[ADDR]]] +; CHECK-T2ADDRMODE: strex {{r[0-9]+}}, {{r[0-9]+}}, [r[[ADDR]]] + + ret void +} + +define void @test_excl_addrmode_combination() { +; CHECK-LABEL: test_excl_addrmode_combination: + %local = alloca i8, i32 4096 + %unused = alloca i8, i32 64 + + %local.0 = getelementptr i8, i8* %local, i32 4 + %local32.0 = bitcast i8* %local.0 to i32* + call i32 @llvm.arm.ldrex.p0i32(i32* %local32.0) + call i32 @llvm.arm.strex.p0i32(i32 0, i32* %local32.0) +; CHECK-T2ADDRMODE: ldrex {{r[0-9]+}}, [sp, #68] +; CHECK-T2ADDRMODE: strex {{r[0-9]+}}, {{r[0-9]+}}, [sp, #68] + + ret void +} + + ; LLVM should know, even across basic blocks, that ldrex is setting the high ; bits of its i32 to 0. There should be no zero-extend operation. define zeroext i8 @test_cross_block_zext_i8(i1 %tst, i8* %addr) { diff --git a/test/CodeGen/X86/eip-addressing-i386.ll b/test/CodeGen/X86/eip-addressing-i386.ll index ddb7c782c204..b686be5727a1 100644 --- a/test/CodeGen/X86/eip-addressing-i386.ll +++ b/test/CodeGen/X86/eip-addressing-i386.ll @@ -1,8 +1,8 @@ ; RUN: not llc -mtriple i386-apple-- -o /dev/null < %s 2>&1| FileCheck %s -; CHECK: <inline asm>:1:13: error: register %eip is only available in 64-bit mode +; CHECK: <inline asm>:1:13: error: IP-relative addressing requires 64-bit mode ; CHECK-NEXT: jmpl *_foo(%eip) -; Make sure that we emit an error if we encounter RIP-relative instructions in +; Make sure that we emit an error if we encounter IP-relative instructions in ; 32-bit mode. define i32 @foo() { ret i32 0 } diff --git a/test/MC/AsmParser/directive_file-3.s b/test/MC/AsmParser/directive_file-3.s new file mode 100644 index 000000000000..c3bdaede2705 --- /dev/null +++ b/test/MC/AsmParser/directive_file-3.s @@ -0,0 +1,24 @@ +// RUN: llvm-mc -g -triple i386-unknown-unknown %s | FileCheck -check-prefix=CHECK-DEFAULT %s +// RUN: llvm-mc -g -triple i386-unknown-unknown %s -filetype=obj | obj2yaml | FileCheck -check-prefix=CHECK-DEBUG %s + +// Test for Bug 38695 +// This testcase has a single function and a .file directive +// without the [file-num] argument. When compiled with -g, +// this testcase will not report error, and generate new +// debug info. 
+ + .file "hello" +.text + +f1: + nop +.size f1, .-f1 + +// CHECK-DEFAULT: .file "hello" + +// CHECK-DEBUG: Sections: +// CHECK-DEBUG: - Name: .text +// CHECK-DEBUG: - Name: .debug_info +// CHECK-DEBUG: - Name: .rel.debug_info +// CHECK-DEBUG: Info: .debug_info +// CHECK-DEBUG: Symbols: diff --git a/test/MC/X86/pr38826.s b/test/MC/X86/pr38826.s new file mode 100644 index 000000000000..76289a147ec5 --- /dev/null +++ b/test/MC/X86/pr38826.s @@ -0,0 +1,24 @@ +// RUN: llvm-mc %s -triple i386-unknown-unknown + +// Make sure %eip is allowed as a register in cfi directives in 32-bit mode + + .text + .align 4 + .globl foo + +foo: + .cfi_startproc + + movl (%edx), %ecx + movl 4(%edx), %ebx + movl 8(%edx), %esi + movl 12(%edx), %edi + movl 16(%edx), %ebp + .cfi_def_cfa %edx, 0 + .cfi_offset %eip, 24 + .cfi_register %esp, %ecx + movl %ecx, %esp + + jmp *24(%edx) + + .cfi_endproc diff --git a/test/MC/X86/x86_errors.s b/test/MC/X86/x86_errors.s index 6aa429c7d80a..1fe0a583e59c 100644 --- a/test/MC/X86/x86_errors.s +++ b/test/MC/X86/x86_errors.s @@ -103,11 +103,11 @@ lea (%si,%bx), %ax // 64: error: invalid 16-bit base register lea (%di,%bx), %ax -// 32: error: register %eip is only available in 64-bit mode +// 32: error: invalid base+index expression // 64: error: invalid base+index expression mov (,%eip), %rbx -// 32: error: register %eip is only available in 64-bit mode +// 32: error: invalid base+index expression // 64: error: invalid base+index expression mov (%eip,%eax), %rbx diff --git a/test/Transforms/Inline/infinite-loop-two-predecessors.ll b/test/Transforms/Inline/infinite-loop-two-predecessors.ll new file mode 100644 index 000000000000..aa07315eb081 --- /dev/null +++ b/test/Transforms/Inline/infinite-loop-two-predecessors.ll @@ -0,0 +1,32 @@ +; RUN: opt -S -o - %s -inline | FileCheck %s + +define void @f1() { +bb.0: + br i1 false, label %bb.2, label %bb.1 + +bb.1: ; preds = %bb.0 + br label %bb.2 + +bb.2: ; preds = %bb.0, %bb.1 + %tmp0 = phi i1 [ true, %bb.1 ], [ false, %bb.0 ] + br i1 %tmp0, label %bb.4, label %bb.3 + +bb.3: ; preds = %bb.3, %bb.3 + br i1 undef, label %bb.3, label %bb.3 + +bb.4: ; preds = %bb.2 + ret void +} + +define void @f2() { +bb.0: + call void @f1() + ret void +} + +; f1 should be inlined into f2 and simplified/collapsed to nothing. + +; CHECK-LABEL: define void @f2() { +; CHECK-NEXT: bb.0: +; CHECK-NEXT: ret void +; CHECK-NEXT: } diff --git a/test/Transforms/LICM/loopsink-pr38462.ll b/test/Transforms/LICM/loopsink-pr38462.ll new file mode 100644 index 000000000000..146e2506b7eb --- /dev/null +++ b/test/Transforms/LICM/loopsink-pr38462.ll @@ -0,0 +1,65 @@ +; RUN: opt -S -loop-sink < %s | FileCheck %s + +target datalayout = "e-m:w-i64:64-f80:128-n8:16:32:64-S128" +target triple = "x86_64-pc-windows-msvc19.13.26128" + +%struct.FontInfoData = type { i32 (...)** } +%struct.S = type { i8 } + +; CHECK: @pr38462 +; Make sure not to assert by trying to sink into catch.dispatch. 
+ +define void @pr38462(%struct.FontInfoData* %this) personality i8* bitcast (i32 (...)* @__C_specific_handler to i8*) !prof !1 { +entry: + %s = alloca %struct.S + %call6 = call i32 @f() + %tobool7 = icmp eq i32 %call6, 0 + br i1 %tobool7, label %for.body.lr.ph, label %for.cond.cleanup + +for.body.lr.ph: + %0 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0 + br label %for.body + +for.cond.cleanup.loopexit: + br label %for.cond.cleanup + +for.cond.cleanup: + ret void + +for.body: + %call2 = invoke i32 @f() to label %__try.cont unwind label %catch.dispatch + +catch.dispatch: + %1 = catchswitch within none [label %__except] unwind to caller + +__except: + %2 = catchpad within %1 [i8* null] + catchret from %2 to label %__except3 + +__except3: + call void @llvm.lifetime.start.p0i8(i64 1, i8* nonnull %0) + %call.i = call zeroext i1 @g(%struct.S* nonnull %s) + br i1 %call.i, label %if.then.i, label %exit + +if.then.i: + %call2.i = call i32 @f() + br label %exit + +exit: + call void @llvm.lifetime.end.p0i8(i64 1, i8* nonnull %0) + br label %__try.cont + +__try.cont: + %call = call i32 @f() + %tobool = icmp eq i32 %call, 0 + br i1 %tobool, label %for.body, label %for.cond.cleanup.loopexit +} + +declare i32 @__C_specific_handler(...) +declare i32 @f() +declare zeroext i1 @g(%struct.S*) +declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) +declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) + +!1 = !{!"function_entry_count", i64 1} + diff --git a/test/Transforms/LoopVectorize/X86/uniform-phi.ll b/test/Transforms/LoopVectorize/X86/uniform-phi.ll index 881f29a94cb5..2be565e71105 100644 --- a/test/Transforms/LoopVectorize/X86/uniform-phi.ll +++ b/test/Transforms/LoopVectorize/X86/uniform-phi.ll @@ -75,3 +75,25 @@ for.end: ; preds = %for.body ret i64 %retval } +; CHECK-LABEL: PR38786 +; Check that first order recurrence phis (%phi32 and %phi64) are not uniform. 
+; CHECK-NOT: LV: Found uniform instruction: %phi
+define void @PR38786(double* %y, double* %x, i64 %n) {
+entry:
+  br label %for.body
+
+for.body:
+  %phi32 = phi i32 [ 0, %entry ], [ %i32next, %for.body ]
+  %phi64 = phi i64 [ 0, %entry ], [ %i64next, %for.body ]
+  %i32next = add i32 %phi32, 1
+  %i64next = zext i32 %i32next to i64
+  %xip = getelementptr inbounds double, double* %x, i64 %i64next
+  %yip = getelementptr inbounds double, double* %y, i64 %phi64
+  %xi = load double, double* %xip, align 8
+  store double %xi, double* %yip, align 8
+  %cmp = icmp slt i64 %i64next, %n
+  br i1 %cmp, label %for.body, label %for.end
+
+for.end:
+  ret void
+}
diff --git a/test/Transforms/SROA/phi-and-select.ll b/test/Transforms/SROA/phi-and-select.ll
index fb76548b1d18..e7ba2e89d795 100644
--- a/test/Transforms/SROA/phi-and-select.ll
+++ b/test/Transforms/SROA/phi-and-select.ll
@@ -600,3 +600,35 @@ if.then5:                                         ; preds = %if.then2, %if.end
   store %struct.S undef, %struct.S* %f1, align 4
   ret void
 }
+
+define i32 @phi_align(i32* %z) {
+; CHECK-LABEL: @phi_align(
+entry:
+  %a = alloca [8 x i8], align 8
+; CHECK: alloca [7 x i8]
+
+  %a0x = getelementptr [8 x i8], [8 x i8]* %a, i64 0, i32 1
+  %a0 = bitcast i8* %a0x to i32*
+  %a1x = getelementptr [8 x i8], [8 x i8]* %a, i64 0, i32 4
+  %a1 = bitcast i8* %a1x to i32*
+; CHECK: store i32 0, {{.*}}, align 1
+  store i32 0, i32* %a0, align 1
+; CHECK: store i32 1, {{.*}}, align 1
+  store i32 1, i32* %a1, align 4
+; CHECK: load {{.*}}, align 1
+  %v0 = load i32, i32* %a0, align 1
+; CHECK: load {{.*}}, align 1
+  %v1 = load i32, i32* %a1, align 4
+  %cond = icmp sle i32 %v0, %v1
+  br i1 %cond, label %then, label %exit
+
+then:
+  br label %exit
+
+exit:
+; CHECK: %phi = phi i32* [ {{.*}}, %then ], [ %z, %entry ]
+; CHECK-NEXT: %result = load i32, i32* %phi, align 1
+  %phi = phi i32* [ %a1, %then ], [ %z, %entry ]
+  %result = load i32, i32* %phi, align 4
+  ret i32 %result
+}
diff --git a/utils/lit/lit/TestRunner.py b/utils/lit/lit/TestRunner.py
index e304381ff47a..4d903b4a6f44 100644
--- a/utils/lit/lit/TestRunner.py
+++ b/utils/lit/lit/TestRunner.py
@@ -879,7 +879,7 @@ def _executeShCmd(cmd, shenv, results, timeoutHelper):
         # Expand all glob expressions
         args = expand_glob_expressions(args, cmd_shenv.cwd)
         if is_builtin_cmd:
-            args.insert(0, "python")
+            args.insert(0, sys.executable)
            args[1] = os.path.join(builtin_commands_dir ,args[1] + ".py")
 
         # On Windows, do our own command line quoting for better compatibility
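The Path.inc and Process.inc hunks earlier in this patch wrap ::open in a
lambda before handing it to RetryAfterSignal: on Bionic, open is overloaded,
so naming it directly in a template argument list leaves overload resolution
ambiguous, while a lambda has exactly one type. A rough sketch of the retry
pattern involved (RetrySketch and openNull are illustrative names; the real
helper is llvm::sys::RetryAfterSignal in llvm/Support/Errno.h):

    #include <cerrno>
    #include <fcntl.h>

    // Re-invoke F until it either succeeds or fails for a reason other than
    // being interrupted by a signal (EINTR).
    template <typename FailT, typename Fn>
    static auto RetrySketch(const FailT &Fail, const Fn &F) -> decltype(F()) {
      decltype(F()) Res;
      do
        Res = F();
      while (Res == Fail && errno == EINTR);
      return Res;
    }

    static int openNull() {
      // The lambda, not ::open itself, is what the template sees.
      auto Open = [&]() { return ::open("/dev/null", O_RDWR); };
      return RetrySketch(-1, Open); // -1 is open(2)'s failure sentinel
    }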