Index: vendor/llvm/dist/docs/ReleaseNotes.rst
===================================================================
--- vendor/llvm/dist/docs/ReleaseNotes.rst (revision 314974)
+++ vendor/llvm/dist/docs/ReleaseNotes.rst (revision 314975)
@@ -1,314 +1,349 @@
========================
LLVM 4.0.0 Release Notes
========================
.. contents::
:local:
Introduction
============
This document contains the release notes for the LLVM Compiler Infrastructure,
release 4.0.0. Here we describe the status of LLVM, including major improvements
from the previous release, improvements in various subprojects of LLVM, and
some of the current users of the code. All LLVM releases may be downloaded
from the `LLVM releases web site `_.
For more information about LLVM, including information about the latest
release, please check out the `main LLVM web site `_. If you
have questions or comments, the `LLVM Developer's Mailing List
`_ is a good place to send
them.
New Versioning Scheme
=====================
Starting with this release, LLVM is using a
`new versioning scheme `_,
increasing the major version number with each major release. Stable updates to
this release will be versioned 4.0.x, and the next major release, six months
from now, will be version 5.0.0.
Non-comprehensive list of changes in this release
=================================================
-* Minimum compiler version to build has been raised to GCC 4.8 and VS 2015.
+* The minimum compiler version required for building LLVM has been raised to
+ 4.8 for GCC and 2015 for Visual Studio.
* The C API functions ``LLVMAddFunctionAttr``, ``LLVMGetFunctionAttr``,
``LLVMRemoveFunctionAttr``, ``LLVMAddAttribute``, ``LLVMRemoveAttribute``,
``LLVMGetAttribute``, ``LLVMAddInstrAttribute`` and
``LLVMRemoveInstrAttribute`` have been removed.
* The C API enum ``LLVMAttribute`` has been deleted.
* The definition and uses of ``LLVM_ATRIBUTE_UNUSED_RESULT`` in the LLVM source
were replaced with ``LLVM_NODISCARD``, which matches the C++17 ``[[nodiscard]]``
semantics rather than gcc's ``__attribute__((warn_unused_result))``.
* The Timer related APIs now expect a Name and Description. When upgrading code
the previously used names should become descriptions and a short name in the
style of a programming language identifier should be added.
* LLVM now handles ``invariant.group`` across different basic blocks, which makes
it possible to devirtualize virtual calls inside loops.
* The aggressive dead code elimination phase ("adce") now removes
branches which do not effect program behavior. Loops are retained by
default since they may be infinite but these can also be removed
with LLVM option ``-adce-remove-loops`` when the loop body otherwise has
no live operations.
-* The GVNHoist pass is now enabled by default. The new pass based on Global
- Value Numbering detects similar computations in branch code and replaces
- multiple instances of the same computation with a unique expression. The
- transform benefits code size and generates better schedules. GVNHoist is
- more aggressive at ``-Os`` and ``-Oz``, hoisting more expressions at the
- expense of execution time degradations.
+* The llvm-cov tool can now export coverage data as json. Its html output mode
+ has also improved.
- * The llvm-cov tool can now export coverage data as json. Its html output mode
- has also improved.
-
Improvements to ThinLTO (-flto=thin)
------------------------------------
Integration with profile data (PGO). When available, profile data
enables more accurate function importing decisions, as well as
cross-module indirect call promotion.
Significant build-time and binary-size improvements when compiling with
debug info (-g).
LLVM Coroutines
---------------
Experimental support for :doc:`Coroutines` was added, which can be enabled
with ``-enable-coroutines`` in ``opt`` the command tool or using the
``addCoroutinePassesToExtensionPoints`` API when building the optimization
pipeline.
For more information on LLVM Coroutines and the LLVM implementation, see
`2016 LLVM Developers’ Meeting talk on LLVM Coroutines
`_.
Regcall and Vectorcall Calling Conventions
--------------------------------------------------
Support was added for ``_regcall`` calling convention.
Existing ``__vectorcall`` calling convention support was extended to include
correct handling of HVAs.
The ``__vectorcall`` calling convention was introduced by Microsoft to
enhance register usage when passing parameters.
For more information please read `__vectorcall documentation
`_.
The ``__regcall`` calling convention was introduced by Intel to
optimize parameter transfer on function call.
This calling convention ensures that as many values as possible are
passed or returned in registers.
For more information please read `__regcall documentation
`_.
Code Generation Testing
-----------------------
Passes that work on the machine instruction representation can be tested with
the .mir serialization format. ``llc`` supports the ``-run-pass``,
``-stop-after``, ``-stop-before``, ``-start-after``, ``-start-before`` to
run a single pass of the code generation pipeline, or to stop or start the code
generation pipeline at a given point.
Additional information can be found in the :doc:`MIRLangRef`. The format is
used by the tests ending in ``.mir`` in the ``test/CodeGen`` directory.
This feature is available since 2015. It is used more often lately and was not
mentioned in the release notes yet.
Intrusive list API overhaul
---------------------------
The intrusive list infrastructure was substantially rewritten over the last
couple of releases, primarily to excise undefined behaviour. The biggest
changes landed in this release.
* ``simple_ilist`` is a lower-level intrusive list that never takes
ownership of its nodes. New intrusive-list clients should consider using it
instead of ``ilist``.
* ``ilist_tag`` allows a single data type to be inserted into two
parallel intrusive lists. A type can inherit twice from ``ilist_node``,
first using ``ilist_node>`` (enabling insertion into
``simple_ilist>``) and second using
``ilist_node>`` (enabling insertion into
``simple_ilist>``), where ``A`` and ``B`` are arbitrary
types.
* ``ilist_sentinel_tracking`` controls whether an iterator knows
whether it's pointing at the sentinel (``end()``). By default, sentinel
tracking is on when ABI-breaking checks are enabled, and off otherwise;
this is used for an assertion when dereferencing ``end()`` (this assertion
triggered often in practice, and many backend bugs were fixed). Explicitly
turning on sentinel tracking also enables ``iterator::isEnd()``. This is
used by ``MachineInstrBundleIterator`` to iterate over bundles.
* ``ilist`` is built on top of ``simple_ilist``, and supports the same
configuration options. As before (and unlike ``simple_ilist``),
``ilist`` takes ownership of its nodes. However, it no longer supports
*allocating* nodes, and is now equivalent to ``iplist``. ``iplist``
will likely be removed in the future.
* ``ilist`` now always uses ``ilist_traits``. Instead of passing a
custom traits class in via a template parameter, clients that want to
customize the traits should specialize ``ilist_traits``. Clients that
want to avoid ownership can specialize ``ilist_alloc_traits`` to inherit
from ``ilist_noalloc_traits`` (or to do something funky); clients that
need callbacks can specialize ``ilist_callback_traits`` directly.
* The underlying data structure is now a simple recursive linked list. The
sentinel node contains only a "next" (``begin()``) and "prev" (``rbegin()``)
pointer and is stored in the same allocation as ``simple_ilist``.
Previously, it was malloc-allocated on-demand by default, although the
now-defunct ``ilist_sentinel_traits`` was sometimes specialized to avoid
this.
* The ``reverse_iterator`` class no longer uses ``std::reverse_iterator``.
Instead, it now has a handle to the same node that it dereferences to.
Reverse iterators now have the same iterator invalidation semantics as
forward iterators.
* ``iterator`` and ``reverse_iterator`` have explicit conversion constructors
that match ``std::reverse_iterator``'s off-by-one semantics, so that
reversing the end points of an iterator range results in the same range
(albeit in reverse). I.e., ``reverse_iterator(begin())`` equals
``rend()``.
* ``iterator::getReverse()`` and ``reverse_iterator::getReverse()`` return an
iterator that dereferences to the *same* node. I.e.,
``begin().getReverse()`` equals ``--rend()``.
* ``ilist_node::getIterator()`` and
``ilist_node::getReverseIterator()`` return the forward and reverse
iterators that dereference to the current node. I.e.,
``begin()->getIterator()`` equals ``begin()`` and
``rbegin()->getReverseIterator()`` equals ``rbegin()``.
* ``iterator`` now stores an ``ilist_node_base*`` instead of a ``T*``. The
implicit conversions between ``ilist::iterator`` and ``T*`` have been
removed. Clients may use ``N->getIterator()`` (if not ``nullptr``) or
``&*I`` (if not ``end()``); alternatively, clients may refactor to use
references for known-good nodes.
Changes to the ARM Targets
--------------------------
**During this release the AArch64 target has:**
* Gained support for ILP32 relocations.
* Gained support for XRay.
* Made even more progress on GlobalISel. There is still some work left before
it is production-ready though.
* Refined the support for Qualcomm's Falkor and Samsung's Exynos CPUs.
* Learned a few new tricks for lowering multiplications by constants, folding
spilled/refilled copies etc.
**During this release the ARM target has:**
* Gained support for ROPI (read-only position independence) and RWPI
(read-write position independence), which can be used to remove the need for
a dynamic linker.
* Gained support for execute-only code, which is placed in pages without read
permissions.
* Gained a machine scheduler for Cortex-R52.
* Gained support for XRay.
* Gained Thumb1 implementations for several compiler-rt builtins. It also
has some support for building the builtins for HF targets.
* Started using the generic bitreverse intrinsic instead of rbit.
* Gained very basic support for GlobalISel.
A lot of work has also been done in LLD for ARM, which now supports more
relocations and TLS.
+Note: From the next release (5.0), the "vulcan" target will be renamed to
+"thunderx2t99", including command line options, assembly directives, etc. This
+release (4.0) will be the last one to accept "vulcan" as its name.
+
Changes to the AVR Target
-----------------------------
This marks the first release where the AVR backend has been completely merged
from a fork into LLVM trunk. The backend is still marked experimental, but
is generally quite usable. All downstream development has halted on
`GitHub `_, and changes now go directly into
LLVM trunk.
* Instruction selector and pseudo instruction expansion pass landed
* `read_register` and `write_register` intrinsics are now supported
* Support stack stores greater than 63-bytes from the bottom of the stack
* A number of assertion errors have been fixed
* Support stores to `undef` locations
* Very basic support for the target has been added to clang
* Small optimizations to some 16-bit boolean expressions
Most of the work behind the scenes has been on correctness of generated
assembly, and also fixing some assertions we would hit on some well-formed
inputs.
Changes to the MIPS Target
-----------------------------
**During this release the MIPS target has:**
* IAS is now enabled by default for Debian mips64el.
* Added support for the two operand form for many instructions.
* Added the following macros: unaligned load/store, seq, double word load/store for O32.
* Improved the parsing of complex memory offset expressions.
* Enabled the integrated assembler by default for Debian mips64el.
* Added a generic scheduler based on the interAptiv CPU.
* Added support for thread local relocations.
* Added recip, rsqrt, evp, dvp, synci instructions in IAS.
* Optimized the generation of constants from some cases.
**The following issues have been fixed:**
* Thread local debug information is correctly recorded.
* MSA intrinsics are now range checked.
* Fixed an issue with MSA and the no-odd-spreg abi.
* Fixed some corner cases in handling forbidden slots for MIPSR6.
* Fixed an issue with jumps not being converted to relative branches for assembly.
* Fixed the handling of local symbols and jal instruction.
* N32/N64 no longer have their relocation tables sorted as per their ABIs.
* Fixed a crash when half-precision floating point conversion MSA intrinsics are used.
* Fixed several crashes involving FastISel.
* Corrected the corrected definitions for aui/daui/dahi/dati for MIPSR6.
+Changes to the X86 Target
+-------------------------
+
+**During this release the X86 target has:**
+
+* Added support AMD Ryzen (znver1) CPUs.
+* Gained support for using VEX encoding on AVX-512 CPUs to reduce code size when possible.
+* Improved AVX-512 codegen.
+
Changes to the OCaml bindings
-----------------------------
* The attribute API was completely overhauled, following the changes
to the C API.
External Open Source Projects Using LLVM 4.0.0
==============================================
LDC - the LLVM-based D compiler
-------------------------------
`D `_ is a language with C-like syntax and static typing. It
pragmatically combines efficiency, control, and modeling power, with safety and
programmer productivity. D supports powerful concepts like Compile-Time Function
Execution (CTFE) and Template Meta-Programming, provides an innovative approach
to concurrency and offers many classical paradigms.
`LDC `_ uses the frontend from the reference compiler
combined with LLVM as backend to produce efficient native code. LDC targets
x86/x86_64 systems like Linux, OS X, FreeBSD and Windows and also Linux on ARM
and PowerPC (32/64 bit). Ports to other architectures like AArch64 and MIPS64
are underway.
+
+Portable Computing Language (pocl)
+----------------------------------
+
+In addition to producing an easily portable open source OpenCL
+implementation, another major goal of `pocl `_
+is improving performance portability of OpenCL programs with
+compiler optimizations, reducing the need for target-dependent manual
+optimizations. An important part of pocl is a set of LLVM passes used to
+statically parallelize multiple work-items with the kernel compiler, even in
+the presence of work-group barriers. This enables static parallelization of
+the fine-grained static concurrency in the work groups in multiple ways.
+
+TTA-based Co-design Environment (TCE)
+-------------------------------------
+
+`TCE `_ is a toolset for designing customized
+processors based on the Transport Triggered Architecture (TTA).
+The toolset provides a complete co-design flow from C/C++
+programs down to synthesizable VHDL/Verilog and parallel program binaries.
+Processor customization points include register files, function units,
+supported operations, and the interconnection network.
+
+TCE uses Clang and LLVM for C/C++/OpenCL C language support, target independent
+optimizations and also for parts of code generation. It generates new
+LLVM-based code generators "on the fly" for the designed TTA processors and
+loads them in to the compiler backend as runtime libraries to avoid
+per-target recompilation of larger parts of the compiler chain.
Additional Information
======================
A wide variety of additional information is available on the `LLVM web page
`_, in particular in the `documentation
`_ section. The web page also contains versions of the
API documentation which is up-to-date with the Subversion version of the source
code. You can access versions of these documents specific to this release by
going into the ``llvm/docs/`` directory in the LLVM tree.
If you have any questions or comments about LLVM, please feel free to contact
us via the `mailing lists `_.
Index: vendor/llvm/dist/lib/Analysis/ScalarEvolution.cpp
===================================================================
--- vendor/llvm/dist/lib/Analysis/ScalarEvolution.cpp (revision 314974)
+++ vendor/llvm/dist/lib/Analysis/ScalarEvolution.cpp (revision 314975)
@@ -1,10506 +1,10511 @@
//===- ScalarEvolution.cpp - Scalar Evolution Analysis --------------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file contains the implementation of the scalar evolution analysis
// engine, which is used primarily to analyze expressions involving induction
// variables in loops.
//
// There are several aspects to this library. First is the representation of
// scalar expressions, which are represented as subclasses of the SCEV class.
// These classes are used to represent certain types of subexpressions that we
// can handle. We only create one SCEV of a particular shape, so
// pointer-comparisons for equality are legal.
//
// One important aspect of the SCEV objects is that they are never cyclic, even
// if there is a cycle in the dataflow for an expression (ie, a PHI node). If
// the PHI node is one of the idioms that we can represent (e.g., a polynomial
// recurrence) then we represent it directly as a recurrence node, otherwise we
// represent it as a SCEVUnknown node.
//
// In addition to being able to represent expressions of various types, we also
// have folders that are used to build the *canonical* representation for a
// particular expression. These folders are capable of using a variety of
// rewrite rules to simplify the expressions.
//
// Once the folders are defined, we can implement the more interesting
// higher-level code, such as the code that recognizes PHI nodes of various
// types, computes the execution count of a loop, etc.
//
// TODO: We should use these routines and value representations to implement
// dependence analysis!
//
//===----------------------------------------------------------------------===//
//
// There are several good references for the techniques used in this analysis.
//
// Chains of recurrences -- a method to expedite the evaluation
// of closed-form functions
// Olaf Bachmann, Paul S. Wang, Eugene V. Zima
//
// On computational properties of chains of recurrences
// Eugene V. Zima
//
// Symbolic Evaluation of Chains of Recurrences for Loop Optimization
// Robert A. van Engelen
//
// Efficient Symbolic Analysis for Optimizing Compilers
// Robert A. van Engelen
//
// Using the chains of recurrences algebra for data dependence testing and
// induction variable substitution
// MS Thesis, Johnie Birch
//
//===----------------------------------------------------------------------===//
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/ADT/Optional.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/ScopeExit.h"
#include "llvm/ADT/Sequence.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/ConstantFolding.h"
#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/ConstantRange.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/GetElementPtrTypeIterator.h"
#include "llvm/IR/GlobalAlias.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/InstIterator.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"
#include "llvm/IR/Operator.h"
#include "llvm/IR/PatternMatch.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/MathExtras.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Support/SaveAndRestore.h"
#include
using namespace llvm;
#define DEBUG_TYPE "scalar-evolution"
STATISTIC(NumArrayLenItCounts,
"Number of trip counts computed with array length");
STATISTIC(NumTripCountsComputed,
"Number of loops with predictable loop counts");
STATISTIC(NumTripCountsNotComputed,
"Number of loops without predictable loop counts");
STATISTIC(NumBruteForceTripCountsComputed,
"Number of loops with trip counts computed by force");
static cl::opt
MaxBruteForceIterations("scalar-evolution-max-iterations", cl::ReallyHidden,
cl::desc("Maximum number of iterations SCEV will "
"symbolically execute a constant "
"derived loop"),
cl::init(100));
// FIXME: Enable this with EXPENSIVE_CHECKS when the test suite is clean.
static cl::opt
VerifySCEV("verify-scev",
cl::desc("Verify ScalarEvolution's backedge taken counts (slow)"));
static cl::opt
VerifySCEVMap("verify-scev-maps",
cl::desc("Verify no dangling value in ScalarEvolution's "
"ExprValueMap (slow)"));
static cl::opt MulOpsInlineThreshold(
"scev-mulops-inline-threshold", cl::Hidden,
cl::desc("Threshold for inlining multiplication operands into a SCEV"),
cl::init(1000));
-static cl::opt
- MaxCompareDepth("scalar-evolution-max-compare-depth", cl::Hidden,
- cl::desc("Maximum depth of recursive compare complexity"),
- cl::init(32));
+static cl::opt MaxSCEVCompareDepth(
+ "scalar-evolution-max-scev-compare-depth", cl::Hidden,
+ cl::desc("Maximum depth of recursive SCEV complexity comparisons"),
+ cl::init(32));
+static cl::opt MaxValueCompareDepth(
+ "scalar-evolution-max-value-compare-depth", cl::Hidden,
+ cl::desc("Maximum depth of recursive value complexity comparisons"),
+ cl::init(2));
+
//===----------------------------------------------------------------------===//
// SCEV class definitions
//===----------------------------------------------------------------------===//
//===----------------------------------------------------------------------===//
// Implementation of the SCEV class.
//
LLVM_DUMP_METHOD
void SCEV::dump() const {
print(dbgs());
dbgs() << '\n';
}
void SCEV::print(raw_ostream &OS) const {
switch (static_cast(getSCEVType())) {
case scConstant:
cast(this)->getValue()->printAsOperand(OS, false);
return;
case scTruncate: {
const SCEVTruncateExpr *Trunc = cast(this);
const SCEV *Op = Trunc->getOperand();
OS << "(trunc " << *Op->getType() << " " << *Op << " to "
<< *Trunc->getType() << ")";
return;
}
case scZeroExtend: {
const SCEVZeroExtendExpr *ZExt = cast(this);
const SCEV *Op = ZExt->getOperand();
OS << "(zext " << *Op->getType() << " " << *Op << " to "
<< *ZExt->getType() << ")";
return;
}
case scSignExtend: {
const SCEVSignExtendExpr *SExt = cast(this);
const SCEV *Op = SExt->getOperand();
OS << "(sext " << *Op->getType() << " " << *Op << " to "
<< *SExt->getType() << ")";
return;
}
case scAddRecExpr: {
const SCEVAddRecExpr *AR = cast(this);
OS << "{" << *AR->getOperand(0);
for (unsigned i = 1, e = AR->getNumOperands(); i != e; ++i)
OS << ",+," << *AR->getOperand(i);
OS << "}<";
if (AR->hasNoUnsignedWrap())
OS << "nuw><";
if (AR->hasNoSignedWrap())
OS << "nsw><";
if (AR->hasNoSelfWrap() &&
!AR->getNoWrapFlags((NoWrapFlags)(FlagNUW | FlagNSW)))
OS << "nw><";
AR->getLoop()->getHeader()->printAsOperand(OS, /*PrintType=*/false);
OS << ">";
return;
}
case scAddExpr:
case scMulExpr:
case scUMaxExpr:
case scSMaxExpr: {
const SCEVNAryExpr *NAry = cast(this);
const char *OpStr = nullptr;
switch (NAry->getSCEVType()) {
case scAddExpr: OpStr = " + "; break;
case scMulExpr: OpStr = " * "; break;
case scUMaxExpr: OpStr = " umax "; break;
case scSMaxExpr: OpStr = " smax "; break;
}
OS << "(";
for (SCEVNAryExpr::op_iterator I = NAry->op_begin(), E = NAry->op_end();
I != E; ++I) {
OS << **I;
if (std::next(I) != E)
OS << OpStr;
}
OS << ")";
switch (NAry->getSCEVType()) {
case scAddExpr:
case scMulExpr:
if (NAry->hasNoUnsignedWrap())
OS << "";
if (NAry->hasNoSignedWrap())
OS << "";
}
return;
}
case scUDivExpr: {
const SCEVUDivExpr *UDiv = cast(this);
OS << "(" << *UDiv->getLHS() << " /u " << *UDiv->getRHS() << ")";
return;
}
case scUnknown: {
const SCEVUnknown *U = cast(this);
Type *AllocTy;
if (U->isSizeOf(AllocTy)) {
OS << "sizeof(" << *AllocTy << ")";
return;
}
if (U->isAlignOf(AllocTy)) {
OS << "alignof(" << *AllocTy << ")";
return;
}
Type *CTy;
Constant *FieldNo;
if (U->isOffsetOf(CTy, FieldNo)) {
OS << "offsetof(" << *CTy << ", ";
FieldNo->printAsOperand(OS, false);
OS << ")";
return;
}
// Otherwise just print it normally.
U->getValue()->printAsOperand(OS, false);
return;
}
case scCouldNotCompute:
OS << "***COULDNOTCOMPUTE***";
return;
}
llvm_unreachable("Unknown SCEV kind!");
}
Type *SCEV::getType() const {
switch (static_cast(getSCEVType())) {
case scConstant:
return cast(this)->getType();
case scTruncate:
case scZeroExtend:
case scSignExtend:
return cast(this)->getType();
case scAddRecExpr:
case scMulExpr:
case scUMaxExpr:
case scSMaxExpr:
return cast(this)->getType();
case scAddExpr:
return cast(this)->getType();
case scUDivExpr:
return cast(this)->getType();
case scUnknown:
return cast(this)->getType();
case scCouldNotCompute:
llvm_unreachable("Attempt to use a SCEVCouldNotCompute object!");
}
llvm_unreachable("Unknown SCEV kind!");
}
bool SCEV::isZero() const {
if (const SCEVConstant *SC = dyn_cast(this))
return SC->getValue()->isZero();
return false;
}
bool SCEV::isOne() const {
if (const SCEVConstant *SC = dyn_cast(this))
return SC->getValue()->isOne();
return false;
}
bool SCEV::isAllOnesValue() const {
if (const SCEVConstant *SC = dyn_cast(this))
return SC->getValue()->isAllOnesValue();
return false;
}
bool SCEV::isNonConstantNegative() const {
const SCEVMulExpr *Mul = dyn_cast(this);
if (!Mul) return false;
// If there is a constant factor, it will be first.
const SCEVConstant *SC = dyn_cast(Mul->getOperand(0));
if (!SC) return false;
// Return true if the value is negative, this matches things like (-42 * V).
return SC->getAPInt().isNegative();
}
SCEVCouldNotCompute::SCEVCouldNotCompute() :
SCEV(FoldingSetNodeIDRef(), scCouldNotCompute) {}
bool SCEVCouldNotCompute::classof(const SCEV *S) {
return S->getSCEVType() == scCouldNotCompute;
}
const SCEV *ScalarEvolution::getConstant(ConstantInt *V) {
FoldingSetNodeID ID;
ID.AddInteger(scConstant);
ID.AddPointer(V);
void *IP = nullptr;
if (const SCEV *S = UniqueSCEVs.FindNodeOrInsertPos(ID, IP)) return S;
SCEV *S = new (SCEVAllocator) SCEVConstant(ID.Intern(SCEVAllocator), V);
UniqueSCEVs.InsertNode(S, IP);
return S;
}
const SCEV *ScalarEvolution::getConstant(const APInt &Val) {
return getConstant(ConstantInt::get(getContext(), Val));
}
const SCEV *
ScalarEvolution::getConstant(Type *Ty, uint64_t V, bool isSigned) {
IntegerType *ITy = cast(getEffectiveSCEVType(Ty));
return getConstant(ConstantInt::get(ITy, V, isSigned));
}
SCEVCastExpr::SCEVCastExpr(const FoldingSetNodeIDRef ID,
unsigned SCEVTy, const SCEV *op, Type *ty)
: SCEV(ID, SCEVTy), Op(op), Ty(ty) {}
SCEVTruncateExpr::SCEVTruncateExpr(const FoldingSetNodeIDRef ID,
const SCEV *op, Type *ty)
: SCEVCastExpr(ID, scTruncate, op, ty) {
assert((Op->getType()->isIntegerTy() || Op->getType()->isPointerTy()) &&
(Ty->isIntegerTy() || Ty->isPointerTy()) &&
"Cannot truncate non-integer value!");
}
SCEVZeroExtendExpr::SCEVZeroExtendExpr(const FoldingSetNodeIDRef ID,
const SCEV *op, Type *ty)
: SCEVCastExpr(ID, scZeroExtend, op, ty) {
assert((Op->getType()->isIntegerTy() || Op->getType()->isPointerTy()) &&
(Ty->isIntegerTy() || Ty->isPointerTy()) &&
"Cannot zero extend non-integer value!");
}
SCEVSignExtendExpr::SCEVSignExtendExpr(const FoldingSetNodeIDRef ID,
const SCEV *op, Type *ty)
: SCEVCastExpr(ID, scSignExtend, op, ty) {
assert((Op->getType()->isIntegerTy() || Op->getType()->isPointerTy()) &&
(Ty->isIntegerTy() || Ty->isPointerTy()) &&
"Cannot sign extend non-integer value!");
}
void SCEVUnknown::deleted() {
// Clear this SCEVUnknown from various maps.
SE->forgetMemoizedResults(this);
// Remove this SCEVUnknown from the uniquing map.
SE->UniqueSCEVs.RemoveNode(this);
// Release the value.
setValPtr(nullptr);
}
void SCEVUnknown::allUsesReplacedWith(Value *New) {
// Clear this SCEVUnknown from various maps.
SE->forgetMemoizedResults(this);
// Remove this SCEVUnknown from the uniquing map.
SE->UniqueSCEVs.RemoveNode(this);
// Update this SCEVUnknown to point to the new value. This is needed
// because there may still be outstanding SCEVs which still point to
// this SCEVUnknown.
setValPtr(New);
}
bool SCEVUnknown::isSizeOf(Type *&AllocTy) const {
if (ConstantExpr *VCE = dyn_cast(getValue()))
if (VCE->getOpcode() == Instruction::PtrToInt)
if (ConstantExpr *CE = dyn_cast(VCE->getOperand(0)))
if (CE->getOpcode() == Instruction::GetElementPtr &&
CE->getOperand(0)->isNullValue() &&
CE->getNumOperands() == 2)
if (ConstantInt *CI = dyn_cast(CE->getOperand(1)))
if (CI->isOne()) {
AllocTy = cast(CE->getOperand(0)->getType())
->getElementType();
return true;
}
return false;
}
bool SCEVUnknown::isAlignOf(Type *&AllocTy) const {
if (ConstantExpr *VCE = dyn_cast(getValue()))
if (VCE->getOpcode() == Instruction::PtrToInt)
if (ConstantExpr *CE = dyn_cast(VCE->getOperand(0)))
if (CE->getOpcode() == Instruction::GetElementPtr &&
CE->getOperand(0)->isNullValue()) {
Type *Ty =
cast(CE->getOperand(0)->getType())->getElementType();
if (StructType *STy = dyn_cast(Ty))
if (!STy->isPacked() &&
CE->getNumOperands() == 3 &&
CE->getOperand(1)->isNullValue()) {
if (ConstantInt *CI = dyn_cast(CE->getOperand(2)))
if (CI->isOne() &&
STy->getNumElements() == 2 &&
STy->getElementType(0)->isIntegerTy(1)) {
AllocTy = STy->getElementType(1);
return true;
}
}
}
return false;
}
bool SCEVUnknown::isOffsetOf(Type *&CTy, Constant *&FieldNo) const {
if (ConstantExpr *VCE = dyn_cast(getValue()))
if (VCE->getOpcode() == Instruction::PtrToInt)
if (ConstantExpr *CE = dyn_cast(VCE->getOperand(0)))
if (CE->getOpcode() == Instruction::GetElementPtr &&
CE->getNumOperands() == 3 &&
CE->getOperand(0)->isNullValue() &&
CE->getOperand(1)->isNullValue()) {
Type *Ty =
cast(CE->getOperand(0)->getType())->getElementType();
// Ignore vector types here so that ScalarEvolutionExpander doesn't
// emit getelementptrs that index into vectors.
if (Ty->isStructTy() || Ty->isArrayTy()) {
CTy = Ty;
FieldNo = CE->getOperand(2);
return true;
}
}
return false;
}
//===----------------------------------------------------------------------===//
// SCEV Utilities
//===----------------------------------------------------------------------===//
/// Compare the two values \p LV and \p RV in terms of their "complexity" where
/// "complexity" is a partial (and somewhat ad-hoc) relation used to order
/// operands in SCEV expressions. \p EqCache is a set of pairs of values that
/// have been previously deemed to be "equally complex" by this routine. It is
/// intended to avoid exponential time complexity in cases like:
///
/// %a = f(%x, %y)
/// %b = f(%a, %a)
/// %c = f(%b, %b)
///
/// %d = f(%x, %y)
/// %e = f(%d, %d)
/// %f = f(%e, %e)
///
/// CompareValueComplexity(%f, %c)
///
/// Since we do not continue running this routine on expression trees once we
/// have seen unequal values, there is no need to track them in the cache.
static int
CompareValueComplexity(SmallSet, 8> &EqCache,
const LoopInfo *const LI, Value *LV, Value *RV,
unsigned Depth) {
- if (Depth > MaxCompareDepth || EqCache.count({LV, RV}))
+ if (Depth > MaxValueCompareDepth || EqCache.count({LV, RV}))
return 0;
// Order pointer values after integer values. This helps SCEVExpander form
// GEPs.
bool LIsPointer = LV->getType()->isPointerTy(),
RIsPointer = RV->getType()->isPointerTy();
if (LIsPointer != RIsPointer)
return (int)LIsPointer - (int)RIsPointer;
// Compare getValueID values.
unsigned LID = LV->getValueID(), RID = RV->getValueID();
if (LID != RID)
return (int)LID - (int)RID;
// Sort arguments by their position.
if (const auto *LA = dyn_cast(LV)) {
const auto *RA = cast(RV);
unsigned LArgNo = LA->getArgNo(), RArgNo = RA->getArgNo();
return (int)LArgNo - (int)RArgNo;
}
if (const auto *LGV = dyn_cast(LV)) {
const auto *RGV = cast(RV);
const auto IsGVNameSemantic = [&](const GlobalValue *GV) {
auto LT = GV->getLinkage();
return !(GlobalValue::isPrivateLinkage(LT) ||
GlobalValue::isInternalLinkage(LT));
};
// Use the names to distinguish the two values, but only if the
// names are semantically important.
if (IsGVNameSemantic(LGV) && IsGVNameSemantic(RGV))
return LGV->getName().compare(RGV->getName());
}
// For instructions, compare their loop depth, and their operand count. This
// is pretty loose.
if (const auto *LInst = dyn_cast(LV)) {
const auto *RInst = cast(RV);
// Compare loop depths.
const BasicBlock *LParent = LInst->getParent(),
*RParent = RInst->getParent();
if (LParent != RParent) {
unsigned LDepth = LI->getLoopDepth(LParent),
RDepth = LI->getLoopDepth(RParent);
if (LDepth != RDepth)
return (int)LDepth - (int)RDepth;
}
// Compare the number of operands.
unsigned LNumOps = LInst->getNumOperands(),
RNumOps = RInst->getNumOperands();
if (LNumOps != RNumOps)
return (int)LNumOps - (int)RNumOps;
for (unsigned Idx : seq(0u, LNumOps)) {
int Result =
CompareValueComplexity(EqCache, LI, LInst->getOperand(Idx),
RInst->getOperand(Idx), Depth + 1);
if (Result != 0)
return Result;
}
}
EqCache.insert({LV, RV});
return 0;
}
// Return negative, zero, or positive, if LHS is less than, equal to, or greater
// than RHS, respectively. A three-way result allows recursive comparisons to be
// more efficient.
static int CompareSCEVComplexity(
SmallSet, 8> &EqCacheSCEV,
const LoopInfo *const LI, const SCEV *LHS, const SCEV *RHS,
unsigned Depth = 0) {
// Fast-path: SCEVs are uniqued so we can do a quick equality check.
if (LHS == RHS)
return 0;
// Primarily, sort the SCEVs by their getSCEVType().
unsigned LType = LHS->getSCEVType(), RType = RHS->getSCEVType();
if (LType != RType)
return (int)LType - (int)RType;
- if (Depth > MaxCompareDepth || EqCacheSCEV.count({LHS, RHS}))
+ if (Depth > MaxSCEVCompareDepth || EqCacheSCEV.count({LHS, RHS}))
return 0;
// Aside from the getSCEVType() ordering, the particular ordering
// isn't very important except that it's beneficial to be consistent,
// so that (a + b) and (b + a) don't end up as different expressions.
switch (static_cast(LType)) {
case scUnknown: {
const SCEVUnknown *LU = cast(LHS);
const SCEVUnknown *RU = cast(RHS);
SmallSet, 8> EqCache;
int X = CompareValueComplexity(EqCache, LI, LU->getValue(), RU->getValue(),
Depth + 1);
if (X == 0)
EqCacheSCEV.insert({LHS, RHS});
return X;
}
case scConstant: {
const SCEVConstant *LC = cast(LHS);
const SCEVConstant *RC = cast(RHS);
// Compare constant values.
const APInt &LA = LC->getAPInt();
const APInt &RA = RC->getAPInt();
unsigned LBitWidth = LA.getBitWidth(), RBitWidth = RA.getBitWidth();
if (LBitWidth != RBitWidth)
return (int)LBitWidth - (int)RBitWidth;
return LA.ult(RA) ? -1 : 1;
}
case scAddRecExpr: {
const SCEVAddRecExpr *LA = cast(LHS);
const SCEVAddRecExpr *RA = cast(RHS);
// Compare addrec loop depths.
const Loop *LLoop = LA->getLoop(), *RLoop = RA->getLoop();
if (LLoop != RLoop) {
unsigned LDepth = LLoop->getLoopDepth(), RDepth = RLoop->getLoopDepth();
if (LDepth != RDepth)
return (int)LDepth - (int)RDepth;
}
// Addrec complexity grows with operand count.
unsigned LNumOps = LA->getNumOperands(), RNumOps = RA->getNumOperands();
if (LNumOps != RNumOps)
return (int)LNumOps - (int)RNumOps;
// Lexicographically compare.
for (unsigned i = 0; i != LNumOps; ++i) {
int X = CompareSCEVComplexity(EqCacheSCEV, LI, LA->getOperand(i),
RA->getOperand(i), Depth + 1);
if (X != 0)
return X;
}
EqCacheSCEV.insert({LHS, RHS});
return 0;
}
case scAddExpr:
case scMulExpr:
case scSMaxExpr:
case scUMaxExpr: {
const SCEVNAryExpr *LC = cast(LHS);
const SCEVNAryExpr *RC = cast(RHS);
// Lexicographically compare n-ary expressions.
unsigned LNumOps = LC->getNumOperands(), RNumOps = RC->getNumOperands();
if (LNumOps != RNumOps)
return (int)LNumOps - (int)RNumOps;
for (unsigned i = 0; i != LNumOps; ++i) {
if (i >= RNumOps)
return 1;
int X = CompareSCEVComplexity(EqCacheSCEV, LI, LC->getOperand(i),
RC->getOperand(i), Depth + 1);
if (X != 0)
return X;
}
EqCacheSCEV.insert({LHS, RHS});
return 0;
}
case scUDivExpr: {
const SCEVUDivExpr *LC = cast(LHS);
const SCEVUDivExpr *RC = cast(RHS);
// Lexicographically compare udiv expressions.
int X = CompareSCEVComplexity(EqCacheSCEV, LI, LC->getLHS(), RC->getLHS(),
Depth + 1);
if (X != 0)
return X;
X = CompareSCEVComplexity(EqCacheSCEV, LI, LC->getRHS(), RC->getRHS(),
Depth + 1);
if (X == 0)
EqCacheSCEV.insert({LHS, RHS});
return X;
}
case scTruncate:
case scZeroExtend:
case scSignExtend: {
const SCEVCastExpr *LC = cast(LHS);
const SCEVCastExpr *RC = cast(RHS);
// Compare cast expressions by operand.
int X = CompareSCEVComplexity(EqCacheSCEV, LI, LC->getOperand(),
RC->getOperand(), Depth + 1);
if (X == 0)
EqCacheSCEV.insert({LHS, RHS});
return X;
}
case scCouldNotCompute:
llvm_unreachable("Attempt to use a SCEVCouldNotCompute object!");
}
llvm_unreachable("Unknown SCEV kind!");
}
/// Given a list of SCEV objects, order them by their complexity, and group
/// objects of the same complexity together by value. When this routine is
/// finished, we know that any duplicates in the vector are consecutive and that
/// complexity is monotonically increasing.
///
/// Note that we go take special precautions to ensure that we get deterministic
/// results from this routine. In other words, we don't want the results of
/// this to depend on where the addresses of various SCEV objects happened to
/// land in memory.
///
static void GroupByComplexity(SmallVectorImpl &Ops,
LoopInfo *LI) {
if (Ops.size() < 2) return; // Noop
SmallSet, 8> EqCache;
if (Ops.size() == 2) {
// This is the common case, which also happens to be trivially simple.
// Special case it.
const SCEV *&LHS = Ops[0], *&RHS = Ops[1];
if (CompareSCEVComplexity(EqCache, LI, RHS, LHS) < 0)
std::swap(LHS, RHS);
return;
}
// Do the rough sort by complexity.
std::stable_sort(Ops.begin(), Ops.end(),
[&EqCache, LI](const SCEV *LHS, const SCEV *RHS) {
return CompareSCEVComplexity(EqCache, LI, LHS, RHS) < 0;
});
// Now that we are sorted by complexity, group elements of the same
// complexity. Note that this is, at worst, N^2, but the vector is likely to
// be extremely short in practice. Note that we take this approach because we
// do not want to depend on the addresses of the objects we are grouping.
for (unsigned i = 0, e = Ops.size(); i != e-2; ++i) {
const SCEV *S = Ops[i];
unsigned Complexity = S->getSCEVType();
// If there are any objects of the same complexity and same value as this
// one, group them.
for (unsigned j = i+1; j != e && Ops[j]->getSCEVType() == Complexity; ++j) {
if (Ops[j] == S) { // Found a duplicate.
// Move it to immediately after i'th element.
std::swap(Ops[i+1], Ops[j]);
++i; // no need to rescan it.
if (i == e-2) return; // Done!
}
}
}
}
// Returns the size of the SCEV S.
static inline int sizeOfSCEV(const SCEV *S) {
struct FindSCEVSize {
int Size;
FindSCEVSize() : Size(0) {}
bool follow(const SCEV *S) {
++Size;
// Keep looking at all operands of S.
return true;
}
bool isDone() const {
return false;
}
};
FindSCEVSize F;
SCEVTraversal ST(F);
ST.visitAll(S);
return F.Size;
}
namespace {
struct SCEVDivision : public SCEVVisitor {
public:
// Computes the Quotient and Remainder of the division of Numerator by
// Denominator.
static void divide(ScalarEvolution &SE, const SCEV *Numerator,
const SCEV *Denominator, const SCEV **Quotient,
const SCEV **Remainder) {
assert(Numerator && Denominator && "Uninitialized SCEV");
SCEVDivision D(SE, Numerator, Denominator);
// Check for the trivial case here to avoid having to check for it in the
// rest of the code.
if (Numerator == Denominator) {
*Quotient = D.One;
*Remainder = D.Zero;
return;
}
if (Numerator->isZero()) {
*Quotient = D.Zero;
*Remainder = D.Zero;
return;
}
// A simple case when N/1. The quotient is N.
if (Denominator->isOne()) {
*Quotient = Numerator;
*Remainder = D.Zero;
return;
}
// Split the Denominator when it is a product.
if (const SCEVMulExpr *T = dyn_cast(Denominator)) {
const SCEV *Q, *R;
*Quotient = Numerator;
for (const SCEV *Op : T->operands()) {
divide(SE, *Quotient, Op, &Q, &R);
*Quotient = Q;
// Bail out when the Numerator is not divisible by one of the terms of
// the Denominator.
if (!R->isZero()) {
*Quotient = D.Zero;
*Remainder = Numerator;
return;
}
}
*Remainder = D.Zero;
return;
}
D.visit(Numerator);
*Quotient = D.Quotient;
*Remainder = D.Remainder;
}
// Except in the trivial case described above, we do not know how to divide
// Expr by Denominator for the following functions with empty implementation.
void visitTruncateExpr(const SCEVTruncateExpr *Numerator) {}
void visitZeroExtendExpr(const SCEVZeroExtendExpr *Numerator) {}
void visitSignExtendExpr(const SCEVSignExtendExpr *Numerator) {}
void visitUDivExpr(const SCEVUDivExpr *Numerator) {}
void visitSMaxExpr(const SCEVSMaxExpr *Numerator) {}
void visitUMaxExpr(const SCEVUMaxExpr *Numerator) {}
void visitUnknown(const SCEVUnknown *Numerator) {}
void visitCouldNotCompute(const SCEVCouldNotCompute *Numerator) {}
void visitConstant(const SCEVConstant *Numerator) {
if (const SCEVConstant *D = dyn_cast(Denominator)) {
APInt NumeratorVal = Numerator->getAPInt();
APInt DenominatorVal = D->getAPInt();
uint32_t NumeratorBW = NumeratorVal.getBitWidth();
uint32_t DenominatorBW = DenominatorVal.getBitWidth();
if (NumeratorBW > DenominatorBW)
DenominatorVal = DenominatorVal.sext(NumeratorBW);
else if (NumeratorBW < DenominatorBW)
NumeratorVal = NumeratorVal.sext(DenominatorBW);
APInt QuotientVal(NumeratorVal.getBitWidth(), 0);
APInt RemainderVal(NumeratorVal.getBitWidth(), 0);
APInt::sdivrem(NumeratorVal, DenominatorVal, QuotientVal, RemainderVal);
Quotient = SE.getConstant(QuotientVal);
Remainder = SE.getConstant(RemainderVal);
return;
}
}
void visitAddRecExpr(const SCEVAddRecExpr *Numerator) {
const SCEV *StartQ, *StartR, *StepQ, *StepR;
if (!Numerator->isAffine())
return cannotDivide(Numerator);
divide(SE, Numerator->getStart(), Denominator, &StartQ, &StartR);
divide(SE, Numerator->getStepRecurrence(SE), Denominator, &StepQ, &StepR);
// Bail out if the types do not match.
Type *Ty = Denominator->getType();
if (Ty != StartQ->getType() || Ty != StartR->getType() ||
Ty != StepQ->getType() || Ty != StepR->getType())
return cannotDivide(Numerator);
Quotient = SE.getAddRecExpr(StartQ, StepQ, Numerator->getLoop(),
Numerator->getNoWrapFlags());
Remainder = SE.getAddRecExpr(StartR, StepR, Numerator->getLoop(),
Numerator->getNoWrapFlags());
}
void visitAddExpr(const SCEVAddExpr *Numerator) {
SmallVector Qs, Rs;
Type *Ty = Denominator->getType();
for (const SCEV *Op : Numerator->operands()) {
const SCEV *Q, *R;
divide(SE, Op, Denominator, &Q, &R);
// Bail out if types do not match.
if (Ty != Q->getType() || Ty != R->getType())
return cannotDivide(Numerator);
Qs.push_back(Q);
Rs.push_back(R);
}
if (Qs.size() == 1) {
Quotient = Qs[0];
Remainder = Rs[0];
return;
}
Quotient = SE.getAddExpr(Qs);
Remainder = SE.getAddExpr(Rs);
}
void visitMulExpr(const SCEVMulExpr *Numerator) {
SmallVector Qs;
Type *Ty = Denominator->getType();
bool FoundDenominatorTerm = false;
for (const SCEV *Op : Numerator->operands()) {
// Bail out if types do not match.
if (Ty != Op->getType())
return cannotDivide(Numerator);
if (FoundDenominatorTerm) {
Qs.push_back(Op);
continue;
}
// Check whether Denominator divides one of the product operands.
const SCEV *Q, *R;
divide(SE, Op, Denominator, &Q, &R);
if (!R->isZero()) {
Qs.push_back(Op);
continue;
}
// Bail out if types do not match.
if (Ty != Q->getType())
return cannotDivide(Numerator);
FoundDenominatorTerm = true;
Qs.push_back(Q);
}
if (FoundDenominatorTerm) {
Remainder = Zero;
if (Qs.size() == 1)
Quotient = Qs[0];
else
Quotient = SE.getMulExpr(Qs);
return;
}
if (!isa(Denominator))
return cannotDivide(Numerator);
// The Remainder is obtained by replacing Denominator by 0 in Numerator.
ValueToValueMap RewriteMap;
RewriteMap[cast(Denominator)->getValue()] =
cast(Zero)->getValue();
Remainder = SCEVParameterRewriter::rewrite(Numerator, SE, RewriteMap, true);
if (Remainder->isZero()) {
// The Quotient is obtained by replacing Denominator by 1 in Numerator.
RewriteMap[cast(Denominator)->getValue()] =
cast(One)->getValue();
Quotient =
SCEVParameterRewriter::rewrite(Numerator, SE, RewriteMap, true);
return;
}
// Quotient is (Numerator - Remainder) divided by Denominator.
const SCEV *Q, *R;
const SCEV *Diff = SE.getMinusSCEV(Numerator, Remainder);
// This SCEV does not seem to simplify: fail the division here.
if (sizeOfSCEV(Diff) > sizeOfSCEV(Numerator))
return cannotDivide(Numerator);
divide(SE, Diff, Denominator, &Q, &R);
if (R != Zero)
return cannotDivide(Numerator);
Quotient = Q;
}
private:
SCEVDivision(ScalarEvolution &S, const SCEV *Numerator,
const SCEV *Denominator)
: SE(S), Denominator(Denominator) {
Zero = SE.getZero(Denominator->getType());
One = SE.getOne(Denominator->getType());
// We generally do not know how to divide Expr by Denominator. We
// initialize the division to a "cannot divide" state to simplify the rest
// of the code.
cannotDivide(Numerator);
}
// Convenience function for giving up on the division. We set the quotient to
// be equal to zero and the remainder to be equal to the numerator.
void cannotDivide(const SCEV *Numerator) {
Quotient = Zero;
Remainder = Numerator;
}
ScalarEvolution &SE;
const SCEV *Denominator, *Quotient, *Remainder, *Zero, *One;
};
}
//===----------------------------------------------------------------------===//
// Simple SCEV method implementations
//===----------------------------------------------------------------------===//
/// Compute BC(It, K). The result has width W. Assume, K > 0.
static const SCEV *BinomialCoefficient(const SCEV *It, unsigned K,
ScalarEvolution &SE,
Type *ResultTy) {
// Handle the simplest case efficiently.
if (K == 1)
return SE.getTruncateOrZeroExtend(It, ResultTy);
// We are using the following formula for BC(It, K):
//
// BC(It, K) = (It * (It - 1) * ... * (It - K + 1)) / K!
//
// Suppose, W is the bitwidth of the return value. We must be prepared for
// overflow. Hence, we must assure that the result of our computation is
// equal to the accurate one modulo 2^W. Unfortunately, division isn't
// safe in modular arithmetic.
//
// However, this code doesn't use exactly that formula; the formula it uses
// is something like the following, where T is the number of factors of 2 in
// K! (i.e. trailing zeros in the binary representation of K!), and ^ is
// exponentiation:
//
// BC(It, K) = (It * (It - 1) * ... * (It - K + 1)) / 2^T / (K! / 2^T)
//
// This formula is trivially equivalent to the previous formula. However,
// this formula can be implemented much more efficiently. The trick is that
// K! / 2^T is odd, and exact division by an odd number *is* safe in modular
// arithmetic. To do exact division in modular arithmetic, all we have
// to do is multiply by the inverse. Therefore, this step can be done at
// width W.
//
// The next issue is how to safely do the division by 2^T. The way this
// is done is by doing the multiplication step at a width of at least W + T
// bits. This way, the bottom W+T bits of the product are accurate. Then,
// when we perform the division by 2^T (which is equivalent to a right shift
// by T), the bottom W bits are accurate. Extra bits are okay; they'll get
// truncated out after the division by 2^T.
//
// In comparison to just directly using the first formula, this technique
// is much more efficient; using the first formula requires W * K bits,
// but this formula less than W + K bits. Also, the first formula requires
// a division step, whereas this formula only requires multiplies and shifts.
//
// It doesn't matter whether the subtraction step is done in the calculation
// width or the input iteration count's width; if the subtraction overflows,
// the result must be zero anyway. We prefer here to do it in the width of
// the induction variable because it helps a lot for certain cases; CodeGen
// isn't smart enough to ignore the overflow, which leads to much less
// efficient code if the width of the subtraction is wider than the native
// register width.
//
// (It's possible to not widen at all by pulling out factors of 2 before
// the multiplication; for example, K=2 can be calculated as
// It/2*(It+(It*INT_MIN/INT_MIN)+-1). However, it requires
// extra arithmetic, so it's not an obvious win, and it gets
// much more complicated for K > 3.)
// Protection from insane SCEVs; this bound is conservative,
// but it probably doesn't matter.
if (K > 1000)
return SE.getCouldNotCompute();
unsigned W = SE.getTypeSizeInBits(ResultTy);
// Calculate K! / 2^T and T; we divide out the factors of two before
// multiplying for calculating K! / 2^T to avoid overflow.
// Other overflow doesn't matter because we only care about the bottom
// W bits of the result.
APInt OddFactorial(W, 1);
unsigned T = 1;
for (unsigned i = 3; i <= K; ++i) {
APInt Mult(W, i);
unsigned TwoFactors = Mult.countTrailingZeros();
T += TwoFactors;
Mult = Mult.lshr(TwoFactors);
OddFactorial *= Mult;
}
// We need at least W + T bits for the multiplication step
unsigned CalculationBits = W + T;
// Calculate 2^T, at width T+W.
APInt DivFactor = APInt::getOneBitSet(CalculationBits, T);
// Calculate the multiplicative inverse of K! / 2^T;
// this multiplication factor will perform the exact division by
// K! / 2^T.
APInt Mod = APInt::getSignedMinValue(W+1);
APInt MultiplyFactor = OddFactorial.zext(W+1);
MultiplyFactor = MultiplyFactor.multiplicativeInverse(Mod);
MultiplyFactor = MultiplyFactor.trunc(W);
// Calculate the product, at width T+W
IntegerType *CalculationTy = IntegerType::get(SE.getContext(),
CalculationBits);
const SCEV *Dividend = SE.getTruncateOrZeroExtend(It, CalculationTy);
for (unsigned i = 1; i != K; ++i) {
const SCEV *S = SE.getMinusSCEV(It, SE.getConstant(It->getType(), i));
Dividend = SE.getMulExpr(Dividend,
SE.getTruncateOrZeroExtend(S, CalculationTy));
}
// Divide by 2^T
const SCEV *DivResult = SE.getUDivExpr(Dividend, SE.getConstant(DivFactor));
// Truncate the result, and divide by K! / 2^T.
return SE.getMulExpr(SE.getConstant(MultiplyFactor),
SE.getTruncateOrZeroExtend(DivResult, ResultTy));
}
/// Return the value of this chain of recurrences at the specified iteration
/// number. We can evaluate this recurrence by multiplying each element in the
/// chain by the binomial coefficient corresponding to it. In other words, we
/// can evaluate {A,+,B,+,C,+,D} as:
///
/// A*BC(It, 0) + B*BC(It, 1) + C*BC(It, 2) + D*BC(It, 3)
///
/// where BC(It, k) stands for binomial coefficient.
///
const SCEV *SCEVAddRecExpr::evaluateAtIteration(const SCEV *It,
ScalarEvolution &SE) const {
const SCEV *Result = getStart();
for (unsigned i = 1, e = getNumOperands(); i != e; ++i) {
// The computation is correct in the face of overflow provided that the
// multiplication is performed _after_ the evaluation of the binomial
// coefficient.
const SCEV *Coeff = BinomialCoefficient(It, i, SE, getType());
if (isa(Coeff))
return Coeff;
Result = SE.getAddExpr(Result, SE.getMulExpr(getOperand(i), Coeff));
}
return Result;
}
//===----------------------------------------------------------------------===//
// SCEV Expression folder implementations
//===----------------------------------------------------------------------===//
const SCEV *ScalarEvolution::getTruncateExpr(const SCEV *Op,
Type *Ty) {
assert(getTypeSizeInBits(Op->getType()) > getTypeSizeInBits(Ty) &&
"This is not a truncating conversion!");
assert(isSCEVable(Ty) &&
"This is not a conversion to a SCEVable type!");
Ty = getEffectiveSCEVType(Ty);
FoldingSetNodeID ID;
ID.AddInteger(scTruncate);
ID.AddPointer(Op);
ID.AddPointer(Ty);
void *IP = nullptr;
if (const SCEV *S = UniqueSCEVs.FindNodeOrInsertPos(ID, IP)) return S;
// Fold if the operand is constant.
if (const SCEVConstant *SC = dyn_cast(Op))
return getConstant(
cast(ConstantExpr::getTrunc(SC->getValue(), Ty)));
// trunc(trunc(x)) --> trunc(x)
if (const SCEVTruncateExpr *ST = dyn_cast(Op))
return getTruncateExpr(ST->getOperand(), Ty);
// trunc(sext(x)) --> sext(x) if widening or trunc(x) if narrowing
if (const SCEVSignExtendExpr *SS = dyn_cast(Op))
return getTruncateOrSignExtend(SS->getOperand(), Ty);
// trunc(zext(x)) --> zext(x) if widening or trunc(x) if narrowing
if (const SCEVZeroExtendExpr *SZ = dyn_cast(Op))
return getTruncateOrZeroExtend(SZ->getOperand(), Ty);
// trunc(x1+x2+...+xN) --> trunc(x1)+trunc(x2)+...+trunc(xN) if we can
// eliminate all the truncates, or we replace other casts with truncates.
if (const SCEVAddExpr *SA = dyn_cast(Op)) {
SmallVector Operands;
bool hasTrunc = false;
for (unsigned i = 0, e = SA->getNumOperands(); i != e && !hasTrunc; ++i) {
const SCEV *S = getTruncateExpr(SA->getOperand(i), Ty);
if (!isa(SA->getOperand(i)))
hasTrunc = isa(S);
Operands.push_back(S);
}
if (!hasTrunc)
return getAddExpr(Operands);
UniqueSCEVs.FindNodeOrInsertPos(ID, IP); // Mutates IP, returns NULL.
}
// trunc(x1*x2*...*xN) --> trunc(x1)*trunc(x2)*...*trunc(xN) if we can
// eliminate all the truncates, or we replace other casts with truncates.
if (const SCEVMulExpr *SM = dyn_cast(Op)) {
SmallVector Operands;
bool hasTrunc = false;
for (unsigned i = 0, e = SM->getNumOperands(); i != e && !hasTrunc; ++i) {
const SCEV *S = getTruncateExpr(SM->getOperand(i), Ty);
if (!isa(SM->getOperand(i)))
hasTrunc = isa(S);
Operands.push_back(S);
}
if (!hasTrunc)
return getMulExpr(Operands);
UniqueSCEVs.FindNodeOrInsertPos(ID, IP); // Mutates IP, returns NULL.
}
// If the input value is a chrec scev, truncate the chrec's operands.
if (const SCEVAddRecExpr *AddRec = dyn_cast(Op)) {
SmallVector Operands;
for (const SCEV *Op : AddRec->operands())
Operands.push_back(getTruncateExpr(Op, Ty));
return getAddRecExpr(Operands, AddRec->getLoop(), SCEV::FlagAnyWrap);
}
// The cast wasn't folded; create an explicit cast node. We can reuse
// the existing insert position since if we get here, we won't have
// made any changes which would invalidate it.
SCEV *S = new (SCEVAllocator) SCEVTruncateExpr(ID.Intern(SCEVAllocator),
Op, Ty);
UniqueSCEVs.InsertNode(S, IP);
return S;
}
// Get the limit of a recurrence such that incrementing by Step cannot cause
// signed overflow as long as the value of the recurrence within the
// loop does not exceed this limit before incrementing.
static const SCEV *getSignedOverflowLimitForStep(const SCEV *Step,
ICmpInst::Predicate *Pred,
ScalarEvolution *SE) {
unsigned BitWidth = SE->getTypeSizeInBits(Step->getType());
if (SE->isKnownPositive(Step)) {
*Pred = ICmpInst::ICMP_SLT;
return SE->getConstant(APInt::getSignedMinValue(BitWidth) -
SE->getSignedRange(Step).getSignedMax());
}
if (SE->isKnownNegative(Step)) {
*Pred = ICmpInst::ICMP_SGT;
return SE->getConstant(APInt::getSignedMaxValue(BitWidth) -
SE->getSignedRange(Step).getSignedMin());
}
return nullptr;
}
// Get the limit of a recurrence such that incrementing by Step cannot cause
// unsigned overflow as long as the value of the recurrence within the loop does
// not exceed this limit before incrementing.
static const SCEV *getUnsignedOverflowLimitForStep(const SCEV *Step,
ICmpInst::Predicate *Pred,
ScalarEvolution *SE) {
unsigned BitWidth = SE->getTypeSizeInBits(Step->getType());
*Pred = ICmpInst::ICMP_ULT;
return SE->getConstant(APInt::getMinValue(BitWidth) -
SE->getUnsignedRange(Step).getUnsignedMax());
}
namespace {
struct ExtendOpTraitsBase {
typedef const SCEV *(ScalarEvolution::*GetExtendExprTy)(const SCEV *, Type *);
};
// Used to make code generic over signed and unsigned overflow.
template struct ExtendOpTraits {
// Members present:
//
// static const SCEV::NoWrapFlags WrapType;
//
// static const ExtendOpTraitsBase::GetExtendExprTy GetExtendExpr;
//
// static const SCEV *getOverflowLimitForStep(const SCEV *Step,
// ICmpInst::Predicate *Pred,
// ScalarEvolution *SE);
};
template <>
struct ExtendOpTraits : public ExtendOpTraitsBase {
static const SCEV::NoWrapFlags WrapType = SCEV::FlagNSW;
static const GetExtendExprTy GetExtendExpr;
static const SCEV *getOverflowLimitForStep(const SCEV *Step,
ICmpInst::Predicate *Pred,
ScalarEvolution *SE) {
return getSignedOverflowLimitForStep(Step, Pred, SE);
}
};
const ExtendOpTraitsBase::GetExtendExprTy ExtendOpTraits<
SCEVSignExtendExpr>::GetExtendExpr = &ScalarEvolution::getSignExtendExpr;
template <>
struct ExtendOpTraits : public ExtendOpTraitsBase {
static const SCEV::NoWrapFlags WrapType = SCEV::FlagNUW;
static const GetExtendExprTy GetExtendExpr;
static const SCEV *getOverflowLimitForStep(const SCEV *Step,
ICmpInst::Predicate *Pred,
ScalarEvolution *SE) {
return getUnsignedOverflowLimitForStep(Step, Pred, SE);
}
};
const ExtendOpTraitsBase::GetExtendExprTy ExtendOpTraits<
SCEVZeroExtendExpr>::GetExtendExpr = &ScalarEvolution::getZeroExtendExpr;
}
// The recurrence AR has been shown to have no signed/unsigned wrap or something
// close to it. Typically, if we can prove NSW/NUW for AR, then we can just as
// easily prove NSW/NUW for its preincrement or postincrement sibling. This
// allows normalizing a sign/zero extended AddRec as such: {sext/zext(Step +
// Start),+,Step} => {(Step + sext/zext(Start),+,Step} As a result, the
// expression "Step + sext/zext(PreIncAR)" is congruent with
// "sext/zext(PostIncAR)"
template
static const SCEV *getPreStartForExtend(const SCEVAddRecExpr *AR, Type *Ty,
ScalarEvolution *SE) {
auto WrapType = ExtendOpTraits::WrapType;
auto GetExtendExpr = ExtendOpTraits::GetExtendExpr;
const Loop *L = AR->getLoop();
const SCEV *Start = AR->getStart();
const SCEV *Step = AR->getStepRecurrence(*SE);
// Check for a simple looking step prior to loop entry.
const SCEVAddExpr *SA = dyn_cast(Start);
if (!SA)
return nullptr;
// Create an AddExpr for "PreStart" after subtracting Step. Full SCEV
// subtraction is expensive. For this purpose, perform a quick and dirty
// difference, by checking for Step in the operand list.
SmallVector DiffOps;
for (const SCEV *Op : SA->operands())
if (Op != Step)
DiffOps.push_back(Op);
if (DiffOps.size() == SA->getNumOperands())
return nullptr;
// Try to prove `WrapType` (SCEV::FlagNSW or SCEV::FlagNUW) on `PreStart` +
// `Step`:
// 1. NSW/NUW flags on the step increment.
auto PreStartFlags =
ScalarEvolution::maskFlags(SA->getNoWrapFlags(), SCEV::FlagNUW);
const SCEV *PreStart = SE->getAddExpr(DiffOps, PreStartFlags);
const SCEVAddRecExpr *PreAR = dyn_cast(
SE->getAddRecExpr(PreStart, Step, L, SCEV::FlagAnyWrap));
// "{S,+,X} is /" and "the backedge is taken at least once" implies
// "S+X does not sign/unsign-overflow".
//
const SCEV *BECount = SE->getBackedgeTakenCount(L);
if (PreAR && PreAR->getNoWrapFlags(WrapType) &&
!isa(BECount) && SE->isKnownPositive(BECount))
return PreStart;
// 2. Direct overflow check on the step operation's expression.
unsigned BitWidth = SE->getTypeSizeInBits(AR->getType());
Type *WideTy = IntegerType::get(SE->getContext(), BitWidth * 2);
const SCEV *OperandExtendedStart =
SE->getAddExpr((SE->*GetExtendExpr)(PreStart, WideTy),
(SE->*GetExtendExpr)(Step, WideTy));
if ((SE->*GetExtendExpr)(Start, WideTy) == OperandExtendedStart) {
if (PreAR && AR->getNoWrapFlags(WrapType)) {
// If we know `AR` == {`PreStart`+`Step`,+,`Step`} is `WrapType` (FlagNSW
// or FlagNUW) and that `PreStart` + `Step` is `WrapType` too, then
// `PreAR` == {`PreStart`,+,`Step`} is also `WrapType`. Cache this fact.
const_cast(PreAR)->setNoWrapFlags(WrapType);
}
return PreStart;
}
// 3. Loop precondition.
ICmpInst::Predicate Pred;
const SCEV *OverflowLimit =
ExtendOpTraits::getOverflowLimitForStep(Step, &Pred, SE);
if (OverflowLimit &&
SE->isLoopEntryGuardedByCond(L, Pred, PreStart, OverflowLimit))
return PreStart;
return nullptr;
}
// Get the normalized zero or sign extended expression for this AddRec's Start.
template
static const SCEV *getExtendAddRecStart(const SCEVAddRecExpr *AR, Type *Ty,
ScalarEvolution *SE) {
auto GetExtendExpr = ExtendOpTraits::GetExtendExpr;
const SCEV *PreStart = getPreStartForExtend(AR, Ty, SE);
if (!PreStart)
return (SE->*GetExtendExpr)(AR->getStart(), Ty);
return SE->getAddExpr((SE->*GetExtendExpr)(AR->getStepRecurrence(*SE), Ty),
(SE->*GetExtendExpr)(PreStart, Ty));
}
// Try to prove away overflow by looking at "nearby" add recurrences. A
// motivating example for this rule: if we know `{0,+,4}` is `ult` `-1` and it
// does not itself wrap then we can conclude that `{1,+,4}` is `nuw`.
//
// Formally:
//
// {S,+,X} == {S-T,+,X} + T
// => Ext({S,+,X}) == Ext({S-T,+,X} + T)
//
// If ({S-T,+,X} + T) does not overflow ... (1)
//
// RHS == Ext({S-T,+,X} + T) == Ext({S-T,+,X}) + Ext(T)
//
// If {S-T,+,X} does not overflow ... (2)
//
// RHS == Ext({S-T,+,X}) + Ext(T) == {Ext(S-T),+,Ext(X)} + Ext(T)
// == {Ext(S-T)+Ext(T),+,Ext(X)}
//
// If (S-T)+T does not overflow ... (3)
//
// RHS == {Ext(S-T)+Ext(T),+,Ext(X)} == {Ext(S-T+T),+,Ext(X)}
// == {Ext(S),+,Ext(X)} == LHS
//
// Thus, if (1), (2) and (3) are true for some T, then
// Ext({S,+,X}) == {Ext(S),+,Ext(X)}
//
// (3) is implied by (1) -- "(S-T)+T does not overflow" is simply "({S-T,+,X}+T)
// does not overflow" restricted to the 0th iteration. Therefore we only need
// to check for (1) and (2).
//
// In the current context, S is `Start`, X is `Step`, Ext is `ExtendOpTy` and T
// is `Delta` (defined below).
//
template
bool ScalarEvolution::proveNoWrapByVaryingStart(const SCEV *Start,
const SCEV *Step,
const Loop *L) {
auto WrapType = ExtendOpTraits::WrapType;
// We restrict `Start` to a constant to prevent SCEV from spending too much
// time here. It is correct (but more expensive) to continue with a
// non-constant `Start` and do a general SCEV subtraction to compute
// `PreStart` below.
//
const SCEVConstant *StartC = dyn_cast(Start);
if (!StartC)
return false;
APInt StartAI = StartC->getAPInt();
for (unsigned Delta : {-2, -1, 1, 2}) {
const SCEV *PreStart = getConstant(StartAI - Delta);
FoldingSetNodeID ID;
ID.AddInteger(scAddRecExpr);
ID.AddPointer(PreStart);
ID.AddPointer(Step);
ID.AddPointer(L);
void *IP = nullptr;
const auto *PreAR =
static_cast(UniqueSCEVs.FindNodeOrInsertPos(ID, IP));
// Give up if we don't already have the add recurrence we need because
// actually constructing an add recurrence is relatively expensive.
if (PreAR && PreAR->getNoWrapFlags(WrapType)) { // proves (2)
const SCEV *DeltaS = getConstant(StartC->getType(), Delta);
ICmpInst::Predicate Pred = ICmpInst::BAD_ICMP_PREDICATE;
const SCEV *Limit = ExtendOpTraits::getOverflowLimitForStep(
DeltaS, &Pred, this);
if (Limit && isKnownPredicate(Pred, PreAR, Limit)) // proves (1)
return true;
}
}
return false;
}
const SCEV *ScalarEvolution::getZeroExtendExpr(const SCEV *Op,
Type *Ty) {
assert(getTypeSizeInBits(Op->getType()) < getTypeSizeInBits(Ty) &&
"This is not an extending conversion!");
assert(isSCEVable(Ty) &&
"This is not a conversion to a SCEVable type!");
Ty = getEffectiveSCEVType(Ty);
// Fold if the operand is constant.
if (const SCEVConstant *SC = dyn_cast(Op))
return getConstant(
cast(ConstantExpr::getZExt(SC->getValue(), Ty)));
// zext(zext(x)) --> zext(x)
if (const SCEVZeroExtendExpr *SZ = dyn_cast(Op))
return getZeroExtendExpr(SZ->getOperand(), Ty);
// Before doing any expensive analysis, check to see if we've already
// computed a SCEV for this Op and Ty.
FoldingSetNodeID ID;
ID.AddInteger(scZeroExtend);
ID.AddPointer(Op);
ID.AddPointer(Ty);
void *IP = nullptr;
if (const SCEV *S = UniqueSCEVs.FindNodeOrInsertPos(ID, IP)) return S;
// zext(trunc(x)) --> zext(x) or x or trunc(x)
if (const SCEVTruncateExpr *ST = dyn_cast(Op)) {
// It's possible the bits taken off by the truncate were all zero bits. If
// so, we should be able to simplify this further.
const SCEV *X = ST->getOperand();
ConstantRange CR = getUnsignedRange(X);
unsigned TruncBits = getTypeSizeInBits(ST->getType());
unsigned NewBits = getTypeSizeInBits(Ty);
if (CR.truncate(TruncBits).zeroExtend(NewBits).contains(
CR.zextOrTrunc(NewBits)))
return getTruncateOrZeroExtend(X, Ty);
}
// If the input value is a chrec scev, and we can prove that the value
// did not overflow the old, smaller, value, we can zero extend all of the
// operands (often constants). This allows analysis of something like
// this: for (unsigned char X = 0; X < 100; ++X) { int Y = X; }
if (const SCEVAddRecExpr *AR = dyn_cast(Op))
if (AR->isAffine()) {
const SCEV *Start = AR->getStart();
const SCEV *Step = AR->getStepRecurrence(*this);
unsigned BitWidth = getTypeSizeInBits(AR->getType());
const Loop *L = AR->getLoop();
if (!AR->hasNoUnsignedWrap()) {
auto NewFlags = proveNoWrapViaConstantRanges(AR);
const_cast(AR)->setNoWrapFlags(NewFlags);
}
// If we have special knowledge that this addrec won't overflow,
// we don't need to do any further analysis.
if (AR->hasNoUnsignedWrap())
return getAddRecExpr(
getExtendAddRecStart(AR, Ty, this),
getZeroExtendExpr(Step, Ty), L, AR->getNoWrapFlags());
// Check whether the backedge-taken count is SCEVCouldNotCompute.
// Note that this serves two purposes: It filters out loops that are
// simply not analyzable, and it covers the case where this code is
// being called from within backedge-taken count analysis, such that
// attempting to ask for the backedge-taken count would likely result
// in infinite recursion. In the later case, the analysis code will
// cope with a conservative value, and it will take care to purge
// that value once it has finished.
const SCEV *MaxBECount = getMaxBackedgeTakenCount(L);
if (!isa(MaxBECount)) {
// Manually compute the final value for AR, checking for
// overflow.
// Check whether the backedge-taken count can be losslessly casted to
// the addrec's type. The count is always unsigned.
const SCEV *CastedMaxBECount =
getTruncateOrZeroExtend(MaxBECount, Start->getType());
const SCEV *RecastedMaxBECount =
getTruncateOrZeroExtend(CastedMaxBECount, MaxBECount->getType());
if (MaxBECount == RecastedMaxBECount) {
Type *WideTy = IntegerType::get(getContext(), BitWidth * 2);
// Check whether Start+Step*MaxBECount has no unsigned overflow.
const SCEV *ZMul = getMulExpr(CastedMaxBECount, Step);
const SCEV *ZAdd = getZeroExtendExpr(getAddExpr(Start, ZMul), WideTy);
const SCEV *WideStart = getZeroExtendExpr(Start, WideTy);
const SCEV *WideMaxBECount =
getZeroExtendExpr(CastedMaxBECount, WideTy);
const SCEV *OperandExtendedAdd =
getAddExpr(WideStart,
getMulExpr(WideMaxBECount,
getZeroExtendExpr(Step, WideTy)));
if (ZAdd == OperandExtendedAdd) {
// Cache knowledge of AR NUW, which is propagated to this AddRec.
const_cast(AR)->setNoWrapFlags(SCEV::FlagNUW);
// Return the expression with the addrec on the outside.
return getAddRecExpr(
getExtendAddRecStart(AR, Ty, this),
getZeroExtendExpr(Step, Ty), L, AR->getNoWrapFlags());
}
// Similar to above, only this time treat the step value as signed.
// This covers loops that count down.
OperandExtendedAdd =
getAddExpr(WideStart,
getMulExpr(WideMaxBECount,
getSignExtendExpr(Step, WideTy)));
if (ZAdd == OperandExtendedAdd) {
// Cache knowledge of AR NW, which is propagated to this AddRec.
// Negative step causes unsigned wrap, but it still can't self-wrap.
const_cast(AR)->setNoWrapFlags(SCEV::FlagNW);
// Return the expression with the addrec on the outside.
return getAddRecExpr(
getExtendAddRecStart(AR, Ty, this),
getSignExtendExpr(Step, Ty), L, AR->getNoWrapFlags());
}
}
}
// Normally, in the cases we can prove no-overflow via a
// backedge guarding condition, we can also compute a backedge
// taken count for the loop. The exceptions are assumptions and
// guards present in the loop -- SCEV is not great at exploiting
// these to compute max backedge taken counts, but can still use
// these to prove lack of overflow. Use this fact to avoid
// doing extra work that may not pay off.
if (!isa(MaxBECount) || HasGuards ||
!AC.assumptions().empty()) {
// If the backedge is guarded by a comparison with the pre-inc
// value the addrec is safe. Also, if the entry is guarded by
// a comparison with the start value and the backedge is
// guarded by a comparison with the post-inc value, the addrec
// is safe.
if (isKnownPositive(Step)) {
const SCEV *N = getConstant(APInt::getMinValue(BitWidth) -
getUnsignedRange(Step).getUnsignedMax());
if (isLoopBackedgeGuardedByCond(L, ICmpInst::ICMP_ULT, AR, N) ||
(isLoopEntryGuardedByCond(L, ICmpInst::ICMP_ULT, Start, N) &&
isLoopBackedgeGuardedByCond(L, ICmpInst::ICMP_ULT,
AR->getPostIncExpr(*this), N))) {
// Cache knowledge of AR NUW, which is propagated to this
// AddRec.
const_cast(AR)->setNoWrapFlags(SCEV::FlagNUW);
// Return the expression with the addrec on the outside.
return getAddRecExpr(
getExtendAddRecStart(AR, Ty, this),
getZeroExtendExpr(Step, Ty), L, AR->getNoWrapFlags());
}
} else if (isKnownNegative(Step)) {
const SCEV *N = getConstant(APInt::getMaxValue(BitWidth) -
getSignedRange(Step).getSignedMin());
if (isLoopBackedgeGuardedByCond(L, ICmpInst::ICMP_UGT, AR, N) ||
(isLoopEntryGuardedByCond(L, ICmpInst::ICMP_UGT, Start, N) &&
isLoopBackedgeGuardedByCond(L, ICmpInst::ICMP_UGT,
AR->getPostIncExpr(*this), N))) {
// Cache knowledge of AR NW, which is propagated to this
// AddRec. Negative step causes unsigned wrap, but it
// still can't self-wrap.
const_cast(AR)->setNoWrapFlags(SCEV::FlagNW);
// Return the expression with the addrec on the outside.
return getAddRecExpr(
getExtendAddRecStart(AR, Ty, this),
getSignExtendExpr(Step, Ty), L, AR->getNoWrapFlags());
}
}
}
if (proveNoWrapByVaryingStart(Start, Step, L)) {
const_cast(AR)->setNoWrapFlags(SCEV::FlagNUW);
return getAddRecExpr(
getExtendAddRecStart(AR, Ty, this),
getZeroExtendExpr(Step, Ty), L, AR->getNoWrapFlags());
}
}
if (auto *SA = dyn_cast(Op)) {
// zext((A + B + ...)) --> (zext(A) + zext(B) + ...)
if (SA->hasNoUnsignedWrap()) {
// If the addition does not unsign overflow then we can, by definition,
// commute the zero extension with the addition operation.
SmallVector Ops;
for (const auto *Op : SA->operands())
Ops.push_back(getZeroExtendExpr(Op, Ty));
return getAddExpr(Ops, SCEV::FlagNUW);
}
}
// The cast wasn't folded; create an explicit cast node.
// Recompute the insert position, as it may have been invalidated.
if (const SCEV *S = UniqueSCEVs.FindNodeOrInsertPos(ID, IP)) return S;
SCEV *S = new (SCEVAllocator) SCEVZeroExtendExpr(ID.Intern(SCEVAllocator),
Op, Ty);
UniqueSCEVs.InsertNode(S, IP);
return S;
}
const SCEV *ScalarEvolution::getSignExtendExpr(const SCEV *Op,
Type *Ty) {
assert(getTypeSizeInBits(Op->getType()) < getTypeSizeInBits(Ty) &&
"This is not an extending conversion!");
assert(isSCEVable(Ty) &&
"This is not a conversion to a SCEVable type!");
Ty = getEffectiveSCEVType(Ty);
// Fold if the operand is constant.
if (const SCEVConstant *SC = dyn_cast(Op))
return getConstant(
cast(ConstantExpr::getSExt(SC->getValue(), Ty)));
// sext(sext(x)) --> sext(x)
if (const SCEVSignExtendExpr *SS = dyn_cast(Op))
return getSignExtendExpr(SS->getOperand(), Ty);
// sext(zext(x)) --> zext(x)
if (const SCEVZeroExtendExpr *SZ = dyn_cast(Op))
return getZeroExtendExpr(SZ->getOperand(), Ty);
// Before doing any expensive analysis, check to see if we've already
// computed a SCEV for this Op and Ty.
FoldingSetNodeID ID;
ID.AddInteger(scSignExtend);
ID.AddPointer(Op);
ID.AddPointer(Ty);
void *IP = nullptr;
if (const SCEV *S = UniqueSCEVs.FindNodeOrInsertPos(ID, IP)) return S;
// sext(trunc(x)) --> sext(x) or x or trunc(x)
if (const SCEVTruncateExpr *ST = dyn_cast(Op)) {
// It's possible the bits taken off by the truncate were all sign bits. If
// so, we should be able to simplify this further.
const SCEV *X = ST->getOperand();
ConstantRange CR = getSignedRange(X);
unsigned TruncBits = getTypeSizeInBits(ST->getType());
unsigned NewBits = getTypeSizeInBits(Ty);
if (CR.truncate(TruncBits).signExtend(NewBits).contains(
CR.sextOrTrunc(NewBits)))
return getTruncateOrSignExtend(X, Ty);
}
// sext(C1 + (C2 * x)) --> C1 + sext(C2 * x) if C1 < C2
if (auto *SA = dyn_cast(Op)) {
if (SA->getNumOperands() == 2) {
auto *SC1 = dyn_cast(SA->getOperand(0));
auto *SMul = dyn_cast(SA->getOperand(1));
if (SMul && SC1) {
if (auto *SC2 = dyn_cast(SMul->getOperand(0))) {
const APInt &C1 = SC1->getAPInt();
const APInt &C2 = SC2->getAPInt();
if (C1.isStrictlyPositive() && C2.isStrictlyPositive() &&
C2.ugt(C1) && C2.isPowerOf2())
return getAddExpr(getSignExtendExpr(SC1, Ty),
getSignExtendExpr(SMul, Ty));
}
}
}
// sext((A + B + ...)) --> (sext(A) + sext(B) + ...)
if (SA->hasNoSignedWrap()) {
// If the addition does not sign overflow then we can, by definition,
// commute the sign extension with the addition operation.
SmallVector Ops;
for (const auto *Op : SA->operands())
Ops.push_back(getSignExtendExpr(Op, Ty));
return getAddExpr(Ops, SCEV::FlagNSW);
}
}
// If the input value is a chrec scev, and we can prove that the value
// did not overflow the old, smaller, value, we can sign extend all of the
// operands (often constants). This allows analysis of something like
// this: for (signed char X = 0; X < 100; ++X) { int Y = X; }
if (const SCEVAddRecExpr *AR = dyn_cast(Op))
if (AR->isAffine()) {
const SCEV *Start = AR->getStart();
const SCEV *Step = AR->getStepRecurrence(*this);
unsigned BitWidth = getTypeSizeInBits(AR->getType());
const Loop *L = AR->getLoop();
if (!AR->hasNoSignedWrap()) {
auto NewFlags = proveNoWrapViaConstantRanges(AR);
const_cast(AR)->setNoWrapFlags(NewFlags);
}
// If we have special knowledge that this addrec won't overflow,
// we don't need to do any further analysis.
if (AR->hasNoSignedWrap())
return getAddRecExpr(
getExtendAddRecStart(AR, Ty, this),
getSignExtendExpr(Step, Ty), L, SCEV::FlagNSW);
// Check whether the backedge-taken count is SCEVCouldNotCompute.
// Note that this serves two purposes: It filters out loops that are
// simply not analyzable, and it covers the case where this code is
// being called from within backedge-taken count analysis, such that
// attempting to ask for the backedge-taken count would likely result
// in infinite recursion. In the later case, the analysis code will
// cope with a conservative value, and it will take care to purge
// that value once it has finished.
const SCEV *MaxBECount = getMaxBackedgeTakenCount(L);
if (!isa(MaxBECount)) {
// Manually compute the final value for AR, checking for
// overflow.
// Check whether the backedge-taken count can be losslessly casted to
// the addrec's type. The count is always unsigned.
const SCEV *CastedMaxBECount =
getTruncateOrZeroExtend(MaxBECount, Start->getType());
const SCEV *RecastedMaxBECount =
getTruncateOrZeroExtend(CastedMaxBECount, MaxBECount->getType());
if (MaxBECount == RecastedMaxBECount) {
Type *WideTy = IntegerType::get(getContext(), BitWidth * 2);
// Check whether Start+Step*MaxBECount has no signed overflow.
const SCEV *SMul = getMulExpr(CastedMaxBECount, Step);
const SCEV *SAdd = getSignExtendExpr(getAddExpr(Start, SMul), WideTy);
const SCEV *WideStart = getSignExtendExpr(Start, WideTy);
const SCEV *WideMaxBECount =
getZeroExtendExpr(CastedMaxBECount, WideTy);
const SCEV *OperandExtendedAdd =
getAddExpr(WideStart,
getMulExpr(WideMaxBECount,
getSignExtendExpr(Step, WideTy)));
if (SAdd == OperandExtendedAdd) {
// Cache knowledge of AR NSW, which is propagated to this AddRec.
const_cast(AR)->setNoWrapFlags(SCEV::FlagNSW);
// Return the expression with the addrec on the outside.
return getAddRecExpr(
getExtendAddRecStart(AR, Ty, this),
getSignExtendExpr(Step, Ty), L, AR->getNoWrapFlags());
}
// Similar to above, only this time treat the step value as unsigned.
// This covers loops that count up with an unsigned step.
OperandExtendedAdd =
getAddExpr(WideStart,
getMulExpr(WideMaxBECount,
getZeroExtendExpr(Step, WideTy)));
if (SAdd == OperandExtendedAdd) {
// If AR wraps around then
//
// abs(Step) * MaxBECount > unsigned-max(AR->getType())
// => SAdd != OperandExtendedAdd
//
// Thus (AR is not NW => SAdd != OperandExtendedAdd) <=>
// (SAdd == OperandExtendedAdd => AR is NW)
const_cast(AR)->setNoWrapFlags(SCEV::FlagNW);
// Return the expression with the addrec on the outside.
return getAddRecExpr(
getExtendAddRecStart(AR, Ty, this),
getZeroExtendExpr(Step, Ty), L, AR->getNoWrapFlags());
}
}
}
// Normally, in the cases we can prove no-overflow via a
// backedge guarding condition, we can also compute a backedge
// taken count for the loop. The exceptions are assumptions and
// guards present in the loop -- SCEV is not great at exploiting
// these to compute max backedge taken counts, but can still use
// these to prove lack of overflow. Use this fact to avoid
// doing extra work that may not pay off.
if (!isa(MaxBECount) || HasGuards ||
!AC.assumptions().empty()) {
// If the backedge is guarded by a comparison with the pre-inc
// value the addrec is safe. Also, if the entry is guarded by
// a comparison with the start value and the backedge is
// guarded by a comparison with the post-inc value, the addrec
// is safe.
ICmpInst::Predicate Pred;
const SCEV *OverflowLimit =
getSignedOverflowLimitForStep(Step, &Pred, this);
if (OverflowLimit &&
(isLoopBackedgeGuardedByCond(L, Pred, AR, OverflowLimit) ||
(isLoopEntryGuardedByCond(L, Pred, Start, OverflowLimit) &&
isLoopBackedgeGuardedByCond(L, Pred, AR->getPostIncExpr(*this),
OverflowLimit)))) {
// Cache knowledge of AR NSW, then propagate NSW to the wide AddRec.
const_cast(AR)->setNoWrapFlags(SCEV::FlagNSW);
return getAddRecExpr(
getExtendAddRecStart(AR, Ty, this),
getSignExtendExpr(Step, Ty), L, AR->getNoWrapFlags());
}
}
// If Start and Step are constants, check if we can apply this
// transformation:
// sext{C1,+,C2} --> C1 + sext{0,+,C2} if C1 < C2
auto *SC1 = dyn_cast(Start);
auto *SC2 = dyn_cast(Step);
if (SC1 && SC2) {
const APInt &C1 = SC1->getAPInt();
const APInt &C2 = SC2->getAPInt();
if (C1.isStrictlyPositive() && C2.isStrictlyPositive() && C2.ugt(C1) &&
C2.isPowerOf2()) {
Start = getSignExtendExpr(Start, Ty);
const SCEV *NewAR = getAddRecExpr(getZero(AR->getType()), Step, L,
AR->getNoWrapFlags());
return getAddExpr(Start, getSignExtendExpr(NewAR, Ty));
}
}
if (proveNoWrapByVaryingStart(Start, Step, L)) {
const_cast(AR)->setNoWrapFlags(SCEV::FlagNSW);
return getAddRecExpr(
getExtendAddRecStart(AR, Ty, this),
getSignExtendExpr(Step, Ty), L, AR->getNoWrapFlags());
}
}
// If the input value is provably positive and we could not simplify
// away the sext build a zext instead.
if (isKnownNonNegative(Op))
return getZeroExtendExpr(Op, Ty);
// The cast wasn't folded; create an explicit cast node.
// Recompute the insert position, as it may have been invalidated.
if (const SCEV *S = UniqueSCEVs.FindNodeOrInsertPos(ID, IP)) return S;
SCEV *S = new (SCEVAllocator) SCEVSignExtendExpr(ID.Intern(SCEVAllocator),
Op, Ty);
UniqueSCEVs.InsertNode(S, IP);
return S;
}
/// getAnyExtendExpr - Return a SCEV for the given operand extended with
/// unspecified bits out to the given type.
///
const SCEV *ScalarEvolution::getAnyExtendExpr(const SCEV *Op,
Type *Ty) {
assert(getTypeSizeInBits(Op->getType()) < getTypeSizeInBits(Ty) &&
"This is not an extending conversion!");
assert(isSCEVable(Ty) &&
"This is not a conversion to a SCEVable type!");
Ty = getEffectiveSCEVType(Ty);
// Sign-extend negative constants.
if (const SCEVConstant *SC = dyn_cast(Op))
if (SC->getAPInt().isNegative())
return getSignExtendExpr(Op, Ty);
// Peel off a truncate cast.
if (const SCEVTruncateExpr *T = dyn_cast(Op)) {
const SCEV *NewOp = T->getOperand();
if (getTypeSizeInBits(NewOp->getType()) < getTypeSizeInBits(Ty))
return getAnyExtendExpr(NewOp, Ty);
return getTruncateOrNoop(NewOp, Ty);
}
// Next try a zext cast. If the cast is folded, use it.
const SCEV *ZExt = getZeroExtendExpr(Op, Ty);
if (!isa(ZExt))
return ZExt;
// Next try a sext cast. If the cast is folded, use it.
const SCEV *SExt = getSignExtendExpr(Op, Ty);
if (!isa(SExt))
return SExt;
// Force the cast to be folded into the operands of an addrec.
if (const SCEVAddRecExpr *AR = dyn_cast(Op)) {
SmallVector Ops;
for (const SCEV *Op : AR->operands())
Ops.push_back(getAnyExtendExpr(Op, Ty));
return getAddRecExpr(Ops, AR->getLoop(), SCEV::FlagNW);
}
// If the expression is obviously signed, use the sext cast value.
if (isa(Op))
return SExt;
// Absent any other information, use the zext cast value.
return ZExt;
}
/// Process the given Ops list, which is a list of operands to be added under
/// the given scale, update the given map. This is a helper function for
/// getAddRecExpr. As an example of what it does, given a sequence of operands
/// that would form an add expression like this:
///
/// m + n + 13 + (A * (o + p + (B * (q + m + 29)))) + r + (-1 * r)
///
/// where A and B are constants, update the map with these values:
///
/// (m, 1+A*B), (n, 1), (o, A), (p, A), (q, A*B), (r, 0)
///
/// and add 13 + A*B*29 to AccumulatedConstant.
/// This will allow getAddRecExpr to produce this:
///
/// 13+A*B*29 + n + (m * (1+A*B)) + ((o + p) * A) + (q * A*B)
///
/// This form often exposes folding opportunities that are hidden in
/// the original operand list.
///
/// Return true iff it appears that any interesting folding opportunities
/// may be exposed. This helps getAddRecExpr short-circuit extra work in
/// the common case where no interesting opportunities are present, and
/// is also used as a check to avoid infinite recursion.
///
static bool
CollectAddOperandsWithScales(DenseMap &M,
SmallVectorImpl &NewOps,
APInt &AccumulatedConstant,
const SCEV *const *Ops, size_t NumOperands,
const APInt &Scale,
ScalarEvolution &SE) {
bool Interesting = false;
// Iterate over the add operands. They are sorted, with constants first.
unsigned i = 0;
while (const SCEVConstant *C = dyn_cast(Ops[i])) {
++i;
// Pull a buried constant out to the outside.
if (Scale != 1 || AccumulatedConstant != 0 || C->getValue()->isZero())
Interesting = true;
AccumulatedConstant += Scale * C->getAPInt();
}
// Next comes everything else. We're especially interested in multiplies
// here, but they're in the middle, so just visit the rest with one loop.
for (; i != NumOperands; ++i) {
const SCEVMulExpr *Mul = dyn_cast(Ops[i]);
if (Mul && isa(Mul->getOperand(0))) {
APInt NewScale =
Scale * cast(Mul->getOperand(0))->getAPInt();
if (Mul->getNumOperands() == 2 && isa(Mul->getOperand(1))) {
// A multiplication of a constant with another add; recurse.
const SCEVAddExpr *Add = cast(Mul->getOperand(1));
Interesting |=
CollectAddOperandsWithScales(M, NewOps, AccumulatedConstant,
Add->op_begin(), Add->getNumOperands(),
NewScale, SE);
} else {
// A multiplication of a constant with some other value. Update
// the map.
SmallVector MulOps(Mul->op_begin()+1, Mul->op_end());
const SCEV *Key = SE.getMulExpr(MulOps);
auto Pair = M.insert({Key, NewScale});
if (Pair.second) {
NewOps.push_back(Pair.first->first);
} else {
Pair.first->second += NewScale;
// The map already had an entry for this value, which may indicate
// a folding opportunity.
Interesting = true;
}
}
} else {
// An ordinary operand. Update the map.
std::pair::iterator, bool> Pair =
M.insert({Ops[i], Scale});
if (Pair.second) {
NewOps.push_back(Pair.first->first);
} else {
Pair.first->second += Scale;
// The map already had an entry for this value, which may indicate
// a folding opportunity.
Interesting = true;
}
}
}
return Interesting;
}
// We're trying to construct a SCEV of type `Type' with `Ops' as operands and
// `OldFlags' as can't-wrap behavior. Infer a more aggressive set of
// can't-overflow flags for the operation if possible.
static SCEV::NoWrapFlags
StrengthenNoWrapFlags(ScalarEvolution *SE, SCEVTypes Type,
const SmallVectorImpl &Ops,
SCEV::NoWrapFlags Flags) {
using namespace std::placeholders;
typedef OverflowingBinaryOperator OBO;
bool CanAnalyze =
Type == scAddExpr || Type == scAddRecExpr || Type == scMulExpr;
(void)CanAnalyze;
assert(CanAnalyze && "don't call from other places!");
int SignOrUnsignMask = SCEV::FlagNUW | SCEV::FlagNSW;
SCEV::NoWrapFlags SignOrUnsignWrap =
ScalarEvolution::maskFlags(Flags, SignOrUnsignMask);
// If FlagNSW is true and all the operands are non-negative, infer FlagNUW.
auto IsKnownNonNegative = [&](const SCEV *S) {
return SE->isKnownNonNegative(S);
};
if (SignOrUnsignWrap == SCEV::FlagNSW && all_of(Ops, IsKnownNonNegative))
Flags =
ScalarEvolution::setFlags(Flags, (SCEV::NoWrapFlags)SignOrUnsignMask);
SignOrUnsignWrap = ScalarEvolution::maskFlags(Flags, SignOrUnsignMask);
if (SignOrUnsignWrap != SignOrUnsignMask && Type == scAddExpr &&
Ops.size() == 2 && isa(Ops[0])) {
// (A + C) --> (A + C) if the addition does not sign overflow
// (A + C) --> (A + C) if the addition does not unsign overflow
const APInt &C = cast(Ops[0])->getAPInt();
if (!(SignOrUnsignWrap & SCEV::FlagNSW)) {
auto NSWRegion = ConstantRange::makeGuaranteedNoWrapRegion(
Instruction::Add, C, OBO::NoSignedWrap);
if (NSWRegion.contains(SE->getSignedRange(Ops[1])))
Flags = ScalarEvolution::setFlags(Flags, SCEV::FlagNSW);
}
if (!(SignOrUnsignWrap & SCEV::FlagNUW)) {
auto NUWRegion = ConstantRange::makeGuaranteedNoWrapRegion(
Instruction::Add, C, OBO::NoUnsignedWrap);
if (NUWRegion.contains(SE->getUnsignedRange(Ops[1])))
Flags = ScalarEvolution::setFlags(Flags, SCEV::FlagNUW);
}
}
return Flags;
}
/// Get a canonical add expression, or something simpler if possible.
const SCEV *ScalarEvolution::getAddExpr(SmallVectorImpl &Ops,
SCEV::NoWrapFlags Flags) {
assert(!(Flags & ~(SCEV::FlagNUW | SCEV::FlagNSW)) &&
"only nuw or nsw allowed");
assert(!Ops.empty() && "Cannot get empty add!");
if (Ops.size() == 1) return Ops[0];
#ifndef NDEBUG
Type *ETy = getEffectiveSCEVType(Ops[0]->getType());
for (unsigned i = 1, e = Ops.size(); i != e; ++i)
assert(getEffectiveSCEVType(Ops[i]->getType()) == ETy &&
"SCEVAddExpr operand types don't match!");
#endif
// Sort by complexity, this groups all similar expression types together.
GroupByComplexity(Ops, &LI);
Flags = StrengthenNoWrapFlags(this, scAddExpr, Ops, Flags);
// If there are any constants, fold them together.
unsigned Idx = 0;
if (const SCEVConstant *LHSC = dyn_cast(Ops[0])) {
++Idx;
assert(Idx < Ops.size());
while (const SCEVConstant *RHSC = dyn_cast(Ops[Idx])) {
// We found two constants, fold them together!
Ops[0] = getConstant(LHSC->getAPInt() + RHSC->getAPInt());
if (Ops.size() == 2) return Ops[0];
Ops.erase(Ops.begin()+1); // Erase the folded element
LHSC = cast(Ops[0]);
}
// If we are left with a constant zero being added, strip it off.
if (LHSC->getValue()->isZero()) {
Ops.erase(Ops.begin());
--Idx;
}
if (Ops.size() == 1) return Ops[0];
}
// Okay, check to see if the same value occurs in the operand list more than
// once. If so, merge them together into an multiply expression. Since we
// sorted the list, these values are required to be adjacent.
Type *Ty = Ops[0]->getType();
bool FoundMatch = false;
for (unsigned i = 0, e = Ops.size(); i != e-1; ++i)
if (Ops[i] == Ops[i+1]) { // X + Y + Y --> X + Y*2
// Scan ahead to count how many equal operands there are.
unsigned Count = 2;
while (i+Count != e && Ops[i+Count] == Ops[i])
++Count;
// Merge the values into a multiply.
const SCEV *Scale = getConstant(Ty, Count);
const SCEV *Mul = getMulExpr(Scale, Ops[i]);
if (Ops.size() == Count)
return Mul;
Ops[i] = Mul;
Ops.erase(Ops.begin()+i+1, Ops.begin()+i+Count);
--i; e -= Count - 1;
FoundMatch = true;
}
if (FoundMatch)
return getAddExpr(Ops, Flags);
// Check for truncates. If all the operands are truncated from the same
// type, see if factoring out the truncate would permit the result to be
// folded. eg., trunc(x) + m*trunc(n) --> trunc(x + trunc(m)*n)
// if the contents of the resulting outer trunc fold to something simple.
for (; Idx < Ops.size() && isa(Ops[Idx]); ++Idx) {
const SCEVTruncateExpr *Trunc = cast(Ops[Idx]);
Type *DstType = Trunc->getType();
Type *SrcType = Trunc->getOperand()->getType();
SmallVector LargeOps;
bool Ok = true;
// Check all the operands to see if they can be represented in the
// source type of the truncate.
for (unsigned i = 0, e = Ops.size(); i != e; ++i) {
if (const SCEVTruncateExpr *T = dyn_cast(Ops[i])) {
if (T->getOperand()->getType() != SrcType) {
Ok = false;
break;
}
LargeOps.push_back(T->getOperand());
} else if (const SCEVConstant *C = dyn_cast(Ops[i])) {
LargeOps.push_back(getAnyExtendExpr(C, SrcType));
} else if (const SCEVMulExpr *M = dyn_cast(Ops[i])) {
SmallVector LargeMulOps;
for (unsigned j = 0, f = M->getNumOperands(); j != f && Ok; ++j) {
if (const SCEVTruncateExpr *T =
dyn_cast(M->getOperand(j))) {
if (T->getOperand()->getType() != SrcType) {
Ok = false;
break;
}
LargeMulOps.push_back(T->getOperand());
} else if (const auto *C = dyn_cast(M->getOperand(j))) {
LargeMulOps.push_back(getAnyExtendExpr(C, SrcType));
} else {
Ok = false;
break;
}
}
if (Ok)
LargeOps.push_back(getMulExpr(LargeMulOps));
} else {
Ok = false;
break;
}
}
if (Ok) {
// Evaluate the expression in the larger type.
const SCEV *Fold = getAddExpr(LargeOps, Flags);
// If it folds to something simple, use it. Otherwise, don't.
if (isa(Fold) || isa(Fold))
return getTruncateExpr(Fold, DstType);
}
}
// Skip past any other cast SCEVs.
while (Idx < Ops.size() && Ops[Idx]->getSCEVType() < scAddExpr)
++Idx;
// If there are add operands they would be next.
if (Idx < Ops.size()) {
bool DeletedAdd = false;
while (const SCEVAddExpr *Add = dyn_cast(Ops[Idx])) {
// If we have an add, expand the add operands onto the end of the operands
// list.
Ops.erase(Ops.begin()+Idx);
Ops.append(Add->op_begin(), Add->op_end());
DeletedAdd = true;
}
// If we deleted at least one add, we added operands to the end of the list,
// and they are not necessarily sorted. Recurse to resort and resimplify
// any operands we just acquired.
if (DeletedAdd)
return getAddExpr(Ops);
}
// Skip over the add expression until we get to a multiply.
while (Idx < Ops.size() && Ops[Idx]->getSCEVType() < scMulExpr)
++Idx;
// Check to see if there are any folding opportunities present with
// operands multiplied by constant values.
if (Idx < Ops.size() && isa(Ops[Idx])) {
uint64_t BitWidth = getTypeSizeInBits(Ty);
DenseMap M;
SmallVector NewOps;
APInt AccumulatedConstant(BitWidth, 0);
if (CollectAddOperandsWithScales(M, NewOps, AccumulatedConstant,
Ops.data(), Ops.size(),
APInt(BitWidth, 1), *this)) {
struct APIntCompare {
bool operator()(const APInt &LHS, const APInt &RHS) const {
return LHS.ult(RHS);
}
};
// Some interesting folding opportunity is present, so its worthwhile to
// re-generate the operands list. Group the operands by constant scale,
// to avoid multiplying by the same constant scale multiple times.
std::map, APIntCompare> MulOpLists;
for (const SCEV *NewOp : NewOps)
MulOpLists[M.find(NewOp)->second].push_back(NewOp);
// Re-generate the operands list.
Ops.clear();
if (AccumulatedConstant != 0)
Ops.push_back(getConstant(AccumulatedConstant));
for (auto &MulOp : MulOpLists)
if (MulOp.first != 0)
Ops.push_back(getMulExpr(getConstant(MulOp.first),
getAddExpr(MulOp.second)));
if (Ops.empty())
return getZero(Ty);
if (Ops.size() == 1)
return Ops[0];
return getAddExpr(Ops);
}
}
// If we are adding something to a multiply expression, make sure the
// something is not already an operand of the multiply. If so, merge it into
// the multiply.
for (; Idx < Ops.size() && isa(Ops[Idx]); ++Idx) {
const SCEVMulExpr *Mul = cast(Ops[Idx]);
for (unsigned MulOp = 0, e = Mul->getNumOperands(); MulOp != e; ++MulOp) {
const SCEV *MulOpSCEV = Mul->getOperand(MulOp);
if (isa(MulOpSCEV))
continue;
for (unsigned AddOp = 0, e = Ops.size(); AddOp != e; ++AddOp)
if (MulOpSCEV == Ops[AddOp]) {
// Fold W + X + (X * Y * Z) --> W + (X * ((Y*Z)+1))
const SCEV *InnerMul = Mul->getOperand(MulOp == 0);
if (Mul->getNumOperands() != 2) {
// If the multiply has more than two operands, we must get the
// Y*Z term.
SmallVector MulOps(Mul->op_begin(),
Mul->op_begin()+MulOp);
MulOps.append(Mul->op_begin()+MulOp+1, Mul->op_end());
InnerMul = getMulExpr(MulOps);
}
const SCEV *One = getOne(Ty);
const SCEV *AddOne = getAddExpr(One, InnerMul);
const SCEV *OuterMul = getMulExpr(AddOne, MulOpSCEV);
if (Ops.size() == 2) return OuterMul;
if (AddOp < Idx) {
Ops.erase(Ops.begin()+AddOp);
Ops.erase(Ops.begin()+Idx-1);
} else {
Ops.erase(Ops.begin()+Idx);
Ops.erase(Ops.begin()+AddOp-1);
}
Ops.push_back(OuterMul);
return getAddExpr(Ops);
}
// Check this multiply against other multiplies being added together.
for (unsigned OtherMulIdx = Idx+1;
OtherMulIdx < Ops.size() && isa(Ops[OtherMulIdx]);
++OtherMulIdx) {
const SCEVMulExpr *OtherMul = cast(Ops[OtherMulIdx]);
// If MulOp occurs in OtherMul, we can fold the two multiplies
// together.
for (unsigned OMulOp = 0, e = OtherMul->getNumOperands();
OMulOp != e; ++OMulOp)
if (OtherMul->getOperand(OMulOp) == MulOpSCEV) {
// Fold X + (A*B*C) + (A*D*E) --> X + (A*(B*C+D*E))
const SCEV *InnerMul1 = Mul->getOperand(MulOp == 0);
if (Mul->getNumOperands() != 2) {
SmallVector MulOps(Mul->op_begin(),
Mul->op_begin()+MulOp);
MulOps.append(Mul->op_begin()+MulOp+1, Mul->op_end());
InnerMul1 = getMulExpr(MulOps);
}
const SCEV *InnerMul2 = OtherMul->getOperand(OMulOp == 0);
if (OtherMul->getNumOperands() != 2) {
SmallVector MulOps(OtherMul->op_begin(),
OtherMul->op_begin()+OMulOp);
MulOps.append(OtherMul->op_begin()+OMulOp+1, OtherMul->op_end());
InnerMul2 = getMulExpr(MulOps);
}
const SCEV *InnerMulSum = getAddExpr(InnerMul1,InnerMul2);
const SCEV *OuterMul = getMulExpr(MulOpSCEV, InnerMulSum);
if (Ops.size() == 2) return OuterMul;
Ops.erase(Ops.begin()+Idx);
Ops.erase(Ops.begin()+OtherMulIdx-1);
Ops.push_back(OuterMul);
return getAddExpr(Ops);
}
}
}
}
// If there are any add recurrences in the operands list, see if any other
// added values are loop invariant. If so, we can fold them into the
// recurrence.
while (Idx < Ops.size() && Ops[Idx]->getSCEVType() < scAddRecExpr)
++Idx;
// Scan over all recurrences, trying to fold loop invariants into them.
for (; Idx < Ops.size() && isa(Ops[Idx]); ++Idx) {
// Scan all of the other operands to this add and add them to the vector if
// they are loop invariant w.r.t. the recurrence.
SmallVector