diff --git a/docs/FAQ.html b/docs/FAQ.html
index 31fc0c06e6a4..eb162f12cbd6 100644
--- a/docs/FAQ.html
+++ b/docs/FAQ.html
@@ -1,938 +1,938 @@
LLVM: Frequently Asked Questions
  1. License
    1. Why are the LLVM source code and the front-end distributed under different licenses?
    2. Does the University of Illinois Open Source License really qualify as an "open source" license?
    3. Can I modify LLVM source code and redistribute the modified source?
    4. Can I modify LLVM source code and redistribute binaries or other tools based on it, without redistributing the source?
  2. Source code
    1. In what language is LLVM written?
    2. How portable is the LLVM source code?
  3. Build Problems
    1. When I run configure, it finds the wrong C compiler.
    2. The configure script finds the right C compiler, but it uses the LLVM linker from a previous build. What do I do?
    3. When creating a dynamic library, I get a strange GLIBC error.
    4. I've updated my source tree from Subversion, and now my build is trying to use a file/directory that doesn't exist.
    5. I've modified a Makefile in my source tree, but my build tree keeps using the old version. What do I do?
    6. I've upgraded to a new version of LLVM, and I get strange build errors.
    7. I've built LLVM and am testing it, but the tests freeze.
    8. Why do test results differ when I perform different types of builds?
    9. Compiling LLVM with GCC 3.3.2 fails, what should I do?
    10. Compiling LLVM with GCC succeeds, but the resulting tools do not work, what can be wrong?
    11. When I use the test suite, all of the C Backend tests fail. What is wrong?
    12. After Subversion update, rebuilding gives the error "No rule to make target".
    13. The llvmc program gives me errors/doesn't work.
    14. When I compile LLVM-GCC with srcdir == objdir, it fails. Why?
  4. Source Languages
    1. What source languages are supported?
    2. I'd like to write a self-hosting LLVM compiler. How should I interface with the LLVM middle-end optimizers and back-end code generators?
    3. What support is there for higher level source language constructs for building a compiler?
    4. I don't understand the GetElementPtr instruction. Help!
  5. Using the GCC Front End
    1. When I compile software that uses a configure script, the configure script thinks my system has all of the header files and libraries it is testing for. How do I get configure to work correctly?
    2. When I compile code using the LLVM GCC front end, it complains that it cannot find libcrtend.a?
    3. How can I disable all optimizations when compiling code using the LLVM GCC front end?
    4. Can I use LLVM to convert C++ code to C code?
    5. Can I compile C or C++ code to platform-independent LLVM bitcode?
  6. Questions about code generated by the GCC front-end
    1. What is this llvm.global_ctors and _GLOBAL__I__tmp_webcompile... stuff that happens when I #include <iostream>?
    2. Where did all of my code go??
    3. What is this "undef" thing that shows up in my code?
    4. Why does instcombine + simplifycfg turn a call to a function with a mismatched calling convention into "unreachable"? Why not make the verifier reject it?

Written by The LLVM Team

License

Why are the LLVM source code and the front-end distributed under different licenses?

The C/C++ front-ends are based on GCC and must be distributed under the GPL. Our aim is to distribute LLVM source code under a much less restrictive license, in particular one that does not compel users who distribute tools based on modifying the source to redistribute the modified source code as well.

Does the University of Illinois Open Source License really qualify as an "open source" license?

Yes, the license is certified by the Open Source Initiative (OSI).

Can I modify LLVM source code and redistribute the modified source?

Yes. The modified source distribution must retain the copyright notice and follow the three bulleted conditions listed in the LLVM license.

Can I modify LLVM source code and redistribute binaries or other tools based on it, without redistributing the source?

Yes. This is why we distribute LLVM under a less restrictive license than GPL, as explained in the first question above.

Source Code

In what language is LLVM written?

All of the LLVM tools and libraries are written in C++ with extensive use of the STL.

How portable is the LLVM source code?

The LLVM source code should be portable to most modern UNIX-like operating systems. Most of the code is written in standard C++ with operating system services abstracted to a support library. The tools required to build and test LLVM have been ported to a plethora of platforms.

Some porting problems may exist in the GCC-based front end, which is not as portable as LLVM itself, and in the build system, which relies heavily on UNIX shell tools.

Build Problems

When I run configure, it finds the wrong C compiler.

The configure script attempts to locate gcc first and then cc, unless it finds compiler paths set in the CC and CXX environment variables for the C and C++ compilers, respectively.

If configure finds the wrong compiler, either adjust your PATH environment variable or set CC and CXX explicitly.
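
For example, to point configure at specific compilers (the paths below are placeholders for wherever your preferred GCC lives):

 % CC=/opt/gcc/bin/gcc CXX=/opt/gcc/bin/g++ ./configure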

The configure script finds the right C compiler, but it uses the LLVM linker from a previous build. What do I do?

The configure script uses the PATH to find executables, so if it picks up the wrong linker, assembler, or other tool, there are two ways to fix it:

  1. Adjust your PATH environment variable so that the correct program appears first in the PATH. This works, but may be inconvenient when you want the other versions first in your PATH for other work.

  2. Run configure with an alternative PATH that is correct. In a Bourne-compatible shell, the syntax would be:

     % PATH=[the path without the bad program] ./configure ...
     

    This is still somewhat inconvenient, but it allows configure to do its work without having to adjust your PATH permanently.

When creating a dynamic library, I get a strange GLIBC error.

Under some operating systems (e.g., Linux), libtool does not work correctly if GCC was compiled with the --disable-shared option. To work around this, install your own version of GCC that has shared libraries enabled by default.

I've updated my source tree from Subversion, and now my build is trying to use a file/directory that doesn't exist.

You need to re-run configure in your object directory. When new Makefiles are added to the source tree, they have to be copied over to the object tree in order to be used by the build.
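
If the tree was configured with autoconf in the usual way, config.status can re-run configure with the options you originally gave it and then regenerate the Makefiles:

 % cd $LLVM_OBJ_DIR
 % ./config.status --recheck
 % ./config.status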

I've modified a Makefile in my source tree, but my build tree keeps using the old version. What do I do?

If the Makefile already exists in your object tree, you can just run the following command in the top level directory of your object tree:

 % ./config.status <relative path to Makefile>
 

If the Makefile is new, you will have to modify the configure script to copy it over.

I've upgraded to a new version of LLVM, and I get strange build errors.

Sometimes, changes to the LLVM source code alter how the build system works. Changes in libtool, autoconf, or header file dependencies are especially prone to this sort of problem.

The best thing to try is to remove the old files and re-build. In most cases, this takes care of the problem. To do this, just type make clean and then make in the directory that fails to build.
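
For example, assuming the failure is somewhere under your top-level object directory:

 % cd $LLVM_OBJ_DIR
 % make clean
 % make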

I've built LLVM and am testing it, but the tests freeze.

This most likely occurs because you built a profile or release (optimized) build of LLVM and have not specified the same information on the gmake command line.

For example, if you built LLVM with the command:

 % gmake ENABLE_PROFILING=1
 

...then you must run the tests with the following commands:

 % cd llvm/test
 % gmake ENABLE_PROFILING=1
 

Why do test results differ when I perform different types of builds?

The LLVM test suite is dependent upon several features of the LLVM tools and libraries.

First, the debugging assertions in code are not enabled in optimized or profiling builds. Hence, tests that used to fail may pass.

Second, some tests may rely upon debugging options or behavior that is only available in the debug build. These tests will fail in an optimized or profile build.

Compiling LLVM with GCC 3.3.2 fails, what should I do?

This is a bug in GCC, and affects projects other than LLVM. Try upgrading or downgrading your GCC.

Compiling LLVM with GCC succeeds, but the resulting tools do not work, what can be wrong?

Several versions of GCC are known to miscompile the LLVM codebase. Check your compiler version (gcc --version) to find out whether it is one of the known-broken releases. If so, your only option is to upgrade GCC to a known-good version.

After Subversion update, rebuilding gives the error "No rule to make target".

If the error is of the form:

 gmake[2]: *** No rule to make target `/path/to/somefile', needed by
 `/path/to/another/file.d'.  Stop.

This may occur anytime files are moved within the Subversion repository or removed entirely. In this case, the best solution is to erase all .d files, which list dependencies for source files, and rebuild:

 % cd $LLVM_OBJ_DIR
 % rm -f `find . -name \*\.d` 
 % gmake 
 

In other cases, it may be necessary to run make clean before rebuilding.

The llvmc program gives me errors/doesn't work.

llvmc is experimental and isn't really supported. We suggest using llvm-gcc instead.

When I compile LLVM-GCC with srcdir == objdir, it fails. Why?

The GNUmakefile in the top-level directory of LLVM-GCC is a special Makefile used by Apple to invoke the build_gcc script after setting up a special environment. This has the unfortunate side-effect that trying to build LLVM-GCC with srcdir == objdir in a "non-Apple way" invokes the GNUmakefile instead of Makefile. Because the environment isn't set up correctly to do this, the build fails.

People not building LLVM-GCC the "Apple way" need to build LLVM-GCC with srcdir != objdir, or simply remove the GNUmakefile entirely.
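
For example, a separate object directory can be set up like this (directory names are illustrative):

 % mkdir llvm-gcc-obj
 % cd llvm-gcc-obj
 % ../llvm-gcc/configure ...
 % make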

We regret the inconvenience.

Source Languages

What source languages are supported?

LLVM currently has full support for C and C++ source languages. These are available through a special version of GCC that LLVM calls the C Front End.

There is an incomplete version of a Java front end available in the java module. There is no documentation on this yet, so you'll need to download the code, compile it, and try it out.

The PyPy developers are working on integrating LLVM into the PyPy backend so that PyPy can translate to LLVM.

I'd like to write a self-hosting LLVM compiler. How should I interface with the LLVM middle-end optimizers and back-end code generators?

Your compiler front-end will communicate with LLVM by creating a module in the LLVM intermediate representation (IR) format. Assuming you want to write your language's compiler in the language itself (rather than C++), there are 3 major ways to tackle generating LLVM IR from a front-end:

  1. Call into the LLVM libraries code using your language's FFI (foreign function interface).
  2. Emit a textual representation of the IR (LLVM assembly) and have LLVM parse it.
  3. Emit the binary bitcode form of the IR directly.

If you go with the first option, the C bindings in include/llvm-c should help a lot, since most languages have strong support for interfacing with C. The most common hurdle with calling C from managed code is interfacing with the garbage collector. The C interface was designed to require very little memory management, and so is straightforward in this regard.
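
As a minimal sketch of that first option (shown here in plain C for brevity; a front-end in another language would reach the same entry points through its FFI, and the module and function names below are purely illustrative), the following program builds and prints a module containing a single add function:

 #include <llvm-c/Core.h>

 /* Build a trivial module containing "i32 @sum(i32, i32)" through the
    C bindings, then print it. A real front-end would drive these calls
    from its own AST. */
 int main(void) {
   LLVMModuleRef Mod = LLVMModuleCreateWithName("demo");

   LLVMTypeRef Params[] = { LLVMInt32Type(), LLVMInt32Type() };
   LLVMTypeRef FnTy = LLVMFunctionType(LLVMInt32Type(), Params, 2, 0);
   LLVMValueRef Sum = LLVMAddFunction(Mod, "sum", FnTy);

   LLVMBasicBlockRef Entry = LLVMAppendBasicBlock(Sum, "entry");
   LLVMBuilderRef B = LLVMCreateBuilder();
   LLVMPositionBuilderAtEnd(B, Entry);
   LLVMValueRef Add = LLVMBuildAdd(B, LLVMGetParam(Sum, 0),
                                   LLVMGetParam(Sum, 1), "sum");
   LLVMBuildRet(B, Add);

   LLVMDumpModule(Mod);        /* prints the textual IR */
   LLVMDisposeBuilder(B);
   LLVMDisposeModule(Mod);
   return 0;
 }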

What support is there for higher level source language constructs for building a compiler?

Currently, there isn't much. LLVM supports an intermediate representation which is useful for code representation but does not support the high level (abstract syntax tree) representation needed by most compilers. There are no facilities for lexical or semantic analysis. There is, however, a mostly implemented configuration-driven compiler driver which simplifies the task of running optimizations, linking, and executable generation.

I don't understand the GetElementPtr instruction. Help!

See The Often Misunderstood GEP Instruction.

Using the GCC Front End

When I compile software that uses a configure script, the configure script thinks my system has all of the header files and libraries it is testing for. How do I get configure to work correctly?

The configure script is getting things wrong because the LLVM linker allows symbols to be undefined at link time (so that they can be resolved during JIT or translation to the C back end). That is why configure thinks your system "has everything."

To work around this, perform the following steps:

  1. Make sure the CC and CXX environment variables contain the full path to the LLVM GCC front end.
  2. Make sure that the regular C compiler is first in your PATH.
  3. Add the string "-Wl,-native" to your CFLAGS environment variable.

This will allow the llvm-ld linker to create a native code executable instead of a shell script that runs the JIT. Creating native code requires standard linkage, which in turn allows the configure script to find out whether code fails to link because a feature isn't actually available on your system.
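
Putting those steps together, a session might look like this (the front-end paths are placeholders for your installation):

 % export CC=/path/to/llvm-gcc
 % export CXX=/path/to/llvm-g++
 % export CFLAGS="$CFLAGS -Wl,-native"
 % ./configure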

When I compile code using the LLVM GCC front end, it complains that it cannot find libcrtend.a.

The only way this can happen is if you haven't installed the runtime library. To correct this, do:

 % cd llvm/runtime
 % make clean ; make install-bytecode
 

How can I disable all optimizations when compiling code using the LLVM GCC front end?

Passing "-Wa,-disable-opt -Wl,-disable-opt" will disable *all* cleanup and optimizations done at the llvm level, leaving you with the truly horrible code that you desire.

Can I use LLVM to convert C++ code to C code?

Yes, you can use LLVM to convert code from any language LLVM supports to C. Note that the generated C code will be very low level (all loops are lowered to gotos, etc) and not very pretty (comments are stripped, original source formatting is totally lost, variables are renamed, expressions are regrouped), so this may not be what you're looking for. Also, there are several limitations noted below.

Use commands like this:

  1. Compile your program with llvm-g++:

     % llvm-g++ -emit-llvm x.cpp -o program.bc -c

     or:

     % llvm-g++ a.cpp -c -emit-llvm
     % llvm-g++ b.cpp -c -emit-llvm
     % llvm-ld a.o b.o -o program

     This will generate program and program.bc. The .bc file is the LLVM version of the program all linked together.

  2. Convert the LLVM code to C code, using the LLC tool with the C backend:

     % llc -march=c program.bc -o program.c

  3. Finally, compile the C file:

     % cc program.c -lstdc++

Using LLVM does not eliminate the need for C++ library support. If you use the llvm-g++ front-end, the generated code will depend on g++'s C++ support libraries in the same way that code generated from g++ would. If you use another C++ front-end, the generated code will depend on whatever library that front-end would normally require.

If you are working on a platform that does not provide any C++ libraries, you may be able to manually compile libstdc++ to LLVM bitcode, statically link it into your program, then use the commands above to convert the whole result into C code. Alternatively, you might compile the libraries and your application into two different chunks of C code and link them.
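
A sketch of the first approach, assuming you have already hand-compiled libstdc++ into a bitcode file (the libstdc++.bc name below is illustrative, not something the toolchain ships):

 % llvm-g++ -emit-llvm app.cpp -c -o app.bc
 % llvm-ld app.bc libstdc++.bc -o whole
 % llc -march=c whole.bc -o whole.c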

Note that, by default, the C back end does not support exception handling. If you want/need it for a certain program, you can enable it by passing "-enable-correct-eh-support" to the llc program. The resultant code will use setjmp/longjmp to implement exception support that is relatively slow, and not C++-ABI-conforming on most platforms, but otherwise correct.
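
For example:

 % llc -march=c -enable-correct-eh-support program.bc -o program.c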

Also, there are a number of other limitations of the C backend that cause it to produce code that does not fully conform to the C++ ABI on most platforms. Some of the C++ programs in LLVM's test suite are known to fail when compiled with the C back end because of ABI incompatibilities with standard C++ libraries.

Can I compile C or C++ code to platform-independent LLVM bitcode?

No. C and C++ are inherently platform-dependent languages. The most obvious example of this is the preprocessor. A very common way that C code is made portable is by using the preprocessor to include platform-specific code. In practice, information about other platforms is lost after preprocessing, so the result is inherently dependent on the platform that the preprocessing was targeting.

Another example is sizeof. It's common for sizeof(long) to vary between platforms. In most C front-ends, sizeof is expanded to a constant immediately, thus hard-wiring a platform-specific detail.
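
For instance, for a function like the one below, the front-end folds the sizeof away before LLVM ever sees it, so the emitted IR simply returns a hard-wired constant:

 /* Compiles as if it were "return 4;" on a typical 32-bit target
    and "return 8;" on a typical 64-bit target. */
 unsigned long_size(void) { return sizeof(long); }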

Also, since many platforms define their ABIs in terms of C, and since LLVM is lower-level than C, front-ends currently must emit platform-specific IR in order to have the result conform to the platform ABI.

Questions about code generated by the GCC front-end

What is this llvm.global_ctors and _GLOBAL__I__tmp_webcompile... stuff that happens when I #include <iostream>?

If you #include the <iostream> header into a C++ translation unit, the file will probably use the std::cin/std::cout/... global objects. However, C++ does not guarantee an order of initialization between static objects in different translation units, so if a static ctor/dtor in your .cpp file used std::cout, for example, the object would not necessarily be automatically initialized before your use.

To make std::cout and friends work correctly in these scenarios, the STL that we use declares a static object that gets created in every translation unit that includes <iostream>. This object has a static constructor and destructor that initialize and destroy the global iostream objects before they could possibly be used in the file. The code that you see in the .ll file corresponds to the constructor and destructor registration code.

If you would like to make it easier to understand the LLVM code generated by the compiler in the demo page, consider using printf() instead of iostreams to print values.

Where did all of my code go??

If you are using the LLVM demo page, you may often wonder what happened to all of the code that you typed in. Remember that the demo script is running the code through the LLVM optimizers, so if your code doesn't actually do anything useful, it might all be deleted.

To prevent this, make sure that the code is actually needed. For example, if you are computing some expression, return the value from the function instead of leaving it in a local variable. If you really want to constrain the optimizer, you can read from and assign to volatile global variables.
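
A minimal sketch of the volatile trick (the sink variable is just an illustrative name):

 volatile int sink;

 int compute(void) {
   int result = 0;
   for (int i = 0; i < 100; ++i)
     result += i;
   sink = result;   /* the volatile store keeps the loop alive */
   return sink;
 }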

What is this "undef" thing that shows up in my code?

undef is the LLVM way of representing a value that is not defined. You can get these if you do not initialize a variable before you use it. For example, the C function:

 int X() { int i; return i; }
 

Is compiled to "ret i32 undef" because "i" never has a value specified for it.

Why does instcombine + simplifycfg turn a call to a function with a mismatched calling convention into "unreachable"? Why not make the verifier reject it?

This is a common problem run into by authors of front-ends that are using custom calling conventions: you need to make sure to set the right calling convention on both the function and on each call to the function. For example, this code:

 define fastcc void @foo() {
         ret void
 }
 define void @bar() {
         call void @foo( )
         ret void
 }
 

Is optimized to:

 define fastcc void @foo() {
 	ret void
 }
 define void @bar() {
 	unreachable
 }
 

... with "opt -instcombine -simplifycfg". This often bites people because "all their code disappears". Setting the calling convention on the caller and callee is required for indirect calls to work, so people often ask why not make the verifier reject this sort of thing.

The answer is that this code has undefined behavior, but it is not illegal. If we made it illegal, then every transformation that could potentially create this would have to ensure that it doesn't, and there is valid code that can create this sort of construct (in dead code). The sorts of things that can cause this to happen are fairly contrived, but we still need to accept them. Here's an example:

 define fastcc void @foo() {
         ret void
 }
 define internal void @bar(void()* %FP, i1 %cond) {
         br i1 %cond, label %T, label %F
 T:  
         call void %FP()
         ret void
 F:
         call fastcc void %FP()
         ret void
 }
 define void @test() {
         %X = or i1 false, false
         call void @bar(void()* @foo, i1 %X)
         ret void
 } 
 

In this example, "test" always passes @foo/false into bar, which ensures that it is dynamically called with the right calling conv (thus, the code is perfectly well defined). If you run this through the inliner, you get this (the explicit "or" is there so that the inliner doesn't dead code eliminate a bunch of stuff):

 define fastcc void @foo() {
 	ret void
 }
 define void @test() {
 	%X = or i1 false, false
 	br i1 %X, label %T.i, label %F.i
 T.i:
 	call void @foo()
 	br label %bar.exit
 F.i:
 	call fastcc void @foo()
 	br label %bar.exit
 bar.exit:
 	ret void
 }
 

Here you can see that the inlining pass made an undefined call to @foo with the wrong calling convention. We really don't want to make the inliner have to know about this sort of thing, so it needs to be valid code. In this case, dead code elimination can trivially remove the undefined code. However, if %X was an input argument to @test, the inliner would produce this:

 define fastcc void @foo() {
 	ret void
 }
 
 define void @test(i1 %X) {
 	br i1 %X, label %T.i, label %F.i
 T.i:
 	call void @foo()
 	br label %bar.exit
 F.i:
 	call fastcc void @foo()
 	br label %bar.exit
 bar.exit:
 	ret void
 }
 

The interesting thing about this is that %X must be false for the code to be well-defined, but no amount of dead code elimination will be able to delete the broken call as unreachable. However, since instcombine/simplifycfg turns the undefined call into unreachable, we end up with a branch on a condition that goes to unreachable: a branch to unreachable can never happen, so "-inline -instcombine -simplifycfg" is able to produce:

 define fastcc void @foo() {
 	ret void
 }
 define void @test(i1 %X) {
 F.i:
 	call fastcc void @foo()
 	ret void
 }
 

Last modified: $Date: 2010-05-04 20:16:00 +0200 (Tue, 04 May 2010) $
diff --git a/lib/CodeGen/LiveIntervalAnalysis.cpp b/lib/CodeGen/LiveIntervalAnalysis.cpp index 26a7190110f9..ca9921cd3323 100644 --- a/lib/CodeGen/LiveIntervalAnalysis.cpp +++ b/lib/CodeGen/LiveIntervalAnalysis.cpp @@ -1,2136 +1,2160 @@ //===-- LiveIntervalAnalysis.cpp - Live Interval Analysis -----------------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This file implements the LiveInterval analysis pass which is used // by the Linear Scan Register allocator. This pass linearizes the // basic blocks of the function in DFS order and uses the // LiveVariables pass to conservatively compute live intervals for // each virtual and physical register. // //===----------------------------------------------------------------------===// #define DEBUG_TYPE "liveintervals" #include "llvm/CodeGen/LiveIntervalAnalysis.h" #include "VirtRegMap.h" #include "llvm/Value.h" #include "llvm/Analysis/AliasAnalysis.h" #include "llvm/CodeGen/LiveVariables.h" #include "llvm/CodeGen/MachineFrameInfo.h" #include "llvm/CodeGen/MachineInstr.h" #include "llvm/CodeGen/MachineInstrBuilder.h" #include "llvm/CodeGen/MachineLoopInfo.h" #include "llvm/CodeGen/MachineMemOperand.h" #include "llvm/CodeGen/MachineRegisterInfo.h" #include "llvm/CodeGen/Passes.h" #include "llvm/CodeGen/ProcessImplicitDefs.h" #include "llvm/Target/TargetRegisterInfo.h" #include "llvm/Target/TargetInstrInfo.h" #include "llvm/Target/TargetMachine.h" #include "llvm/Target/TargetOptions.h" #include "llvm/Support/CommandLine.h" #include "llvm/Support/Debug.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/raw_ostream.h" #include "llvm/ADT/DepthFirstIterator.h" #include "llvm/ADT/SmallSet.h" #include "llvm/ADT/Statistic.h" #include "llvm/ADT/STLExtras.h" #include #include #include using namespace llvm; // Hidden options for help debugging. static cl::opt DisableReMat("disable-rematerialization", cl::init(false), cl::Hidden); static cl::opt EnableFastSpilling("fast-spill", cl::init(false), cl::Hidden); STATISTIC(numIntervals , "Number of original intervals"); STATISTIC(numFolds , "Number of loads/stores folded into instructions"); STATISTIC(numSplits , "Number of intervals split"); char LiveIntervals::ID = 0; static RegisterPass X("liveintervals", "Live Interval Analysis"); void LiveIntervals::getAnalysisUsage(AnalysisUsage &AU) const { AU.setPreservesCFG(); AU.addRequired(); AU.addPreserved(); AU.addPreserved(); AU.addRequired(); AU.addPreservedID(MachineLoopInfoID); AU.addPreservedID(MachineDominatorsID); if (!StrongPHIElim) { AU.addPreservedID(PHIEliminationID); AU.addRequiredID(PHIEliminationID); } AU.addRequiredID(TwoAddressInstructionPassID); AU.addPreserved(); AU.addRequired(); AU.addPreserved(); AU.addRequiredTransitive(); MachineFunctionPass::getAnalysisUsage(AU); } void LiveIntervals::releaseMemory() { // Free the live intervals themselves. for (DenseMap::iterator I = r2iMap_.begin(), E = r2iMap_.end(); I != E; ++I) delete I->second; r2iMap_.clear(); // Release VNInfo memroy regions after all VNInfo objects are dtor'd. 
VNInfoAllocator.DestroyAll(); while (!CloneMIs.empty()) { MachineInstr *MI = CloneMIs.back(); CloneMIs.pop_back(); mf_->DeleteMachineInstr(MI); } } /// runOnMachineFunction - Register allocate the whole function /// bool LiveIntervals::runOnMachineFunction(MachineFunction &fn) { mf_ = &fn; mri_ = &mf_->getRegInfo(); tm_ = &fn.getTarget(); tri_ = tm_->getRegisterInfo(); tii_ = tm_->getInstrInfo(); aa_ = &getAnalysis(); lv_ = &getAnalysis(); indexes_ = &getAnalysis(); allocatableRegs_ = tri_->getAllocatableSet(fn); computeIntervals(); numIntervals += getNumIntervals(); DEBUG(dump()); return true; } /// print - Implement the dump method. void LiveIntervals::print(raw_ostream &OS, const Module* ) const { OS << "********** INTERVALS **********\n"; for (const_iterator I = begin(), E = end(); I != E; ++I) { I->second->print(OS, tri_); OS << "\n"; } printInstrs(OS); } void LiveIntervals::printInstrs(raw_ostream &OS) const { OS << "********** MACHINEINSTRS **********\n"; for (MachineFunction::iterator mbbi = mf_->begin(), mbbe = mf_->end(); mbbi != mbbe; ++mbbi) { OS << "BB#" << mbbi->getNumber() << ":\t\t# derived from " << mbbi->getName() << "\n"; for (MachineBasicBlock::iterator mii = mbbi->begin(), mie = mbbi->end(); mii != mie; ++mii) { if (mii->isDebugValue()) OS << " \t" << *mii; else OS << getInstructionIndex(mii) << '\t' << *mii; } } } void LiveIntervals::dumpInstrs() const { printInstrs(dbgs()); } bool LiveIntervals::conflictsWithPhysReg(const LiveInterval &li, VirtRegMap &vrm, unsigned reg) { // We don't handle fancy stuff crossing basic block boundaries if (li.ranges.size() != 1) return true; const LiveRange &range = li.ranges.front(); SlotIndex idx = range.start.getBaseIndex(); SlotIndex end = range.end.getPrevSlot().getBaseIndex().getNextIndex(); // Skip deleted instructions MachineInstr *firstMI = getInstructionFromIndex(idx); while (!firstMI && idx != end) { idx = idx.getNextIndex(); firstMI = getInstructionFromIndex(idx); } if (!firstMI) return false; // Find last instruction in range SlotIndex lastIdx = end.getPrevIndex(); MachineInstr *lastMI = getInstructionFromIndex(lastIdx); while (!lastMI && lastIdx != idx) { lastIdx = lastIdx.getPrevIndex(); lastMI = getInstructionFromIndex(lastIdx); } if (!lastMI) return false; // Range cannot cross basic block boundaries or terminators MachineBasicBlock *MBB = firstMI->getParent(); if (MBB != lastMI->getParent() || lastMI->getDesc().isTerminator()) return true; MachineBasicBlock::const_iterator E = lastMI; ++E; for (MachineBasicBlock::const_iterator I = firstMI; I != E; ++I) { const MachineInstr &MI = *I; // Allow copies to and from li.reg unsigned SrcReg, DstReg, SrcSubReg, DstSubReg; if (tii_->isMoveInstr(MI, SrcReg, DstReg, SrcSubReg, DstSubReg)) if (SrcReg == li.reg || DstReg == li.reg) continue; // Check for operands using reg for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) { const MachineOperand& mop = MI.getOperand(i); if (!mop.isReg()) continue; unsigned PhysReg = mop.getReg(); if (PhysReg == 0 || PhysReg == li.reg) continue; if (TargetRegisterInfo::isVirtualRegister(PhysReg)) { if (!vrm.hasPhys(PhysReg)) continue; PhysReg = vrm.getPhys(PhysReg); } if (PhysReg && tri_->regsOverlap(PhysReg, reg)) return true; } } // No conflicts found. return false; } /// conflictsWithSubPhysRegRef - Similar to conflictsWithPhysRegRef except /// it checks for sub-register reference and it can check use as well. 
bool LiveIntervals::conflictsWithSubPhysRegRef(LiveInterval &li, unsigned Reg, bool CheckUse, SmallPtrSet &JoinedCopies) { for (LiveInterval::Ranges::const_iterator I = li.ranges.begin(), E = li.ranges.end(); I != E; ++I) { for (SlotIndex index = I->start.getBaseIndex(), end = I->end.getPrevSlot().getBaseIndex().getNextIndex(); index != end; index = index.getNextIndex()) { MachineInstr *MI = getInstructionFromIndex(index); if (!MI) continue; // skip deleted instructions if (JoinedCopies.count(MI)) continue; for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) { MachineOperand& MO = MI->getOperand(i); if (!MO.isReg()) continue; if (MO.isUse() && !CheckUse) continue; unsigned PhysReg = MO.getReg(); if (PhysReg == 0 || TargetRegisterInfo::isVirtualRegister(PhysReg)) continue; if (tri_->isSubRegister(Reg, PhysReg)) return true; } } } return false; } #ifndef NDEBUG static void printRegName(unsigned reg, const TargetRegisterInfo* tri_) { if (TargetRegisterInfo::isPhysicalRegister(reg)) dbgs() << tri_->getName(reg); else dbgs() << "%reg" << reg; } #endif +static +bool MultipleDefsByMI(const MachineInstr &MI, unsigned MOIdx) { + unsigned Reg = MI.getOperand(MOIdx).getReg(); + for (unsigned i = MOIdx+1, e = MI.getNumOperands(); i < e; ++i) { + const MachineOperand &MO = MI.getOperand(i); + if (!MO.isReg()) + continue; + if (MO.getReg() == Reg && MO.isDef()) { + assert(MI.getOperand(MOIdx).getSubReg() != MO.getSubReg() && + MI.getOperand(MOIdx).getSubReg() && + MO.getSubReg()); + return true; + } + } + return false; +} + void LiveIntervals::handleVirtualRegisterDef(MachineBasicBlock *mbb, MachineBasicBlock::iterator mi, SlotIndex MIIdx, MachineOperand& MO, unsigned MOIdx, LiveInterval &interval) { DEBUG({ dbgs() << "\t\tregister: "; printRegName(interval.reg, tri_); }); // Virtual registers may be defined multiple times (due to phi // elimination and 2-addr elimination). Much of what we do only has to be // done once for the vreg. We use an empty interval to detect the first // time we see a vreg. LiveVariables::VarInfo& vi = lv_->getVarInfo(interval.reg); if (interval.empty()) { // Get the Idx of the defining instructions. SlotIndex defIndex = MIIdx.getDefIndex(); // Earlyclobbers move back one, so that they overlap the live range // of inputs. if (MO.isEarlyClobber()) defIndex = MIIdx.getUseIndex(); VNInfo *ValNo; MachineInstr *CopyMI = NULL; unsigned SrcReg, DstReg, SrcSubReg, DstSubReg; if (mi->isExtractSubreg() || mi->isInsertSubreg() || mi->isSubregToReg() || tii_->isMoveInstr(*mi, SrcReg, DstReg, SrcSubReg, DstSubReg)) CopyMI = mi; // Earlyclobbers move back one. ValNo = interval.getNextValue(defIndex, CopyMI, true, VNInfoAllocator); assert(ValNo->id == 0 && "First value in interval is not 0?"); // Loop over all of the blocks that the vreg is defined in. There are // two cases we have to handle here. The most common case is a vreg // whose lifetime is contained within a basic block. In this case there // will be a single kill, in MBB, which comes after the definition. if (vi.Kills.size() == 1 && vi.Kills[0]->getParent() == mbb) { // FIXME: what about dead vars? SlotIndex killIdx; if (vi.Kills[0] != mi) killIdx = getInstructionIndex(vi.Kills[0]).getDefIndex(); else killIdx = defIndex.getStoreIndex(); // If the kill happens after the definition, we have an intra-block // live range. 
if (killIdx > defIndex) { assert(vi.AliveBlocks.empty() && "Shouldn't be alive across any blocks!"); LiveRange LR(defIndex, killIdx, ValNo); interval.addRange(LR); DEBUG(dbgs() << " +" << LR << "\n"); ValNo->addKill(killIdx); return; } } // The other case we handle is when a virtual register lives to the end // of the defining block, potentially live across some blocks, then is // live into some number of blocks, but gets killed. Start by adding a // range that goes from this definition to the end of the defining block. LiveRange NewLR(defIndex, getMBBEndIdx(mbb), ValNo); DEBUG(dbgs() << " +" << NewLR); interval.addRange(NewLR); bool PHIJoin = lv_->isPHIJoin(interval.reg); if (PHIJoin) { // A phi join register is killed at the end of the MBB and revived as a new // valno in the killing blocks. assert(vi.AliveBlocks.empty() && "Phi join can't pass through blocks"); DEBUG(dbgs() << " phi-join"); ValNo->addKill(indexes_->getTerminatorGap(mbb)); ValNo->setHasPHIKill(true); } else { // Iterate over all of the blocks that the variable is completely // live in, adding [insrtIndex(begin), instrIndex(end)+4) to the // live interval. for (SparseBitVector<>::iterator I = vi.AliveBlocks.begin(), E = vi.AliveBlocks.end(); I != E; ++I) { MachineBasicBlock *aliveBlock = mf_->getBlockNumbered(*I); LiveRange LR(getMBBStartIdx(aliveBlock), getMBBEndIdx(aliveBlock), ValNo); interval.addRange(LR); DEBUG(dbgs() << " +" << LR); } } // Finally, this virtual register is live from the start of any killing // block to the 'use' slot of the killing instruction. for (unsigned i = 0, e = vi.Kills.size(); i != e; ++i) { MachineInstr *Kill = vi.Kills[i]; SlotIndex Start = getMBBStartIdx(Kill->getParent()); SlotIndex killIdx = getInstructionIndex(Kill).getDefIndex(); // Create interval with one of a NEW value number. Note that this value // number isn't actually defined by an instruction, weird huh? :) if (PHIJoin) { ValNo = interval.getNextValue(SlotIndex(Start, true), 0, false, VNInfoAllocator); ValNo->setIsPHIDef(true); } LiveRange LR(Start, killIdx, ValNo); interval.addRange(LR); ValNo->addKill(killIdx); DEBUG(dbgs() << " +" << LR); } } else { + if (MultipleDefsByMI(*mi, MOIdx)) + // Mutple defs of the same virtual register by the same instruction. e.g. + // %reg1031:5, %reg1031:6 = VLD1q16 %reg1024, ... + // This is likely due to elimination of REG_SEQUENCE instructions. Return + // here since there is nothing to do. + return; + // If this is the second time we see a virtual register definition, it // must be due to phi elimination or two addr elimination. If this is // the result of two address elimination, then the vreg is one of the // def-and-use register operand. if (mi->isRegTiedToUseOperand(MOIdx)) { // If this is a two-address definition, then we have already processed // the live range. The only problem is that we didn't realize there // are actually two values in the live interval. Because of this we // need to take the LiveRegion that defines this register and split it // into two values. assert(interval.containsOneValue()); SlotIndex DefIndex = interval.getValNumInfo(0)->def.getDefIndex(); SlotIndex RedefIndex = MIIdx.getDefIndex(); if (MO.isEarlyClobber()) RedefIndex = MIIdx.getUseIndex(); const LiveRange *OldLR = interval.getLiveRangeContaining(RedefIndex.getUseIndex()); VNInfo *OldValNo = OldLR->valno; // Delete the initial value, which should be short and continuous, // because the 2-addr copy must be in the same MBB as the redef. 
interval.removeRange(DefIndex, RedefIndex); // Two-address vregs should always only be redefined once. This means // that at this point, there should be exactly one value number in it. assert(interval.containsOneValue() && "Unexpected 2-addr liveint!"); // The new value number (#1) is defined by the instruction we claimed // defined value #0. VNInfo *ValNo = interval.getNextValue(OldValNo->def, OldValNo->getCopy(), false, // update at * VNInfoAllocator); ValNo->setFlags(OldValNo->getFlags()); // * <- updating here // Value#0 is now defined by the 2-addr instruction. OldValNo->def = RedefIndex; OldValNo->setCopy(0); // Add the new live interval which replaces the range for the input copy. LiveRange LR(DefIndex, RedefIndex, ValNo); DEBUG(dbgs() << " replace range with " << LR); interval.addRange(LR); ValNo->addKill(RedefIndex); // If this redefinition is dead, we need to add a dummy unit live // range covering the def slot. if (MO.isDead()) interval.addRange(LiveRange(RedefIndex, RedefIndex.getStoreIndex(), OldValNo)); DEBUG({ dbgs() << " RESULT: "; interval.print(dbgs(), tri_); }); } else { assert(lv_->isPHIJoin(interval.reg) && "Multiply defined register"); // In the case of PHI elimination, each variable definition is only // live until the end of the block. We've already taken care of the // rest of the live range. SlotIndex defIndex = MIIdx.getDefIndex(); if (MO.isEarlyClobber()) defIndex = MIIdx.getUseIndex(); VNInfo *ValNo; MachineInstr *CopyMI = NULL; unsigned SrcReg, DstReg, SrcSubReg, DstSubReg; if (mi->isExtractSubreg() || mi->isInsertSubreg() || mi->isSubregToReg()|| tii_->isMoveInstr(*mi, SrcReg, DstReg, SrcSubReg, DstSubReg)) CopyMI = mi; ValNo = interval.getNextValue(defIndex, CopyMI, true, VNInfoAllocator); SlotIndex killIndex = getMBBEndIdx(mbb); LiveRange LR(defIndex, killIndex, ValNo); interval.addRange(LR); ValNo->addKill(indexes_->getTerminatorGap(mbb)); ValNo->setHasPHIKill(true); DEBUG(dbgs() << " phi-join +" << LR); } } DEBUG(dbgs() << '\n'); } void LiveIntervals::handlePhysicalRegisterDef(MachineBasicBlock *MBB, MachineBasicBlock::iterator mi, SlotIndex MIIdx, MachineOperand& MO, LiveInterval &interval, MachineInstr *CopyMI) { // A physical register cannot be live across basic block, so its // lifetime must end somewhere in its defining basic block. DEBUG({ dbgs() << "\t\tregister: "; printRegName(interval.reg, tri_); }); SlotIndex baseIndex = MIIdx; SlotIndex start = baseIndex.getDefIndex(); // Earlyclobbers move back one. if (MO.isEarlyClobber()) start = MIIdx.getUseIndex(); SlotIndex end = start; // If it is not used after definition, it is considered dead at // the instruction defining it. Hence its interval is: // [defSlot(def), defSlot(def)+1) // For earlyclobbers, the defSlot was pushed back one; the extra // advance below compensates. if (MO.isDead()) { DEBUG(dbgs() << " dead"); end = start.getStoreIndex(); goto exit; } // If it is not dead on definition, it must be killed by a // subsequent instruction. Hence its interval is: // [defSlot(def), useSlot(kill)+1) baseIndex = baseIndex.getNextIndex(); while (++mi != MBB->end()) { if (mi->isDebugValue()) continue; if (getInstructionFromIndex(baseIndex) == 0) baseIndex = indexes_->getNextNonNullIndex(baseIndex); if (mi->killsRegister(interval.reg, tri_)) { DEBUG(dbgs() << " killed"); end = baseIndex.getDefIndex(); goto exit; } else { int DefIdx = mi->findRegisterDefOperandIdx(interval.reg, false, tri_); if (DefIdx != -1) { if (mi->isRegTiedToUseOperand(DefIdx)) { // Two-address instruction. 
end = baseIndex.getDefIndex(); } else { // Another instruction redefines the register before it is ever read. // Then the register is essentially dead at the instruction that // defines it. Hence its interval is: // [defSlot(def), defSlot(def)+1) DEBUG(dbgs() << " dead"); end = start.getStoreIndex(); } goto exit; } } baseIndex = baseIndex.getNextIndex(); } // The only case we should have a dead physreg here without a killing or // instruction where we know it's dead is if it is live-in to the function // and never used. Another possible case is the implicit use of the // physical register has been deleted by two-address pass. end = start.getStoreIndex(); exit: assert(start < end && "did not find end of interval?"); // Already exists? Extend old live interval. LiveInterval::iterator OldLR = interval.FindLiveRangeContaining(start); bool Extend = OldLR != interval.end(); VNInfo *ValNo = Extend ? OldLR->valno : interval.getNextValue(start, CopyMI, true, VNInfoAllocator); if (MO.isEarlyClobber() && Extend) ValNo->setHasRedefByEC(true); LiveRange LR(start, end, ValNo); interval.addRange(LR); LR.valno->addKill(end); DEBUG(dbgs() << " +" << LR << '\n'); } void LiveIntervals::handleRegisterDef(MachineBasicBlock *MBB, MachineBasicBlock::iterator MI, SlotIndex MIIdx, MachineOperand& MO, unsigned MOIdx) { if (TargetRegisterInfo::isVirtualRegister(MO.getReg())) handleVirtualRegisterDef(MBB, MI, MIIdx, MO, MOIdx, getOrCreateInterval(MO.getReg())); else if (allocatableRegs_[MO.getReg()]) { MachineInstr *CopyMI = NULL; unsigned SrcReg, DstReg, SrcSubReg, DstSubReg; if (MI->isExtractSubreg() || MI->isInsertSubreg() || MI->isSubregToReg() || tii_->isMoveInstr(*MI, SrcReg, DstReg, SrcSubReg, DstSubReg)) CopyMI = MI; handlePhysicalRegisterDef(MBB, MI, MIIdx, MO, getOrCreateInterval(MO.getReg()), CopyMI); // Def of a register also defines its sub-registers. for (const unsigned* AS = tri_->getSubRegisters(MO.getReg()); *AS; ++AS) // If MI also modifies the sub-register explicitly, avoid processing it // more than once. Do not pass in TRI here so it checks for exact match. if (!MI->modifiesRegister(*AS)) handlePhysicalRegisterDef(MBB, MI, MIIdx, MO, getOrCreateInterval(*AS), 0); } } void LiveIntervals::handleLiveInRegister(MachineBasicBlock *MBB, SlotIndex MIIdx, LiveInterval &interval, bool isAlias) { DEBUG({ dbgs() << "\t\tlivein register: "; printRegName(interval.reg, tri_); }); // Look for kills, if it reaches a def before it's killed, then it shouldn't // be considered a livein. MachineBasicBlock::iterator mi = MBB->begin(); MachineBasicBlock::iterator E = MBB->end(); // Skip over DBG_VALUE at the start of the MBB. if (mi != E && mi->isDebugValue()) { while (++mi != E && mi->isDebugValue()) ; if (mi == E) // MBB is empty except for DBG_VALUE's. return; } SlotIndex baseIndex = MIIdx; SlotIndex start = baseIndex; if (getInstructionFromIndex(baseIndex) == 0) baseIndex = indexes_->getNextNonNullIndex(baseIndex); SlotIndex end = baseIndex; bool SeenDefUse = false; while (mi != E) { if (mi->killsRegister(interval.reg, tri_)) { DEBUG(dbgs() << " killed"); end = baseIndex.getDefIndex(); SeenDefUse = true; break; } else if (mi->modifiesRegister(interval.reg, tri_)) { // Another instruction redefines the register before it is ever read. // Then the register is essentially dead at the instruction that defines // it. Hence its interval is: // [defSlot(def), defSlot(def)+1) DEBUG(dbgs() << " dead"); end = start.getStoreIndex(); SeenDefUse = true; break; } while (++mi != E && mi->isDebugValue()) // Skip over DBG_VALUE. 
; if (mi != E) baseIndex = indexes_->getNextNonNullIndex(baseIndex); } // Live-in register might not be used at all. if (!SeenDefUse) { if (isAlias) { DEBUG(dbgs() << " dead"); end = MIIdx.getStoreIndex(); } else { DEBUG(dbgs() << " live through"); end = baseIndex; } } VNInfo *vni = interval.getNextValue(SlotIndex(getMBBStartIdx(MBB), true), 0, false, VNInfoAllocator); vni->setIsPHIDef(true); LiveRange LR(start, end, vni); interval.addRange(LR); LR.valno->addKill(end); DEBUG(dbgs() << " +" << LR << '\n'); } /// computeIntervals - computes the live intervals for virtual /// registers. for some ordering of the machine instructions [1,N] a /// live interval is an interval [i, j) where 1 <= i <= j < N for /// which a variable is live void LiveIntervals::computeIntervals() { DEBUG(dbgs() << "********** COMPUTING LIVE INTERVALS **********\n" << "********** Function: " << ((Value*)mf_->getFunction())->getName() << '\n'); SmallVector UndefUses; for (MachineFunction::iterator MBBI = mf_->begin(), E = mf_->end(); MBBI != E; ++MBBI) { MachineBasicBlock *MBB = MBBI; if (MBB->empty()) continue; // Track the index of the current machine instr. SlotIndex MIIndex = getMBBStartIdx(MBB); DEBUG(dbgs() << "BB#" << MBB->getNumber() << ":\t\t# derived from " << MBB->getName() << "\n"); // Create intervals for live-ins to this BB first. for (MachineBasicBlock::livein_iterator LI = MBB->livein_begin(), LE = MBB->livein_end(); LI != LE; ++LI) { handleLiveInRegister(MBB, MIIndex, getOrCreateInterval(*LI)); // Multiple live-ins can alias the same register. for (const unsigned* AS = tri_->getSubRegisters(*LI); *AS; ++AS) if (!hasInterval(*AS)) handleLiveInRegister(MBB, MIIndex, getOrCreateInterval(*AS), true); } // Skip over empty initial indices. if (getInstructionFromIndex(MIIndex) == 0) MIIndex = indexes_->getNextNonNullIndex(MIIndex); for (MachineBasicBlock::iterator MI = MBB->begin(), miEnd = MBB->end(); MI != miEnd; ++MI) { DEBUG(dbgs() << MIIndex << "\t" << *MI); if (MI->isDebugValue()) continue; // Handle defs. for (int i = MI->getNumOperands() - 1; i >= 0; --i) { MachineOperand &MO = MI->getOperand(i); if (!MO.isReg() || !MO.getReg()) continue; // handle register defs - build intervals if (MO.isDef()) handleRegisterDef(MBB, MI, MIIndex, MO, i); else if (MO.isUndef()) UndefUses.push_back(MO.getReg()); } // Move to the next instr slot. MIIndex = indexes_->getNextNonNullIndex(MIIndex); } } // Create empty intervals for registers defined by implicit_def's (except // for those implicit_def that define values which are liveout of their // blocks. for (unsigned i = 0, e = UndefUses.size(); i != e; ++i) { unsigned UndefReg = UndefUses[i]; (void)getOrCreateInterval(UndefReg); } } LiveInterval* LiveIntervals::createInterval(unsigned reg) { float Weight = TargetRegisterInfo::isPhysicalRegister(reg) ? HUGE_VALF : 0.0F; return new LiveInterval(reg, Weight); } /// dupInterval - Duplicate a live interval. The caller is responsible for /// managing the allocated memory. LiveInterval* LiveIntervals::dupInterval(LiveInterval *li) { LiveInterval *NewLI = createInterval(li->reg); NewLI->Copy(*li, mri_, getVNInfoAllocator()); return NewLI; } /// getVNInfoSourceReg - Helper function that parses the specified VNInfo /// copy field and returns the source register that defines it. unsigned LiveIntervals::getVNInfoSourceReg(const VNInfo *VNI) const { if (!VNI->getCopy()) return 0; if (VNI->getCopy()->isExtractSubreg()) { // If it's extracting out of a physical register, return the sub-register. 
unsigned Reg = VNI->getCopy()->getOperand(1).getReg(); if (TargetRegisterInfo::isPhysicalRegister(Reg)) { unsigned SrcSubReg = VNI->getCopy()->getOperand(2).getImm(); unsigned DstSubReg = VNI->getCopy()->getOperand(0).getSubReg(); if (SrcSubReg == DstSubReg) // %reg1034:3 = EXTRACT_SUBREG %EDX, 3 // reg1034 can still be coalesced to EDX. return Reg; assert(DstSubReg == 0); Reg = tri_->getSubReg(Reg, VNI->getCopy()->getOperand(2).getImm()); } return Reg; } else if (VNI->getCopy()->isInsertSubreg() || VNI->getCopy()->isSubregToReg()) return VNI->getCopy()->getOperand(2).getReg(); unsigned SrcReg, DstReg, SrcSubReg, DstSubReg; if (tii_->isMoveInstr(*VNI->getCopy(), SrcReg, DstReg, SrcSubReg, DstSubReg)) return SrcReg; llvm_unreachable("Unrecognized copy instruction!"); return 0; } //===----------------------------------------------------------------------===// // Register allocator hooks. // /// getReMatImplicitUse - If the remat definition MI has one (for now, we only /// allow one) virtual register operand, then its uses are implicitly using /// the register. Returns the virtual register. unsigned LiveIntervals::getReMatImplicitUse(const LiveInterval &li, MachineInstr *MI) const { unsigned RegOp = 0; for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) { MachineOperand &MO = MI->getOperand(i); if (!MO.isReg() || !MO.isUse()) continue; unsigned Reg = MO.getReg(); if (Reg == 0 || Reg == li.reg) continue; if (TargetRegisterInfo::isPhysicalRegister(Reg) && !allocatableRegs_[Reg]) continue; // FIXME: For now, only remat MI with at most one register operand. assert(!RegOp && "Can't rematerialize instruction with multiple register operand!"); RegOp = MO.getReg(); #ifndef NDEBUG break; #endif } return RegOp; } /// isValNoAvailableAt - Return true if the val# of the specified interval /// which reaches the given instruction also reaches the specified use index. bool LiveIntervals::isValNoAvailableAt(const LiveInterval &li, MachineInstr *MI, SlotIndex UseIdx) const { SlotIndex Index = getInstructionIndex(MI); VNInfo *ValNo = li.FindLiveRangeContaining(Index)->valno; LiveInterval::const_iterator UI = li.FindLiveRangeContaining(UseIdx); return UI != li.end() && UI->valno == ValNo; } /// isReMaterializable - Returns true if the definition MI of the specified /// val# of the specified interval is re-materializable. bool LiveIntervals::isReMaterializable(const LiveInterval &li, const VNInfo *ValNo, MachineInstr *MI, SmallVectorImpl &SpillIs, bool &isLoad) { if (DisableReMat) return false; if (!tii_->isTriviallyReMaterializable(MI, aa_)) return false; // Target-specific code can mark an instruction as being rematerializable // if it has one virtual reg use, though it had better be something like // a PIC base register which is likely to be live everywhere. unsigned ImpUse = getReMatImplicitUse(li, MI); if (ImpUse) { const LiveInterval &ImpLi = getInterval(ImpUse); for (MachineRegisterInfo::use_nodbg_iterator ri = mri_->use_nodbg_begin(li.reg), re = mri_->use_nodbg_end(); ri != re; ++ri) { MachineInstr *UseMI = &*ri; SlotIndex UseIdx = getInstructionIndex(UseMI); if (li.FindLiveRangeContaining(UseIdx)->valno != ValNo) continue; if (!isValNoAvailableAt(ImpLi, MI, UseIdx)) return false; } // If a register operand of the re-materialized instruction is going to // be spilled next, then it's not legal to re-materialize this instruction. 
for (unsigned i = 0, e = SpillIs.size(); i != e; ++i) if (ImpUse == SpillIs[i]->reg) return false; } return true; } /// isReMaterializable - Returns true if the definition MI of the specified /// val# of the specified interval is re-materializable. bool LiveIntervals::isReMaterializable(const LiveInterval &li, const VNInfo *ValNo, MachineInstr *MI) { SmallVector Dummy1; bool Dummy2; return isReMaterializable(li, ValNo, MI, Dummy1, Dummy2); } /// isReMaterializable - Returns true if every definition of MI of every /// val# of the specified interval is re-materializable. bool LiveIntervals::isReMaterializable(const LiveInterval &li, SmallVectorImpl &SpillIs, bool &isLoad) { isLoad = false; for (LiveInterval::const_vni_iterator i = li.vni_begin(), e = li.vni_end(); i != e; ++i) { const VNInfo *VNI = *i; if (VNI->isUnused()) continue; // Dead val#. // Is the def for the val# rematerializable? if (!VNI->isDefAccurate()) return false; MachineInstr *ReMatDefMI = getInstructionFromIndex(VNI->def); bool DefIsLoad = false; if (!ReMatDefMI || !isReMaterializable(li, VNI, ReMatDefMI, SpillIs, DefIsLoad)) return false; isLoad |= DefIsLoad; } return true; } /// FilterFoldedOps - Filter out two-address use operands. Return /// true if it finds any issue with the operands that ought to prevent /// folding. static bool FilterFoldedOps(MachineInstr *MI, SmallVector &Ops, unsigned &MRInfo, SmallVector &FoldOps) { MRInfo = 0; for (unsigned i = 0, e = Ops.size(); i != e; ++i) { unsigned OpIdx = Ops[i]; MachineOperand &MO = MI->getOperand(OpIdx); // FIXME: fold subreg use. if (MO.getSubReg()) return true; if (MO.isDef()) MRInfo |= (unsigned)VirtRegMap::isMod; else { // Filter out two-address use operand(s). if (MI->isRegTiedToDefOperand(OpIdx)) { MRInfo = VirtRegMap::isModRef; continue; } MRInfo |= (unsigned)VirtRegMap::isRef; } FoldOps.push_back(OpIdx); } return false; } /// tryFoldMemoryOperand - Attempts to fold either a spill / restore from /// slot / to reg or any rematerialized load into ith operand of specified /// MI. If it is successul, MI is updated with the newly created MI and /// returns true. bool LiveIntervals::tryFoldMemoryOperand(MachineInstr* &MI, VirtRegMap &vrm, MachineInstr *DefMI, SlotIndex InstrIdx, SmallVector &Ops, bool isSS, int Slot, unsigned Reg) { // If it is an implicit def instruction, just delete it. if (MI->isImplicitDef()) { RemoveMachineInstrFromMaps(MI); vrm.RemoveMachineInstrFromMaps(MI); MI->eraseFromParent(); ++numFolds; return true; } // Filter the list of operand indexes that are to be folded. Abort if // any operand will prevent folding. unsigned MRInfo = 0; SmallVector FoldOps; if (FilterFoldedOps(MI, Ops, MRInfo, FoldOps)) return false; // The only time it's safe to fold into a two address instruction is when // it's folding reload and spill from / into a spill stack slot. if (DefMI && (MRInfo & VirtRegMap::isMod)) return false; MachineInstr *fmi = isSS ? tii_->foldMemoryOperand(*mf_, MI, FoldOps, Slot) : tii_->foldMemoryOperand(*mf_, MI, FoldOps, DefMI); if (fmi) { // Remember this instruction uses the spill slot. if (isSS) vrm.addSpillSlotUse(Slot, fmi); // Attempt to fold the memory reference into the instruction. If // we can do this, we don't need to insert spill code. 
MachineBasicBlock &MBB = *MI->getParent(); if (isSS && !mf_->getFrameInfo()->isImmutableObjectIndex(Slot)) vrm.virtFolded(Reg, MI, fmi, (VirtRegMap::ModRef)MRInfo); vrm.transferSpillPts(MI, fmi); vrm.transferRestorePts(MI, fmi); vrm.transferEmergencySpills(MI, fmi); ReplaceMachineInstrInMaps(MI, fmi); MI = MBB.insert(MBB.erase(MI), fmi); ++numFolds; return true; } return false; } /// canFoldMemoryOperand - Returns true if the specified load / store /// folding is possible. bool LiveIntervals::canFoldMemoryOperand(MachineInstr *MI, SmallVector &Ops, bool ReMat) const { // Filter the list of operand indexes that are to be folded. Abort if // any operand will prevent folding. unsigned MRInfo = 0; SmallVector FoldOps; if (FilterFoldedOps(MI, Ops, MRInfo, FoldOps)) return false; // It's only legal to remat for a use, not a def. if (ReMat && (MRInfo & VirtRegMap::isMod)) return false; return tii_->canFoldMemoryOperand(MI, FoldOps); } bool LiveIntervals::intervalIsInOneMBB(const LiveInterval &li) const { LiveInterval::Ranges::const_iterator itr = li.ranges.begin(); MachineBasicBlock *mbb = indexes_->getMBBCoveringRange(itr->start, itr->end); if (mbb == 0) return false; for (++itr; itr != li.ranges.end(); ++itr) { MachineBasicBlock *mbb2 = indexes_->getMBBCoveringRange(itr->start, itr->end); if (mbb2 != mbb) return false; } return true; } /// rewriteImplicitOps - Rewrite implicit use operands of MI (i.e. uses of /// interval on to-be re-materialized operands of MI) with new register. void LiveIntervals::rewriteImplicitOps(const LiveInterval &li, MachineInstr *MI, unsigned NewVReg, VirtRegMap &vrm) { // There is an implicit use. That means one of the other operand is // being remat'ed and the remat'ed instruction has li.reg as an // use operand. Make sure we rewrite that as well. for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) { MachineOperand &MO = MI->getOperand(i); if (!MO.isReg()) continue; unsigned Reg = MO.getReg(); if (Reg == 0 || TargetRegisterInfo::isPhysicalRegister(Reg)) continue; if (!vrm.isReMaterialized(Reg)) continue; MachineInstr *ReMatMI = vrm.getReMaterializedMI(Reg); MachineOperand *UseMO = ReMatMI->findRegisterUseOperand(li.reg); if (UseMO) UseMO->setReg(NewVReg); } } /// rewriteInstructionForSpills, rewriteInstructionsForSpills - Helper functions /// for addIntervalsForSpills to rewrite uses / defs for the given live range. bool LiveIntervals:: rewriteInstructionForSpills(const LiveInterval &li, const VNInfo *VNI, bool TrySplit, SlotIndex index, SlotIndex end, MachineInstr *MI, MachineInstr *ReMatOrigDefMI, MachineInstr *ReMatDefMI, unsigned Slot, int LdSlot, bool isLoad, bool isLoadSS, bool DefIsReMat, bool CanDelete, VirtRegMap &vrm, const TargetRegisterClass* rc, SmallVector &ReMatIds, const MachineLoopInfo *loopInfo, unsigned &NewVReg, unsigned ImpUse, bool &HasDef, bool &HasUse, DenseMap &MBBVRegsMap, std::vector &NewLIs) { bool CanFold = false; RestartInstruction: for (unsigned i = 0; i != MI->getNumOperands(); ++i) { MachineOperand& mop = MI->getOperand(i); if (!mop.isReg()) continue; unsigned Reg = mop.getReg(); unsigned RegI = Reg; if (Reg == 0 || TargetRegisterInfo::isPhysicalRegister(Reg)) continue; if (Reg != li.reg) continue; bool TryFold = !DefIsReMat; bool FoldSS = true; // Default behavior unless it's a remat. int FoldSlot = Slot; if (DefIsReMat) { // If this is the rematerializable definition MI itself and // all of its uses are rematerialized, simply delete it. 
if (MI == ReMatOrigDefMI && CanDelete) { DEBUG(dbgs() << "\t\t\t\tErasing re-materializable def: " << *MI << '\n'); RemoveMachineInstrFromMaps(MI); vrm.RemoveMachineInstrFromMaps(MI); MI->eraseFromParent(); break; } // If def for this use can't be rematerialized, then try folding. // If def is rematerializable and it's a load, also try folding. TryFold = !ReMatDefMI || (ReMatDefMI && (MI == ReMatOrigDefMI || isLoad)); if (isLoad) { // Try fold loads (from stack slot, constant pool, etc.) into uses. FoldSS = isLoadSS; FoldSlot = LdSlot; } } // Scan all of the operands of this instruction rewriting operands // to use NewVReg instead of li.reg as appropriate. We do this for // two reasons: // // 1. If the instr reads the same spilled vreg multiple times, we // want to reuse the NewVReg. // 2. If the instr is a two-addr instruction, we are required to // keep the src/dst regs pinned. // // Keep track of whether we replace a use and/or def so that we can // create the spill interval with the appropriate range. HasUse = mop.isUse(); HasDef = mop.isDef(); SmallVector Ops; Ops.push_back(i); for (unsigned j = i+1, e = MI->getNumOperands(); j != e; ++j) { const MachineOperand &MOj = MI->getOperand(j); if (!MOj.isReg()) continue; unsigned RegJ = MOj.getReg(); if (RegJ == 0 || TargetRegisterInfo::isPhysicalRegister(RegJ)) continue; if (RegJ == RegI) { Ops.push_back(j); if (!MOj.isUndef()) { HasUse |= MOj.isUse(); HasDef |= MOj.isDef(); } } } // Create a new virtual register for the spill interval. // Create the new register now so we can map the fold instruction // to the new register so when it is unfolded we get the correct // answer. bool CreatedNewVReg = false; if (NewVReg == 0) { NewVReg = mri_->createVirtualRegister(rc); vrm.grow(); CreatedNewVReg = true; // The new virtual register should get the same allocation hints as the // old one. std::pair Hint = mri_->getRegAllocationHint(Reg); if (Hint.first || Hint.second) mri_->setRegAllocationHint(NewVReg, Hint.first, Hint.second); } if (!TryFold) CanFold = false; else { // Do not fold load / store here if we are splitting. We'll find an // optimal point to insert a load / store later. if (!TrySplit) { if (tryFoldMemoryOperand(MI, vrm, ReMatDefMI, index, Ops, FoldSS, FoldSlot, NewVReg)) { // Folding the load/store can completely change the instruction in // unpredictable ways, rescan it from the beginning. if (FoldSS) { // We need to give the new vreg the same stack slot as the // spilled interval. vrm.assignVirt2StackSlot(NewVReg, FoldSlot); } HasUse = false; HasDef = false; CanFold = false; if (isNotInMIMap(MI)) break; goto RestartInstruction; } } else { // We'll try to fold it later if it's profitable. CanFold = canFoldMemoryOperand(MI, Ops, DefIsReMat); } } mop.setReg(NewVReg); if (mop.isImplicit()) rewriteImplicitOps(li, MI, NewVReg, vrm); // Reuse NewVReg for other reads. for (unsigned j = 0, e = Ops.size(); j != e; ++j) { MachineOperand &mopj = MI->getOperand(Ops[j]); mopj.setReg(NewVReg); if (mopj.isImplicit()) rewriteImplicitOps(li, MI, NewVReg, vrm); } if (CreatedNewVReg) { if (DefIsReMat) { vrm.setVirtIsReMaterialized(NewVReg, ReMatDefMI); if (ReMatIds[VNI->id] == VirtRegMap::MAX_STACK_SLOT) { // Each valnum may have its own remat id. ReMatIds[VNI->id] = vrm.assignVirtReMatId(NewVReg); } else { vrm.assignVirtReMatId(NewVReg, ReMatIds[VNI->id]); } if (!CanDelete || (HasUse && HasDef)) { // If this is a two-addr instruction then its use operands are // rematerializable but its def is not. It should be assigned a // stack slot. 
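          // For illustration only (made-up two-address code): in
          //   %reg1024 = ADD32ri %reg1024, 5
          // the tied use may be fed by rematerialization, but the value the
          // instruction defines still has to live somewhere, so it gets the
          // stack slot below.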
          vrm.assignVirt2StackSlot(NewVReg, Slot);
        }
      } else {
        vrm.assignVirt2StackSlot(NewVReg, Slot);
      }
    } else if (HasUse && HasDef &&
               vrm.getStackSlot(NewVReg) == VirtRegMap::NO_STACK_SLOT) {
      // If this interval hasn't been assigned a stack slot (because earlier
      // def is a deleted remat def), do it now.
      assert(Slot != VirtRegMap::NO_STACK_SLOT);
      vrm.assignVirt2StackSlot(NewVReg, Slot);
    }

    // Re-matting an instruction with virtual register use. Add the
    // register as an implicit use on the use MI.
    if (DefIsReMat && ImpUse)
      MI->addOperand(MachineOperand::CreateReg(ImpUse, false, true));

    // Create a new register interval for this spill / remat.
    LiveInterval &nI = getOrCreateInterval(NewVReg);
    if (CreatedNewVReg) {
      NewLIs.push_back(&nI);
      MBBVRegsMap.insert(std::make_pair(MI->getParent()->getNumber(), NewVReg));
      if (TrySplit)
        vrm.setIsSplitFromReg(NewVReg, li.reg);
    }

    if (HasUse) {
      if (CreatedNewVReg) {
        LiveRange LR(index.getLoadIndex(), index.getDefIndex(),
                     nI.getNextValue(SlotIndex(), 0, false, VNInfoAllocator));
        DEBUG(dbgs() << " +" << LR);
        nI.addRange(LR);
      } else {
        // Extend the split live interval to this def / use.
        SlotIndex End = index.getDefIndex();
        LiveRange LR(nI.ranges[nI.ranges.size()-1].end, End,
                     nI.getValNumInfo(nI.getNumValNums()-1));
        DEBUG(dbgs() << " +" << LR);
        nI.addRange(LR);
      }
    }
    if (HasDef) {
      LiveRange LR(index.getDefIndex(), index.getStoreIndex(),
                   nI.getNextValue(SlotIndex(), 0, false, VNInfoAllocator));
      DEBUG(dbgs() << " +" << LR);
      nI.addRange(LR);
    }

    DEBUG({
        dbgs() << "\t\t\t\tAdded new interval: ";
        nI.print(dbgs(), tri_);
        dbgs() << '\n';
      });
  }
  return CanFold;
}

bool LiveIntervals::anyKillInMBBAfterIdx(const LiveInterval &li,
                                         const VNInfo *VNI,
                                         MachineBasicBlock *MBB,
                                         SlotIndex Idx) const {
  SlotIndex End = getMBBEndIdx(MBB);
  for (unsigned j = 0, ee = VNI->kills.size(); j != ee; ++j) {
    if (VNI->kills[j].isPHI())
      continue;

    SlotIndex KillIdx = VNI->kills[j];
    if (KillIdx > Idx && KillIdx <= End)
      return true;
  }
  return false;
}

/// RewriteInfo - Keep track of machine instrs that will be rewritten
/// during spilling.
namespace {
  struct RewriteInfo {
    SlotIndex Index;
    MachineInstr *MI;
    bool HasUse;
    bool HasDef;
    RewriteInfo(SlotIndex i, MachineInstr *mi, bool u, bool d)
      : Index(i), MI(mi), HasUse(u), HasDef(d) {}
  };

  struct RewriteInfoCompare {
    bool operator()(const RewriteInfo &LHS, const RewriteInfo &RHS) const {
      return LHS.Index < RHS.Index;
    }
  };
}

void LiveIntervals::
rewriteInstructionsForSpills(const LiveInterval &li, bool TrySplit,
                    LiveInterval::Ranges::const_iterator &I,
                    MachineInstr *ReMatOrigDefMI, MachineInstr *ReMatDefMI,
                    unsigned Slot, int LdSlot,
                    bool isLoad, bool isLoadSS, bool DefIsReMat, bool CanDelete,
                    VirtRegMap &vrm,
                    const TargetRegisterClass* rc,
                    SmallVector<unsigned, 2> &ReMatIds,
                    const MachineLoopInfo *loopInfo,
                    BitVector &SpillMBBs,
                    DenseMap<unsigned, std::vector<SRInfo> > &SpillIdxes,
                    BitVector &RestoreMBBs,
                    DenseMap<unsigned, std::vector<SRInfo> > &RestoreIdxes,
                    DenseMap<unsigned,unsigned> &MBBVRegsMap,
                    std::vector<LiveInterval*> &NewLIs) {
  bool AllCanFold = true;
  unsigned NewVReg = 0;
  SlotIndex start = I->start.getBaseIndex();
  SlotIndex end = I->end.getPrevSlot().getBaseIndex().getNextIndex();

  // First collect all the defs / uses in this live range that will be
  // rewritten. Make sure they are sorted according to instruction index.
  std::vector<RewriteInfo> RewriteMIs;
  for (MachineRegisterInfo::reg_iterator ri = mri_->reg_begin(li.reg),
         re = mri_->reg_end(); ri != re; ) {
    MachineInstr *MI = &*ri;
    MachineOperand &O = ri.getOperand();
    ++ri;
    if (MI->isDebugValue()) {
      // Modify DBG_VALUE now that the value is in a spill slot.
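      // For illustration only (made-up operands): a DBG_VALUE such as
      //   DBG_VALUE %reg1024, 0, !"x"
      // is retargeted to the value's new home in the stack slot,
      //   DBG_VALUE <fi#2>, 0, !"x"
      // when the target supports frame-index DBG_VALUEs; otherwise the
      // debug info is simply dropped below.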
      if (Slot != VirtRegMap::MAX_STACK_SLOT || isLoadSS) {
        uint64_t Offset = MI->getOperand(1).getImm();
        const MDNode *MDPtr = MI->getOperand(2).getMetadata();
        DebugLoc DL = MI->getDebugLoc();
        int FI = isLoadSS ? LdSlot : (int)Slot;
        if (MachineInstr *NewDV = tii_->emitFrameIndexDebugValue(*mf_, FI,
                                                          Offset, MDPtr, DL)) {
          DEBUG(dbgs() << "Modifying debug info due to spill:" << "\t" << *MI);
          ReplaceMachineInstrInMaps(MI, NewDV);
          MachineBasicBlock *MBB = MI->getParent();
          MBB->insert(MBB->erase(MI), NewDV);
          continue;
        }
      }

      DEBUG(dbgs() << "Removing debug info due to spill:" << "\t" << *MI);
      RemoveMachineInstrFromMaps(MI);
      vrm.RemoveMachineInstrFromMaps(MI);
      MI->eraseFromParent();
      continue;
    }
    assert(!O.isImplicit() && "Spilling register that's used as implicit use?");
    SlotIndex index = getInstructionIndex(MI);
    if (index < start || index >= end)
      continue;

    if (O.isUndef())
      // Must be defined by an implicit def. It should not be spilled. Note,
      // this is for correctness reasons, e.g.
      // 8   %reg1024 = IMPLICIT_DEF
      // 12  %reg1024 = INSERT_SUBREG %reg1024, %reg1025, 2
      // The live range [12, 14) is not part of the r1024 live interval since
      // it's defined by an implicit def. It will not conflict with the live
      // interval of r1025. Now suppose both registers are spilled: you can
      // easily see a situation where both registers are reloaded before the
      // INSERT_SUBREG into target registers that would overlap.
      continue;
    RewriteMIs.push_back(RewriteInfo(index, MI, O.isUse(), O.isDef()));
  }
  std::sort(RewriteMIs.begin(), RewriteMIs.end(), RewriteInfoCompare());

  unsigned ImpUse = DefIsReMat ? getReMatImplicitUse(li, ReMatDefMI) : 0;
  // Now rewrite the defs and uses.
  for (unsigned i = 0, e = RewriteMIs.size(); i != e; ) {
    RewriteInfo &rwi = RewriteMIs[i];
    ++i;
    SlotIndex index = rwi.Index;
    bool MIHasUse = rwi.HasUse;
    bool MIHasDef = rwi.HasDef;
    MachineInstr *MI = rwi.MI;
    // If MI defs and/or uses the same register multiple times, then there
    // are multiple entries.
    unsigned NumUses = MIHasUse;
    while (i != e && RewriteMIs[i].MI == MI) {
      assert(RewriteMIs[i].Index == index);
      bool isUse = RewriteMIs[i].HasUse;
      if (isUse) ++NumUses;
      MIHasUse |= isUse;
      MIHasDef |= RewriteMIs[i].HasDef;
      ++i;
    }
    MachineBasicBlock *MBB = MI->getParent();

    if (ImpUse && MI != ReMatDefMI) {
      // Re-matting an instruction with virtual register use. Prevent interval
      // from being spilled.
      getInterval(ImpUse).markNotSpillable();
    }

    unsigned MBBId = MBB->getNumber();
    unsigned ThisVReg = 0;
    if (TrySplit) {
      DenseMap<unsigned,unsigned>::iterator NVI = MBBVRegsMap.find(MBBId);
      if (NVI != MBBVRegsMap.end()) {
        ThisVReg = NVI->second;
        // One common case:
        // x = use
        // ...
        // ...
        // def = ...
        //     = use
        // It's better to start a new interval to avoid artificially
        // extending the previous one.
        if (MIHasDef && !MIHasUse) {
          MBBVRegsMap.erase(MBB->getNumber());
          ThisVReg = 0;
        }
      }
    }

    bool IsNew = ThisVReg == 0;
    if (IsNew) {
      // This ends the previous live interval. If all of its def / use
      // can be folded, give it a low spill weight.
      if (NewVReg && TrySplit && AllCanFold) {
        LiveInterval &nI = getOrCreateInterval(NewVReg);
        nI.weight /= 10.0F;
      }
      AllCanFold = true;
    }
    NewVReg = ThisVReg;

    bool HasDef = false;
    bool HasUse = false;
    bool CanFold = rewriteInstructionForSpills(li, I->valno, TrySplit,
                         index, end, MI, ReMatOrigDefMI, ReMatDefMI,
                         Slot, LdSlot, isLoad, isLoadSS, DefIsReMat,
                         CanDelete, vrm, rc, ReMatIds, loopInfo, NewVReg,
                         ImpUse, HasDef, HasUse, MBBVRegsMap, NewLIs);
    if (!HasDef && !HasUse)
      continue;

    AllCanFold &= CanFold;

    // Update weight of spill interval.
LiveInterval &nI = getOrCreateInterval(NewVReg); if (!TrySplit) { // The spill weight is now infinity as it cannot be spilled again. nI.markNotSpillable(); continue; } // Keep track of the last def and first use in each MBB. if (HasDef) { if (MI != ReMatOrigDefMI || !CanDelete) { bool HasKill = false; if (!HasUse) HasKill = anyKillInMBBAfterIdx(li, I->valno, MBB, index.getDefIndex()); else { // If this is a two-address code, then this index starts a new VNInfo. const VNInfo *VNI = li.findDefinedVNInfoForRegInt(index.getDefIndex()); if (VNI) HasKill = anyKillInMBBAfterIdx(li, VNI, MBB, index.getDefIndex()); } DenseMap >::iterator SII = SpillIdxes.find(MBBId); if (!HasKill) { if (SII == SpillIdxes.end()) { std::vector S; S.push_back(SRInfo(index, NewVReg, true)); SpillIdxes.insert(std::make_pair(MBBId, S)); } else if (SII->second.back().vreg != NewVReg) { SII->second.push_back(SRInfo(index, NewVReg, true)); } else if (index > SII->second.back().index) { // If there is an earlier def and this is a two-address // instruction, then it's not possible to fold the store (which // would also fold the load). SRInfo &Info = SII->second.back(); Info.index = index; Info.canFold = !HasUse; } SpillMBBs.set(MBBId); } else if (SII != SpillIdxes.end() && SII->second.back().vreg == NewVReg && index > SII->second.back().index) { // There is an earlier def that's not killed (must be two-address). // The spill is no longer needed. SII->second.pop_back(); if (SII->second.empty()) { SpillIdxes.erase(MBBId); SpillMBBs.reset(MBBId); } } } } if (HasUse) { DenseMap >::iterator SII = SpillIdxes.find(MBBId); if (SII != SpillIdxes.end() && SII->second.back().vreg == NewVReg && index > SII->second.back().index) // Use(s) following the last def, it's not safe to fold the spill. SII->second.back().canFold = false; DenseMap >::iterator RII = RestoreIdxes.find(MBBId); if (RII != RestoreIdxes.end() && RII->second.back().vreg == NewVReg) // If we are splitting live intervals, only fold if it's the first // use and there isn't another use later in the MBB. RII->second.back().canFold = false; else if (IsNew) { // Only need a reload if there isn't an earlier def / use. if (RII == RestoreIdxes.end()) { std::vector Infos; Infos.push_back(SRInfo(index, NewVReg, true)); RestoreIdxes.insert(std::make_pair(MBBId, Infos)); } else { RII->second.push_back(SRInfo(index, NewVReg, true)); } RestoreMBBs.set(MBBId); } } // Update spill weight. unsigned loopDepth = loopInfo->getLoopDepth(MBB); nI.weight += getSpillWeight(HasDef, HasUse, loopDepth); } if (NewVReg && TrySplit && AllCanFold) { // If all of its def / use can be folded, give it a low spill weight. 
    LiveInterval &nI = getOrCreateInterval(NewVReg);
    nI.weight /= 10.0F;
  }
}

bool LiveIntervals::alsoFoldARestore(int Id, SlotIndex index,
                        unsigned vr, BitVector &RestoreMBBs,
                        DenseMap<unsigned, std::vector<SRInfo> > &RestoreIdxes) {
  if (!RestoreMBBs[Id])
    return false;
  std::vector<SRInfo> &Restores = RestoreIdxes[Id];
  for (unsigned i = 0, e = Restores.size(); i != e; ++i)
    if (Restores[i].index == index &&
        Restores[i].vreg == vr &&
        Restores[i].canFold)
      return true;
  return false;
}

void LiveIntervals::eraseRestoreInfo(int Id, SlotIndex index,
                        unsigned vr, BitVector &RestoreMBBs,
                        DenseMap<unsigned, std::vector<SRInfo> > &RestoreIdxes) {
  if (!RestoreMBBs[Id])
    return;
  std::vector<SRInfo> &Restores = RestoreIdxes[Id];
  for (unsigned i = 0, e = Restores.size(); i != e; ++i)
    if (Restores[i].index == index && Restores[i].vreg)
      Restores[i].index = SlotIndex();
}

/// handleSpilledImpDefs - Remove IMPLICIT_DEF instructions which are being
/// spilled and create empty intervals for their uses.
void
LiveIntervals::handleSpilledImpDefs(const LiveInterval &li, VirtRegMap &vrm,
                                    const TargetRegisterClass* rc,
                                    std::vector<LiveInterval*> &NewLIs) {
  for (MachineRegisterInfo::reg_iterator ri = mri_->reg_begin(li.reg),
         re = mri_->reg_end(); ri != re; ) {
    MachineOperand &O = ri.getOperand();
    MachineInstr *MI = &*ri;
    ++ri;
    if (MI->isDebugValue()) {
      // Remove debug info for now.
      O.setReg(0U);
      DEBUG(dbgs() << "Removing debug info due to spill:" << "\t" << *MI);
      continue;
    }
    if (O.isDef()) {
      assert(MI->isImplicitDef() &&
             "Register def was not rewritten?");
      RemoveMachineInstrFromMaps(MI);
      vrm.RemoveMachineInstrFromMaps(MI);
      MI->eraseFromParent();
    } else {
      // This must be a use of an implicit_def so it's not part of the live
      // interval. Create a new empty live interval for it.
      // FIXME: Can we simply erase some of the instructions? e.g. Stores?
      unsigned NewVReg = mri_->createVirtualRegister(rc);
      vrm.grow();
      vrm.setIsImplicitlyDefined(NewVReg);
      NewLIs.push_back(&getOrCreateInterval(NewVReg));
      for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
        MachineOperand &MO = MI->getOperand(i);
        if (MO.isReg() && MO.getReg() == li.reg) {
          MO.setReg(NewVReg);
          MO.setIsUndef();
        }
      }
    }
  }
}

float
LiveIntervals::getSpillWeight(bool isDef, bool isUse, unsigned loopDepth) {
  // Limit the loop depth ridiculousness.
  if (loopDepth > 200)
    loopDepth = 200;

  // The loop depth is used to roughly estimate the number of times the
  // instruction is executed. Something like 10^d is simple, but will quickly
  // overflow a float. This expression behaves like 10^d for small d, but is
  // more tempered for large d. At d=200 we get 6.7e33 which leaves a bit of
  // headroom before overflow.
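  // A rough sanity check of the expression below (values approximate):
  //   loopDepth 0: (1 + 100/10)^0 = 1
  //   loopDepth 1: (1 + 100/11)^1 ~ 10.1
  //   loopDepth 2: (1 + 100/12)^2 ~ 87
  //   loopDepth 3: (1 + 100/13)^3 ~ 657
  // i.e. close to 10^d for shallow loop nests, then increasingly tempered.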
float lc = powf(1 + (100.0f / (loopDepth+10)), (float)loopDepth); return (isDef + isUse) * lc; } void LiveIntervals::normalizeSpillWeights(std::vector &NewLIs) { for (unsigned i = 0, e = NewLIs.size(); i != e; ++i) normalizeSpillWeight(*NewLIs[i]); } std::vector LiveIntervals:: addIntervalsForSpillsFast(const LiveInterval &li, const MachineLoopInfo *loopInfo, VirtRegMap &vrm) { unsigned slot = vrm.assignVirt2StackSlot(li.reg); std::vector added; assert(li.isSpillable() && "attempt to spill already spilled interval!"); DEBUG({ dbgs() << "\t\t\t\tadding intervals for spills for interval: "; li.dump(); dbgs() << '\n'; }); const TargetRegisterClass* rc = mri_->getRegClass(li.reg); MachineRegisterInfo::reg_iterator RI = mri_->reg_begin(li.reg); while (RI != mri_->reg_end()) { MachineInstr* MI = &*RI; SmallVector Indices; bool HasUse = false; bool HasDef = false; for (unsigned i = 0; i != MI->getNumOperands(); ++i) { MachineOperand& mop = MI->getOperand(i); if (!mop.isReg() || mop.getReg() != li.reg) continue; HasUse |= MI->getOperand(i).isUse(); HasDef |= MI->getOperand(i).isDef(); Indices.push_back(i); } if (!tryFoldMemoryOperand(MI, vrm, NULL, getInstructionIndex(MI), Indices, true, slot, li.reg)) { unsigned NewVReg = mri_->createVirtualRegister(rc); vrm.grow(); vrm.assignVirt2StackSlot(NewVReg, slot); // create a new register for this spill LiveInterval &nI = getOrCreateInterval(NewVReg); nI.markNotSpillable(); // Rewrite register operands to use the new vreg. for (SmallVectorImpl::iterator I = Indices.begin(), E = Indices.end(); I != E; ++I) { MI->getOperand(*I).setReg(NewVReg); if (MI->getOperand(*I).isUse()) MI->getOperand(*I).setIsKill(true); } // Fill in the new live interval. SlotIndex index = getInstructionIndex(MI); if (HasUse) { LiveRange LR(index.getLoadIndex(), index.getUseIndex(), nI.getNextValue(SlotIndex(), 0, false, getVNInfoAllocator())); DEBUG(dbgs() << " +" << LR); nI.addRange(LR); vrm.addRestorePoint(NewVReg, MI); } if (HasDef) { LiveRange LR(index.getDefIndex(), index.getStoreIndex(), nI.getNextValue(SlotIndex(), 0, false, getVNInfoAllocator())); DEBUG(dbgs() << " +" << LR); nI.addRange(LR); vrm.addSpillPoint(NewVReg, true, MI); } added.push_back(&nI); DEBUG({ dbgs() << "\t\t\t\tadded new interval: "; nI.dump(); dbgs() << '\n'; }); } RI = mri_->reg_begin(li.reg); } return added; } std::vector LiveIntervals:: addIntervalsForSpills(const LiveInterval &li, SmallVectorImpl &SpillIs, const MachineLoopInfo *loopInfo, VirtRegMap &vrm) { if (EnableFastSpilling) return addIntervalsForSpillsFast(li, loopInfo, vrm); assert(li.isSpillable() && "attempt to spill already spilled interval!"); DEBUG({ dbgs() << "\t\t\t\tadding intervals for spills for interval: "; li.print(dbgs(), tri_); dbgs() << '\n'; }); // Each bit specify whether a spill is required in the MBB. BitVector SpillMBBs(mf_->getNumBlockIDs()); DenseMap > SpillIdxes; BitVector RestoreMBBs(mf_->getNumBlockIDs()); DenseMap > RestoreIdxes; DenseMap MBBVRegsMap; std::vector NewLIs; const TargetRegisterClass* rc = mri_->getRegClass(li.reg); unsigned NumValNums = li.getNumValNums(); SmallVector ReMatDefs; ReMatDefs.resize(NumValNums, NULL); SmallVector ReMatOrigDefs; ReMatOrigDefs.resize(NumValNums, NULL); SmallVector ReMatIds; ReMatIds.resize(NumValNums, VirtRegMap::MAX_STACK_SLOT); BitVector ReMatDelete(NumValNums); unsigned Slot = VirtRegMap::MAX_STACK_SLOT; // Spilling a split live interval. It cannot be split any further. Also, // it's also guaranteed to be a single val# / range interval. 
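  // For illustration: vrm.getPreSplitReg(li.reg) returns the original vreg
  // this interval was split from, or 0 if it was never split, so a non-zero
  // value routes us into the simpler "already split" path below.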
if (vrm.getPreSplitReg(li.reg)) { vrm.setIsSplitFromReg(li.reg, 0); // Unset the split kill marker on the last use. SlotIndex KillIdx = vrm.getKillPoint(li.reg); if (KillIdx != SlotIndex()) { MachineInstr *KillMI = getInstructionFromIndex(KillIdx); assert(KillMI && "Last use disappeared?"); int KillOp = KillMI->findRegisterUseOperandIdx(li.reg, true); assert(KillOp != -1 && "Last use disappeared?"); KillMI->getOperand(KillOp).setIsKill(false); } vrm.removeKillPoint(li.reg); bool DefIsReMat = vrm.isReMaterialized(li.reg); Slot = vrm.getStackSlot(li.reg); assert(Slot != VirtRegMap::MAX_STACK_SLOT); MachineInstr *ReMatDefMI = DefIsReMat ? vrm.getReMaterializedMI(li.reg) : NULL; int LdSlot = 0; bool isLoadSS = DefIsReMat && tii_->isLoadFromStackSlot(ReMatDefMI, LdSlot); bool isLoad = isLoadSS || (DefIsReMat && (ReMatDefMI->getDesc().canFoldAsLoad())); bool IsFirstRange = true; for (LiveInterval::Ranges::const_iterator I = li.ranges.begin(), E = li.ranges.end(); I != E; ++I) { // If this is a split live interval with multiple ranges, it means there // are two-address instructions that re-defined the value. Only the // first def can be rematerialized! if (IsFirstRange) { // Note ReMatOrigDefMI has already been deleted. rewriteInstructionsForSpills(li, false, I, NULL, ReMatDefMI, Slot, LdSlot, isLoad, isLoadSS, DefIsReMat, false, vrm, rc, ReMatIds, loopInfo, SpillMBBs, SpillIdxes, RestoreMBBs, RestoreIdxes, MBBVRegsMap, NewLIs); } else { rewriteInstructionsForSpills(li, false, I, NULL, 0, Slot, 0, false, false, false, false, vrm, rc, ReMatIds, loopInfo, SpillMBBs, SpillIdxes, RestoreMBBs, RestoreIdxes, MBBVRegsMap, NewLIs); } IsFirstRange = false; } handleSpilledImpDefs(li, vrm, rc, NewLIs); normalizeSpillWeights(NewLIs); return NewLIs; } bool TrySplit = !intervalIsInOneMBB(li); if (TrySplit) ++numSplits; bool NeedStackSlot = false; for (LiveInterval::const_vni_iterator i = li.vni_begin(), e = li.vni_end(); i != e; ++i) { const VNInfo *VNI = *i; unsigned VN = VNI->id; if (VNI->isUnused()) continue; // Dead val#. // Is the def for the val# rematerializable? MachineInstr *ReMatDefMI = VNI->isDefAccurate() ? getInstructionFromIndex(VNI->def) : 0; bool dummy; if (ReMatDefMI && isReMaterializable(li, VNI, ReMatDefMI, SpillIs, dummy)) { // Remember how to remat the def of this val#. ReMatOrigDefs[VN] = ReMatDefMI; // Original def may be modified so we have to make a copy here. MachineInstr *Clone = mf_->CloneMachineInstr(ReMatDefMI); CloneMIs.push_back(Clone); ReMatDefs[VN] = Clone; bool CanDelete = true; if (VNI->hasPHIKill()) { // A kill is a phi node, not all of its uses can be rematerialized. // It must not be deleted. CanDelete = false; // Need a stack slot if there is any live range where uses cannot be // rematerialized. NeedStackSlot = true; } if (CanDelete) ReMatDelete.set(VN); } else { // Need a stack slot if there is any live range where uses cannot be // rematerialized. NeedStackSlot = true; } } // One stack slot per live interval. if (NeedStackSlot && vrm.getPreSplitReg(li.reg) == 0) { if (vrm.getStackSlot(li.reg) == VirtRegMap::NO_STACK_SLOT) Slot = vrm.assignVirt2StackSlot(li.reg); // This case only occurs when the prealloc splitter has already assigned // a stack slot to this vreg. else Slot = vrm.getStackSlot(li.reg); } // Create new intervals and rewrite defs and uses. 
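  // For illustration (made-up numbers): all val#s of li share the one slot
  // chosen above, so a spilled %reg1024 with val#0 defined at index 4 and
  // val#1 defined at index 20 stores both definitions to the same <fi#N>;
  // the loop below then rewrites each live range's defs / uses separately.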
for (LiveInterval::Ranges::const_iterator I = li.ranges.begin(), E = li.ranges.end(); I != E; ++I) { MachineInstr *ReMatDefMI = ReMatDefs[I->valno->id]; MachineInstr *ReMatOrigDefMI = ReMatOrigDefs[I->valno->id]; bool DefIsReMat = ReMatDefMI != NULL; bool CanDelete = ReMatDelete[I->valno->id]; int LdSlot = 0; bool isLoadSS = DefIsReMat && tii_->isLoadFromStackSlot(ReMatDefMI, LdSlot); bool isLoad = isLoadSS || (DefIsReMat && ReMatDefMI->getDesc().canFoldAsLoad()); rewriteInstructionsForSpills(li, TrySplit, I, ReMatOrigDefMI, ReMatDefMI, Slot, LdSlot, isLoad, isLoadSS, DefIsReMat, CanDelete, vrm, rc, ReMatIds, loopInfo, SpillMBBs, SpillIdxes, RestoreMBBs, RestoreIdxes, MBBVRegsMap, NewLIs); } // Insert spills / restores if we are splitting. if (!TrySplit) { handleSpilledImpDefs(li, vrm, rc, NewLIs); normalizeSpillWeights(NewLIs); return NewLIs; } SmallPtrSet AddedKill; SmallVector Ops; if (NeedStackSlot) { int Id = SpillMBBs.find_first(); while (Id != -1) { std::vector &spills = SpillIdxes[Id]; for (unsigned i = 0, e = spills.size(); i != e; ++i) { SlotIndex index = spills[i].index; unsigned VReg = spills[i].vreg; LiveInterval &nI = getOrCreateInterval(VReg); bool isReMat = vrm.isReMaterialized(VReg); MachineInstr *MI = getInstructionFromIndex(index); bool CanFold = false; bool FoundUse = false; Ops.clear(); if (spills[i].canFold) { CanFold = true; for (unsigned j = 0, ee = MI->getNumOperands(); j != ee; ++j) { MachineOperand &MO = MI->getOperand(j); if (!MO.isReg() || MO.getReg() != VReg) continue; Ops.push_back(j); if (MO.isDef()) continue; if (isReMat || (!FoundUse && !alsoFoldARestore(Id, index, VReg, RestoreMBBs, RestoreIdxes))) { // MI has two-address uses of the same register. If the use // isn't the first and only use in the BB, then we can't fold // it. FIXME: Move this to rewriteInstructionsForSpills. CanFold = false; break; } FoundUse = true; } } // Fold the store into the def if possible. bool Folded = false; if (CanFold && !Ops.empty()) { if (tryFoldMemoryOperand(MI, vrm, NULL, index, Ops, true, Slot,VReg)){ Folded = true; if (FoundUse) { // Also folded uses, do not issue a load. eraseRestoreInfo(Id, index, VReg, RestoreMBBs, RestoreIdxes); nI.removeRange(index.getLoadIndex(), index.getDefIndex()); } nI.removeRange(index.getDefIndex(), index.getStoreIndex()); } } // Otherwise tell the spiller to issue a spill. if (!Folded) { LiveRange *LR = &nI.ranges[nI.ranges.size()-1]; bool isKill = LR->end == index.getStoreIndex(); if (!MI->registerDefIsDead(nI.reg)) // No need to spill a dead def. vrm.addSpillPoint(VReg, isKill, MI); if (isKill) AddedKill.insert(&nI); } } Id = SpillMBBs.find_next(Id); } } int Id = RestoreMBBs.find_first(); while (Id != -1) { std::vector &restores = RestoreIdxes[Id]; for (unsigned i = 0, e = restores.size(); i != e; ++i) { SlotIndex index = restores[i].index; if (index == SlotIndex()) continue; unsigned VReg = restores[i].vreg; LiveInterval &nI = getOrCreateInterval(VReg); bool isReMat = vrm.isReMaterialized(VReg); MachineInstr *MI = getInstructionFromIndex(index); bool CanFold = false; Ops.clear(); if (restores[i].canFold) { CanFold = true; for (unsigned j = 0, ee = MI->getNumOperands(); j != ee; ++j) { MachineOperand &MO = MI->getOperand(j); if (!MO.isReg() || MO.getReg() != VReg) continue; if (MO.isDef()) { // If this restore were to be folded, it would have been folded // already. CanFold = false; break; } Ops.push_back(j); } } // Fold the load into the use if possible. 
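      // For illustration only (x86-style opcodes, made-up vregs): when the
      // value is not rematerializable, a reload-then-use sequence such as
      //   %reg1024 = MOV32rm <fi#1>
      //   CMP32ri %reg1024, 0
      // can fold into
      //   CMP32mi <fi#1>, 0
      // letting us drop the restore range from the new interval below.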
bool Folded = false; if (CanFold && !Ops.empty()) { if (!isReMat) Folded = tryFoldMemoryOperand(MI, vrm, NULL,index,Ops,true,Slot,VReg); else { MachineInstr *ReMatDefMI = vrm.getReMaterializedMI(VReg); int LdSlot = 0; bool isLoadSS = tii_->isLoadFromStackSlot(ReMatDefMI, LdSlot); // If the rematerializable def is a load, also try to fold it. if (isLoadSS || ReMatDefMI->getDesc().canFoldAsLoad()) Folded = tryFoldMemoryOperand(MI, vrm, ReMatDefMI, index, Ops, isLoadSS, LdSlot, VReg); if (!Folded) { unsigned ImpUse = getReMatImplicitUse(li, ReMatDefMI); if (ImpUse) { // Re-matting an instruction with virtual register use. Add the // register as an implicit use on the use MI and mark the register // interval as unspillable. LiveInterval &ImpLi = getInterval(ImpUse); ImpLi.markNotSpillable(); MI->addOperand(MachineOperand::CreateReg(ImpUse, false, true)); } } } } // If folding is not possible / failed, then tell the spiller to issue a // load / rematerialization for us. if (Folded) nI.removeRange(index.getLoadIndex(), index.getDefIndex()); else vrm.addRestorePoint(VReg, MI); } Id = RestoreMBBs.find_next(Id); } // Finalize intervals: add kills, finalize spill weights, and filter out // dead intervals. std::vector RetNewLIs; for (unsigned i = 0, e = NewLIs.size(); i != e; ++i) { LiveInterval *LI = NewLIs[i]; if (!LI->empty()) { LI->weight /= SlotIndex::NUM * getApproximateInstructionCount(*LI); if (!AddedKill.count(LI)) { LiveRange *LR = &LI->ranges[LI->ranges.size()-1]; SlotIndex LastUseIdx = LR->end.getBaseIndex(); MachineInstr *LastUse = getInstructionFromIndex(LastUseIdx); int UseIdx = LastUse->findRegisterUseOperandIdx(LI->reg, false); assert(UseIdx != -1); if (!LastUse->isRegTiedToDefOperand(UseIdx)) { LastUse->getOperand(UseIdx).setIsKill(); vrm.addKillPoint(LI->reg, LastUseIdx); } } RetNewLIs.push_back(LI); } } handleSpilledImpDefs(li, vrm, rc, RetNewLIs); normalizeSpillWeights(RetNewLIs); return RetNewLIs; } /// hasAllocatableSuperReg - Return true if the specified physical register has /// any super register that's allocatable. bool LiveIntervals::hasAllocatableSuperReg(unsigned Reg) const { for (const unsigned* AS = tri_->getSuperRegisters(Reg); *AS; ++AS) if (allocatableRegs_[*AS] && hasInterval(*AS)) return true; return false; } /// getRepresentativeReg - Find the largest super register of the specified /// physical register. unsigned LiveIntervals::getRepresentativeReg(unsigned Reg) const { // Find the largest super-register that is allocatable. unsigned BestReg = Reg; for (const unsigned* AS = tri_->getSuperRegisters(Reg); *AS; ++AS) { unsigned SuperReg = *AS; if (!hasAllocatableSuperReg(SuperReg) && hasInterval(SuperReg)) { BestReg = SuperReg; break; } } return BestReg; } /// getNumConflictsWithPhysReg - Return the number of uses and defs of the /// specified interval that conflicts with the specified physical register. unsigned LiveIntervals::getNumConflictsWithPhysReg(const LiveInterval &li, unsigned PhysReg) const { unsigned NumConflicts = 0; const LiveInterval &pli = getInterval(getRepresentativeReg(PhysReg)); for (MachineRegisterInfo::reg_iterator I = mri_->reg_begin(li.reg), E = mri_->reg_end(); I != E; ++I) { MachineOperand &O = I.getOperand(); MachineInstr *MI = O.getParent(); if (MI->isDebugValue()) continue; SlotIndex Index = getInstructionIndex(MI); if (pli.liveAt(Index)) ++NumConflicts; } return NumConflicts; } /// spillPhysRegAroundRegDefsUses - Spill the specified physical register /// around all defs and uses of the specified interval. 
Return true if it /// was able to cut its interval. bool LiveIntervals::spillPhysRegAroundRegDefsUses(const LiveInterval &li, unsigned PhysReg, VirtRegMap &vrm) { unsigned SpillReg = getRepresentativeReg(PhysReg); for (const unsigned *AS = tri_->getAliasSet(PhysReg); *AS; ++AS) // If there are registers which alias PhysReg, but which are not a // sub-register of the chosen representative super register. Assert // since we can't handle it yet. assert(*AS == SpillReg || !allocatableRegs_[*AS] || !hasInterval(*AS) || tri_->isSuperRegister(*AS, SpillReg)); bool Cut = false; SmallVector PRegs; if (hasInterval(SpillReg)) PRegs.push_back(SpillReg); else { SmallSet Added; for (const unsigned* AS = tri_->getSubRegisters(SpillReg); *AS; ++AS) if (Added.insert(*AS) && hasInterval(*AS)) { PRegs.push_back(*AS); for (const unsigned* ASS = tri_->getSubRegisters(*AS); *ASS; ++ASS) Added.insert(*ASS); } } SmallPtrSet SeenMIs; for (MachineRegisterInfo::reg_iterator I = mri_->reg_begin(li.reg), E = mri_->reg_end(); I != E; ++I) { MachineOperand &O = I.getOperand(); MachineInstr *MI = O.getParent(); if (MI->isDebugValue() || SeenMIs.count(MI)) continue; SeenMIs.insert(MI); SlotIndex Index = getInstructionIndex(MI); for (unsigned i = 0, e = PRegs.size(); i != e; ++i) { unsigned PReg = PRegs[i]; LiveInterval &pli = getInterval(PReg); if (!pli.liveAt(Index)) continue; vrm.addEmergencySpill(PReg, MI); SlotIndex StartIdx = Index.getLoadIndex(); SlotIndex EndIdx = Index.getNextIndex().getBaseIndex(); if (pli.isInOneLiveRange(StartIdx, EndIdx)) { pli.removeRange(StartIdx, EndIdx); Cut = true; } else { std::string msg; raw_string_ostream Msg(msg); Msg << "Ran out of registers during register allocation!"; if (MI->isInlineAsm()) { Msg << "\nPlease check your inline asm statement for invalid " << "constraints:\n"; MI->print(Msg, tm_); } report_fatal_error(Msg.str()); } for (const unsigned* AS = tri_->getSubRegisters(PReg); *AS; ++AS) { if (!hasInterval(*AS)) continue; LiveInterval &spli = getInterval(*AS); if (spli.liveAt(Index)) spli.removeRange(Index.getLoadIndex(), Index.getNextIndex().getBaseIndex()); } } } return Cut; } LiveRange LiveIntervals::addLiveRangeToEndOfBlock(unsigned reg, MachineInstr* startInst) { LiveInterval& Interval = getOrCreateInterval(reg); VNInfo* VN = Interval.getNextValue( SlotIndex(getInstructionIndex(startInst).getDefIndex()), startInst, true, getVNInfoAllocator()); VN->setHasPHIKill(true); VN->kills.push_back(indexes_->getTerminatorGap(startInst->getParent())); LiveRange LR( SlotIndex(getInstructionIndex(startInst).getDefIndex()), getMBBEndIdx(startInst->getParent()), VN); Interval.addRange(LR); return LR; } diff --git a/lib/CodeGen/PHIElimination.cpp b/lib/CodeGen/PHIElimination.cpp index f0057ce8ef8f..165171998dbe 100644 --- a/lib/CodeGen/PHIElimination.cpp +++ b/lib/CodeGen/PHIElimination.cpp @@ -1,445 +1,506 @@ //===-- PhiElimination.cpp - Eliminate PHI nodes by inserting copies ------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This pass eliminates machine instruction PHI nodes by inserting copy // instructions. This destroys SSA information, but is the desired input for // some register allocators. 
//
//===----------------------------------------------------------------------===//

#define DEBUG_TYPE "phielim"
#include "PHIElimination.h"
#include "llvm/CodeGen/LiveVariables.h"
#include "llvm/CodeGen/Passes.h"
#include "llvm/CodeGen/MachineDominators.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/Target/TargetInstrInfo.h"
#include "llvm/Function.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Compiler.h"
#include "llvm/Support/Debug.h"
+#include "llvm/Support/ErrorHandling.h"
#include <algorithm>
#include <map>
using namespace llvm;

STATISTIC(NumAtomic, "Number of atomic phis lowered");
STATISTIC(NumSplits, "Number of critical edges split on demand");
STATISTIC(NumReused, "Number of reused lowered phis");

char PHIElimination::ID = 0;
static RegisterPass<PHIElimination>
X("phi-node-elimination", "Eliminate PHI nodes for register allocation");

const PassInfo *const llvm::PHIEliminationID = &X;

void llvm::PHIElimination::getAnalysisUsage(AnalysisUsage &AU) const {
  AU.addPreserved<LiveVariables>();
  AU.addPreserved<MachineDominatorTree>();
  // rdar://7401784 This would be nice:
  // AU.addPreservedID(MachineLoopInfoID);
  MachineFunctionPass::getAnalysisUsage(AU);
}

-bool llvm::PHIElimination::runOnMachineFunction(MachineFunction &Fn) {
-  MRI = &Fn.getRegInfo();
+bool llvm::PHIElimination::runOnMachineFunction(MachineFunction &MF) {
+  MRI = &MF.getRegInfo();

  bool Changed = false;

  // Split critical edges to help the coalescer
  if (LiveVariables *LV = getAnalysisIfAvailable<LiveVariables>())
-    for (MachineFunction::iterator I = Fn.begin(), E = Fn.end(); I != E; ++I)
-      Changed |= SplitPHIEdges(Fn, *I, *LV);
+    for (MachineFunction::iterator I = MF.begin(), E = MF.end(); I != E; ++I)
+      Changed |= SplitPHIEdges(MF, *I, *LV);

  // Populate VRegPHIUseCount
-  analyzePHINodes(Fn);
+  analyzePHINodes(MF);

  // Eliminate PHI instructions by inserting copies into predecessor blocks.
-  for (MachineFunction::iterator I = Fn.begin(), E = Fn.end(); I != E; ++I)
-    Changed |= EliminatePHINodes(Fn, *I);
+  for (MachineFunction::iterator I = MF.begin(), E = MF.end(); I != E; ++I)
+    Changed |= EliminatePHINodes(MF, *I);

  // Remove dead IMPLICIT_DEF instructions.
  for (SmallPtrSet<MachineInstr*, 4>::iterator I = ImpDefs.begin(),
         E = ImpDefs.end(); I != E; ++I) {
    MachineInstr *DefMI = *I;
    unsigned DefReg = DefMI->getOperand(0).getReg();
    if (MRI->use_nodbg_empty(DefReg))
      DefMI->eraseFromParent();
  }

  // Clean up the lowered PHI instructions.
  for (LoweredPHIMap::iterator I = LoweredPHIs.begin(), E = LoweredPHIs.end();
       I != E; ++I)
-    Fn.DeleteMachineInstr(I->first);
+    MF.DeleteMachineInstr(I->first);

  LoweredPHIs.clear();
  ImpDefs.clear();
  VRegPHIUseCount.clear();
+
+  // Eliminate REG_SEQUENCE instructions. Their whole purpose was to preserve
+  // SSA form.
+  Changed |= EliminateRegSequences(MF);
+
  return Changed;
}

/// EliminatePHINodes - Eliminate phi nodes by inserting copy instructions in
/// predecessor basic blocks.
///
bool llvm::PHIElimination::EliminatePHINodes(MachineFunction &MF,
                                             MachineBasicBlock &MBB) {
  if (MBB.empty() || !MBB.front().isPHI())
    return false;   // Quick exit for basic blocks without PHIs.

  // Get an iterator to the first instruction after the last PHI node (this may
  // also be the end of the basic block).
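  // For illustration only (made-up blocks and vregs): lowering
  //   BB#2: %reg1030 = PHI %reg1024, <BB#0>, %reg1025, <BB#1>
  // places a copy into a fresh vreg at the bottom of each predecessor and
  // one copy after the PHIs here:
  //   BB#0: ... %reg1031 = copy %reg1024
  //   BB#1: ... %reg1031 = copy %reg1025
  //   BB#2: %reg1030 = copy %reg1031
  // (the copies are actually emitted via TII->copyRegToReg).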
  MachineBasicBlock::iterator AfterPHIsIt = SkipPHIsAndLabels(MBB, MBB.begin());
  while (MBB.front().isPHI())
    LowerAtomicPHINode(MBB, AfterPHIsIt);
  return true;
}

/// isSourceDefinedByImplicitDef - Return true if all sources of the phi node
/// are implicit_def's.
static bool isSourceDefinedByImplicitDef(const MachineInstr *MPhi,
                                         const MachineRegisterInfo *MRI) {
  for (unsigned i = 1; i != MPhi->getNumOperands(); i += 2) {
    unsigned SrcReg = MPhi->getOperand(i).getReg();
    const MachineInstr *DefMI = MRI->getVRegDef(SrcReg);
    if (!DefMI || !DefMI->isImplicitDef())
      return false;
  }
  return true;
}

// FindCopyInsertPoint - Find a safe place in MBB to insert a copy from SrcReg
// when following the CFG edge to SuccMBB. This needs to be after any def of
// SrcReg, but before any subsequent point where control flow might jump out of
// the basic block.
MachineBasicBlock::iterator
llvm::PHIElimination::FindCopyInsertPoint(MachineBasicBlock &MBB,
                                          MachineBasicBlock &SuccMBB,
                                          unsigned SrcReg) {
  // Handle the trivial case trivially.
  if (MBB.empty())
    return MBB.begin();

  // Usually, we just want to insert the copy before the first terminator
  // instruction. However, for the edge going to a landing pad, we must insert
  // the copy before the call/invoke instruction.
  if (!SuccMBB.isLandingPad())
    return MBB.getFirstTerminator();

  // Discover any defs/uses in this basic block.
  SmallPtrSet<MachineInstr*, 8> DefUsesInMBB;
  for (MachineRegisterInfo::reg_iterator RI = MRI->reg_begin(SrcReg),
         RE = MRI->reg_end(); RI != RE; ++RI) {
    MachineInstr *DefUseMI = &*RI;
    if (DefUseMI->getParent() == &MBB)
      DefUsesInMBB.insert(DefUseMI);
  }

  MachineBasicBlock::iterator InsertPoint;
  if (DefUsesInMBB.empty()) {
    // No defs. Insert the copy at the start of the basic block.
    InsertPoint = MBB.begin();
  } else if (DefUsesInMBB.size() == 1) {
    // Insert the copy immediately after the def/use.
    InsertPoint = *DefUsesInMBB.begin();
    ++InsertPoint;
  } else {
    // Insert the copy immediately after the last def/use.
    InsertPoint = MBB.end();
    while (!DefUsesInMBB.count(&*--InsertPoint)) {}
    ++InsertPoint;
  }

  // Make sure the copy goes after any phi nodes however.
  return SkipPHIsAndLabels(MBB, InsertPoint);
}

/// LowerAtomicPHINode - Lower the PHI node at the top of the specified block,
/// under the assumption that it needs to be lowered in a way that supports
/// atomic execution of PHIs. This lowering method is always correct.
///
void llvm::PHIElimination::LowerAtomicPHINode(
                                      MachineBasicBlock &MBB,
                                      MachineBasicBlock::iterator AfterPHIsIt) {
  ++NumAtomic;
  // Unlink the PHI node from the basic block, but don't delete the PHI yet.
  MachineInstr *MPhi = MBB.remove(MBB.begin());

  unsigned NumSrcs = (MPhi->getNumOperands() - 1) / 2;
  unsigned DestReg = MPhi->getOperand(0).getReg();
  bool isDead = MPhi->getOperand(0).isDead();

  // Create a new register for the incoming PHI arguments.
  MachineFunction &MF = *MBB.getParent();
  const TargetRegisterClass *RC = MF.getRegInfo().getRegClass(DestReg);
  unsigned IncomingReg = 0;
  bool reusedIncoming = false;  // Is IncomingReg reused from an earlier PHI?

  // Insert a register to register copy at the top of the current block (but
  // after any remaining phi nodes) which copies the new incoming register
  // into the phi node destination.
  const TargetInstrInfo *TII = MF.getTarget().getInstrInfo();
  if (isSourceDefinedByImplicitDef(MPhi, MRI))
    // If all sources of a PHI node are implicit_def, just emit an
    // implicit_def instead of a copy.
BuildMI(MBB, AfterPHIsIt, MPhi->getDebugLoc(), TII->get(TargetOpcode::IMPLICIT_DEF), DestReg); else { // Can we reuse an earlier PHI node? This only happens for critical edges, // typically those created by tail duplication. unsigned &entry = LoweredPHIs[MPhi]; if (entry) { // An identical PHI node was already lowered. Reuse the incoming register. IncomingReg = entry; reusedIncoming = true; ++NumReused; DEBUG(dbgs() << "Reusing %reg" << IncomingReg << " for " << *MPhi); } else { entry = IncomingReg = MF.getRegInfo().createVirtualRegister(RC); } TII->copyRegToReg(MBB, AfterPHIsIt, DestReg, IncomingReg, RC, RC); } // Update live variable information if there is any. LiveVariables *LV = getAnalysisIfAvailable(); if (LV) { MachineInstr *PHICopy = prior(AfterPHIsIt); if (IncomingReg) { LiveVariables::VarInfo &VI = LV->getVarInfo(IncomingReg); // Increment use count of the newly created virtual register. VI.NumUses++; LV->setPHIJoin(IncomingReg); // When we are reusing the incoming register, it may already have been // killed in this block. The old kill will also have been inserted at // AfterPHIsIt, so it appears before the current PHICopy. if (reusedIncoming) if (MachineInstr *OldKill = VI.findKill(&MBB)) { DEBUG(dbgs() << "Remove old kill from " << *OldKill); LV->removeVirtualRegisterKilled(IncomingReg, OldKill); DEBUG(MBB.dump()); } // Add information to LiveVariables to know that the incoming value is // killed. Note that because the value is defined in several places (once // each for each incoming block), the "def" block and instruction fields // for the VarInfo is not filled in. LV->addVirtualRegisterKilled(IncomingReg, PHICopy); } // Since we are going to be deleting the PHI node, if it is the last use of // any registers, or if the value itself is dead, we need to move this // information over to the new copy we just inserted. LV->removeVirtualRegistersKilled(MPhi); // If the result is dead, update LV. if (isDead) { LV->addVirtualRegisterDead(DestReg, PHICopy); LV->removeVirtualRegisterDead(DestReg, MPhi); } } // Adjust the VRegPHIUseCount map to account for the removal of this PHI node. for (unsigned i = 1; i != MPhi->getNumOperands(); i += 2) --VRegPHIUseCount[BBVRegPair(MPhi->getOperand(i+1).getMBB()->getNumber(), MPhi->getOperand(i).getReg())]; // Now loop over all of the incoming arguments, changing them to copy into the // IncomingReg register in the corresponding predecessor basic block. SmallPtrSet MBBsInsertedInto; for (int i = NumSrcs - 1; i >= 0; --i) { unsigned SrcReg = MPhi->getOperand(i*2+1).getReg(); assert(TargetRegisterInfo::isVirtualRegister(SrcReg) && "Machine PHI Operands must all be virtual registers!"); // Get the MachineBasicBlock equivalent of the BasicBlock that is the source // path the PHI. MachineBasicBlock &opBlock = *MPhi->getOperand(i*2+2).getMBB(); // If source is defined by an implicit def, there is no need to insert a // copy. MachineInstr *DefMI = MRI->getVRegDef(SrcReg); if (DefMI->isImplicitDef()) { ImpDefs.insert(DefMI); continue; } // Check to make sure we haven't already emitted the copy for this block. // This can happen because PHI nodes may have multiple entries for the same // basic block. if (!MBBsInsertedInto.insert(&opBlock)) continue; // If the copy has already been emitted, we're done. // Find a safe location to insert the copy, this may be the first terminator // in the block (or end()). MachineBasicBlock::iterator InsertPos = FindCopyInsertPoint(opBlock, MBB, SrcReg); // Insert the copy. 
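    // For illustration: when an identical PHI was lowered earlier
    // (reusedIncoming), the predecessor copies into IncomingReg already
    // exist, so they must not be emitted a second time; hence the
    // !reusedIncoming guard below.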
    if (!reusedIncoming && IncomingReg)
      TII->copyRegToReg(opBlock, InsertPos, IncomingReg, SrcReg, RC, RC);

    // Now update live variable information if we have it. Otherwise we're done.
    if (!LV) continue;

    // We want to be able to insert a kill of the register if this PHI (aka, the
    // copy we just inserted) is the last use of the source value. Live
    // variable analysis conservatively handles this by saying that the value is
    // live until the end of the block the PHI entry lives in. If the value
    // really is dead at the PHI copy, there will be no successor blocks which
    // have the value live-in.

    // Also check to see if this register is in use by another PHI node which
    // has not yet been eliminated. If so, it will be killed at an appropriate
    // point later.

    // Is it used by any PHI instructions in this block?
    bool ValueIsUsed = VRegPHIUseCount[BBVRegPair(opBlock.getNumber(), SrcReg)];

    // Okay, if we now know that the value is not live out of the block, we can
    // add a kill marker in this block saying that it kills the incoming value!
    if (!ValueIsUsed && !LV->isLiveOut(SrcReg, opBlock)) {
      // In our final twist, we have to decide which instruction kills the
      // register. In most cases this is the copy, however, the first
      // terminator instruction at the end of the block may also use the value.
      // In this case, we should mark *it* as the killing instruction, not the
      // copy.
      MachineBasicBlock::iterator KillInst;
      MachineBasicBlock::iterator Term = opBlock.getFirstTerminator();
      if (Term != opBlock.end() && Term->readsRegister(SrcReg)) {
        KillInst = Term;

        // Check that no other terminators use the value.
#ifndef NDEBUG
        for (MachineBasicBlock::iterator TI = llvm::next(Term);
             TI != opBlock.end(); ++TI) {
          assert(!TI->readsRegister(SrcReg) &&
                 "Terminator instructions cannot use virtual registers unless "
                 "they are the first terminator in a block!");
        }
#endif
      } else if (reusedIncoming || !IncomingReg) {
        // We may have to rewind a bit if we didn't insert a copy this time.
        KillInst = Term;
        while (KillInst != opBlock.begin())
          if ((--KillInst)->readsRegister(SrcReg))
            break;
      } else {
        // We just inserted this copy.
        KillInst = prior(InsertPos);
      }
      assert(KillInst->readsRegister(SrcReg) && "Cannot find kill instruction");

      // Finally, mark it killed.
      LV->addVirtualRegisterKilled(SrcReg, KillInst);

      // This vreg no longer lives all of the way through opBlock.
      unsigned opBlockNum = opBlock.getNumber();
      LV->getVarInfo(SrcReg).AliveBlocks.reset(opBlockNum);
    }
  }

  // Really delete the PHI instruction now, if it is not in the LoweredPHIs map.
  if (reusedIncoming || !IncomingReg)
    MF.DeleteMachineInstr(MPhi);
}

/// analyzePHINodes - Gather information about the PHI nodes in here. In
/// particular, we want to map the number of uses of a virtual register which is
/// used in a PHI node. We map that to the BB the vreg is coming from. This is
/// used later to determine when the vreg is killed in the BB.
/// -void llvm::PHIElimination::analyzePHINodes(const MachineFunction& Fn) { - for (MachineFunction::const_iterator I = Fn.begin(), E = Fn.end(); +void llvm::PHIElimination::analyzePHINodes(const MachineFunction& MF) { + for (MachineFunction::const_iterator I = MF.begin(), E = MF.end(); I != E; ++I) for (MachineBasicBlock::const_iterator BBI = I->begin(), BBE = I->end(); BBI != BBE && BBI->isPHI(); ++BBI) for (unsigned i = 1, e = BBI->getNumOperands(); i != e; i += 2) ++VRegPHIUseCount[BBVRegPair(BBI->getOperand(i+1).getMBB()->getNumber(), BBI->getOperand(i).getReg())]; } bool llvm::PHIElimination::SplitPHIEdges(MachineFunction &MF, MachineBasicBlock &MBB, LiveVariables &LV) { if (MBB.empty() || !MBB.front().isPHI() || MBB.isLandingPad()) return false; // Quick exit for basic blocks without PHIs. for (MachineBasicBlock::const_iterator BBI = MBB.begin(), BBE = MBB.end(); BBI != BBE && BBI->isPHI(); ++BBI) { for (unsigned i = 1, e = BBI->getNumOperands(); i != e; i += 2) { unsigned Reg = BBI->getOperand(i).getReg(); MachineBasicBlock *PreMBB = BBI->getOperand(i+1).getMBB(); // We break edges when registers are live out from the predecessor block // (not considering PHI nodes). If the register is live in to this block // anyway, we would gain nothing from splitting. if (!LV.isLiveIn(Reg, MBB) && LV.isLiveOut(Reg, *PreMBB)) SplitCriticalEdge(PreMBB, &MBB); } } return true; } MachineBasicBlock *PHIElimination::SplitCriticalEdge(MachineBasicBlock *A, MachineBasicBlock *B) { assert(A && B && "Missing MBB end point"); MachineFunction *MF = A->getParent(); // We may need to update A's terminator, but we can't do that if AnalyzeBranch // fails. If A uses a jump table, we won't touch it. const TargetInstrInfo *TII = MF->getTarget().getInstrInfo(); MachineBasicBlock *TBB = 0, *FBB = 0; SmallVector Cond; if (TII->AnalyzeBranch(*A, TBB, FBB, Cond)) return NULL; ++NumSplits; MachineBasicBlock *NMBB = MF->CreateMachineBasicBlock(); MF->insert(llvm::next(MachineFunction::iterator(A)), NMBB); DEBUG(dbgs() << "PHIElimination splitting critical edge:" " BB#" << A->getNumber() << " -- BB#" << NMBB->getNumber() << " -- BB#" << B->getNumber() << '\n'); A->ReplaceUsesOfBlockWith(B, NMBB); A->updateTerminator(); // Insert unconditional "jump B" instruction in NMBB if necessary. NMBB->addSuccessor(B); if (!NMBB->isLayoutSuccessor(B)) { Cond.clear(); MF->getTarget().getInstrInfo()->InsertBranch(*NMBB, B, NULL, Cond); } // Fix PHI nodes in B so they refer to NMBB instead of A for (MachineBasicBlock::iterator i = B->begin(), e = B->end(); i != e && i->isPHI(); ++i) for (unsigned ni = 1, ne = i->getNumOperands(); ni != ne; ni += 2) if (i->getOperand(ni+1).getMBB() == A) i->getOperand(ni+1).setMBB(NMBB); if (LiveVariables *LV=getAnalysisIfAvailable()) LV->addNewBlock(NMBB, A, B); if (MachineDominatorTree *MDT=getAnalysisIfAvailable()) MDT->addNewBlock(NMBB, A); return NMBB; } + +static void UpdateRegSequenceSrcs(unsigned SrcReg, + unsigned DstReg, unsigned SrcIdx, + MachineRegisterInfo *MRI) { + for (MachineRegisterInfo::reg_iterator RI = MRI->reg_begin(SrcReg), + UE = MRI->reg_end(); RI != UE; ) { + MachineOperand &MO = RI.getOperand(); + ++RI; + MO.setReg(DstReg); + MO.setSubReg(SrcIdx); + } +} + +/// EliminateRegSequences - Eliminate REG_SEQUENCE instructions as second part +/// of de-ssa process. This replaces sources of REG_SEQUENCE as sub-register +/// references of the register defined by REG_SEQUENCE. e.g. +/// +/// %reg1029, %reg1030 = VLD1q16 %reg1024, ... 
+/// %reg1031 = REG_SEQUENCE %reg1029, 5, %reg1030, 6 +/// => +/// %reg1031:5, %reg1031:6 = VLD1q16 %reg1024, ... +bool PHIElimination::EliminateRegSequences(MachineFunction &MF) { + bool Changed = false; + + for (MachineFunction::iterator I = MF.begin(), E = MF.end(); I != E; ++I) + for (MachineBasicBlock::iterator BBI = I->begin(), BBE = I->end(); + BBI != BBE; ) { + MachineInstr &MI = *BBI; + ++BBI; + if (MI.getOpcode() != TargetOpcode::REG_SEQUENCE) + continue; + unsigned DstReg = MI.getOperand(0).getReg(); + if (MI.getOperand(0).getSubReg() || + TargetRegisterInfo::isPhysicalRegister(DstReg) || + !(MI.getNumOperands() & 1)) { + DEBUG(dbgs() << "Illegal REG_SEQUENCE instruction:" << MI); + llvm_unreachable(0); + } + for (unsigned i = 1, e = MI.getNumOperands(); i < e; i += 2) { + unsigned SrcReg = MI.getOperand(i).getReg(); + if (MI.getOperand(i).getSubReg() || + TargetRegisterInfo::isPhysicalRegister(SrcReg)) { + DEBUG(dbgs() << "Illegal REG_SEQUENCE instruction:" << MI); + llvm_unreachable(0); + } + unsigned SrcIdx = MI.getOperand(i+1).getImm(); + UpdateRegSequenceSrcs(SrcReg, DstReg, SrcIdx, MRI); + } + + MI.eraseFromParent(); + Changed = true; + } + + return Changed; +} diff --git a/lib/CodeGen/PHIElimination.h b/lib/CodeGen/PHIElimination.h index 7dedf0318a8a..3292aa27af64 100644 --- a/lib/CodeGen/PHIElimination.h +++ b/lib/CodeGen/PHIElimination.h @@ -1,113 +1,115 @@ //===-- lib/CodeGen/PHIElimination.h ----------------------------*- C++ -*-===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// #ifndef LLVM_CODEGEN_PHIELIMINATION_HPP #define LLVM_CODEGEN_PHIELIMINATION_HPP #include "llvm/ADT/DenseMap.h" #include "llvm/ADT/SmallSet.h" #include "llvm/ADT/SmallPtrSet.h" #include "llvm/CodeGen/MachineFunctionPass.h" #include "llvm/CodeGen/MachineRegisterInfo.h" namespace llvm { class LiveVariables; /// Lower PHI instructions to copies. class PHIElimination : public MachineFunctionPass { MachineRegisterInfo *MRI; // Machine register information public: static char ID; // Pass identification, replacement for typeid PHIElimination() : MachineFunctionPass(&ID) {} virtual bool runOnMachineFunction(MachineFunction &Fn); virtual void getAnalysisUsage(AnalysisUsage &AU) const; private: /// EliminatePHINodes - Eliminate phi nodes by inserting copy instructions /// in predecessor basic blocks. /// bool EliminatePHINodes(MachineFunction &MF, MachineBasicBlock &MBB); void LowerAtomicPHINode(MachineBasicBlock &MBB, MachineBasicBlock::iterator AfterPHIsIt); /// analyzePHINodes - Gather information about the PHI nodes in /// here. In particular, we want to map the number of uses of a virtual /// register which is used in a PHI node. We map that to the BB the /// vreg is coming from. This is used later to determine when the vreg /// is killed in the BB. /// void analyzePHINodes(const MachineFunction& Fn); /// Split critical edges where necessary for good coalescer performance. bool SplitPHIEdges(MachineFunction &MF, MachineBasicBlock &MBB, LiveVariables &LV); /// SplitCriticalEdge - Split a critical edge from A to B by /// inserting a new MBB. Update branches in A and PHI instructions /// in B. Return the new block. 
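  /// For illustration (made-up blocks): with edges A -> {B, C} and
  /// {A, D} -> B, the edge A -> B is critical; splitting it produces
  ///   A -> NMBB -> B
  /// with A's terminator retargeted to NMBB and B's PHI entries for A
  /// rewritten to name NMBB.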
MachineBasicBlock *SplitCriticalEdge(MachineBasicBlock *A, MachineBasicBlock *B); /// FindCopyInsertPoint - Find a safe place in MBB to insert a copy from /// SrcReg when following the CFG edge to SuccMBB. This needs to be after /// any def of SrcReg, but before any subsequent point where control flow /// might jump out of the basic block. MachineBasicBlock::iterator FindCopyInsertPoint(MachineBasicBlock &MBB, MachineBasicBlock &SuccMBB, unsigned SrcReg); // SkipPHIsAndLabels - Copies need to be inserted after phi nodes and // also after any exception handling labels: in landing pads execution // starts at the label, so any copies placed before it won't be executed! // We also deal with DBG_VALUEs, which are a bit tricky: // PHI // DBG_VALUE // LABEL // Here the DBG_VALUE needs to be skipped, and if it refers to a PHI it // needs to be annulled or, better, moved to follow the label, as well. // PHI // DBG_VALUE // no label // Here it is not a good idea to skip the DBG_VALUE. // FIXME: For now we skip and annul all DBG_VALUEs, maximally simple and // maximally stupid. MachineBasicBlock::iterator SkipPHIsAndLabels(MachineBasicBlock &MBB, MachineBasicBlock::iterator I) { // Rather than assuming that EH labels come before other kinds of labels, // just skip all labels. while (I != MBB.end() && (I->isPHI() || I->isLabel() || I->isDebugValue())) { if (I->isDebugValue() && I->getNumOperands()==3 && I->getOperand(0).isReg()) I->getOperand(0).setReg(0U); ++I; } return I; } + bool EliminateRegSequences(MachineFunction &MF); + typedef std::pair BBVRegPair; typedef DenseMap VRegPHIUse; VRegPHIUse VRegPHIUseCount; // Defs of PHI sources which are implicit_def. SmallPtrSet ImpDefs; // Map reusable lowered PHI node -> incoming join register. typedef DenseMap LoweredPHIMap; LoweredPHIMap LoweredPHIs; }; } #endif /* LLVM_CODEGEN_PHIELIMINATION_HPP */ diff --git a/lib/Target/ARM/ARMISelDAGToDAG.cpp b/lib/Target/ARM/ARMISelDAGToDAG.cpp index 36a1827ce6e8..616942c00343 100644 --- a/lib/Target/ARM/ARMISelDAGToDAG.cpp +++ b/lib/Target/ARM/ARMISelDAGToDAG.cpp @@ -1,1988 +1,1997 @@ //===-- ARMISelDAGToDAG.cpp - A dag to dag inst selector for ARM ----------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This file defines an instruction selector for the ARM target. 
// //===----------------------------------------------------------------------===// #include "ARM.h" #include "ARMAddressingModes.h" #include "ARMTargetMachine.h" #include "llvm/CallingConv.h" #include "llvm/Constants.h" #include "llvm/DerivedTypes.h" #include "llvm/Function.h" #include "llvm/Intrinsics.h" #include "llvm/LLVMContext.h" #include "llvm/CodeGen/MachineFrameInfo.h" #include "llvm/CodeGen/MachineFunction.h" #include "llvm/CodeGen/MachineInstrBuilder.h" #include "llvm/CodeGen/SelectionDAG.h" #include "llvm/CodeGen/SelectionDAGISel.h" #include "llvm/Target/TargetLowering.h" #include "llvm/Target/TargetOptions.h" +#include "llvm/Support/CommandLine.h" #include "llvm/Support/Compiler.h" #include "llvm/Support/Debug.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/raw_ostream.h" using namespace llvm; +static cl::opt +UseRegSeq("neon-reg-sequence", cl::Hidden, + cl::desc("Use reg_sequence to model ld / st of multiple neon regs")); + //===--------------------------------------------------------------------===// /// ARMDAGToDAGISel - ARM specific code to select ARM machine /// instructions for SelectionDAG operations. /// namespace { class ARMDAGToDAGISel : public SelectionDAGISel { ARMBaseTargetMachine &TM; /// Subtarget - Keep a pointer to the ARMSubtarget around so that we can /// make the right decision when generating code for different targets. const ARMSubtarget *Subtarget; public: explicit ARMDAGToDAGISel(ARMBaseTargetMachine &tm, CodeGenOpt::Level OptLevel) : SelectionDAGISel(tm, OptLevel), TM(tm), Subtarget(&TM.getSubtarget()) { } virtual const char *getPassName() const { return "ARM Instruction Selection"; } /// getI32Imm - Return a target constant of type i32 with the specified /// value. inline SDValue getI32Imm(unsigned Imm) { return CurDAG->getTargetConstant(Imm, MVT::i32); } SDNode *Select(SDNode *N); bool SelectShifterOperandReg(SDNode *Op, SDValue N, SDValue &A, SDValue &B, SDValue &C); bool SelectAddrMode2(SDNode *Op, SDValue N, SDValue &Base, SDValue &Offset, SDValue &Opc); bool SelectAddrMode2Offset(SDNode *Op, SDValue N, SDValue &Offset, SDValue &Opc); bool SelectAddrMode3(SDNode *Op, SDValue N, SDValue &Base, SDValue &Offset, SDValue &Opc); bool SelectAddrMode3Offset(SDNode *Op, SDValue N, SDValue &Offset, SDValue &Opc); bool SelectAddrMode4(SDNode *Op, SDValue N, SDValue &Addr, SDValue &Mode); bool SelectAddrMode5(SDNode *Op, SDValue N, SDValue &Base, SDValue &Offset); bool SelectAddrMode6(SDNode *Op, SDValue N, SDValue &Addr, SDValue &Align); bool SelectAddrModePC(SDNode *Op, SDValue N, SDValue &Offset, SDValue &Label); bool SelectThumbAddrModeRR(SDNode *Op, SDValue N, SDValue &Base, SDValue &Offset); bool SelectThumbAddrModeRI5(SDNode *Op, SDValue N, unsigned Scale, SDValue &Base, SDValue &OffImm, SDValue &Offset); bool SelectThumbAddrModeS1(SDNode *Op, SDValue N, SDValue &Base, SDValue &OffImm, SDValue &Offset); bool SelectThumbAddrModeS2(SDNode *Op, SDValue N, SDValue &Base, SDValue &OffImm, SDValue &Offset); bool SelectThumbAddrModeS4(SDNode *Op, SDValue N, SDValue &Base, SDValue &OffImm, SDValue &Offset); bool SelectThumbAddrModeSP(SDNode *Op, SDValue N, SDValue &Base, SDValue &OffImm); bool SelectT2ShifterOperandReg(SDNode *Op, SDValue N, SDValue &BaseReg, SDValue &Opc); bool SelectT2AddrModeImm12(SDNode *Op, SDValue N, SDValue &Base, SDValue &OffImm); bool SelectT2AddrModeImm8(SDNode *Op, SDValue N, SDValue &Base, SDValue &OffImm); bool SelectT2AddrModeImm8Offset(SDNode *Op, SDValue N, SDValue &OffImm); bool SelectT2AddrModeImm8s4(SDNode 
//===--------------------------------------------------------------------===//
/// ARMDAGToDAGISel - ARM specific code to select ARM machine
/// instructions for SelectionDAG operations.
///
namespace {

class ARMDAGToDAGISel : public SelectionDAGISel {
  ARMBaseTargetMachine &TM;

  /// Subtarget - Keep a pointer to the ARMSubtarget around so that we can
  /// make the right decision when generating code for different targets.
  const ARMSubtarget *Subtarget;

public:
  explicit ARMDAGToDAGISel(ARMBaseTargetMachine &tm,
                           CodeGenOpt::Level OptLevel)
    : SelectionDAGISel(tm, OptLevel), TM(tm),
      Subtarget(&TM.getSubtarget<ARMSubtarget>()) {
  }

  virtual const char *getPassName() const {
    return "ARM Instruction Selection";
  }

  /// getI32Imm - Return a target constant of type i32 with the specified
  /// value.
  inline SDValue getI32Imm(unsigned Imm) {
    return CurDAG->getTargetConstant(Imm, MVT::i32);
  }

  SDNode *Select(SDNode *N);

  bool SelectShifterOperandReg(SDNode *Op, SDValue N, SDValue &A,
                               SDValue &B, SDValue &C);
  bool SelectAddrMode2(SDNode *Op, SDValue N, SDValue &Base,
                       SDValue &Offset, SDValue &Opc);
  bool SelectAddrMode2Offset(SDNode *Op, SDValue N,
                             SDValue &Offset, SDValue &Opc);
  bool SelectAddrMode3(SDNode *Op, SDValue N, SDValue &Base,
                       SDValue &Offset, SDValue &Opc);
  bool SelectAddrMode3Offset(SDNode *Op, SDValue N,
                             SDValue &Offset, SDValue &Opc);
  bool SelectAddrMode4(SDNode *Op, SDValue N, SDValue &Addr, SDValue &Mode);
  bool SelectAddrMode5(SDNode *Op, SDValue N, SDValue &Base,
                       SDValue &Offset);
  bool SelectAddrMode6(SDNode *Op, SDValue N, SDValue &Addr, SDValue &Align);
  bool SelectAddrModePC(SDNode *Op, SDValue N, SDValue &Offset,
                        SDValue &Label);
  bool SelectThumbAddrModeRR(SDNode *Op, SDValue N, SDValue &Base,
                             SDValue &Offset);
  bool SelectThumbAddrModeRI5(SDNode *Op, SDValue N, unsigned Scale,
                              SDValue &Base, SDValue &OffImm,
                              SDValue &Offset);
  bool SelectThumbAddrModeS1(SDNode *Op, SDValue N, SDValue &Base,
                             SDValue &OffImm, SDValue &Offset);
  bool SelectThumbAddrModeS2(SDNode *Op, SDValue N, SDValue &Base,
                             SDValue &OffImm, SDValue &Offset);
  bool SelectThumbAddrModeS4(SDNode *Op, SDValue N, SDValue &Base,
                             SDValue &OffImm, SDValue &Offset);
  bool SelectThumbAddrModeSP(SDNode *Op, SDValue N, SDValue &Base,
                             SDValue &OffImm);
  bool SelectT2ShifterOperandReg(SDNode *Op, SDValue N,
                                 SDValue &BaseReg, SDValue &Opc);
  bool SelectT2AddrModeImm12(SDNode *Op, SDValue N, SDValue &Base,
                             SDValue &OffImm);
  bool SelectT2AddrModeImm8(SDNode *Op, SDValue N, SDValue &Base,
                            SDValue &OffImm);
  bool SelectT2AddrModeImm8Offset(SDNode *Op, SDValue N,
                                  SDValue &OffImm);
  bool SelectT2AddrModeImm8s4(SDNode *Op, SDValue N, SDValue &Base,
                              SDValue &OffImm);
  bool SelectT2AddrModeSoReg(SDNode *Op, SDValue N, SDValue &Base,
                             SDValue &OffReg, SDValue &ShImm);

  // Include the pieces autogenerated from the target description.
#include "ARMGenDAGISel.inc"

private:
  /// SelectARMIndexedLoad - Indexed (pre/post inc/dec) load matching code for
  /// ARM.
  SDNode *SelectARMIndexedLoad(SDNode *N);
  SDNode *SelectT2IndexedLoad(SDNode *N);

  /// SelectVLD - Select NEON load intrinsics.  NumVecs should be
  /// 1, 2, 3 or 4.  The opcode arrays specify the instructions used for
  /// loads of D registers and even subregs and odd subregs of Q registers.
  /// For NumVecs <= 2, QOpcodes1 is not used.
  SDNode *SelectVLD(SDNode *N, unsigned NumVecs, unsigned *DOpcodes,
                    unsigned *QOpcodes0, unsigned *QOpcodes1);

  /// SelectVST - Select NEON store intrinsics.  NumVecs should
  /// be 1, 2, 3 or 4.  The opcode arrays specify the instructions used for
  /// stores of D registers and even subregs and odd subregs of Q registers.
  /// For NumVecs <= 2, QOpcodes1 is not used.
  SDNode *SelectVST(SDNode *N, unsigned NumVecs, unsigned *DOpcodes,
                    unsigned *QOpcodes0, unsigned *QOpcodes1);

  /// SelectVLDSTLane - Select NEON load/store lane intrinsics.  NumVecs should
  /// be 2, 3 or 4.  The opcode arrays specify the instructions used for
  /// load/store of D registers and even subregs and odd subregs of Q registers.
  SDNode *SelectVLDSTLane(SDNode *N, bool IsLoad, unsigned NumVecs,
                          unsigned *DOpcodes,
                          unsigned *QOpcodes0, unsigned *QOpcodes1);

  /// SelectV6T2BitfieldExtractOp - Select SBFX/UBFX instructions for ARM.
  SDNode *SelectV6T2BitfieldExtractOp(SDNode *N, bool isSigned);

  /// SelectCMOVOp - Select CMOV instructions for ARM.
  SDNode *SelectCMOVOp(SDNode *N);
  SDNode *SelectT2CMOVShiftOp(SDNode *N, SDValue FalseVal, SDValue TrueVal,
                              ARMCC::CondCodes CCVal, SDValue CCR,
                              SDValue InFlag);
  SDNode *SelectARMCMOVShiftOp(SDNode *N, SDValue FalseVal, SDValue TrueVal,
                               ARMCC::CondCodes CCVal, SDValue CCR,
                               SDValue InFlag);
  SDNode *SelectT2CMOVSoImmOp(SDNode *N, SDValue FalseVal, SDValue TrueVal,
                              ARMCC::CondCodes CCVal, SDValue CCR,
                              SDValue InFlag);
  SDNode *SelectARMCMOVSoImmOp(SDNode *N, SDValue FalseVal, SDValue TrueVal,
                               ARMCC::CondCodes CCVal, SDValue CCR,
                               SDValue InFlag);

  /// SelectInlineAsmMemoryOperand - Implement addressing mode selection for
  /// inline asm expressions.
  virtual bool SelectInlineAsmMemoryOperand(const SDValue &Op,
                                            char ConstraintCode,
                                            std::vector<SDValue> &OutOps);

  /// PairDRegs - Insert a pair of double registers into an implicit def to
  /// form a quad register.
  SDNode *PairDRegs(EVT VT, SDValue V0, SDValue V1);
};
}

/// isInt32Immediate - This method tests to see if the node is a 32-bit
/// constant operand. If so, Imm will receive the 32-bit value.
static bool isInt32Immediate(SDNode *N, unsigned &Imm) {
  if (N->getOpcode() == ISD::Constant && N->getValueType(0) == MVT::i32) {
    Imm = cast<ConstantSDNode>(N)->getZExtValue();
    return true;
  }
  return false;
}

// isInt32Immediate - This method tests to see if the operand is a constant.
// If so, Imm will receive the 32-bit value.
static bool isInt32Immediate(SDValue N, unsigned &Imm) {
  return isInt32Immediate(N.getNode(), Imm);
}

// isOpcWithIntImmediate - This method tests to see if the node has the
// specified opcode and an immediate integer right operand. If so, Imm will
// receive the 32-bit value.
static bool isOpcWithIntImmediate(SDNode *N, unsigned Opc, unsigned& Imm) {
  return N->getOpcode() == Opc &&
         isInt32Immediate(N->getOperand(1).getNode(), Imm);
}
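// Usage sketch for the helpers above (hypothetical node N of the form
// (and x, 255); illustrative only):
//   unsigned Imm;
//   if (isOpcWithIntImmediate(N, ISD::AND, Imm) && Imm == 255) {
//     // N is an AND with a constant right operand; Imm now holds 255.
//   }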
bool ARMDAGToDAGISel::SelectShifterOperandReg(SDNode *Op,
                                              SDValue N,
                                              SDValue &BaseReg,
                                              SDValue &ShReg,
                                              SDValue &Opc) {
  ARM_AM::ShiftOpc ShOpcVal = ARM_AM::getShiftOpcForNode(N);

  // Don't match base register only case. That is matched to a separate
  // lower complexity pattern with explicit register operand.
  if (ShOpcVal == ARM_AM::no_shift) return false;

  BaseReg = N.getOperand(0);
  unsigned ShImmVal = 0;
  if (ConstantSDNode *RHS = dyn_cast<ConstantSDNode>(N.getOperand(1))) {
    ShReg = CurDAG->getRegister(0, MVT::i32);
    ShImmVal = RHS->getZExtValue() & 31;
  } else {
    ShReg = N.getOperand(1);
  }
  Opc = CurDAG->getTargetConstant(ARM_AM::getSORegOpc(ShOpcVal, ShImmVal),
                                  MVT::i32);
  return true;
}

bool ARMDAGToDAGISel::SelectAddrMode2(SDNode *Op, SDValue N,
                                      SDValue &Base, SDValue &Offset,
                                      SDValue &Opc) {
  if (N.getOpcode() == ISD::MUL) {
    if (ConstantSDNode *RHS = dyn_cast<ConstantSDNode>(N.getOperand(1))) {
      // X * [3,5,9] -> X + X * [2,4,8] etc.
      int RHSC = (int)RHS->getZExtValue();
      if (RHSC & 1) {
        RHSC = RHSC & ~1;
        ARM_AM::AddrOpc AddSub = ARM_AM::add;
        if (RHSC < 0) {
          AddSub = ARM_AM::sub;
          RHSC = -RHSC;
        }
        if (isPowerOf2_32(RHSC)) {
          unsigned ShAmt = Log2_32(RHSC);
          Base = Offset = N.getOperand(0);
          Opc = CurDAG->getTargetConstant(ARM_AM::getAM2Opc(AddSub, ShAmt,
                                                            ARM_AM::lsl),
                                          MVT::i32);
          return true;
        }
      }
    }
  }

  if (N.getOpcode() != ISD::ADD && N.getOpcode() != ISD::SUB) {
    Base = N;
    if (N.getOpcode() == ISD::FrameIndex) {
      int FI = cast<FrameIndexSDNode>(N)->getIndex();
      Base = CurDAG->getTargetFrameIndex(FI, TLI.getPointerTy());
    } else if (N.getOpcode() == ARMISD::Wrapper &&
               !(Subtarget->useMovt() &&
                 N.getOperand(0).getOpcode() == ISD::TargetGlobalAddress)) {
      Base = N.getOperand(0);
    }
    Offset = CurDAG->getRegister(0, MVT::i32);
    Opc = CurDAG->getTargetConstant(ARM_AM::getAM2Opc(ARM_AM::add, 0,
                                                      ARM_AM::no_shift),
                                    MVT::i32);
    return true;
  }

  // Match simple R +/- imm12 operands.
  if (N.getOpcode() == ISD::ADD)
    if (ConstantSDNode *RHS = dyn_cast<ConstantSDNode>(N.getOperand(1))) {
      int RHSC = (int)RHS->getZExtValue();
      if ((RHSC >= 0 && RHSC < 0x1000) ||
          (RHSC < 0 && RHSC > -0x1000)) { // 12 bits.
        Base = N.getOperand(0);
        if (Base.getOpcode() == ISD::FrameIndex) {
          int FI = cast<FrameIndexSDNode>(Base)->getIndex();
          Base = CurDAG->getTargetFrameIndex(FI, TLI.getPointerTy());
        }
        Offset = CurDAG->getRegister(0, MVT::i32);

        ARM_AM::AddrOpc AddSub = ARM_AM::add;
        if (RHSC < 0) {
          AddSub = ARM_AM::sub;
          RHSC = -RHSC;
        }
        Opc = CurDAG->getTargetConstant(ARM_AM::getAM2Opc(AddSub, RHSC,
                                                          ARM_AM::no_shift),
                                        MVT::i32);
        return true;
      }
    }

  // Otherwise this is R +/- [possibly shifted] R.
  ARM_AM::AddrOpc AddSub =
    N.getOpcode() == ISD::ADD ? ARM_AM::add : ARM_AM::sub;
  ARM_AM::ShiftOpc ShOpcVal = ARM_AM::getShiftOpcForNode(N.getOperand(1));
  unsigned ShAmt = 0;

  Base   = N.getOperand(0);
  Offset = N.getOperand(1);

  if (ShOpcVal != ARM_AM::no_shift) {
    // Check to see if the RHS of the shift is a constant; if not, we can't
    // fold it.
    if (ConstantSDNode *Sh =
           dyn_cast<ConstantSDNode>(N.getOperand(1).getOperand(1))) {
      ShAmt = Sh->getZExtValue();
      Offset = N.getOperand(1).getOperand(0);
    } else {
      ShOpcVal = ARM_AM::no_shift;
    }
  }

  // Try matching (R shl C) + (R).
  if (N.getOpcode() == ISD::ADD && ShOpcVal == ARM_AM::no_shift) {
    ShOpcVal = ARM_AM::getShiftOpcForNode(N.getOperand(0));
    if (ShOpcVal != ARM_AM::no_shift) {
      // Check to see if the RHS of the shift is a constant; if not, we can't
      // fold it.
      if (ConstantSDNode *Sh =
             dyn_cast<ConstantSDNode>(N.getOperand(0).getOperand(1))) {
        ShAmt = Sh->getZExtValue();
        Offset = N.getOperand(0).getOperand(0);
        Base = N.getOperand(1);
      } else {
        ShOpcVal = ARM_AM::no_shift;
      }
    }
  }

  Opc = CurDAG->getTargetConstant(ARM_AM::getAM2Opc(AddSub, ShAmt, ShOpcVal),
                                  MVT::i32);
  return true;
}

bool ARMDAGToDAGISel::SelectAddrMode2Offset(SDNode *Op, SDValue N,
                                            SDValue &Offset, SDValue &Opc) {
  unsigned Opcode = Op->getOpcode();
  ISD::MemIndexedMode AM = (Opcode == ISD::LOAD)
    ? cast<LoadSDNode>(Op)->getAddressingMode()
    : cast<StoreSDNode>(Op)->getAddressingMode();
  ARM_AM::AddrOpc AddSub = (AM == ISD::PRE_INC || AM == ISD::POST_INC)
    ? ARM_AM::add : ARM_AM::sub;
  if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(N)) {
    int Val = (int)C->getZExtValue();
    if (Val >= 0 && Val < 0x1000) { // 12 bits.
      Offset = CurDAG->getRegister(0, MVT::i32);
      Opc = CurDAG->getTargetConstant(ARM_AM::getAM2Opc(AddSub, Val,
                                                        ARM_AM::no_shift),
                                      MVT::i32);
      return true;
    }
  }

  Offset = N;
  ARM_AM::ShiftOpc ShOpcVal = ARM_AM::getShiftOpcForNode(N);
  unsigned ShAmt = 0;
  if (ShOpcVal != ARM_AM::no_shift) {
    // Check to see if the RHS of the shift is a constant; if not, we can't
    // fold it.
    if (ConstantSDNode *Sh = dyn_cast<ConstantSDNode>(N.getOperand(1))) {
      ShAmt = Sh->getZExtValue();
      Offset = N.getOperand(0);
    } else {
      ShOpcVal = ARM_AM::no_shift;
    }
  }

  Opc = CurDAG->getTargetConstant(ARM_AM::getAM2Opc(AddSub, ShAmt, ShOpcVal),
                                  MVT::i32);
  return true;
}

bool ARMDAGToDAGISel::SelectAddrMode3(SDNode *Op, SDValue N,
                                      SDValue &Base, SDValue &Offset,
                                      SDValue &Opc) {
  if (N.getOpcode() == ISD::SUB) {
    // X - C is canonicalized to X + -C, no need to handle it here.
    Base = N.getOperand(0);
    Offset = N.getOperand(1);
    Opc = CurDAG->getTargetConstant(ARM_AM::getAM3Opc(ARM_AM::sub, 0), MVT::i32);
    return true;
  }

  if (N.getOpcode() != ISD::ADD) {
    Base = N;
    if (N.getOpcode() == ISD::FrameIndex) {
      int FI = cast<FrameIndexSDNode>(N)->getIndex();
      Base = CurDAG->getTargetFrameIndex(FI, TLI.getPointerTy());
    }
    Offset = CurDAG->getRegister(0, MVT::i32);
    Opc = CurDAG->getTargetConstant(ARM_AM::getAM3Opc(ARM_AM::add, 0), MVT::i32);
    return true;
  }

  // If the RHS is +/- imm8, fold into addr mode.
  if (ConstantSDNode *RHS = dyn_cast<ConstantSDNode>(N.getOperand(1))) {
    int RHSC = (int)RHS->getZExtValue();
    if ((RHSC >= 0 && RHSC < 256) ||
        (RHSC < 0 && RHSC > -256)) { // Note: -256 itself isn't allowed.
      Base = N.getOperand(0);
      if (Base.getOpcode() == ISD::FrameIndex) {
        int FI = cast<FrameIndexSDNode>(Base)->getIndex();
        Base = CurDAG->getTargetFrameIndex(FI, TLI.getPointerTy());
      }
      Offset = CurDAG->getRegister(0, MVT::i32);

      ARM_AM::AddrOpc AddSub = ARM_AM::add;
      if (RHSC < 0) {
        AddSub = ARM_AM::sub;
        RHSC = -RHSC;
      }
      Opc = CurDAG->getTargetConstant(ARM_AM::getAM3Opc(AddSub, RHSC), MVT::i32);
      return true;
    }
  }

  Base = N.getOperand(0);
  Offset = N.getOperand(1);
  Opc = CurDAG->getTargetConstant(ARM_AM::getAM3Opc(ARM_AM::add, 0), MVT::i32);
  return true;
}

bool ARMDAGToDAGISel::SelectAddrMode3Offset(SDNode *Op, SDValue N,
                                            SDValue &Offset, SDValue &Opc) {
  unsigned Opcode = Op->getOpcode();
  ISD::MemIndexedMode AM = (Opcode == ISD::LOAD)
    ? cast<LoadSDNode>(Op)->getAddressingMode()
    : cast<StoreSDNode>(Op)->getAddressingMode();
  ARM_AM::AddrOpc AddSub = (AM == ISD::PRE_INC || AM == ISD::POST_INC)
    ? ARM_AM::add : ARM_AM::sub;
  if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(N)) {
    int Val = (int)C->getZExtValue();
    if (Val >= 0 && Val < 256) {
      Offset = CurDAG->getRegister(0, MVT::i32);
      Opc = CurDAG->getTargetConstant(ARM_AM::getAM3Opc(AddSub, Val), MVT::i32);
      return true;
    }
  }

  Offset = N;
  Opc = CurDAG->getTargetConstant(ARM_AM::getAM3Opc(AddSub, 0), MVT::i32);
  return true;
}

bool ARMDAGToDAGISel::SelectAddrMode4(SDNode *Op, SDValue N,
                                      SDValue &Addr, SDValue &Mode) {
  Addr = N;
  Mode = CurDAG->getTargetConstant(0, MVT::i32);
  return true;
}

bool ARMDAGToDAGISel::SelectAddrMode5(SDNode *Op, SDValue N,
                                      SDValue &Base, SDValue &Offset) {
  if (N.getOpcode() != ISD::ADD) {
    Base = N;
    if (N.getOpcode() == ISD::FrameIndex) {
      int FI = cast<FrameIndexSDNode>(N)->getIndex();
      Base = CurDAG->getTargetFrameIndex(FI, TLI.getPointerTy());
    } else if (N.getOpcode() == ARMISD::Wrapper &&
               !(Subtarget->useMovt() &&
                 N.getOperand(0).getOpcode() == ISD::TargetGlobalAddress)) {
      Base = N.getOperand(0);
    }
    Offset = CurDAG->getTargetConstant(ARM_AM::getAM5Opc(ARM_AM::add, 0),
                                       MVT::i32);
    return true;
  }

  // If the RHS is +/- imm8, fold into addr mode.
  if (ConstantSDNode *RHS = dyn_cast<ConstantSDNode>(N.getOperand(1))) {
    int RHSC = (int)RHS->getZExtValue();
    if ((RHSC & 3) == 0) {  // The constant is implicitly multiplied by 4.
      RHSC >>= 2;
      if ((RHSC >= 0 && RHSC < 256) ||
          (RHSC < 0 && RHSC > -256)) { // Note: -256 itself isn't allowed.
        Base = N.getOperand(0);
        if (Base.getOpcode() == ISD::FrameIndex) {
          int FI = cast<FrameIndexSDNode>(Base)->getIndex();
          Base = CurDAG->getTargetFrameIndex(FI, TLI.getPointerTy());
        }

        ARM_AM::AddrOpc AddSub = ARM_AM::add;
        if (RHSC < 0) {
          AddSub = ARM_AM::sub;
          RHSC = -RHSC;
        }
        Offset = CurDAG->getTargetConstant(ARM_AM::getAM5Opc(AddSub, RHSC),
                                           MVT::i32);
        return true;
      }
    }
  }

  Base = N;
  Offset = CurDAG->getTargetConstant(ARM_AM::getAM5Opc(ARM_AM::add, 0),
                                     MVT::i32);
  return true;
}

bool ARMDAGToDAGISel::SelectAddrMode6(SDNode *Op, SDValue N,
                                      SDValue &Addr, SDValue &Align) {
  Addr = N;
  // Default to no alignment.
  Align = CurDAG->getTargetConstant(0, MVT::i32);
  return true;
}

bool ARMDAGToDAGISel::SelectAddrModePC(SDNode *Op, SDValue N,
                                       SDValue &Offset, SDValue &Label) {
  if (N.getOpcode() == ARMISD::PIC_ADD && N.hasOneUse()) {
    Offset = N.getOperand(0);
    SDValue N1 = N.getOperand(1);
    Label = CurDAG->getTargetConstant(cast<ConstantSDNode>(N1)->getZExtValue(),
                                      MVT::i32);
    return true;
  }
  return false;
}

bool ARMDAGToDAGISel::SelectThumbAddrModeRR(SDNode *Op, SDValue N,
                                            SDValue &Base, SDValue &Offset) {
  // FIXME: dl should come from the parent load or store, not the address.
  DebugLoc dl = Op->getDebugLoc();
  if (N.getOpcode() != ISD::ADD) {
    ConstantSDNode *NC = dyn_cast<ConstantSDNode>(N);
    if (!NC || NC->getZExtValue() != 0)
      return false;

    Base = Offset = N;
    return true;
  }

  Base = N.getOperand(0);
  Offset = N.getOperand(1);
  return true;
}

bool
ARMDAGToDAGISel::SelectThumbAddrModeRI5(SDNode *Op, SDValue N,
                                        unsigned Scale, SDValue &Base,
                                        SDValue &OffImm, SDValue &Offset) {
  if (Scale == 4) {
    SDValue TmpBase, TmpOffImm;
    if (SelectThumbAddrModeSP(Op, N, TmpBase, TmpOffImm))
      return false;  // We want to select tLDRspi / tSTRspi instead.
    if (N.getOpcode() == ARMISD::Wrapper &&
        N.getOperand(0).getOpcode() == ISD::TargetConstantPool)
      return false;  // We want to select tLDRpci instead.
  }

  if (N.getOpcode() != ISD::ADD) {
    if (N.getOpcode() == ARMISD::Wrapper &&
        !(Subtarget->useMovt() &&
          N.getOperand(0).getOpcode() == ISD::TargetGlobalAddress)) {
      Base = N.getOperand(0);
    } else
      Base = N;

    Offset = CurDAG->getRegister(0, MVT::i32);
    OffImm = CurDAG->getTargetConstant(0, MVT::i32);
    return true;
  }

  // Thumb does not have [sp, r] address mode.
  RegisterSDNode *LHSR = dyn_cast<RegisterSDNode>(N.getOperand(0));
  RegisterSDNode *RHSR = dyn_cast<RegisterSDNode>(N.getOperand(1));
  if ((LHSR && LHSR->getReg() == ARM::SP) ||
      (RHSR && RHSR->getReg() == ARM::SP)) {
    Base = N;
    Offset = CurDAG->getRegister(0, MVT::i32);
    OffImm = CurDAG->getTargetConstant(0, MVT::i32);
    return true;
  }

  // If the RHS is + imm5 * scale, fold into addr mode.
  if (ConstantSDNode *RHS = dyn_cast<ConstantSDNode>(N.getOperand(1))) {
    int RHSC = (int)RHS->getZExtValue();
    if ((RHSC & (Scale - 1)) == 0) {  // The constant is implicitly multiplied.
      RHSC /= Scale;
      if (RHSC >= 0 && RHSC < 32) {
        Base = N.getOperand(0);
        Offset = CurDAG->getRegister(0, MVT::i32);
        OffImm = CurDAG->getTargetConstant(RHSC, MVT::i32);
        return true;
      }
    }
  }

  Base = N.getOperand(0);
  Offset = N.getOperand(1);
  OffImm = CurDAG->getTargetConstant(0, MVT::i32);
  return true;
}

bool ARMDAGToDAGISel::SelectThumbAddrModeS1(SDNode *Op, SDValue N,
                                            SDValue &Base, SDValue &OffImm,
                                            SDValue &Offset) {
  return SelectThumbAddrModeRI5(Op, N, 1, Base, OffImm, Offset);
}

bool ARMDAGToDAGISel::SelectThumbAddrModeS2(SDNode *Op, SDValue N,
                                            SDValue &Base, SDValue &OffImm,
                                            SDValue &Offset) {
  return SelectThumbAddrModeRI5(Op, N, 2, Base, OffImm, Offset);
}

bool ARMDAGToDAGISel::SelectThumbAddrModeS4(SDNode *Op, SDValue N,
                                            SDValue &Base, SDValue &OffImm,
                                            SDValue &Offset) {
  return SelectThumbAddrModeRI5(Op, N, 4, Base, OffImm, Offset);
}

bool ARMDAGToDAGISel::SelectThumbAddrModeSP(SDNode *Op, SDValue N,
                                            SDValue &Base, SDValue &OffImm) {
  if (N.getOpcode() == ISD::FrameIndex) {
    int FI = cast<FrameIndexSDNode>(N)->getIndex();
    Base = CurDAG->getTargetFrameIndex(FI, TLI.getPointerTy());
    OffImm = CurDAG->getTargetConstant(0, MVT::i32);
    return true;
  }

  if (N.getOpcode() != ISD::ADD)
    return false;

  RegisterSDNode *LHSR = dyn_cast<RegisterSDNode>(N.getOperand(0));
  if (N.getOperand(0).getOpcode() == ISD::FrameIndex ||
      (LHSR && LHSR->getReg() == ARM::SP)) {
    // If the RHS is + imm8 * scale, fold into addr mode.
    if (ConstantSDNode *RHS = dyn_cast<ConstantSDNode>(N.getOperand(1))) {
      int RHSC = (int)RHS->getZExtValue();
      if ((RHSC & 3) == 0) {  // The constant is implicitly multiplied.
        RHSC >>= 2;
        if (RHSC >= 0 && RHSC < 256) {
          Base = N.getOperand(0);
          if (Base.getOpcode() == ISD::FrameIndex) {
            int FI = cast<FrameIndexSDNode>(Base)->getIndex();
            Base = CurDAG->getTargetFrameIndex(FI, TLI.getPointerTy());
          }
          OffImm = CurDAG->getTargetConstant(RHSC, MVT::i32);
          return true;
        }
      }
    }
  }

  return false;
}

bool ARMDAGToDAGISel::SelectT2ShifterOperandReg(SDNode *Op, SDValue N,
                                                SDValue &BaseReg,
                                                SDValue &Opc) {
  ARM_AM::ShiftOpc ShOpcVal = ARM_AM::getShiftOpcForNode(N);

  // Don't match base register only case. That is matched to a separate
  // lower complexity pattern with explicit register operand.
  if (ShOpcVal == ARM_AM::no_shift) return false;

  BaseReg = N.getOperand(0);
  unsigned ShImmVal = 0;
  if (ConstantSDNode *RHS = dyn_cast<ConstantSDNode>(N.getOperand(1))) {
    ShImmVal = RHS->getZExtValue() & 31;
    Opc = getI32Imm(ARM_AM::getSORegOpc(ShOpcVal, ShImmVal));
    return true;
  }

  return false;
}

bool ARMDAGToDAGISel::SelectT2AddrModeImm12(SDNode *Op, SDValue N,
                                            SDValue &Base, SDValue &OffImm) {
  // Match simple R + imm12 operands.

  // Base only.
  if (N.getOpcode() != ISD::ADD && N.getOpcode() != ISD::SUB) {
    if (N.getOpcode() == ISD::FrameIndex) {
      // Match frame index...
      int FI = cast<FrameIndexSDNode>(N)->getIndex();
      Base = CurDAG->getTargetFrameIndex(FI, TLI.getPointerTy());
      OffImm = CurDAG->getTargetConstant(0, MVT::i32);
      return true;
    } else if (N.getOpcode() == ARMISD::Wrapper &&
               !(Subtarget->useMovt() &&
                 N.getOperand(0).getOpcode() == ISD::TargetGlobalAddress)) {
      Base = N.getOperand(0);
      if (Base.getOpcode() == ISD::TargetConstantPool)
        return false;  // We want to select t2LDRpci instead.
    } else
      Base = N;
    OffImm = CurDAG->getTargetConstant(0, MVT::i32);
    return true;
  }

  if (ConstantSDNode *RHS = dyn_cast<ConstantSDNode>(N.getOperand(1))) {
    if (SelectT2AddrModeImm8(Op, N, Base, OffImm))
      // Let t2LDRi8 handle (R - imm8).
      return false;

    int RHSC = (int)RHS->getZExtValue();
    if (N.getOpcode() == ISD::SUB)
      RHSC = -RHSC;

    if (RHSC >= 0 && RHSC < 0x1000) { // 12 bits (unsigned)
      Base = N.getOperand(0);
      if (Base.getOpcode() == ISD::FrameIndex) {
        int FI = cast<FrameIndexSDNode>(Base)->getIndex();
        Base = CurDAG->getTargetFrameIndex(FI, TLI.getPointerTy());
      }
      OffImm = CurDAG->getTargetConstant(RHSC, MVT::i32);
      return true;
    }
  }

  // Base only.
  Base = N;
  OffImm = CurDAG->getTargetConstant(0, MVT::i32);
  return true;
}

bool ARMDAGToDAGISel::SelectT2AddrModeImm8(SDNode *Op, SDValue N,
                                           SDValue &Base, SDValue &OffImm) {
  // Match simple R - imm8 operands.
  if (N.getOpcode() == ISD::ADD || N.getOpcode() == ISD::SUB) {
    if (ConstantSDNode *RHS = dyn_cast<ConstantSDNode>(N.getOperand(1))) {
      int RHSC = (int)RHS->getSExtValue();
      if (N.getOpcode() == ISD::SUB)
        RHSC = -RHSC;

      if ((RHSC >= -255) && (RHSC < 0)) { // 8 bits (always negative)
        Base = N.getOperand(0);
        if (Base.getOpcode() == ISD::FrameIndex) {
          int FI = cast<FrameIndexSDNode>(Base)->getIndex();
          Base = CurDAG->getTargetFrameIndex(FI, TLI.getPointerTy());
        }
        OffImm = CurDAG->getTargetConstant(RHSC, MVT::i32);
        return true;
      }
    }
  }

  return false;
}

bool ARMDAGToDAGISel::SelectT2AddrModeImm8Offset(SDNode *Op, SDValue N,
                                                 SDValue &OffImm) {
  unsigned Opcode = Op->getOpcode();
  ISD::MemIndexedMode AM = (Opcode == ISD::LOAD)
    ? cast<LoadSDNode>(Op)->getAddressingMode()
    : cast<StoreSDNode>(Op)->getAddressingMode();
  if (ConstantSDNode *RHS = dyn_cast<ConstantSDNode>(N)) {
    int RHSC = (int)RHS->getZExtValue();
    if (RHSC >= 0 && RHSC < 0x100) { // 8 bits.
      OffImm = ((AM == ISD::PRE_INC) || (AM == ISD::POST_INC))
        ? CurDAG->getTargetConstant(RHSC, MVT::i32)
        : CurDAG->getTargetConstant(-RHSC, MVT::i32);
      return true;
    }
  }

  return false;
}

bool ARMDAGToDAGISel::SelectT2AddrModeImm8s4(SDNode *Op, SDValue N,
                                             SDValue &Base, SDValue &OffImm) {
  if (N.getOpcode() == ISD::ADD) {
    if (ConstantSDNode *RHS = dyn_cast<ConstantSDNode>(N.getOperand(1))) {
      int RHSC = (int)RHS->getZExtValue();
      if (((RHSC & 0x3) == 0) &&
          ((RHSC >= 0 && RHSC < 0x400) ||
           (RHSC < 0 && RHSC > -0x400))) { // 8 bits.
        Base = N.getOperand(0);
        OffImm = CurDAG->getTargetConstant(RHSC, MVT::i32);
        return true;
      }
    }
  } else if (N.getOpcode() == ISD::SUB) {
    if (ConstantSDNode *RHS = dyn_cast<ConstantSDNode>(N.getOperand(1))) {
      int RHSC = (int)RHS->getZExtValue();
      if (((RHSC & 0x3) == 0) && (RHSC >= 0 && RHSC < 0x400)) { // 8 bits.
        Base = N.getOperand(0);
        OffImm = CurDAG->getTargetConstant(-RHSC, MVT::i32);
        return true;
      }
    }
  }

  return false;
}

bool ARMDAGToDAGISel::SelectT2AddrModeSoReg(SDNode *Op, SDValue N,
                                            SDValue &Base,
                                            SDValue &OffReg, SDValue &ShImm) {
  // (R - imm8) should be handled by t2LDRi8. The rest are handled by t2LDRi12.
  if (N.getOpcode() != ISD::ADD)
    return false;

  // Leave (R + imm12) for t2LDRi12, (R - imm8) for t2LDRi8.
  if (ConstantSDNode *RHS = dyn_cast<ConstantSDNode>(N.getOperand(1))) {
    int RHSC = (int)RHS->getZExtValue();
    if (RHSC >= 0 && RHSC < 0x1000) // 12 bits (unsigned)
      return false;
    else if (RHSC < 0 && RHSC >= -255) // 8 bits
      return false;
  }

  // Look for (R + R) or (R + (R << [1,2,3])).
  unsigned ShAmt = 0;
  Base   = N.getOperand(0);
  OffReg = N.getOperand(1);

  // Swap if it is ((R << c) + R).
  ARM_AM::ShiftOpc ShOpcVal = ARM_AM::getShiftOpcForNode(OffReg);
  if (ShOpcVal != ARM_AM::lsl) {
    ShOpcVal = ARM_AM::getShiftOpcForNode(Base);
    if (ShOpcVal == ARM_AM::lsl)
      std::swap(Base, OffReg);
  }

  if (ShOpcVal == ARM_AM::lsl) {
    // Check to see if the RHS of the shift is a constant; if not, we can't
    // fold it.
    if (ConstantSDNode *Sh = dyn_cast<ConstantSDNode>(OffReg.getOperand(1))) {
      ShAmt = Sh->getZExtValue();
      if (ShAmt >= 4) {
        ShAmt = 0;
        ShOpcVal = ARM_AM::no_shift;
      } else
        OffReg = OffReg.getOperand(0);
    } else {
      ShOpcVal = ARM_AM::no_shift;
    }
  }

  ShImm = CurDAG->getTargetConstant(ShAmt, MVT::i32);

  return true;
}

//===--------------------------------------------------------------------===//

/// getAL - Returns an ARMCC::AL immediate node.
static inline SDValue getAL(SelectionDAG *CurDAG) {
  return CurDAG->getTargetConstant((uint64_t)ARMCC::AL, MVT::i32);
}
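// Note: nearly every machine node created below is predicable; the trailing
// operand pair is a condition code plus a condition register. getAL(CurDAG)
// together with a zero register encodes "always execute":
//   SDValue Pred    = getAL(CurDAG);                     // ARMCC::AL
//   SDValue PredReg = CurDAG->getRegister(0, MVT::i32);  // no CPSR input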
SDNode *ARMDAGToDAGISel::SelectARMIndexedLoad(SDNode *N) {
  LoadSDNode *LD = cast<LoadSDNode>(N);
  ISD::MemIndexedMode AM = LD->getAddressingMode();
  if (AM == ISD::UNINDEXED)
    return NULL;

  EVT LoadedVT = LD->getMemoryVT();
  SDValue Offset, AMOpc;
  bool isPre = (AM == ISD::PRE_INC) || (AM == ISD::PRE_DEC);
  unsigned Opcode = 0;
  bool Match = false;
  if (LoadedVT == MVT::i32 &&
      SelectAddrMode2Offset(N, LD->getOffset(), Offset, AMOpc)) {
    Opcode = isPre ? ARM::LDR_PRE : ARM::LDR_POST;
    Match = true;
  } else if (LoadedVT == MVT::i16 &&
             SelectAddrMode3Offset(N, LD->getOffset(), Offset, AMOpc)) {
    Match = true;
    Opcode = (LD->getExtensionType() == ISD::SEXTLOAD)
      ? (isPre ? ARM::LDRSH_PRE : ARM::LDRSH_POST)
      : (isPre ? ARM::LDRH_PRE : ARM::LDRH_POST);
  } else if (LoadedVT == MVT::i8 || LoadedVT == MVT::i1) {
    if (LD->getExtensionType() == ISD::SEXTLOAD) {
      if (SelectAddrMode3Offset(N, LD->getOffset(), Offset, AMOpc)) {
        Match = true;
        Opcode = isPre ? ARM::LDRSB_PRE : ARM::LDRSB_POST;
      }
    } else {
      if (SelectAddrMode2Offset(N, LD->getOffset(), Offset, AMOpc)) {
        Match = true;
        Opcode = isPre ? ARM::LDRB_PRE : ARM::LDRB_POST;
      }
    }
  }

  if (Match) {
    SDValue Chain = LD->getChain();
    SDValue Base = LD->getBasePtr();
    SDValue Ops[]= { Base, Offset, AMOpc, getAL(CurDAG),
                     CurDAG->getRegister(0, MVT::i32), Chain };
    return CurDAG->getMachineNode(Opcode, N->getDebugLoc(), MVT::i32, MVT::i32,
                                  MVT::Other, Ops, 6);
  }

  return NULL;
}

SDNode *ARMDAGToDAGISel::SelectT2IndexedLoad(SDNode *N) {
  LoadSDNode *LD = cast<LoadSDNode>(N);
  ISD::MemIndexedMode AM = LD->getAddressingMode();
  if (AM == ISD::UNINDEXED)
    return NULL;

  EVT LoadedVT = LD->getMemoryVT();
  bool isSExtLd = LD->getExtensionType() == ISD::SEXTLOAD;
  SDValue Offset;
  bool isPre = (AM == ISD::PRE_INC) || (AM == ISD::PRE_DEC);
  unsigned Opcode = 0;
  bool Match = false;
  if (SelectT2AddrModeImm8Offset(N, LD->getOffset(), Offset)) {
    switch (LoadedVT.getSimpleVT().SimpleTy) {
    case MVT::i32:
      Opcode = isPre ? ARM::t2LDR_PRE : ARM::t2LDR_POST;
      break;
    case MVT::i16:
      if (isSExtLd)
        Opcode = isPre ? ARM::t2LDRSH_PRE : ARM::t2LDRSH_POST;
      else
        Opcode = isPre ? ARM::t2LDRH_PRE : ARM::t2LDRH_POST;
      break;
    case MVT::i8:
    case MVT::i1:
      if (isSExtLd)
        Opcode = isPre ? ARM::t2LDRSB_PRE : ARM::t2LDRSB_POST;
      else
        Opcode = isPre ? ARM::t2LDRB_PRE : ARM::t2LDRB_POST;
      break;
    default:
      return NULL;
    }
    Match = true;
  }

  if (Match) {
    SDValue Chain = LD->getChain();
    SDValue Base = LD->getBasePtr();
    SDValue Ops[]= { Base, Offset, getAL(CurDAG),
                     CurDAG->getRegister(0, MVT::i32), Chain };
    return CurDAG->getMachineNode(Opcode, N->getDebugLoc(), MVT::i32, MVT::i32,
                                  MVT::Other, Ops, 5);
  }

  return NULL;
}

/// PairDRegs - Insert a pair of double registers into an implicit def to
/// form a quad register.
SDNode *ARMDAGToDAGISel::PairDRegs(EVT VT, SDValue V0, SDValue V1) {
  DebugLoc dl = V0.getNode()->getDebugLoc();
-  SDValue Undef =
-    SDValue(CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, dl, VT), 0);
  SDValue SubReg0 = CurDAG->getTargetConstant(ARM::DSUBREG_0, MVT::i32);
  SDValue SubReg1 = CurDAG->getTargetConstant(ARM::DSUBREG_1, MVT::i32);
+  if (UseRegSeq) {
+    const SDValue Ops[] = { V0, SubReg0, V1, SubReg1 };
+    return CurDAG->getMachineNode(TargetOpcode::REG_SEQUENCE, dl, VT, Ops, 4);
+  }
+  SDValue Undef =
+    SDValue(CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, dl, VT), 0);
  SDNode *Pair = CurDAG->getMachineNode(TargetOpcode::INSERT_SUBREG, dl,
                                        VT, Undef, V0, SubReg0);
  return CurDAG->getMachineNode(TargetOpcode::INSERT_SUBREG, dl,
                                VT, SDValue(Pair, 0), V1, SubReg1);
}
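// Sketch of the two lowerings above (same result, different modeling):
// with -neon-reg-sequence the pair becomes a single def,
//   QReg = REG_SEQUENCE V0, DSUBREG_0, V1, DSUBREG_1
// whereas the default path builds it incrementally,
//   T0   = IMPLICIT_DEF
//   T1   = INSERT_SUBREG T0, V0, DSUBREG_0
//   QReg = INSERT_SUBREG T1, V1, DSUBREG_1
// Keeping the pair as one def is intended to expose the adjacent-register
// constraint to later register allocation passes.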
/// GetNEONSubregVT - Given a type for a 128-bit NEON vector, return the type
/// for a 64-bit subregister of the vector.
static EVT GetNEONSubregVT(EVT VT) {
  switch (VT.getSimpleVT().SimpleTy) {
  default: llvm_unreachable("unhandled NEON type");
  case MVT::v16i8: return MVT::v8i8;
  case MVT::v8i16: return MVT::v4i16;
  case MVT::v4f32: return MVT::v2f32;
  case MVT::v4i32: return MVT::v2i32;
  case MVT::v2i64: return MVT::v1i64;
  }
}

SDNode *ARMDAGToDAGISel::SelectVLD(SDNode *N, unsigned NumVecs,
                                   unsigned *DOpcodes, unsigned *QOpcodes0,
                                   unsigned *QOpcodes1) {
  assert(NumVecs >= 1 && NumVecs <= 4 && "VLD NumVecs out-of-range");
  DebugLoc dl = N->getDebugLoc();

  SDValue MemAddr, Align;
  if (!SelectAddrMode6(N, N->getOperand(2), MemAddr, Align))
    return NULL;

  SDValue Chain = N->getOperand(0);
  EVT VT = N->getValueType(0);
  bool is64BitVector = VT.is64BitVector();

  unsigned OpcodeIndex;
  switch (VT.getSimpleVT().SimpleTy) {
  default: llvm_unreachable("unhandled vld type");
    // Double-register operations:
  case MVT::v8i8:  OpcodeIndex = 0; break;
  case MVT::v4i16: OpcodeIndex = 1; break;
  case MVT::v2f32:
  case MVT::v2i32: OpcodeIndex = 2; break;
  case MVT::v1i64: OpcodeIndex = 3; break;
    // Quad-register operations:
  case MVT::v16i8: OpcodeIndex = 0; break;
  case MVT::v8i16: OpcodeIndex = 1; break;
  case MVT::v4f32:
  case MVT::v4i32: OpcodeIndex = 2; break;
  case MVT::v2i64: OpcodeIndex = 3;
    assert(NumVecs == 1 && "v2i64 type only supported for VLD1");
    break;
  }

  SDValue Pred = getAL(CurDAG);
  SDValue Reg0 = CurDAG->getRegister(0, MVT::i32);
  if (is64BitVector) {
    unsigned Opc = DOpcodes[OpcodeIndex];
    const SDValue Ops[] = { MemAddr, Align, Pred, Reg0, Chain };
    std::vector<EVT> ResTys(NumVecs, VT);
    ResTys.push_back(MVT::Other);
    return CurDAG->getMachineNode(Opc, dl, ResTys, Ops, 5);
  }

  EVT RegVT = GetNEONSubregVT(VT);
  if (NumVecs <= 2) {
    // Quad registers are directly supported for VLD1 and VLD2,
    // loading pairs of D regs.
    unsigned Opc = QOpcodes0[OpcodeIndex];
    const SDValue Ops[] = { MemAddr, Align, Pred, Reg0, Chain };
    std::vector<EVT> ResTys(2 * NumVecs, RegVT);
    ResTys.push_back(MVT::Other);
    SDNode *VLd = CurDAG->getMachineNode(Opc, dl, ResTys, Ops, 5);
    Chain = SDValue(VLd, 2 * NumVecs);

    // Combine the even and odd subregs to produce the result.
    for (unsigned Vec = 0; Vec < NumVecs; ++Vec) {
      SDNode *Q = PairDRegs(VT, SDValue(VLd, 2*Vec), SDValue(VLd, 2*Vec+1));
      ReplaceUses(SDValue(N, Vec), SDValue(Q, 0));
    }
  } else {
    // Otherwise, quad registers are loaded with two separate instructions,
    // where one loads the even registers and the other loads the odd registers.
    std::vector<EVT> ResTys(NumVecs, RegVT);
    ResTys.push_back(MemAddr.getValueType());
    ResTys.push_back(MVT::Other);

    // Load the even subregs.
    unsigned Opc = QOpcodes0[OpcodeIndex];
    const SDValue OpsA[] = { MemAddr, Align, Reg0, Pred, Reg0, Chain };
    SDNode *VLdA = CurDAG->getMachineNode(Opc, dl, ResTys, OpsA, 6);
    Chain = SDValue(VLdA, NumVecs+1);

    // Load the odd subregs.
    Opc = QOpcodes1[OpcodeIndex];
    const SDValue OpsB[] = { SDValue(VLdA, NumVecs), Align, Reg0, Pred, Reg0,
                             Chain };
    SDNode *VLdB = CurDAG->getMachineNode(Opc, dl, ResTys, OpsB, 6);
    Chain = SDValue(VLdB, NumVecs+1);

    // Combine the even and odd subregs to produce the result.
    for (unsigned Vec = 0; Vec < NumVecs; ++Vec) {
      SDNode *Q = PairDRegs(VT, SDValue(VLdA, Vec), SDValue(VLdB, Vec));
      ReplaceUses(SDValue(N, Vec), SDValue(Q, 0));
    }
  }
  ReplaceUses(SDValue(N, NumVecs), Chain);
  return NULL;
}

SDNode *ARMDAGToDAGISel::SelectVST(SDNode *N, unsigned NumVecs,
                                   unsigned *DOpcodes, unsigned *QOpcodes0,
                                   unsigned *QOpcodes1) {
  assert(NumVecs >= 1 && NumVecs <= 4 && "VST NumVecs out-of-range");
  DebugLoc dl = N->getDebugLoc();

  SDValue MemAddr, Align;
  if (!SelectAddrMode6(N, N->getOperand(2), MemAddr, Align))
    return NULL;

  SDValue Chain = N->getOperand(0);
  EVT VT = N->getOperand(3).getValueType();
  bool is64BitVector = VT.is64BitVector();

  unsigned OpcodeIndex;
  switch (VT.getSimpleVT().SimpleTy) {
  default: llvm_unreachable("unhandled vst type");
    // Double-register operations:
  case MVT::v8i8:  OpcodeIndex = 0; break;
  case MVT::v4i16: OpcodeIndex = 1; break;
  case MVT::v2f32:
  case MVT::v2i32: OpcodeIndex = 2; break;
  case MVT::v1i64: OpcodeIndex = 3; break;
    // Quad-register operations:
  case MVT::v16i8: OpcodeIndex = 0; break;
  case MVT::v8i16: OpcodeIndex = 1; break;
  case MVT::v4f32:
  case MVT::v4i32: OpcodeIndex = 2; break;
  case MVT::v2i64: OpcodeIndex = 3;
    assert(NumVecs == 1 && "v2i64 type only supported for VST1");
    break;
  }

  SDValue Pred = getAL(CurDAG);
  SDValue Reg0 = CurDAG->getRegister(0, MVT::i32);

  SmallVector<SDValue, 10> Ops;
  Ops.push_back(MemAddr);
  Ops.push_back(Align);

  if (is64BitVector) {
    unsigned Opc = DOpcodes[OpcodeIndex];
    for (unsigned Vec = 0; Vec < NumVecs; ++Vec)
      Ops.push_back(N->getOperand(Vec+3));
    Ops.push_back(Pred);
    Ops.push_back(Reg0); // predicate register
    Ops.push_back(Chain);
    return CurDAG->getMachineNode(Opc, dl, MVT::Other, Ops.data(), NumVecs+5);
  }

  EVT RegVT = GetNEONSubregVT(VT);
  if (NumVecs <= 2) {
    // Quad registers are directly supported for VST1 and VST2,
    // storing pairs of D regs.
    unsigned Opc = QOpcodes0[OpcodeIndex];
    for (unsigned Vec = 0; Vec < NumVecs; ++Vec) {
      Ops.push_back(CurDAG->getTargetExtractSubreg(ARM::DSUBREG_0, dl, RegVT,
                                                   N->getOperand(Vec+3)));
      Ops.push_back(CurDAG->getTargetExtractSubreg(ARM::DSUBREG_1, dl, RegVT,
                                                   N->getOperand(Vec+3)));
    }
    Ops.push_back(Pred);
    Ops.push_back(Reg0); // predicate register
    Ops.push_back(Chain);
    return CurDAG->getMachineNode(Opc, dl, MVT::Other, Ops.data(),
                                  5 + 2 * NumVecs);
  }

  // Otherwise, quad registers are stored with two separate instructions,
  // where one stores the even registers and the other stores the odd registers.

  Ops.push_back(Reg0); // post-access address offset

  // Store the even subregs.
  for (unsigned Vec = 0; Vec < NumVecs; ++Vec)
    Ops.push_back(CurDAG->getTargetExtractSubreg(ARM::DSUBREG_0, dl, RegVT,
                                                 N->getOperand(Vec+3)));
  Ops.push_back(Pred);
  Ops.push_back(Reg0); // predicate register
  Ops.push_back(Chain);
  unsigned Opc = QOpcodes0[OpcodeIndex];
  SDNode *VStA = CurDAG->getMachineNode(Opc, dl, MemAddr.getValueType(),
                                        MVT::Other, Ops.data(), NumVecs+6);
  Chain = SDValue(VStA, 1);

  // Store the odd subregs.
  Ops[0] = SDValue(VStA, 0); // MemAddr
  for (unsigned Vec = 0; Vec < NumVecs; ++Vec)
    Ops[Vec+3] = CurDAG->getTargetExtractSubreg(ARM::DSUBREG_1, dl, RegVT,
                                                N->getOperand(Vec+3));
  Ops[NumVecs+5] = Chain;
  Opc = QOpcodes1[OpcodeIndex];
  SDNode *VStB = CurDAG->getMachineNode(Opc, dl, MemAddr.getValueType(),
                                        MVT::Other, Ops.data(), NumVecs+6);
  Chain = SDValue(VStB, 1);
  ReplaceUses(SDValue(N, 0), Chain);
  return NULL;
}

SDNode *ARMDAGToDAGISel::SelectVLDSTLane(SDNode *N, bool IsLoad,
                                         unsigned NumVecs, unsigned *DOpcodes,
                                         unsigned *QOpcodes0,
                                         unsigned *QOpcodes1) {
  assert(NumVecs >= 2 && NumVecs <= 4 && "VLDSTLane NumVecs out-of-range");
  DebugLoc dl = N->getDebugLoc();

  SDValue MemAddr, Align;
  if (!SelectAddrMode6(N, N->getOperand(2), MemAddr, Align))
    return NULL;

  SDValue Chain = N->getOperand(0);
  unsigned Lane =
    cast<ConstantSDNode>(N->getOperand(NumVecs+3))->getZExtValue();
  EVT VT = IsLoad ? N->getValueType(0) : N->getOperand(3).getValueType();
  bool is64BitVector = VT.is64BitVector();

  // Quad registers are handled by load/store of subregs. Find the subreg info.
  unsigned NumElts = 0;
  int SubregIdx = 0;
  EVT RegVT = VT;
  if (!is64BitVector) {
    RegVT = GetNEONSubregVT(VT);
    NumElts = RegVT.getVectorNumElements();
    SubregIdx = (Lane < NumElts) ? ARM::DSUBREG_0 : ARM::DSUBREG_1;
  }

  unsigned OpcodeIndex;
  switch (VT.getSimpleVT().SimpleTy) {
  default: llvm_unreachable("unhandled vld/vst lane type");
    // Double-register operations:
  case MVT::v8i8:  OpcodeIndex = 0; break;
  case MVT::v4i16: OpcodeIndex = 1; break;
  case MVT::v2f32:
  case MVT::v2i32: OpcodeIndex = 2; break;
    // Quad-register operations:
  case MVT::v8i16: OpcodeIndex = 0; break;
  case MVT::v4f32:
  case MVT::v4i32: OpcodeIndex = 1; break;
  }

  SDValue Pred = getAL(CurDAG);
  SDValue Reg0 = CurDAG->getRegister(0, MVT::i32);

  SmallVector<SDValue, 10> Ops;
  Ops.push_back(MemAddr);
  Ops.push_back(Align);

  unsigned Opc = 0;
  if (is64BitVector) {
    Opc = DOpcodes[OpcodeIndex];
    for (unsigned Vec = 0; Vec < NumVecs; ++Vec)
      Ops.push_back(N->getOperand(Vec+3));
  } else {
    // Check if this is loading the even or odd subreg of a Q register.
    if (Lane < NumElts) {
      Opc = QOpcodes0[OpcodeIndex];
    } else {
      Lane -= NumElts;
      Opc = QOpcodes1[OpcodeIndex];
    }
    // Extract the subregs of the input vector.
    for (unsigned Vec = 0; Vec < NumVecs; ++Vec)
      Ops.push_back(CurDAG->getTargetExtractSubreg(SubregIdx, dl, RegVT,
                                                   N->getOperand(Vec+3)));
  }
  Ops.push_back(getI32Imm(Lane));
  Ops.push_back(Pred);
  Ops.push_back(Reg0);
  Ops.push_back(Chain);

  if (!IsLoad)
    return CurDAG->getMachineNode(Opc, dl, MVT::Other, Ops.data(), NumVecs+6);

  std::vector<EVT> ResTys(NumVecs, RegVT);
  ResTys.push_back(MVT::Other);
  SDNode *VLdLn =
    CurDAG->getMachineNode(Opc, dl, ResTys, Ops.data(), NumVecs+6);

  // For a 64-bit vector load to D registers, nothing more needs to be done.
  if (is64BitVector)
    return VLdLn;

  // For 128-bit vectors, take the 64-bit results of the load and insert them
  // as subregs into the result.
  for (unsigned Vec = 0; Vec < NumVecs; ++Vec) {
    SDValue QuadVec = CurDAG->getTargetInsertSubreg(SubregIdx, dl, VT,
                                                    N->getOperand(Vec+3),
                                                    SDValue(VLdLn, Vec));
    ReplaceUses(SDValue(N, Vec), QuadVec);
  }

  Chain = SDValue(VLdLn, NumVecs);
  ReplaceUses(SDValue(N, NumVecs), Chain);
  return NULL;
}

SDNode *ARMDAGToDAGISel::SelectV6T2BitfieldExtractOp(SDNode *N,
                                                     bool isSigned) {
  if (!Subtarget->hasV6T2Ops())
    return NULL;

  unsigned Opc = isSigned ? (Subtarget->isThumb() ? ARM::t2SBFX : ARM::SBFX)
    : (Subtarget->isThumb() ? ARM::t2UBFX : ARM::UBFX);

  // For unsigned extracts, check for a shift right and mask
  unsigned And_imm = 0;
  if (N->getOpcode() == ISD::AND) {
    if (isOpcWithIntImmediate(N, ISD::AND, And_imm)) {

      // The immediate is a mask of the low bits iff imm & (imm+1) == 0
      if (And_imm & (And_imm + 1))
        return NULL;

      unsigned Srl_imm = 0;
      if (isOpcWithIntImmediate(N->getOperand(0).getNode(), ISD::SRL,
                                Srl_imm)) {
        assert(Srl_imm > 0 && Srl_imm < 32 && "bad amount in shift node!");

        unsigned Width = CountTrailingOnes_32(And_imm);
        unsigned LSB = Srl_imm;
        SDValue Reg0 = CurDAG->getRegister(0, MVT::i32);
        SDValue Ops[] = { N->getOperand(0).getOperand(0),
                          CurDAG->getTargetConstant(LSB, MVT::i32),
                          CurDAG->getTargetConstant(Width, MVT::i32),
                          getAL(CurDAG), Reg0 };
        return CurDAG->SelectNodeTo(N, Opc, MVT::i32, Ops, 5);
      }
    }
    return NULL;
  }

  // Otherwise, we're looking for a shift of a shift
  unsigned Shl_imm = 0;
  if (isOpcWithIntImmediate(N->getOperand(0).getNode(), ISD::SHL, Shl_imm)) {
    assert(Shl_imm > 0 && Shl_imm < 32 && "bad amount in shift node!");
    unsigned Srl_imm = 0;
    if (isInt32Immediate(N->getOperand(1), Srl_imm)) {
      assert(Srl_imm > 0 && Srl_imm < 32 && "bad amount in shift node!");
      unsigned Width = 32 - Srl_imm;
      int LSB = Srl_imm - Shl_imm;
      if (LSB < 0)
        return NULL;
      SDValue Reg0 = CurDAG->getRegister(0, MVT::i32);
      SDValue Ops[] = { N->getOperand(0).getOperand(0),
                        CurDAG->getTargetConstant(LSB, MVT::i32),
                        CurDAG->getTargetConstant(Width, MVT::i32),
                        getAL(CurDAG), Reg0 };
      return CurDAG->SelectNodeTo(N, Opc, MVT::i32, Ops, 5);
    }
  }
  return NULL;
}

SDNode *ARMDAGToDAGISel::
SelectT2CMOVShiftOp(SDNode *N, SDValue FalseVal, SDValue TrueVal,
                    ARMCC::CondCodes CCVal, SDValue CCR, SDValue InFlag) {
  SDValue CPTmp0;
  SDValue CPTmp1;
  if (SelectT2ShifterOperandReg(N, TrueVal, CPTmp0, CPTmp1)) {
    unsigned SOVal = cast<ConstantSDNode>(CPTmp1)->getZExtValue();
    unsigned SOShOp = ARM_AM::getSORegShOp(SOVal);
    unsigned Opc = 0;
    switch (SOShOp) {
    case ARM_AM::lsl: Opc = ARM::t2MOVCClsl; break;
    case ARM_AM::lsr: Opc = ARM::t2MOVCClsr; break;
    case ARM_AM::asr: Opc = ARM::t2MOVCCasr; break;
    case ARM_AM::ror: Opc = ARM::t2MOVCCror; break;
    default:
      llvm_unreachable("Unknown so_reg opcode!");
      break;
    }
    SDValue SOShImm =
      CurDAG->getTargetConstant(ARM_AM::getSORegOffset(SOVal), MVT::i32);
    SDValue CC = CurDAG->getTargetConstant(CCVal, MVT::i32);
    SDValue Ops[] = { FalseVal, CPTmp0, SOShImm, CC, CCR, InFlag };
    return CurDAG->SelectNodeTo(N, Opc, MVT::i32, Ops, 6);
  }
  return 0;
}

SDNode *ARMDAGToDAGISel::
SelectARMCMOVShiftOp(SDNode *N, SDValue FalseVal, SDValue TrueVal,
                     ARMCC::CondCodes CCVal, SDValue CCR, SDValue InFlag) {
  SDValue CPTmp0;
  SDValue CPTmp1;
  SDValue CPTmp2;
  if (SelectShifterOperandReg(N, TrueVal, CPTmp0, CPTmp1, CPTmp2)) {
    SDValue CC = CurDAG->getTargetConstant(CCVal, MVT::i32);
    SDValue Ops[] = { FalseVal, CPTmp0, CPTmp1, CPTmp2, CC, CCR, InFlag };
    return CurDAG->SelectNodeTo(N, ARM::MOVCCs, MVT::i32, Ops, 7);
  }
  return 0;
}

SDNode *ARMDAGToDAGISel::
SelectT2CMOVSoImmOp(SDNode *N, SDValue FalseVal, SDValue TrueVal,
                    ARMCC::CondCodes CCVal, SDValue CCR,
                    SDValue InFlag) {
  ConstantSDNode *T = dyn_cast<ConstantSDNode>(TrueVal);
  if (!T)
    return 0;

  if (Predicate_t2_so_imm(TrueVal.getNode())) {
    SDValue True = CurDAG->getTargetConstant(T->getZExtValue(), MVT::i32);
    SDValue CC = CurDAG->getTargetConstant(CCVal, MVT::i32);
    SDValue Ops[] = { FalseVal, True, CC, CCR, InFlag };
    return CurDAG->SelectNodeTo(N, ARM::t2MOVCCi, MVT::i32, Ops, 5);
  }
  return 0;
}

SDNode *ARMDAGToDAGISel::
SelectARMCMOVSoImmOp(SDNode *N, SDValue FalseVal, SDValue TrueVal,
                     ARMCC::CondCodes CCVal, SDValue CCR, SDValue InFlag) {
  ConstantSDNode *T = dyn_cast<ConstantSDNode>(TrueVal);
  if (!T)
    return 0;

  if (Predicate_so_imm(TrueVal.getNode())) {
    SDValue True = CurDAG->getTargetConstant(T->getZExtValue(), MVT::i32);
    SDValue CC = CurDAG->getTargetConstant(CCVal, MVT::i32);
    SDValue Ops[] = { FalseVal, True, CC, CCR, InFlag };
    return CurDAG->SelectNodeTo(N, ARM::MOVCCi, MVT::i32, Ops, 5);
  }
  return 0;
}

SDNode *ARMDAGToDAGISel::SelectCMOVOp(SDNode *N) {
  EVT VT = N->getValueType(0);
  SDValue FalseVal = N->getOperand(0);
  SDValue TrueVal  = N->getOperand(1);
  SDValue CC = N->getOperand(2);
  SDValue CCR = N->getOperand(3);
  SDValue InFlag = N->getOperand(4);
  assert(CC.getOpcode() == ISD::Constant);
  assert(CCR.getOpcode() == ISD::Register);
  ARMCC::CondCodes CCVal =
    (ARMCC::CondCodes)cast<ConstantSDNode>(CC)->getZExtValue();

  if (!Subtarget->isThumb1Only() && VT == MVT::i32) {
    // Pattern: (ARMcmov:i32 GPR:i32:$false, so_reg:i32:$true, (imm:i32):$cc)
    // Emits: (MOVCCs:i32 GPR:i32:$false, so_reg:i32:$true, (imm:i32):$cc)
    // Pattern complexity = 18  cost = 1  size = 0
    SDValue CPTmp0;
    SDValue CPTmp1;
    SDValue CPTmp2;
    if (Subtarget->isThumb()) {
      SDNode *Res = SelectT2CMOVShiftOp(N, FalseVal, TrueVal,
                                        CCVal, CCR, InFlag);
      if (!Res)
        Res = SelectT2CMOVShiftOp(N, TrueVal, FalseVal,
                               ARMCC::getOppositeCondition(CCVal), CCR, InFlag);
      if (Res)
        return Res;
    } else {
      SDNode *Res = SelectARMCMOVShiftOp(N, FalseVal, TrueVal,
                                         CCVal, CCR, InFlag);
      if (!Res)
        Res = SelectARMCMOVShiftOp(N, TrueVal, FalseVal,
                               ARMCC::getOppositeCondition(CCVal), CCR, InFlag);
      if (Res)
        return Res;
    }

    // Pattern: (ARMcmov:i32 GPR:i32:$false,
    //             (imm:i32)<<P:Pred_so_imm>>:$true,
    //             (imm:i32):$cc)
    // Emits: (MOVCCi:i32 GPR:i32:$false,
    //           (so_imm:i32 (imm:i32):$true), (imm:i32):$cc)
    // Pattern complexity = 10  cost = 1  size = 0
    if (Subtarget->isThumb()) {
      SDNode *Res = SelectT2CMOVSoImmOp(N, FalseVal, TrueVal,
                                        CCVal, CCR, InFlag);
      if (!Res)
        Res = SelectT2CMOVSoImmOp(N, TrueVal, FalseVal,
                               ARMCC::getOppositeCondition(CCVal), CCR, InFlag);
      if (Res)
        return Res;
    } else {
      SDNode *Res = SelectARMCMOVSoImmOp(N, FalseVal, TrueVal,
                                         CCVal, CCR, InFlag);
      if (!Res)
        Res = SelectARMCMOVSoImmOp(N, TrueVal, FalseVal,
                               ARMCC::getOppositeCondition(CCVal), CCR, InFlag);
      if (Res)
        return Res;
    }
  }

  // Pattern: (ARMcmov:i32 GPR:i32:$false, GPR:i32:$true, (imm:i32):$cc)
  // Emits: (MOVCCr:i32 GPR:i32:$false, GPR:i32:$true, (imm:i32):$cc)
  // Pattern complexity = 6  cost = 1  size = 0
  //
  // Pattern: (ARMcmov:i32 GPR:i32:$false, GPR:i32:$true, (imm:i32):$cc)
  // Emits: (tMOVCCr:i32 GPR:i32:$false, GPR:i32:$true, (imm:i32):$cc)
  // Pattern complexity = 6  cost = 11  size = 0
  //
  // Also FCPYScc and FCPYDcc.
  SDValue Tmp2 = CurDAG->getTargetConstant(CCVal, MVT::i32);
  SDValue Ops[] = { FalseVal, TrueVal, Tmp2, CCR, InFlag };
  unsigned Opc = 0;
  switch (VT.getSimpleVT().SimpleTy) {
  default: assert(false && "Illegal conditional move type!");
    break;
  case MVT::i32:
    Opc = Subtarget->isThumb() ?
      (Subtarget->hasThumb2() ?
       ARM::t2MOVCCr : ARM::tMOVCCr_pseudo) : ARM::MOVCCr;
    break;
  case MVT::f32:
    Opc = ARM::VMOVScc;
    break;
  case MVT::f64:
    Opc = ARM::VMOVDcc;
    break;
  }
  return CurDAG->SelectNodeTo(N, Opc, VT, Ops, 5);
}
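// Note: SelectCMOVOp above tries progressively richer encodings before the
// plain register-register conditional move: first a CMOV whose true operand
// is a shifter operand, then one whose true operand is an so_imm / t2_so_imm
// constant; each form is also retried with the operands swapped under the
// opposite condition code.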
SDNode *ARMDAGToDAGISel::Select(SDNode *N) {
  DebugLoc dl = N->getDebugLoc();

  if (N->isMachineOpcode())
    return NULL;   // Already selected.

  switch (N->getOpcode()) {
  default: break;
  case ISD::Constant: {
    unsigned Val = cast<ConstantSDNode>(N)->getZExtValue();
    bool UseCP = true;
    if (Subtarget->hasThumb2())
      // Thumb2-aware targets have the MOVT instruction, so all immediates can
      // be done with MOV + MOVT, at worst.
      UseCP = false;
    else {
      if (Subtarget->isThumb()) {
        UseCP = (Val > 255 &&                          // MOV
                 ~Val > 255 &&                         // MOV + MVN
                 !ARM_AM::isThumbImmShiftedVal(Val));  // MOV + LSL
      } else
        UseCP = (ARM_AM::getSOImmVal(Val) == -1 &&     // MOV
                 ARM_AM::getSOImmVal(~Val) == -1 &&    // MVN
                 !ARM_AM::isSOImmTwoPartVal(Val));     // two instrs.
    }

    if (UseCP) {
      SDValue CPIdx =
        CurDAG->getTargetConstantPool(ConstantInt::get(
                                  Type::getInt32Ty(*CurDAG->getContext()), Val),
                                      TLI.getPointerTy());

      SDNode *ResNode;
      if (Subtarget->isThumb1Only()) {
        SDValue Pred = getAL(CurDAG);
        SDValue PredReg = CurDAG->getRegister(0, MVT::i32);
        SDValue Ops[] = { CPIdx, Pred, PredReg, CurDAG->getEntryNode() };
        ResNode = CurDAG->getMachineNode(ARM::tLDRcp, dl, MVT::i32, MVT::Other,
                                         Ops, 4);
      } else {
        SDValue Ops[] = {
          CPIdx,
          CurDAG->getRegister(0, MVT::i32),
          CurDAG->getTargetConstant(0, MVT::i32),
          getAL(CurDAG),
          CurDAG->getRegister(0, MVT::i32),
          CurDAG->getEntryNode()
        };
        ResNode = CurDAG->getMachineNode(ARM::LDRcp, dl, MVT::i32, MVT::Other,
                                         Ops, 6);
      }
      ReplaceUses(SDValue(N, 0), SDValue(ResNode, 0));
      return NULL;
    }

    // Other cases are autogenerated.
    break;
  }
  case ISD::FrameIndex: {
    // Selects to ADDri FI, 0 which in turn will become ADDri SP, imm.
    int FI = cast<FrameIndexSDNode>(N)->getIndex();
    SDValue TFI = CurDAG->getTargetFrameIndex(FI, TLI.getPointerTy());
    if (Subtarget->isThumb1Only()) {
      return CurDAG->SelectNodeTo(N, ARM::tADDrSPi, MVT::i32, TFI,
                                  CurDAG->getTargetConstant(0, MVT::i32));
    } else {
      unsigned Opc = ((Subtarget->isThumb() && Subtarget->hasThumb2()) ?
                      ARM::t2ADDri : ARM::ADDri);
      SDValue Ops[] = { TFI, CurDAG->getTargetConstant(0, MVT::i32),
                        getAL(CurDAG), CurDAG->getRegister(0, MVT::i32),
                        CurDAG->getRegister(0, MVT::i32) };
      return CurDAG->SelectNodeTo(N, Opc, MVT::i32, Ops, 5);
    }
  }
  case ISD::SRL:
    if (SDNode *I = SelectV6T2BitfieldExtractOp(N, false))
      return I;
    break;
  case ISD::SRA:
    if (SDNode *I = SelectV6T2BitfieldExtractOp(N, true))
      return I;
    break;
  case ISD::MUL:
    if (Subtarget->isThumb1Only())
      break;
    if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(N->getOperand(1))) {
      unsigned RHSV = C->getZExtValue();
      if (!RHSV) break;
      if (isPowerOf2_32(RHSV-1)) {  // 2^n+1?
        unsigned ShImm = Log2_32(RHSV-1);
        if (ShImm >= 32)
          break;
        SDValue V = N->getOperand(0);
        ShImm = ARM_AM::getSORegOpc(ARM_AM::lsl, ShImm);
        SDValue ShImmOp = CurDAG->getTargetConstant(ShImm, MVT::i32);
        SDValue Reg0 = CurDAG->getRegister(0, MVT::i32);
        if (Subtarget->isThumb()) {
          SDValue Ops[] = { V, V, ShImmOp, getAL(CurDAG), Reg0, Reg0 };
          return CurDAG->SelectNodeTo(N, ARM::t2ADDrs, MVT::i32, Ops, 6);
        } else {
          SDValue Ops[] = { V, V, Reg0, ShImmOp, getAL(CurDAG), Reg0, Reg0 };
          return CurDAG->SelectNodeTo(N, ARM::ADDrs, MVT::i32, Ops, 7);
        }
      }
      if (isPowerOf2_32(RHSV+1)) {  // 2^n-1?
        unsigned ShImm = Log2_32(RHSV+1);
        if (ShImm >= 32)
          break;
        SDValue V = N->getOperand(0);
        ShImm = ARM_AM::getSORegOpc(ARM_AM::lsl, ShImm);
        SDValue ShImmOp = CurDAG->getTargetConstant(ShImm, MVT::i32);
        SDValue Reg0 = CurDAG->getRegister(0, MVT::i32);
        if (Subtarget->isThumb()) {
          SDValue Ops[] = { V, V, ShImmOp, getAL(CurDAG), Reg0 };
          return CurDAG->SelectNodeTo(N, ARM::t2RSBrs, MVT::i32, Ops, 5);
        } else {
          SDValue Ops[] = { V, V, Reg0, ShImmOp, getAL(CurDAG), Reg0, Reg0 };
          return CurDAG->SelectNodeTo(N, ARM::RSBrs, MVT::i32, Ops, 7);
        }
      }
    }
    break;
  case ISD::AND: {
    // Check for unsigned bitfield extract
    if (SDNode *I = SelectV6T2BitfieldExtractOp(N, false))
      return I;

    // (and (or x, c2), c1) and top 16-bits of c1 and c2 match, lower 16-bits
    // of c1 are 0xffff, and lower 16-bits of c2 are 0. That is, the top
    // 16-bits are entirely contributed by c2 and lower 16-bits are entirely
    // contributed by x. That's equal to
    // (or (and x, 0xffff), (and c1, 0xffff0000)).
    // Select it to: "movt x, ((c1 & 0xffff) >> 16)".
    EVT VT = N->getValueType(0);
    if (VT != MVT::i32)
      break;
    unsigned Opc = (Subtarget->isThumb() && Subtarget->hasThumb2())
      ? ARM::t2MOVTi16
      : (Subtarget->hasV6T2Ops() ? ARM::MOVTi16 : 0);
    if (!Opc)
      break;
    SDValue N0 = N->getOperand(0), N1 = N->getOperand(1);
    ConstantSDNode *N1C = dyn_cast<ConstantSDNode>(N1);
    if (!N1C)
      break;
    if (N0.getOpcode() == ISD::OR && N0.getNode()->hasOneUse()) {
      SDValue N2 = N0.getOperand(1);
      ConstantSDNode *N2C = dyn_cast<ConstantSDNode>(N2);
      if (!N2C)
        break;
      unsigned N1CVal = N1C->getZExtValue();
      unsigned N2CVal = N2C->getZExtValue();
      if ((N1CVal & 0xffff0000U) == (N2CVal & 0xffff0000U) &&
          (N1CVal & 0xffffU) == 0xffffU &&
          (N2CVal & 0xffffU) == 0x0U) {
        SDValue Imm16 = CurDAG->getTargetConstant((N2CVal & 0xFFFF0000U) >> 16,
                                                  MVT::i32);
        SDValue Ops[] = { N0.getOperand(0), Imm16,
                          getAL(CurDAG), CurDAG->getRegister(0, MVT::i32) };
        return CurDAG->getMachineNode(Opc, dl, VT, Ops, 4);
      }
    }
    break;
  }
  case ARMISD::VMOVRRD:
    return CurDAG->getMachineNode(ARM::VMOVRRD, dl, MVT::i32, MVT::i32,
                                  N->getOperand(0), getAL(CurDAG),
                                  CurDAG->getRegister(0, MVT::i32));
  case ISD::UMUL_LOHI: {
    if (Subtarget->isThumb1Only())
      break;
    if (Subtarget->isThumb()) {
      SDValue Ops[] = { N->getOperand(0), N->getOperand(1),
                        getAL(CurDAG), CurDAG->getRegister(0, MVT::i32),
                        CurDAG->getRegister(0, MVT::i32) };
      return CurDAG->getMachineNode(ARM::t2UMULL, dl, MVT::i32, MVT::i32,
                                    Ops, 4);
    } else {
      SDValue Ops[] = { N->getOperand(0), N->getOperand(1),
                        getAL(CurDAG), CurDAG->getRegister(0, MVT::i32),
                        CurDAG->getRegister(0, MVT::i32) };
      return CurDAG->getMachineNode(ARM::UMULL, dl, MVT::i32, MVT::i32, Ops, 5);
    }
  }
  case ISD::SMUL_LOHI: {
    if (Subtarget->isThumb1Only())
      break;
    if (Subtarget->isThumb()) {
      SDValue Ops[] = { N->getOperand(0), N->getOperand(1),
                        getAL(CurDAG), CurDAG->getRegister(0, MVT::i32) };
      return CurDAG->getMachineNode(ARM::t2SMULL, dl, MVT::i32, MVT::i32,
                                    Ops, 4);
    } else {
      SDValue Ops[] = { N->getOperand(0), N->getOperand(1),
                        getAL(CurDAG), CurDAG->getRegister(0, MVT::i32),
                        CurDAG->getRegister(0, MVT::i32) };
      return CurDAG->getMachineNode(ARM::SMULL, dl, MVT::i32, MVT::i32, Ops, 5);
    }
  }
  case ISD::LOAD: {
    SDNode *ResNode = 0;
    if (Subtarget->isThumb() && Subtarget->hasThumb2())
      ResNode = SelectT2IndexedLoad(N);
    else
      ResNode = SelectARMIndexedLoad(N);
    if (ResNode)
      return ResNode;

    // VLDMQ must be custom-selected for "v2f64 load" to set the AM5Opc value.
    if (Subtarget->hasVFP2() &&
        N->getValueType(0).getSimpleVT().SimpleTy == MVT::v2f64) {
      SDValue Chain = N->getOperand(0);
      SDValue AM5Opc =
        CurDAG->getTargetConstant(ARM_AM::getAM5Opc(ARM_AM::ia, 4), MVT::i32);
      SDValue Pred = getAL(CurDAG);
      SDValue PredReg = CurDAG->getRegister(0, MVT::i32);
      SDValue Ops[] = { N->getOperand(1), AM5Opc, Pred, PredReg, Chain };
      return CurDAG->getMachineNode(ARM::VLDMQ, dl, MVT::v2f64, MVT::Other,
                                    Ops, 5);
    }
    // Other cases are autogenerated.
    break;
  }
  case ISD::STORE: {
    // VSTMQ must be custom-selected for "v2f64 store" to set the AM5Opc value.
    if (Subtarget->hasVFP2() &&
        N->getOperand(1).getValueType().getSimpleVT().SimpleTy == MVT::v2f64) {
      SDValue Chain = N->getOperand(0);
      SDValue AM5Opc =
        CurDAG->getTargetConstant(ARM_AM::getAM5Opc(ARM_AM::ia, 4), MVT::i32);
      SDValue Pred = getAL(CurDAG);
      SDValue PredReg = CurDAG->getRegister(0, MVT::i32);
      SDValue Ops[] = { N->getOperand(1), N->getOperand(2),
                        AM5Opc, Pred, PredReg, Chain };
      return CurDAG->getMachineNode(ARM::VSTMQ, dl, MVT::Other, Ops, 6);
    }
    // Other cases are autogenerated.
    break;
  }
  case ARMISD::BRCOND: {
    // Pattern: (ARMbrcond:void (bb:Other):$dst, (imm:i32):$cc)
    // Emits: (Bcc:void (bb:Other):$dst, (imm:i32):$cc)
    // Pattern complexity = 6  cost = 1  size = 0

    // Pattern: (ARMbrcond:void (bb:Other):$dst, (imm:i32):$cc)
    // Emits: (tBcc:void (bb:Other):$dst, (imm:i32):$cc)
    // Pattern complexity = 6  cost = 1  size = 0

    // Pattern: (ARMbrcond:void (bb:Other):$dst, (imm:i32):$cc)
    // Emits: (t2Bcc:void (bb:Other):$dst, (imm:i32):$cc)
    // Pattern complexity = 6  cost = 1  size = 0

    unsigned Opc = Subtarget->isThumb() ?
      ((Subtarget->hasThumb2()) ? ARM::t2Bcc : ARM::tBcc) : ARM::Bcc;
    SDValue Chain = N->getOperand(0);
    SDValue N1 = N->getOperand(1);
    SDValue N2 = N->getOperand(2);
    SDValue N3 = N->getOperand(3);
    SDValue InFlag = N->getOperand(4);
    assert(N1.getOpcode() == ISD::BasicBlock);
    assert(N2.getOpcode() == ISD::Constant);
    assert(N3.getOpcode() == ISD::Register);

    SDValue Tmp2 = CurDAG->getTargetConstant(((unsigned)
                               cast<ConstantSDNode>(N2)->getZExtValue()),
                               MVT::i32);
    SDValue Ops[] = { N1, Tmp2, N3, Chain, InFlag };
    SDNode *ResNode = CurDAG->getMachineNode(Opc, dl, MVT::Other,
                                             MVT::Flag, Ops, 5);
    Chain = SDValue(ResNode, 0);
    if (N->getNumValues() == 2) {
      InFlag = SDValue(ResNode, 1);
      ReplaceUses(SDValue(N, 1), InFlag);
    }
    ReplaceUses(SDValue(N, 0), SDValue(Chain.getNode(), Chain.getResNo()));
    return NULL;
  }
  case ARMISD::CMOV:
    return SelectCMOVOp(N);
  case ARMISD::CNEG: {
    EVT VT = N->getValueType(0);
    SDValue N0 = N->getOperand(0);
    SDValue N1 = N->getOperand(1);
    SDValue N2 = N->getOperand(2);
    SDValue N3 = N->getOperand(3);
    SDValue InFlag = N->getOperand(4);
    assert(N2.getOpcode() == ISD::Constant);
    assert(N3.getOpcode() == ISD::Register);

    SDValue Tmp2 = CurDAG->getTargetConstant(((unsigned)
                               cast<ConstantSDNode>(N2)->getZExtValue()),
                               MVT::i32);
    SDValue Ops[] = { N0, N1, Tmp2, N3, InFlag };
    unsigned Opc = 0;
    switch (VT.getSimpleVT().SimpleTy) {
    default: assert(false && "Illegal conditional move type!");
      break;
    case MVT::f32:
      Opc = ARM::VNEGScc;
      break;
    case MVT::f64:
      Opc = ARM::VNEGDcc;
      break;
    }
    return CurDAG->SelectNodeTo(N, Opc, VT, Ops, 5);
  }

  case ARMISD::VZIP: {
    unsigned Opc = 0;
    EVT VT = N->getValueType(0);
    switch (VT.getSimpleVT().SimpleTy) {
    default: return NULL;
    case MVT::v8i8:  Opc = ARM::VZIPd8; break;
    case MVT::v4i16: Opc = ARM::VZIPd16; break;
    case MVT::v2f32:
    case MVT::v2i32: Opc = ARM::VZIPd32; break;
    case MVT::v16i8: Opc = ARM::VZIPq8; break;
    case MVT::v8i16: Opc = ARM::VZIPq16; break;
    case MVT::v4f32:
    case MVT::v4i32: Opc = ARM::VZIPq32; break;
    }
    SDValue Pred = getAL(CurDAG);
    SDValue PredReg = CurDAG->getRegister(0, MVT::i32);
    SDValue Ops[] = { N->getOperand(0), N->getOperand(1), Pred, PredReg };
    return CurDAG->getMachineNode(Opc, dl, VT, VT, Ops, 4);
  }
  case ARMISD::VUZP: {
    unsigned Opc = 0;
    EVT VT = N->getValueType(0);
    switch (VT.getSimpleVT().SimpleTy) {
    default: return NULL;
    case MVT::v8i8:  Opc = ARM::VUZPd8; break;
    case MVT::v4i16: Opc = ARM::VUZPd16; break;
    case MVT::v2f32:
    case MVT::v2i32: Opc = ARM::VUZPd32; break;
    case MVT::v16i8: Opc = ARM::VUZPq8; break;
    case MVT::v8i16: Opc = ARM::VUZPq16; break;
    case MVT::v4f32:
    case MVT::v4i32: Opc = ARM::VUZPq32; break;
    }
    SDValue Pred = getAL(CurDAG);
    SDValue PredReg = CurDAG->getRegister(0, MVT::i32);
    SDValue Ops[] = { N->getOperand(0), N->getOperand(1), Pred, PredReg };
    return CurDAG->getMachineNode(Opc, dl, VT, VT, Ops, 4);
  }
  case ARMISD::VTRN: {
    unsigned Opc = 0;
    EVT VT = N->getValueType(0);
    switch (VT.getSimpleVT().SimpleTy) {
    default: return NULL;
    case MVT::v8i8:  Opc = ARM::VTRNd8; break;
    case MVT::v4i16: Opc = ARM::VTRNd16; break;
    case MVT::v2f32:
    case MVT::v2i32: Opc = ARM::VTRNd32; break;
    case MVT::v16i8: Opc = ARM::VTRNq8; break;
    case MVT::v8i16: Opc = ARM::VTRNq16; break;
    case MVT::v4f32:
    case MVT::v4i32: Opc = ARM::VTRNq32; break;
    }
    SDValue Pred = getAL(CurDAG);
    SDValue PredReg = CurDAG->getRegister(0, MVT::i32);
    SDValue Ops[] = { N->getOperand(0), N->getOperand(1), Pred, PredReg };
    return CurDAG->getMachineNode(Opc, dl, VT, VT, Ops, 4);
  }

  case ISD::INTRINSIC_VOID:
  case ISD::INTRINSIC_W_CHAIN: {
    unsigned IntNo = cast<ConstantSDNode>(N->getOperand(1))->getZExtValue();
    switch (IntNo) {
    default:
      break;

    case Intrinsic::arm_neon_vld1: {
      unsigned DOpcodes[] = { ARM::VLD1d8, ARM::VLD1d16,
                              ARM::VLD1d32, ARM::VLD1d64 };
      unsigned QOpcodes[] = { ARM::VLD1q8, ARM::VLD1q16,
                              ARM::VLD1q32, ARM::VLD1q64 };
      return SelectVLD(N, 1, DOpcodes, QOpcodes, 0);
    }

    case Intrinsic::arm_neon_vld2: {
      unsigned DOpcodes[] = { ARM::VLD2d8, ARM::VLD2d16,
                              ARM::VLD2d32, ARM::VLD1q64 };
      unsigned QOpcodes[] = { ARM::VLD2q8, ARM::VLD2q16, ARM::VLD2q32 };
      return SelectVLD(N, 2, DOpcodes, QOpcodes, 0);
    }

    case Intrinsic::arm_neon_vld3: {
      unsigned DOpcodes[] = { ARM::VLD3d8, ARM::VLD3d16,
                              ARM::VLD3d32, ARM::VLD1d64T };
      unsigned QOpcodes0[] = { ARM::VLD3q8_UPD,
                               ARM::VLD3q16_UPD,
                               ARM::VLD3q32_UPD };
      unsigned QOpcodes1[] = { ARM::VLD3q8odd_UPD,
                               ARM::VLD3q16odd_UPD,
                               ARM::VLD3q32odd_UPD };
      return SelectVLD(N, 3, DOpcodes, QOpcodes0, QOpcodes1);
    }

    case Intrinsic::arm_neon_vld4: {
      unsigned DOpcodes[] = { ARM::VLD4d8, ARM::VLD4d16,
                              ARM::VLD4d32, ARM::VLD1d64Q };
      unsigned QOpcodes0[] = { ARM::VLD4q8_UPD,
                               ARM::VLD4q16_UPD,
                               ARM::VLD4q32_UPD };
      unsigned QOpcodes1[] = { ARM::VLD4q8odd_UPD,
                               ARM::VLD4q16odd_UPD,
                               ARM::VLD4q32odd_UPD };
      return SelectVLD(N, 4, DOpcodes, QOpcodes0, QOpcodes1);
    }

    case Intrinsic::arm_neon_vld2lane: {
      unsigned DOpcodes[] = { ARM::VLD2LNd8, ARM::VLD2LNd16, ARM::VLD2LNd32 };
      unsigned QOpcodes0[] = { ARM::VLD2LNq16, ARM::VLD2LNq32 };
      unsigned QOpcodes1[] = { ARM::VLD2LNq16odd, ARM::VLD2LNq32odd };
      return SelectVLDSTLane(N, true, 2, DOpcodes, QOpcodes0, QOpcodes1);
    }

    case Intrinsic::arm_neon_vld3lane: {
      unsigned DOpcodes[] = { ARM::VLD3LNd8, ARM::VLD3LNd16, ARM::VLD3LNd32 };
      unsigned QOpcodes0[] = { ARM::VLD3LNq16, ARM::VLD3LNq32 };
      unsigned QOpcodes1[] = { ARM::VLD3LNq16odd, ARM::VLD3LNq32odd };
      return SelectVLDSTLane(N, true, 3, DOpcodes, QOpcodes0, QOpcodes1);
    }

    case Intrinsic::arm_neon_vld4lane: {
      unsigned DOpcodes[] = { ARM::VLD4LNd8, ARM::VLD4LNd16, ARM::VLD4LNd32 };
      unsigned QOpcodes0[] = { ARM::VLD4LNq16,
                               ARM::VLD4LNq32 };
      unsigned QOpcodes1[] = { ARM::VLD4LNq16odd, ARM::VLD4LNq32odd };
      return SelectVLDSTLane(N, true, 4, DOpcodes, QOpcodes0, QOpcodes1);
    }

    case Intrinsic::arm_neon_vst1: {
      unsigned DOpcodes[] = { ARM::VST1d8, ARM::VST1d16,
                              ARM::VST1d32, ARM::VST1d64 };
      unsigned QOpcodes[] = { ARM::VST1q8, ARM::VST1q16,
                              ARM::VST1q32, ARM::VST1q64 };
      return SelectVST(N, 1, DOpcodes, QOpcodes, 0);
    }

    case Intrinsic::arm_neon_vst2: {
      unsigned DOpcodes[] = { ARM::VST2d8, ARM::VST2d16,
                              ARM::VST2d32, ARM::VST1q64 };
      unsigned QOpcodes[] = { ARM::VST2q8, ARM::VST2q16, ARM::VST2q32 };
      return SelectVST(N, 2, DOpcodes, QOpcodes, 0);
    }

    case Intrinsic::arm_neon_vst3: {
      unsigned DOpcodes[] = { ARM::VST3d8, ARM::VST3d16,
                              ARM::VST3d32, ARM::VST1d64T };
      unsigned QOpcodes0[] = { ARM::VST3q8_UPD,
                               ARM::VST3q16_UPD,
                               ARM::VST3q32_UPD };
      unsigned QOpcodes1[] = { ARM::VST3q8odd_UPD,
                               ARM::VST3q16odd_UPD,
                               ARM::VST3q32odd_UPD };
      return SelectVST(N, 3, DOpcodes, QOpcodes0, QOpcodes1);
    }

    case Intrinsic::arm_neon_vst4: {
      unsigned DOpcodes[] = { ARM::VST4d8, ARM::VST4d16,
                              ARM::VST4d32, ARM::VST1d64Q };
      unsigned QOpcodes0[] = { ARM::VST4q8_UPD,
                               ARM::VST4q16_UPD,
                               ARM::VST4q32_UPD };
      unsigned QOpcodes1[] = { ARM::VST4q8odd_UPD,
                               ARM::VST4q16odd_UPD,
                               ARM::VST4q32odd_UPD };
      return SelectVST(N, 4, DOpcodes, QOpcodes0, QOpcodes1);
    }

    case Intrinsic::arm_neon_vst2lane: {
      unsigned DOpcodes[] = { ARM::VST2LNd8, ARM::VST2LNd16, ARM::VST2LNd32 };
      unsigned QOpcodes0[] = { ARM::VST2LNq16, ARM::VST2LNq32 };
      unsigned QOpcodes1[] = { ARM::VST2LNq16odd, ARM::VST2LNq32odd };
      return SelectVLDSTLane(N, false, 2, DOpcodes, QOpcodes0, QOpcodes1);
    }

    case Intrinsic::arm_neon_vst3lane: {
      unsigned DOpcodes[] = { ARM::VST3LNd8, ARM::VST3LNd16, ARM::VST3LNd32 };
      unsigned QOpcodes0[] = { ARM::VST3LNq16, ARM::VST3LNq32 };
      unsigned QOpcodes1[] = { ARM::VST3LNq16odd, ARM::VST3LNq32odd };
      return SelectVLDSTLane(N, false, 3, DOpcodes, QOpcodes0, QOpcodes1);
    }

    case Intrinsic::arm_neon_vst4lane: {
      unsigned DOpcodes[] = { ARM::VST4LNd8, ARM::VST4LNd16, ARM::VST4LNd32 };
      unsigned QOpcodes0[] = { ARM::VST4LNq16, ARM::VST4LNq32 };
      unsigned QOpcodes1[] = { ARM::VST4LNq16odd, ARM::VST4LNq32odd };
      return SelectVLDSTLane(N, false, 4, DOpcodes, QOpcodes0, QOpcodes1);
    }
    }
  }
  }

  return SelectCode(N);
}

bool ARMDAGToDAGISel::
SelectInlineAsmMemoryOperand(const SDValue &Op, char ConstraintCode,
                             std::vector<SDValue> &OutOps) {
  assert(ConstraintCode == 'm' && "unexpected asm memory constraint");
  // Require the address to be in a register. That is safe for all ARM
  // variants and it is hard to do anything much smarter without knowing
  // how the operand is used.
  OutOps.push_back(Op);
  return false;
}

/// createARMISelDag - This pass converts a legalized DAG into a
/// ARM-specific DAG, ready for instruction scheduling.
///
FunctionPass *llvm::createARMISelDag(ARMBaseTargetMachine &TM,
                                     CodeGenOpt::Level OptLevel) {
  return new ARMDAGToDAGISel(TM, OptLevel);
}
diff --git a/lib/Target/ARM/NEONPreAllocPass.cpp b/lib/Target/ARM/NEONPreAllocPass.cpp
index 7334259bf574..ef6bf3a132bb 100644
--- a/lib/Target/ARM/NEONPreAllocPass.cpp
+++ b/lib/Target/ARM/NEONPreAllocPass.cpp
@@ -1,400 +1,428 @@
//===-- NEONPreAllocPass.cpp - Allocate adjacent NEON registers--*- C++ -*-===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
// //===----------------------------------------------------------------------===// #define DEBUG_TYPE "neon-prealloc" #include "ARM.h" #include "ARMInstrInfo.h" #include "llvm/CodeGen/MachineInstr.h" #include "llvm/CodeGen/MachineInstrBuilder.h" +#include "llvm/CodeGen/MachineRegisterInfo.h" #include "llvm/CodeGen/MachineFunctionPass.h" using namespace llvm; namespace { class NEONPreAllocPass : public MachineFunctionPass { const TargetInstrInfo *TII; + MachineRegisterInfo *MRI; public: static char ID; NEONPreAllocPass() : MachineFunctionPass(&ID) {} virtual bool runOnMachineFunction(MachineFunction &MF); virtual const char *getPassName() const { return "NEON register pre-allocation pass"; } private: + bool FormsRegSequence(MachineInstr *MI, + unsigned FirstOpnd, unsigned NumRegs); bool PreAllocNEONRegisters(MachineBasicBlock &MBB); }; char NEONPreAllocPass::ID = 0; } static bool isNEONMultiRegOp(int Opcode, unsigned &FirstOpnd, unsigned &NumRegs, unsigned &Offset, unsigned &Stride) { // Default to unit stride with no offset. Stride = 1; Offset = 0; switch (Opcode) { default: break; case ARM::VLD1q8: case ARM::VLD1q16: case ARM::VLD1q32: case ARM::VLD1q64: case ARM::VLD2d8: case ARM::VLD2d16: case ARM::VLD2d32: case ARM::VLD2LNd8: case ARM::VLD2LNd16: case ARM::VLD2LNd32: FirstOpnd = 0; NumRegs = 2; return true; case ARM::VLD2q8: case ARM::VLD2q16: case ARM::VLD2q32: FirstOpnd = 0; NumRegs = 4; return true; case ARM::VLD2LNq16: case ARM::VLD2LNq32: FirstOpnd = 0; NumRegs = 2; Offset = 0; Stride = 2; return true; case ARM::VLD2LNq16odd: case ARM::VLD2LNq32odd: FirstOpnd = 0; NumRegs = 2; Offset = 1; Stride = 2; return true; case ARM::VLD3d8: case ARM::VLD3d16: case ARM::VLD3d32: case ARM::VLD1d64T: case ARM::VLD3LNd8: case ARM::VLD3LNd16: case ARM::VLD3LNd32: FirstOpnd = 0; NumRegs = 3; return true; case ARM::VLD3q8_UPD: case ARM::VLD3q16_UPD: case ARM::VLD3q32_UPD: FirstOpnd = 0; NumRegs = 3; Offset = 0; Stride = 2; return true; case ARM::VLD3q8odd_UPD: case ARM::VLD3q16odd_UPD: case ARM::VLD3q32odd_UPD: FirstOpnd = 0; NumRegs = 3; Offset = 1; Stride = 2; return true; case ARM::VLD3LNq16: case ARM::VLD3LNq32: FirstOpnd = 0; NumRegs = 3; Offset = 0; Stride = 2; return true; case ARM::VLD3LNq16odd: case ARM::VLD3LNq32odd: FirstOpnd = 0; NumRegs = 3; Offset = 1; Stride = 2; return true; case ARM::VLD4d8: case ARM::VLD4d16: case ARM::VLD4d32: case ARM::VLD1d64Q: case ARM::VLD4LNd8: case ARM::VLD4LNd16: case ARM::VLD4LNd32: FirstOpnd = 0; NumRegs = 4; return true; case ARM::VLD4q8_UPD: case ARM::VLD4q16_UPD: case ARM::VLD4q32_UPD: FirstOpnd = 0; NumRegs = 4; Offset = 0; Stride = 2; return true; case ARM::VLD4q8odd_UPD: case ARM::VLD4q16odd_UPD: case ARM::VLD4q32odd_UPD: FirstOpnd = 0; NumRegs = 4; Offset = 1; Stride = 2; return true; case ARM::VLD4LNq16: case ARM::VLD4LNq32: FirstOpnd = 0; NumRegs = 4; Offset = 0; Stride = 2; return true; case ARM::VLD4LNq16odd: case ARM::VLD4LNq32odd: FirstOpnd = 0; NumRegs = 4; Offset = 1; Stride = 2; return true; case ARM::VST1q8: case ARM::VST1q16: case ARM::VST1q32: case ARM::VST1q64: case ARM::VST2d8: case ARM::VST2d16: case ARM::VST2d32: case ARM::VST2LNd8: case ARM::VST2LNd16: case ARM::VST2LNd32: FirstOpnd = 2; NumRegs = 2; return true; case ARM::VST2q8: case ARM::VST2q16: case ARM::VST2q32: FirstOpnd = 2; NumRegs = 4; return true; case ARM::VST2LNq16: case ARM::VST2LNq32: FirstOpnd = 2; NumRegs = 2; Offset = 0; Stride = 2; return true; case ARM::VST2LNq16odd: case ARM::VST2LNq32odd: FirstOpnd = 2; NumRegs = 2; Offset = 1; Stride = 2; return true; case 
ARM::VST3d8: case ARM::VST3d16: case ARM::VST3d32: case ARM::VST1d64T: case ARM::VST3LNd8: case ARM::VST3LNd16: case ARM::VST3LNd32: FirstOpnd = 2; NumRegs = 3; return true; case ARM::VST3q8_UPD: case ARM::VST3q16_UPD: case ARM::VST3q32_UPD: FirstOpnd = 4; NumRegs = 3; Offset = 0; Stride = 2; return true; case ARM::VST3q8odd_UPD: case ARM::VST3q16odd_UPD: case ARM::VST3q32odd_UPD: FirstOpnd = 4; NumRegs = 3; Offset = 1; Stride = 2; return true; case ARM::VST3LNq16: case ARM::VST3LNq32: FirstOpnd = 2; NumRegs = 3; Offset = 0; Stride = 2; return true; case ARM::VST3LNq16odd: case ARM::VST3LNq32odd: FirstOpnd = 2; NumRegs = 3; Offset = 1; Stride = 2; return true; case ARM::VST4d8: case ARM::VST4d16: case ARM::VST4d32: case ARM::VST1d64Q: case ARM::VST4LNd8: case ARM::VST4LNd16: case ARM::VST4LNd32: FirstOpnd = 2; NumRegs = 4; return true; case ARM::VST4q8_UPD: case ARM::VST4q16_UPD: case ARM::VST4q32_UPD: FirstOpnd = 4; NumRegs = 4; Offset = 0; Stride = 2; return true; case ARM::VST4q8odd_UPD: case ARM::VST4q16odd_UPD: case ARM::VST4q32odd_UPD: FirstOpnd = 4; NumRegs = 4; Offset = 1; Stride = 2; return true; case ARM::VST4LNq16: case ARM::VST4LNq32: FirstOpnd = 2; NumRegs = 4; Offset = 0; Stride = 2; return true; case ARM::VST4LNq16odd: case ARM::VST4LNq32odd: FirstOpnd = 2; NumRegs = 4; Offset = 1; Stride = 2; return true; case ARM::VTBL2: FirstOpnd = 1; NumRegs = 2; return true; case ARM::VTBL3: FirstOpnd = 1; NumRegs = 3; return true; case ARM::VTBL4: FirstOpnd = 1; NumRegs = 4; return true; case ARM::VTBX2: FirstOpnd = 2; NumRegs = 2; return true; case ARM::VTBX3: FirstOpnd = 2; NumRegs = 3; return true; case ARM::VTBX4: FirstOpnd = 2; NumRegs = 4; return true; } return false; } +bool NEONPreAllocPass::FormsRegSequence(MachineInstr *MI, + unsigned FirstOpnd, unsigned NumRegs) { + MachineInstr *RegSeq = 0; + for (unsigned R = 0; R < NumRegs; ++R) { + MachineOperand &MO = MI->getOperand(FirstOpnd + R); + assert(MO.isReg() && MO.getSubReg() == 0 && "unexpected operand"); + unsigned VirtReg = MO.getReg(); + assert(TargetRegisterInfo::isVirtualRegister(VirtReg) && + "expected a virtual register"); + if (!MRI->hasOneNonDBGUse(VirtReg)) + return false; + MachineInstr *UseMI = &*MRI->use_nodbg_begin(VirtReg); + if (UseMI->getOpcode() != TargetOpcode::REG_SEQUENCE) + return false; + if (RegSeq && RegSeq != UseMI) + return false; + RegSeq = UseMI; + } + return true; +} + bool NEONPreAllocPass::PreAllocNEONRegisters(MachineBasicBlock &MBB) { bool Modified = false; MachineBasicBlock::iterator MBBI = MBB.begin(), E = MBB.end(); for (; MBBI != E; ++MBBI) { MachineInstr *MI = &*MBBI; unsigned FirstOpnd, NumRegs, Offset, Stride; if (!isNEONMultiRegOp(MI->getOpcode(), FirstOpnd, NumRegs, Offset, Stride)) continue; + if (FormsRegSequence(MI, FirstOpnd, NumRegs)) + continue; MachineBasicBlock::iterator NextI = llvm::next(MBBI); for (unsigned R = 0; R < NumRegs; ++R) { MachineOperand &MO = MI->getOperand(FirstOpnd + R); assert(MO.isReg() && MO.getSubReg() == 0 && "unexpected operand"); unsigned VirtReg = MO.getReg(); assert(TargetRegisterInfo::isVirtualRegister(VirtReg) && "expected a virtual register"); // For now, just assign a fixed set of adjacent registers. // This leaves plenty of room for future improvements. static const unsigned NEONDRegs[] = { ARM::D0, ARM::D1, ARM::D2, ARM::D3, ARM::D4, ARM::D5, ARM::D6, ARM::D7 }; MO.setReg(NEONDRegs[Offset + R * Stride]); if (MO.isUse()) { // Insert a copy from VirtReg. 
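// For example, a VST2d8 (FirstOpnd 2, NumRegs 2, unit stride) has its
// two source operands pinned to D0 and D1 here, with copies from the
// original virtual registers inserted just below; an "odd" variant with
// Offset 1 and Stride 2 would land on D1 and D3 instead. (Illustrative
// note only; the opcode/operand numbers come from isNEONMultiRegOp above.)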
TII->copyRegToReg(MBB, MBBI, MO.getReg(), VirtReg, ARM::DPRRegisterClass, ARM::DPRRegisterClass); if (MO.isKill()) { MachineInstr *CopyMI = prior(MBBI); CopyMI->findRegisterUseOperand(VirtReg)->setIsKill(); } MO.setIsKill(); } else if (MO.isDef() && !MO.isDead()) { // Add a copy to VirtReg. TII->copyRegToReg(MBB, NextI, VirtReg, MO.getReg(), ARM::DPRRegisterClass, ARM::DPRRegisterClass); } } } return Modified; } bool NEONPreAllocPass::runOnMachineFunction(MachineFunction &MF) { TII = MF.getTarget().getInstrInfo(); + MRI = &MF.getRegInfo(); bool Modified = false; for (MachineFunction::iterator MFI = MF.begin(), E = MF.end(); MFI != E; ++MFI) { MachineBasicBlock &MBB = *MFI; Modified |= PreAllocNEONRegisters(MBB); } return Modified; } /// createNEONPreAllocPass - returns an instance of the NEON register /// pre-allocation pass. FunctionPass *llvm::createNEONPreAllocPass() { return new NEONPreAllocPass(); } diff --git a/lib/Target/CellSPU/SPUISelDAGToDAG.cpp b/lib/Target/CellSPU/SPUISelDAGToDAG.cpp index c3c2b3947e06..9afdb2b97f30 100644 --- a/lib/Target/CellSPU/SPUISelDAGToDAG.cpp +++ b/lib/Target/CellSPU/SPUISelDAGToDAG.cpp @@ -1,1257 +1,1265 @@ //===-- SPUISelDAGToDAG.cpp - CellSPU pattern matching inst selector ------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This file defines a pattern matching instruction selector for the Cell SPU, // converting from a legalized dag to a SPU-target dag. // //===----------------------------------------------------------------------===// #include "SPU.h" #include "SPUTargetMachine.h" #include "SPUHazardRecognizers.h" #include "SPUFrameInfo.h" #include "SPURegisterNames.h" #include "SPUTargetMachine.h" #include "llvm/CodeGen/MachineConstantPool.h" #include "llvm/CodeGen/MachineInstrBuilder.h" #include "llvm/CodeGen/MachineFunction.h" #include "llvm/CodeGen/SelectionDAG.h" #include "llvm/CodeGen/SelectionDAGISel.h" #include "llvm/CodeGen/PseudoSourceValue.h" #include "llvm/Target/TargetOptions.h" #include "llvm/ADT/Statistic.h" #include "llvm/Constants.h" #include "llvm/GlobalValue.h" #include "llvm/Intrinsics.h" #include "llvm/LLVMContext.h" #include "llvm/Support/Debug.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/MathExtras.h" #include "llvm/Support/Compiler.h" #include "llvm/Support/raw_ostream.h" using namespace llvm; namespace { //! ConstantSDNode predicate for i32 sign-extended, 10-bit immediates bool isI64IntS10Immediate(ConstantSDNode *CN) { return isInt<10>(CN->getSExtValue()); } //! ConstantSDNode predicate for i32 sign-extended, 10-bit immediates bool isI32IntS10Immediate(ConstantSDNode *CN) { return isInt<10>(CN->getSExtValue()); } //! ConstantSDNode predicate for i32 unsigned 10-bit immediate values bool isI32IntU10Immediate(ConstantSDNode *CN) { return isUInt<10>(CN->getSExtValue()); } //! ConstantSDNode predicate for i16 sign-extended, 10-bit immediate values bool isI16IntS10Immediate(ConstantSDNode *CN) { return isInt<10>(CN->getSExtValue()); } //! SDNode predicate for i16 sign-extended, 10-bit immediate values bool isI16IntS10Immediate(SDNode *N) { ConstantSDNode *CN = dyn_cast(N); return (CN != 0 && isI16IntS10Immediate(CN)); } //! ConstantSDNode predicate for i16 unsigned 10-bit immediate values bool isI16IntU10Immediate(ConstantSDNode *CN) { return isUInt<10>((short) CN->getZExtValue()); } //! 
SDNode predicate for i16 unsigned 10-bit immediate values
bool isI16IntU10Immediate(SDNode *N) {
  return (N->getOpcode() == ISD::Constant &&
          isI16IntU10Immediate(cast<ConstantSDNode>(N)));
}

//! ConstantSDNode predicate for signed 16-bit values
/*!
  \arg CN The constant SelectionDAG node holding the value
  \arg Imm The returned 16-bit value, if returning true

  This predicate tests the value in \a CN to see whether it can be
  represented as a 16-bit, sign-extended quantity. Returns true if this is
  the case.
 */
bool isIntS16Immediate(ConstantSDNode *CN, short &Imm) {
  EVT vt = CN->getValueType(0);
  Imm = (short) CN->getZExtValue();
  if (vt.getSimpleVT() >= MVT::i1 && vt.getSimpleVT() <= MVT::i16) {
    return true;
  } else if (vt == MVT::i32) {
    int32_t i_val = (int32_t) CN->getZExtValue();
    short s_val = (short) i_val;
    return i_val == s_val;
  } else {
    int64_t i_val = (int64_t) CN->getZExtValue();
    short s_val = (short) i_val;
    return i_val == s_val;
  }

  return false;
}

//! SDNode predicate for signed 16-bit values.
bool isIntS16Immediate(SDNode *N, short &Imm) {
  return (N->getOpcode() == ISD::Constant &&
          isIntS16Immediate(cast<ConstantSDNode>(N), Imm));
}

//! ConstantFPSDNode predicate for representing floats as 16-bit sign ext.
static bool isFPS16Immediate(ConstantFPSDNode *FPN, short &Imm) {
  EVT vt = FPN->getValueType(0);
  if (vt == MVT::f32) {
    int val = FloatToBits(FPN->getValueAPF().convertToFloat());
    int sval = (int) ((val << 16) >> 16);
    Imm = (short) val;
    return val == sval;
  }

  return false;
}

bool isHighLow(const SDValue &Op) {
  return (Op.getOpcode() == SPUISD::IndirectAddr &&
          ((Op.getOperand(0).getOpcode() == SPUISD::Hi &&
            Op.getOperand(1).getOpcode() == SPUISD::Lo) ||
           (Op.getOperand(0).getOpcode() == SPUISD::Lo &&
            Op.getOperand(1).getOpcode() == SPUISD::Hi)));
}

//===------------------------------------------------------------------===//
//! EVT to "useful stuff" mapping structure:

struct valtype_map_s {
  EVT VT;
  unsigned ldresult_ins;   /// LDRESULT instruction (0 = undefined)
  bool ldresult_imm;       /// LDRESULT instruction requires immediate?
  unsigned lrinst;         /// LR instruction
};

const valtype_map_s valtype_map[] = {
  { MVT::i8,    SPU::ORBIr8,  true,  SPU::LRr8 },
  { MVT::i16,   SPU::ORHIr16, true,  SPU::LRr16 },
  { MVT::i32,   SPU::ORIr32,  true,  SPU::LRr32 },
  { MVT::i64,   SPU::ORr64,   false, SPU::LRr64 },
  { MVT::f32,   SPU::ORf32,   false, SPU::LRf32 },
  { MVT::f64,   SPU::ORf64,   false, SPU::LRf64 },
  // vector types... (sigh!)
  { MVT::v16i8, 0, false, SPU::LRv16i8 },
  { MVT::v8i16, 0, false, SPU::LRv8i16 },
  { MVT::v4i32, 0, false, SPU::LRv4i32 },
  { MVT::v2i64, 0, false, SPU::LRv2i64 },
  { MVT::v4f32, 0, false, SPU::LRv4f32 },
  { MVT::v2f64, 0, false, SPU::LRv2f64 }
};

const size_t n_valtype_map = sizeof(valtype_map) / sizeof(valtype_map[0]);

const valtype_map_s *getValueTypeMapEntry(EVT VT) {
  const valtype_map_s *retval = 0;

  for (size_t i = 0; i < n_valtype_map; ++i) {
    if (valtype_map[i].VT == VT) {
      retval = valtype_map + i;
      break;
    }
  }

#ifndef NDEBUG
  if (retval == 0) {
    report_fatal_error("SPUISelDAGToDAG.cpp: getValueTypeMapEntry returns "
                       "NULL for " + Twine(VT.getEVTString()));
  }
#endif

  return retval;
}

//! Generate the carry-generate shuffle mask.
SDValue getCarryGenerateShufMask(SelectionDAG &DAG, DebugLoc dl) {
  SmallVector<SDValue, 8> ShufBytes;

  // Create the shuffle mask for "rotating" the carry up one register slot
  // once the carry is generated.
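  // (SPU shufb control-byte semantics, for reference: control bytes in
  // 0x80..0xBF produce a constant 0x00 and 0xC0..0xDF produce 0xFF, while
  // 0x04050607 and 0x0c0d0e0f select bytes 4-7 and 12-15, i.e. word slots
  // 1 and 3. The carry mask therefore moves each generated carry up one
  // word slot and zeroes the remaining slots; the borrow variant below
  // uses 0xc0c0c0c0 to fill with ones instead.)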
ShufBytes.push_back(DAG.getConstant(0x04050607, MVT::i32)); ShufBytes.push_back(DAG.getConstant(0x80808080, MVT::i32)); ShufBytes.push_back(DAG.getConstant(0x0c0d0e0f, MVT::i32)); ShufBytes.push_back(DAG.getConstant(0x80808080, MVT::i32)); return DAG.getNode(ISD::BUILD_VECTOR, dl, MVT::v4i32, &ShufBytes[0], ShufBytes.size()); } //! Generate the borrow-generate shuffle mask SDValue getBorrowGenerateShufMask(SelectionDAG &DAG, DebugLoc dl) { SmallVector ShufBytes; // Create the shuffle mask for "rotating" the borrow up one register slot // once the borrow is generated. ShufBytes.push_back(DAG.getConstant(0x04050607, MVT::i32)); ShufBytes.push_back(DAG.getConstant(0xc0c0c0c0, MVT::i32)); ShufBytes.push_back(DAG.getConstant(0x0c0d0e0f, MVT::i32)); ShufBytes.push_back(DAG.getConstant(0xc0c0c0c0, MVT::i32)); return DAG.getNode(ISD::BUILD_VECTOR, dl, MVT::v4i32, &ShufBytes[0], ShufBytes.size()); } //===------------------------------------------------------------------===// /// SPUDAGToDAGISel - Cell SPU-specific code to select SPU machine /// instructions for SelectionDAG operations. /// class SPUDAGToDAGISel : public SelectionDAGISel { const SPUTargetMachine &TM; const SPUTargetLowering &SPUtli; unsigned GlobalBaseReg; public: explicit SPUDAGToDAGISel(SPUTargetMachine &tm) : SelectionDAGISel(tm), TM(tm), SPUtli(*tm.getTargetLowering()) { } virtual bool runOnMachineFunction(MachineFunction &MF) { // Make sure we re-emit a set of the global base reg if necessary GlobalBaseReg = 0; SelectionDAGISel::runOnMachineFunction(MF); return true; } /// getI32Imm - Return a target constant with the specified value, of type /// i32. inline SDValue getI32Imm(uint32_t Imm) { return CurDAG->getTargetConstant(Imm, MVT::i32); } /// getI64Imm - Return a target constant with the specified value, of type /// i64. inline SDValue getI64Imm(uint64_t Imm) { return CurDAG->getTargetConstant(Imm, MVT::i64); } /// getSmallIPtrImm - Return a target constant of pointer type. 
  inline SDValue getSmallIPtrImm(unsigned Imm) {
    return CurDAG->getTargetConstant(Imm, SPUtli.getPointerTy());
  }

  SDNode *emitBuildVector(SDNode *bvNode) {
    EVT vecVT = bvNode->getValueType(0);
    EVT eltVT = vecVT.getVectorElementType();
    DebugLoc dl = bvNode->getDebugLoc();

    // Check to see if this vector can be represented as a CellSPU immediate
    // constant by invoking all of the instruction selection predicates:
    if (((vecVT == MVT::v8i16) &&
         (SPU::get_vec_i16imm(bvNode, *CurDAG, MVT::i16).getNode() != 0)) ||
        ((vecVT == MVT::v4i32) &&
         ((SPU::get_vec_i16imm(bvNode, *CurDAG, MVT::i32).getNode() != 0) ||
          (SPU::get_ILHUvec_imm(bvNode, *CurDAG, MVT::i32).getNode() != 0) ||
          (SPU::get_vec_u18imm(bvNode, *CurDAG, MVT::i32).getNode() != 0) ||
          (SPU::get_v4i32_imm(bvNode, *CurDAG).getNode() != 0))) ||
        ((vecVT == MVT::v2i64) &&
         ((SPU::get_vec_i16imm(bvNode, *CurDAG, MVT::i64).getNode() != 0) ||
          (SPU::get_ILHUvec_imm(bvNode, *CurDAG, MVT::i64).getNode() != 0) ||
          (SPU::get_vec_u18imm(bvNode, *CurDAG, MVT::i64).getNode() != 0)))) {
      HandleSDNode Dummy(SDValue(bvNode, 0));
      if (SDNode *N = Select(bvNode))
        return N;
      return Dummy.getValue().getNode();
    }

    // No, need to emit a constant pool spill:
    std::vector<Constant*> CV;

    for (size_t i = 0; i < bvNode->getNumOperands(); ++i) {
      ConstantSDNode *V = cast<ConstantSDNode>(bvNode->getOperand(i));
      CV.push_back(const_cast<ConstantInt *>(V->getConstantIntValue()));
    }

    const Constant *CP = ConstantVector::get(CV);
    SDValue CPIdx = CurDAG->getConstantPool(CP, SPUtli.getPointerTy());
    unsigned Alignment = cast<ConstantPoolSDNode>(CPIdx)->getAlignment();
    SDValue CGPoolOffset = SPU::LowerConstantPool(CPIdx, *CurDAG, TM);
    HandleSDNode Dummy(CurDAG->getLoad(vecVT, dl,
                                       CurDAG->getEntryNode(), CGPoolOffset,
                                       PseudoSourceValue::getConstantPool(), 0,
                                       false, false, Alignment));
    CurDAG->ReplaceAllUsesWith(SDValue(bvNode, 0), Dummy.getValue());
    if (SDNode *N = SelectCode(Dummy.getValue().getNode()))
      return N;
    return Dummy.getValue().getNode();
  }

  /// Select - Convert the specified operand from a target-independent to a
  /// target-specific node if it hasn't already been changed.
  SDNode *Select(SDNode *N);

  //! Emit the instruction sequence for i64 shl
  SDNode *SelectSHLi64(SDNode *N, EVT OpVT);

  //! Emit the instruction sequence for i64 srl
  SDNode *SelectSRLi64(SDNode *N, EVT OpVT);

  //! Emit the instruction sequence for i64 sra
  SDNode *SelectSRAi64(SDNode *N, EVT OpVT);

  //! Emit the necessary sequence for loading i64 constants:
  SDNode *SelectI64Constant(SDNode *N, EVT OpVT, DebugLoc dl);

  //! Alternate instruction emit sequence for loading i64 constants
  SDNode *SelectI64Constant(uint64_t i64const, EVT OpVT, DebugLoc dl);

  //! Returns true if the address N is an A-form (local store) address
  bool SelectAFormAddr(SDNode *Op, SDValue N, SDValue &Base, SDValue &Index);

  //! D-form address predicate
  bool SelectDFormAddr(SDNode *Op, SDValue N, SDValue &Base, SDValue &Index);

  /// Alternate D-form address using i7 offset predicate
  bool SelectDForm2Addr(SDNode *Op, SDValue N, SDValue &Disp, SDValue &Base);

  /// D-form address selection workhorse
  bool DFormAddressPredicate(SDNode *Op, SDValue N, SDValue &Disp,
                             SDValue &Base, int minOffset, int maxOffset);

  //! Address predicate if N can be expressed as an indexed [r+r] operation.
  bool SelectXFormAddr(SDNode *Op, SDValue N, SDValue &Base, SDValue &Index);

  /// SelectInlineAsmMemoryOperand - Implement addressing mode selection for
  /// inline asm expressions.
virtual bool SelectInlineAsmMemoryOperand(const SDValue &Op, char ConstraintCode, std::vector &OutOps) { SDValue Op0, Op1; switch (ConstraintCode) { default: return true; case 'm': // memory if (!SelectDFormAddr(Op.getNode(), Op, Op0, Op1) && !SelectAFormAddr(Op.getNode(), Op, Op0, Op1)) SelectXFormAddr(Op.getNode(), Op, Op0, Op1); break; case 'o': // offsetable if (!SelectDFormAddr(Op.getNode(), Op, Op0, Op1) && !SelectAFormAddr(Op.getNode(), Op, Op0, Op1)) { Op0 = Op; Op1 = getSmallIPtrImm(0); } break; case 'v': // not offsetable #if 1 llvm_unreachable("InlineAsmMemoryOperand 'v' constraint not handled."); #else SelectAddrIdxOnly(Op, Op, Op0, Op1); #endif break; } OutOps.push_back(Op0); OutOps.push_back(Op1); return false; } virtual const char *getPassName() const { return "Cell SPU DAG->DAG Pattern Instruction Selection"; } /// CreateTargetHazardRecognizer - Return the hazard recognizer to use for /// this target when scheduling the DAG. virtual ScheduleHazardRecognizer *CreateTargetHazardRecognizer() { const TargetInstrInfo *II = TM.getInstrInfo(); assert(II && "No InstrInfo?"); return new SPUHazardRecognizer(*II); } // Include the pieces autogenerated from the target description. #include "SPUGenDAGISel.inc" }; } /*! \arg Op The ISD instruction operand \arg N The address to be tested \arg Base The base address \arg Index The base address index */ bool SPUDAGToDAGISel::SelectAFormAddr(SDNode *Op, SDValue N, SDValue &Base, SDValue &Index) { // These match the addr256k operand type: EVT OffsVT = MVT::i16; SDValue Zero = CurDAG->getTargetConstant(0, OffsVT); switch (N.getOpcode()) { case ISD::Constant: case ISD::ConstantPool: case ISD::GlobalAddress: report_fatal_error("SPU SelectAFormAddr: Constant/Pool/Global not lowered."); /*NOTREACHED*/ case ISD::TargetConstant: case ISD::TargetGlobalAddress: case ISD::TargetJumpTable: report_fatal_error("SPUSelectAFormAddr: Target Constant/Pool/Global " "not wrapped as A-form address."); /*NOTREACHED*/ case SPUISD::AFormAddr: // Just load from memory if there's only a single use of the location, // otherwise, this will get handled below with D-form offset addresses if (N.hasOneUse()) { SDValue Op0 = N.getOperand(0); switch (Op0.getOpcode()) { case ISD::TargetConstantPool: case ISD::TargetJumpTable: Base = Op0; Index = Zero; return true; case ISD::TargetGlobalAddress: { GlobalAddressSDNode *GSDN = cast(Op0); const GlobalValue *GV = GSDN->getGlobal(); if (GV->getAlignment() == 16) { Base = Op0; Index = Zero; return true; } break; } } } break; } return false; } bool SPUDAGToDAGISel::SelectDForm2Addr(SDNode *Op, SDValue N, SDValue &Disp, SDValue &Base) { const int minDForm2Offset = -(1 << 7); const int maxDForm2Offset = (1 << 7) - 1; return DFormAddressPredicate(Op, N, Disp, Base, minDForm2Offset, maxDForm2Offset); } /*! \arg Op The ISD instruction (ignored) \arg N The address to be tested \arg Base Base address register/pointer \arg Index Base address index Examine the input address by a base register plus a signed 10-bit displacement, [r+I10] (D-form address). \return true if \a N is a D-form address with \a Base and \a Index set to non-empty SDValue instances. 
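  For example, with the signed 10-bit immediate window of [-512, 511] a
  node such as (add %r, 48) can be selected as the D-form [%r + 48],
  while an offset like 600 falls outside the window and is left for
  X-form ([r+r]) selection. (Illustrative only; the exact bounds used
  here come from SPUFrameInfo::minFrameOffset()/maxFrameOffset().)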
*/ bool SPUDAGToDAGISel::SelectDFormAddr(SDNode *Op, SDValue N, SDValue &Base, SDValue &Index) { return DFormAddressPredicate(Op, N, Base, Index, SPUFrameInfo::minFrameOffset(), SPUFrameInfo::maxFrameOffset()); } bool SPUDAGToDAGISel::DFormAddressPredicate(SDNode *Op, SDValue N, SDValue &Base, SDValue &Index, int minOffset, int maxOffset) { unsigned Opc = N.getOpcode(); EVT PtrTy = SPUtli.getPointerTy(); if (Opc == ISD::FrameIndex) { // Stack frame index must be less than 512 (divided by 16): FrameIndexSDNode *FIN = cast(N); int FI = int(FIN->getIndex()); DEBUG(errs() << "SelectDFormAddr: ISD::FrameIndex = " << FI << "\n"); if (SPUFrameInfo::FItoStackOffset(FI) < maxOffset) { Base = CurDAG->getTargetConstant(0, PtrTy); Index = CurDAG->getTargetFrameIndex(FI, PtrTy); return true; } } else if (Opc == ISD::ADD) { // Generated by getelementptr const SDValue Op0 = N.getOperand(0); const SDValue Op1 = N.getOperand(1); if ((Op0.getOpcode() == SPUISD::Hi && Op1.getOpcode() == SPUISD::Lo) || (Op1.getOpcode() == SPUISD::Hi && Op0.getOpcode() == SPUISD::Lo)) { Base = CurDAG->getTargetConstant(0, PtrTy); Index = N; return true; } else if (Op1.getOpcode() == ISD::Constant || Op1.getOpcode() == ISD::TargetConstant) { ConstantSDNode *CN = cast(Op1); int32_t offset = int32_t(CN->getSExtValue()); if (Op0.getOpcode() == ISD::FrameIndex) { FrameIndexSDNode *FIN = cast(Op0); int FI = int(FIN->getIndex()); DEBUG(errs() << "SelectDFormAddr: ISD::ADD offset = " << offset << " frame index = " << FI << "\n"); if (SPUFrameInfo::FItoStackOffset(FI) < maxOffset) { Base = CurDAG->getTargetConstant(offset, PtrTy); Index = CurDAG->getTargetFrameIndex(FI, PtrTy); return true; } } else if (offset > minOffset && offset < maxOffset) { Base = CurDAG->getTargetConstant(offset, PtrTy); Index = Op0; return true; } } else if (Op0.getOpcode() == ISD::Constant || Op0.getOpcode() == ISD::TargetConstant) { ConstantSDNode *CN = cast(Op0); int32_t offset = int32_t(CN->getSExtValue()); if (Op1.getOpcode() == ISD::FrameIndex) { FrameIndexSDNode *FIN = cast(Op1); int FI = int(FIN->getIndex()); DEBUG(errs() << "SelectDFormAddr: ISD::ADD offset = " << offset << " frame index = " << FI << "\n"); if (SPUFrameInfo::FItoStackOffset(FI) < maxOffset) { Base = CurDAG->getTargetConstant(offset, PtrTy); Index = CurDAG->getTargetFrameIndex(FI, PtrTy); return true; } } else if (offset > minOffset && offset < maxOffset) { Base = CurDAG->getTargetConstant(offset, PtrTy); Index = Op1; return true; } } } else if (Opc == SPUISD::IndirectAddr) { // Indirect with constant offset -> D-Form address const SDValue Op0 = N.getOperand(0); const SDValue Op1 = N.getOperand(1); if (Op0.getOpcode() == SPUISD::Hi && Op1.getOpcode() == SPUISD::Lo) { // (SPUindirect (SPUhi , 0), (SPUlo , 0)) Base = CurDAG->getTargetConstant(0, PtrTy); Index = N; return true; } else if (isa(Op0) || isa(Op1)) { int32_t offset = 0; SDValue idxOp; if (isa(Op1)) { ConstantSDNode *CN = cast(Op1); offset = int32_t(CN->getSExtValue()); idxOp = Op0; } else if (isa(Op0)) { ConstantSDNode *CN = cast(Op0); offset = int32_t(CN->getSExtValue()); idxOp = Op1; } if (offset >= minOffset && offset <= maxOffset) { Base = CurDAG->getTargetConstant(offset, PtrTy); Index = idxOp; return true; } } } else if (Opc == SPUISD::AFormAddr) { Base = CurDAG->getTargetConstant(0, N.getValueType()); Index = N; return true; } else if (Opc == SPUISD::LDRESULT) { Base = CurDAG->getTargetConstant(0, N.getValueType()); Index = N; return true; } else if (Opc == ISD::Register || Opc == ISD::CopyFromReg) { unsigned OpOpc = 
Op->getOpcode(); if (OpOpc == ISD::STORE || OpOpc == ISD::LOAD) { // Direct load/store without getelementptr SDValue Addr, Offs; // Get the register from CopyFromReg if (Opc == ISD::CopyFromReg) Addr = N.getOperand(1); else Addr = N; // Register Offs = ((OpOpc == ISD::STORE) ? Op->getOperand(3) : Op->getOperand(2)); if (Offs.getOpcode() == ISD::Constant || Offs.getOpcode() == ISD::UNDEF) { if (Offs.getOpcode() == ISD::UNDEF) Offs = CurDAG->getTargetConstant(0, Offs.getValueType()); Base = Offs; Index = Addr; return true; } } else { /* If otherwise unadorned, default to D-form address with 0 offset: */ if (Opc == ISD::CopyFromReg) { Index = N.getOperand(1); } else { Index = N; } Base = CurDAG->getTargetConstant(0, Index.getValueType()); return true; } } return false; } /*! \arg Op The ISD instruction operand \arg N The address operand \arg Base The base pointer operand \arg Index The offset/index operand If the address \a N can be expressed as an A-form or D-form address, returns false. Otherwise, creates two operands, Base and Index that will become the (r)(r) X-form address. */ bool SPUDAGToDAGISel::SelectXFormAddr(SDNode *Op, SDValue N, SDValue &Base, SDValue &Index) { if (!SelectAFormAddr(Op, N, Base, Index) && !SelectDFormAddr(Op, N, Base, Index)) { // If the address is neither A-form or D-form, punt and use an X-form // address: Base = N.getOperand(1); Index = N.getOperand(0); return true; } return false; } //! Convert the operand from a target-independent to a target-specific node /*! */ SDNode * SPUDAGToDAGISel::Select(SDNode *N) { unsigned Opc = N->getOpcode(); int n_ops = -1; unsigned NewOpc; EVT OpVT = N->getValueType(0); SDValue Ops[8]; DebugLoc dl = N->getDebugLoc(); if (N->isMachineOpcode()) return NULL; // Already selected. if (Opc == ISD::FrameIndex) { int FI = cast(N)->getIndex(); SDValue TFI = CurDAG->getTargetFrameIndex(FI, N->getValueType(0)); SDValue Imm0 = CurDAG->getTargetConstant(0, N->getValueType(0)); if (FI < 128) { NewOpc = SPU::AIr32; Ops[0] = TFI; Ops[1] = Imm0; n_ops = 2; } else { NewOpc = SPU::Ar32; Ops[0] = CurDAG->getRegister(SPU::R1, N->getValueType(0)); Ops[1] = SDValue(CurDAG->getMachineNode(SPU::ILAr32, dl, N->getValueType(0), TFI, Imm0), 0); n_ops = 2; } } else if (Opc == ISD::Constant && OpVT == MVT::i64) { // Catch the i64 constants that end up here. Note: The backend doesn't // attempt to legalize the constant (it's useless because DAGCombiner // will insert 64-bit constants and we can't stop it). 
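    // (The lowering below builds the constant as a v2i64 splat via
    // SelectI64Constant and then moves the scalar back out of the vector's
    // preferred slot with ORi64_v2i64.)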
return SelectI64Constant(N, OpVT, N->getDebugLoc()); } else if ((Opc == ISD::ZERO_EXTEND || Opc == ISD::ANY_EXTEND) && OpVT == MVT::i64) { SDValue Op0 = N->getOperand(0); EVT Op0VT = Op0.getValueType(); EVT Op0VecVT = EVT::getVectorVT(*CurDAG->getContext(), Op0VT, (128 / Op0VT.getSizeInBits())); EVT OpVecVT = EVT::getVectorVT(*CurDAG->getContext(), OpVT, (128 / OpVT.getSizeInBits())); SDValue shufMask; switch (Op0VT.getSimpleVT().SimpleTy) { default: report_fatal_error("CellSPU Select: Unhandled zero/any extend EVT"); /*NOTREACHED*/ case MVT::i32: shufMask = CurDAG->getNode(ISD::BUILD_VECTOR, dl, MVT::v4i32, CurDAG->getConstant(0x80808080, MVT::i32), CurDAG->getConstant(0x00010203, MVT::i32), CurDAG->getConstant(0x80808080, MVT::i32), CurDAG->getConstant(0x08090a0b, MVT::i32)); break; case MVT::i16: shufMask = CurDAG->getNode(ISD::BUILD_VECTOR, dl, MVT::v4i32, CurDAG->getConstant(0x80808080, MVT::i32), CurDAG->getConstant(0x80800203, MVT::i32), CurDAG->getConstant(0x80808080, MVT::i32), CurDAG->getConstant(0x80800a0b, MVT::i32)); break; case MVT::i8: shufMask = CurDAG->getNode(ISD::BUILD_VECTOR, dl, MVT::v4i32, CurDAG->getConstant(0x80808080, MVT::i32), CurDAG->getConstant(0x80808003, MVT::i32), CurDAG->getConstant(0x80808080, MVT::i32), CurDAG->getConstant(0x8080800b, MVT::i32)); break; } SDNode *shufMaskLoad = emitBuildVector(shufMask.getNode()); HandleSDNode PromoteScalar(CurDAG->getNode(SPUISD::PREFSLOT2VEC, dl, Op0VecVT, Op0)); SDValue PromScalar; if (SDNode *N = SelectCode(PromoteScalar.getValue().getNode())) PromScalar = SDValue(N, 0); else PromScalar = PromoteScalar.getValue(); SDValue zextShuffle = CurDAG->getNode(SPUISD::SHUFB, dl, OpVecVT, PromScalar, PromScalar, SDValue(shufMaskLoad, 0)); HandleSDNode Dummy2(zextShuffle); if (SDNode *N = SelectCode(Dummy2.getValue().getNode())) zextShuffle = SDValue(N, 0); else zextShuffle = Dummy2.getValue(); HandleSDNode Dummy(CurDAG->getNode(SPUISD::VEC2PREFSLOT, dl, OpVT, zextShuffle)); CurDAG->ReplaceAllUsesWith(N, Dummy.getValue().getNode()); SelectCode(Dummy.getValue().getNode()); return Dummy.getValue().getNode(); } else if (Opc == ISD::ADD && (OpVT == MVT::i64 || OpVT == MVT::v2i64)) { SDNode *CGLoad = emitBuildVector(getCarryGenerateShufMask(*CurDAG, dl).getNode()); HandleSDNode Dummy(CurDAG->getNode(SPUISD::ADD64_MARKER, dl, OpVT, N->getOperand(0), N->getOperand(1), SDValue(CGLoad, 0))); CurDAG->ReplaceAllUsesWith(N, Dummy.getValue().getNode()); if (SDNode *N = SelectCode(Dummy.getValue().getNode())) return N; return Dummy.getValue().getNode(); } else if (Opc == ISD::SUB && (OpVT == MVT::i64 || OpVT == MVT::v2i64)) { SDNode *CGLoad = emitBuildVector(getBorrowGenerateShufMask(*CurDAG, dl).getNode()); HandleSDNode Dummy(CurDAG->getNode(SPUISD::SUB64_MARKER, dl, OpVT, N->getOperand(0), N->getOperand(1), SDValue(CGLoad, 0))); CurDAG->ReplaceAllUsesWith(N, Dummy.getValue().getNode()); if (SDNode *N = SelectCode(Dummy.getValue().getNode())) return N; return Dummy.getValue().getNode(); } else if (Opc == ISD::MUL && (OpVT == MVT::i64 || OpVT == MVT::v2i64)) { SDNode *CGLoad = emitBuildVector(getCarryGenerateShufMask(*CurDAG, dl).getNode()); HandleSDNode Dummy(CurDAG->getNode(SPUISD::MUL64_MARKER, dl, OpVT, N->getOperand(0), N->getOperand(1), SDValue(CGLoad, 0))); CurDAG->ReplaceAllUsesWith(N, Dummy.getValue().getNode()); if (SDNode *N = SelectCode(Dummy.getValue().getNode())) return N; return Dummy.getValue().getNode(); } else if (Opc == ISD::TRUNCATE) { SDValue Op0 = N->getOperand(0); if ((Op0.getOpcode() == ISD::SRA || Op0.getOpcode() == 
ISD::SRL) && OpVT == MVT::i32 && Op0.getValueType() == MVT::i64) { // Catch (truncate:i32 ([sra|srl]:i64 arg, c), where c >= 32 // // Take advantage of the fact that the upper 32 bits are in the // i32 preferred slot and avoid shuffle gymnastics: ConstantSDNode *CN = dyn_cast(Op0.getOperand(1)); if (CN != 0) { unsigned shift_amt = unsigned(CN->getZExtValue()); if (shift_amt >= 32) { SDNode *hi32 = CurDAG->getMachineNode(SPU::ORr32_r64, dl, OpVT, Op0.getOperand(0)); shift_amt -= 32; if (shift_amt > 0) { // Take care of the additional shift, if present: SDValue shift = CurDAG->getTargetConstant(shift_amt, MVT::i32); unsigned Opc = SPU::ROTMAIr32_i32; if (Op0.getOpcode() == ISD::SRL) Opc = SPU::ROTMr32; hi32 = CurDAG->getMachineNode(Opc, dl, OpVT, SDValue(hi32, 0), shift); } return hi32; } } } } else if (Opc == ISD::SHL) { if (OpVT == MVT::i64) return SelectSHLi64(N, OpVT); } else if (Opc == ISD::SRL) { if (OpVT == MVT::i64) return SelectSRLi64(N, OpVT); } else if (Opc == ISD::SRA) { if (OpVT == MVT::i64) return SelectSRAi64(N, OpVT); } else if (Opc == ISD::FNEG && (OpVT == MVT::f64 || OpVT == MVT::v2f64)) { DebugLoc dl = N->getDebugLoc(); // Check if the pattern is a special form of DFNMS: // (fneg (fsub (fmul R64FP:$rA, R64FP:$rB), R64FP:$rC)) SDValue Op0 = N->getOperand(0); if (Op0.getOpcode() == ISD::FSUB) { SDValue Op00 = Op0.getOperand(0); if (Op00.getOpcode() == ISD::FMUL) { unsigned Opc = SPU::DFNMSf64; if (OpVT == MVT::v2f64) Opc = SPU::DFNMSv2f64; return CurDAG->getMachineNode(Opc, dl, OpVT, Op00.getOperand(0), Op00.getOperand(1), Op0.getOperand(1)); } } SDValue negConst = CurDAG->getConstant(0x8000000000000000ULL, MVT::i64); SDNode *signMask = 0; unsigned Opc = SPU::XORfneg64; if (OpVT == MVT::f64) { signMask = SelectI64Constant(negConst.getNode(), MVT::i64, dl); } else if (OpVT == MVT::v2f64) { Opc = SPU::XORfnegvec; signMask = emitBuildVector(CurDAG->getNode(ISD::BUILD_VECTOR, dl, MVT::v2i64, negConst, negConst).getNode()); } return CurDAG->getMachineNode(Opc, dl, OpVT, N->getOperand(0), SDValue(signMask, 0)); } else if (Opc == ISD::FABS) { if (OpVT == MVT::f64) { SDNode *signMask = SelectI64Constant(0x7fffffffffffffffULL, MVT::i64, dl); return CurDAG->getMachineNode(SPU::ANDfabs64, dl, OpVT, N->getOperand(0), SDValue(signMask, 0)); } else if (OpVT == MVT::v2f64) { SDValue absConst = CurDAG->getConstant(0x7fffffffffffffffULL, MVT::i64); SDValue absVec = CurDAG->getNode(ISD::BUILD_VECTOR, dl, MVT::v2i64, absConst, absConst); SDNode *signMask = emitBuildVector(absVec.getNode()); return CurDAG->getMachineNode(SPU::ANDfabsvec, dl, OpVT, N->getOperand(0), SDValue(signMask, 0)); } } else if (Opc == SPUISD::LDRESULT) { // Custom select instructions for LDRESULT EVT VT = N->getValueType(0); SDValue Arg = N->getOperand(0); SDValue Chain = N->getOperand(1); SDNode *Result; const valtype_map_s *vtm = getValueTypeMapEntry(VT); if (vtm->ldresult_ins == 0) { report_fatal_error("LDRESULT for unsupported type: " + Twine(VT.getEVTString())); } Opc = vtm->ldresult_ins; if (vtm->ldresult_imm) { SDValue Zero = CurDAG->getTargetConstant(0, VT); Result = CurDAG->getMachineNode(Opc, dl, VT, MVT::Other, Arg, Zero, Chain); } else { Result = CurDAG->getMachineNode(Opc, dl, VT, MVT::Other, Arg, Arg, Chain); } return Result; } else if (Opc == SPUISD::IndirectAddr) { // Look at the operands: SelectCode() will catch the cases that aren't // specifically handled here. 
// // SPUInstrInfo catches the following patterns: // (SPUindirect (SPUhi ...), (SPUlo ...)) // (SPUindirect $sp, imm) EVT VT = N->getValueType(0); SDValue Op0 = N->getOperand(0); SDValue Op1 = N->getOperand(1); RegisterSDNode *RN; if ((Op0.getOpcode() != SPUISD::Hi && Op1.getOpcode() != SPUISD::Lo) || (Op0.getOpcode() == ISD::Register && ((RN = dyn_cast(Op0.getNode())) != 0 && RN->getReg() != SPU::R1))) { NewOpc = SPU::Ar32; + Ops[1] = Op1; if (Op1.getOpcode() == ISD::Constant) { ConstantSDNode *CN = cast(Op1); Op1 = CurDAG->getTargetConstant(CN->getSExtValue(), VT); - NewOpc = (isI32IntS10Immediate(CN) ? SPU::AIr32 : SPU::Ar32); + if (isInt<10>(CN->getSExtValue())) { + NewOpc = SPU::AIr32; + Ops[1] = Op1; + } else { + Ops[1] = SDValue(CurDAG->getMachineNode(SPU::ILr32, dl, + N->getValueType(0), + Op1), + 0); + } } Ops[0] = Op0; - Ops[1] = Op1; n_ops = 2; } } if (n_ops > 0) { if (N->hasOneUse()) return CurDAG->SelectNodeTo(N, NewOpc, OpVT, Ops, n_ops); else return CurDAG->getMachineNode(NewOpc, dl, OpVT, Ops, n_ops); } else return SelectCode(N); } /*! * Emit the instruction sequence for i64 left shifts. The basic algorithm * is to fill the bottom two word slots with zeros so that zeros are shifted * in as the entire quadword is shifted left. * * \note This code could also be used to implement v2i64 shl. * * @param Op The shl operand * @param OpVT Op's machine value value type (doesn't need to be passed, but * makes life easier.) * @return The SDNode with the entire instruction sequence */ SDNode * SPUDAGToDAGISel::SelectSHLi64(SDNode *N, EVT OpVT) { SDValue Op0 = N->getOperand(0); EVT VecVT = EVT::getVectorVT(*CurDAG->getContext(), OpVT, (128 / OpVT.getSizeInBits())); SDValue ShiftAmt = N->getOperand(1); EVT ShiftAmtVT = ShiftAmt.getValueType(); SDNode *VecOp0, *SelMask, *ZeroFill, *Shift = 0; SDValue SelMaskVal; DebugLoc dl = N->getDebugLoc(); VecOp0 = CurDAG->getMachineNode(SPU::ORv2i64_i64, dl, VecVT, Op0); SelMaskVal = CurDAG->getTargetConstant(0xff00ULL, MVT::i16); SelMask = CurDAG->getMachineNode(SPU::FSMBIv2i64, dl, VecVT, SelMaskVal); ZeroFill = CurDAG->getMachineNode(SPU::ILv2i64, dl, VecVT, CurDAG->getTargetConstant(0, OpVT)); VecOp0 = CurDAG->getMachineNode(SPU::SELBv2i64, dl, VecVT, SDValue(ZeroFill, 0), SDValue(VecOp0, 0), SDValue(SelMask, 0)); if (ConstantSDNode *CN = dyn_cast(ShiftAmt)) { unsigned bytes = unsigned(CN->getZExtValue()) >> 3; unsigned bits = unsigned(CN->getZExtValue()) & 7; if (bytes > 0) { Shift = CurDAG->getMachineNode(SPU::SHLQBYIv2i64, dl, VecVT, SDValue(VecOp0, 0), CurDAG->getTargetConstant(bytes, ShiftAmtVT)); } if (bits > 0) { Shift = CurDAG->getMachineNode(SPU::SHLQBIIv2i64, dl, VecVT, SDValue((Shift != 0 ? Shift : VecOp0), 0), CurDAG->getTargetConstant(bits, ShiftAmtVT)); } } else { SDNode *Bytes = CurDAG->getMachineNode(SPU::ROTMIr32, dl, ShiftAmtVT, ShiftAmt, CurDAG->getTargetConstant(3, ShiftAmtVT)); SDNode *Bits = CurDAG->getMachineNode(SPU::ANDIr32, dl, ShiftAmtVT, ShiftAmt, CurDAG->getTargetConstant(7, ShiftAmtVT)); Shift = CurDAG->getMachineNode(SPU::SHLQBYv2i64, dl, VecVT, SDValue(VecOp0, 0), SDValue(Bytes, 0)); Shift = CurDAG->getMachineNode(SPU::SHLQBIv2i64, dl, VecVT, SDValue(Shift, 0), SDValue(Bits, 0)); } return CurDAG->getMachineNode(SPU::ORi64_v2i64, dl, OpVT, SDValue(Shift, 0)); } /*! * Emit the instruction sequence for i64 logical right shifts. * * @param Op The shl operand * @param OpVT Op's machine value value type (doesn't need to be passed, but * makes life easier.) 
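 * For example, a constant shift by 35 splits into bytes = 35 >> 3 = 4
 * (one ROTQMBYIv2i64) and bits = 35 & 7 = 3 (one ROTQMBIIv2i64); a
 * variable amount is split the same way and then negated with SFIr32,
 * since the quadword rotate-and-mask instructions expect negative
 * shift counts.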
* @return The SDNode with the entire instruction sequence */ SDNode * SPUDAGToDAGISel::SelectSRLi64(SDNode *N, EVT OpVT) { SDValue Op0 = N->getOperand(0); EVT VecVT = EVT::getVectorVT(*CurDAG->getContext(), OpVT, (128 / OpVT.getSizeInBits())); SDValue ShiftAmt = N->getOperand(1); EVT ShiftAmtVT = ShiftAmt.getValueType(); SDNode *VecOp0, *Shift = 0; DebugLoc dl = N->getDebugLoc(); VecOp0 = CurDAG->getMachineNode(SPU::ORv2i64_i64, dl, VecVT, Op0); if (ConstantSDNode *CN = dyn_cast(ShiftAmt)) { unsigned bytes = unsigned(CN->getZExtValue()) >> 3; unsigned bits = unsigned(CN->getZExtValue()) & 7; if (bytes > 0) { Shift = CurDAG->getMachineNode(SPU::ROTQMBYIv2i64, dl, VecVT, SDValue(VecOp0, 0), CurDAG->getTargetConstant(bytes, ShiftAmtVT)); } if (bits > 0) { Shift = CurDAG->getMachineNode(SPU::ROTQMBIIv2i64, dl, VecVT, SDValue((Shift != 0 ? Shift : VecOp0), 0), CurDAG->getTargetConstant(bits, ShiftAmtVT)); } } else { SDNode *Bytes = CurDAG->getMachineNode(SPU::ROTMIr32, dl, ShiftAmtVT, ShiftAmt, CurDAG->getTargetConstant(3, ShiftAmtVT)); SDNode *Bits = CurDAG->getMachineNode(SPU::ANDIr32, dl, ShiftAmtVT, ShiftAmt, CurDAG->getTargetConstant(7, ShiftAmtVT)); // Ensure that the shift amounts are negated! Bytes = CurDAG->getMachineNode(SPU::SFIr32, dl, ShiftAmtVT, SDValue(Bytes, 0), CurDAG->getTargetConstant(0, ShiftAmtVT)); Bits = CurDAG->getMachineNode(SPU::SFIr32, dl, ShiftAmtVT, SDValue(Bits, 0), CurDAG->getTargetConstant(0, ShiftAmtVT)); Shift = CurDAG->getMachineNode(SPU::ROTQMBYv2i64, dl, VecVT, SDValue(VecOp0, 0), SDValue(Bytes, 0)); Shift = CurDAG->getMachineNode(SPU::ROTQMBIv2i64, dl, VecVT, SDValue(Shift, 0), SDValue(Bits, 0)); } return CurDAG->getMachineNode(SPU::ORi64_v2i64, dl, OpVT, SDValue(Shift, 0)); } /*! * Emit the instruction sequence for i64 arithmetic right shifts. * * @param Op The shl operand * @param OpVT Op's machine value value type (doesn't need to be passed, but * makes life easier.) * @return The SDNode with the entire instruction sequence */ SDNode * SPUDAGToDAGISel::SelectSRAi64(SDNode *N, EVT OpVT) { // Promote Op0 to vector EVT VecVT = EVT::getVectorVT(*CurDAG->getContext(), OpVT, (128 / OpVT.getSizeInBits())); SDValue ShiftAmt = N->getOperand(1); EVT ShiftAmtVT = ShiftAmt.getValueType(); DebugLoc dl = N->getDebugLoc(); SDNode *VecOp0 = CurDAG->getMachineNode(SPU::ORv2i64_i64, dl, VecVT, N->getOperand(0)); SDValue SignRotAmt = CurDAG->getTargetConstant(31, ShiftAmtVT); SDNode *SignRot = CurDAG->getMachineNode(SPU::ROTMAIv2i64_i32, dl, MVT::v2i64, SDValue(VecOp0, 0), SignRotAmt); SDNode *UpperHalfSign = CurDAG->getMachineNode(SPU::ORi32_v4i32, dl, MVT::i32, SDValue(SignRot, 0)); SDNode *UpperHalfSignMask = CurDAG->getMachineNode(SPU::FSM64r32, dl, VecVT, SDValue(UpperHalfSign, 0)); SDNode *UpperLowerMask = CurDAG->getMachineNode(SPU::FSMBIv2i64, dl, VecVT, CurDAG->getTargetConstant(0xff00ULL, MVT::i16)); SDNode *UpperLowerSelect = CurDAG->getMachineNode(SPU::SELBv2i64, dl, VecVT, SDValue(UpperHalfSignMask, 0), SDValue(VecOp0, 0), SDValue(UpperLowerMask, 0)); SDNode *Shift = 0; if (ConstantSDNode *CN = dyn_cast(ShiftAmt)) { unsigned bytes = unsigned(CN->getZExtValue()) >> 3; unsigned bits = unsigned(CN->getZExtValue()) & 7; if (bytes > 0) { bytes = 31 - bytes; Shift = CurDAG->getMachineNode(SPU::ROTQBYIv2i64, dl, VecVT, SDValue(UpperLowerSelect, 0), CurDAG->getTargetConstant(bytes, ShiftAmtVT)); } if (bits > 0) { bits = 8 - bits; Shift = CurDAG->getMachineNode(SPU::ROTQBIIv2i64, dl, VecVT, SDValue((Shift != 0 ? 
Shift : UpperLowerSelect), 0), CurDAG->getTargetConstant(bits, ShiftAmtVT)); } } else { SDNode *NegShift = CurDAG->getMachineNode(SPU::SFIr32, dl, ShiftAmtVT, ShiftAmt, CurDAG->getTargetConstant(0, ShiftAmtVT)); Shift = CurDAG->getMachineNode(SPU::ROTQBYBIv2i64_r32, dl, VecVT, SDValue(UpperLowerSelect, 0), SDValue(NegShift, 0)); Shift = CurDAG->getMachineNode(SPU::ROTQBIv2i64, dl, VecVT, SDValue(Shift, 0), SDValue(NegShift, 0)); } return CurDAG->getMachineNode(SPU::ORi64_v2i64, dl, OpVT, SDValue(Shift, 0)); } /*! Do the necessary magic necessary to load a i64 constant */ SDNode *SPUDAGToDAGISel::SelectI64Constant(SDNode *N, EVT OpVT, DebugLoc dl) { ConstantSDNode *CN = cast(N); return SelectI64Constant(CN->getZExtValue(), OpVT, dl); } SDNode *SPUDAGToDAGISel::SelectI64Constant(uint64_t Value64, EVT OpVT, DebugLoc dl) { EVT OpVecVT = EVT::getVectorVT(*CurDAG->getContext(), OpVT, 2); SDValue i64vec = SPU::LowerV2I64Splat(OpVecVT, *CurDAG, Value64, dl); // Here's where it gets interesting, because we have to parse out the // subtree handed back in i64vec: if (i64vec.getOpcode() == ISD::BIT_CONVERT) { // The degenerate case where the upper and lower bits in the splat are // identical: SDValue Op0 = i64vec.getOperand(0); ReplaceUses(i64vec, Op0); return CurDAG->getMachineNode(SPU::ORi64_v2i64, dl, OpVT, SDValue(emitBuildVector(Op0.getNode()), 0)); } else if (i64vec.getOpcode() == SPUISD::SHUFB) { SDValue lhs = i64vec.getOperand(0); SDValue rhs = i64vec.getOperand(1); SDValue shufmask = i64vec.getOperand(2); if (lhs.getOpcode() == ISD::BIT_CONVERT) { ReplaceUses(lhs, lhs.getOperand(0)); lhs = lhs.getOperand(0); } SDNode *lhsNode = (lhs.getNode()->isMachineOpcode() ? lhs.getNode() : emitBuildVector(lhs.getNode())); if (rhs.getOpcode() == ISD::BIT_CONVERT) { ReplaceUses(rhs, rhs.getOperand(0)); rhs = rhs.getOperand(0); } SDNode *rhsNode = (rhs.getNode()->isMachineOpcode() ? rhs.getNode() : emitBuildVector(rhs.getNode())); if (shufmask.getOpcode() == ISD::BIT_CONVERT) { ReplaceUses(shufmask, shufmask.getOperand(0)); shufmask = shufmask.getOperand(0); } SDNode *shufMaskNode = (shufmask.getNode()->isMachineOpcode() ? shufmask.getNode() : emitBuildVector(shufmask.getNode())); SDValue shufNode = CurDAG->getNode(SPUISD::SHUFB, dl, OpVecVT, SDValue(lhsNode, 0), SDValue(rhsNode, 0), SDValue(shufMaskNode, 0)); HandleSDNode Dummy(shufNode); SDNode *SN = SelectCode(Dummy.getValue().getNode()); if (SN == 0) SN = Dummy.getValue().getNode(); return CurDAG->getMachineNode(SPU::ORi64_v2i64, dl, OpVT, SDValue(SN, 0)); } else if (i64vec.getOpcode() == ISD::BUILD_VECTOR) { return CurDAG->getMachineNode(SPU::ORi64_v2i64, dl, OpVT, SDValue(emitBuildVector(i64vec.getNode()), 0)); } else { report_fatal_error("SPUDAGToDAGISel::SelectI64Constant: Unhandled i64vec" "condition"); } } /// createSPUISelDag - This pass converts a legalized DAG into a /// SPU-specific DAG, ready for instruction scheduling. /// FunctionPass *llvm::createSPUISelDag(SPUTargetMachine &TM) { return new SPUDAGToDAGISel(TM); } diff --git a/lib/Target/X86/AsmParser/X86AsmParser.cpp b/lib/Target/X86/AsmParser/X86AsmParser.cpp index da013505dabd..6b403c10a1eb 100644 --- a/lib/Target/X86/AsmParser/X86AsmParser.cpp +++ b/lib/Target/X86/AsmParser/X86AsmParser.cpp @@ -1,644 +1,700 @@ //===-- X86AsmParser.cpp - Parse X86 assembly to MCInst instructions ------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. 
// //===----------------------------------------------------------------------===// #include "llvm/Target/TargetAsmParser.h" #include "X86.h" #include "llvm/ADT/SmallVector.h" #include "llvm/ADT/StringSwitch.h" #include "llvm/ADT/Twine.h" #include "llvm/MC/MCStreamer.h" #include "llvm/MC/MCExpr.h" #include "llvm/MC/MCInst.h" #include "llvm/MC/MCParser/MCAsmLexer.h" #include "llvm/MC/MCParser/MCAsmParser.h" #include "llvm/MC/MCParser/MCParsedAsmOperand.h" #include "llvm/Support/SourceMgr.h" #include "llvm/Target/TargetRegistry.h" #include "llvm/Target/TargetAsmParser.h" using namespace llvm; namespace { struct X86Operand; class X86ATTAsmParser : public TargetAsmParser { MCAsmParser &Parser; protected: unsigned Is64Bit : 1; private: MCAsmParser &getParser() const { return Parser; } MCAsmLexer &getLexer() const { return Parser.getLexer(); } void Warning(SMLoc L, const Twine &Msg) { Parser.Warning(L, Msg); } bool Error(SMLoc L, const Twine &Msg) { return Parser.Error(L, Msg); } bool ParseRegister(unsigned &RegNo, SMLoc &StartLoc, SMLoc &EndLoc); X86Operand *ParseOperand(); X86Operand *ParseMemOperand(unsigned SegReg, SMLoc StartLoc); bool ParseDirectiveWord(unsigned Size, SMLoc L); void InstructionCleanup(MCInst &Inst); /// @name Auto-generated Match Functions - /// { + /// { bool MatchInstruction(const SmallVectorImpl &Operands, MCInst &Inst); + bool MatchInstructionImpl( + const SmallVectorImpl &Operands, MCInst &Inst); + /// } public: X86ATTAsmParser(const Target &T, MCAsmParser &_Parser) : TargetAsmParser(T), Parser(_Parser) {} virtual bool ParseInstruction(const StringRef &Name, SMLoc NameLoc, SmallVectorImpl &Operands); virtual bool ParseDirective(AsmToken DirectiveID); }; class X86_32ATTAsmParser : public X86ATTAsmParser { public: X86_32ATTAsmParser(const Target &T, MCAsmParser &_Parser) : X86ATTAsmParser(T, _Parser) { Is64Bit = false; } }; class X86_64ATTAsmParser : public X86ATTAsmParser { public: X86_64ATTAsmParser(const Target &T, MCAsmParser &_Parser) : X86ATTAsmParser(T, _Parser) { Is64Bit = true; } }; } // end anonymous namespace /// @name Auto-generated Match Functions /// { static unsigned MatchRegisterName(StringRef Name); /// } namespace { /// X86Operand - Instances of this class represent a parsed X86 machine /// instruction. struct X86Operand : public MCParsedAsmOperand { enum KindTy { Token, Register, Immediate, Memory } Kind; SMLoc StartLoc, EndLoc; union { struct { const char *Data; unsigned Length; } Tok; struct { unsigned RegNo; } Reg; struct { const MCExpr *Val; } Imm; struct { unsigned SegReg; const MCExpr *Disp; unsigned BaseReg; unsigned IndexReg; unsigned Scale; } Mem; }; X86Operand(KindTy K, SMLoc Start, SMLoc End) : Kind(K), StartLoc(Start), EndLoc(End) {} - + /// getStartLoc - Get the location of the first token of this operand. SMLoc getStartLoc() const { return StartLoc; } /// getEndLoc - Get the location of the last token of this operand. 
SMLoc getEndLoc() const { return EndLoc; } StringRef getToken() const { assert(Kind == Token && "Invalid access!"); return StringRef(Tok.Data, Tok.Length); } + void setTokenValue(StringRef Value) { + assert(Kind == Token && "Invalid access!"); + Tok.Data = Value.data(); + Tok.Length = Value.size(); + } unsigned getReg() const { assert(Kind == Register && "Invalid access!"); return Reg.RegNo; } const MCExpr *getImm() const { assert(Kind == Immediate && "Invalid access!"); return Imm.Val; } const MCExpr *getMemDisp() const { assert(Kind == Memory && "Invalid access!"); return Mem.Disp; } unsigned getMemSegReg() const { assert(Kind == Memory && "Invalid access!"); return Mem.SegReg; } unsigned getMemBaseReg() const { assert(Kind == Memory && "Invalid access!"); return Mem.BaseReg; } unsigned getMemIndexReg() const { assert(Kind == Memory && "Invalid access!"); return Mem.IndexReg; } unsigned getMemScale() const { assert(Kind == Memory && "Invalid access!"); return Mem.Scale; } bool isToken() const {return Kind == Token; } bool isImm() const { return Kind == Immediate; } bool isImmSExt8() const { // Accept immediates which fit in 8 bits when sign extended, and // non-absolute immediates. if (!isImm()) return false; if (const MCConstantExpr *CE = dyn_cast(getImm())) { int64_t Value = CE->getValue(); return Value == (int64_t) (int8_t) Value; } return true; } bool isMem() const { return Kind == Memory; } bool isAbsMem() const { return Kind == Memory && !getMemSegReg() && !getMemBaseReg() && !getMemIndexReg() && getMemScale() == 1; } bool isNoSegMem() const { return Kind == Memory && !getMemSegReg(); } bool isReg() const { return Kind == Register; } void addExpr(MCInst &Inst, const MCExpr *Expr) const { // Add as immediates when possible. if (const MCConstantExpr *CE = dyn_cast(Expr)) Inst.addOperand(MCOperand::CreateImm(CE->getValue())); else Inst.addOperand(MCOperand::CreateExpr(Expr)); } void addRegOperands(MCInst &Inst, unsigned N) const { assert(N == 1 && "Invalid number of operands!"); Inst.addOperand(MCOperand::CreateReg(getReg())); } void addImmOperands(MCInst &Inst, unsigned N) const { assert(N == 1 && "Invalid number of operands!"); addExpr(Inst, getImm()); } void addImmSExt8Operands(MCInst &Inst, unsigned N) const { // FIXME: Support user customization of the render method. 
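    // An ImmSExt8 operand renders exactly like a plain immediate; the
    // distinction only matters at match time, where it allows e.g. a $-1
    // operand to select the sign-extended imm8 form of an instruction.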
assert(N == 1 && "Invalid number of operands!"); addExpr(Inst, getImm()); } void addMemOperands(MCInst &Inst, unsigned N) const { assert((N == 5) && "Invalid number of operands!"); Inst.addOperand(MCOperand::CreateReg(getMemBaseReg())); Inst.addOperand(MCOperand::CreateImm(getMemScale())); Inst.addOperand(MCOperand::CreateReg(getMemIndexReg())); addExpr(Inst, getMemDisp()); Inst.addOperand(MCOperand::CreateReg(getMemSegReg())); } void addAbsMemOperands(MCInst &Inst, unsigned N) const { assert((N == 1) && "Invalid number of operands!"); Inst.addOperand(MCOperand::CreateExpr(getMemDisp())); } void addNoSegMemOperands(MCInst &Inst, unsigned N) const { assert((N == 4) && "Invalid number of operands!"); Inst.addOperand(MCOperand::CreateReg(getMemBaseReg())); Inst.addOperand(MCOperand::CreateImm(getMemScale())); Inst.addOperand(MCOperand::CreateReg(getMemIndexReg())); addExpr(Inst, getMemDisp()); } static X86Operand *CreateToken(StringRef Str, SMLoc Loc) { X86Operand *Res = new X86Operand(Token, Loc, Loc); Res->Tok.Data = Str.data(); Res->Tok.Length = Str.size(); return Res; } static X86Operand *CreateReg(unsigned RegNo, SMLoc StartLoc, SMLoc EndLoc) { X86Operand *Res = new X86Operand(Register, StartLoc, EndLoc); Res->Reg.RegNo = RegNo; return Res; } static X86Operand *CreateImm(const MCExpr *Val, SMLoc StartLoc, SMLoc EndLoc){ X86Operand *Res = new X86Operand(Immediate, StartLoc, EndLoc); Res->Imm.Val = Val; return Res; } /// Create an absolute memory operand. static X86Operand *CreateMem(const MCExpr *Disp, SMLoc StartLoc, SMLoc EndLoc) { X86Operand *Res = new X86Operand(Memory, StartLoc, EndLoc); Res->Mem.SegReg = 0; Res->Mem.Disp = Disp; Res->Mem.BaseReg = 0; Res->Mem.IndexReg = 0; Res->Mem.Scale = 1; return Res; } /// Create a generalized memory operand. static X86Operand *CreateMem(unsigned SegReg, const MCExpr *Disp, unsigned BaseReg, unsigned IndexReg, unsigned Scale, SMLoc StartLoc, SMLoc EndLoc) { // We should never just have a displacement, that should be parsed as an // absolute memory operand. assert((SegReg || BaseReg || IndexReg) && "Invalid memory operand!"); // The scale should always be one of {1,2,4,8}. assert(((Scale == 1 || Scale == 2 || Scale == 4 || Scale == 8)) && "Invalid scale!"); X86Operand *Res = new X86Operand(Memory, StartLoc, EndLoc); Res->Mem.SegReg = SegReg; Res->Mem.Disp = Disp; Res->Mem.BaseReg = BaseReg; Res->Mem.IndexReg = IndexReg; Res->Mem.Scale = Scale; return Res; } }; } // end anonymous namespace. bool X86ATTAsmParser::ParseRegister(unsigned &RegNo, SMLoc &StartLoc, SMLoc &EndLoc) { RegNo = 0; const AsmToken &TokPercent = Parser.getTok(); assert(TokPercent.is(AsmToken::Percent) && "Invalid token kind!"); StartLoc = TokPercent.getLoc(); Parser.Lex(); // Eat percent token. const AsmToken &Tok = Parser.getTok(); if (Tok.isNot(AsmToken::Identifier)) return Error(Tok.getLoc(), "invalid register name"); // FIXME: Validate register for the current architecture; we have to do // validation later, so maybe there is no need for this here. RegNo = MatchRegisterName(Tok.getString()); // Parse %st(1) and "%st" as "%st(0)" if (RegNo == 0 && Tok.getString() == "st") { RegNo = X86::ST0; EndLoc = Tok.getLoc(); Parser.Lex(); // Eat 'st' // Check to see if we have '(4)' after %st. if (getLexer().isNot(AsmToken::LParen)) return false; // Lex the paren. 
getParser().Lex(); const AsmToken &IntTok = Parser.getTok(); if (IntTok.isNot(AsmToken::Integer)) return Error(IntTok.getLoc(), "expected stack index"); switch (IntTok.getIntVal()) { case 0: RegNo = X86::ST0; break; case 1: RegNo = X86::ST1; break; case 2: RegNo = X86::ST2; break; case 3: RegNo = X86::ST3; break; case 4: RegNo = X86::ST4; break; case 5: RegNo = X86::ST5; break; case 6: RegNo = X86::ST6; break; case 7: RegNo = X86::ST7; break; default: return Error(IntTok.getLoc(), "invalid stack index"); } if (getParser().Lex().isNot(AsmToken::RParen)) return Error(Parser.getTok().getLoc(), "expected ')'"); EndLoc = Tok.getLoc(); Parser.Lex(); // Eat ')' return false; } if (RegNo == 0) return Error(Tok.getLoc(), "invalid register name"); EndLoc = Tok.getLoc(); Parser.Lex(); // Eat identifier token. return false; } X86Operand *X86ATTAsmParser::ParseOperand() { switch (getLexer().getKind()) { default: // Parse a memory operand with no segment register. return ParseMemOperand(0, Parser.getTok().getLoc()); case AsmToken::Percent: { // Read the register. unsigned RegNo; SMLoc Start, End; if (ParseRegister(RegNo, Start, End)) return 0; // If this is a segment register followed by a ':', then this is the start // of a memory reference, otherwise this is a normal register reference. if (getLexer().isNot(AsmToken::Colon)) return X86Operand::CreateReg(RegNo, Start, End); getParser().Lex(); // Eat the colon. return ParseMemOperand(RegNo, Start); } case AsmToken::Dollar: { // $42 -> immediate. SMLoc Start = Parser.getTok().getLoc(), End; Parser.Lex(); const MCExpr *Val; if (getParser().ParseExpression(Val, End)) return 0; return X86Operand::CreateImm(Val, Start, End); } } } /// ParseMemOperand: segment: disp(basereg, indexreg, scale). The '%ds:' prefix /// has already been parsed if present. X86Operand *X86ATTAsmParser::ParseMemOperand(unsigned SegReg, SMLoc MemStart) { // We have to disambiguate a parenthesized expression "(4+5)" from the start // of a memory operand with a missing displacement "(%ebx)" or "(,%eax)". The // only way to do this without lookahead is to eat the '(' and see what is // after it. const MCExpr *Disp = MCConstantExpr::Create(0, getParser().getContext()); if (getLexer().isNot(AsmToken::LParen)) { SMLoc ExprEnd; if (getParser().ParseExpression(Disp, ExprEnd)) return 0; // After parsing the base expression we could either have a parenthesized // memory address or not. If not, return now. If so, eat the (. if (getLexer().isNot(AsmToken::LParen)) { // Unless we have a segment register, treat this as an immediate. if (SegReg == 0) return X86Operand::CreateMem(Disp, MemStart, ExprEnd); return X86Operand::CreateMem(SegReg, Disp, 0, 0, 1, MemStart, ExprEnd); } // Eat the '('. Parser.Lex(); } else { // Okay, we have a '('. We don't know if this is an expression or not, but // so we have to eat the ( to see beyond it. SMLoc LParenLoc = Parser.getTok().getLoc(); Parser.Lex(); // Eat the '('. if (getLexer().is(AsmToken::Percent) || getLexer().is(AsmToken::Comma)) { // Nothing to do here, fall into the code below with the '(' part of the // memory operand consumed. } else { SMLoc ExprEnd; // It must be an parenthesized expression, parse it now. if (getParser().ParseParenExpression(Disp, ExprEnd)) return 0; // After parsing the base expression we could either have a parenthesized // memory address or not. If not, return now. If so, eat the (. if (getLexer().isNot(AsmToken::LParen)) { // Unless we have a segment register, treat this as an immediate. 
/// ParseMemOperand: segment: disp(basereg, indexreg, scale).  The '%ds:'
/// prefix has already been parsed if present.
X86Operand *X86ATTAsmParser::ParseMemOperand(unsigned SegReg, SMLoc MemStart) {
  // We have to disambiguate a parenthesized expression "(4+5)" from the start
  // of a memory operand with a missing displacement "(%ebx)" or "(,%eax)". The
  // only way to do this without lookahead is to eat the '(' and see what is
  // after it.
  const MCExpr *Disp = MCConstantExpr::Create(0, getParser().getContext());
  if (getLexer().isNot(AsmToken::LParen)) {
    SMLoc ExprEnd;
    if (getParser().ParseExpression(Disp, ExprEnd)) return 0;

    // After parsing the base expression we could either have a parenthesized
    // memory address or not.  If not, return now.  If so, eat the (.
    if (getLexer().isNot(AsmToken::LParen)) {
      // Unless we have a segment register, treat this as an immediate.
      if (SegReg == 0)
        return X86Operand::CreateMem(Disp, MemStart, ExprEnd);
      return X86Operand::CreateMem(SegReg, Disp, 0, 0, 1, MemStart, ExprEnd);
    }

    // Eat the '('.
    Parser.Lex();
  } else {
    // Okay, we have a '('.  We don't know if this is an expression or not, so
    // we have to eat the '(' to see beyond it.
    SMLoc LParenLoc = Parser.getTok().getLoc();
    Parser.Lex(); // Eat the '('.

    if (getLexer().is(AsmToken::Percent) || getLexer().is(AsmToken::Comma)) {
      // Nothing to do here, fall into the code below with the '(' part of the
      // memory operand consumed.
    } else {
      SMLoc ExprEnd;

      // It must be a parenthesized expression; parse it now.
      if (getParser().ParseParenExpression(Disp, ExprEnd))
        return 0;

      // After parsing the base expression we could either have a parenthesized
      // memory address or not.  If not, return now.  If so, eat the (.
      if (getLexer().isNot(AsmToken::LParen)) {
        // Unless we have a segment register, treat this as an immediate.
        if (SegReg == 0)
          return X86Operand::CreateMem(Disp, LParenLoc, ExprEnd);
        return X86Operand::CreateMem(SegReg, Disp, 0, 0, 1, MemStart, ExprEnd);
      }

      // Eat the '('.
      Parser.Lex();
    }
  }

  // If we reached here, then we just ate the ( of the memory operand.  Process
  // the rest of the memory operand.
  unsigned BaseReg = 0, IndexReg = 0, Scale = 1;

  if (getLexer().is(AsmToken::Percent)) {
    SMLoc L;
    if (ParseRegister(BaseReg, L, L)) return 0;
  }

  if (getLexer().is(AsmToken::Comma)) {
    Parser.Lex(); // Eat the comma.

    // Following the comma we should have either an index register, or a scale
    // value. We don't support the latter form, but we want to parse it
    // correctly.
    //
    // Note that even though it would be completely consistent to support
    // syntax like "1(%eax,,1)", the assembler doesn't.
    if (getLexer().is(AsmToken::Percent)) {
      SMLoc L;
      if (ParseRegister(IndexReg, L, L)) return 0;

      if (getLexer().isNot(AsmToken::RParen)) {
        // Parse the scale amount:
        //  ::= ',' [scale-expression]
        if (getLexer().isNot(AsmToken::Comma)) {
          Error(Parser.getTok().getLoc(),
                "expected comma in scale expression");
          return 0;
        }
        Parser.Lex(); // Eat the comma.

        if (getLexer().isNot(AsmToken::RParen)) {
          SMLoc Loc = Parser.getTok().getLoc();

          int64_t ScaleVal;
          if (getParser().ParseAbsoluteExpression(ScaleVal))
            return 0;

          // Validate the scale amount.
          if (ScaleVal != 1 && ScaleVal != 2 &&
              ScaleVal != 4 && ScaleVal != 8) {
            Error(Loc, "scale factor in address must be 1, 2, 4 or 8");
            return 0;
          }
          Scale = (unsigned)ScaleVal;
        }
      }
    } else if (getLexer().isNot(AsmToken::RParen)) {
      // Otherwise we have the unsupported form of a scale amount without an
      // index.
      SMLoc Loc = Parser.getTok().getLoc();
      int64_t Value;
      if (getParser().ParseAbsoluteExpression(Value))
        return 0;

      Error(Loc, "cannot have scale factor without index register");
      return 0;
    }
  }

  // Ok, we've eaten the memory operand, verify we have a ')' and eat it too.
  if (getLexer().isNot(AsmToken::RParen)) {
    Error(Parser.getTok().getLoc(), "unexpected token in memory operand");
    return 0;
  }
  SMLoc MemEnd = Parser.getTok().getLoc();
  Parser.Lex(); // Eat the ')'.

  return X86Operand::CreateMem(SegReg, Disp, BaseReg, IndexReg, Scale,
                               MemStart, MemEnd);
}

bool X86ATTAsmParser::
ParseInstruction(const StringRef &Name, SMLoc NameLoc,
                 SmallVectorImpl<MCParsedAsmOperand*> &Operands) {
  // FIXME: Hack to recognize "sal..." and "rep..." for now. We need a way to
  // represent alternative syntaxes in the .td file, without requiring
  // instruction duplication.
  StringRef PatchedName = StringSwitch<StringRef>(Name)
    .Case("sal", "shl")
    .Case("salb", "shlb")
    .Case("sall", "shll")
    .Case("salq", "shlq")
    .Case("salw", "shlw")
    .Case("repe", "rep")
    .Case("repz", "rep")
    .Case("repnz", "repne")
    .Default(Name);
  Operands.push_back(X86Operand::CreateToken(PatchedName, NameLoc));

  if (getLexer().isNot(AsmToken::EndOfStatement)) {
    // Parse '*' modifier.
    if (getLexer().is(AsmToken::Star)) {
      SMLoc Loc = Parser.getTok().getLoc();
      Operands.push_back(X86Operand::CreateToken("*", Loc));
      Parser.Lex(); // Eat the star.
    }

    // Read the first operand.
    if (X86Operand *Op = ParseOperand())
      Operands.push_back(Op);
    else
      return true;

    while (getLexer().is(AsmToken::Comma)) {
      Parser.Lex(); // Eat the comma.

      // Parse and remember the operand.
      if (X86Operand *Op = ParseOperand())
        Operands.push_back(Op);
      else
        return true;
    }
  }

  // FIXME: Hack to handle recognizing s{hr,ar,hl}? $1.
  if ((Name.startswith("shr") || Name.startswith("sar") ||
       Name.startswith("shl")) &&
      Operands.size() == 3 &&
      static_cast<X86Operand*>(Operands[1])->isImm() &&
      isa<MCConstantExpr>(static_cast<X86Operand*>(Operands[1])->getImm()) &&
      cast<MCConstantExpr>(
        static_cast<X86Operand*>(Operands[1])->getImm())->getValue() == 1) {
    delete Operands[1];
    Operands.erase(Operands.begin() + 1);
  }

  return false;
}

bool X86ATTAsmParser::ParseDirective(AsmToken DirectiveID) {
  StringRef IDVal = DirectiveID.getIdentifier();
  if (IDVal == ".word")
    return ParseDirectiveWord(2, DirectiveID.getLoc());
  return true;
}

/// ParseDirectiveWord
///  ::= .word [ expression (, expression)* ]
bool X86ATTAsmParser::ParseDirectiveWord(unsigned Size, SMLoc L) {
  if (getLexer().isNot(AsmToken::EndOfStatement)) {
    for (;;) {
      const MCExpr *Value;
      if (getParser().ParseExpression(Value))
        return true;

      getParser().getStreamer().EmitValue(Value, Size, 0 /*addrspace*/);

      if (getLexer().is(AsmToken::EndOfStatement))
        break;

      // FIXME: Improve diagnostic.
      if (getLexer().isNot(AsmToken::Comma))
        return Error(L, "unexpected token in directive");
      Parser.Lex();
    }
  }

  Parser.Lex();
  return false;
}

// FIXME: Custom X86 cleanup function to implement a temporary hack to handle
// matching INCL/DECL correctly for x86_64. This needs to be replaced by a
// proper mechanism for supporting (ambiguous) feature dependent instructions.
void X86ATTAsmParser::InstructionCleanup(MCInst &Inst) {
  if (!Is64Bit) return;

  switch (Inst.getOpcode()) {
  case X86::DEC16r: Inst.setOpcode(X86::DEC64_16r); break;
  case X86::DEC16m: Inst.setOpcode(X86::DEC64_16m); break;
  case X86::DEC32r: Inst.setOpcode(X86::DEC64_32r); break;
  case X86::DEC32m: Inst.setOpcode(X86::DEC64_32m); break;
  case X86::INC16r: Inst.setOpcode(X86::INC64_16r); break;
  case X86::INC16m: Inst.setOpcode(X86::INC64_16m); break;
  case X86::INC32r: Inst.setOpcode(X86::INC64_32r); break;
  case X86::INC32m: Inst.setOpcode(X86::INC64_32m); break;
  }
}

+bool
+X86ATTAsmParser::MatchInstruction(const SmallVectorImpl<MCParsedAsmOperand*>
+                                    &Operands,
+                                  MCInst &Inst) {
+  // First, try a direct match.
+  if (!MatchInstructionImpl(Operands, Inst))
+    return false;
+
+  // Ignore anything which is obviously not a suffix match.
+  if (Operands.size() == 0)
+    return true;
+  X86Operand *Op = static_cast<X86Operand*>(Operands[0]);
+  if (!Op->isToken() || Op->getToken().size() > 15)
+    return true;
+
+  // FIXME: Ideally, we would only attempt suffix matches for things which are
+  // valid prefixes, and we could just infer the right unambiguous
+  // type. However, that requires substantially more matcher support than the
+  // following hack.
+
+  // Change the operand to point to a temporary token.
+  char Tmp[16];
+  StringRef Base = Op->getToken();
+  memcpy(Tmp, Base.data(), Base.size());
+  Op->setTokenValue(StringRef(Tmp, Base.size() + 1));
+
+  // Check for the various suffix matches.
+  Tmp[Base.size()] = 'b';
+  bool MatchB = MatchInstructionImpl(Operands, Inst);
+  Tmp[Base.size()] = 'w';
+  bool MatchW = MatchInstructionImpl(Operands, Inst);
+  Tmp[Base.size()] = 'l';
+  bool MatchL = MatchInstructionImpl(Operands, Inst);
+
+  // Restore the old token.
+  Op->setTokenValue(Base);
+
+  // If exactly one matched, then we treat that as a successful match (and the
+  // instruction will already have been filled in correctly, since the failing
+  // matches won't have modified it). Note that MatchB/MatchW/MatchL are
+  // failure flags, so exactly one success leaves exactly two of them set.
+  if (MatchB + MatchW + MatchL == 2)
+    return false;
+
+  // Otherwise, the match failed.
+  return true;
+}
+
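// Illustrative sketch (not part of this patch): the "unique suffix wins" rule
// that MatchInstruction above implements. Given a mnemonic with no size
// suffix, try appending each of 'b'/'w'/'l' and accept only if exactly one
// candidate is a known mnemonic; zero or several candidates leaves the input
// ambiguous. 'Mnemonics' stands in for the generated matcher and is invented
// here for illustration.
#include <set>
#include <string>

static bool matchWithSuffix(const std::set<std::string> &Mnemonics,
                            const std::string &Base, std::string &Matched) {
  if (Mnemonics.count(Base)) { Matched = Base; return true; } // direct match
  const char Suffixes[3] = { 'b', 'w', 'l' };
  unsigned Hits = 0;
  for (unsigned i = 0; i != 3; ++i) {
    std::string Candidate = Base + Suffixes[i];
    if (Mnemonics.count(Candidate)) { ++Hits; Matched = Candidate; }
  }
  return Hits == 1; // Mirrors "MatchB + MatchW + MatchL == 2" above.
}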
extern "C" void LLVMInitializeX86AsmParser() { RegisterAsmParser X(TheX86_32Target); RegisterAsmParser Y(TheX86_64Target); LLVMInitializeX86AsmLexer(); } #include "X86GenAsmMatcher.inc" diff --git a/lib/Target/X86/X86.td b/lib/Target/X86/X86.td index ec86fc248e3d..a53f973c1c43 100644 --- a/lib/Target/X86/X86.td +++ b/lib/Target/X86/X86.td @@ -1,211 +1,212 @@ //===- X86.td - Target definition file for the Intel X86 ---*- tablegen -*-===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This is a target description file for the Intel i386 architecture, refered to // here as the "X86" architecture. // //===----------------------------------------------------------------------===// // Get the target-independent interfaces which we are implementing... // include "llvm/Target/Target.td" //===----------------------------------------------------------------------===// // X86 Subtarget features. //===----------------------------------------------------------------------===// def FeatureCMOV : SubtargetFeature<"cmov","HasCMov", "true", "Enable conditional move instructions">; def FeatureMMX : SubtargetFeature<"mmx","X86SSELevel", "MMX", "Enable MMX instructions">; def FeatureSSE1 : SubtargetFeature<"sse", "X86SSELevel", "SSE1", "Enable SSE instructions", // SSE codegen depends on cmovs, and all // SSE1+ processors support them. [FeatureMMX, FeatureCMOV]>; def FeatureSSE2 : SubtargetFeature<"sse2", "X86SSELevel", "SSE2", "Enable SSE2 instructions", [FeatureSSE1]>; def FeatureSSE3 : SubtargetFeature<"sse3", "X86SSELevel", "SSE3", "Enable SSE3 instructions", [FeatureSSE2]>; def FeatureSSSE3 : SubtargetFeature<"ssse3", "X86SSELevel", "SSSE3", "Enable SSSE3 instructions", [FeatureSSE3]>; def FeatureSSE41 : SubtargetFeature<"sse41", "X86SSELevel", "SSE41", "Enable SSE 4.1 instructions", [FeatureSSSE3]>; def FeatureSSE42 : SubtargetFeature<"sse42", "X86SSELevel", "SSE42", "Enable SSE 4.2 instructions", [FeatureSSE41]>; def Feature3DNow : SubtargetFeature<"3dnow", "X863DNowLevel", "ThreeDNow", "Enable 3DNow! instructions">; def Feature3DNowA : SubtargetFeature<"3dnowa", "X863DNowLevel", "ThreeDNowA", "Enable 3DNow! Athlon instructions", [Feature3DNow]>; // All x86-64 hardware has SSE2, but we don't mark SSE2 as an implied // feature, because SSE2 can be disabled (e.g. for compiling OS kernels) // without disabling 64-bit mode. 
// All x86-64 hardware has SSE2, but we don't mark SSE2 as an implied
// feature, because SSE2 can be disabled (e.g. for compiling OS kernels)
// without disabling 64-bit mode.
def Feature64Bit   : SubtargetFeature<"64bit", "HasX86_64", "true",
                                      "Support 64-bit instructions",
                                      [FeatureCMOV]>;
def FeatureSlowBTMem : SubtargetFeature<"slow-bt-mem", "IsBTMemSlow", "true",
                                        "Bit testing of memory is slow">;
def FeatureFastUAMem : SubtargetFeature<"fast-unaligned-mem",
                                        "IsUAMemFast", "true",
                                        "Fast unaligned memory access">;
def FeatureSSE4A   : SubtargetFeature<"sse4a", "HasSSE4A", "true",
                                      "Support SSE 4a instructions">;

def FeatureAVX     : SubtargetFeature<"avx", "HasAVX", "true",
                                      "Enable AVX instructions">;
def FeatureFMA3    : SubtargetFeature<"fma3", "HasFMA3", "true",
                                      "Enable three-operand fused multiply-add">;
def FeatureFMA4    : SubtargetFeature<"fma4", "HasFMA4", "true",
                                      "Enable four-operand fused multiply-add">;
def FeatureVectorUAMem : SubtargetFeature<"vector-unaligned-mem",
                                          "HasVectorUAMem", "true",
                 "Allow unaligned memory operands on vector/SIMD instructions">;
def FeatureAES     : SubtargetFeature<"aes", "HasAES", "true",
                                      "Enable AES instructions">;

//===----------------------------------------------------------------------===//
// X86 processors supported.
//===----------------------------------------------------------------------===//

class Proc<string Name, list<SubtargetFeature> Features>
 : Processor<Name, NoItineraries, Features>;

def : Proc<"generic",         []>;
def : Proc<"i386",            []>;
def : Proc<"i486",            []>;
def : Proc<"i586",            []>;
def : Proc<"pentium",         []>;
def : Proc<"pentium-mmx",     [FeatureMMX]>;
def : Proc<"i686",            []>;
def : Proc<"pentiumpro",      [FeatureCMOV]>;
def : Proc<"pentium2",        [FeatureMMX, FeatureCMOV]>;
def : Proc<"pentium3",        [FeatureSSE1]>;
def : Proc<"pentium-m",       [FeatureSSE2, FeatureSlowBTMem]>;
def : Proc<"pentium4",        [FeatureSSE2]>;
def : Proc<"x86-64",          [FeatureSSE2, Feature64Bit, FeatureSlowBTMem]>;
def : Proc<"yonah",           [FeatureSSE3, FeatureSlowBTMem]>;
def : Proc<"prescott",        [FeatureSSE3, FeatureSlowBTMem]>;
def : Proc<"nocona",          [FeatureSSE3, Feature64Bit, FeatureSlowBTMem]>;
def : Proc<"core2",           [FeatureSSSE3, Feature64Bit, FeatureSlowBTMem]>;
def : Proc<"penryn",          [FeatureSSE41, Feature64Bit, FeatureSlowBTMem]>;
def : Proc<"atom",            [FeatureSSE3, Feature64Bit, FeatureSlowBTMem]>;
// "Arrandale" along with corei3 and corei5
def : Proc<"corei7",          [FeatureSSE42, Feature64Bit, FeatureSlowBTMem,
                               FeatureFastUAMem, FeatureAES]>;
def : Proc<"nehalem",         [FeatureSSE42, Feature64Bit, FeatureSlowBTMem,
                               FeatureFastUAMem]>;
// Westmere is a similar machine to nehalem with some additional features.
// Westmere is the corei3/i5/i7 path from nehalem to sandybridge
def : Proc<"westmere",        [FeatureSSE42, Feature64Bit, FeatureSlowBTMem,
                               FeatureFastUAMem, FeatureAES]>;
def : Proc<"sandybridge", [FeatureSSE42, FeatureAVX, Feature64Bit]>; def : Proc<"k6", [FeatureMMX]>; def : Proc<"k6-2", [FeatureMMX, Feature3DNow]>; def : Proc<"k6-3", [FeatureMMX, Feature3DNow]>; def : Proc<"athlon", [FeatureMMX, Feature3DNowA, FeatureSlowBTMem]>; def : Proc<"athlon-tbird", [FeatureMMX, Feature3DNowA, FeatureSlowBTMem]>; def : Proc<"athlon-4", [FeatureSSE1, Feature3DNowA, FeatureSlowBTMem]>; def : Proc<"athlon-xp", [FeatureSSE1, Feature3DNowA, FeatureSlowBTMem]>; def : Proc<"athlon-mp", [FeatureSSE1, Feature3DNowA, FeatureSlowBTMem]>; def : Proc<"k8", [FeatureSSE2, Feature3DNowA, Feature64Bit, FeatureSlowBTMem]>; def : Proc<"opteron", [FeatureSSE2, Feature3DNowA, Feature64Bit, FeatureSlowBTMem]>; def : Proc<"athlon64", [FeatureSSE2, Feature3DNowA, Feature64Bit, FeatureSlowBTMem]>; def : Proc<"athlon-fx", [FeatureSSE2, Feature3DNowA, Feature64Bit, FeatureSlowBTMem]>; def : Proc<"k8-sse3", [FeatureSSE3, Feature3DNowA, Feature64Bit, FeatureSlowBTMem]>; def : Proc<"opteron-sse3", [FeatureSSE3, Feature3DNowA, Feature64Bit, FeatureSlowBTMem]>; def : Proc<"athlon64-sse3", [FeatureSSE3, Feature3DNowA, Feature64Bit, FeatureSlowBTMem]>; def : Proc<"amdfam10", [FeatureSSE3, FeatureSSE4A, Feature3DNowA, Feature64Bit, FeatureSlowBTMem]>; def : Proc<"barcelona", [FeatureSSE3, FeatureSSE4A, Feature3DNowA, Feature64Bit, FeatureSlowBTMem]>; def : Proc<"istanbul", [Feature3DNowA, Feature64Bit, FeatureSSE4A, Feature3DNowA]>; def : Proc<"shanghai", [Feature3DNowA, Feature64Bit, FeatureSSE4A, Feature3DNowA]>; def : Proc<"winchip-c6", [FeatureMMX]>; def : Proc<"winchip2", [FeatureMMX, Feature3DNow]>; def : Proc<"c3", [FeatureMMX, Feature3DNow]>; def : Proc<"c3-2", [FeatureSSE1]>; //===----------------------------------------------------------------------===// // Register File Description //===----------------------------------------------------------------------===// include "X86RegisterInfo.td" //===----------------------------------------------------------------------===// // Instruction Descriptions //===----------------------------------------------------------------------===// include "X86InstrInfo.td" def X86InstrInfo : InstrInfo; //===----------------------------------------------------------------------===// // Calling Conventions //===----------------------------------------------------------------------===// include "X86CallingConv.td" //===----------------------------------------------------------------------===// // Assembly Printers //===----------------------------------------------------------------------===// // Currently the X86 assembly parser only supports ATT syntax. def ATTAsmParser : AsmParser { - string AsmParserClassName = "ATTAsmParser"; - string AsmParserInstCleanup = "InstructionCleanup"; + string AsmParserClassName = "ATTAsmParser"; + string AsmParserInstCleanup = "InstructionCleanup"; + string MatchInstructionName = "MatchInstructionImpl"; int Variant = 0; // Discard comments in assembly strings. string CommentDelimiter = "#"; // Recognize hard coded registers. string RegisterPrefix = "%"; } // The X86 target supports two different syntaxes for emitting machine code. // This is controlled by the -x86-asm-syntax={att|intel} def ATTAsmWriter : AsmWriter { string AsmWriterClassName = "ATTInstPrinter"; int Variant = 0; } def IntelAsmWriter : AsmWriter { string AsmWriterClassName = "IntelInstPrinter"; int Variant = 1; } def X86 : Target { // Information about the instructions... 
let InstructionSet = X86InstrInfo; let AssemblyParsers = [ATTAsmParser]; let AssemblyWriters = [ATTAsmWriter, IntelAsmWriter]; } diff --git a/lib/Transforms/Scalar/GVN.cpp b/lib/Transforms/Scalar/GVN.cpp index 321def7eb619..65b34b13dd47 100644 --- a/lib/Transforms/Scalar/GVN.cpp +++ b/lib/Transforms/Scalar/GVN.cpp @@ -1,2310 +1,2311 @@ //===- GVN.cpp - Eliminate redundant values and loads ---------------------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This pass performs global value numbering to eliminate fully redundant // instructions. It also performs simple dead load elimination. // // Note that this pass does the value numbering itself; it does not use the // ValueNumbering analysis passes. // //===----------------------------------------------------------------------===// #define DEBUG_TYPE "gvn" #include "llvm/Transforms/Scalar.h" #include "llvm/BasicBlock.h" #include "llvm/Constants.h" #include "llvm/DerivedTypes.h" #include "llvm/GlobalVariable.h" #include "llvm/Function.h" #include "llvm/IntrinsicInst.h" #include "llvm/LLVMContext.h" #include "llvm/Operator.h" #include "llvm/Value.h" #include "llvm/ADT/DenseMap.h" #include "llvm/ADT/DepthFirstIterator.h" #include "llvm/ADT/PostOrderIterator.h" #include "llvm/ADT/SmallPtrSet.h" #include "llvm/ADT/SmallVector.h" #include "llvm/ADT/Statistic.h" #include "llvm/Analysis/AliasAnalysis.h" #include "llvm/Analysis/ConstantFolding.h" #include "llvm/Analysis/Dominators.h" #include "llvm/Analysis/MemoryBuiltins.h" #include "llvm/Analysis/MemoryDependenceAnalysis.h" #include "llvm/Analysis/PHITransAddr.h" #include "llvm/Support/CFG.h" #include "llvm/Support/CommandLine.h" #include "llvm/Support/Debug.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/GetElementPtrTypeIterator.h" #include "llvm/Support/IRBuilder.h" #include "llvm/Support/raw_ostream.h" #include "llvm/Target/TargetData.h" #include "llvm/Transforms/Utils/BasicBlockUtils.h" #include "llvm/Transforms/Utils/Local.h" #include "llvm/Transforms/Utils/SSAUpdater.h" using namespace llvm; STATISTIC(NumGVNInstr, "Number of instructions deleted"); STATISTIC(NumGVNLoad, "Number of loads deleted"); STATISTIC(NumGVNPRE, "Number of instructions PRE'd"); STATISTIC(NumGVNBlocks, "Number of blocks merged"); STATISTIC(NumPRELoad, "Number of loads PRE'd"); static cl::opt EnablePRE("enable-pre", cl::init(true), cl::Hidden); static cl::opt EnableLoadPRE("enable-load-pre", cl::init(true)); static cl::opt EnableFullLoadPRE("enable-full-load-pre", cl::init(false)); //===----------------------------------------------------------------------===// // ValueTable Class //===----------------------------------------------------------------------===// /// This class holds the mapping between values and value numbers. It is used /// as an efficient mechanism to determine the expression-wise equivalence of /// two values. 
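// Illustrative sketch (not part of this patch): before the real ValueTable,
// a toy value-numbering table over binary expressions. Two instructions with
// the same opcode and the same operand value numbers receive the same value
// number, which is exactly what lets GVN prove a later "a + b" redundant with
// an earlier one. All names here are invented for illustration.
#include <map>

struct ToyExpr {
  unsigned Opcode;
  unsigned LHS, RHS; // value numbers of the operands
  bool operator<(const ToyExpr &O) const {
    if (Opcode != O.Opcode) return Opcode < O.Opcode;
    if (LHS != O.LHS) return LHS < O.LHS;
    return RHS < O.RHS;
  }
};

struct ToyValueTable {
  std::map<ToyExpr, unsigned> Numbering;
  unsigned NextNum;
  ToyValueTable() : NextNum(1) {}

  unsigned lookupOrAdd(const ToyExpr &E) {
    std::map<ToyExpr, unsigned>::iterator I = Numbering.find(E);
    if (I != Numbering.end())
      return I->second;              // Seen this expression: reuse its number.
    return Numbering[E] = NextNum++; // Fresh expression: assign a new number.
  }
};
// Calling lookupOrAdd twice with the same {Opcode, LHS, RHS} returns the same
// number both times, so the second computation is recognizably redundant.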
namespace { struct Expression { enum ExpressionOpcode { ADD = Instruction::Add, FADD = Instruction::FAdd, SUB = Instruction::Sub, FSUB = Instruction::FSub, MUL = Instruction::Mul, FMUL = Instruction::FMul, UDIV = Instruction::UDiv, SDIV = Instruction::SDiv, FDIV = Instruction::FDiv, UREM = Instruction::URem, SREM = Instruction::SRem, FREM = Instruction::FRem, SHL = Instruction::Shl, LSHR = Instruction::LShr, ASHR = Instruction::AShr, AND = Instruction::And, OR = Instruction::Or, XOR = Instruction::Xor, TRUNC = Instruction::Trunc, ZEXT = Instruction::ZExt, SEXT = Instruction::SExt, FPTOUI = Instruction::FPToUI, FPTOSI = Instruction::FPToSI, UITOFP = Instruction::UIToFP, SITOFP = Instruction::SIToFP, FPTRUNC = Instruction::FPTrunc, FPEXT = Instruction::FPExt, PTRTOINT = Instruction::PtrToInt, INTTOPTR = Instruction::IntToPtr, BITCAST = Instruction::BitCast, ICMPEQ, ICMPNE, ICMPUGT, ICMPUGE, ICMPULT, ICMPULE, ICMPSGT, ICMPSGE, ICMPSLT, ICMPSLE, FCMPOEQ, FCMPOGT, FCMPOGE, FCMPOLT, FCMPOLE, FCMPONE, FCMPORD, FCMPUNO, FCMPUEQ, FCMPUGT, FCMPUGE, FCMPULT, FCMPULE, FCMPUNE, EXTRACT, INSERT, SHUFFLE, SELECT, GEP, CALL, CONSTANT, INSERTVALUE, EXTRACTVALUE, EMPTY, TOMBSTONE }; ExpressionOpcode opcode; const Type* type; SmallVector varargs; Value *function; Expression() { } Expression(ExpressionOpcode o) : opcode(o) { } bool operator==(const Expression &other) const { if (opcode != other.opcode) return false; else if (opcode == EMPTY || opcode == TOMBSTONE) return true; else if (type != other.type) return false; else if (function != other.function) return false; else { if (varargs.size() != other.varargs.size()) return false; for (size_t i = 0; i < varargs.size(); ++i) if (varargs[i] != other.varargs[i]) return false; return true; } } bool operator!=(const Expression &other) const { return !(*this == other); } }; class ValueTable { private: DenseMap valueNumbering; DenseMap expressionNumbering; AliasAnalysis* AA; MemoryDependenceAnalysis* MD; DominatorTree* DT; uint32_t nextValueNumber; Expression::ExpressionOpcode getOpcode(CmpInst* C); Expression create_expression(BinaryOperator* BO); Expression create_expression(CmpInst* C); Expression create_expression(ShuffleVectorInst* V); Expression create_expression(ExtractElementInst* C); Expression create_expression(InsertElementInst* V); Expression create_expression(SelectInst* V); Expression create_expression(CastInst* C); Expression create_expression(GetElementPtrInst* G); Expression create_expression(CallInst* C); Expression create_expression(Constant* C); Expression create_expression(ExtractValueInst* C); Expression create_expression(InsertValueInst* C); uint32_t lookup_or_add_call(CallInst* C); public: ValueTable() : nextValueNumber(1) { } uint32_t lookup_or_add(Value *V); uint32_t lookup(Value *V) const; void add(Value *V, uint32_t num); void clear(); void erase(Value *v); unsigned size(); void setAliasAnalysis(AliasAnalysis* A) { AA = A; } AliasAnalysis *getAliasAnalysis() const { return AA; } void setMemDep(MemoryDependenceAnalysis* M) { MD = M; } void setDomTree(DominatorTree* D) { DT = D; } uint32_t getNextUnusedValueNumber() { return nextValueNumber; } void verifyRemoved(const Value *) const; }; } namespace llvm { template <> struct DenseMapInfo { static inline Expression getEmptyKey() { return Expression(Expression::EMPTY); } static inline Expression getTombstoneKey() { return Expression(Expression::TOMBSTONE); } static unsigned getHashValue(const Expression e) { unsigned hash = e.opcode; hash = ((unsigned)((uintptr_t)e.type >> 4) ^ 
(unsigned)((uintptr_t)e.type >> 9)); for (SmallVector::const_iterator I = e.varargs.begin(), E = e.varargs.end(); I != E; ++I) hash = *I + hash * 37; hash = ((unsigned)((uintptr_t)e.function >> 4) ^ (unsigned)((uintptr_t)e.function >> 9)) + hash * 37; return hash; } static bool isEqual(const Expression &LHS, const Expression &RHS) { return LHS == RHS; } }; template <> struct isPodLike { static const bool value = true; }; } //===----------------------------------------------------------------------===// // ValueTable Internal Functions //===----------------------------------------------------------------------===// Expression::ExpressionOpcode ValueTable::getOpcode(CmpInst* C) { if (isa(C)) { switch (C->getPredicate()) { default: // THIS SHOULD NEVER HAPPEN llvm_unreachable("Comparison with unknown predicate?"); case ICmpInst::ICMP_EQ: return Expression::ICMPEQ; case ICmpInst::ICMP_NE: return Expression::ICMPNE; case ICmpInst::ICMP_UGT: return Expression::ICMPUGT; case ICmpInst::ICMP_UGE: return Expression::ICMPUGE; case ICmpInst::ICMP_ULT: return Expression::ICMPULT; case ICmpInst::ICMP_ULE: return Expression::ICMPULE; case ICmpInst::ICMP_SGT: return Expression::ICMPSGT; case ICmpInst::ICMP_SGE: return Expression::ICMPSGE; case ICmpInst::ICMP_SLT: return Expression::ICMPSLT; case ICmpInst::ICMP_SLE: return Expression::ICMPSLE; } } else { switch (C->getPredicate()) { default: // THIS SHOULD NEVER HAPPEN llvm_unreachable("Comparison with unknown predicate?"); case FCmpInst::FCMP_OEQ: return Expression::FCMPOEQ; case FCmpInst::FCMP_OGT: return Expression::FCMPOGT; case FCmpInst::FCMP_OGE: return Expression::FCMPOGE; case FCmpInst::FCMP_OLT: return Expression::FCMPOLT; case FCmpInst::FCMP_OLE: return Expression::FCMPOLE; case FCmpInst::FCMP_ONE: return Expression::FCMPONE; case FCmpInst::FCMP_ORD: return Expression::FCMPORD; case FCmpInst::FCMP_UNO: return Expression::FCMPUNO; case FCmpInst::FCMP_UEQ: return Expression::FCMPUEQ; case FCmpInst::FCMP_UGT: return Expression::FCMPUGT; case FCmpInst::FCMP_UGE: return Expression::FCMPUGE; case FCmpInst::FCMP_ULT: return Expression::FCMPULT; case FCmpInst::FCMP_ULE: return Expression::FCMPULE; case FCmpInst::FCMP_UNE: return Expression::FCMPUNE; } } } Expression ValueTable::create_expression(CallInst* C) { Expression e; e.type = C->getType(); e.function = C->getCalledFunction(); e.opcode = Expression::CALL; for (CallInst::op_iterator I = C->op_begin()+1, E = C->op_end(); I != E; ++I) e.varargs.push_back(lookup_or_add(*I)); return e; } Expression ValueTable::create_expression(BinaryOperator* BO) { Expression e; e.varargs.push_back(lookup_or_add(BO->getOperand(0))); e.varargs.push_back(lookup_or_add(BO->getOperand(1))); e.function = 0; e.type = BO->getType(); e.opcode = static_cast(BO->getOpcode()); return e; } Expression ValueTable::create_expression(CmpInst* C) { Expression e; e.varargs.push_back(lookup_or_add(C->getOperand(0))); e.varargs.push_back(lookup_or_add(C->getOperand(1))); e.function = 0; e.type = C->getType(); e.opcode = getOpcode(C); return e; } Expression ValueTable::create_expression(CastInst* C) { Expression e; e.varargs.push_back(lookup_or_add(C->getOperand(0))); e.function = 0; e.type = C->getType(); e.opcode = static_cast(C->getOpcode()); return e; } Expression ValueTable::create_expression(ShuffleVectorInst* S) { Expression e; e.varargs.push_back(lookup_or_add(S->getOperand(0))); e.varargs.push_back(lookup_or_add(S->getOperand(1))); e.varargs.push_back(lookup_or_add(S->getOperand(2))); e.function = 0; e.type = S->getType(); 
e.opcode = Expression::SHUFFLE; return e; } Expression ValueTable::create_expression(ExtractElementInst* E) { Expression e; e.varargs.push_back(lookup_or_add(E->getOperand(0))); e.varargs.push_back(lookup_or_add(E->getOperand(1))); e.function = 0; e.type = E->getType(); e.opcode = Expression::EXTRACT; return e; } Expression ValueTable::create_expression(InsertElementInst* I) { Expression e; e.varargs.push_back(lookup_or_add(I->getOperand(0))); e.varargs.push_back(lookup_or_add(I->getOperand(1))); e.varargs.push_back(lookup_or_add(I->getOperand(2))); e.function = 0; e.type = I->getType(); e.opcode = Expression::INSERT; return e; } Expression ValueTable::create_expression(SelectInst* I) { Expression e; e.varargs.push_back(lookup_or_add(I->getCondition())); e.varargs.push_back(lookup_or_add(I->getTrueValue())); e.varargs.push_back(lookup_or_add(I->getFalseValue())); e.function = 0; e.type = I->getType(); e.opcode = Expression::SELECT; return e; } Expression ValueTable::create_expression(GetElementPtrInst* G) { Expression e; e.varargs.push_back(lookup_or_add(G->getPointerOperand())); e.function = 0; e.type = G->getType(); e.opcode = Expression::GEP; for (GetElementPtrInst::op_iterator I = G->idx_begin(), E = G->idx_end(); I != E; ++I) e.varargs.push_back(lookup_or_add(*I)); return e; } Expression ValueTable::create_expression(ExtractValueInst* E) { Expression e; e.varargs.push_back(lookup_or_add(E->getAggregateOperand())); for (ExtractValueInst::idx_iterator II = E->idx_begin(), IE = E->idx_end(); II != IE; ++II) e.varargs.push_back(*II); e.function = 0; e.type = E->getType(); e.opcode = Expression::EXTRACTVALUE; return e; } Expression ValueTable::create_expression(InsertValueInst* E) { Expression e; e.varargs.push_back(lookup_or_add(E->getAggregateOperand())); e.varargs.push_back(lookup_or_add(E->getInsertedValueOperand())); for (InsertValueInst::idx_iterator II = E->idx_begin(), IE = E->idx_end(); II != IE; ++II) e.varargs.push_back(*II); e.function = 0; e.type = E->getType(); e.opcode = Expression::INSERTVALUE; return e; } //===----------------------------------------------------------------------===// // ValueTable External Functions //===----------------------------------------------------------------------===// /// add - Insert a value into the table with a specified value number. 
void ValueTable::add(Value *V, uint32_t num) { valueNumbering.insert(std::make_pair(V, num)); } uint32_t ValueTable::lookup_or_add_call(CallInst* C) { if (AA->doesNotAccessMemory(C)) { Expression exp = create_expression(C); uint32_t& e = expressionNumbering[exp]; if (!e) e = nextValueNumber++; valueNumbering[C] = e; return e; } else if (AA->onlyReadsMemory(C)) { Expression exp = create_expression(C); uint32_t& e = expressionNumbering[exp]; if (!e) { e = nextValueNumber++; valueNumbering[C] = e; return e; } if (!MD) { e = nextValueNumber++; valueNumbering[C] = e; return e; } MemDepResult local_dep = MD->getDependency(C); if (!local_dep.isDef() && !local_dep.isNonLocal()) { valueNumbering[C] = nextValueNumber; return nextValueNumber++; } if (local_dep.isDef()) { CallInst* local_cdep = cast(local_dep.getInst()); if (local_cdep->getNumOperands() != C->getNumOperands()) { valueNumbering[C] = nextValueNumber; return nextValueNumber++; } for (unsigned i = 1; i < C->getNumOperands(); ++i) { uint32_t c_vn = lookup_or_add(C->getOperand(i)); uint32_t cd_vn = lookup_or_add(local_cdep->getOperand(i)); if (c_vn != cd_vn) { valueNumbering[C] = nextValueNumber; return nextValueNumber++; } } uint32_t v = lookup_or_add(local_cdep); valueNumbering[C] = v; return v; } // Non-local case. const MemoryDependenceAnalysis::NonLocalDepInfo &deps = MD->getNonLocalCallDependency(CallSite(C)); // FIXME: call/call dependencies for readonly calls should return def, not // clobber! Move the checking logic to MemDep! CallInst* cdep = 0; // Check to see if we have a single dominating call instruction that is // identical to C. for (unsigned i = 0, e = deps.size(); i != e; ++i) { const NonLocalDepEntry *I = &deps[i]; // Ignore non-local dependencies. if (I->getResult().isNonLocal()) continue; // We don't handle non-depedencies. If we already have a call, reject // instruction dependencies. if (I->getResult().isClobber() || cdep != 0) { cdep = 0; break; } CallInst *NonLocalDepCall = dyn_cast(I->getResult().getInst()); // FIXME: All duplicated with non-local case. if (NonLocalDepCall && DT->properlyDominates(I->getBB(), C->getParent())){ cdep = NonLocalDepCall; continue; } cdep = 0; break; } if (!cdep) { valueNumbering[C] = nextValueNumber; return nextValueNumber++; } if (cdep->getNumOperands() != C->getNumOperands()) { valueNumbering[C] = nextValueNumber; return nextValueNumber++; } for (unsigned i = 1; i < C->getNumOperands(); ++i) { uint32_t c_vn = lookup_or_add(C->getOperand(i)); uint32_t cd_vn = lookup_or_add(cdep->getOperand(i)); if (c_vn != cd_vn) { valueNumbering[C] = nextValueNumber; return nextValueNumber++; } } uint32_t v = lookup_or_add(cdep); valueNumbering[C] = v; return v; } else { valueNumbering[C] = nextValueNumber; return nextValueNumber++; } } /// lookup_or_add - Returns the value number for the specified value, assigning /// it a new number if it did not have one before. 
uint32_t ValueTable::lookup_or_add(Value *V) { DenseMap::iterator VI = valueNumbering.find(V); if (VI != valueNumbering.end()) return VI->second; if (!isa(V)) { valueNumbering[V] = nextValueNumber; return nextValueNumber++; } Instruction* I = cast(V); Expression exp; switch (I->getOpcode()) { case Instruction::Call: return lookup_or_add_call(cast(I)); case Instruction::Add: case Instruction::FAdd: case Instruction::Sub: case Instruction::FSub: case Instruction::Mul: case Instruction::FMul: case Instruction::UDiv: case Instruction::SDiv: case Instruction::FDiv: case Instruction::URem: case Instruction::SRem: case Instruction::FRem: case Instruction::Shl: case Instruction::LShr: case Instruction::AShr: case Instruction::And: case Instruction::Or : case Instruction::Xor: exp = create_expression(cast(I)); break; case Instruction::ICmp: case Instruction::FCmp: exp = create_expression(cast(I)); break; case Instruction::Trunc: case Instruction::ZExt: case Instruction::SExt: case Instruction::FPToUI: case Instruction::FPToSI: case Instruction::UIToFP: case Instruction::SIToFP: case Instruction::FPTrunc: case Instruction::FPExt: case Instruction::PtrToInt: case Instruction::IntToPtr: case Instruction::BitCast: exp = create_expression(cast(I)); break; case Instruction::Select: exp = create_expression(cast(I)); break; case Instruction::ExtractElement: exp = create_expression(cast(I)); break; case Instruction::InsertElement: exp = create_expression(cast(I)); break; case Instruction::ShuffleVector: exp = create_expression(cast(I)); break; case Instruction::ExtractValue: exp = create_expression(cast(I)); break; case Instruction::InsertValue: exp = create_expression(cast(I)); break; case Instruction::GetElementPtr: exp = create_expression(cast(I)); break; default: valueNumbering[V] = nextValueNumber; return nextValueNumber++; } uint32_t& e = expressionNumbering[exp]; if (!e) e = nextValueNumber++; valueNumbering[V] = e; return e; } /// lookup - Returns the value number of the specified value. Fails if /// the value has not yet been numbered. uint32_t ValueTable::lookup(Value *V) const { DenseMap::const_iterator VI = valueNumbering.find(V); assert(VI != valueNumbering.end() && "Value not numbered?"); return VI->second; } /// clear - Remove all entries from the ValueTable void ValueTable::clear() { valueNumbering.clear(); expressionNumbering.clear(); nextValueNumber = 1; } /// erase - Remove a value from the value numbering void ValueTable::erase(Value *V) { valueNumbering.erase(V); } /// verifyRemoved - Verify that the value is removed from all internal data /// structures. void ValueTable::verifyRemoved(const Value *V) const { for (DenseMap::const_iterator I = valueNumbering.begin(), E = valueNumbering.end(); I != E; ++I) { assert(I->first != V && "Inst still occurs in value numbering map!"); } } //===----------------------------------------------------------------------===// // GVN Pass //===----------------------------------------------------------------------===// namespace { struct ValueNumberScope { ValueNumberScope* parent; DenseMap table; ValueNumberScope(ValueNumberScope* p) : parent(p) { } }; } namespace { class GVN : public FunctionPass { bool runOnFunction(Function &F); public: static char ID; // Pass identification, replacement for typeid explicit GVN(bool noloads = false) : FunctionPass(&ID), NoLoads(noloads), MD(0) { } private: bool NoLoads; MemoryDependenceAnalysis *MD; DominatorTree *DT; ValueTable VN; DenseMap localAvail; // List of critical edges to be split between iterations. 
    SmallVector<std::pair<TerminatorInst*, unsigned>, 4> toSplit;

    // This transformation requires dominator info.
    virtual void getAnalysisUsage(AnalysisUsage &AU) const {
      AU.addRequired<DominatorTree>();
      if (!NoLoads)
        AU.addRequired<MemoryDependenceAnalysis>();
      AU.addRequired<AliasAnalysis>();

      AU.addPreserved<DominatorTree>();
      AU.addPreserved<AliasAnalysis>();
    }

    // Helper functions.
    // FIXME: eliminate or document these better.
    bool processLoad(LoadInst* L,
                     SmallVectorImpl<Instruction*> &toErase);
    bool processInstruction(Instruction *I,
                            SmallVectorImpl<Instruction*> &toErase);
    bool processNonLocalLoad(LoadInst* L,
                             SmallVectorImpl<Instruction*> &toErase);
    bool processBlock(BasicBlock *BB);
    void dump(DenseMap<uint32_t, Value*>& d);
    bool iterateOnFunction(Function &F);
    Value *CollapsePhi(PHINode* p);
    bool performPRE(Function& F);
    Value *lookupNumber(BasicBlock *BB, uint32_t num);
    void cleanupGlobalSets();
    void verifyRemoved(const Instruction *I) const;
    bool splitCriticalEdges();
  };

  char GVN::ID = 0;
}

// createGVNPass - The public interface to this file...
FunctionPass *llvm::createGVNPass(bool NoLoads) {
  return new GVN(NoLoads);
}

static RegisterPass<GVN> X("gvn", "Global Value Numbering");

void GVN::dump(DenseMap<uint32_t, Value*>& d) {
  errs() << "{\n";
  for (DenseMap<uint32_t, Value*>::iterator I = d.begin(),
       E = d.end(); I != E; ++I) {
    errs() << I->first << "\n";
    I->second->dump();
  }
  errs() << "}\n";
}

static bool isSafeReplacement(PHINode* p, Instruction *inst) {
  if (!isa<PHINode>(inst))
    return true;

  for (Instruction::use_iterator UI = p->use_begin(), E = p->use_end();
       UI != E; ++UI)
    if (PHINode* use_phi = dyn_cast<PHINode>(UI))
      if (use_phi->getParent() == inst->getParent())
        return false;

  return true;
}

Value *GVN::CollapsePhi(PHINode *PN) {
  Value *ConstVal = PN->hasConstantValue(DT);
  if (!ConstVal) return 0;

  Instruction *Inst = dyn_cast<Instruction>(ConstVal);
  if (!Inst)
    return ConstVal;

  if (DT->dominates(Inst, PN))
    if (isSafeReplacement(PN, Inst))
      return Inst;
  return 0;
}

/// IsValueFullyAvailableInBlock - Return true if we can prove that the value
/// we're analyzing is fully available in the specified block.  As we go, keep
/// track of which blocks we know are fully alive in FullyAvailableBlocks.
/// This map is actually a tri-state map with the following values:
///   0) we know the block *is not* fully available.
///   1) we know the block *is* fully available.
///   2) we do not know whether the block is fully available or not, but we are
///      currently speculating that it will be.
///   3) we are speculating for this block and have used that to speculate for
///      other blocks.
static bool IsValueFullyAvailableInBlock(BasicBlock *BB,
                            DenseMap<BasicBlock*, char> &FullyAvailableBlocks) {
  // Optimistically assume that the block is fully available and check to see
  // if we already know about this block in one lookup.
  std::pair<DenseMap<BasicBlock*, char>::iterator, char> IV =
    FullyAvailableBlocks.insert(std::make_pair(BB, 2));

  // If the entry already existed for this block, return the precomputed value.
  if (!IV.second) {
    // If this is a speculative "available" value, mark it as being used for
    // speculation of other blocks.
    if (IV.first->second == 2)
      IV.first->second = 3;
    return IV.first->second != 0;
  }

  // Otherwise, see if it is fully available in all predecessors.
  pred_iterator PI = pred_begin(BB), PE = pred_end(BB);

  // If this block has no predecessors, it isn't live-in here.
  if (PI == PE)
    goto SpeculationFailure;

  for (; PI != PE; ++PI)
    // If the value isn't fully available in one of our predecessors, then it
    // isn't fully available in this block either.  Undo our previous
    // optimistic assumption and bail out.
    if (!IsValueFullyAvailableInBlock(*PI, FullyAvailableBlocks))
      goto SpeculationFailure;

  return true;

// SpeculationFailure - If we get here, we found out that this is not, after
// all, a fully-available block.  We have a problem if we speculated on this
// and used the speculation to mark other blocks as available.
SpeculationFailure:
  char &BBVal = FullyAvailableBlocks[BB];

  // If we didn't speculate on this, just return with it set to false.
  if (BBVal == 2) {
    BBVal = 0;
    return false;
  }

  // If we did speculate on this value, we could have blocks set to 1 that are
  // incorrect.  Walk the (transitive) successors of this block and mark them
  // as 0 if set to one.
  SmallVector<BasicBlock*, 32> BBWorklist;
  BBWorklist.push_back(BB);

  do {
    BasicBlock *Entry = BBWorklist.pop_back_val();
    // Note that this sets blocks to 0 (unavailable) if they happen to not
    // already be in FullyAvailableBlocks.  This is safe.
    char &EntryVal = FullyAvailableBlocks[Entry];
    if (EntryVal == 0) continue;  // Already unavailable.

    // Mark as unavailable.
    EntryVal = 0;

    for (succ_iterator I = succ_begin(Entry), E = succ_end(Entry); I != E; ++I)
      BBWorklist.push_back(*I);
  } while (!BBWorklist.empty());

  return false;
}
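// Illustrative sketch (not part of this patch): on an *acyclic* CFG the
// tri-state walk above collapses to plain memoized recursion; states 2 and 3
// exist only so the real code can speculate through loops and then undo a
// failed speculation via the SpeculationFailure walk. Types here are invented
// stand-ins for illustration.
#include <map>
#include <vector>

struct ToyBlock {
  std::vector<ToyBlock*> Preds;
  bool ProducesValue;
  ToyBlock() : ProducesValue(false) {}
};

static bool fullyAvailable(ToyBlock *BB, std::map<ToyBlock*, bool> &Memo) {
  std::map<ToyBlock*, bool>::iterator I = Memo.find(BB);
  if (I != Memo.end()) return I->second;       // Already decided.
  if (BB->ProducesValue) return Memo[BB] = true;
  if (BB->Preds.empty()) return Memo[BB] = false; // Not live-in here.
  for (unsigned i = 0; i != BB->Preds.size(); ++i)
    if (!fullyAvailable(BB->Preds[i], Memo))   // Available in *all* preds?
      return Memo[BB] = false;
  return Memo[BB] = true;
}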
/// CanCoerceMustAliasedValueToLoad - Return true if
/// CoerceAvailableValueToLoadType will succeed.
static bool CanCoerceMustAliasedValueToLoad(Value *StoredVal,
                                            const Type *LoadTy,
                                            const TargetData &TD) {
  // If the loaded or stored value is a first-class array or struct, don't try
  // to transform them.  We need to be able to bitcast to integer.
  if (LoadTy->isStructTy() || LoadTy->isArrayTy() ||
      StoredVal->getType()->isStructTy() ||
      StoredVal->getType()->isArrayTy())
    return false;

  // The store has to be at least as big as the load.
  if (TD.getTypeSizeInBits(StoredVal->getType()) <
        TD.getTypeSizeInBits(LoadTy))
    return false;

  return true;
}

/// CoerceAvailableValueToLoadType - If we saw a store of a value to memory,
/// and then a load from a must-aliased pointer of a different type, try to
/// coerce the stored value.  LoadedTy is the type of the load we want to
/// replace and InsertPt is the place to insert new instructions.
///
/// If we can't do it, return null.
static Value *CoerceAvailableValueToLoadType(Value *StoredVal,
                                             const Type *LoadedTy,
                                             Instruction *InsertPt,
                                             const TargetData &TD) {
  if (!CanCoerceMustAliasedValueToLoad(StoredVal, LoadedTy, TD))
    return 0;

  const Type *StoredValTy = StoredVal->getType();

  uint64_t StoreSize = TD.getTypeSizeInBits(StoredValTy);
  uint64_t LoadSize = TD.getTypeSizeInBits(LoadedTy);

  // If the store and reload are the same size, we can always reuse it.
  if (StoreSize == LoadSize) {
    if (StoredValTy->isPointerTy() && LoadedTy->isPointerTy()) {
      // Pointer to Pointer -> use bitcast.
      return new BitCastInst(StoredVal, LoadedTy, "", InsertPt);
    }

    // Convert source pointers to integers, which can be bitcast.
    if (StoredValTy->isPointerTy()) {
      StoredValTy = TD.getIntPtrType(StoredValTy->getContext());
      StoredVal = new PtrToIntInst(StoredVal, StoredValTy, "", InsertPt);
    }

    const Type *TypeToCastTo = LoadedTy;
    if (TypeToCastTo->isPointerTy())
      TypeToCastTo = TD.getIntPtrType(StoredValTy->getContext());

    if (StoredValTy != TypeToCastTo)
      StoredVal = new BitCastInst(StoredVal, TypeToCastTo, "", InsertPt);

    // Cast to pointer if the load needs a pointer type.
    if (LoadedTy->isPointerTy())
      StoredVal = new IntToPtrInst(StoredVal, LoadedTy, "", InsertPt);

    return StoredVal;
  }

  // If the loaded value is smaller than the available value, then we can
  // extract out a piece from it.
If the available value is too small, then we // can't do anything. assert(StoreSize >= LoadSize && "CanCoerceMustAliasedValueToLoad fail"); // Convert source pointers to integers, which can be manipulated. if (StoredValTy->isPointerTy()) { StoredValTy = TD.getIntPtrType(StoredValTy->getContext()); StoredVal = new PtrToIntInst(StoredVal, StoredValTy, "", InsertPt); } // Convert vectors and fp to integer, which can be manipulated. if (!StoredValTy->isIntegerTy()) { StoredValTy = IntegerType::get(StoredValTy->getContext(), StoreSize); StoredVal = new BitCastInst(StoredVal, StoredValTy, "", InsertPt); } // If this is a big-endian system, we need to shift the value down to the low // bits so that a truncate will work. if (TD.isBigEndian()) { Constant *Val = ConstantInt::get(StoredVal->getType(), StoreSize-LoadSize); StoredVal = BinaryOperator::CreateLShr(StoredVal, Val, "tmp", InsertPt); } // Truncate the integer to the right size now. const Type *NewIntTy = IntegerType::get(StoredValTy->getContext(), LoadSize); StoredVal = new TruncInst(StoredVal, NewIntTy, "trunc", InsertPt); if (LoadedTy == NewIntTy) return StoredVal; // If the result is a pointer, inttoptr. if (LoadedTy->isPointerTy()) return new IntToPtrInst(StoredVal, LoadedTy, "inttoptr", InsertPt); // Otherwise, bitcast. return new BitCastInst(StoredVal, LoadedTy, "bitcast", InsertPt); } /// GetBaseWithConstantOffset - Analyze the specified pointer to see if it can /// be expressed as a base pointer plus a constant offset. Return the base and /// offset to the caller. static Value *GetBaseWithConstantOffset(Value *Ptr, int64_t &Offset, const TargetData &TD) { Operator *PtrOp = dyn_cast(Ptr); if (PtrOp == 0) return Ptr; // Just look through bitcasts. if (PtrOp->getOpcode() == Instruction::BitCast) return GetBaseWithConstantOffset(PtrOp->getOperand(0), Offset, TD); // If this is a GEP with constant indices, we can look through it. GEPOperator *GEP = dyn_cast(PtrOp); if (GEP == 0 || !GEP->hasAllConstantIndices()) return Ptr; gep_type_iterator GTI = gep_type_begin(GEP); for (User::op_iterator I = GEP->idx_begin(), E = GEP->idx_end(); I != E; ++I, ++GTI) { ConstantInt *OpC = cast(*I); if (OpC->isZero()) continue; // Handle a struct and array indices which add their offset to the pointer. if (const StructType *STy = dyn_cast(*GTI)) { Offset += TD.getStructLayout(STy)->getElementOffset(OpC->getZExtValue()); } else { uint64_t Size = TD.getTypeAllocSize(GTI.getIndexedType()); Offset += OpC->getSExtValue()*Size; } } // Re-sign extend from the pointer size if needed to get overflow edge cases // right. unsigned PtrSize = TD.getPointerSizeInBits(); if (PtrSize < 64) Offset = (Offset << (64-PtrSize)) >> (64-PtrSize); return GetBaseWithConstantOffset(GEP->getPointerOperand(), Offset, TD); } /// AnalyzeLoadFromClobberingWrite - This function is called when we have a /// memdep query of a load that ends up being a clobbering memory write (store, /// memset, memcpy, memmove). This means that the write *may* provide bits used /// by the load but we can't be sure because the pointers don't mustalias. /// /// Check this case to see if there is anything more we can do before we give /// up. This returns -1 if we have to give up, or a byte number in the stored /// value of the piece that feeds the load. static int AnalyzeLoadFromClobberingWrite(const Type *LoadTy, Value *LoadPtr, Value *WritePtr, uint64_t WriteSizeInBits, const TargetData &TD) { // If the loaded or stored value is an first class array or struct, don't try // to transform them. 
We need to be able to bitcast to integer. if (LoadTy->isStructTy() || LoadTy->isArrayTy()) return -1; int64_t StoreOffset = 0, LoadOffset = 0; Value *StoreBase = GetBaseWithConstantOffset(WritePtr, StoreOffset, TD); Value *LoadBase = GetBaseWithConstantOffset(LoadPtr, LoadOffset, TD); if (StoreBase != LoadBase) return -1; // If the load and store are to the exact same address, they should have been // a must alias. AA must have gotten confused. // FIXME: Study to see if/when this happens. One case is forwarding a memset // to a load from the base of the memset. #if 0 if (LoadOffset == StoreOffset) { dbgs() << "STORE/LOAD DEP WITH COMMON POINTER MISSED:\n" << "Base = " << *StoreBase << "\n" << "Store Ptr = " << *WritePtr << "\n" << "Store Offs = " << StoreOffset << "\n" << "Load Ptr = " << *LoadPtr << "\n"; abort(); } #endif // If the load and store don't overlap at all, the store doesn't provide // anything to the load. In this case, they really don't alias at all, AA // must have gotten confused. // FIXME: Investigate cases where this bails out, e.g. rdar://7238614. Then // remove this check, as it is duplicated with what we have below. uint64_t LoadSize = TD.getTypeSizeInBits(LoadTy); if ((WriteSizeInBits & 7) | (LoadSize & 7)) return -1; uint64_t StoreSize = WriteSizeInBits >> 3; // Convert to bytes. LoadSize >>= 3; bool isAAFailure = false; if (StoreOffset < LoadOffset) isAAFailure = StoreOffset+int64_t(StoreSize) <= LoadOffset; else isAAFailure = LoadOffset+int64_t(LoadSize) <= StoreOffset; if (isAAFailure) { #if 0 dbgs() << "STORE LOAD DEP WITH COMMON BASE:\n" << "Base = " << *StoreBase << "\n" << "Store Ptr = " << *WritePtr << "\n" << "Store Offs = " << StoreOffset << "\n" << "Load Ptr = " << *LoadPtr << "\n"; abort(); #endif return -1; } // If the Load isn't completely contained within the stored bits, we don't // have all the bits to feed it. We could do something crazy in the future // (issue a smaller load then merge the bits in) but this seems unlikely to be // valuable. if (StoreOffset > LoadOffset || StoreOffset+StoreSize < LoadOffset+LoadSize) return -1; // Okay, we can do this transformation. Return the number of bytes into the // store that the load is. return LoadOffset-StoreOffset; } /// AnalyzeLoadFromClobberingStore - This function is called when we have a /// memdep query of a load that ends up being a clobbering store. static int AnalyzeLoadFromClobberingStore(const Type *LoadTy, Value *LoadPtr, StoreInst *DepSI, const TargetData &TD) { // Cannot handle reading from store of first-class aggregate yet. if (DepSI->getOperand(0)->getType()->isStructTy() || DepSI->getOperand(0)->getType()->isArrayTy()) return -1; Value *StorePtr = DepSI->getPointerOperand(); uint64_t StoreSize = TD.getTypeSizeInBits(DepSI->getOperand(0)->getType()); return AnalyzeLoadFromClobberingWrite(LoadTy, LoadPtr, StorePtr, StoreSize, TD); } static int AnalyzeLoadFromClobberingMemInst(const Type *LoadTy, Value *LoadPtr, MemIntrinsic *MI, const TargetData &TD) { // If the mem operation is a non-constant size, we can't handle it. ConstantInt *SizeCst = dyn_cast(MI->getLength()); if (SizeCst == 0) return -1; uint64_t MemSizeInBits = SizeCst->getZExtValue()*8; // If this is memset, we just need to see if the offset is valid in the size // of the memset.. if (MI->getIntrinsicID() == Intrinsic::memset) return AnalyzeLoadFromClobberingWrite(LoadTy, LoadPtr, MI->getDest(), MemSizeInBits, TD); // If we have a memcpy/memmove, the only case we can handle is if this is a // copy from constant memory. 
In that case, we can read directly from the // constant memory. MemTransferInst *MTI = cast(MI); Constant *Src = dyn_cast(MTI->getSource()); if (Src == 0) return -1; GlobalVariable *GV = dyn_cast(Src->getUnderlyingObject()); if (GV == 0 || !GV->isConstant()) return -1; // See if the access is within the bounds of the transfer. int Offset = AnalyzeLoadFromClobberingWrite(LoadTy, LoadPtr, MI->getDest(), MemSizeInBits, TD); if (Offset == -1) return Offset; // Otherwise, see if we can constant fold a load from the constant with the // offset applied as appropriate. Src = ConstantExpr::getBitCast(Src, llvm::Type::getInt8PtrTy(Src->getContext())); Constant *OffsetCst = ConstantInt::get(Type::getInt64Ty(Src->getContext()), (unsigned)Offset); Src = ConstantExpr::getGetElementPtr(Src, &OffsetCst, 1); Src = ConstantExpr::getBitCast(Src, PointerType::getUnqual(LoadTy)); if (ConstantFoldLoadFromConstPtr(Src, &TD)) return Offset; return -1; } /// GetStoreValueForLoad - This function is called when we have a /// memdep query of a load that ends up being a clobbering store. This means /// that the store *may* provide bits used by the load but we can't be sure /// because the pointers don't mustalias. Check this case to see if there is /// anything more we can do before we give up. static Value *GetStoreValueForLoad(Value *SrcVal, unsigned Offset, const Type *LoadTy, Instruction *InsertPt, const TargetData &TD){ LLVMContext &Ctx = SrcVal->getType()->getContext(); uint64_t StoreSize = TD.getTypeSizeInBits(SrcVal->getType())/8; uint64_t LoadSize = TD.getTypeSizeInBits(LoadTy)/8; IRBuilder<> Builder(InsertPt->getParent(), InsertPt); // Compute which bits of the stored value are being used by the load. Convert // to an integer type to start with. if (SrcVal->getType()->isPointerTy()) SrcVal = Builder.CreatePtrToInt(SrcVal, TD.getIntPtrType(Ctx), "tmp"); if (!SrcVal->getType()->isIntegerTy()) SrcVal = Builder.CreateBitCast(SrcVal, IntegerType::get(Ctx, StoreSize*8), "tmp"); // Shift the bits to the least significant depending on endianness. unsigned ShiftAmt; if (TD.isLittleEndian()) ShiftAmt = Offset*8; else ShiftAmt = (StoreSize-LoadSize-Offset)*8; if (ShiftAmt) SrcVal = Builder.CreateLShr(SrcVal, ShiftAmt, "tmp"); if (LoadSize != StoreSize) SrcVal = Builder.CreateTrunc(SrcVal, IntegerType::get(Ctx, LoadSize*8), "tmp"); return CoerceAvailableValueToLoadType(SrcVal, LoadTy, InsertPt, TD); } /// GetMemInstValueForLoad - This function is called when we have a /// memdep query of a load that ends up being a clobbering mem intrinsic. static Value *GetMemInstValueForLoad(MemIntrinsic *SrcInst, unsigned Offset, const Type *LoadTy, Instruction *InsertPt, const TargetData &TD){ LLVMContext &Ctx = LoadTy->getContext(); uint64_t LoadSize = TD.getTypeSizeInBits(LoadTy)/8; IRBuilder<> Builder(InsertPt->getParent(), InsertPt); // We know that this method is only called when the mem transfer fully // provides the bits for the load. if (MemSetInst *MSI = dyn_cast(SrcInst)) { // memset(P, 'x', 1234) -> splat('x'), even if x is a variable, and // independently of what the offset is. Value *Val = MSI->getValue(); if (LoadSize != 1) Val = Builder.CreateZExt(Val, IntegerType::get(Ctx, LoadSize*8)); Value *OneElt = Val; // Splat the value out to the right number of bits. for (unsigned NumBytesSet = 1; NumBytesSet != LoadSize; ) { // If we can double the number of bytes set, do it. 
if (NumBytesSet*2 <= LoadSize) { Value *ShVal = Builder.CreateShl(Val, NumBytesSet*8); Val = Builder.CreateOr(Val, ShVal); NumBytesSet <<= 1; continue; } // Otherwise insert one byte at a time. Value *ShVal = Builder.CreateShl(Val, 1*8); Val = Builder.CreateOr(OneElt, ShVal); ++NumBytesSet; } return CoerceAvailableValueToLoadType(Val, LoadTy, InsertPt, TD); } // Otherwise, this is a memcpy/memmove from a constant global. MemTransferInst *MTI = cast(SrcInst); Constant *Src = cast(MTI->getSource()); // Otherwise, see if we can constant fold a load from the constant with the // offset applied as appropriate. Src = ConstantExpr::getBitCast(Src, llvm::Type::getInt8PtrTy(Src->getContext())); Constant *OffsetCst = ConstantInt::get(Type::getInt64Ty(Src->getContext()), (unsigned)Offset); Src = ConstantExpr::getGetElementPtr(Src, &OffsetCst, 1); Src = ConstantExpr::getBitCast(Src, PointerType::getUnqual(LoadTy)); return ConstantFoldLoadFromConstPtr(Src, &TD); } namespace { struct AvailableValueInBlock { /// BB - The basic block in question. BasicBlock *BB; enum ValType { SimpleVal, // A simple offsetted value that is accessed. MemIntrin // A memory intrinsic which is loaded from. }; /// V - The value that is live out of the block. PointerIntPair Val; /// Offset - The byte offset in Val that is interesting for the load query. unsigned Offset; static AvailableValueInBlock get(BasicBlock *BB, Value *V, unsigned Offset = 0) { AvailableValueInBlock Res; Res.BB = BB; Res.Val.setPointer(V); Res.Val.setInt(SimpleVal); Res.Offset = Offset; return Res; } static AvailableValueInBlock getMI(BasicBlock *BB, MemIntrinsic *MI, unsigned Offset = 0) { AvailableValueInBlock Res; Res.BB = BB; Res.Val.setPointer(MI); Res.Val.setInt(MemIntrin); Res.Offset = Offset; return Res; } bool isSimpleValue() const { return Val.getInt() == SimpleVal; } Value *getSimpleValue() const { assert(isSimpleValue() && "Wrong accessor"); return Val.getPointer(); } MemIntrinsic *getMemIntrinValue() const { assert(!isSimpleValue() && "Wrong accessor"); return cast(Val.getPointer()); } /// MaterializeAdjustedValue - Emit code into this block to adjust the value /// defined here to the specified type. This handles various coercion cases. Value *MaterializeAdjustedValue(const Type *LoadTy, const TargetData *TD) const { Value *Res; if (isSimpleValue()) { Res = getSimpleValue(); if (Res->getType() != LoadTy) { assert(TD && "Need target data to handle type mismatch case"); Res = GetStoreValueForLoad(Res, Offset, LoadTy, BB->getTerminator(), *TD); DEBUG(errs() << "GVN COERCED NONLOCAL VAL:\nOffset: " << Offset << " " << *getSimpleValue() << '\n' << *Res << '\n' << "\n\n\n"); } } else { Res = GetMemInstValueForLoad(getMemIntrinValue(), Offset, LoadTy, BB->getTerminator(), *TD); DEBUG(errs() << "GVN COERCED NONLOCAL MEM INTRIN:\nOffset: " << Offset << " " << *getMemIntrinValue() << '\n' << *Res << '\n' << "\n\n\n"); } return Res; } }; } /// ConstructSSAForLoadSet - Given a set of loads specified by ValuesPerBlock, /// construct SSA form, allowing us to eliminate LI. This returns the value /// that should be used at LI's definition site. static Value *ConstructSSAForLoadSet(LoadInst *LI, SmallVectorImpl &ValuesPerBlock, const TargetData *TD, const DominatorTree &DT, AliasAnalysis *AA) { // Check for the fully redundant, dominating load case. In this case, we can // just use the dominating value directly. 
if (ValuesPerBlock.size() == 1 && DT.properlyDominates(ValuesPerBlock[0].BB, LI->getParent())) return ValuesPerBlock[0].MaterializeAdjustedValue(LI->getType(), TD); // Otherwise, we have to construct SSA form. SmallVector NewPHIs; SSAUpdater SSAUpdate(&NewPHIs); SSAUpdate.Initialize(LI); const Type *LoadTy = LI->getType(); for (unsigned i = 0, e = ValuesPerBlock.size(); i != e; ++i) { const AvailableValueInBlock &AV = ValuesPerBlock[i]; BasicBlock *BB = AV.BB; if (SSAUpdate.HasValueForBlock(BB)) continue; SSAUpdate.AddAvailableValue(BB, AV.MaterializeAdjustedValue(LoadTy, TD)); } // Perform PHI construction. Value *V = SSAUpdate.GetValueInMiddleOfBlock(LI->getParent()); // If new PHI nodes were created, notify alias analysis. if (V->getType()->isPointerTy()) for (unsigned i = 0, e = NewPHIs.size(); i != e; ++i) AA->copyValue(LI, NewPHIs[i]); return V; } static bool isLifetimeStart(const Instruction *Inst) { if (const IntrinsicInst* II = dyn_cast(Inst)) return II->getIntrinsicID() == Intrinsic::lifetime_start; return false; } /// processNonLocalLoad - Attempt to eliminate a load whose dependencies are /// non-local by performing PHI construction. bool GVN::processNonLocalLoad(LoadInst *LI, SmallVectorImpl &toErase) { // Find the non-local dependencies of the load. SmallVector Deps; MD->getNonLocalPointerDependency(LI->getOperand(0), true, LI->getParent(), Deps); //DEBUG(dbgs() << "INVESTIGATING NONLOCAL LOAD: " // << Deps.size() << *LI << '\n'); // If we had to process more than one hundred blocks to find the // dependencies, this load isn't worth worrying about. Optimizing // it will be too expensive. if (Deps.size() > 100) return false; // If we had a phi translation failure, we'll have a single entry which is a // clobber in the current block. Reject this early. if (Deps.size() == 1 && Deps[0].getResult().isClobber()) { DEBUG( dbgs() << "GVN: non-local load "; WriteAsOperand(dbgs(), LI); dbgs() << " is clobbered by " << *Deps[0].getResult().getInst() << '\n'; ); return false; } // Filter out useless results (non-locals, etc). Keep track of the blocks // where we have a value available in repl, also keep track of whether we see // dependencies that produce an unknown value for the load (such as a call // that could potentially clobber the load). SmallVector ValuesPerBlock; SmallVector UnavailableBlocks; const TargetData *TD = 0; for (unsigned i = 0, e = Deps.size(); i != e; ++i) { BasicBlock *DepBB = Deps[i].getBB(); MemDepResult DepInfo = Deps[i].getResult(); if (DepInfo.isClobber()) { // The address being loaded in this non-local block may not be the same as // the pointer operand of the load if PHI translation occurs. Make sure // to consider the right address. Value *Address = Deps[i].getAddress(); // If the dependence is to a store that writes to a superset of the bits // read by the load, we can extract the bits we need for the load from the // stored value. if (StoreInst *DepSI = dyn_cast(DepInfo.getInst())) { if (TD == 0) TD = getAnalysisIfAvailable(); if (TD && Address) { int Offset = AnalyzeLoadFromClobberingStore(LI->getType(), Address, DepSI, *TD); if (Offset != -1) { ValuesPerBlock.push_back(AvailableValueInBlock::get(DepBB, DepSI->getOperand(0), Offset)); continue; } } } // If the clobbering value is a memset/memcpy/memmove, see if we can // forward a value on from it. 
if (MemIntrinsic *DepMI = dyn_cast(DepInfo.getInst())) { if (TD == 0) TD = getAnalysisIfAvailable(); if (TD && Address) { int Offset = AnalyzeLoadFromClobberingMemInst(LI->getType(), Address, DepMI, *TD); if (Offset != -1) { ValuesPerBlock.push_back(AvailableValueInBlock::getMI(DepBB, DepMI, Offset)); continue; } } } UnavailableBlocks.push_back(DepBB); continue; } Instruction *DepInst = DepInfo.getInst(); // Loading the allocation -> undef. if (isa(DepInst) || isMalloc(DepInst) || // Loading immediately after lifetime begin -> undef. isLifetimeStart(DepInst)) { ValuesPerBlock.push_back(AvailableValueInBlock::get(DepBB, UndefValue::get(LI->getType()))); continue; } if (StoreInst *S = dyn_cast(DepInst)) { // Reject loads and stores that are to the same address but are of // different types if we have to. if (S->getOperand(0)->getType() != LI->getType()) { if (TD == 0) TD = getAnalysisIfAvailable(); // If the stored value is larger or equal to the loaded value, we can // reuse it. if (TD == 0 || !CanCoerceMustAliasedValueToLoad(S->getOperand(0), LI->getType(), *TD)) { UnavailableBlocks.push_back(DepBB); continue; } } ValuesPerBlock.push_back(AvailableValueInBlock::get(DepBB, S->getOperand(0))); continue; } if (LoadInst *LD = dyn_cast(DepInst)) { // If the types mismatch and we can't handle it, reject reuse of the load. if (LD->getType() != LI->getType()) { if (TD == 0) TD = getAnalysisIfAvailable(); // If the stored value is larger or equal to the loaded value, we can // reuse it. if (TD == 0 || !CanCoerceMustAliasedValueToLoad(LD, LI->getType(),*TD)){ UnavailableBlocks.push_back(DepBB); continue; } } ValuesPerBlock.push_back(AvailableValueInBlock::get(DepBB, LD)); continue; } UnavailableBlocks.push_back(DepBB); continue; } // If we have no predecessors that produce a known value for this load, exit // early. if (ValuesPerBlock.empty()) return false; // If all of the instructions we depend on produce a known value for this // load, then it is fully redundant and we can use PHI insertion to compute // its value. Insert PHIs and remove the fully redundant value now. if (UnavailableBlocks.empty()) { DEBUG(dbgs() << "GVN REMOVING NONLOCAL LOAD: " << *LI << '\n'); // Perform PHI construction. Value *V = ConstructSSAForLoadSet(LI, ValuesPerBlock, TD, *DT, VN.getAliasAnalysis()); LI->replaceAllUsesWith(V); if (isa(V)) V->takeName(LI); if (V->getType()->isPointerTy()) MD->invalidateCachedPointerInfo(V); VN.erase(LI); toErase.push_back(LI); NumGVNLoad++; return true; } if (!EnablePRE || !EnableLoadPRE) return false; // Okay, we have *some* definitions of the value. This means that the value // is available in some of our (transitive) predecessors. Lets think about // doing PRE of this load. This will involve inserting a new load into the // predecessor when it's not available. We could do this in general, but // prefer to not increase code size. As such, we only do this when we know // that we only have to insert *one* load (which means we're basically moving // the load, not inserting a new one). SmallPtrSet Blockers; for (unsigned i = 0, e = UnavailableBlocks.size(); i != e; ++i) Blockers.insert(UnavailableBlocks[i]); // Lets find first basic block with more than one predecessor. Walk backwards // through predecessors if needed. BasicBlock *LoadBB = LI->getParent(); BasicBlock *TmpBB = LoadBB; bool isSinglePred = false; bool allSingleSucc = true; while (TmpBB->getSinglePredecessor()) { isSinglePred = true; TmpBB = TmpBB->getSinglePredecessor(); if (TmpBB == LoadBB) // Infinite (unreachable) loop. 
return false; if (Blockers.count(TmpBB)) return false; if (TmpBB->getTerminator()->getNumSuccessors() != 1) allSingleSucc = false; } assert(TmpBB); LoadBB = TmpBB; // If we have a repl set with LI itself in it, this means we have a loop where // at least one of the values is LI. Since this means that we won't be able // to eliminate LI even if we insert uses in the other predecessors, we will // end up increasing code size. Reject this by scanning for LI. for (unsigned i = 0, e = ValuesPerBlock.size(); i != e; ++i) { if (ValuesPerBlock[i].isSimpleValue() && ValuesPerBlock[i].getSimpleValue() == LI) { // Skip cases where LI is the only definition, even for EnableFullLoadPRE. if (!EnableFullLoadPRE || e == 1) return false; } } // FIXME: It is extremely unclear what this loop is doing, other than // artificially restricting loadpre. if (isSinglePred) { bool isHot = false; for (unsigned i = 0, e = ValuesPerBlock.size(); i != e; ++i) { const AvailableValueInBlock &AV = ValuesPerBlock[i]; if (AV.isSimpleValue()) // "Hot" Instruction is in some loop (because it dominates its dep. // instruction). if (Instruction *I = dyn_cast(AV.getSimpleValue())) if (DT->dominates(LI, I)) { isHot = true; break; } } // We are interested only in "hot" instructions. We don't want to do any // mis-optimizations here. if (!isHot) return false; } // Check to see how many predecessors have the loaded value fully // available. DenseMap PredLoads; DenseMap FullyAvailableBlocks; for (unsigned i = 0, e = ValuesPerBlock.size(); i != e; ++i) FullyAvailableBlocks[ValuesPerBlock[i].BB] = true; for (unsigned i = 0, e = UnavailableBlocks.size(); i != e; ++i) FullyAvailableBlocks[UnavailableBlocks[i]] = false; - bool NeedToSplitEdges = false; + SmallVector, 4> NeedToSplit; for (pred_iterator PI = pred_begin(LoadBB), E = pred_end(LoadBB); PI != E; ++PI) { BasicBlock *Pred = *PI; if (IsValueFullyAvailableInBlock(Pred, FullyAvailableBlocks)) { continue; } PredLoads[Pred] = 0; if (Pred->getTerminator()->getNumSuccessors() != 1) { if (isa(Pred->getTerminator())) { DEBUG(dbgs() << "COULD NOT PRE LOAD BECAUSE OF INDBR CRITICAL EDGE '" << Pred->getName() << "': " << *LI << '\n'); return false; } unsigned SuccNum = GetSuccessorNumber(Pred, LoadBB); - toSplit.push_back(std::make_pair(Pred->getTerminator(), SuccNum)); - NeedToSplitEdges = true; + NeedToSplit.push_back(std::make_pair(Pred->getTerminator(), SuccNum)); } } - if (NeedToSplitEdges) + if (!NeedToSplit.empty()) { + toSplit.append(NeedToSplit.size(), NeedToSplit.front()); return false; + } // Decide whether PRE is profitable for this load. unsigned NumUnavailablePreds = PredLoads.size(); assert(NumUnavailablePreds != 0 && "Fully available value should be eliminated above!"); if (!EnableFullLoadPRE) { // If this load is unavailable in multiple predecessors, reject it. // FIXME: If we could restructure the CFG, we could make a common pred with // all the preds that don't have an available LI and insert a new load into // that one block. if (NumUnavailablePreds != 1) return false; } // Check if the load can safely be moved to all the unavailable predecessors. bool CanDoPRE = true; SmallVector NewInsts; for (DenseMap::iterator I = PredLoads.begin(), E = PredLoads.end(); I != E; ++I) { BasicBlock *UnavailablePred = I->first; // Do PHI translation to get its value in the predecessor if necessary. The // returned pointer (if non-null) is guaranteed to dominate UnavailablePred. 
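// ---------------------------------------------------------------------------
// [Editor's note] Net effect of the PRE path this loop implements, sketched
// as before/after IR on a hypothetical two-predecessor merge:
//
//   before:                           after:
//     P1: store i32 %v, i32* %p         P1: store i32 %v, i32* %p
//         br label %M                       br label %M
//     P2: br label %M                   P2: %l.pre = load i32* %p
//     M:  %l = load i32* %p                 br label %M
//                                       M:  %l = phi i32 [ %v, %P1 ],
//                                                        [ %l.pre, %P2 ]
//
// Only one new load is inserted (in the lone unavailable predecessor), so
// the original load is effectively moved, not duplicated.
// ---------------------------------------------------------------------------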
// If all preds have a single successor, then we know it is safe to insert // the load on the pred (?!?), so we can insert code to materialize the // pointer if it is not available. PHITransAddr Address(LI->getOperand(0), TD); Value *LoadPtr = 0; if (allSingleSucc) { LoadPtr = Address.PHITranslateWithInsertion(LoadBB, UnavailablePred, *DT, NewInsts); } else { Address.PHITranslateValue(LoadBB, UnavailablePred, DT); LoadPtr = Address.getAddr(); } // If we couldn't find or insert a computation of this phi translated value, // we fail PRE. if (LoadPtr == 0) { DEBUG(dbgs() << "COULDN'T INSERT PHI TRANSLATED VALUE OF: " << *LI->getOperand(0) << "\n"); CanDoPRE = false; break; } // Make sure it is valid to move this load here. We have to watch out for: // @1 = getelementptr (i8* p, ... // test p and branch if == 0 // load @1 // It is valid to have the getelementptr before the test, even if p can be 0, // as getelementptr only does address arithmetic. // If we are not pushing the value through any multiple-successor blocks // we do not have this case. Otherwise, check that the load is safe to // put anywhere; this can be improved, but should be conservatively safe. if (!allSingleSucc && // FIXME: REEVALUTE THIS. !isSafeToLoadUnconditionally(LoadPtr, UnavailablePred->getTerminator(), LI->getAlignment(), TD)) { CanDoPRE = false; break; } I->second = LoadPtr; } if (!CanDoPRE) { while (!NewInsts.empty()) NewInsts.pop_back_val()->eraseFromParent(); return false; } // Okay, we can eliminate this load by inserting a reload in the predecessor // and using PHI construction to get the value in the other predecessors, do // it. DEBUG(dbgs() << "GVN REMOVING PRE LOAD: " << *LI << '\n'); DEBUG(if (!NewInsts.empty()) dbgs() << "INSERTED " << NewInsts.size() << " INSTS: " << *NewInsts.back() << '\n'); // Assign value numbers to the new instructions. for (unsigned i = 0, e = NewInsts.size(); i != e; ++i) { // FIXME: We really _ought_ to insert these value numbers into their // parent's availability map. However, in doing so, we risk getting into // ordering issues. If a block hasn't been processed yet, we would be // marking a value as AVAIL-IN, which isn't what we intend. VN.lookup_or_add(NewInsts[i]); } for (DenseMap::iterator I = PredLoads.begin(), E = PredLoads.end(); I != E; ++I) { BasicBlock *UnavailablePred = I->first; Value *LoadPtr = I->second; Value *NewLoad = new LoadInst(LoadPtr, LI->getName()+".pre", false, LI->getAlignment(), UnavailablePred->getTerminator()); // Add the newly created load. ValuesPerBlock.push_back(AvailableValueInBlock::get(UnavailablePred, NewLoad)); MD->invalidateCachedPointerInfo(LoadPtr); DEBUG(dbgs() << "GVN INSERTED " << *NewLoad << '\n'); } // Perform PHI construction. Value *V = ConstructSSAForLoadSet(LI, ValuesPerBlock, TD, *DT, VN.getAliasAnalysis()); LI->replaceAllUsesWith(V); if (isa(V)) V->takeName(LI); if (V->getType()->isPointerTy()) MD->invalidateCachedPointerInfo(V); VN.erase(LI); toErase.push_back(LI); NumPRELoad++; return true; } /// processLoad - Attempt to eliminate a load, first by eliminating it /// locally, and then attempting non-local elimination if that fails. bool GVN::processLoad(LoadInst *L, SmallVectorImpl &toErase) { if (!MD) return false; if (L->isVolatile()) return false; // ... to a pointer that has been loaded from before... MemDepResult Dep = MD->getDependency(L); // If the value isn't available, don't do anything! 
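// ---------------------------------------------------------------------------
// [Editor's note] The coercion helpers used on the clobber paths below
// (GetStoreValueForLoad / GetMemInstValueForLoad) extract the loaded bytes
// from a wider value at a byte offset. On a little-endian target the result
// is simply (StoredVal >> 8*Offset) truncated to the load width; e.g. after
// 'store i32 0xDDCCBBAA', an i8 load at Offset 1 yields 0xBB. Big-endian
// targets need the mirrored shift, which is one reason these helpers require
// TargetData.
// ---------------------------------------------------------------------------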
if (Dep.isClobber()) { // Check to see if we have something like this: // store i32 123, i32* %P // %A = bitcast i32* %P to i8* // %B = gep i8* %A, i32 1 // %C = load i8* %B // // We could do that by recognizing if the clobber instructions are obviously // a common base + constant offset, and if the previous store (or memset) // completely covers this load. This sort of thing can happen in bitfield // access code. Value *AvailVal = 0; if (StoreInst *DepSI = dyn_cast(Dep.getInst())) if (const TargetData *TD = getAnalysisIfAvailable()) { int Offset = AnalyzeLoadFromClobberingStore(L->getType(), L->getPointerOperand(), DepSI, *TD); if (Offset != -1) AvailVal = GetStoreValueForLoad(DepSI->getOperand(0), Offset, L->getType(), L, *TD); } // If the clobbering value is a memset/memcpy/memmove, see if we can forward // a value on from it. if (MemIntrinsic *DepMI = dyn_cast(Dep.getInst())) { if (const TargetData *TD = getAnalysisIfAvailable()) { int Offset = AnalyzeLoadFromClobberingMemInst(L->getType(), L->getPointerOperand(), DepMI, *TD); if (Offset != -1) AvailVal = GetMemInstValueForLoad(DepMI, Offset, L->getType(), L,*TD); } } if (AvailVal) { DEBUG(dbgs() << "GVN COERCED INST:\n" << *Dep.getInst() << '\n' << *AvailVal << '\n' << *L << "\n\n\n"); // Replace the load! L->replaceAllUsesWith(AvailVal); if (AvailVal->getType()->isPointerTy()) MD->invalidateCachedPointerInfo(AvailVal); VN.erase(L); toErase.push_back(L); NumGVNLoad++; return true; } DEBUG( // fast print dep, using operator<< on instruction would be too slow dbgs() << "GVN: load "; WriteAsOperand(dbgs(), L); Instruction *I = Dep.getInst(); dbgs() << " is clobbered by " << *I << '\n'; ); return false; } // If it is defined in another block, try harder. if (Dep.isNonLocal()) return processNonLocalLoad(L, toErase); Instruction *DepInst = Dep.getInst(); if (StoreInst *DepSI = dyn_cast(DepInst)) { Value *StoredVal = DepSI->getOperand(0); // The store and load are to a must-aliased pointer, but they may not // actually have the same type. See if we know how to reuse the stored // value (depending on its type). const TargetData *TD = 0; if (StoredVal->getType() != L->getType()) { if ((TD = getAnalysisIfAvailable())) { StoredVal = CoerceAvailableValueToLoadType(StoredVal, L->getType(), L, *TD); if (StoredVal == 0) return false; DEBUG(dbgs() << "GVN COERCED STORE:\n" << *DepSI << '\n' << *StoredVal << '\n' << *L << "\n\n\n"); } else return false; } // Remove it! L->replaceAllUsesWith(StoredVal); if (StoredVal->getType()->isPointerTy()) MD->invalidateCachedPointerInfo(StoredVal); VN.erase(L); toErase.push_back(L); NumGVNLoad++; return true; } if (LoadInst *DepLI = dyn_cast(DepInst)) { Value *AvailableVal = DepLI; // The loads are of a must-aliased pointer, but they may not actually have // the same type. See if we know how to reuse the previously loaded value // (depending on its type). const TargetData *TD = 0; if (DepLI->getType() != L->getType()) { if ((TD = getAnalysisIfAvailable())) { AvailableVal = CoerceAvailableValueToLoadType(DepLI, L->getType(), L,*TD); if (AvailableVal == 0) return false; DEBUG(dbgs() << "GVN COERCED LOAD:\n" << *DepLI << "\n" << *AvailableVal << "\n" << *L << "\n\n\n"); } else return false; } // Remove it! L->replaceAllUsesWith(AvailableVal); if (DepLI->getType()->isPointerTy()) MD->invalidateCachedPointerInfo(DepLI); VN.erase(L); toErase.push_back(L); NumGVNLoad++; return true; } // If this load really doesn't depend on anything, then we must be loading an // undef value. 
This can happen when loading for a fresh allocation with no // intervening stores, for example. if (isa(DepInst) || isMalloc(DepInst)) { L->replaceAllUsesWith(UndefValue::get(L->getType())); VN.erase(L); toErase.push_back(L); NumGVNLoad++; return true; } // If this load occurs either right after a lifetime begin, // then the loaded value is undefined. if (IntrinsicInst* II = dyn_cast(DepInst)) { if (II->getIntrinsicID() == Intrinsic::lifetime_start) { L->replaceAllUsesWith(UndefValue::get(L->getType())); VN.erase(L); toErase.push_back(L); NumGVNLoad++; return true; } } return false; } Value *GVN::lookupNumber(BasicBlock *BB, uint32_t num) { DenseMap::iterator I = localAvail.find(BB); if (I == localAvail.end()) return 0; ValueNumberScope *Locals = I->second; while (Locals) { DenseMap::iterator I = Locals->table.find(num); if (I != Locals->table.end()) return I->second; Locals = Locals->parent; } return 0; } /// processInstruction - When calculating availability, handle an instruction /// by inserting it into the appropriate sets bool GVN::processInstruction(Instruction *I, SmallVectorImpl &toErase) { // Ignore dbg info intrinsics. if (isa(I)) return false; if (LoadInst *LI = dyn_cast(I)) { bool Changed = processLoad(LI, toErase); if (!Changed) { unsigned Num = VN.lookup_or_add(LI); localAvail[I->getParent()]->table.insert(std::make_pair(Num, LI)); } return Changed; } uint32_t NextNum = VN.getNextUnusedValueNumber(); unsigned Num = VN.lookup_or_add(I); if (BranchInst *BI = dyn_cast(I)) { localAvail[I->getParent()]->table.insert(std::make_pair(Num, I)); if (!BI->isConditional() || isa(BI->getCondition())) return false; Value *BranchCond = BI->getCondition(); uint32_t CondVN = VN.lookup_or_add(BranchCond); BasicBlock *TrueSucc = BI->getSuccessor(0); BasicBlock *FalseSucc = BI->getSuccessor(1); if (TrueSucc->getSinglePredecessor()) localAvail[TrueSucc]->table[CondVN] = ConstantInt::getTrue(TrueSucc->getContext()); if (FalseSucc->getSinglePredecessor()) localAvail[FalseSucc]->table[CondVN] = ConstantInt::getFalse(TrueSucc->getContext()); return false; // Allocations are always uniquely numbered, so we can save time and memory // by fast failing them. } else if (isa(I) || isa(I)) { localAvail[I->getParent()]->table.insert(std::make_pair(Num, I)); return false; } // Collapse PHI nodes if (PHINode* p = dyn_cast(I)) { Value *constVal = CollapsePhi(p); if (constVal) { p->replaceAllUsesWith(constVal); if (MD && constVal->getType()->isPointerTy()) MD->invalidateCachedPointerInfo(constVal); VN.erase(p); toErase.push_back(p); } else { localAvail[I->getParent()]->table.insert(std::make_pair(Num, I)); } // If the number we were assigned was a brand new VN, then we don't // need to do a lookup to see if the number already exists // somewhere in the domtree: it can't! } else if (Num == NextNum) { localAvail[I->getParent()]->table.insert(std::make_pair(Num, I)); // Perform fast-path value-number based elimination of values inherited from // dominators. } else if (Value *repl = lookupNumber(I->getParent(), Num)) { // Remove it! VN.erase(I); I->replaceAllUsesWith(repl); if (MD && repl->getType()->isPointerTy()) MD->invalidateCachedPointerInfo(repl); toErase.push_back(I); return true; } else { localAvail[I->getParent()]->table.insert(std::make_pair(Num, I)); } return false; } /// runOnFunction - This is the main transformation entry point for a function. 
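// ---------------------------------------------------------------------------
// [Editor's illustration -- not part of the patch] lookupNumber() above
// resolves a value number by walking ValueNumberScope parents, i.e. up the
// dominator tree, so a value recorded in a dominator is visible in every
// dominated block. A minimal standalone sketch of that scheme (all names
// hypothetical; LLVM uses DenseMap rather than std::map):
#include <map>
namespace gvn_sketch {
struct Scope {
  Scope *Parent;                      // scope of the immediate dominator
  std::map<unsigned, int> Table;      // value number -> available value
  explicit Scope(Scope *P) : Parent(P) {}
};
// Return the innermost binding for Num, or 0 if no enclosing scope has one.
inline int *scopedLookup(Scope *S, unsigned Num) {
  for (; S; S = S->Parent) {
    std::map<unsigned, int>::iterator I = S->Table.find(Num);
    if (I != S->Table.end())
      return &I->second;
  }
  return 0;
}
} // end namespace gvn_sketch
// ---------------------------------------------------------------------------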
bool GVN::runOnFunction(Function& F) { if (!NoLoads) MD = &getAnalysis(); DT = &getAnalysis(); VN.setAliasAnalysis(&getAnalysis()); VN.setMemDep(MD); VN.setDomTree(DT); bool Changed = false; bool ShouldContinue = true; // Merge unconditional branches, allowing PRE to catch more // optimization opportunities. for (Function::iterator FI = F.begin(), FE = F.end(); FI != FE; ) { BasicBlock *BB = FI; ++FI; bool removedBlock = MergeBlockIntoPredecessor(BB, this); if (removedBlock) NumGVNBlocks++; Changed |= removedBlock; } unsigned Iteration = 0; while (ShouldContinue) { DEBUG(dbgs() << "GVN iteration: " << Iteration << "\n"); ShouldContinue = iterateOnFunction(F); if (splitCriticalEdges()) ShouldContinue = true; Changed |= ShouldContinue; ++Iteration; } if (EnablePRE) { bool PREChanged = true; while (PREChanged) { PREChanged = performPRE(F); Changed |= PREChanged; } } // FIXME: Should perform GVN again after PRE does something. PRE can move // computations into blocks where they become fully redundant. Note that // we can't do this until PRE's critical edge splitting updates memdep. // Actually, when this happens, we should just fully integrate PRE into GVN. cleanupGlobalSets(); return Changed; } bool GVN::processBlock(BasicBlock *BB) { // FIXME: Kill off toErase by doing erasing eagerly in a helper function (and // incrementing BI before processing an instruction). SmallVector toErase; bool ChangedFunction = false; for (BasicBlock::iterator BI = BB->begin(), BE = BB->end(); BI != BE;) { ChangedFunction |= processInstruction(BI, toErase); if (toErase.empty()) { ++BI; continue; } // If we need some instructions deleted, do it now. NumGVNInstr += toErase.size(); // Avoid iterator invalidation. bool AtStart = BI == BB->begin(); if (!AtStart) --BI; for (SmallVector::iterator I = toErase.begin(), E = toErase.end(); I != E; ++I) { DEBUG(dbgs() << "GVN removed: " << **I << '\n'); if (MD) MD->removeInstruction(*I); (*I)->eraseFromParent(); DEBUG(verifyRemoved(*I)); } toErase.clear(); if (AtStart) BI = BB->begin(); else ++BI; } return ChangedFunction; } /// performPRE - Perform a purely local form of PRE that looks for diamond /// control flow patterns and attempts to perform simple PRE at the join point. bool GVN::performPRE(Function &F) { bool Changed = false; DenseMap predMap; for (df_iterator DI = df_begin(&F.getEntryBlock()), DE = df_end(&F.getEntryBlock()); DI != DE; ++DI) { BasicBlock *CurrentBlock = *DI; // Nothing to PRE in the entry block. if (CurrentBlock == &F.getEntryBlock()) continue; for (BasicBlock::iterator BI = CurrentBlock->begin(), BE = CurrentBlock->end(); BI != BE; ) { Instruction *CurInst = BI++; if (isa(CurInst) || isa(CurInst) || isa(CurInst) || CurInst->getType()->isVoidTy() || CurInst->mayReadFromMemory() || CurInst->mayHaveSideEffects() || isa(CurInst)) continue; uint32_t ValNo = VN.lookup(CurInst); // Look for the predecessors for PRE opportunities. We're // only trying to solve the basic diamond case, where // a value is computed in the successor and one predecessor, // but not the other. We also explicitly disallow cases // where the successor is its own predecessor, because they're // more complicated to get right. unsigned NumWith = 0; unsigned NumWithout = 0; BasicBlock *PREPred = 0; predMap.clear(); for (pred_iterator PI = pred_begin(CurrentBlock), PE = pred_end(CurrentBlock); PI != PE; ++PI) { // We're not interested in PRE where the block is its // own predecessor, or in blocks with predecessors // that are not reachable. 
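// ---------------------------------------------------------------------------
// [Editor's note] In the scan below, NumWithout doubles as a bail-out
// sentinel: cases that can never be PRE'd (the block is its own predecessor,
// a predecessor is unreachable, or the value found in a predecessor is
// CurInst itself) force NumWithout to 2, which guarantees the later
// 'NumWithout != 1' rejection fires.
// ---------------------------------------------------------------------------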
if (*PI == CurrentBlock) { NumWithout = 2; break; } else if (!localAvail.count(*PI)) { NumWithout = 2; break; } DenseMap::iterator predV = localAvail[*PI]->table.find(ValNo); if (predV == localAvail[*PI]->table.end()) { PREPred = *PI; NumWithout++; } else if (predV->second == CurInst) { NumWithout = 2; } else { predMap[*PI] = predV->second; NumWith++; } } // Don't do PRE when it might increase code size, i.e. when // we would need to insert instructions in more than one pred. if (NumWithout != 1 || NumWith == 0) continue; // Don't do PRE across indirect branch. if (isa(PREPred->getTerminator())) continue; // We can't do PRE safely on a critical edge, so instead we schedule // the edge to be split and perform the PRE the next time we iterate // on the function. unsigned SuccNum = GetSuccessorNumber(PREPred, CurrentBlock); if (isCriticalEdge(PREPred->getTerminator(), SuccNum)) { toSplit.push_back(std::make_pair(PREPred->getTerminator(), SuccNum)); continue; } // Instantiate the expression in the predecessor that lacked it. // Because we are going top-down through the block, all value numbers // will be available in the predecessor by the time we need them. Any // that weren't originally present will have been instantiated earlier // in this loop. Instruction *PREInstr = CurInst->clone(); bool success = true; for (unsigned i = 0, e = CurInst->getNumOperands(); i != e; ++i) { Value *Op = PREInstr->getOperand(i); if (isa(Op) || isa(Op) || isa(Op)) continue; if (Value *V = lookupNumber(PREPred, VN.lookup(Op))) { PREInstr->setOperand(i, V); } else { success = false; break; } } // Fail out if we encounter an operand that is not available in // the PRE predecessor. This is typically because of loads which // are not value numbered precisely. if (!success) { delete PREInstr; DEBUG(verifyRemoved(PREInstr)); continue; } PREInstr->insertBefore(PREPred->getTerminator()); PREInstr->setName(CurInst->getName() + ".pre"); predMap[PREPred] = PREInstr; VN.add(PREInstr, ValNo); NumGVNPRE++; // Update the availability map to include the new instruction. localAvail[PREPred]->table.insert(std::make_pair(ValNo, PREInstr)); // Create a PHI to make the value available in this block. PHINode* Phi = PHINode::Create(CurInst->getType(), CurInst->getName() + ".pre-phi", CurrentBlock->begin()); for (pred_iterator PI = pred_begin(CurrentBlock), PE = pred_end(CurrentBlock); PI != PE; ++PI) Phi->addIncoming(predMap[*PI], *PI); VN.add(Phi, ValNo); localAvail[CurrentBlock]->table[ValNo] = Phi; CurInst->replaceAllUsesWith(Phi); if (MD && Phi->getType()->isPointerTy()) MD->invalidateCachedPointerInfo(Phi); VN.erase(CurInst); DEBUG(dbgs() << "GVN PRE removed: " << *CurInst << '\n'); if (MD) MD->removeInstruction(CurInst); CurInst->eraseFromParent(); DEBUG(verifyRemoved(CurInst)); Changed = true; } } if (splitCriticalEdges()) Changed = true; return Changed; } /// splitCriticalEdges - Split critical edges found during the previous /// iteration that may enable further optimization. 
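// ---------------------------------------------------------------------------
// [Editor's note] Why performPRE defers critical edges to splitCriticalEdges
// below: inserting the expression directly in PREPred would execute it on
// *every* path out of PREPred, not just the one into CurrentBlock. Splitting
// the edge first creates a block that runs only on that path:
//
//     PREPred (multiple succs)          PREPred
//        \                                 |
//         \                 =>          Split   <- safe insertion point
//          \                               |
//     CurrentBlock (multiple preds)   CurrentBlock
//
// The split changes the CFG, so the PRE opportunity is retried on the next
// GVN iteration rather than completed immediately.
// ---------------------------------------------------------------------------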
bool GVN::splitCriticalEdges() {
  if (toSplit.empty())
    return false;
  do {
    std::pair<TerminatorInst*, unsigned> Edge = toSplit.pop_back_val();
    SplitCriticalEdge(Edge.first, Edge.second, this);
  } while (!toSplit.empty());
  if (MD) MD->invalidateCachedPredecessors();
  return true;
}

/// iterateOnFunction - Executes one iteration of GVN
bool GVN::iterateOnFunction(Function &F) {
  cleanupGlobalSets();

  for (df_iterator<DomTreeNode*> DI = df_begin(DT->getRootNode()),
       DE = df_end(DT->getRootNode()); DI != DE; ++DI) {
    if (DI->getIDom())
      localAvail[DI->getBlock()] =
                  new ValueNumberScope(localAvail[DI->getIDom()->getBlock()]);
    else
      localAvail[DI->getBlock()] = new ValueNumberScope(0);
  }

  // Top-down walk of the dominator tree
  bool Changed = false;
#if 0
  // Needed for value numbering with phi construction to work.
  ReversePostOrderTraversal<Function*> RPOT(&F);
  for (ReversePostOrderTraversal<Function*>::rpo_iterator RI = RPOT.begin(),
       RE = RPOT.end(); RI != RE; ++RI)
    Changed |= processBlock(*RI);
#else
  for (df_iterator<DomTreeNode*> DI = df_begin(DT->getRootNode()),
       DE = df_end(DT->getRootNode()); DI != DE; ++DI)
    Changed |= processBlock(DI->getBlock());
#endif

  return Changed;
}

void GVN::cleanupGlobalSets() {
  VN.clear();

  for (DenseMap<BasicBlock*, ValueNumberScope*>::iterator
       I = localAvail.begin(), E = localAvail.end(); I != E; ++I)
    delete I->second;
  localAvail.clear();
}

/// verifyRemoved - Verify that the specified instruction does not occur in our
/// internal data structures.
void GVN::verifyRemoved(const Instruction *Inst) const {
  VN.verifyRemoved(Inst);

  // Walk through the value number scope to make sure the instruction isn't
  // ferreted away in it.
  for (DenseMap<BasicBlock*, ValueNumberScope*>::const_iterator
         I = localAvail.begin(), E = localAvail.end(); I != E; ++I) {
    const ValueNumberScope *VNS = I->second;

    while (VNS) {
      for (DenseMap<uint32_t, Value*>::const_iterator
             II = VNS->table.begin(), IE = VNS->table.end(); II != IE; ++II) {
        assert(II->second != Inst && "Inst still in value numbering scope!");
      }

      VNS = VNS->parent;
    }
  }
}
diff --git a/lib/VMCore/Metadata.cpp b/lib/VMCore/Metadata.cpp
index 092fe00a5369..b894ea30aa98 100644
--- a/lib/VMCore/Metadata.cpp
+++ b/lib/VMCore/Metadata.cpp
@@ -1,571 +1,577 @@
//===-- Metadata.cpp - Implement Metadata classes -------------------------===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file implements the Metadata classes.
//
//===----------------------------------------------------------------------===//

#include "llvm/Metadata.h"
#include "LLVMContextImpl.h"
#include "llvm/LLVMContext.h"
#include "llvm/Module.h"
#include "llvm/Instruction.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/SmallString.h"
#include "SymbolTableListTraitsImpl.h"
#include "llvm/Support/ValueHandle.h"
using namespace llvm;

//===----------------------------------------------------------------------===//
// MDString implementation.
//

MDString::MDString(LLVMContext &C, StringRef S)
  : Value(Type::getMetadataTy(C), Value::MDStringVal), Str(S) {}

MDString *MDString::get(LLVMContext &Context, StringRef Str) {
  LLVMContextImpl *pImpl = Context.pImpl;
  StringMapEntry<MDString*> &Entry =
    pImpl->MDStringCache.GetOrCreateValue(Str);
  MDString *&S = Entry.getValue();
  if (!S) S = new MDString(Context, Entry.getKey());
  return S;
}

//===----------------------------------------------------------------------===//
// MDNodeOperand implementation.
//

// Use CallbackVH to hold MDNode operands.
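// ---------------------------------------------------------------------------
// [Editor's note] CallbackVH is a value handle with two virtual hooks:
// deleted() fires when the referenced Value is destroyed, and
// allUsesReplacedWith(NV) fires when it is RAUW'd. MDNodeOperand below routes
// both into MDNode::replaceOperand, which is how an MDNode tracks its
// operands without appearing in their use lists or keeping them alive.
// ---------------------------------------------------------------------------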
namespace llvm { class MDNodeOperand : public CallbackVH { MDNode *Parent; public: MDNodeOperand(Value *V, MDNode *P) : CallbackVH(V), Parent(P) {} ~MDNodeOperand() {} void set(Value *V) { setValPtr(V); } virtual void deleted(); virtual void allUsesReplacedWith(Value *NV); }; } // end namespace llvm. void MDNodeOperand::deleted() { Parent->replaceOperand(this, 0); } void MDNodeOperand::allUsesReplacedWith(Value *NV) { Parent->replaceOperand(this, NV); } //===----------------------------------------------------------------------===// // MDNode implementation. // /// getOperandPtr - Helper function to get the MDNodeOperand's coallocated on /// the end of the MDNode. static MDNodeOperand *getOperandPtr(MDNode *N, unsigned Op) { assert(Op < N->getNumOperands() && "Invalid operand number"); return reinterpret_cast(N+1)+Op; } MDNode::MDNode(LLVMContext &C, Value *const *Vals, unsigned NumVals, bool isFunctionLocal) : Value(Type::getMetadataTy(C), Value::MDNodeVal) { NumOperands = NumVals; if (isFunctionLocal) setValueSubclassData(getSubclassDataFromValue() | FunctionLocalBit); // Initialize the operand list, which is co-allocated on the end of the node. for (MDNodeOperand *Op = getOperandPtr(this, 0), *E = Op+NumOperands; Op != E; ++Op, ++Vals) new (Op) MDNodeOperand(*Vals, this); } /// ~MDNode - Destroy MDNode. MDNode::~MDNode() { assert((getSubclassDataFromValue() & DestroyFlag) != 0 && "Not being destroyed through destroy()?"); LLVMContextImpl *pImpl = getType()->getContext().pImpl; if (isNotUniqued()) { pImpl->NonUniquedMDNodes.erase(this); } else { pImpl->MDNodeSet.RemoveNode(this); } // Destroy the operands. for (MDNodeOperand *Op = getOperandPtr(this, 0), *E = Op+NumOperands; Op != E; ++Op) Op->~MDNodeOperand(); } static const Function *getFunctionForValue(Value *V) { - assert(!isa(V) && "does not iterate over metadata operands"); if (!V) return NULL; - if (Instruction *I = dyn_cast(V)) - return I->getParent()->getParent(); - if (BasicBlock *BB = dyn_cast(V)) - return BB->getParent(); + if (Instruction *I = dyn_cast(V)) { + BasicBlock *BB = I->getParent(); + return BB ? BB->getParent() : 0; + } if (Argument *A = dyn_cast(V)) return A->getParent(); + if (BasicBlock *BB = dyn_cast(V)) + return BB->getParent(); + if (MDNode *MD = dyn_cast(V)) + return MD->getFunction(); return NULL; } #ifndef NDEBUG static const Function *assertLocalFunction(const MDNode *N) { if (!N->isFunctionLocal()) return 0; const Function *F = 0, *NewF = 0; for (unsigned i = 0, e = N->getNumOperands(); i != e; ++i) { if (Value *V = N->getOperand(i)) { if (MDNode *MD = dyn_cast(V)) NewF = assertLocalFunction(MD); else NewF = getFunctionForValue(V); } if (F == 0) F = NewF; else assert((NewF == 0 || F == NewF) &&"inconsistent function-local metadata"); } return F; } #endif // getFunction - If this metadata is function-local and recursively has a // function-local operand, return the first such operand's parent function. // Otherwise, return null. getFunction() should not be used for performance- // critical code because it recursively visits all the MDNode's operands. 
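// ---------------------------------------------------------------------------
// [Editor's illustration -- not part of the patch] The co-allocation trick
// used by getOperandPtr() above and getMDNode() below: node header and
// operand array share one malloc'd block, and the operands are addressed as
// (N + 1). A standalone sketch with hypothetical names (int stands in for
// MDNodeOperand; malloc's alignment is assumed sufficient, as it is for the
// real MDNode):
#include <cstdlib>
#include <new>
namespace md_sketch {
struct Node {
  unsigned NumOps;                    // operand slots follow in memory
};
inline int *opPtr(Node *N, unsigned i) {
  return reinterpret_cast<int *>(N + 1) + i;
}
inline Node *create(unsigned NumOps) {
  void *Mem = std::malloc(sizeof(Node) + NumOps * sizeof(int));
  Node *N = new (Mem) Node();         // placement-new the header
  N->NumOps = NumOps;
  for (unsigned i = 0; i != NumOps; ++i)
    *opPtr(N, i) = 0;                 // initialize each trailing slot
  return N;
}
inline void destroy(Node *N) {
  N->~Node();                         // placement delete, then free the
  std::free(N);                       // block, mirroring MDNode::destroy()
}
} // end namespace md_sketch
// ---------------------------------------------------------------------------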
const Function *MDNode::getFunction() const { #ifndef NDEBUG return assertLocalFunction(this); #endif if (!isFunctionLocal()) return NULL; - - for (unsigned i = 0, e = getNumOperands(); i != e; ++i) { - if (Value *V = getOperand(i)) { - if (MDNode *MD = dyn_cast(V)) { - if (const Function *F = MD->getFunction()) - return F; - } else { - return getFunctionForValue(V); - } - } - } + for (unsigned i = 0, e = getNumOperands(); i != e; ++i) + if (const Function *F = getFunctionForValue(getOperand(i))) + return F; return NULL; } // destroy - Delete this node. Only when there are no uses. void MDNode::destroy() { setValueSubclassData(getSubclassDataFromValue() | DestroyFlag); // Placement delete, the free the memory. this->~MDNode(); free(this); } /// isFunctionLocalValue - Return true if this is a value that would require a /// function-local MDNode. static bool isFunctionLocalValue(Value *V) { return isa(V) || isa(V) || isa(V) || (isa(V) && cast(V)->isFunctionLocal()); } MDNode *MDNode::getMDNode(LLVMContext &Context, Value *const *Vals, unsigned NumVals, FunctionLocalness FL, bool Insert) { LLVMContextImpl *pImpl = Context.pImpl; bool isFunctionLocal = false; switch (FL) { case FL_Unknown: for (unsigned i = 0; i != NumVals; ++i) { Value *V = Vals[i]; if (!V) continue; if (isFunctionLocalValue(V)) { isFunctionLocal = true; break; } } break; case FL_No: isFunctionLocal = false; break; case FL_Yes: isFunctionLocal = true; break; } FoldingSetNodeID ID; for (unsigned i = 0; i != NumVals; ++i) ID.AddPointer(Vals[i]); ID.AddBoolean(isFunctionLocal); void *InsertPoint; MDNode *N = NULL; if ((N = pImpl->MDNodeSet.FindNodeOrInsertPos(ID, InsertPoint))) return N; if (!Insert) return NULL; // Coallocate space for the node and Operands together, then placement new. void *Ptr = malloc(sizeof(MDNode)+NumVals*sizeof(MDNodeOperand)); N = new (Ptr) MDNode(Context, Vals, NumVals, isFunctionLocal); // InsertPoint will have been set by the FindNodeOrInsertPos call. pImpl->MDNodeSet.InsertNode(N, InsertPoint); return N; } MDNode *MDNode::get(LLVMContext &Context, Value*const* Vals, unsigned NumVals) { return getMDNode(Context, Vals, NumVals, FL_Unknown); } MDNode *MDNode::getWhenValsUnresolved(LLVMContext &Context, Value *const *Vals, unsigned NumVals, bool isFunctionLocal) { return getMDNode(Context, Vals, NumVals, isFunctionLocal ? FL_Yes : FL_No); } MDNode *MDNode::getIfExists(LLVMContext &Context, Value *const *Vals, unsigned NumVals) { return getMDNode(Context, Vals, NumVals, FL_Unknown, false); } /// getOperand - Return specified operand. Value *MDNode::getOperand(unsigned i) const { return *getOperandPtr(const_cast(this), i); } void MDNode::Profile(FoldingSetNodeID &ID) const { for (unsigned i = 0, e = getNumOperands(); i != e; ++i) ID.AddPointer(getOperand(i)); ID.AddBoolean(isFunctionLocal()); } void MDNode::setIsNotUniqued() { setValueSubclassData(getSubclassDataFromValue() | NotUniquedBit); LLVMContextImpl *pImpl = getType()->getContext().pImpl; pImpl->NonUniquedMDNodes.insert(this); } // Replace value from this node's operand list. void MDNode::replaceOperand(MDNodeOperand *Op, Value *To) { Value *From = *Op; // If is possible that someone did GV->RAUW(inst), replacing a global variable // with an instruction or some other function-local object. If this is a // non-function-local MDNode, it can't point to a function-local object. // Handle this case by implicitly dropping the MDNode reference to null. 
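// ---------------------------------------------------------------------------
// [Editor's note] The change below extends this null-dropping rule: the old
// code only dropped the reference when a function-local value flowed into a
// non-function-local MDNode. With the new getFunctionForValue()/getFunction()
// plumbing above, a function-local MDNode can also detect that a replacement
// value belongs to a *different* function, and drops the reference to null in
// that case too (this is what the updated GlobalOpt metadata.ll test later in
// this patch checks).
// ---------------------------------------------------------------------------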
- if (!isFunctionLocal() && To && isFunctionLocalValue(To)) - To = 0; + // Likewise if the MDNode is function-local but for a different function. + if (To && isFunctionLocalValue(To)) { + if (!isFunctionLocal()) + To = 0; + else { + const Function *F = getFunction(); + const Function *FV = getFunctionForValue(To); + // Metadata can be function-local without having an associated function. + // So only consider functions to have changed if non-null. + if (F && FV && F != FV) + To = 0; + } + } if (From == To) return; // Update the operand. Op->set(To); // If this node is already not being uniqued (because one of the operands // already went to null), then there is nothing else to do here. if (isNotUniqued()) return; LLVMContextImpl *pImpl = getType()->getContext().pImpl; // Remove "this" from the context map. FoldingSet doesn't have to reprofile // this node to remove it, so we don't care what state the operands are in. pImpl->MDNodeSet.RemoveNode(this); // If we are dropping an argument to null, we choose to not unique the MDNode // anymore. This commonly occurs during destruction, and uniquing these // brings little reuse. if (To == 0) { setIsNotUniqued(); return; } // Now that the node is out of the folding set, get ready to reinsert it. // First, check to see if another node with the same operands already exists // in the set. If it doesn't exist, this returns the position to insert it. FoldingSetNodeID ID; Profile(ID); void *InsertPoint; MDNode *N = pImpl->MDNodeSet.FindNodeOrInsertPos(ID, InsertPoint); if (N) { N->replaceAllUsesWith(this); N->destroy(); N = pImpl->MDNodeSet.FindNodeOrInsertPos(ID, InsertPoint); assert(N == 0 && "shouldn't be in the map now!"); (void)N; } // InsertPoint will have been set by the FindNodeOrInsertPos call. pImpl->MDNodeSet.InsertNode(this, InsertPoint); } //===----------------------------------------------------------------------===// // NamedMDNode implementation. // namespace llvm { // SymbolTableListTraits specialization for MDSymbolTable. void ilist_traits ::addNodeToList(NamedMDNode *N) { assert(N->getParent() == 0 && "Value already in a container!!"); Module *Owner = getListOwner(); N->setParent(Owner); MDSymbolTable &ST = Owner->getMDSymbolTable(); ST.insert(N->getName(), N); } void ilist_traits::removeNodeFromList(NamedMDNode *N) { N->setParent(0); Module *Owner = getListOwner(); MDSymbolTable &ST = Owner->getMDSymbolTable(); ST.remove(N->getName()); } } static SmallVector &getNMDOps(void *Operands) { return *(SmallVector*)Operands; } NamedMDNode::NamedMDNode(LLVMContext &C, const Twine &N, MDNode *const *MDs, unsigned NumMDs, Module *ParentModule) : Value(Type::getMetadataTy(C), Value::NamedMDNodeVal), Parent(0) { setName(N); Operands = new SmallVector(); SmallVector &Node = getNMDOps(Operands); for (unsigned i = 0; i != NumMDs; ++i) Node.push_back(WeakVH(MDs[i])); if (ParentModule) ParentModule->getNamedMDList().push_back(this); } NamedMDNode *NamedMDNode::Create(const NamedMDNode *NMD, Module *M) { assert(NMD && "Invalid source NamedMDNode!"); SmallVector Elems; Elems.reserve(NMD->getNumOperands()); for (unsigned i = 0, e = NMD->getNumOperands(); i != e; ++i) Elems.push_back(NMD->getOperand(i)); return new NamedMDNode(NMD->getContext(), NMD->getName().data(), Elems.data(), Elems.size(), M); } NamedMDNode::~NamedMDNode() { dropAllReferences(); delete &getNMDOps(Operands); } /// getNumOperands - Return number of NamedMDNode operands. 
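// ---------------------------------------------------------------------------
// [Editor's illustration -- not part of the patch] NamedMDNode stores its
// operand vector behind an opaque void* (see getNMDOps above) so the public
// header need not include SmallVector. The same pimpl-by-cast idiom with
// hypothetical names (std::vector<int> stands in for the real
// SmallVector<WeakVH, 4>):
#include <vector>
namespace nmd_sketch {
struct Named {
  void *Operands;                     // really a std::vector<int>*
  Named() : Operands(new std::vector<int>()) {}
  ~Named() { delete &ops(); }
  std::vector<int> &ops() {
    return *static_cast<std::vector<int> *>(Operands);
  }
  void add(int V) { ops().push_back(V); }            // cf. addOperand
  unsigned size() { return (unsigned)ops().size(); } // cf. getNumOperands
};
} // end namespace nmd_sketch
// ---------------------------------------------------------------------------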
unsigned NamedMDNode::getNumOperands() const { return (unsigned)getNMDOps(Operands).size(); } /// getOperand - Return specified operand. MDNode *NamedMDNode::getOperand(unsigned i) const { assert(i < getNumOperands() && "Invalid Operand number!"); return dyn_cast_or_null(getNMDOps(Operands)[i]); } /// addOperand - Add metadata Operand. void NamedMDNode::addOperand(MDNode *M) { getNMDOps(Operands).push_back(WeakVH(M)); } /// eraseFromParent - Drop all references and remove the node from parent /// module. void NamedMDNode::eraseFromParent() { getParent()->getNamedMDList().erase(this); } /// dropAllReferences - Remove all uses and clear node vector. void NamedMDNode::dropAllReferences() { getNMDOps(Operands).clear(); } /// setName - Set the name of this named metadata. void NamedMDNode::setName(const Twine &NewName) { assert (!NewName.isTriviallyEmpty() && "Invalid named metadata name!"); SmallString<256> NameData; StringRef NameRef = NewName.toStringRef(NameData); // Name isn't changing? if (getName() == NameRef) return; Name = NameRef.str(); if (Parent) Parent->getMDSymbolTable().insert(NameRef, this); } /// getName - Return a constant reference to this named metadata's name. StringRef NamedMDNode::getName() const { return StringRef(Name); } //===----------------------------------------------------------------------===// // Instruction Metadata method implementations. // void Instruction::setMetadata(const char *Kind, MDNode *Node) { if (Node == 0 && !hasMetadata()) return; setMetadata(getContext().getMDKindID(Kind), Node); } MDNode *Instruction::getMetadataImpl(const char *Kind) const { return getMetadataImpl(getContext().getMDKindID(Kind)); } void Instruction::setDbgMetadata(MDNode *Node) { DbgLoc = DebugLoc::getFromDILocation(Node); } /// setMetadata - Set the metadata of of the specified kind to the specified /// node. This updates/replaces metadata if already present, or removes it if /// Node is null. void Instruction::setMetadata(unsigned KindID, MDNode *Node) { if (Node == 0 && !hasMetadata()) return; // Handle 'dbg' as a special case since it is not stored in the hash table. if (KindID == LLVMContext::MD_dbg) { DbgLoc = DebugLoc::getFromDILocation(Node); return; } // Handle the case when we're adding/updating metadata on an instruction. if (Node) { LLVMContextImpl::MDMapTy &Info = getContext().pImpl->MetadataStore[this]; assert(!Info.empty() == hasMetadataHashEntry() && "HasMetadata bit is wonked"); if (Info.empty()) { setHasMetadataHashEntry(true); } else { // Handle replacement of an existing value. for (unsigned i = 0, e = Info.size(); i != e; ++i) if (Info[i].first == KindID) { Info[i].second = Node; return; } } // No replacement, just add it to the list. Info.push_back(std::make_pair(KindID, Node)); return; } // Otherwise, we're removing metadata from an instruction. assert(hasMetadataHashEntry() && getContext().pImpl->MetadataStore.count(this) && "HasMetadata bit out of date!"); LLVMContextImpl::MDMapTy &Info = getContext().pImpl->MetadataStore[this]; // Common case is removing the only entry. if (Info.size() == 1 && Info[0].first == KindID) { getContext().pImpl->MetadataStore.erase(this); setHasMetadataHashEntry(false); return; } // Handle removal of an existing value. for (unsigned i = 0, e = Info.size(); i != e; ++i) if (Info[i].first == KindID) { Info[i] = Info.back(); Info.pop_back(); assert(!Info.empty() && "Removing last entry should be handled above"); return; } // Otherwise, removing an entry that doesn't exist on the instruction. 
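// ---------------------------------------------------------------------------
// [Editor's note] The removal path above uses swap-with-back
// (Info[i] = Info.back(); Info.pop_back()), which is O(1) but does not
// preserve entry order. That is fine for this flat vector-as-map, and the
// getAllMetadata* accessors below sort their results so clients still see a
// deterministic order.
// ---------------------------------------------------------------------------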
} MDNode *Instruction::getMetadataImpl(unsigned KindID) const { // Handle 'dbg' as a special case since it is not stored in the hash table. if (KindID == LLVMContext::MD_dbg) return DbgLoc.getAsMDNode(getContext()); if (!hasMetadataHashEntry()) return 0; LLVMContextImpl::MDMapTy &Info = getContext().pImpl->MetadataStore[this]; assert(!Info.empty() && "bit out of sync with hash table"); for (LLVMContextImpl::MDMapTy::iterator I = Info.begin(), E = Info.end(); I != E; ++I) if (I->first == KindID) return I->second; return 0; } void Instruction::getAllMetadataImpl(SmallVectorImpl > &Result) const { Result.clear(); // Handle 'dbg' as a special case since it is not stored in the hash table. if (!DbgLoc.isUnknown()) { Result.push_back(std::make_pair((unsigned)LLVMContext::MD_dbg, DbgLoc.getAsMDNode(getContext()))); if (!hasMetadataHashEntry()) return; } assert(hasMetadataHashEntry() && getContext().pImpl->MetadataStore.count(this) && "Shouldn't have called this"); const LLVMContextImpl::MDMapTy &Info = getContext().pImpl->MetadataStore.find(this)->second; assert(!Info.empty() && "Shouldn't have called this"); Result.append(Info.begin(), Info.end()); // Sort the resulting array so it is stable. if (Result.size() > 1) array_pod_sort(Result.begin(), Result.end()); } void Instruction:: getAllMetadataOtherThanDebugLocImpl(SmallVectorImpl > &Result) const { Result.clear(); assert(hasMetadataHashEntry() && getContext().pImpl->MetadataStore.count(this) && "Shouldn't have called this"); const LLVMContextImpl::MDMapTy &Info = getContext().pImpl->MetadataStore.find(this)->second; assert(!Info.empty() && "Shouldn't have called this"); Result.append(Info.begin(), Info.end()); // Sort the resulting array so it is stable. if (Result.size() > 1) array_pod_sort(Result.begin(), Result.end()); } /// removeAllMetadata - Remove all metadata from this instruction. void Instruction::removeAllMetadata() { assert(hasMetadata() && "Caller should check"); DbgLoc = DebugLoc(); if (hasMetadataHashEntry()) { getContext().pImpl->MetadataStore.erase(this); setHasMetadataHashEntry(false); } } diff --git a/test/CodeGen/CellSPU/storestruct.ll b/test/CodeGen/CellSPU/storestruct.ll new file mode 100644 index 000000000000..47185e829661 --- /dev/null +++ b/test/CodeGen/CellSPU/storestruct.ll @@ -0,0 +1,13 @@ +; RUN: llc < %s -march=cellspu | FileCheck %s + +%0 = type {i32, i32} +@buffer = global [ 72 x %0 ] zeroinitializer + +define void@test( ) { +; Check that there is no illegal "a rt, ra, imm" instruction +; CHECK-NOT: a {{\$., \$., 5..}} +; CHECK: a {{\$., \$., \$.}} + store %0 {i32 1, i32 2} , + %0* getelementptr ([72 x %0]* @buffer, i32 0, i32 71) + ret void +} diff --git a/test/MC/AsmParser/X86/x86_64-suffix-matching.s b/test/MC/AsmParser/X86/x86_64-suffix-matching.s new file mode 100644 index 000000000000..c4f0be2c6eab --- /dev/null +++ b/test/MC/AsmParser/X86/x86_64-suffix-matching.s @@ -0,0 +1,6 @@ +// RUN: llvm-mc -triple x86_64 -o - %s | FileCheck %s + +// CHECK: addl $0, %eax + add $0, %eax +// CHECK: addb $255, %al + add $0xFF, %al diff --git a/test/Transforms/GlobalOpt/metadata.ll b/test/Transforms/GlobalOpt/metadata.ll index a09ba72439fc..730e2b080236 100644 --- a/test/Transforms/GlobalOpt/metadata.ll +++ b/test/Transforms/GlobalOpt/metadata.ll @@ -1,19 +1,26 @@ ; RUN: opt -S -globalopt < %s | FileCheck %s ; PR6112 - When globalopt does RAUW(@G, %G), the metadata reference should drop -; to null. +; to null. 
Function local metadata that references @G from a different function +; to that containing %G should likewise drop to null. @G = internal global i8** null define i32 @main(i32 %argc, i8** %argv) { ; CHECK: @main ; CHECK: %G = alloca store i8** %argv, i8*** @G ret i32 0 } -!named = !{!0} +define void @foo(i32 %x) { + call void @llvm.foo(metadata !{i8*** @G, i32 %x}) +; CHECK: call void @llvm.foo(metadata !{null, i32 %x}) + ret void +} -; CHECK: !0 = metadata !{null} -!0 = metadata !{i8*** @G} +declare void @llvm.foo(metadata) nounwind readnone +!named = !{!0} +!0 = metadata !{i8*** @G} +; CHECK: !0 = metadata !{null} diff --git a/utils/TableGen/ClangDiagnosticsEmitter.cpp b/utils/TableGen/ClangDiagnosticsEmitter.cpp index 27b16544ce1d..d0e813bc2733 100644 --- a/utils/TableGen/ClangDiagnosticsEmitter.cpp +++ b/utils/TableGen/ClangDiagnosticsEmitter.cpp @@ -1,170 +1,285 @@ //=- ClangDiagnosticsEmitter.cpp - Generate Clang diagnostics tables -*- C++ -*- // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // These tablegen backends emit Clang diagnostics tables. // //===----------------------------------------------------------------------===// #include "ClangDiagnosticsEmitter.h" #include "Record.h" #include "llvm/Support/Debug.h" #include "llvm/Support/Compiler.h" #include "llvm/ADT/DenseSet.h" #include "llvm/ADT/StringExtras.h" +#include "llvm/ADT/StringMap.h" #include "llvm/ADT/VectorExtras.h" #include #include using namespace llvm; +//===----------------------------------------------------------------------===// +// Diagnostic category computation code. +//===----------------------------------------------------------------------===// + +namespace { +class DiagGroupParentMap { + std::map > Mapping; +public: + DiagGroupParentMap() { + std::vector DiagGroups + = Records.getAllDerivedDefinitions("DiagGroup"); + for (unsigned i = 0, e = DiagGroups.size(); i != e; ++i) { + std::vector SubGroups = + DiagGroups[i]->getValueAsListOfDefs("SubGroups"); + for (unsigned j = 0, e = SubGroups.size(); j != e; ++j) + Mapping[SubGroups[j]].push_back(DiagGroups[i]); + } + } + + const std::vector &getParents(const Record *Group) { + return Mapping[Group]; + } +}; +} // end anonymous namespace. + + +static std::string +getCategoryFromDiagGroup(const Record *Group, + DiagGroupParentMap &DiagGroupParents) { + // If the DiagGroup has a category, return it. + std::string CatName = Group->getValueAsString("CategoryName"); + if (!CatName.empty()) return CatName; + + // The diag group may the subgroup of one or more other diagnostic groups, + // check these for a category as well. + const std::vector &Parents = DiagGroupParents.getParents(Group); + for (unsigned i = 0, e = Parents.size(); i != e; ++i) { + CatName = getCategoryFromDiagGroup(Parents[i], DiagGroupParents); + if (!CatName.empty()) return CatName; + } + return ""; +} + +/// getDiagnosticCategory - Return the category that the specified diagnostic +/// lives in. +static std::string getDiagnosticCategory(const Record *R, + DiagGroupParentMap &DiagGroupParents) { + // If the diagnostic itself has a category, get it. + std::string CatName = R->getValueAsString("CategoryName"); + if (!CatName.empty()) return CatName; + + DefInit *Group = dynamic_cast(R->getValueInit("Group")); + if (Group == 0) return ""; + + // Check the diagnostic's diag group for a category. 
+ return getCategoryFromDiagGroup(Group->getDef(), DiagGroupParents); +} + +namespace { + class DiagCategoryIDMap { + StringMap CategoryIDs; + std::vector CategoryStrings; + public: + DiagCategoryIDMap() { + DiagGroupParentMap ParentInfo; + + // The zero'th category is "". + CategoryStrings.push_back(""); + CategoryIDs[""] = 0; + + std::vector Diags = + Records.getAllDerivedDefinitions("Diagnostic"); + for (unsigned i = 0, e = Diags.size(); i != e; ++i) { + std::string Category = getDiagnosticCategory(Diags[i], ParentInfo); + if (Category.empty()) continue; // Skip diags with no category. + + unsigned &ID = CategoryIDs[Category]; + if (ID != 0) continue; // Already seen. + + ID = CategoryStrings.size(); + CategoryStrings.push_back(Category); + } + } + + unsigned getID(StringRef CategoryString) { + return CategoryIDs[CategoryString]; + } + + typedef std::vector::iterator iterator; + iterator begin() { return CategoryStrings.begin(); } + iterator end() { return CategoryStrings.end(); } + }; +} // end anonymous namespace. + + + //===----------------------------------------------------------------------===// // Warning Tables (.inc file) generation. //===----------------------------------------------------------------------===// void ClangDiagsDefsEmitter::run(raw_ostream &OS) { // Write the #if guard if (!Component.empty()) { std::string ComponentName = UppercaseString(Component); OS << "#ifdef " << ComponentName << "START\n"; OS << "__" << ComponentName << "START = DIAG_START_" << ComponentName << ",\n"; OS << "#undef " << ComponentName << "START\n"; OS << "#endif\n\n"; } const std::vector &Diags = Records.getAllDerivedDefinitions("Diagnostic"); + DiagCategoryIDMap CategoryIDs; + DiagGroupParentMap DGParentMap; + for (unsigned i = 0, e = Diags.size(); i != e; ++i) { const Record &R = *Diags[i]; // Filter by component. if (!Component.empty() && Component != R.getValueAsString("Component")) continue; OS << "DIAG(" << R.getName() << ", "; OS << R.getValueAsDef("Class")->getName(); OS << ", diag::" << R.getValueAsDef("DefaultMapping")->getName(); // Description string. OS << ", \""; OS.write_escaped(R.getValueAsString("Text")) << '"'; // Warning associated with the diagnostic. if (DefInit *DI = dynamic_cast(R.getValueInit("Group"))) { OS << ", \""; OS.write_escaped(DI->getDef()->getValueAsString("GroupName")) << '"'; } else { OS << ", 0"; } // SFINAE bit if (R.getValueAsBit("SFINAE")) OS << ", true"; else OS << ", false"; + + // Category number. + OS << ", " << CategoryIDs.getID(getDiagnosticCategory(&R, DGParentMap)); OS << ")\n"; } } //===----------------------------------------------------------------------===// // Warning Group Tables generation //===----------------------------------------------------------------------===// struct GroupInfo { std::vector DiagsInGroup; std::vector SubGroups; unsigned IDNo; }; void ClangDiagGroupsEmitter::run(raw_ostream &OS) { + // Compute a mapping from a DiagGroup to all of its parents. + DiagGroupParentMap DGParentMap; + // Invert the 1-[0/1] mapping of diags to group into a one to many mapping of // groups to diags in the group. 
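// ---------------------------------------------------------------------------
// [Editor's note] "1-[0/1]" above means each Diagnostic record names at most
// one DiagGroup; the map built below inverts those edges into
// group-name -> all-diags-in-group, then folds in DiagGroup records with no
// diagnostics so that empty groups still get table entries.
// ---------------------------------------------------------------------------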
std::map DiagsInGroup; std::vector Diags = Records.getAllDerivedDefinitions("Diagnostic"); for (unsigned i = 0, e = Diags.size(); i != e; ++i) { const Record *R = Diags[i]; DefInit *DI = dynamic_cast(R->getValueInit("Group")); if (DI == 0) continue; std::string GroupName = DI->getDef()->getValueAsString("GroupName"); DiagsInGroup[GroupName].DiagsInGroup.push_back(R); } // Add all DiagGroup's to the DiagsInGroup list to make sure we pick up empty // groups (these are warnings that GCC supports that clang never produces). - Diags = Records.getAllDerivedDefinitions("DiagGroup"); - for (unsigned i = 0, e = Diags.size(); i != e; ++i) { - Record *Group = Diags[i]; + std::vector DiagGroups + = Records.getAllDerivedDefinitions("DiagGroup"); + for (unsigned i = 0, e = DiagGroups.size(); i != e; ++i) { + Record *Group = DiagGroups[i]; GroupInfo &GI = DiagsInGroup[Group->getValueAsString("GroupName")]; std::vector SubGroups = Group->getValueAsListOfDefs("SubGroups"); for (unsigned j = 0, e = SubGroups.size(); j != e; ++j) GI.SubGroups.push_back(SubGroups[j]->getValueAsString("GroupName")); } // Assign unique ID numbers to the groups. unsigned IDNo = 0; for (std::map::iterator I = DiagsInGroup.begin(), E = DiagsInGroup.end(); I != E; ++I, ++IDNo) I->second.IDNo = IDNo; // Walk through the groups emitting an array for each diagnostic of the diags // that are mapped to. OS << "\n#ifdef GET_DIAG_ARRAYS\n"; unsigned MaxLen = 0; for (std::map::iterator I = DiagsInGroup.begin(), E = DiagsInGroup.end(); I != E; ++I) { MaxLen = std::max(MaxLen, (unsigned)I->first.size()); std::vector &V = I->second.DiagsInGroup; if (!V.empty()) { OS << "static const short DiagArray" << I->second.IDNo << "[] = { "; for (unsigned i = 0, e = V.size(); i != e; ++i) OS << "diag::" << V[i]->getName() << ", "; OS << "-1 };\n"; } const std::vector &SubGroups = I->second.SubGroups; if (!SubGroups.empty()) { OS << "static const char DiagSubGroup" << I->second.IDNo << "[] = { "; for (unsigned i = 0, e = SubGroups.size(); i != e; ++i) { std::map::iterator RI = DiagsInGroup.find(SubGroups[i]); assert(RI != DiagsInGroup.end() && "Referenced without existing?"); OS << RI->second.IDNo << ", "; } OS << "-1 };\n"; } } OS << "#endif // GET_DIAG_ARRAYS\n\n"; // Emit the table now. OS << "\n#ifdef GET_DIAG_TABLE\n"; for (std::map::iterator I = DiagsInGroup.begin(), E = DiagsInGroup.end(); I != E; ++I) { // Group option string. OS << " { \""; OS.write_escaped(I->first) << "\"," << std::string(MaxLen-I->first.size()+1, ' '); // Diagnostics in the group. if (I->second.DiagsInGroup.empty()) OS << "0, "; else OS << "DiagArray" << I->second.IDNo << ", "; // Subgroups. if (I->second.SubGroups.empty()) OS << 0; else OS << "DiagSubGroup" << I->second.IDNo; OS << " },\n"; } OS << "#endif // GET_DIAG_TABLE\n\n"; + + // Emit the category table next. + DiagCategoryIDMap CategoriesByID; + OS << "\n#ifdef GET_CATEGORY_TABLE\n"; + for (DiagCategoryIDMap::iterator I = CategoriesByID.begin(), + E = CategoriesByID.end(); I != E; ++I) + OS << "CATEGORY(\"" << *I << "\")\n"; + OS << "#endif // GET_CATEGORY_TABLE\n\n"; }
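// ---------------------------------------------------------------------------
// [Editor's illustration -- not part of the patch] The GET_DIAG_TABLE /
// GET_CATEGORY_TABLE sections emitted above are designed for X-macro style
// consumption: the client defines the macro, includes the generated file,
// then undefines it. A self-contained demonstration of the pattern (the
// category strings here are made-up examples; a real client would #include
// the tablegen output instead of listing entries inline):
namespace diag_sketch {
#define CATEGORY(X) X,
static const char *const CategoryNames[] = {
  CATEGORY("")               // index 0: diagnostics with no category
  CATEGORY("Example Issue")  // subsequent IDs in first-seen order
};
#undef CATEGORY
} // end namespace diag_sketch
// ---------------------------------------------------------------------------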