Creating an LLVM backend for the Mill (Mill Computing, Inc. Forums)

  • mseaborn
    Participant
    Post count: 3

    I am curious whether you are going to try to adapt LLVM’s SelectionDAG/TableGen-based backend to target the Mill, or whether you’re going to write an LLVM backend from scratch.

    I imagine the latter would be easier. I expect it would be fairly simple to convert LLVM IR to your abstract-Mill load-module format. I expect that conversion wouldn’t benefit much from LLVM’s SelectionDAG/TableGen infrastructure.

    I’m not very familiar with the SelectionDAG/TableGen infrastructure, but I hear it’s somewhat difficult to understand, and even people who’ve added a new target architecture to it say they don’t really understand it. :-) I suspect it might make assumptions that aren’t suitable for the Mill, such as assuming a register machine.

    LLVM’s backend normally performs some lowering, such as legalising integer types and expanding GEPs, as part of creating a SelectionDAG. If you do choose to create an LLVM backend from scratch, you might find PNaCl’s LLVM IR simplification passes useful for doing that lowering.

    These passes lower complex IR features to simpler IR features. Some examples are:

    * ExpandVarArgs and ExpandByVal lower varargs and by-value argument passing respectively.

    * ExpandStructRegs splits up struct values into scalars, removing the “insertvalue” and “extractvalue” instructions.

    * PromoteIntegers legalizes integer types. For example, i30 is converted to i32.
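    As a rough illustration of the kind of rewrite PromoteIntegers performs, here is a minimal Python sketch. It is not PNaCl's code (the real pass is a C++ LLVM pass operating on IR). It assumes the legal widths are 1, 8, 16, 32 and 64 bits, and the names `LEGAL_WIDTHS`, `promote_width` and `add_narrow` are hypothetical. Arithmetic on the narrow type is emulated in the wider container by masking off the extra high bits:

```python
# Assumed set of legal integer widths, chosen for illustration only.
LEGAL_WIDTHS = (1, 8, 16, 32, 64)


def promote_width(bits: int) -> int:
    """Return the smallest assumed-legal width that can hold `bits` bits."""
    if bits < 1:
        raise ValueError("bit width must be positive")
    for width in LEGAL_WIDTHS:
        if bits <= width:
            return width
    raise ValueError(f"i{bits} is wider than any legal type in this sketch")


def add_narrow(a: int, b: int, bits: int) -> int:
    """Wrapping unsigned add of two `bits`-wide values held in a wider container.

    The container may carry garbage above bit `bits - 1`, so the result is
    masked back down to the original width.
    """
    mask = (1 << bits) - 1
    return (a + b) & mask


if __name__ == "__main__":
    print(promote_width(30))                 # i30 is carried in an i32
    print(add_narrow((1 << 30) - 1, 1, 30))  # wraps around as an i30 would
```

    The masking step is the part a backend would otherwise have to handle itself: once every value lives in a legal-width container, only the operations that can observe the high bits need extra fix-up.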

    We created them so that the more-complex LLVM IR features wouldn’t need to be part of PNaCl’s long-term-stable bitcode format. Emscripten, which compiles LLVM IR to JavaScript, is currently using them too (see http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-March/070845.html).

    You can find the code here:

    https://chromium.googlesource.com/native_client/pnacl-llvm/+/master/lib/Transforms/NaCl/

  • Ivan Godard
    Keymaster
    Post count: 689

    The current LLVM port effort does not use the LLVM back end as such, nor does it accept the LLVM middle-to-backend IR, which discards information that the specializer needs. The replacement for that IR is a serialization of the middle-end dataflow graph structure, or at least we think so; work is still too early to be sure.

    The intent is that the input to the specializer is of a form that permits a very fast and simple specialize step. Operation selection has been done in the compiler, using an abstract Mill with all operations supported.

    We also will be adding a few passes to the middle-end, primarily to do operation selection. It’s good news if you have done some of these passes already. Type regularizing is certainly applicable. It’s not clear to me whether VARARGS can be handled in the compiler for us, because until we know what the belt length is (at specialize time) we don’t know how many can be passed in the belt. Of course, we might ignore that and pass all VARARGS on the stack; it’s not like performance matters much for that.
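    To make the VARARGS point concrete, here is a minimal Python sketch of why the split cannot be fixed at compile time. It does not describe the actual Mill calling convention; `split_varargs`, `all_on_stack` and the `belt_slots` parameter are hypothetical names, and the slot counts used are illustrative only:

```python
def split_varargs(args, belt_slots):
    """Return (belt_args, stack_args) given how many belt slots are available.

    `belt_slots` is only known at specialize time, so the same call site
    produces a different split on different family members.
    """
    if belt_slots < 0:
        raise ValueError("belt_slots must be non-negative")
    return list(args[:belt_slots]), list(args[belt_slots:])


def all_on_stack(args):
    """The simpler alternative: ignore the belt and spill every argument."""
    return [], list(args)


if __name__ == "__main__":
    call_args = list(range(10))
    print(split_varargs(call_args, 8))   # two arguments overflow to the stack
    print(split_varargs(call_args, 16))  # everything fits on the belt
    print(all_on_stack(call_args))       # target-independent, but slower
```

    The second function is the option that lets the compiler finish the job without knowing the target, at the cost of memory traffic on every variadic call.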

    Passing large arguments by value is an interesting problem for us because the Mill call may cross a protection boundary. Such arguments must be addressable by both the caller (to pass them) and the callee (to use them), and there are semantic and security issues involved. For example, can the caller pass a struct, put a pointer to the passed struct in a global variable, and make the call, so that another thread of the same process modifies the passed struct while the callee is working on it? Such things are hard to get right; it is much easier to ignore such problems and say that exploits are an application problem, but that’s not Mill.
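    The hazard described above is a check-then-use race. The following Python sketch shows the shape of the problem and of one possible mitigation (a private copy); it says nothing about how the Mill actually handles it. The function names are hypothetical, and the `interfere` callback stands in for a second thread that happens to run between the callee's check and its use:

```python
import copy


def callee_by_reference(arg, interfere):
    """Callee works directly on memory the caller can still reach."""
    checked = arg["length"] <= 16   # validate the argument
    interfere()                     # another thread runs here
    return checked, arg["length"]   # the value used may not be the one checked


def callee_by_copy(arg, interfere):
    """Callee takes a private snapshot before validating."""
    private = copy.deepcopy(arg)
    checked = private["length"] <= 16
    interfere()                     # caller-side changes no longer matter
    return checked, private["length"]


if __name__ == "__main__":
    shared = {"length": 8}

    def attacker():
        # Modifies the struct through the pointer the caller kept.
        shared["length"] = 1_000_000

    print(callee_by_reference(shared, attacker))
    shared["length"] = 8
    print(callee_by_copy(shared, attacker))
```

    With the by-reference version the check passes but the callee then uses the modified length; with the copy, the callee sees a consistent value at the cost of copying the argument.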

    It sounds like your ExpandStructRegs converts structs into tuples. That too is something we want to do, in part because we have to use tuples to be able to support Mill multi-result call operations. That said, we are very tempted to add pass-by-copy to C/C++ as an extension as part of our work; IN, OUT, and INOUT parameters are a much more natural way to express multiple results than tuples, IMO.

    If you and your team are Bay Area, I’d be happy to buy you a coffee and gain what we can from your experience with LLVM. Likewise any other Forum participant with LLVM internals experience who would like to help. You can reach me at ivan@millComputing.com.
