Inlining functions vs. call/return

  • LarryP
    Participant
    Post count: 78
    #1201

    Greetings all,

    If I understand it right, nontrivial function calls (e.g. beyond int triv(int x) { return x; }) will take at least three cycles on a Mill: one for the call, at least one to do the function’s real work, and one for the return.

    While some functions are involved enough that a true function call is warranted or required, many functions’ real work may be doable in fewer than three cycles, especially on a wide Mill. This situation suggests that inlining functions may be important for getting the best performance on a Mill.

    Perhaps this situation (choosing whether or not to inline a particular function invocation) can make use of the previously-discussed ability of the compiler to supply alternative sequences. It seems to me that the choice of inlining a function on a Mill will depend substantially on (a) whether the function’s arguments will stay on the caller’s belt long enough for the inlined function to use them and (b) whether the inlined instructions result in too many additional spills/fills to be worth the trouble of inlining the function. My sense is that such inlining could be a real win, especially for leaf calls.
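The two conditions above can be written down as a toy decision rule. This is only a sketch: the struct, its fields, and the cost comparison are hypothetical names and assumptions made for illustration, not anything the Mill tool chain is known to implement.

```cpp
// Toy model of conditions (a) and (b). Every field and the comparison
// itself are hypothetical, not how any real Mill tool decides.
struct InlineCandidate {
    int belt_length;           // belt positions on the target member
    int drops_before_last_use; // results dropped before the body last reads an argument
    int extra_spill_fill_ops;  // spills/fills added to the caller by inlining
    int cycles_per_spill_fill; // assumed cost charged per added spill/fill
    int call_return_cycles;    // cycles saved by removing the call and return
};

// (a) An argument is still on the belt if fewer results than the belt
// holds have been dropped since the argument was produced.
constexpr bool args_stay_on_belt(const InlineCandidate& c) {
    return c.drops_before_last_use < c.belt_length;
}

// (b) Inlining pays off only if the added spill/fill cost is smaller
// than the call/return overhead it removes.
constexpr bool spills_are_affordable(const InlineCandidate& c) {
    return c.extra_spill_fill_ops * c.cycles_per_spill_fill < c.call_return_cycles;
}

// Inline only when both conditions hold.
constexpr bool should_inline(const InlineCandidate& c) {
    return args_stay_on_belt(c) && spills_are_affordable(c);
}
```

Under these assumptions a small leaf call with no added spills would be inlined, while a body that pushes its arguments off the belt would not.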

    Thoughts?

  • LarryP
    Participant
    Post count: 78

    For simple [1], non-iterative functions, such as getter/setter methods on objects, I suspect there can be substantial performance gains from inlining vs. calling them. By inlining such function invocations, most (if not all) of the function’s load operations can execute earlier (be raised within the calling EBB) than would likely happen if the function were to be called. So I suspect that inlining such simple functions will reduce the number of unused/no-op-ed slots executed, thus improving both speed and code density. In my experience, the sequence of:

    1. get (some property of an object),

    2. make a simple change,

    3. set the property to the changed value

    happens so often in object-oriented code that I think aggressive inlining of such sequences will make (compiled) OO-language code really fly on a Mill.
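As an illustration of that get/change/set sequence, here is a minimal C++ sketch. The class `Counter`, its accessors, and the helper functions are hypothetical names invented for this example; the second version shows what the sequence reduces to once both accessors are inlined.

```cpp
// Hypothetical class with the kind of trivial accessors discussed above.
class Counter {
public:
    int value() const { return value_; }   // getter: one load from a fixed offset
    void set_value(int v) { value_ = v; }  // setter: one store to a fixed offset
private:
    int value_ = 0;
};

// The get / simple change / set sequence, written with method calls.
inline void bump(Counter& c) {
    c.set_value(c.value() + 1);
}

// The same work with the accessors inlined by hand:
// one load, one add, one store, and no call or return.
struct RawCounter {
    int value = 0;
};

inline void bump_inlined(RawCounter& c) {
    c.value = c.value + 1;
}
```

Both functions do the same load, add, and store; only the first carries call and return overhead if the accessors are not inlined.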

    ---

    [1] By simple, here I mean a function whose real work could be done in fewer operations than the target Mill could do in three full cycles, if the load latency were minimal (result available in the next cycle after the load.) For most getter methods, the real work is just a single load (usually from a compile-time known offset) from a base address. Likewise, a setter method is often just a store (again, frequently with a compile-time-known offset) to an address.

    So an inlined version of a simple getter/setter function could take up merely a single load/store-capable slot in a much wider instruction. A fraction of a Mill instruction is faster and more compact than three full instructions (call, at least one instruction for the function body, return.) And IMHO, simple methods are called frequently in much object-oriented code.
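The definition of "simple" in the footnote and the three-cycle minimum for a call can be expressed as a small sketch. The `Target` struct and its `ops_per_cycle` field are hypothetical stand-ins for a Mill member's width, and the numbers used with them are illustrative only.

```cpp
// Hypothetical description of a target; ops_per_cycle stands in for
// how many operations one instruction can issue on that member.
struct Target {
    int ops_per_cycle;
};

// Minimum cycles for a real call, per the thread:
// one for the call, at least one for the body, one for the return.
constexpr int kMinCallCycles = 3;

// Cycles for a called function whose body takes body_cycles cycles.
constexpr int called_cycles(int body_cycles) {
    return 1 + body_cycles + 1;
}

// "Simple" as defined in the footnote: the body needs fewer operations
// than the target could issue in three full cycles.
constexpr bool is_simple(int body_ops, Target t) {
    return body_ops < kMinCallCycles * t.ops_per_cycle;
}
```

For example, with an assumed width of 8 operations per cycle, a one-load getter is simple by this test, while a 30-operation body is not.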

    • Will_Edwards
      Moderator
      Post count: 98

      Yes, we fully expect well-weighed inlining. The specializer has all the necessary data and can inline aggressively. The Mill’s metadata allows speculation so we can even inline many conditional calls.

      In many of the talks Ivan mentions in passing that we plan to do these kinds of optimization, but the references may be quite obscure.

      • Ivan Godard
        Keymaster
        Post count: 689

        Addendum to Will’s comment:

        By far the greatest value to inlining on the Mill is that it exposes opportunities for hoisting loads. As you point out, it is common that the first thing a method does is to load a data member. If the function is inlined then the load’s retire point can be left at the inline position, but the load itself can be turned into a deferred load, hoisted enough to hide cache latency, with the deferred retire preserving proper retire ordering. Without the inline, the very first thing that the function will do is to stall for a cache miss.
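A source-level picture of the hoisting described above, as a rough sketch only. C++ cannot express a Mill deferred load, so statement order here merely suggests where the load could be issued; `Node`, `scaled_weight`, and both callers are hypothetical names, and a real compiler is free to reorder either version.

```cpp
// Hypothetical data structure and method-like function.
struct Node {
    int weight;
};

// Out-of-line version: the load of n->weight is the first thing the
// callee does, so its latency cannot overlap the caller's earlier work.
int scaled_weight(const Node* n, int k) {
    return n->weight * k;
}

int caller_called(const Node* n, int k, int a, int b) {
    int other = a * b + a;               // caller's independent work
    return other + scaled_weight(n, k);  // load starts only inside the call
}

// Inlined version: the load can be issued before the independent work,
// while its result is still consumed at the original call position.
int caller_inlined(const Node* n, int k, int a, int b) {
    int w = n->weight;      // load issued early (hoisted)
    int other = a * b + a;  // independent work overlaps the load latency
    return other + w * k;   // value used where the call used to be
}
```

Both callers compute the same result; the difference is only in how much independent work can sit between issuing the load and needing its value.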
