alloca (dynamic stack allocation) on Mill

Tagged: alloca, calls, scratchpad, stack

Author
Posts
Witold Baryluk
Participant
January 8, 2021 at 7:17 am
Post count: 33
#3659 |
Hi,
I was wondering how Mill with most of the function local data not on real stack, would approach implementing C alloca, or C99 VLA (Variable Length Arrays).
For example:
```
#include <stdlib.h>

struct S {
    int a;
    float b;
};

void enumerate(int* count, struct S* data1, struct S* data2);

int example(int k) {
    int count = 0;
    enumerate(&count, NULL, NULL);
    struct S* data1 = alloca(count * sizeof(struct S));
    struct S* data2 = alloca(count * sizeof(struct S));
    enumerate(&count, data1, data2);
    int ret = 0;
    for (int i = 0; i < count; i++) {
        ret += (data1[i].a + k) * (data2[i].a - k);
    }
    return ret;
}
```
I guess, it might depend on calling conventions, but I don’t expect for every function to have real stack, as this could be slightly deterimental to performance. Or would having functions that use zero stack, be fine, and just not advance the in-memory stack?
Or some other tricks to do it? Maybe for small dynamic stack allocations a scratchpad can be used somehow dynamically?
Ivan Godard
Keymaster
January 9, 2021 at 1:26 am
Post count: 689
#3663
There is an alloca operation in the Mill ISA, used for both alloca() and VLAs. It changes the internal specRegs to dynamically allocate space in the current stacklet, which space implicitly deallocates on function return. There are some issues that must be addressed by the implementation:
what happens when there is more than one alloca in an instruction bundle?
The order of effect is not defined, nor is the order between bundles in the same EBB. The only way the order could make a difference is if the program compared the addresses returned by the two alloca invokes, which comparison is illegal per the C and C++ standards.
What happens if the requested allocation does not fit in the current partially occupied stacklet?
This is handled the same way that stacklet overflow for ordinary locals is handled: the hardware (directly or via a trap, depending on the member implementation) dynamically allocates a new stacklet in the address space, allocates the desired range, makes it visible in the PLB to the current turf, and arranges that the stack cutback at return will reverse the allocation. There is some magic that prevent excessive cost if the allocation repeatedly crosses a stacklet with an iterative allocate/deallocate sequence; NYF.
What happens if the allocation is too large to fit in an empty standard stacklet?
There has been some sentiment in favor of simply banning (and faulting) this usage; after all, other systems do have upper bounds on the size of alloca’s. However, currently the ISA defines that this condition will be supported by the hardware, typically via traps to system software for both allocation and deallocation. The upper bound is defined by the application’s memory quantum, as with other allocations of raw address space like mmap().
- Witold Baryluk
  Participant
  January 9, 2021 at 4:51 am
  Post count: 33
  #3665
  Hi Ivan.
  Thank you for quick and detailed elaboration!
  That is really fascinating, and interesting approach. Definitively ultimately better, but harder to implement. Having it in abstract ISA, and then defer to software emulation in cAsm or via traps, as you said is definitively interesting and better long term option. I do agree with all what you have said really, but I don’t know how common are these things in real code to determine if it is worth doing as optimal as possible. But it looks you are pretty close, and by having this in ISA, you actual open many interesting new options in language and compiler design.
  The NYF part is definitively interesting to learn one day. I have trouble imagining how one would implement this in this scheme, but I didn’t think long enough about it yet 😀
  Cheers.
  PS. I hope you and the team are doing fine, and there are no major roadblocks beyond time. Hope to hear some updates or talks in 2021.
Findecanor
Participant
September 3, 2023 at 2:35 pm
Post count: 34
#3954
Are there still saveStack and restoreStack instructions as in the old Wiki, so that you could do
saveStack(); allocStack(…); … ; restoreStack() ?
Do they have the same semantics as LLVM’s alloca, llvm.stacksave() and llvm.stackrestore() ?
And will the memory be reused and reset to zero if I do a new allocStack() after a restoreStack()?
- This reply was modified 10 months, 3 weeks ago by Findecanor.
Author
Posts