Mill Computing, Inc. Forums The Mill Architecture LLVM and pointers Reply To: LLVM and pointers

Ivan Godard
Keymaster
Post count: 689

Sorry for the delay in response; several posts here got blackholed by my mailer somehow 🙁

To answer your question: It depends on whether the program is using bounded pointers or not.

Bounded pointers constrain address arithmetic to the rules of C and C++: you can only use pointer arithmetic to move within an allocation or to one past the end of an array. This catches wild addresses that wind up pointing to areas for which the program has valid permissions; segv checking via page table or the Mill’s PLB can’t catch such wild addresses. The effect of bounded pointers is similar to what tools like Purify or valgrind do, but at hardware speeds and always on. Bounded pointers are the default, but you can override all or parts of legacy programs to let them do nasties to what they think is a flat address.

Bounds checking can only be done when the hardware knows that the value is a pointer, not an integer. So if you turn your pointer into an integer, diddle it, turn it back to a pointer, and dereference it, then the hardware will interpret its diddleship as an address. What happens then depends on the bit value, but is unlikely to be very useful. However, if you first convert the bounded pointer to an unbounded one, before diddling it, and do not overflow, then the resulting address will act like the equivalent address on a conventional legacy machine that doesn’t have the Mill security/reliability features. Of course it is an unbound pointer at that point; if you want it be again bounded for further use then you have to tell it what allocation you want to bind it to. There are ops for all of that.

However, the largest problem with LLVM’s conflation of pointer and integer is not pointer arithmetic, it is pointer comparisons. It is quite possible for two pointers to refer to the same object but not be bitwise identical. Bounded pointers provide an example: one pointer may be bound to a larger allocation, while another is bound to a contained sub-object. Both may refer to the same byte address, but the bounds information is different. Integer equal doesn’t work for that.