LLVM and pointers

Author
Posts
mikeakers
Participant
August 25, 2015 at 3:41 pm
Post count: 2
#1957 |
Maybe this is a question to ask on an LLVM mailing list, but…
I’m a little confused about how the “pointedness” of pointers is getting lost in LLVM. The Mill can’t be the only architecture that encodes extra data in the high bits of pointers, right? Apple’s 64-bit Objective-C runtime and the Xbox 360 come to mind as systems where this is done. Clearly LLVM can generate good code for 64-bit Objective-C… Am I missing something?
Ivan, great presentation as usual. I always learn something new, and see things a little differently after watching one of your videos.
- This topic was modified 8 years, 11 months ago by mikeakers.
Ivan Godard
Keymaster
August 25, 2015 at 5:48 pm
Post count: 689
#1959
Yes, others have had this problem too. We have a workaround that works for us on simple cases. Most of the trouble seemed to be in the back-end framework, so we have abandoned that (at unfortunate schedule cost) and are working directly from the IR produced by the middle-end. However, we are reasonably confident that there remain pointerhood-losing gotchas in passes that we haven’t exercised yet. We know that others are cleaning up the delinquent code, so it’s a race – do they get the holes filled before we fall into them?
Zoxc
Participant
May 1, 2017 at 8:06 pm
Post count: 7
#2829
Is preserving pointers required for C/C++ code, or is this only useful to catch overflows before they clobber the reserved bits? What would happen if the source code converts a pointer to an integer, does some arithmetic and converts it back to a pointer?
Ivan Godard
Keymaster
May 15, 2017 at 9:51 pm
Post count: 689
#2835
Sorry for the delay in response; several posts here got blackholed by my mailer somehow 🙁
To answer your question: It depends on whether the program is using bounded pointers or not.
Bounded pointers constrain address arithmetic to the rules of C and C++: you can only use pointer arithmetic to move within an allocation or to one past the end of an array. This catches wild addresses that wind up pointing to areas for which the program has valid permissions; segv checking via page table or the Mill’s PLB can’t catch such wild addresses. The effect of bounded pointers is similar to what tools like Purify or Valgrind do, but at hardware speeds and always on. Bounded pointers are the default, but you can override all or parts of legacy programs to let them do nasties to what they think is a flat address space.
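To make the arithmetic rule concrete, here is a minimal software sketch in C. It is only a model: the `bounded_ptr` struct, its fields, and `bp_add` are hypothetical names invented for illustration, and nothing here reflects how the Mill actually encodes or checks bounds in hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of a bounded pointer.
   This is NOT the Mill's hardware encoding. */
typedef struct {
    uintptr_t base;   /* first byte of the allocation */
    uintptr_t limit;  /* one past the last byte of the allocation */
    uintptr_t addr;   /* current address, base <= addr <= limit */
} bounded_ptr;

/* Move p by delta bytes. Returns false and leaves p unchanged if the
   result would fall outside [base, limit]; the one-past-the-end
   address (limit itself) is allowed, as in C and C++. */
static bool bp_add(bounded_ptr *p, ptrdiff_t delta)
{
    assert(p->base <= p->addr && p->addr <= p->limit);
    if (delta >= 0) {
        if ((uintptr_t)delta > p->limit - p->addr)
            return false;
        p->addr += (uintptr_t)delta;
    } else {
        /* Magnitude of a negative delta, computed without overflow. */
        uintptr_t back = (uintptr_t)(-(delta + 1)) + 1u;
        if (back > p->addr - p->base)
            return false;
        p->addr -= back;
    }
    return true;
}
```

In this model, a pointer at the start of a 16-byte allocation can be advanced by 16 bytes (landing one past the end), but advancing it one byte further is rejected.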
Bounds checking can only be done when the hardware knows that the value is a pointer, not an integer. So if you turn your pointer into an integer, diddle it, turn it back to a pointer, and dereference it, then the hardware will interpret its diddleship as an address. What happens then depends on the bit value, but is unlikely to be very useful. However, if you first convert the bounded pointer to an unbounded one before diddling it, and do not overflow, then the resulting address will act like the equivalent address on a conventional legacy machine that doesn’t have the Mill security/reliability features. Of course it is an unbounded pointer at that point; if you want it to be bounded again for further use then you have to tell it what allocation you want to bind it to. There are ops for all of that.
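For reference, the round trip in question looks roughly like this in C. This is a sketch only; `read_via_integer` is a hypothetical name. In standard C the result of doing arithmetic on a converted pointer is implementation-defined, but on a conventional flat-address machine it typically yields the expected element. The point of the explanation above is that the integer arithmetic in the middle is invisible to any pointer bounds check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: index an int array by going through an integer.
   The arithmetic is done on a plain integer, so nothing identifies the
   intermediate value as a pointer. */
static int read_via_integer(const int *arr, size_t index)
{
    assert(arr != NULL);
    uintptr_t bits = (uintptr_t)arr;      /* pointer -> integer */
    bits += index * sizeof(int);          /* integer arithmetic */
    const int *p = (const int *)bits;     /* integer -> pointer */
    return *p;                            /* dereference the result */
}
```

Written with ordinary pointer arithmetic (`arr[index]`), the same access keeps its pointer identity throughout, which is what a bounds check needs.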
However, the largest problem with LLVM’s conflation of pointer and integer is not pointer arithmetic; it is pointer comparison. It is quite possible for two pointers to refer to the same object but not be bitwise identical. Bounded pointers provide an example: one pointer may be bound to a larger allocation, while another is bound to a contained sub-object. Both may refer to the same byte address, but the bounds information is different. Integer equality doesn’t work for that.
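The comparison problem can be sketched with a small software model in C. Everything here is an assumption made for illustration: the `bounded_ptr` layout and the helpers `bp_make`, `bp_same_bits`, and `bp_same_address` are hypothetical and do not describe the Mill's real pointer format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: an address plus the bounds it is bound to. */
typedef struct {
    uintptr_t base;   /* start of the bound region */
    uintptr_t limit;  /* one past the end of the bound region */
    uintptr_t addr;   /* byte address referred to */
} bounded_ptr;

static bounded_ptr bp_make(uintptr_t base, uintptr_t limit, uintptr_t addr)
{
    assert(base <= addr && addr <= limit);
    bounded_ptr p = { base, limit, addr };
    return p;
}

/* What an integer-style compare of the whole value would see. */
static bool bp_same_bits(bounded_ptr a, bounded_ptr b)
{
    return a.base == b.base && a.limit == b.limit && a.addr == b.addr;
}

/* What a pointer compare should see: only the byte address. */
static bool bp_same_address(bounded_ptr a, bounded_ptr b)
{
    return a.addr == b.addr;
}
```

With one pointer bound to a whole 64-byte allocation and another bound to an 8-byte sub-object inside it, both at the same byte address, `bp_same_address` reports equality while `bp_same_bits` does not.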