I very much agree that there is a real need for better support for call-unwinding on exception conditions. We actually had a facility several years ago that was somewhat like what you suggest. However, there were problems.
One problem was that such a convention requires the callee to know details about the caller’s expected protocol, breaking function isolation. Alternatively, we could define a kind of universal protocol, but that would have costs for calls that didn’t need the full generality. There were encoding issues too – remember that a single Mill instruction can contain several calls, and with each call already needing to represent the target address and the argument list, adding another whole address did not fit well. However, what finally caused us to drop the idea was the realization that we didn’t need a call variant anyway.
The chosen resolution takes advantage of the ability of Mill calls to return more than one result, cheaply. That is, your myFunction is directly expressible in Mill hardware, so the only cost of the semantics is the branch that tests the error condition. With phasing (you did see the Execution talk?) the branch can be in the same instruction as the call and still see the result of the call as its predicate. The branch operation naturally carries the error-handling address that would have to be present in the fancy call operation anyway, but does not require any encoding contortions or special call semantics. So making the error handling explicit puts it in the caller (where it belongs on isolation grounds) and has zero latency cost and no power or encoding cost beyond what having the callee do the branch would need anyway.
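To make the shape of this concrete, here is a rough source-level sketch in Python rather than Mill code. The names `my_function` and `caller`, the doubling, and the "negative input is an error" rule are all invented for illustration; only the structure matters: the callee hands back a value and an error flag as two separate results, and the caller does the test and picks the continuation.

```python
from typing import Tuple


def my_function(x: int) -> Tuple[int, bool]:
    """Hypothetical callee: returns (result, failed) as two separate results."""
    if x < 0:
        # Signal the error condition; the value slot is just a placeholder.
        return 0, True
    return x * 2, False


def caller(x: int) -> str:
    # The call and the test sit side by side in the caller. On a Mill the
    # corresponding branch could share an instruction with the call.
    result, failed = my_function(x)
    if failed:
        # Error continuation: chosen by the caller, unknown to the callee.
        return "error path"
    return f"ok: {result}"


if __name__ == "__main__":
    print(caller(3))
    print(caller(-1))
```

Note that the callee never sees the address of the error handler; it only reports what happened.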
This also gets us out of the protocol business; protocols should be app- or language-level, not dictated by the hardware. For example, a caller/callee could agree to have several possible post-call continuations reflecting various detected conditions and represented by a returned enum that the caller switches on.
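A caller/callee agreement of that kind might look like the following Python sketch. The `Outcome` enum, the `lookup` function and the three conditions are assumptions made up for the example, not anything defined by the Mill; the point is only that the set of continuations is a private matter between the two pieces of software.

```python
from enum import Enum, auto
from typing import Dict, Tuple


class Outcome(Enum):
    """Hypothetical set of conditions the callee can report."""
    OK = auto()
    NOT_FOUND = auto()
    DENIED = auto()


def lookup(table: Dict[str, int], key: str, allowed: bool) -> Tuple[Outcome, int]:
    """Hypothetical callee: returns an outcome code alongside the value."""
    if not allowed:
        return Outcome.DENIED, 0
    if key not in table:
        return Outcome.NOT_FOUND, 0
    return Outcome.OK, table[key]


def use_lookup(table: Dict[str, int], key: str, allowed: bool) -> str:
    outcome, value = lookup(table, key, allowed)
    # The caller switches on the returned enum; each arm is one of the
    # post-call continuations the two sides agreed on.
    if outcome is Outcome.OK:
        return f"value {value}"
    elif outcome is Outcome.NOT_FOUND:
        return "use default"
    else:
        return "report denial"
```

Adding a fourth condition later changes only this caller/callee pair, which is the sense in which the protocol stays at the app or language level.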
You separately mention the possibility of passing arguments as addresses after the call operation. Besides such an approach not being encodable on a Mill, it also does not work on any modern Harvard-architecture CPU (Harvard has separate i- and d-caches with unique datapaths for each). In general, data accesses to code would have to be satisfied from below the join point of the hierarchy, which is a painful 10+ cycles away from the CPU. In contrast, having the caller use a LEA operation to drop a pointer onto the belt is a one-cycle operation in the caller, and will almost certainly be overlapped with other operations in the Mill’s wide issue. When the callee dereferences the pointer (if it needs to) then it is quite likely to find the data in the d$1 cache, the latency of which can be hidden using the Mill deferred load facility.
There’s another issue that will be touched on in the upcoming talk on Security and Reliability: the caller and callee may not be in the same security domain (called a Turf in Mill-speak), so the callee may not have rights to access the code of the caller in the first place, even if the address is to a data location that it does have rights to.
We are a long way from the Z80 🙂