Forum Replies Created
Some questions:
This wouldn’t be a Mill architecture, but I wonder if there would be a problem with adding more phases and having several otherwise identical functional units do their work in different phases. For example: read, pick1, op1, call, op2, pick2, write. I suppose that would impose limits on how quickly you could schedule instructions back to back, but it would make decoding easier for a given width. Are there other important benefits and limitations I’m not thinking of? How much of the selection of the current phasing setup was a result of the way the hardware works, and how much came from profiling actual code to see what would be useful in practice?
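Just to make the question concrete, here is a toy Haskell sketch of the hypothetical seven-phase ordering I mean. The phase names, op names, and phase assignments are all made up for illustration; this is not the Mill’s actual phasing.

```haskell
import Data.List (sortOn)

-- Hypothetical extended phasing: read / pick1 / op1 / call / op2 / pick2 / write.
data Phase = PhRead | PhPick1 | PhOp1 | PhCall | PhOp2 | PhPick2 | PhWrite
  deriving (Show, Eq, Ord, Enum, Bounded)

-- Illustrative ops; under this scheme two otherwise identical ALUs could sit
-- in the op1 and op2 phases, so one instruction could feed both, with the
-- op2 result simply retiring a phase later.
data Op = Con Int | Pick | Add1 | Call | Mul2 | PickLate | Store
  deriving (Show, Eq)

phaseOf :: Op -> Phase
phaseOf (Con _)  = PhRead
phaseOf Pick     = PhPick1
phaseOf Add1     = PhOp1
phaseOf Call     = PhCall
phaseOf Mul2     = PhOp2
phaseOf PickLate = PhPick2
phaseOf Store    = PhWrite

-- The issue order of the ops packed into one wide instruction.
issueOrder :: [Op] -> [Op]
issueOrder = sortOn phaseOf
```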
With regard to calls, how many cycles of overhead do they normally take?
What happens if you have an FMA instruction but fail to also issue an Args? Or is it that the existence of an Args in an instruction turns the multiply operation into an FMA operation?
Am I right in guessing that which operations are connected to which in ganging is a result of which slots in the instruction the ops occupy?
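Again, just to pin down what I’m asking: a toy Haskell sketch of slot-adjacency ganging, where an FMA in slot n is assumed to pick up extra operands from an Args in slot n+1. The decode rules here are guesses for illustration, not the actual Mill encoding.

```haskell
-- Toy decode of ganged slots. What really happens when the Args is missing
-- (fault? degrade to a plain multiply?) is exactly what the question asks.
data SlotOp = Mul | Fma | Args [Int] | Other
  deriving (Show, Eq)

data Decoded = PlainMul | FusedFma [Int] | DecodeFault
  deriving (Show, Eq)

decodeSlots :: [SlotOp] -> [Decoded]
decodeSlots (Fma : Args xs : rest) = FusedFma xs : decodeSlots rest
decodeSlots (Fma : rest)           = DecodeFault : decodeSlots rest
decodeSlots (Mul : rest)           = PlainMul    : decodeSlots rest
decodeSlots (_   : rest)           = decodeSlots rest
decodeSlots []                     = []
```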
So, the Mill has memory accesses that are the direct result of executing instructions, but also implicit loads and stores that happen as a result of function calls, as the belt and outstanding memory accesses overspill their buffers and are stored in the general memory hierarchy. I think this would be infeasible on an architecture where any load or store might fault immediately, but on the Mill it’ll just make its way through the memory hierarchy the same way anything else would, and most of the interesting things will happen at the TLB stage, in the same manner and with the same level of decoupling as any other read or write.
How do you handle memory protection and synchronization with these implicit operations? Just the same as with normal operations, possibly with a duplicated PLB and with the addresses broadcast to the retire stations?
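Here is a minimal sketch of the mental model behind the question: an implicit spill becomes an ordinary store request that carries an address and goes through the same protection check as an explicit store. The types and the stand-in "PLB" lookup are invented for illustration, not anything from the Mill docs.

```haskell
-- Toy model: an implicit belt/scratch spill is just another store request,
-- so it hits the same protection check as an explicit store would.
data Access = ReadAcc | WriteAcc deriving (Show, Eq)

data Region = Region { regBase :: Int, regLimit :: Int, regWriteOk :: Bool }

type PLB = [Region]

data StoreReq = ExplicitStore { addr :: Int, val :: Int }
              | ImplicitSpill { addr :: Int, val :: Int }   -- spilled belt entry
  deriving Show

checkPLB :: PLB -> Int -> Access -> Bool
checkPLB plb a acc =
  any (\r -> a >= regBase r && a < regLimit r
          && (acc == ReadAcc || regWriteOk r)) plb

-- Both kinds of request go through the same check; only the source differs.
permitted :: PLB -> StoreReq -> Bool
permitted plb req = checkPLB plb (addr req) WriteAcc
```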
- in reply to: Instruction Encoding #577
Is the Mill ISA non-self-synchronizing? That is, is it possible that you could jump to memory address foo and get one valid stream of execution, but also be able to jump to address foo+1 and get an alternative but equally valid stream of execution? I imagine it works this way in the Mill, as it does with x86.
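To illustrate the property I’m asking about, here is a toy variable-length decoder in Haskell: the same byte string decoded from offset 0 and from offset 1 yields two different but individually valid instruction streams. The encoding is invented purely to show what "non-self-synchronizing" means.

```haskell
import Data.Word (Word8)

-- Invented encoding: 0x00-0x7f are one-byte ops, 0x80-0xff take one operand byte.
data Insn = Short Word8 | Long Word8 Word8 deriving Show

decode :: [Word8] -> [Insn]
decode [] = []
decode (b : bs)
  | b < 0x80  = Short b : decode bs
  | otherwise = case bs of
      (o : rest) -> Long b o : decode rest
      []         -> []

-- The same bytes give a valid stream from foo and a different valid stream
-- from foo+1 -- exactly the situation described above.
bytes :: [Word8]
bytes = [0x90, 0x05, 0x33, 0x81, 0x02]

streamAtFoo, streamAtFooPlus1 :: [Insn]
streamAtFoo      = decode bytes        -- Long 0x90 0x05, Short 0x33, Long 0x81 0x02
streamAtFooPlus1 = decode (tail bytes) -- Short 0x05, Short 0x33, Long 0x81 0x02
```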
How do operations that have multiple (always 2?) return values interact with the output registers for the pipelines? Does the extra result get bumped to a higher-latency result register? Do you use the extra buffer registers that were mentioned? Or are there special pipelines for dual-result operations that have pairs of registers?
How is the metadata preserved when the belts are spilled to memory? Is there generally an extra byte for each belt entry, or is there some sort of coalescing that happens?
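To make the "extra byte per entry" guess concrete, here is a toy Haskell serializer that spills each belt entry as one metadata byte followed by its payload. Whether the real hardware stores a byte, packs bits, or coalesces tags is exactly what the question asks, so this layout is purely illustrative.

```haskell
import Data.Word (Word8)

-- Toy spill format: one metadata byte (kind + width) per belt entry,
-- followed by the payload bytes.
data Meta = ValidMeta | NoneMeta | NaRMeta deriving (Show, Eq, Enum)

data BeltEntry = BeltEntry { meta :: Meta, payload :: [Word8] } deriving Show

spillEntry :: BeltEntry -> [Word8]
spillEntry (BeltEntry m p) =
  let tag   = fromIntegral (fromEnum m) :: Word8
      width = fromIntegral (length p)   :: Word8
  in (tag * 16 + width) : p      -- high nibble: kind, low nibble: width

spillBelt :: [BeltEntry] -> [Word8]
spillBelt = concatMap spillEntry
```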
Also, is it just me, or do the values on the belt with their metadata function as monads? In fact, it looks just like Maybe in Haskell, except that instead of being
Maybe a = Just a | Nothing
instead it’s
Belt a = Just a | Nothing | Error
and binding gives Error precedence over Nothing. When I was watching the talk I thought to myself “Oh, it’s just like Haskell,” in the same way the Belt talk made me think “Oh, it’s just like Static Single Assignment form.”
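Spelling the analogy out in (entirely unofficial) Haskell: a Belt value with None and Error variants, and combination that lets Error win over None. This is just a restatement of the comparison above, not anything taken from the Mill documentation.

```haskell
-- Maybe plus an extra Error (NaR-like) case, with Error propagating in
-- preference to None when values are combined.
data Belt a = Value a | None | Error
  deriving (Show, Eq)

instance Functor Belt where
  fmap f (Value a) = Value (f a)
  fmap _ None      = None
  fmap _ Error     = Error

instance Applicative Belt where
  pure = Value
  Value f <*> Value a = Value (f a)
  Error   <*> _       = Error
  _       <*> Error   = Error   -- Error takes precedence over None
  _       <*> _       = None

instance Monad Belt where
  Value a >>= f = f a
  Error   >>= _ = Error
  None    >>= _ = None

-- e.g. (Value 2 >>= \_ -> Error) == Error, while speculating past a None
-- just keeps producing None, exactly like Nothing in Maybe.
```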
- in reply to: Prediction #542
I was re-watching the prediction video this morning. It seems like there are some serious advantages to the EBB-chaining prediction approach, but one drawback is that you completely lose the ability to make predictions about secondary or tertiary exits from an EBB, such as you might find in a loop within a loop. It seems like this makes the Mill better at branchy or unstructured code at the expense of making it worse at the sort of code DSPs would typically excel at. Am I missing something?
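A toy model of the concern: if the predictor stores exactly one predicted exit per EBB (keyed by entry address), then an EBB whose inner and outer back-edges are both taken in practice can only have one of them remembered. The types here are invented to illustrate the question, not the actual exit-table format.

```haskell
import qualified Data.Map as M

-- Toy exit-chain predictor: one predicted exit per EBB entry address.
type Addr = Int

data Prediction = Prediction { targetAddr :: Addr, instrCount :: Int }
  deriving Show

type ExitTable = M.Map Addr Prediction

predict :: ExitTable -> Addr -> Maybe Prediction
predict = flip M.lookup

-- Training on a mispredict simply overwrites the single stored exit, so
-- alternating exits (inner vs. outer loop) keep evicting each other.
train :: Addr -> Prediction -> ExitTable -> ExitTable
train = M.insert
```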
- in reply to: Instruction Encoding #579
Finding these isn’t necessarily easy, but the worry is that they allow certain classes of security vulnerabilities, like so:
http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html
But again, this is at worst an area where the Mill fails to improve on x86.