Forum Replies Created
- AuthorPosts
- in reply to: Scratchpad design decision #3476
Blushes 🙂
Thanks again for your clear answer.
I am looking forward to more videos on the architecture, as it contains many clever solutions to typical problems. It sometimes blows me away, especially the code stream running in two directions with its associated split caches, and the concept of NaR, which is very powerful. But it doesn’t stop there… it is pure innovation all around, and I like that a lot! I hope to see it in action one day!
- in reply to: Scratchpad design decision #3464
Thanks for your quick reply.
It seems that after I edited my previous response a few times to correct spelling and improve wording, it vanished. This is probably some automated action. With this response I want to let you know it exists, and to focus my question a bit more.
After looking at the instruction set, I realized that what I suggested in my response is essentially what is currently the “rescue” operation :).
It seems my line of thought regarding addressing is converging with what is already happening. Now I wonder why the “rescue” operation is not made to operate on a larger belt than all other operations. Is a larger belt very hardware intensive even if only rescue can operate on the oldest half (or even just +8 or +16 positions)?
- in reply to: Scratchpad design decision #3462
Thank you for your swift reply.
You were very close to the mark on addressing what I meant to ask (except for the waterfall part).
My primary points were handling multiple values at once and sector/page/segment/block (give it a name) addressing, so that the encoding can stay compact. It assumes there will be address locality to exploit, but this is just an assumption on my part. It mainly came up when there was talk about the size of the scratchpad. A bigger scratchpad is harder to encode, and any locality could be exploited to lessen that downside of increased size.
As a slight variation on the multiple-values-in-one-go theme, would a “touch”-like operation that specifies multiple belt positions (maybe just in the last N positions) that are needed again soon help in any way? Say the belt has 32 positions and the compiler knows that some of the last 8 values will be needed again shortly; explicitly copying them as fresh belt values might be more compact than explicitly saving/restoring them elsewhere. It would be compiler-controlled removal of “junk” from the belt by duplicating good values as new. Conceptually, a belt could be increased in size with only this operation being able to operate on the extended part and nowhere else. It might even allow for a smaller directly addressed belt in the process, saving bits in every operand encoding. I probably should read up on the existing instructions to understand some more; I am currently shooting a bit from the hip here.
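To make the “touch” idea a bit more concrete, here is a minimal toy sketch of what I have in mind. Everything in it is my own assumption: the `Belt` class, the `drop` and `touch` names, and the bit-mask argument are hypothetical and not taken from the real instruction set. The belt is modeled as a fixed-length queue where position 0 is the newest value.

```python
from collections import deque


class Belt:
    """Toy belt model: position 0 is the newest value, the oldest falls off."""

    def __init__(self, length):
        # A deque with maxlen discards from the right when pushed on the left.
        self.slots = deque(maxlen=length)

    def drop(self, value):
        """Push a new value at the front of the belt."""
        self.slots.appendleft(value)

    def touch(self, mask):
        """Hypothetical op: re-drop copies of every position whose mask bit is set."""
        picked = [self.slots[i] for i in range(len(self.slots)) if (mask >> i) & 1]
        # Drop the oldest picked value first so the relative order is preserved.
        for value in reversed(picked):
            self.drop(value)


belt = Belt(8)
for v in range(8):
    belt.drop(v)              # belt is now 7, 6, 5, 4, 3, 2, 1, 0 (newest first)

belt.touch(0b10100000)        # keep the values at positions 5 and 7 alive
print(list(belt.slots))       # [2, 0, 7, 6, 5, 4, 3, 2]
```

On a 32-position belt each ordinary operand reference needs 5 bits, while a mask over the 8 oldest positions would cost 8 bits no matter how many of them are selected, so in this toy accounting the mask starts to pay off from two values upward.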
Also, never having worked with an architecture that exposed a scratchpad to me, I am wondering how it is typically used. If it is for constants that are needed a few times during execution of code, I imagine normal memory addressing and the cache system would work just fine. Is there a typical cutoff point where the scratchpad starts to benefit, and at what sizes?
- in reply to: Scratchpad design decision #3458
First, I am sure I do not fully understand the architecture (but I like it as far as I do).
Conceptually I see the belt as an array, while I know technically it's really not.
And I see the scratchpad also conceptually as an array of some sort.
When it comes to numbers falling off and later recovering them, I can imagine them landing in some other space from which they can be recovered (or not). With memory as the final backstop, all fine so far.
Now, I did read a comment here that recovering/addressing values would take so many bits to encode, and I was wondering why that would need to be. Sure, more positions and selective recovery/fetching would indeed cost bits, but I thought that was not the most compact method of encoding. With that background, some ideas crossed my mind and I would like to hear your opinion on them.
Idea one:
Specify a “sector” + “some offset bit-mask” to recover/select multiple values that are near each other (logically).
Idea two (more latency):
Specify one or several “big” sectors that have values that need recovering soon.
Then use the prior idea to pick specific values within.
Sectors:
Sectors could be static relative to something else, and instead of a small bit-mask, a single offset could be used for fetching single values. Both methods assume values clustered/produced together will likely be needed around the same time. If correct, this could simplify the encoding problem. A compiler could also group constants together based on use, supporting compact addressing schemes.
It has some similarities with the old Segment + Offset kind of addressing that x86 processors used back in the day, and with near pointers in C. If it's for selecting values near each other, or for accessing some part of a conceptual array where part of the address is constant, it will be very compact.
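As a rough back-of-the-envelope check of the sector + bit-mask idea, the sketch below counts encoding bits for both approaches. The sizes (256 slots, sectors of 8) and all function names are made-up assumptions purely for illustration; they say nothing about the real scratchpad or its encoding.

```python
def index_bits(n):
    """Bits needed to name one of n items."""
    return (n - 1).bit_length()


def bits_individual(num_slots, count):
    """Cost of naming `count` values, each with a full slot index."""
    return count * index_bits(num_slots)


def bits_sector_mask(num_slots, sector_size):
    """Cost of naming one sector plus a one-bit-per-slot mask inside it."""
    num_sectors = num_slots // sector_size
    return index_bits(num_sectors) + sector_size


def decode_sector_mask(sector, mask, sector_size):
    """Expand (sector, mask) into the absolute slot indices it selects."""
    base = sector * sector_size
    return [base + i for i in range(sector_size) if (mask >> i) & 1]


# Hypothetical sizes: 256 slots split into sectors of 8.
print(bits_individual(256, 3))                 # 24 bits for three separate indices
print(bits_sector_mask(256, 8))                # 13 bits for sector + mask
print(decode_sector_mask(3, 0b00100101, 8))    # [24, 26, 29]
```

With these assumed sizes a single value is cheaper to name directly (8 bits versus 13), so the scheme only wins when two or more of the wanted values really do sit in the same sector, which is exactly the locality assumption I am making.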