I’m just learning about neural networks myself (stanford coursera class). I immediately thought about using lower precision fixed point instead of the standard 32 bit floating point that it seems like everybody uses. I ran across that paper and looked around to see if there was any support for it in hardware and found none.
I think the lack of information is due to the lack of available hardware :). The paper I linked to had to use FPGA to test out the idea.
One more paper I found describes an 8 bit FP representation with 16 bit accumulation. It doesn’t seem to discuss training much, just feed forward (much less dependence on precision):
It is the back propagation phase which computes gradients which requires the precision.
This paper is referenced from the original one:
It is mostly about comparing different representations and a sort of dynamic fixed point where the exponent changes as the gradients decrease over the training iterations.