A switch statement, providing separate code for each possible excess. For
example, an 8-limb unrolling would have separate code for 0 remaining, 1
remaining, and so on, up to 7 remaining. This might take a lot of code, but may
be the best way to optimize all cases in combination with a deeply pipelined
loop.

A computed jump into the middle of the loop, thus making the first iteration
handle the...
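
The switch statement approach described above can be illustrated with a minimal C sketch. It is not taken from GMP. The type `limb_t`, the function `limbs_xor_mask`, and the XOR operation are hypothetical stand-ins chosen to keep the example short. To save space the cases fall through to one another. Hand-written assembly would more likely give each excess count its own separately tuned code, as the text suggests.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limb type for illustration (assumption, not GMP's mp_limb_t). */
typedef uint64_t limb_t;

/* Hypothetical example operation: dst[i] = src[i] ^ mask for 0 <= i < n.
   The main loop is unrolled 8 limbs at a time; a switch handles the
   0 to 7 excess limbs before the loop starts. */
void limbs_xor_mask(limb_t *dst, const limb_t *src, size_t n, limb_t mask)
{
    size_t excess = n % 8;

    /* Dispatch on the excess count. Case k processes limbs k-1 down to 0. */
    switch (excess) {
    case 7: dst[6] = src[6] ^ mask; /* fall through */
    case 6: dst[5] = src[5] ^ mask; /* fall through */
    case 5: dst[4] = src[4] ^ mask; /* fall through */
    case 4: dst[3] = src[3] ^ mask; /* fall through */
    case 3: dst[2] = src[2] ^ mask; /* fall through */
    case 2: dst[1] = src[1] ^ mask; /* fall through */
    case 1: dst[0] = src[0] ^ mask; /* fall through */
    case 0: break;
    }

    /* Skip past the limbs already handled. */
    dst += excess;
    src += excess;
    n -= excess;

    /* n is now a multiple of 8, so the unrolled loop needs no tail checks. */
    for (; n != 0; n -= 8, dst += 8, src += 8) {
        dst[0] = src[0] ^ mask;
        dst[1] = src[1] ^ mask;
        dst[2] = src[2] ^ mask;
        dst[3] = src[3] ^ mask;
        dst[4] = src[4] ^ mask;
        dst[5] = src[5] ^ mask;
        dst[6] = src[6] ^ mask;
        dst[7] = src[7] ^ mask;
    }
}
```

Calling `limbs_xor_mask(dst, src, 19, mask)` would handle 3 limbs in the switch and the remaining 16 in two passes of the unrolled loop. Handling the excess before the loop is only one possible arrangement; it could equally be handled afterwards.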