Linus Torvalds writes: (Summary)
But there's a big gap between "just use 'rep movs'" and "do some cacheline unrolling".
Why isn't it just doing a simple word-at-a-time loop and letting the CPU do the unrolling that it will already do on its own?
I may have gotten that answered too, but there's no comment in the code about why it's such a disgusting mess, so I've long since forgotten _why_ it's such a disgusting mess.
That loop unrolling _used_ to be "hey, it's simple".
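
For illustration only, the "simple word-at-a-time loop" the message asks about might look roughly like the C sketch below. This is not the kernel's code: the function name copy_words is hypothetical, and using unsigned long as the machine word and memcpy() for alignment-safe loads and stores are assumptions made to keep the example portable. The point is that the loop body has no manual unrolling; that is left to the compiler and the CPU.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not a kernel function: copy n bytes from src to
 * dst one machine word at a time, then finish the tail byte by byte.
 * The buffers are assumed not to overlap.
 */
static void *copy_words(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Main loop: one word per iteration, no manual unrolling. */
    while (n >= sizeof(unsigned long)) {
        unsigned long w;

        /* memcpy of a fixed word size avoids unaligned-access problems;
         * compilers typically turn each call into a single load/store. */
        memcpy(&w, s, sizeof(w));
        memcpy(d, &w, sizeof(w));
        d += sizeof(w);
        s += sizeof(w);
        n -= sizeof(w);
    }

    /* Tail: fewer than one word of bytes left. */
    while (n--)
        *d++ = *s++;

    return dst;
}
```

Whether a loop like this actually matches or beats 'rep movs' or a hand-unrolled version depends on the CPU and the copy size, which is the open question in the message above.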