Linus Torvalds writes: (Summary) On Wed, Nov 1, 2017 at 3:14 PM, Dave Hansen <dave.hansen@linux.intel.com>
I guess the optimal version just ends up switching between two different entrypoints for the on/off case. And the not-quite-as-aggressive, but almost-optimal version would just be a two-byte asm alternative with an unconditional branch to the movcr3 code and back, which is turned into a noop when it's off. But since 99%+ of the cost is going to be that cr3 write, even the stupid "just load value and branch over the cr3 conditionally" version is going to make the difference hard to measure.
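
For illustration only, the "just load value and branch over the cr3 conditionally" variant could be sketched roughly as below. This is a user-space model, not kernel code: user space cannot write cr3, so a plain variable stands in for the register, and every name here (`fake_cr3`, `fake_write_cr3`, `switch_enabled`, `entry_switch_cr3`) is a hypothetical placeholder rather than anything from the actual patches.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins: a variable models cr3, a counter models
 * how many times the expensive write actually happened. */
static uint64_t fake_cr3;
static unsigned long fake_cr3_writes;

/* Hypothetical on/off switch for the feature. */
static bool switch_enabled;

/* Models the costly register write. */
static void fake_write_cr3(uint64_t val)
{
    fake_cr3 = val;
    fake_cr3_writes++;
}

/* The simple variant: load the flag and branch over the write when
 * the feature is off. The only extra cost is one load and one
 * predictable branch. */
static void entry_switch_cr3(uint64_t new_pgd)
{
    if (!switch_enabled)
        return;
    fake_write_cr3(new_pgd);
}
```

With `switch_enabled` false, `entry_switch_cr3()` returns without touching the modeled register; with it true, the write happens every time. The almost-optimal version described above would replace the runtime test with a branch patched in or out at boot, which this sketch does not attempt to model.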