cli: Avoid large intermediates in the Windows get_time_nanos
Multiplying the performance counter value (in its own time base) by the target time base before dividing shrinks the usable numeric range. The intermediate product overflows after (1 << 64) / (counter frequency * target frequency) seconds, while the final result would only overflow after (1 << 64) / target frequency seconds.
On Windows 10 on ARM64, the performance counter frequency is 19200000 Hz (on x86_64 in a virtual machine, it is 10000000 Hz), so the calculation overflows every (1 << 64) / (19200000 * 1000000000) seconds, roughly 960 seconds or 16 minutes, long before the uint64_t nanosecond return value itself wraps around.