x86: Add refmvs.save_tmvs SSSE3 asm
64-bit:
(10 checkasm runs)
Speed ups:
c..ssse3: 2.921x (o=0.0249)
c..avx2: 3.134x (o=0.0289)
Speed diffs:
c..ssse3: 34.24% (o=0.29)
c..avx2: 31.91% (o=0.29)
save_tmvs_c: 25681.4 ( 1.00x)
save_tmvs_ssse3: 8711.6 ( 2.95x)
save_tmvs_avx2: 8075.1 ( 3.18x)
chimera: 453.64 => 455.93 (~+0.5%)
32-bit:
(10 checkasm runs)
Speed ups:
c..ssse3: 2.353x (o=0.0253)
Speed diffs:
c..ssse3: 42.51% (o=0.46)
save_tmvs_c: 23775.4 ( 1.00x)
save_tmvs_ssse3: 9799.6 ( 2.43x)
chimera: 408.40 => 412.16 (~+0.9%)
These chimera decode numbers aren't very stable; it's probably better to just look at the checkasm ones.
Edited by Victorien Le Couviour--Tuffet