x86/itx: minor refactoring

Remove some duplicate functionality, turn some inlined functions into regular functions, and improve/simplify some existing function implementations.
- itx/x86: rewrite .transpose4x8packed so it uses only m0-4 and m6
- itx/x86: replace idct8x8.transpose with idct8x4.transpose4x8packed
- x86/itx: add 1/sqrt(2) (rect2) multiply macro
- x86/itx: share pass2 loop between {16,32}x32 dct^2 functions
- x86/itx: combine .write_8x8 and .round{1,2,3,4} into a single function
- x86/itx: combine .write_8x4 and .round{1,2} into a single function
- x86/itx: split dct/adst/identity pass=2 implementations for 16x8
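
For reference, the 1/sqrt(2) (rect2) multiply mentioned in the list above can be modelled in scalar C roughly as follows. This is only a sketch under assumptions: the commit does not state which instruction or constant the macro uses, so the PMULHRSW-style rounding multiply, the constant 2896*8 (2896/4096 is about 0.70703) and the helper names pmulhrsw_lane and rect2_scale are illustrative, not taken from the source.

```c
#include <stdint.h>

/* Scalar model of one 16-bit lane of the SSSE3 PMULHRSW instruction:
 * result = (a * b + 0x4000) >> 15, truncated to 16 bits.
 * Assumes >> on a negative int32_t is an arithmetic shift, which is
 * implementation-defined in C but true on common x86 compilers. */
static int16_t pmulhrsw_lane(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b;
    return (int16_t)((prod + 0x4000) >> 15);
}

/* Hypothetical helper: scale a coefficient by roughly 1/sqrt(2).
 * Multiplying by 2896*8 with a 15-bit rounding shift is the same as
 * (coef * 2896 + 2048) >> 12. The constant is an assumption. */
static int16_t rect2_scale(int16_t coef)
{
    return pmulhrsw_lane(coef, 2896 * 8);
}
```

In SIMD code the same operation would apply to every 16-bit lane of a register at once, which is why wrapping it in a single macro removes repeated multiply sequences.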