Skip to content
Snippets Groups Projects
Commit fe9e4a7f authored by Konstantinos Margaritis's avatar Konstantinos Margaritis
Browse files

Provide implementations for functions using the instructions SDOT/UDOT in the...

Provide implementations for functions using the instructions SDOT/UDOT in the DotProd Armv8 extension.

Functions implemented:
sad_16x8, sad_16x16,
sad_x3_16x8_neon, sad_x3_16x16_neon,
sad_x4_16x8_neon, sad_x4_16x16_neon,
ssd_8x4, ssd_8x8, ssd_8x16, ssd_16x8, ssd_16x16,
pixel_vsad

Performance improvement against Neon ranges from 5% to 188%.
Following is the output of ./checkasm8 --bench (run on a Graviton4 system):

sad_16x8_c: 1323
sad_16x8_neon: 224
sad_16x8_dotprod: 211
sad_16x16_c: 2619
sad_16x16_neon: 365
sad_16x16_dotprod: 320
sad_x3_16x8_c: 3836
sad_x3_16x8_neon: 403
sad_x3_16x8_dotprod: 317
sad_x3_16x16_c: 7725
sad_x3_16x16_neon: 714
sad_x3_16x16_dotprod: 532
sad_x4_16x8_c: 5080
sad_x4_16x8_neon: 438
sad_x4_16x8_dotprod: 375
sad_x4_16x16_c: 10260
sad_x4_16x16_neon: 794
sad_x4_16x16_dotprod: 655
ssd_8x4_c: 381
ssd_8x4_neon: 157
ssd_8x4_dotprod: 115
ssd_8x4_sve: 150
ssd_8x8_c: 695
ssd_8x8_neon: 238
ssd_8x8_dotprod: 161
ssd_8x8_sve: 228
ssd_8x16_c: 1335
ssd_8x16_neon: 388
ssd_8x16_dotprod: 267
ssd_16...
parent 570f6c70
No related merge requests found
Pipeline #583657 failed with stages
in 2 minutes and 35 seconds
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment