  1. May 20, 2024
  2. May 14, 2024
    • Kyle Siefring's avatar
      ARM64: Various optimizations for symbol decode · 7f68f23c
      Kyle Siefring authored
      Changes stem from redesigning the reduction stage of the multisymbol
      decode function.
      * No longer use adapt4 for 5 possible symbol values
      * Specialize reduction for 4/8/16 decode functions
      * Modify control flow
      
      +------------------------+--------------+--------------+---------------+
      |                        |  Neoverse V1 |  Neoverse N1 |   Cortex A72  |
      |                        | (Graviton 3) | (Graviton 2) |  (Graviton 1) |
      +------------------------+-------+------+-------+------+-------+-------+
      |                        |  Old  |  New |  Old  |  New |  Old  |  New  |
      +------------------------+-------+------+-------+------+-------+-------+
      | decode_bool_neon       |  13.0 | 12.9 |  14.9 | 14.0 |  39.3 |  29.0 |
      +------------------------+-------+------+-------+------+-------+-------+
      | decode_bool_adapt_neon |  15.4 | 15.6 |  17.5 | 16.8 |  41.6 |  33.5 |
      +------------------------+-------+------+-------+------+-------+-------+
      | decode_bool_equi_neon  |  11.3 | 12.0 |  14.0 | 12.2 |  35.0 |  26.3 |
      +------------------------+-------+------+-------+------+-------+-------+
      | decode_hi_tok_c        |  73.7 | 57.8 |  73.4 | 60.5 | 130.1 | 103.9 |
      +------------------------+-------+------+-------+------+-------+-------+
      | decode_hi_tok_neon     |  63.3 | 48.2 |  65.2 | 51.2 | 119.0 | 105.3 |
      +------------------------+-------+------+-------+------+-------+-------+
      | decode_symbol_\        |  28.6 | 22.5 |  28.4 | 23.5 |  67.8 |  55.1 |
      | adapt4_neon            |       |      |       |      |       |       |
      +------------------------+-------+------+-------+------+-------+-------+
      | decode_symbol_\        |  29.5 | 26.6 |  29.0 | 28.8 |  76.6 |  74.0 |
      | adapt8_neon            |       |      |       |      |       |       |
      +------------------------+-------+------+-------+------+-------+-------+
      | decode_symbol_\        |  31.6 | 31.2 |  33.3 | 33.0 |  77.5 |  68.1 |
      | adapt16_neon           |       |      |       |      |       |       |
      +------------------------+-------+------+-------+------+-------+-------+
      7f68f23c
  3. May 13, 2024
    • Henrik Gramner's avatar
      checkasm: Avoid UB in setjmp() invocations · 471549f2
      Henrik Gramner authored
      Both POSIX and the C standard place several environmental limits on
      setjmp() invocations, with essentially anything beyond comparing the
      return value with a constant as a simple branch condition being UB.
      
      We were previously performing a function call using the setjmp()
      return value as an argument, which is technically not allowed
      even though it happened to work correctly in practice.
      
      Some systems may loosen those restrictions and allow for more
      flexible usage, but we shouldn't be relying on that.
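
The restriction described above can be illustrated with a minimal sketch. The names `checkpoint`, `risky_call` and `run_guarded` are hypothetical and are not taken from checkasm; the sketch only shows a setjmp() call used as the entire controlling expression of an `if`, which the C standard permits, next to a comment showing the disallowed form.

```c
#include <setjmp.h>

/* Hypothetical jump buffer; not checkasm's actual variable. */
static jmp_buf checkpoint;

/* Hypothetical stand-in for code that may bail out via longjmp(). */
static void risky_call(int fail)
{
    if (fail)
        longjmp(checkpoint, 1);
}

/* Returns 1 if risky_call() bailed out, 0 if it ran to completion. */
static int run_guarded(int fail)
{
    /* Permitted: setjmp() is the entire controlling expression.
     * Not permitted (UB): passing the result straight to a function,
     * e.g. handle_result(setjmp(checkpoint)); */
    if (setjmp(checkpoint))
        return 1;

    risky_call(fail);
    return 0;
}
```

Calling `run_guarded(0)` takes the normal path, while `run_guarded(1)` returns through the longjmp() branch. `fail` is never modified between setjmp() and longjmp(), so it does not need to be volatile here.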
      471549f2
  4. May 10, 2024
  5. Apr 15, 2024
    • Kyle Siefring's avatar
      ARM64: Port msac improvements to more functions · 37d52435
      Kyle Siefring authored and Henrik Gramner committed
      Port improvements from the hi token functions to the rest of the symbol
      adaption functions. These weren't originally ported since they didn't
      work with arbitrary padding. In practice, zero padding is already used
      and only the tests need to be updated.
      
      Results - Neoverse N1
      
      Old:
      msac_decode_symbol_adapt4_c:         41.4 ( 1.00x)
      msac_decode_symbol_adapt4_neon:      31.0 ( 1.34x)
      msac_decode_symbol_adapt8_c:         54.5 ( 1.00x)
      msac_decode_symbol_adapt8_neon:      32.2 ( 1.69x)
      msac_decode_symbol_adapt16_c:        85.6 ( 1.00x)
      msac_decode_symbol_adapt16_neon:     37.5 ( 2.28x)
      
      New:
      msac_decode_symbol_adapt4_c:         41.5 ( 1.00x)
      msac_decode_symbol_adapt4_neon:      27.7 ( 1.50x)
      msac_decode_symbol_adapt8_c:         55.7 ( 1.00x)
      msac_decode_symbol_adapt8_neon:      30.1 ( 1.85x)
      msac_decode_symbol_adapt16_c:        82.4 ( 1.00x)
      msac_decode_symbol_adapt16_neon:     35.2 ( 2.34x)
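
As a rough reading aid, the figure in parentheses appears to be the C time divided by the time of the optimized function. The helper below is a hypothetical sketch of that arithmetic, not checkasm code; since the tool presumably divides unrounded measurements, recomputing from the rounded numbers shown here may differ in the last digit.

```c
/* Hypothetical helper: how many times faster opt_time is than ref_time. */
static double speedup(double ref_time, double opt_time)
{
    return ref_time / opt_time;
}
```

For example, `speedup(41.5, 27.7)` is about 1.50, matching the new adapt4 NEON row.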
      37d52435
  6. Apr 08, 2024
    • Henrik Gramner's avatar
      meson: Enable parallel execution of checkasm in 'meson test' · dc949013
      Henrik Gramner authored
      It was originally disabled due to older meson versions mixing the output
      of 'meson test -v' from different tests, which made the log difficult to
      read. Newer versions, however, cache the output from each test as it runs
      and print it in one contiguous block, so that's no longer an issue.
      dc949013
  7. Apr 02, 2024
    • Martin Storsjö's avatar
      checkasm: Add support for the private macOS kperf API for benchmarking · 5e31720b
      Martin Storsjö authored
      On AArch64, the performance counter registers are usually
      restricted and not accessible from user space.
      
      On macOS, we currently use mach_absolute_time() as the timer on
      AArch64. This measures wall-clock time, but with a very coarse
      resolution.
      
      There is a private API, kperf, that can be used to get
      high-precision timers, though. Unfortunately, it requires running
      the checkasm binary as root (e.g. with sudo).
      
      Also, as it is a private, undocumented API, it can potentially
      change at any time.
      
      This is handled by adding a new meson build option for switching
      to this timer. If the timer source in checkasm could be changed
      at runtime with an option, this wouldn't need to be a build-time
      option.
      
      This allows getting benchmarks like this:
      
      mc_8tap_regular_w16_hv_8bpc_c:              1522.1 ( 1.00x)
      mc_8tap_regular_w16_hv_8bpc_neon:            331.8 ( 4.59x)
      
      Instead of this:
      
      mc_8tap_regular_w16_hv_8bpc_c:                 9.0 ( 1.00x)
      mc_8tap...
      5e31720b
  8. Mar 04, 2024
  9. Feb 29, 2024
    • Henrik Gramner's avatar
      checkasm: Add --list-cpuflags option · 85a10359
      Henrik Gramner authored
      Prints a list of cpuflags available for the current architecture.
      
      Flags which are supported on the current system will be printed in
      green, and flags which are unsupported in red with a ~ prefix.
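
The output convention described above could be sketched as follows. This is an illustration under assumptions: the `CpuFlag` struct, the `format_flag` helper and the use of ANSI escape sequences for the colors are hypothetical and are not taken from the checkasm source.

```c
#include <stdio.h>

/* Hypothetical flag table entry; not checkasm's real data structure. */
typedef struct {
    const char *name;
    unsigned    mask;
} CpuFlag;

/* Assumed ANSI escape sequences for colored terminal output. */
#define COLOR_GREEN "\x1b[32m"
#define COLOR_RED   "\x1b[31m"
#define COLOR_RESET "\x1b[0m"

/* Writes the flag name into buf: green if every bit of the flag is in
 * `supported`, otherwise red with a '~' prefix. Returns what snprintf()
 * returns. */
static int format_flag(char *buf, size_t size,
                       const CpuFlag *flag, unsigned supported)
{
    if ((supported & flag->mask) == flag->mask)
        return snprintf(buf, size, COLOR_GREEN "%s" COLOR_RESET, flag->name);
    return snprintf(buf, size, COLOR_RED "~%s" COLOR_RESET, flag->name);
}
```

A caller would loop over the flag table for the current architecture and print each formatted entry. Keeping the formatting in a helper that writes to a buffer makes the green and red cases easy to check without a terminal.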
      85a10359
  10. Feb 28, 2024
  11. Feb 26, 2024
  12. Feb 22, 2024
  13. Feb 21, 2024
    • Henrik Gramner's avatar
      checkasm: Improve msac tests · 83ae3e9a
      Henrik Gramner authored
      * Process the entire buffer to get better coverage of eob handling.
      
      * Use a more reasonable buffer size.
      
      * Ignore trailing dif bits to allow for more implementation flexibility.
      83ae3e9a
  14. Feb 18, 2024
  15. Jan 31, 2024
  16. Jan 30, 2024
  17. Jan 24, 2024
  18. Jan 23, 2024
  19. Jan 21, 2024
    • jinbo's avatar
      loongarch: Improve the performance of msac series functions · 38bc0084
      jinbo authored and Hecai Yuan committed
      Relative speedup over C code:
      
      msac_decode_bool_c:                            0.5 ( 1.00x)
      msac_decode_bool_lsx:                          0.5 ( 1.09x)
      msac_decode_bool_adapt_c:                      0.7 ( 1.00x)
      msac_decode_bool_adapt_lsx:                    0.6 ( 1.20x)
      msac_decode_symbol_adapt4_c:                   1.3 ( 1.00x)
      msac_decode_symbol_adapt4_lsx:                 1.0 ( 1.30x)
      msac_decode_symbol_adapt8_c:                   2.1 ( 1.00x)
      msac_decode_symbol_adapt8_lsx:                 1.0 ( 2.05x)
      msac_decode_symbol_adapt16_c:                  3.7 ( 1.00x)
      msac_decode_symbol_adapt16_lsx:                0.8 ( 4.77x)
      38bc0084
    • Hecai Yuan's avatar
      Add loongarch support · 2e952f30
      Hecai Yuan authored
      2e952f30
  20. Jan 11, 2024
  21. Dec 19, 2023
  22. Dec 15, 2023
  23. Nov 12, 2023
  24. Nov 01, 2023
  25. Jul 12, 2023
  26. Jul 07, 2023
  27. Jul 06, 2023