  1. Jun 20, 2022
    • checkasm: Speed up signal handling · 0421f787
      Henrik Gramner authored
      Enabling/disabling signal handlers is very slow and requires a syscall.
      
      A better approach is to keep the signal handlers enabled all the time,
      and use a simple flag variable to determine if a given signal should
      be handled or passed on to the default signal handler.
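A minimal sketch of the flag-based approach described above, assuming a POSIX system. Every identifier here (`catch_signals`, `on_signal`, `run_guarded`, and so on) is hypothetical and not taken from checkasm; the sketch only illustrates installing the handlers once and toggling a flag around each guarded call.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Hypothetical names throughout; not checkasm's actual code. */
static volatile sig_atomic_t catch_signals; /* 1 while a guarded call runs */
static volatile sig_atomic_t caught_signal; /* last signal we handled */
static sigjmp_buf recover_point;

static void on_signal(int sig) {
    if (catch_signals) {
        /* Ours: clear the flag and jump back into run_guarded(). */
        catch_signals = 0;
        caught_signal = sig;
        siglongjmp(recover_point, 1);
    }
    /* Not ours: restore the default action and re-raise. */
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Called once at startup. SA_NODEFER keeps the signal unblocked inside
 * the handler, so jumping out needs no signal-mask save/restore. */
static void install_handlers_once(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sa.sa_flags = SA_NODEFER;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGFPE, &sa, NULL);
    sigaction(SIGILL, &sa, NULL);
    sigaction(SIGSEGV, &sa, NULL);
}

/* Returns 0 if fn() returned normally, otherwise the signal number. */
static int run_guarded(void (*fn)(void)) {
    if (sigsetjmp(recover_point, 0) == 0) {
        catch_signals = 1; /* enabling is a plain store */
        fn();
        catch_signals = 0; /* so is disabling */
        return 0;
    }
    return caught_signal;
}

/* Tiny stand-ins for functions under test. */
static void well_behaved(void) {}
static void faulty(void) { raise(SIGILL); }
```

After a single `install_handlers_once()` call, `run_guarded(faulty)` can be repeated without touching the signal dispositions again, which is the saving the commit message describes.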
    • checkasm: Improve seed generation on Windows · fa68b036
      Henrik Gramner authored
      GetTickCount() increases at a very low frequency, >10 ms per tick.
      When running multiple loops of checkasm instances in parallel,
      different instances regularly end up using identical seeds.
      
      Prefer QueryPerformanceCounter() instead; it ticks at a significantly
      higher rate, which makes identical seeds far less likely.
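A rough sketch of deriving a seed from a high-resolution counter, as the message above suggests. The helper `seed_from_ticks()` and the non-Windows `clock_gettime()` fallback are assumptions added so the example builds on other platforms; they are not claimed to match checkasm's code.

```c
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() on non-Windows */
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

/* Hypothetical helper: keep the fastest-changing (low) 32 bits. */
static unsigned seed_from_ticks(uint64_t ticks) {
    return (unsigned)(ticks & 0xFFFFFFFFu);
}

static unsigned get_seed(void) {
#ifdef _WIN32
    /* High-resolution counter instead of GetTickCount(). */
    LARGE_INTEGER counter;
    QueryPerformanceCounter(&counter);
    return seed_from_ticks((uint64_t)counter.QuadPart);
#else
    /* Assumed fallback: nanosecond monotonic clock. */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return seed_from_ticks(1000000000ULL * (uint64_t)ts.tv_sec +
                           (uint64_t)ts.tv_nsec);
#endif
}
```

Two processes started a few microseconds apart would read different counter values here, whereas a counter that advances only every 10 ms or more would often hand both the same value.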
    • ci: Don't specify a specific MacOS version · 0c590fc7
      Henrik Gramner authored and Henrik Gramner committed
  2. Jun 14, 2022
  3. Jun 13, 2022
  4. Jun 03, 2022
  5. Jun 02, 2022
    • x86: Add a workaround for quirky AVX-512 hardware behavior · 0cfb03cd
      Henrik Gramner authored and Henrik Gramner committed
      On Intel CPUs certain AVX-512 shuffle instructions incorrectly
      flag the upper halves of YMM registers as in use when writing
      to XMM registers, which may cause AVX/SSE state transitions.
      
      This behavior is not documented and only occurs on physical
      hardware, not when using the Intel SDE, so as far as I can tell
      it appears to be a hardware bug.
      
      Work around the issue by using EVEX-only registers. This avoids
      the problem at the cost of a slightly larger code size.
  6. May 31, 2022
  7. May 25, 2022
  8. May 20, 2022
  9. May 18, 2022
  10. May 07, 2022
  11. May 05, 2022
  12. Apr 28, 2022
  13. Apr 24, 2022
    • x86/itx: Add 32x8 12bpc AVX2 transforms · ffb59680
      Matthias Dressel authored
      inv_txfm_add_32x8_dct_dct_0_12bpc_c: 286.7
      inv_txfm_add_32x8_dct_dct_0_12bpc_avx2: 20.1
      inv_txfm_add_32x8_dct_dct_1_12bpc_c: 7832.7
      inv_txfm_add_32x8_dct_dct_1_12bpc_avx2: 710.6
      inv_txfm_add_32x8_dct_dct_2_12bpc_c: 7838.1
      inv_txfm_add_32x8_dct_dct_2_12bpc_avx2: 711.6
      inv_txfm_add_32x8_dct_dct_3_12bpc_c: 7818.3
      inv_txfm_add_32x8_dct_dct_3_12bpc_avx2: 710.9
      inv_txfm_add_32x8_dct_dct_4_12bpc_c: 7820.6
      inv_txfm_add_32x8_dct_dct_4_12bpc_avx2: 710.5
      inv_txfm_add_32x8_identity_identity_0_12bpc_c: 1526.6
      inv_txfm_add_32x8_identity_identity_0_12bpc_avx2: 19.3
      inv_txfm_add_32x8_identity_identity_1_12bpc_c: 1519.4
      inv_txfm_add_32x8_identity_identity_1_12bpc_avx2: 19.9
      inv_txfm_add_32x8_identity_identity_2_12bpc_c: 1519.9
      inv_txfm_add_32x8_identity_identity_2_12bpc_avx2: 43.6
      inv_txfm_add_32x8_identity_identity_3_12bpc_c: 1519.4
      inv_txfm_add_32x8_identity_identity_3_12bpc_avx2: 67.8
      inv_txfm_add_32x8_identity_identity_4_12bpc_c: 1523.2
      inv_txfm_add_32x8_identity_identity_4_12bpc_avx2: 91.6
    • x86/itx: Add 8x32 12bpc AVX2 transforms · e67a5000
      Matthias Dressel authored
      inv_txfm_add_8x32_dct_dct_0_12bpc_c: 334.6
      inv_txfm_add_8x32_dct_dct_0_12bpc_avx2: 66.0
      inv_txfm_add_8x32_dct_dct_1_12bpc_c: 7929.7
      inv_txfm_add_8x32_dct_dct_1_12bpc_avx2: 489.3
      inv_txfm_add_8x32_dct_dct_2_12bpc_c: 7925.8
      inv_txfm_add_8x32_dct_dct_2_12bpc_avx2: 547.1
      inv_txfm_add_8x32_dct_dct_3_12bpc_c: 7928.9
      inv_txfm_add_8x32_dct_dct_3_12bpc_avx2: 647.8
      inv_txfm_add_8x32_dct_dct_4_12bpc_c: 7916.1
      inv_txfm_add_8x32_dct_dct_4_12bpc_avx2: 701.0
      inv_txfm_add_8x32_identity_identity_0_12bpc_c: 2413.1
      inv_txfm_add_8x32_identity_identity_0_12bpc_avx2: 28.6
      inv_txfm_add_8x32_identity_identity_1_12bpc_c: 2415.2
      inv_txfm_add_8x32_identity_identity_1_12bpc_avx2: 28.6
      inv_txfm_add_8x32_identity_identity_2_12bpc_c: 2413.7
      inv_txfm_add_8x32_identity_identity_2_12bpc_avx2: 55.1
      inv_txfm_add_8x32_identity_identity_3_12bpc_c: 2415.4
      inv_txfm_add_8x32_identity_identity_3_12bpc_avx2: 85.3
      inv_txfm_add_8x32_identity_identity_4_12bpc_c: 2401.8
      inv_txfm_add_8x32_identity_identity_4_12bpc_avx2: 116.8
    • x86/itx: Deduplicate dconly code · 0c1fbdef
      Matthias Dressel authored
  14. Apr 23, 2022
  15. Apr 08, 2022
  16. Apr 07, 2022
    • picture: ensure the new seq header and op param info flags are attached to the next visible picture in display order · 9bd8350a
      James Almer authored
      
      If the first picture in coding order after a new sequence header is parsed is
      not visible, the first picture output by dav1d after the fact (which is coded
      after the aforementioned invisible picture) would not trigger the new seq
      header event flag as expected, despite being the first containing a reference
      to a new sequence header.
      
      Assuming the invisible picture is ever output, the result of this change will
      be two pictures signaling a new sequence header was seen despite there being
      only one new sequence header.
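The behavior described above can be sketched roughly as follows. The flag, type, and function names are hypothetical and chosen for illustration; the point is only that pending event flags are copied onto every picture but cleared solely when a visible picture takes them.

```c
#include <stdbool.h>

/* Hypothetical flag and type names, for illustration only. */
enum {
    FLAG_NEW_SEQUENCE       = 1 << 0,
    FLAG_NEW_OP_PARAMS_INFO = 1 << 1,
};

struct decoder { unsigned pending_flags; };
struct picture { bool visible; unsigned flags; };

/* A new sequence header was parsed: remember the event. */
static void on_new_sequence_header(struct decoder *d) {
    d->pending_flags |= FLAG_NEW_SEQUENCE;
}

/* Attach pending flags to a picture. An invisible picture keeps the
 * two event flags pending, so the next visible picture gets them too. */
static void attach_flags(struct decoder *d, struct picture *p) {
    p->flags = d->pending_flags;
    d->pending_flags &= p->visible
        ? 0u
        : (unsigned)(FLAG_NEW_SEQUENCE | FLAG_NEW_OP_PARAMS_INFO);
}
```

With an invisible picture followed by a visible one, both end up carrying `FLAG_NEW_SEQUENCE`, which is the double signaling the message warns about if the invisible picture is ever output.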
  17. Mar 31, 2022
  18. Mar 19, 2022
  19. Mar 18, 2022
  20. Mar 16, 2022
    • Set f->n_tile_data to 0 in dav1d_decode_frame() · 56e7ffc0
      Wan-Teh Chang authored and James Almer committed
      Set f->n_tile_data to 0 after the dav1d_decode_frame_exit() call in
      dav1d_decode_frame(). dav1d_decode_frame_exit() unrefs every element in
      use in the f->tile array, so it is good to set f->n_tile_data to 0 to
      indicate that no elements are in use.
      
      We are already doing this after all other dav1d_decode_frame_exit()
      calls.
      
      NOTE: It is tempting to have dav1d_decode_frame_exit() itself set
      f->n_tile_data to 0. I did not do that in this merge request, because
      the following is a common pattern:
      
          dav1d_decode_frame_exit(f, error);
          f->n_tile_data = 0;
          pthread_cond_signal(&f->task_thread.cond);
      
      corresponding to the waiting code:
      
          while (f->n_tile_data > 0)
              pthread_cond_wait(&f->task_thread.cond,
                                &c->task_thread.lock);
      
      I wonder if f->n_tile_data is set to 0 outside dav1d_decode_frame_exit()
      to make clear the association of f->n_tile_data with the condition
      variable f->task_thread.cond.
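A self-contained sketch of the signal/wait pairing quoted above, assuming POSIX threads. `struct frame_ctx`, `task_lock`, and the function names are simplified, hypothetical stand-ins rather than dav1d's real structures; the sketch shows why the counter reset and the `pthread_cond_signal()` call naturally sit next to each other.

```c
#include <pthread.h>

/* Hypothetical, simplified stand-ins for dav1d's frame and task state. */
struct frame_ctx {
    int n_tile_data;
    pthread_cond_t cond;
};
static pthread_mutex_t task_lock = PTHREAD_MUTEX_INITIALIZER;

/* Worker side: mark tile data as released, then wake the waiter. */
static void *finish_frame(void *arg) {
    struct frame_ctx *f = arg;
    pthread_mutex_lock(&task_lock);
    /* ... tile data would be unreferenced here ... */
    f->n_tile_data = 0;
    pthread_cond_signal(&f->cond);
    pthread_mutex_unlock(&task_lock);
    return NULL;
}

/* Waiting side: the predicate is exactly the counter the worker resets. */
static void wait_for_frame(struct frame_ctx *f) {
    pthread_mutex_lock(&task_lock);
    while (f->n_tile_data > 0)
        pthread_cond_wait(&f->cond, &task_lock);
    pthread_mutex_unlock(&task_lock);
}

/* Runs one worker and one waiter; returns the final counter or -1. */
static int run_demo(int n_tiles) {
    struct frame_ctx f;
    pthread_t worker;
    f.n_tile_data = n_tiles;
    pthread_cond_init(&f.cond, NULL);
    if (pthread_create(&worker, NULL, finish_frame, &f) != 0)
        return -1;
    wait_for_frame(&f);
    pthread_join(worker, NULL);
    pthread_cond_destroy(&f.cond);
    return f.n_tile_data;
}
```

Because the waiter re-checks `n_tile_data` in a loop under the lock, it returns correctly whether the worker finishes before or after the wait begins.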
  21. Mar 15, 2022