  1. Aug 07, 2019
  2. Jul 28, 2019
  3. Jul 27, 2019
  4. Jul 25, 2019
  5. Jul 23, 2019
    • arm: mc: neon: Merge load and other related operations in blend/blend_h/blend_v functions · 407c27db
      B Krishnan Iyer authored and Martin Storsjö committed
                                     A73               A53
                               Current  Earlier  Current  Earlier
      blend_h_w2_8bpc_neon:       71.1     74.1    132.7    137.5
      blend_h_w4_8bpc_neon:       60.2     65.8    137.5    147.1
      blend_h_w8_8bpc_neon:       62.2     68.9    123.1    131.7
      blend_h_w16_8bpc_neon:      82.1       86    180.7    190.3
      blend_h_w32_8bpc_neon:     149.9    149.2    358.3      358
      blend_h_w64_8bpc_neon:     265.3    263.1    630.2    629.8
      blend_h_w128_8bpc_neon:    579.5      571   1404.4   1404.5
      blend_v_w2_8bpc_neon:      118.7    118.7    193.2    195.3
      blend_v_w4_8bpc_neon:      248.6    245.8    373.4    357.3
      blend_v_w8_8bpc_neon:      202.7      202    356.4    357.2
      blend_v_w16_8bpc_neon:     238.8    234.8    590.4    591.3
      blend_v_w32_8bpc_neon:     346.7    344.4    993.7    994.7
      blend_w4_8bpc_neon:         33.5     37.5     90.7     96.7
      blend_w8_8bpc_neon:         49.7       53    123.3    123.3
      blend_w16_8bpc_neon:       151.8      151    348.8    332.4
      blend_w32_8bpc_neon:       372.9    370.9    908.3    908.4
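For readers who want a reference point for what the benchmarked blend functions compute, here is a minimal scalar sketch in C. It assumes the usual AV1-style mask blend with 6-bit weights in the range 0 to 64; the names `blend_px` and `blend_ref`, the argument order, and the packed layout of `tmp` and `mask` are illustrative assumptions, not code taken from this commit or from the NEON implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Blend two pixels with a 6-bit weight m in [0, 64]:
 * m == 0 returns a, m == 64 returns b, values in between
 * are rounded to nearest (ties round up). */
static inline uint8_t blend_px(int a, int b, int m)
{
    assert(m >= 0 && m <= 64);
    return (uint8_t)((a * (64 - m) + b * m + 32) >> 6);
}

/* Hypothetical scalar reference: dst is the first input and is overwritten.
 * tmp and mask are assumed to be packed with a stride of w. */
static void blend_ref(uint8_t *dst, ptrdiff_t dst_stride,
                      const uint8_t *tmp, int w, int h, const uint8_t *mask)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            dst[x] = blend_px(dst[x], tmp[x], mask[x]);
        dst  += dst_stride;
        tmp  += w;
        mask += w;
    }
}
```

Each output pixel needs one load from each of the three buffers, which is why the way loads are grouped and how pointers are advanced can matter for the narrow block widths in the table.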
    • arm: mc: neon: Reduce usage of general purpose registers in blend/blend_v functions · d4df8619
      B Krishnan Iyer authored and Martin Storsjö committed
                                     A73               A53
                               Current  Earlier  Current  Earlier
      blend_h_w2_8bpc_neon:       74.1     74.1    137.5    137.5
      blend_h_w4_8bpc_neon:       65.8     65.8    147.1    147.1
      blend_h_w8_8bpc_neon:       68.9     68.7    131.7    131.7
      blend_h_w16_8bpc_neon:        86     85.6    190.3    190.4
      blend_h_w32_8bpc_neon:     149.2    149.8      358    358.3
      blend_h_w64_8bpc_neon:     263.1    264.1    629.8    630.3
      blend_h_w128_8bpc_neon:      571    575.4   1404.5   1404.2
      blend_v_w2_8bpc_neon:      118.7    120.1    195.3    196.4
      blend_v_w4_8bpc_neon:      245.8    247.2    357.3    358.4
      blend_v_w8_8bpc_neon:        202    204.2    357.2    358.4
      blend_v_w16_8bpc_neon:     234.8    238.5    591.3    591.8
      blend_v_w32_8bpc_neon:     344.4    347.2    994.7    997.2
      blend_w4_8bpc_neon:         37.5     38.3     96.7     98.7
      blend_w8_8bpc_neon:           53     54.8    123.3    125.3
      blend_w16_8bpc_neon:         151    150.8    332.4    334.5
      blend_w32_8bpc_neon:       370.9    361.6    908.4    910.7
    • arm: mc: neon: Use vld with ! post-increment instead of a register in blend/blend_h/blend_v function · b704a993
      B Krishnan Iyer authored and Martin Storsjö committed
                                     A73               A53
                               Current  Earlier  Current  Earlier
      blend_h_w2_8bpc_neon:       74.1     74.6    137.5      137
      blend_h_w4_8bpc_neon:       65.8       66    147.1    146.6
      blend_h_w8_8bpc_neon:       68.7     68.6    131.7    131.2
      blend_h_w16_8bpc_neon:      85.6     85.9    190.4      192
      blend_h_w32_8bpc_neon:     149.8    149.8    358.3    357.6
      blend_h_w64_8bpc_neon:     264.1    262.8    630.3    629.5
      blend_h_w128_8bpc_neon:    575.4      577   1404.2     1402
      blend_v_w2_8bpc_neon:      120.1    121.3    196.4    195.5
      blend_v_w4_8bpc_neon:      247.2    247.5    358.4    358.5
      blend_v_w8_8bpc_neon:      204.2    205.2    358.4    358.5
      blend_v_w16_8bpc_neon:     238.5    237.1    591.8    590.5
      blend_v_w32_8bpc_neon:     347.2    345.8    997.2    994.1
      blend_w4_8bpc_neon:         38.3     38.6     98.7     99.2
      blend_w8_8bpc_neon:         54.8     55.1    125.3    125.8
      blend_w16_8bpc_neon:       150.8    150.1    334.5      344
      blend_w32_8bpc_neon:       361.6    360.4    910.7    910.9
    • tools: add a simple player example · 5ab6d231
      Marvin Scholz authored and Jean-Baptiste Kempf committed
  6. Jul 17, 2019
  7. Jul 15, 2019
    • Set thread names on Linux · 15a93861
      Emmanuel Gil Peyrot authored and Jean-Baptiste Kempf committed
      This is using the Linux-only prctl(PR_SET_NAME, …) call, because glibc’s
      pthread_setname_np() is doing exactly the same call so there is no
      reason to use it instead, as it isn’t any more portable.
      
      I don’t have any other OS to test this on, but if you want to add one,
      just add an #elif defined(__YOUR_OS__) before the #else in thread.h.
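The approach described in this commit message can be sketched as follows. This is a minimal illustration, assuming a Linux target with `<sys/prctl.h>`; the helper names `set_current_thread_name` and `get_current_thread_name` are hypothetical and are not the identifiers used in thread.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#ifdef __linux__
#include <sys/prctl.h>
#endif

/* Hypothetical helper, not the function added by the commit.
 * Sets the name of the calling thread; Linux keeps at most 15 characters
 * plus the terminating NUL and silently truncates longer names. */
static int set_current_thread_name(const char *name)
{
    assert(name != NULL);
#if defined(__linux__)
    return prctl(PR_SET_NAME, name);
#else
    /* Add an "#elif defined(__YOUR_OS__)" branch above for other systems. */
    (void)name;
    return 0;
#endif
}

/* Reads the calling thread's name back; buf must hold at least 16 bytes. */
static int get_current_thread_name(char buf[16])
{
#if defined(__linux__)
    return prctl(PR_GET_NAME, buf);
#else
    memset(buf, 0, 16);
    return 0;
#endif
}
```

Calling `set_current_thread_name("dav1d-worker")` at the start of a thread's entry function would make that name show up in tools that list per-thread names; the name string here is only an example.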
  8. Jul 13, 2019
    • arm: mc: NEON implementation of w_mask_444/422/420 function · b271590a
      B Krishnan Iyer authored
                                        A73        A53
      w_mask_420_w4_8bpc_c:           797.5     1072.7
      w_mask_420_w4_8bpc_neon:         85.6      152.7
      w_mask_420_w8_8bpc_c:          2344.3     3118.7
      w_mask_420_w8_8bpc_neon:        221.9      372.4
      w_mask_420_w16_8bpc_c:         7429.9     9702.1
      w_mask_420_w16_8bpc_neon:       620.4     1024.1
      w_mask_420_w32_8bpc_c:        27498.2    37205.7
      w_mask_420_w32_8bpc_neon:      2394.1       3838
      w_mask_420_w64_8bpc_c:        66495.8    88721.3
      w_mask_420_w64_8bpc_neon:      6081.4       9630
      w_mask_420_w128_8bpc_c:      163369.3     219494
      w_mask_420_w128_8bpc_neon:    16015.7    24969.3
      w_mask_422_w4_8bpc_c:           858.3     1100.2
      w_mask_422_w4_8bpc_neon:         81.5      143.1
      w_mask_422_w8_8bpc_c:          2447.5     3284.6
      w_mask_422_w8_8bpc_neon:        217.5      342.4
      w_mask_422_w16_8bpc_c:         7673.4    10135.9
      w_mask_422_w16_8bpc_neon:       632.5     1062.6
      w_mask_422_w32_8bpc_c:        28344.9      39090
      w_mask_422_w32_8bpc_neon:      2393.4     3963.8
      w_mask_422_w64_8bpc_c:        68159.6      93447
      w_mask_422_w64_8bpc_neon:      6015.7     9928.1
      w_mask_422_w128_8bpc_c:      169501.2   231702.7
      w_mask_422_w128_8bpc_neon:    15847.5    25803.4
      w_mask_444_w4_8bpc_c:           674.6      862.3
      w_mask_444_w4_8bpc_neon:         80.2      135.4
      w_mask_444_w8_8bpc_c:          2031.4       2693
      w_mask_444_w8_8bpc_neon:        209.3      318.7
      w_mask_444_w16_8bpc_c:           6576     8217.4
      w_mask_444_w16_8bpc_neon:       627.3      986.2
      w_mask_444_w32_8bpc_c:        26051.7    31593.9
      w_mask_444_w32_8bpc_neon:        2374     3671.6
      w_mask_444_w64_8bpc_c:          63600    75849.9
      w_mask_444_w64_8bpc_neon:        5957     9335.5
      w_mask_444_w128_8bpc_c:      156964.7   187932.4
      w_mask_444_w128_8bpc_neon:    15759.4    24549.5
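The table above lists raw readings for the C and NEON versions rather than ratios. A small sketch of how to turn a pair of readings into a speedup factor follows; it assumes the figures are benchmark timings where a lower value means faster, which the commit message does not state explicitly. The struct, the array name, and the choice of three rows are illustrative; the numbers are copied from the A73 column.

```c
#include <assert.h>
#include <stddef.h>

/* One benchmark pair copied from the A73 column of the table above. */
struct bench_row {
    const char *name;
    double c;     /* reading for the C version */
    double neon;  /* reading for the NEON version */
};

static const struct bench_row a73_rows[] = {
    { "w_mask_420_w4",      797.5,    85.6 },
    { "w_mask_420_w128", 163369.3, 16015.7 },
    { "w_mask_444_w4",      674.6,    80.2 },
};

/* Speedup factor: how many times smaller the NEON reading is. */
static double speedup(const struct bench_row *row)
{
    assert(row != NULL && row->neon > 0.0);
    return row->c / row->neon;
}
```

Under that assumption, the three sample rows work out to roughly 9.3x, 10.2x and 8.4x on the A73.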
  9. Jul 08, 2019
  10. Jul 07, 2019
  11. Jul 06, 2019
  12. Jul 05, 2019
  13. Jul 02, 2019
  14. Jun 30, 2019
  15. Jun 29, 2019
  16. Jun 27, 2019
  17. Jun 26, 2019
    • arm64: itx: Add NEON optimized inverse transforms · ef1ea008
      Martin Storsjö authored and Jean-Baptiste Kempf committed
      The speedup for most non-dc-only dct functions is around 9-12x
      over the C code generated by GCC 7.3.
      
      Relative speedups vs C for a few functions:
      
                                                    Cortex A53    A72    A73
      inv_txfm_add_4x4_dct_dct_0_8bpc_neon:               3.90   4.16   5.65
      inv_txfm_add_4x4_dct_dct_1_8bpc_neon:               7.20   8.05  11.19
      inv_txfm_add_8x8_dct_dct_0_8bpc_neon:               5.09   6.73   6.45
      inv_txfm_add_8x8_dct_dct_1_8bpc_neon:              12.18  10.80  13.05
      inv_txfm_add_16x16_dct_dct_0_8bpc_neon:             7.31   9.35  11.17
      inv_txfm_add_16x16_dct_dct_1_8bpc_neon:            14.36  13.06  15.93
      inv_txfm_add_16x16_dct_dct_2_8bpc_neon:            11.00  10.09  12.05
      inv_txfm_add_32x32_dct_dct_0_8bpc_neon:             4.41   5.40   5.77
      inv_txfm_add_32x32_dct_dct_1_8bpc_neon:            13.84  13.81  18.04
      inv_txfm_add_32x32_dct_dct_2_8bpc_neon:            11.75  11.87  15.22
      inv_txfm_add_32x32_dct_dct_3_8bpc_neon:            10.20  10.40  13.13
      inv_txfm_add_32x32_dct_dct_4_8bpc_neon:             9.01   9.21  11.56
      inv_txfm_add_64x64_dct_dct_0_8bpc_neon:             3.84   4.82   5.28
      inv_txfm_add_64x64_dct_dct_1_8bpc_neon:            14.40  12.69  16.71
      inv_txfm_add_64x64_dct_dct_4_8bpc_neon:            10.91   9.63  12.67
      
      Some of the special-cased identity_identity transforms for 32x32
      give insane speedups over the generic C code:
      
      inv_txfm_add_32x32_identity_identity_0_8bpc_neon: 225.26 238.11 247.07
      inv_txfm_add_32x32_identity_identity_1_8bpc_neon: 225.33 238.53 247.69
      inv_txfm_add_32x32_identity_identity_2_8bpc_neon:  59.60  61.94  64.63
      inv_txfm_add_32x32_identity_identity_3_8bpc_neon:  26.98  27.99  29.21
      inv_txfm_add_32x32_identity_identity_4_8bpc_neon:  15.08  15.93  16.56
    • tools: Use DAV1D_ERR for strerror calls · e0346114
      Marvin Scholz authored and Jean-Baptiste Kempf committed
    • include: Consistently use DAV1D_ERR in docs · 04dc8a4d
      Marvin Scholz authored and Jean-Baptiste Kempf committed
  18. Jun 24, 2019
  19. Jun 21, 2019
  20. Jun 20, 2019
  21. Jun 19, 2019