- Oct 21, 2020
-
-
Victorien Le Couviour--Tuffet authored
This could cause a frame waiting on the current one to not be notified on error. Fixes #351.
-
- Oct 02, 2020
-
-
Luc Trudeau authored
long is 32 bits on Win64, so %ld is replaced with %td. For SEQHDR, %ld was used but the actual value is a 32-bit unsigned, so %u is enough.
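A minimal sketch of the format specifier issue, assuming the printed values are a pointer difference and a 32-bit unsigned. The helper names format_offset and format_u32 are hypothetical and do not come from dav1d:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats the byte offset between two pointers.
 * The difference has type ptrdiff_t, which is 64 bits on Win64 while
 * long is only 32 bits there, so %td is the matching conversion. */
static int format_offset(char *dst, size_t dst_size,
                         const unsigned char *start,
                         const unsigned char *pos)
{
    const ptrdiff_t off = pos - start;
    return snprintf(dst, dst_size, "off=%td", off);
}

/* Hypothetical helper: a 32-bit unsigned value needs nothing wider
 * than %u (assuming unsigned int is 32 bits, as on common targets). */
static int format_u32(char *dst, size_t dst_size, unsigned value)
{
    return snprintf(dst, dst_size, "val=%u", value);
}
```

Using the conversion that matches the argument type avoids undefined behavior on platforms where long and ptrdiff_t differ in size.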
-
- Oct 01, 2020
-
-
Luc Trudeau authored
Prints out values and offsets for content light level and mastering display color volume
-
- Sep 27, 2020
-
-
- Sep 24, 2020
-
-
Martin Storsjö authored
Checkasm benchmarks:

Cortex                           A7        A8       A53       A72       A73
wiener_chroma_10bpc_neon:  385312.5  165772.7  184308.2  122311.2  126050.2
wiener_chroma_12bpc_neon:  385296.7  165538.0  184438.2  122290.5  126205.3
wiener_luma_10bpc_neon:    385318.5  165985.3  184147.4  122311.1  126168.4
wiener_luma_12bpc_neon:    385316.3  165819.1  184484.7  122304.4  125982.4

The corresponding numbers for arm64 for comparison:

Cortex                          A53       A72       A73
wiener_chroma_10bpc_neon:  176319.7  125992.1  128162.4
wiener_chroma_12bpc_neon:  176386.2  125986.4  128343.8
wiener_luma_10bpc_neon:    176174.0  126001.7  128227.8
wiener_luma_12bpc_neon:    176176.5  125992.1  128204.8

The arm32 version actually seems to run marginally faster than the arm64 one on A72 and A73. I believe this is because the arm64 code is tuned for A53 (which makes it a bit slower on other cores), but the arm32 code can't be tuned exactly the same way due to fewer registers being available.
-
Martin Storsjö authored
Before:

Cortex                          A53       A72       A73
wiener_chroma_10bpc_neon:  177063.6  129197.3  127987.9
wiener_chroma_12bpc_neon:  177034.4  129206.8  128409.5
wiener_luma_10bpc_neon:    177072.6  129198.1  127931.8
wiener_luma_12bpc_neon:    177052.4  129196.0  127955.2

After:

wiener_chroma_10bpc_neon:  176319.7  125992.1  128162.4
wiener_chroma_12bpc_neon:  176386.2  125986.4  128343.8
wiener_luma_10bpc_neon:    176174.0  126001.7  128227.8
wiener_luma_12bpc_neon:    176176.5  125992.1  128204.8

This gives a small speedup on A53, a bit larger one on A72 and little change (mostly noise?) on A73.
-
Martin Storsjö authored
-
Martin Storsjö authored
The vext.8 instructions only need to produce a single d register each. This makes more registers available as scratch space, which allows hiding latencies better and grouping the vmul/vmla instructions in the form that is beneficial for in-order cores (which have a special forwarding path for such patterns).
-
Martin Storsjö authored
-
Martin Storsjö authored
-
Martin Storsjö authored
-
- Sep 20, 2020
-
-
Makes !1078 redundant.
-
This avoids lots of warnings about unsupported warning options.
-
Martin Storsjö authored
Don't pass the .S assembly sources as C source files in this case, as e.g. MSVC doesn't support them (and meson knows it doesn't, so it refuses to proceed with an MSVC/gas-preprocessor wrapper script, as meson detects it as MSVC - unless meson is hacked to allow passing .S files to MSVC).

This allows building dav1d with MSVC for ARM targets without hacks to meson. (Building in a pure MSVC setup with no other compilers available does require a few new patches to gas-preprocessor though.)

This has been postponed for quite some time, as compiling with MSVC for non-x86 targets in meson has been problematic: meson used to require a working compiler for the build system as well, the MSVC compilers for all targets are named cl.exe, and you can't have the one for the cross target and the one for the build machine first in the path at the same time. This was recently fixed though, see https://github.com/mesonbuild/meson/issues/4402 and https://github.com/mesonbuild/meson/pull/6512.

This matches how gas-preprocessor is hooked up for e.g. OpenH264 in https://github.com/cisco/openh264/commit/013c4566a219a1f0fd50a8186f2b11fd8c3efcfb.
-
Janne Grunau authored
Fixes #350.
-
- Sep 17, 2020
-
-
- Sep 15, 2020
-
-
Wan-Teh Chang authored
If c->operating_point_idc is nonzero and either bits 0-7 or bits 8-11 in it are all 0s, it will cause dav1d_parse_obus() to drop all layer-specific OBUs. Prohibit any op->idc with such properties because it could be selected as c->operating_point_idc.
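The condition described above can be sketched as a small predicate. This is an illustration only: the function name operating_point_idc_is_valid is hypothetical, and the reading of bits 0-7 as the temporal layer mask and bits 8-11 as the spatial layer mask follows the AV1 specification rather than this commit message:

```c
#include <stdbool.h>

/* Hypothetical check: an idc of 0 means no scalability and is fine.
 * A nonzero idc must select at least one temporal layer (bits 0-7)
 * and at least one spatial layer (bits 8-11); otherwise no
 * layer-specific OBU could ever match it. */
static bool operating_point_idc_is_valid(unsigned idc)
{
    if (idc == 0)
        return true;

    const unsigned temporal_mask = idc & 0xff;
    const unsigned spatial_mask  = (idc >> 8) & 0xf;
    return temporal_mask != 0 && spatial_mask != 0;
}
```

For example, 0x101 (temporal layer 0, spatial layer 0) passes, while 0x100 and 0x001 are rejected because one of the two masks is empty.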
-
- Sep 06, 2020
-
-
- Sep 03, 2020
-
-
Martin Storsjö authored
Examples of checkasm benchmarks:

Cortex                                  A7      A8      A9     A53     A72     A73
mc_8tap_regular_w8_0_16bpc_neon:     158.7   106.2   167.0   127.9    55.0    77.2
mc_8tap_regular_w8_h_16bpc_neon:    1000.8   557.5   749.2   609.2   401.4   485.4
mc_8tap_regular_w8_hv_16bpc_neon:   2278.9  1255.4  1352.5  1277.2   867.8   915.9
mc_8tap_regular_w8_v_16bpc_neon:    1060.0   393.6   485.5   448.3   298.0   298.2
mc_bilinear_w8_0_16bpc_neon:         159.7    96.6   161.1   123.7    55.4    74.7
mc_bilinear_w8_h_16bpc_neon:         342.3   250.8   352.9   239.0   158.4   165.1
mc_bilinear_w8_hv_16bpc_neon:        587.7   373.8   469.0   339.8   244.4   247.5
mc_bilinear_w8_v_16bpc_neon:         285.8   189.3   284.9   180.4   103.4   100.9
mct_8tap_regular_w8_0_16bpc_neon:    233.0   136.6   229.3   169.3    86.2    98.3
mct_8tap_regular_w8_h_16bpc_neon:   1106.8   588.3   817.9   654.1   406.4   489.8
mct_8tap_regular_w8_hv_16bpc_neon:  2473.3  1326.3  1428.2  1373.7   903.3   951.1
mct_8tap_regular_w8_v_16bpc_neon:   1266.0   474.1   581.3   505.9   382.0   373.4
mct_bilinear_w8_0_16bpc_neon:        232.9   126.2   225.0   166.3    86.2    91.7
mct_bilinear_w8_h_16bpc_neon:        380.6   270.6   386.0   259.7   154.1   151.9
mct_bilinear_w8_hv_16bpc_neon:       631.4   409.2   509.4   372.1   243.1   244.1
mct_bilinear_w8_v_16bpc_neon:        349.5   233.5   347.9   212.4   138.7   138.4

For comparison, the corresponding numbers for the existing arm64 implementation:

Cortex                                 A53     A72     A73
mc_8tap_regular_w8_0_16bpc_neon:      94.1    48.9    62.3
mc_8tap_regular_w8_h_16bpc_neon:     570.4   388.1   467.3
mc_8tap_regular_w8_hv_16bpc_neon:   1035.8   775.0   891.2
mc_8tap_regular_w8_v_16bpc_neon:     399.8   284.5   278.2
mc_bilinear_w8_0_16bpc_neon:          90.0    44.3    57.4
mc_bilinear_w8_h_16bpc_neon:         191.7   158.7   156.4
mc_bilinear_w8_hv_16bpc_neon:        295.6   235.0   244.9
mc_bilinear_w8_v_16bpc_neon:         147.2    99.0    88.8
mct_8tap_regular_w8_0_16bpc_neon:    139.4    78.4    84.9
mct_8tap_regular_w8_h_16bpc_neon:    612.3   395.9   478.6
mct_8tap_regular_w8_hv_16bpc_neon:  1113.0   804.3   963.5
mct_8tap_regular_w8_v_16bpc_neon:    462.1   370.8   353.3
mct_bilinear_w8_0_16bpc_neon:        135.6    77.0    80.5
mct_bilinear_w8_h_16bpc_neon:        210.8   159.2   141.7
mct_bilinear_w8_hv_16bpc_neon:       325.7   238.4   227.3
mct_bilinear_w8_v_16bpc_neon:        180.7   136.7   129.5
-
Martin Storsjö authored
Narrowing the intermediates from the horizontal pass is beneficial (on most cores, but a small slowdown on A53) here as well. This increases consistency in the code between the cases.

(The corresponding change in the upcoming arm32 version is beneficial on all tested cores except for on A53 - it helps, on some cores a lot, on A7, A8, A9, A72, A73 and only makes it marginally slower on A53.)

Before:

Cortex                               A53     A72     A73
mc_8tap_regular_w2_hv_16bpc_neon:  457.7   301.0   317.1

After:

mc_8tap_regular_w2_hv_16bpc_neon:  472.0   276.0   284.3
-
Martin Storsjö authored
This matches how the same logic is written for w4 and above.
-
Martin Storsjö authored
-
Martin Storsjö authored
The previous form was a leftover from how it had to be written on aarch64.
-
Martin Storsjö authored
For loads of a half/full register, the actual size of the elements doesn't matter, but it makes the code more readable and understandable.
-
- Sep 01, 2020
-
-
The previous floating-point implementation produced results that were sometimes slightly off due to rounding errors. For example, a frame size of 432x240 with a render size of 176x240 previously resulted in a PAR of 98:240 instead of the correct 11:27. Also reduce fractions to produce more readable numbers.
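A minimal sketch of an exact integer computation of this kind, assuming the PAR is defined as (render width * frame height) : (render height * frame width). The names gcd_u64 and reduced_par are hypothetical and are not taken from dav1d:

```c
#include <stdint.h>

/* Euclid's algorithm; returns 0 only if both inputs are 0. */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b != 0) {
        const uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical helper: computes the pixel aspect ratio as a reduced
 * fraction using integers only, so no rounding error can occur.
 * The products are formed in 64 bits to avoid overflow. */
static void reduced_par(uint32_t frame_w, uint32_t frame_h,
                        uint32_t render_w, uint32_t render_h,
                        uint64_t *par_num, uint64_t *par_den)
{
    const uint64_t num = (uint64_t)render_w * frame_h;
    const uint64_t den = (uint64_t)render_h * frame_w;
    uint64_t g = gcd_u64(num, den);
    if (g == 0)
        g = 1; /* degenerate all-zero input: leave values as they are */
    *par_num = num / g;
    *par_den = den / g;
}
```

With the sizes from the example above (frame 432x240, render 176x240), the unreduced fraction is 42240:103680, and dividing both by their greatest common divisor 3840 gives 11:27.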
-
- Aug 30, 2020
-
-
This adds A<W>:<H> to the Y4M header, to preserve the intended aspect ratio for anamorphic video.
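As an illustration, a Y4M stream header with the aspect ratio tag could be produced roughly like this. The helper name format_y4m_header is hypothetical, and the fixed progressive (Ip) and C420jpeg tags are assumptions made to keep the sketch short:

```c
#include <stdio.h>

/* Hypothetical helper: writes a Y4M stream header into dst.
 * The A<num>:<den> tag carries the pixel aspect ratio so that players
 * can display anamorphic video at the intended shape.
 * Returns the snprintf result (length the full header needs). */
static int format_y4m_header(char *dst, size_t dst_size,
                             int width, int height,
                             int fps_num, int fps_den,
                             int par_num, int par_den)
{
    return snprintf(dst, dst_size,
                    "YUV4MPEG2 W%d H%d F%d:%d Ip A%d:%d C420jpeg\n",
                    width, height, fps_num, fps_den, par_num, par_den);
}
```

For a 432x240 frame at 30 fps with a PAR of 11:27, this yields the header line "YUV4MPEG2 W432 H240 F30:1 Ip A11:27 C420jpeg".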
-
- Aug 29, 2020
-
-
Martin Storsjö authored
Cortex                       A7       A8       A9      A53      A72      A73
avg_w4_16bpc_neon:        131.4     81.8    117.3    111.0     50.9     58.8
avg_w8_16bpc_neon:        291.9    173.1    293.1    230.9    114.7    128.8
avg_w16_16bpc_neon:       803.3    480.1    821.4    645.8    345.7    384.9
avg_w32_16bpc_neon:      3350.0   1833.1   3188.1   2343.5   1343.9   1500.6
avg_w64_16bpc_neon:      8185.9   4390.6  10448.2   6078.8   3303.6   3466.7
avg_w128_16bpc_neon:    22384.3  10901.2  33721.9  16782.7   8165.1   8416.5
w_avg_w4_16bpc_neon:      251.3    165.8    203.9    158.3     99.6    106.9
w_avg_w8_16bpc_neon:      638.4    427.8    555.7    365.1    283.2    277.4
w_avg_w16_16bpc_neon:    1912.3   1257.5   1623.4   1056.5    879.5    841.8
w_avg_w32_16bpc_neon:    7461.3   4889.6   6383.8   3966.3   3286.8   3296.8
w_avg_w64_16bpc_neon:   18689.3  11698.1  18487.3  10134.1   8156.2   7939.5
w_avg_w128_16bpc_neon:  48776.6  28989.0  53203.3  26004.1  20055.2  20049.4
mask_w4_16bpc_neon:       298.6    189.2    242.3    191.6    115.2    129.6
mask_w8_16bpc_neon:       768.6    501.5    646.1    432.4    302.9    326.8
mask_w16_16bpc_neon:     2320.5   1480.9   1873.0   1270.2    932.2    976.1
mask_w32_16bpc_neon:     9412.0   5791.9   7348.5   4875.1   3896.4   3821.1
mask_w64_16bpc_neon:    23385.9  13875.6  21383.8  12235.9   9469.2   9160.2
mask_w128_16bpc_neon:   60466.4  34762.6  61055.9  31214.0  23299.0  23324.5

For comparison, the corresponding numbers for the existing arm64 implementation:

avg_w4_16bpc_neon:         78.0     38.5     50.0
avg_w8_16bpc_neon:        198.3    105.4    117.8
avg_w16_16bpc_neon:       614.9    339.9    376.7
avg_w32_16bpc_neon:      2313.8   1391.1   1487.7
avg_w64_16bpc_neon:      5733.3   3269.1   3648.4
avg_w128_16bpc_neon:    15105.9   8143.5   8970.4
w_avg_w4_16bpc_neon:      119.2     87.7     92.9
w_avg_w8_16bpc_neon:      322.9    252.3    263.5
w_avg_w16_16bpc_neon:    1016.8    794.0    828.6
w_avg_w32_16bpc_neon:    3910.9   3159.6   3308.3
w_avg_w64_16bpc_neon:    9499.6   7933.9   8026.5
w_avg_w128_16bpc_neon:  24508.3  19502.0  20389.8
mask_w4_16bpc_neon:       138.9     98.7    106.7
mask_w8_16bpc_neon:       375.5    301.1    302.7
mask_w16_16bpc_neon:     1217.2   1064.6    954.4
mask_w32_16bpc_neon:     4821.0   4018.4   3825.7
mask_w64_16bpc_neon:    12262.7   9471.3   9169.7
mask_w128_16bpc_neon:   31356.6  22657.6  23324.5
-
- Aug 28, 2020
-
-
Martin Storsjö authored
We can't compare the decoding speed with the intended decoding rate, but the frame rate alone is still useful.
-
- Aug 22, 2020
-
-
Janne Grunau authored
Errors on C11 features like anonymous structs/unions.
-
Janne Grunau authored
-
Janne Grunau authored
-
Janne Grunau authored
-
Janne Grunau authored
Also changes the type intptr_t to make adding variable size members more convenient.
-
- Aug 21, 2020
-
-
Janne Grunau authored
-
-
Makes using unmodified upstream x86inc.asm possible.
-
-
-
- Aug 07, 2020
-
-
Martin Storsjö authored
This fixes building in configurations where no readtime implementation is available at all, such as MSVC targeting 32 bit ARM. This was missed when the check was added in 95a19254.
-