Dav1d multi-threading


To achieve concurrency, dav1d uses a task-queue design. At a high level, the user-provided stream of input OBU packets is split into multiple processing tasks, whose ordering is governed by a fine-grained dependency-tracking mechanism. The user sets an upper limit on the number of worker threads, and each worker then consumes tasks from the queue.
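
As a rough illustration of the task-queue idea (not dav1d's actual C implementation), the Python sketch below has a fixed number of workers drain a shared queue. The name `run_tasks` and its parameters are hypothetical, and dependency tracking is deliberately left out.

```python
import queue
import threading


def run_tasks(tasks, n_threads):
    """Run zero-argument callables on at most n_threads worker threads.

    Hypothetical helper for illustration only; it is not part of dav1d.
    """
    q = queue.Queue()
    for task in tasks:
        q.put(task)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = q.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker exits
            value = task()
            with lock:  # protect the shared result list
                results.append(value)

    workers = [threading.Thread(target=worker) for _ in range(n_threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

Results are collected in completion order, so a caller that cares about ordering has to sort them or tag each task with an index.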


Dav1d can execute the following task types:


- entropy-parse one tile-sbrow of entropy data;
- reconstruct (prediction+inverse transform) one tile-sbrow of frame data;
- deblock (in either direction) or cdef for one sbrow of frame data;
- loop restoration for one slice (~64 pixels high) of frame data;
- film grain synthesis for one slice (32 pixels high) of frame data.
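
To get a feel for the granularity these task types imply, the sketch below counts units of work per frame. It is a back-of-the-envelope model, not dav1d code: `work_units` is a hypothetical name, loop restoration slices are treated as exactly 64 pixels high, and tile boundaries are assumed to fall on superblock rows, so every sbrow contributes one tile-sbrow per tile column.

```python
def ceil_div(a, b):
    """Integer ceiling division for positive integers."""
    return (a + b - 1) // b


def work_units(frame_height, tile_cols, sb_size=64):
    """Count units of work per frame for each task type.

    Illustrative assumptions: sb_size is the superblock size (64 or 128),
    loop restoration slices are exactly 64 pixels high, and film grain
    slices are 32 pixels high.
    """
    sbrows = ceil_div(frame_height, sb_size)
    return {
        "entropy_tile_sbrows": tile_cols * sbrows,
        "reconstruct_tile_sbrows": tile_cols * sbrows,
        "deblock_cdef_sbrows": sbrows,
        "loop_restoration_slices": ceil_div(frame_height, 64),
        "film_grain_slices": ceil_div(frame_height, 32),
    }
```

Under these assumptions, a 1080-pixel-high frame with 4 tile columns and 64x64 superblocks has 17 sbrows, 68 tile-sbrows each for entropy parsing and reconstruction, 17 loop restoration slices and 34 film grain slices.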


Dav1d has some additional, administrative task types that blend these tasks together across multiple frames, and can therefore process multiple tiles, sbrows and frames at the same time. Concurrency is primarily limited by each task's dependencies, so it is critically important that these dependencies are minimal but complete, and that the scheduler orders tasks to maximize parallelism.
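
The effect of dependency precision on parallelism can be shown with a toy model. The sketch below groups tasks into "waves" that could run concurrently once all their dependencies have finished. The function `schedule_waves` and the example graphs are invented for illustration and do not reflect dav1d's real dependency graph or scheduler.

```python
def schedule_waves(deps):
    """Group tasks into waves of mutually independent, ready tasks.

    deps maps each task name to the set of task names it depends on.
    Hypothetical helper for illustration; not dav1d's scheduler.
    """
    remaining = {task: set(d) for task, d in deps.items()}
    waves = []
    while remaining:
        ready = sorted(t for t, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        waves.append(ready)
        for t in ready:
            del remaining[t]
        for d in remaining.values():
            d.difference_update(ready)  # these dependencies are now satisfied
    return waves


# Made-up graph: two sbrows, with parsing and reconstruction chained.
minimal = {
    "parse0": set(),
    "parse1": {"parse0"},
    "recon0": {"parse0"},
    "recon1": {"parse1", "recon0"},
    "deblock0": {"recon0"},
}

# Same graph with one unnecessary extra dependency on deblock0.
padded = dict(minimal, deblock0={"recon0", "recon1"})
```

With the `minimal` graph, `deblock0` can run alongside `recon1` and the work completes in three waves. The `padded` graph is still correct, but the superfluous dependency forces a fourth wave.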


Advantages of this threading design:


- simple user-facing configuration/API: just `--threads`, which means "how many cores should dav1d keep busy", and optionally `--maxframedelay`;
- extensible: we can add new types of concurrency without needing new CLI parameters or API changes;
- a user does not have to be aware of technical stream characteristics ("does it have tiles?").


Complications:


- the complexity sits in the scheduler, and there are a fair number of heuristic choices that are important but not necessarily obvious.