Dav1d multi-threading

To achieve concurrency, dav1d uses a task-queue design. At a high level, the user-provided stream of input OBU packets is split into multiple processing tasks, with a fine-grained dependency tracking mechanism. The user provides a limit on the number of worker threads, and each worker can then consume tasks from this pool.
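
As a rough illustration of the worker-pool idea described above, the following Python sketch runs a list of tasks on a bounded number of threads that all pull from one shared queue. It is not dav1d's implementation (dav1d is written in C); `run_tasks` and its parameters are hypothetical names, and dependency tracking is left out entirely.

```python
import queue
import threading


def run_tasks(tasks, n_threads):
    """Run zero-argument callables on at most n_threads worker threads.

    Hypothetical helper for illustration only; not part of dav1d.
    """
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)

    results = []
    results_lock = threading.Lock()

    def worker():
        # Each worker keeps pulling from the shared pool until it is empty.
        while True:
            try:
                task = pending.get_nowait()
            except queue.Empty:
                return
            value = task()
            with results_lock:
                results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The caller only chooses `n_threads`; which worker ends up running which task is decided at run time, so the order of `results` is not fixed.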

Dav1d can execute the following task types:

- [...]
- loop restoration for one slice (~64 pixels high) of frame data;
- film grain synthesis for one slice (32 pixels high) of frame data.

Dav1d has some additional administrative task types that tie these tasks together across multiple frames, and can therefore effectively process multiple tiles, sbrows and frames at the same time.

**Dependency mechanism:**

Concurrency is primarily limited by each task's dependencies, and therefore it's critically important that these are minimal-but-complete and that the scheduler orders tasks to achieve maximal parallelism.

Dependencies are in most cases simple progress-integers ("reconstruction of previous reference frame should be below X") that are checked against their reference. There is no `pthread_cond_wait()` in the code; instead, tasks with unmet dependencies are simply kept in the task-queue.
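
The progress-integer check can be sketched as follows. This is a single-threaded simulation under simplifying assumptions, not dav1d's scheduler: the `Task` fields, `is_ready` and `run` are hypothetical names, progress is counted in rows per frame, and a task whose dependency is unmet is simply skipped and left in the pending list rather than blocking.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    frame: int                       # frame this task belongs to
    row: int                         # row this task completes when it runs
    ref_frame: Optional[int] = None  # frame this task depends on, if any
    need: int = 0                    # rows of ref_frame that must be done


def is_ready(task, progress):
    """A dependency is one integer comparison against the reference."""
    if task.ref_frame is None:
        return True
    return progress.get(task.ref_frame, 0) >= task.need


def run(tasks):
    progress = {}          # frame -> number of rows completed so far
    pending = list(tasks)
    order = []
    while pending:
        for i, task in enumerate(pending):
            if is_ready(task, progress):
                break      # first runnable task wins
        else:
            raise RuntimeError("no runnable task: unmet dependency")
        del pending[i]     # tasks that were skipped stay in the list
        progress[task.frame] = max(progress.get(task.frame, 0), task.row + 1)
        order.append(task.name)
    return order
```

In this sketch nothing ever waits: an unmet dependency only means the task is passed over until a later scan finds its reference far enough along.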

Most tasks are implicit dependencies of others. For example, entropy-parsing of a tile's sbrow can only start when the previous sbrow of that tile has finished. Some of these dependencies are not tracked explicitly in the code. Instead, the tile-sbrow-entropy-parsing task "owns" itself and can put itself back in the queue (or continue) once the previous tile-sbrow has finished parsing. This mechanism to continue iterating without needing interaction with the task-queue (and thus without needing a lock) is one way to reduce overhead in this system, especially at high thread-counts.
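
The "continue without touching the queue" idea might look like the sketch below. It assumes a hypothetical `can_start_row` predicate and `parse_row` callback supplied by the caller; none of these names come from dav1d. The task keeps iterating over rows for as long as it is allowed to, and only hands control back (so that it can be re-queued) when it hits a row it cannot start yet.

```python
def parse_tile_rows(start_row, n_rows, can_start_row, parse_row):
    """Parse consecutive rows of one tile, starting at start_row.

    Returns the row to resume at if the task must be put back in the
    queue, or None once all rows of the tile have been parsed.
    Hypothetical helper for illustration only.
    """
    row = start_row
    # Rows are handled back to back inside this loop, so no queue access
    # (and no lock) is needed between one row and the next.
    while row < n_rows and can_start_row(row):
        parse_row(row)
        row += 1
    return row if row < n_rows else None
```

A caller would re-enqueue the task with the returned row whenever the result is not `None`.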

Tasks can also schedule their own implicit-dependency follow-up tasks. For example, when a deblock postfilter task at sbrow=0 starts, it plans to run cdef and restoration for that same sbrow itself. It also schedules deblock at sbrow=1 into the task-queue, because that task could not have started before this one had started. This effectively keeps the number of unschedulable tasks in the task-queue low, which again improves efficiency at high thread counts.
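
A simplified, single-threaded sketch of this follow-up scheduling is shown below. It assumes each sbrow is handled by one task that runs deblock, cdef and restoration in sequence, and it ignores the finer-grained dependencies between neighbouring sbrows; `run_postfilter` is a hypothetical name, not a dav1d function.

```python
from collections import deque


def run_postfilter(n_sbrows):
    """Simulate postfilter tasks that enqueue their own successor.

    Returns the executed (stage, sbrow) pairs and the largest number of
    tasks that were waiting in the queue at any one time.
    """
    if n_sbrows <= 0:
        return [], 0
    task_queue = deque([("deblock", 0)])  # only the first sbrow is queued up front
    log = []
    max_queued = 0
    while task_queue:
        _, row = task_queue.popleft()
        # The next sbrow's deblock is queued only once this task starts,
        # so it never sits in the queue before it could possibly run.
        if row + 1 < n_sbrows:
            task_queue.append(("deblock", row + 1))
        max_queued = max(max_queued, len(task_queue))
        # The same task carries on with the remaining stages of its sbrow.
        for stage in ("deblock", "cdef", "restoration"):
            log.append((stage, row))
    return log, max_queued
```

With this structure the queue never holds more than one waiting postfilter task in the simulation, however many sbrows the frame has.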

**Advantages of this threading design:**

- simple user-facing configuration/API: just `--threads`, which means "how many cores should dav1d keep busy", and optionally `--maxframedelay`;
- extendible: we can add new types of concurrency without needing new CLI params or API changes;
- a user does not have to be aware of technical stream characteristics ("does it have tiles?").

**Complications:**

- the complexity sits in the scheduler, and there's a fair number of heuristic choices that are important but not necessarily obvious.