tests/bench: flush after every iteration
The whole point of pl_gpu_flush() is to commit work to the GPU and rotate queues. There is literally no point in holding on to work across iterations like this, besides saving a minuscule amount of submission overhead. Holding on to it also completely destroys the parallelism that we get from async compute.