Refactor and simplify VkSemaphore usage
Greatly reduces the runtime complexity of command submission by using only one semaphore per command. In theory, we could go further and use only one semaphore per queue, but that is much harder to do in a thread-safe way: our command buffers are thread-bound, but queues are not.