Optimization: reduce time taken by MP4_TrackClean
Some users report that closing the mp4 demuxer takes a few seconds on Windows.
The sample file has 70 000+ chunks, one sample per chunk.
This is not best practice for the mp4 container, but it is not uncommon.
The CPU hot spot is DestroyChunk(). It can be fixed by using a single memory
allocation for each pair of arrays (p_sample_count_dts/p_sample_delta_dts and
p_sample_count_pts/p_sample_offset_pts), and by using a few bytes of padding
in mp4_chunk_t to avoid memory allocation.
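
A minimal sketch of the idea, not the actual patch. chunk_sketch_t,
SMALL_ENTRIES, small_dts_buf, chunk_alloc_dts() and chunk_free_dts() are
hypothetical names used only for illustration. Only the field names
i_entries_dts, p_sample_count_dts and p_sample_delta_dts mirror mp4_chunk_t.
The PTS pair would be handled the same way.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed inline capacity, in (count, delta) entries. */
#define SMALL_ENTRIES 1

/* Hypothetical, simplified stand-in for mp4_chunk_t (DTS tables only). */
typedef struct
{
    uint32_t  i_entries_dts;
    uint32_t *p_sample_count_dts;
    uint32_t *p_sample_delta_dts;
    /* Inline storage: SMALL_ENTRIES counts, then SMALL_ENTRIES deltas. */
    uint32_t  small_dts_buf[2 * SMALL_ENTRIES];
} chunk_sketch_t;

/* Reserve room for i_entries (count, delta) pairs. Returns 0 on success. */
static int chunk_alloc_dts(chunk_sketch_t *ck, uint32_t i_entries)
{
    uint32_t *p_buf;

    assert(ck != NULL);
    if (i_entries <= SMALL_ENTRIES)
    {
        /* Small chunk: use the bytes inside the struct, no heap call. */
        p_buf = ck->small_dts_buf;
    }
    else
    {
        /* One allocation backs both arrays instead of two. */
        p_buf = calloc(i_entries, 2 * sizeof(*p_buf));
        if (p_buf == NULL)
            return -1;
    }

    ck->i_entries_dts = i_entries;
    ck->p_sample_count_dts = p_buf;
    ck->p_sample_delta_dts = p_buf + i_entries; /* second half of the block */
    return 0;
}

/* Release the tables; at most one free() per pair, none for small chunks. */
static void chunk_free_dts(chunk_sketch_t *ck)
{
    assert(ck != NULL);
    if (ck->p_sample_count_dts != ck->small_dts_buf)
        free(ck->p_sample_count_dts);
    ck->p_sample_count_dts = NULL;
    ck->p_sample_delta_dts = NULL;
    ck->i_entries_dts = 0;
}
```

With one sample per chunk, every chunk takes the inline path in this sketch,
so teardown performs no free() calls for these tables. Because the pointers
may refer to storage inside the struct itself, a chunk laid out this way must
not be copied by value without fixing up the pointers.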