Your theory is sound. However, you did not listen to the two recordings; they are unquestionably sonically different. Confirmed on a Hugo 2 and Sennheiser HD820. Of course, if you pre-render and unpack the two files, they will perform identically. However, this is not how streamers work: they render only a few seconds in advance.
Who else can hear the difference? https://www.dropbox.com/sh/b84ilck2lee00ws/AABP8D9-QSGiqSEg3NdjmhMIa?dl=0
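
For anyone who wants to check the "identical once unpacked" part before listening, here is a minimal sketch in Python. It assumes both files have already been decoded to plain PCM WAV, and the function names and file names are placeholders of mine, not the names in the Dropbox folder. It only shows whether the decoded samples match; it says nothing about how a streamer behaves when it decodes on the fly.

```python
import hashlib
import wave


def pcm_digest(path):
    """Return (format parameters, SHA-256 of the raw PCM frames) for a WAV file."""
    with wave.open(path, "rb") as wav:
        params = (
            wav.getnchannels(),   # e.g. 2 for stereo
            wav.getsampwidth(),   # bytes per sample, e.g. 2 for 16-bit
            wav.getframerate(),   # e.g. 44100
            wav.getnframes(),     # total number of frames
        )
        frames = wav.readframes(wav.getnframes())
    return params, hashlib.sha256(frames).hexdigest()


def same_audio(path_a, path_b):
    """True if both WAV files share format parameters and identical sample data."""
    return pcm_digest(path_a) == pcm_digest(path_b)
```

Calling `same_audio("a.wav", "b.wav")` with the two unpacked files (placeholder names) returns `True` only when the decoded audio is bit-for-bit the same, regardless of any difference in how the originals were packed.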