This link is spot-on, particularly this part:
This issue is technical but also standards-related. The popular MP3 standard, for example, defines no way to record the amount of delay or padding for later removal. Encoder delay may vary from encoder to encoder, making automatic removal difficult. Some encoders use a nonstandard header to store actual encoder delay & padding values, but not all players/decoders support it. More recent (newer than MP3) compressed audio formats have been designed to address this problem, and can therefore produce gapless audio if played back correctly.
Because of MP3’s history with patents, we don’t ship our own MP3 handling code; instead, we rely on the decoder that comes with the operating system. As a result, we inherit whatever gapless behavior that decoder has.
The MP3 patents expired a couple of months ago, so we are investigating moving to a single MP3 codec shipped as part of Roon, instead of relying on multiple operating-system-specific implementations with slightly different behavior.
If we do this, we will try to use one that supports the unofficial gapless conventions.
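For anyone curious what those unofficial conventions involve, here is a rough sketch, not Roon's code. It assumes the LAME-style layout, where a 3-byte field in the nonstandard header packs the encoder delay into the upper 12 bits and the padding into the lower 12 bits. The function names are made up for illustration, and the sketch ignores the extra delay that the decoder itself adds, which a real player would also have to compensate for.

```python
def unpack_delay_padding(field: bytes) -> tuple[int, int]:
    """Split a 3-byte field into (delay, padding), both counted in samples.

    Assumed layout (LAME-style): upper 12 bits = encoder delay,
    lower 12 bits = end padding.
    """
    if len(field) != 3:
        raise ValueError("expected exactly 3 bytes")
    value = int.from_bytes(field, "big")
    return value >> 12, value & 0xFFF


def trim_gapless(samples, delay, padding):
    """Drop `delay` samples from the start and `padding` from the end.

    `samples` is any sliceable sequence of decoded samples for one channel.
    If the two counts together exceed the length, the result is empty.
    """
    end = len(samples) - padding
    return samples[delay:max(end, delay)]
```

A player that trims every track this way before handing audio to the output can play consecutive tracks back to back without the encoder-introduced silence. Files without such a header give it nothing to work from, which is why behavior differs between decoders.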