OK, but without some definition of ‘low latency’ it is hard to give a meaningful answer.
Some examples: I am a keyboard player, and to keep good musical timing I personally like the round trip from pressing a key to hearing the note from a software synthesizer on my computer to be under 10ms. If I am playing electronic drums (a Roland V-Drum kit, for example), I prefer that to be closer to 5ms, which is getting quite extreme for a general-purpose computer with a USB or other audio interface, especially as the drum sensors need a millisecond or so as well.
When I used to DJ, I had a similar need so I could drop a track exactly in time with what I was hearing, because it is really important to avoid any audible beat mismatch.
These are relatively hard low-latency needs compared to anything it is reasonable to demand for simply listening to music.
When pressing play on a ripped file or a stream, 250ms is fast - more than fast enough. A typical internet radio stream usually demands some buffering to preserve a continuous stream - a full second from pressing play to audio is not uncommon. The reason is that the data is being emitted in real time, so you can’t really read ahead like you can with a local or NAS file, or with the major streaming service APIs.
I don’t get why low latency in listening to music is anything but a bad thing.
Low latency basically puts everything more at risk of dropouts, glitches etc. It also demands higher CPU use. 5-10ms latency is quite relaxed, but still quite sensitive to network issues. Around 100ms latency is a good compromise for non-specialised audio over LAN IP - i.e. the scenario where the user has no hard need for a timely reaction and basically just presses play/pause/next/skip etc.
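To make the trade-off above concrete, here is a back-of-the-envelope sketch (the numbers are mine, for illustration, not any product’s specs): latency is essentially buffer size divided by sample rate, and a smaller buffer means the CPU must be woken and serviced more often per second, with less slack for anything else going on.

```python
# Rough relationship between audio latency, buffer size, and CPU wakeups.
# Purely illustrative arithmetic, assuming a 48 kHz sample rate.

def buffer_frames(latency_ms: float, sample_rate: int = 48_000) -> int:
    """Frames the audio buffer must hold to cover a given latency."""
    return round(latency_ms / 1000 * sample_rate)

def wakeups_per_second(latency_ms: float) -> float:
    """How often the buffer must be refilled - why low latency costs CPU."""
    return 1000 / latency_ms

for ms in (1, 5, 10, 100):
    print(f"{ms:>4} ms -> {buffer_frames(ms):>5} frames, "
          f"{wakeups_per_second(ms):>6.0f} refills/sec")
```

At 1ms you are refilling a 48-frame buffer a thousand times a second; at 100ms a 4800-frame buffer only ten times a second, which is why the relaxed setting tolerates so much more interference.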
Reliable 1ms latency is really hard to achieve on a general-purpose OS: it typically requires high CPU use and is very much at the mercy of whatever other I/O is going on in the computer. In fact, every example I can think of where this is achieved reliably involves DSP hardware and/or programmable logic instead of regular software running on a regular CPU. RME, for example, uses field-programmable gate arrays for the USB audio implementation in many, maybe even all, of their professional audio interfaces, because that is the only way to achieve this reliably. Accordingly, my RME UFX is the only one of my many audio interfaces and DACs with which I can achieve sub-3ms round-trip latency.
I get the sense that ROCK is reasonably optimised for its sensible use case - i.e. playing audio in a timely and reliable manner when someone presses play, and supporting browsing and search of media.
When you press play, Roon makes a reasonable gamble: it buffers enough data up front that playback will stay seamless long enough for it to start rapidly sending a lot more audio data and get well ahead of the audio stream, so playback remains seamless through even quite significant network issues. This is how most decent media players that have to deal with audio data over an IP network work.
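The gamble described above can be sketched as a toy simulation (illustrative only; the function names and numbers are mine, not Roon’s actual implementation): fill a small initial buffer, then fetch faster than real time while the network is up, and playback only glitches if a dropout outlasts the accumulated head start.

```python
# Toy model of read-ahead buffering for audio over an IP network.
# Hypothetical parameters, chosen for illustration.

def survives_dropout(start_buffer_s: float, fetch_rate: float,
                     dropout_at_s: float, dropout_len_s: float) -> bool:
    """Return True if playback stays seamless through a network dropout.

    fetch_rate: seconds of audio fetched per second of wall-clock time
    while the network is up (>1 means reading ahead of playback).
    """
    buffered = start_buffer_s            # seconds of audio in the buffer
    t, step = 0.0, 0.1
    while t < dropout_at_s + dropout_len_s:
        in_dropout = dropout_at_s <= t < dropout_at_s + dropout_len_s
        if not in_dropout:
            buffered += fetch_rate * step    # network delivers audio
        buffered -= step                     # playback drains in real time
        if buffered <= 0:
            return False                     # buffer ran dry: audible glitch
        t += step
    return True

# A 0.5 s head start plus 4x read-ahead rides out a 3 s dropout at t=2 s,
# while fetching only at real-time speed (1x) does not:
print(survives_dropout(0.5, 4.0, 2.0, 3.0))
print(survives_dropout(0.5, 1.0, 2.0, 3.0))
```

The design point is the same one made above: the player races ahead of the stream while it can, so that network hiccups are absorbed by buffered audio instead of being heard.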
Why do you need low latency again?