Lowest-hassle HQPlayer setup to feed multi-driver digital-crossover system over RAVENNA / AES67?

I’ve seen bits of discussions in multiple threads across multiple years, but (if it’s not too rude to start a fresh thread about this), I’d really like to get clued-in about current best practices given the current state of software and available hardware.

My plan is to migrate a system which currently has a two-driver horn speaker and corner woofers from crossovers and room correction via a TacT RCS-2.2X to what seems like the way the cool kids are doing such things now: tweeters, midrange, and woofers all on separate channels coming out of a Merging Hapi III over a DA8V3P DAC card, with lovingly crafted crossover and room correction filters generated with Acourate and running server-side.

I get the impression that up- (or down- if necessary) sampling, then busting stereo into three channels per side and running the potentially-heavyweight Acourate-derived convolution stuff, then sending via RAVENNA to the Hapi is all something supported by HQPlayer these days?

I’ve only ever used HQPlayer with pure stereo, so the multiple channels are new to me (and I can’t practice until the backordered hardware starts arriving).

I hear that Roon can also handle feeding the Hapi directly, but once I set that up I suspect I'd wish I had HQPlayer filters as an option. So I've been thinking I might prefer to have a dedicated machine sitting on the AES67 VLAN (will this be a problem?) running HQPlayer, fed by Roon (possibly over an additional LAN interface on the Roon core machine dedicated to the AES67 VLAN), with that HQPlayer instance dedicated to accepting a stream from Roon, doing the processing, and then feeding the Hapi six channels derived from the incoming two.

If this overall topology makes sense (please point out anywhere I’ve made a dumb error!), the next thing to plan is what kind of HQPlayer server to set up.

Ideally, I’d like to be able to feed all six channels with 352.8k / 384k PCM (following the source’s rate family), although 176.4k / 192k would be fine if I can do that reliably with easier-to-source hardware. If it’s practical to try delivering DSD256 to all channels by splurging a bit more on the computer hardware buy, that would be cool to try out later, but I’m not considering it a base requirement.

If, say, a current M4 Mac Mini is actually sufficient to run this HQPlayer instance – well, that’d really be welcome, because any time I can avoid a PC-building exercise I’m really happy to do so. Is the Ravenna Virtual Audio Device for Mac a pretty mainstream, reliable, M4-compatible thing? Is HQPlayer Desktop on a Mac like this likely to run reliably, like start-it-and-forget-it infrastructure? I’ve used HQPlayer Desktop on my personal desktop Mac and connections occasionally got dropped, but I don’t know whether that was because I was adding user load to the machine or because the macOS scheduler isn’t ideally suited to this use.

If the easy solution of a little Mac turns out to be suboptimal, I’ll build a box of PC hardware. I’m assuming I need an M.2 SSD to boot from, no big file storage disk, a fast multicore CPU, and… a GPU?

In a setup like this (stereo resampling of PCM, 6 channels of convolution), do we still want an i9-14900K, or is the job so parallel that an AMD processor (which?) might actually be better? If Intel, do we trust the i9-14900K not to have throttling-related flakiness? Do we need / want an added GPU?

If we need a GPU card, what are some sensible choices currently available which HQP is ready to use? Among those, is there a more limited set suitable for use with Linux if HQPlayer Embedded is what gets deployed to do this?

So, on that last option, OS environment for HQPlayer. I really do pretty strongly prefer to deploy Linux machines for any 24/7 computing infrastructure in the house, but if setting up the RAVENNA drivers and keeping them maintained, and keeping GPU support available to HQPlayer Embedded, is a hugely frustrating exercise – I’d be willing to fire this up as a nasty Windows machine, if that’s likely to be less frustrating.

Or… is this job and the hardware discussed among the things HQPlayer OS is suited for?

So those are my many, many questions. I know it’s unreasonable for me to expect someone to answer every single one of them and handhold me through this whole process, but any clarification anyone can offer on any of the things I’ve asked about will be accepted gratefully.

-Jeff

Doable.

HQPlayer just sees Merging VAD as a CoreAudio output. Not that complex.

If you have only one Merging device, you can connect it directly to the Mac mini M4 without a switch. But keep in mind that if you purchase the 10GbE version of the M4, you have to buy a Sonnet Thunderbolt AVB adapter for it: Apple’s built-in 10GbE interface is not compatible with Merging VAD (a PTP issue). If you plan to let Roon use a physical Ethernet connection, buy another Sonnet Thunderbolt AVB adapter for that as well. The golden rule is: don’t mix the generic LAN with the RAVENNA network.

The M4 is suitable for multichannel PCM upsampling. I use a Mac mini M1 for 12-channel DXD plus convolution, for what it’s worth. If you’d like to try multichannel DSD upsampling, you’re better off with a high-performance PC from a cost-effectiveness perspective.

A GPU is recommended. One day you’ll use your new system to watch some video content, and then GPU offloading (acceleration for HQPlayer) is crucial.

In my experience the 3090 is suitable for such a job. The 4090 is a balanced price/performance choice. I don’t have a 5090, so no comment there.

I’ve run HQP on Mac, Windows and Linux. HQPlayer Embedded running on a headless Linux server consistently has the highest performance of the three.

HQPlayer OS does not include the CUDA driver. You have to build a Linux server yourself and install the NVIDIA driver to get GPU offloading.

This is my full AoIP-implemented HQPlayer server system with SMPTE ST2022-7 redundancy:

Built exclusively for multichannel / immersive audio (including gaming and home cinema). :wink:

3 Likes

Thank you for taking the time to reply in so much useful detail!

I’m sure I’ll have further questions, but I’ll try to find as many of them as practical in documentation. You’ve helped me know what sorts of things I need to figure out next.

1 Like

Yes, with the matrix processor you can route any input channel to any output channel you like (max 128 output channels).

For PCM output, the recommendation is to have at least as many P-cores as you have output channels. For DSD output, at least two P-cores per output channel. In particular, if you’d like to do convolution things on the DSD → DSD path for more than two channels, I’d recommend a powerful Nvidia GPU to help…
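As a back-of-the-envelope illustration of that rule of thumb (just the sizing guideline above expressed as a toy calculation, nothing HQPlayer itself runs):

```python
# Rule-of-thumb core sizing: >= 1 P-core per output channel for PCM,
# >= 2 P-cores per output channel for DSD output.
def min_p_cores(output_channels: int, dsd_output: bool) -> int:
    return output_channels * (2 if dsd_output else 1)

print(min_p_cores(6, dsd_output=False))  # 6  -> a 3-way stereo rig at PCM output
print(min_p_cores(6, dsd_output=True))   # 12 -> the same rig at DSD output
```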

AMD can give you 16-P-core models, although you don’t get the E-cores then. The i9-14900K gives you 8 P-cores and 16 E-cores (which can be used similarly to a GPU).

For multichannel, AMD + big Nvidia GPU may be preferable.

RAVENNA Linux drivers are a pain to deal with, but Linux is otherwise the strongly recommended environment. So one option is to run the NAA on Windows or macOS, where the RAVENNA driver situation is better, and leave the heavy lifting to a Linux machine (this is what I mostly did with the Hapi).

Thank you! I’ve been a bit unclear about the plumbing involved for this project, but it’s starting to become clearer the more I read, definitely helped by @Chunhao_Lee and you.

I have a little Mac with 10 M4 P-cores (and a pair of the special blessed Sonnet network gizmos) on the way, which I’m prepared to dedicate to feeding these six drivers. If I move my HQPlayer Desktop license to it and it can support feeding all six drivers PCM, with convolution, at 384k, or 192k, or, hell, even the 96k the system this is replacing ran at… I’ll have the base functionality I need, and reaching for madness like six channels at DSD256X is something I can think about trying as a later wacky project in its own right.

I’m still trying to clarify in my head the order of operations inside an instance of HQPlayer. Is it… applying the configured oversampling filter to the input channels (two of them, in my case), then the matrix pipeline duplicating those two upsampled channels into six, then convolution happening on all those channels as set up in Matrix->Convolution setup?

Would my matrix pipeline channel assignments look something like this?

And then… would the convolution happen after that, operating on the channels as populated by the matrix pipeline setup?

I note that channels are numbered in matrix pipeline, but named for popular surround speaker locations in the convolution setup. Would the numerical IDs for channels in convolution setup be inferred from this list I found in the documentation, but with 1 added to each index number to start counting at 1 instead of zero?

Thanks for your patience! I feel I’m gradually getting less confused. Or maybe that’s wishful thinking.

Yes, that’s an appropriate pipeline setup for your 3-way stereo system.

Channel numbering always starts at 1 in HQPlayer. I know it’s painful, especially since FIR-producing software always starts at 0. You need to get used to it.

The matrix happens at the source rate, except for the DSD → PCM case, where the processing happens at 1/16th of the source rate. The post-process part happens at the output rate, though.
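(Worked example: DSD256 in the 44.1k family is 44,100 × 256 = 11,289,600 samples per second, so for such a source the matrix stage would run at 11,289,600 / 16 = 705,600 Hz.)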

Yes, that’s the case.

The simple convolution engine shouldn’t be enabled together with the matrix processor. It is just a simpler setup for straightforward room correction, not for multi-way speaker crossovers or anything like that. For that you have convolution and much more in the matrix processor.
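Conceptually, the matrix routing plus per-channel convolution for your 2-in / 6-out case boils down to something like this toy numpy sketch (an illustration of the idea only, not HQPlayer’s actual engine; the channel order and filter data are made up):

```python
import numpy as np
from scipy.signal import fftconvolve

# Hypothetical channel order: 1/2 tweeter L/R, 3/4 mid L/R, 5/6 woofer L/R.
left, right = np.random.randn(2, 352800)            # stand-in for 1 s of upsampled source audio

routing = {1: left, 2: right,                        # tweeters
           3: left, 4: right,                        # mids
           5: left, 6: right}                        # woofers

firs = {ch: np.random.randn(65536) for ch in routing}   # stand-ins for Acourate crossover FIRs

outputs = {ch: fftconvolve(routing[ch], firs[ch], mode="same")   # each channel gets its own filter
           for ch in routing}
```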

Oh, excellent! I failed to find that in the interface. I need to look better.

Ah… so that’s what those Browse buttons are!

You can use the Browse button to select a convolution filter WAV or parametric EQ TXT file. Please remember that each selected file is appended to the Process entry, so you can have multiple convolution filters, parametric EQ descriptions, delay plugins, etc. on the Process line. If you want to change one of the files, remember to remove the previous one from the Process entry first! Otherwise you end up with multiple items there.
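For example (file names made up): if the Process entry already lists xo_tweeter_left_v1.wav and you browse to xo_tweeter_left_v2.wav without removing the old entry first, both filters stay on the Process line and both remain in effect.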

If you want to just update a convolution filter, you don’t need to do anything. Only information about the file location is stored, not the file contents. The file is read again every time you start playback.

1 Like

I’m really excited to get to try all this, after the hardware arrives!

My Hapi III hasn’t shipped yet. It seems to be something the retailers I’m used to dealing with don’t stock routinely, but have to get from the distributor when somebody orders one.

Oh, speaking of something I’d muddle through figuring out with the hardware available to play with, but which I’m curious about and will ask of people who’ve actually used these devices:

Are these reasonably nimble at changing sampling rate track-by-track, or do they like to get locked to a sampling rate and stay there?

Because I’ve just been assuming playback would toggle between 384k and 352.8k, depending on the rate family of whatever track is being played, but if something like the Hapi doesn’t change rates nimbly, clearly I’ll be picking something for the converter to run at all the time (384 “because higher better” or 352.8 because more stuff in my library is sampled at 44.1 or occasionally a multiple of that).

Actually, I haven’t even spent much time listening to music resampled by HQPlayer from one rate family to a multiple of the other common base rate, because so far everything in my playback chains has worked fine with “auto rate family” enabled, so that’s what I’ve been using because it seemed likely to be optimal. Does resampling across rate families sound a little less good, or is it more computationally intensive, or is auto rate family an option just because only some oversampling filters can accommodate such ratios?

With RAVENNA, changing the rate is a rather big operation, so I’d rather run it at a fixed rate instead of switching between rates (which is possible with PCM, though).

DSD is only supported at multiples of 44.1k.

I would keep it at a fixed 352.8k for the reason you mentioned, and also because it is then lighter for DSD down-conversion as well. Also, conversion from the 48k family to the 44.1k family is lighter than the other way around. I also recommend having the convolution filters created for the 352.8k rate; then it is most suitable for the widest range of material.

It doesn’t sound any different, and the quality is just as good. But it is heavier, mostly due to higher memory load, not because it would be heavier on the actual calculations. Some filters cannot do the conversion, but I don’t find this an issue, since I personally use filters that can convert from practically any rate to any other rate.
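To put rough numbers on “heavier”, here is the arithmetic of the conversion ratios for a fixed 352.8k output (just illustrative; nothing HQPlayer-specific):

```python
from fractions import Fraction

# Ratios needed to reach a fixed 352.8k output rate:
print(Fraction(352800, 44100))   # 8      -> same rate family, plain integer ratio
print(Fraction(352800, 48000))   # 147/20 -> cross family, a factor of 147 appears
print(Fraction(352800, 96000))   # 147/40
print(Fraction(352800, 192000))  # 147/80
```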

2 Likes

I personally always fix the output rate at 352.8 kHz no matter what the source sampling rate is:

  1. Every time the sampling rate changes (for example, 352.8 ↔ 384), RAVENNA/AES67 must re-lock to the PTP master clock. That takes roughly 3 s on average, and the gap during the switch can degrade the listening experience.
  2. Merging devices with DSD capability only support 44.1 kHz-based DSD256, so fixing the rate at 352.8 kHz helps there too (see the arithmetic below).
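(The arithmetic behind point 2: 44.1 kHz-based DSD256 is 44,100 × 256 = 11,289,600 samples per second, which is exactly 352,800 × 32, so a fixed 352.8 kHz rate keeps the DSD conversion ratio a clean power of two.)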
2 Likes

Slightly off-topic: the NVIDIA 5080 is significantly better with HQP than the 3090 or 4090. I’ve owned all of them, and even though it has fewer CUDA cores than the 4090, the GDDR7 helps so much that the 4090 doesn’t stand a chance. So if you ever get a GPU, the 5080 is what you need; no need for the 90 at all.

Edit: and you don’t need a 5080 if DSD isn’t the goal.

2 Likes

Then that is what I’ll aim to do! The more I learn from you guys, and the longer I mull it over, the better I know what questions to ask next. It’s good to know where there are constraints that make the choices clearer.

1 Like

And I thought I was alone in downscaling from DSD256! I can confirm that with last year’s equipment (a 14700K with a 4070-Ti-Super, basically the original 4080), downscaling from DSD256 in the 44 family to the 48 family at 96k does not work. Whether it works to 88k or 176k depends on the HQP filters chosen: the trusty Gauss family works fine, but even sinc-long-h glitches some. Now, this is digging deeper than you are talking about, but 176k is a nice sweet spot, allowing equipment at lower price points to be deployed downstream. I might try again next year to reach 96k with new equipment.

I only run three-way actives, so I can use BACCH4Mac on an M4 for crosstalk and room correction as well as for the active crossover. The Mac runs NAA, fed from an Ubuntu server running HQPlayer Embedded. The profiles function that Jussi added to the HQPe configuration page makes it a lot easier to swap between setups. I use the same HQP server for sixteen-channel Atmos in WAV; in that case I send analog from an Audient Oria to an SPL MC16. For stereo three-way active, I pass 24/96 (or 24/88) through the Oria via AES to three DACs and then on to an RStAudio VV8 preamp.

I was using USB for all this, but am using Dante now. I’ve given RAVENNA / AES67 a thought, but have not yet tried it.

A neighbour runs four-way crossovers. For him, Uli B loaded Acourate filters into a PC-based Roon server. It worked fine until the neighbour changed everything again, so he is back to using Uli B’s convolver software. I would rather use Roon. If I understand right, ROCK does not support this, though; you may need Roon on a Mac or on a NUC, for example.

1 Like

The 4070S is not even close to the 4080. The 4070TiS is essentially the same, a binned 4080 but with a cut-down chip, meaning fewer CUDA cores.

Apologies. Good catch. I updated my text above. I have a 4070-Ti-Super.

1 Like

So. As mentioned earlier, after absorbing the excellent advice from @jussi_laako and @Chunhao_Lee, I’ve given up on the idea of 6 x DSD256 out as impractical, a pipe dream to maybe strive to approach as part of some later masochistic project, but probably not. If I can keep this system fed with a steady, reliable supply of 352.8k PCM, I’ll be a happy camper.

To that end, I got in a little Mac Mini with 10 P-cores (and the special approved Sonnet network interfaces) to use as infrastructure dedicated to running HQPlayer Desktop and feeding six channels to this one particular system. I set it up, spent what seemed like hours chasing down unneeded Apple cloud and local services and turning them off, and have been soak-testing it driving one of the plain 2-channel stereo systems here for days at a time. It seems usably stable, and the overhead when feeding just 2 channels of 352.8 is tiny. I haven’t yet found / made some representative 352.8k filter files to have HQP use, to get an idea of the load with convolution, so obviously I haven’t yet tested splitting out and DSPing 6 channels and sending them out to /dev/null to get a notion of that multichannel load, although I plan to.

But what I have noticed with this setup, even just running in stereo, is that feeding in DSD256 sources gets really expensive, both in measured CPU utilization and in watts turned to heat and dissipated. Individual CPU cores get up to high 90s or a little over 100°C, and that tiny Mac Mini case gets uncomfortably warm to the touch. And while those may still be safe operating temperatures for the equipment, I’d find it stressful to know that that was going on for hours at a time.
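(For harder numbers than touching the case, macOS’s built-in powermetrics tool can report package power and per-cluster frequencies while HQPlayer is running, e.g. sudo powermetrics --samplers cpu_power; check man powermetrics for the exact sampler names on your OS version.)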

Which led me to realize that by the time I’d climbed the upgrade trail to this 14-core version of the Mini, I’d gotten absurdly close to the cost of a Mac Studio with the same core complement, a bit better memory bandwidth, and (importantly to me) way more robust thermal dissipation.

So. Despite the work I’ve already put in getting this Mini lean and mean and working, I think that I’ll return it before that window closes and get a Studio to do this job.

But then… devil on my shoulder…
I really really don’t want to pay the obnoxious additional increment for the M3 Ultra version of the Studio, but this is starting to look to me like the kind of heavily parallelized job that damned thing is actually suited for.

@jussi_laako, what’s a better tool for the job of running six simultaneous channels of potentially heavy convolutions, up/cross-sampling to 352.8k, and dealing with converting incoming stereo sources arriving at up to DSD256x48?

2 M4 P-cores per output channel, and access to some part of a pool of 4 M4 E-cores,

or

3 M3 P-cores per output channel plus 2 P-cores to spare, access to some part of a pool of 8 M3 E-cores, and even better systemwide memory bandwidth?

I want to be carefree and confident that the destination system will never get starved for data because of insufficient computational resources, and now is the time for me to get this right.

Thanks as always…

I’d say the Mini can handle it too, because PCM isn’t very demanding, as far as I know. Others can confirm this, though, as it’s been years since I last used PCM. Those more powerful Macs will certainly handle DSD256, even if you plan to run it as multichannel. Apparently, it has to be Apple, and a PC is out of the question? Because then you’d get by much cheaper…