How does processing speed work?

I’m upsampling to DSD512 with a convolution filter of 1M taps. Processing speed is 1.4x, but overall CPU usage is 7% and there’s not a single thread that goes over 15%. So the CPU headroom is still HUGE, but this doesn’t seem to be reflected in the processing speed.
How does it work?

From what I understand, DSP uses a single core per endpoint. A processing speed of 1.4x means that one core is busy about 71% of the time (1 / 1.4).
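For what it’s worth, here is that arithmetic as a rough Python sketch. It assumes the reported speed is simply how many times faster than real time the DSP runs on its one core. The function names and the 12-thread figure are made up for illustration and are not taken from the software itself.

```python
def core_load(processing_speed: float) -> float:
    """Fraction of one core's time spent on DSP.

    Assumption: processing_speed is the ratio of audio duration to the
    time needed to process it (1.0x = exactly real time).
    """
    if processing_speed <= 0:
        raise ValueError("processing speed must be positive")
    return 1.0 / processing_speed


def overall_cpu_share(processing_speed: float, logical_cores: int) -> float:
    """Share of the whole CPU when only one core is doing the DSP work.

    logical_cores is a hypothetical parameter: the number of hardware
    threads the OS averages over when it reports total CPU usage.
    """
    if logical_cores < 1:
        raise ValueError("need at least one logical core")
    return core_load(processing_speed) / logical_cores


print(f"{core_load(1.4):.0%}")              # 71% of one core
print(f"{overall_cpu_share(1.4, 12):.0%}")  # 6% of a 12-thread CPU
```

So a low overall CPU figure can sit alongside a processing speed that is close to the limit: the total is averaged over every core, while the DSP is bound by just one of them.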

Much below 1.4x you start to get stutters and dropouts, regardless of which DSP operations (upsampling, convolution, etc.) are in use.

Raw clock speed is your friend here, not core count. Older chips are probably still less capable, even at 4 GHz.

I upsample everything to DSD512.
My speed is reported as 1.2x to 1.3x and it has never ever missed a beat.
I think I am probably right on the threshold here…
It’s an older NUC 7i3, so I’m pretty impressed by how well it hangs in there.