High resolution audio

Admin

Administrator
Moderator
Messages
3,677
#1
CD quality lets you reproduce frequencies below 22.05 kHz at a signal-to-noise ratio of up to 96 dB. That is actually pretty good considering the CD format was released in 1982. It's very difficult to fully utilize the signal-to-noise ratio CD offers even today.


The Topping 90se has a signal-to-noise ratio of 129.5 dB, which is equivalent to about 21.5 bits.
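The bit figures follow from the rule of thumb that each bit adds about 6.02 dB (20·log10(2)) of dynamic range. Here is a minimal Python sketch of that conversion, assuming this simple rule (it ignores the extra 1.76 dB sometimes added for a full-scale sine); the function names are only illustrative:

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB per bit


def bits_to_db(bits):
    """Dynamic range in dB for a given word length."""
    return bits * DB_PER_BIT


def db_to_bits(db):
    """Equivalent number of bits for a given signal-to-noise ratio in dB."""
    return db / DB_PER_BIT


print(round(bits_to_db(16), 1))     # 96.3 (CD, 16 bits)
print(round(db_to_bits(129.5), 1))  # 21.5 (the DAC figure quoted above)
```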

One potential way to utilize more than 20 bits is to have the track contain very loud low-frequency sounds, potentially allowing 130 dB of peak volume without damage to the ears. You would also need a very quiet room for that (ideally less than 0 dB of noise).

The ultrasonic content that high-resolution audio often contains can, however, make an audible difference.


I did my own test to confirm this (XY blind test):

Test0: I preferred 192 kS/s to 48 kS/s 9 of 10 times (p = 0.02148)
Test1: I preferred 44.1 kS/s to 96 kS/s 30 of 43 times (p = 0.01372)
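These p-values are consistent with an exact two-sided binomial test against pure guessing (a 50/50 chance per trial). That the author used exactly this test is an assumption; the sketch below just shows one way to arrive at the same numbers:

```python
from math import comb


def two_sided_binomial_p(successes, trials):
    """Exact two-sided p-value against a 50/50 guessing null.

    With p = 0.5 the distribution is symmetric, so the two-sided value is
    twice the tail at least as extreme as the observed count (capped at 1).
    """
    extreme = max(successes, trials - successes)
    tail = sum(comb(trials, k) for k in range(extreme, trials + 1)) / 2 ** trials
    return min(1.0, 2 * tail)


print(f"{two_sided_binomial_p(9, 10):.5f}")   # Test0 -> 0.02148
print(f"{two_sided_binomial_p(30, 43):.5f}")  # Test1 -> 0.01372
```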

Test1 was a piano track, and while the 96 kS/s version did sound more like a piano, I actually liked it better when the ultrasonics were cut off. This illustrates that higher accuracy will often not be more pleasing to listen to.

The manufacturer of my speakers claims they reach 32 kHz within ±3 dB, but I have not looked at any independent measurements.
 

#2
The loudness war
CD versions in particular tend to be ruined in the mastering process for the sake of loudness.


Typically a "high resolution" version will be mastered a lot better, and it will generally sound far better even if you convert it to CD quality.
 

#3
Looking into the meta-analysis showing high resolution audio to be technically superior
The study did show that people without training had a very hard time hearing a difference but with training it was possible.

1629710223898.png
 

#5
Are DAC/ADC filters to blame?
With CD quality you only have a band of about 2 kHz (from roughly 20 kHz up to the 22.05 kHz Nyquist frequency) in which to implement the filter that cuts out everything above Nyquist.

An ideal convolution filter with the kernel 2*f*sinc(2*f*x) cuts out all frequencies above the cutoff frequency f and keeps all frequencies below it.
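The ideal sinc kernel is infinitely long, so any real filter has to truncate it, which is why the narrow transition band matters. As a rough sketch (not how any particular DAC or ADC does it), the following samples 2*f*sinc(2*f*x), truncates it to a hypothetical 255 taps, applies a Hann window and checks the resulting gain at a few frequencies. The 96 kHz sample rate, tap count and window are assumptions chosen only for illustration:

```python
import cmath
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def lowpass_kernel(cutoff_hz, sample_rate_hz, taps):
    """Sampled 2*f*sinc(2*f*x), truncated to `taps` samples and Hann-windowed."""
    centre = (taps - 1) / 2
    f = cutoff_hz / sample_rate_hz  # cutoff as a fraction of the sample rate
    kernel = []
    for n in range(taps):
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (taps - 1))
        kernel.append(2 * f * sinc(2 * f * (n - centre)) * window)
    total = sum(kernel)
    return [h / total for h in kernel]  # normalize to unity gain at DC


def gain(kernel, freq_hz, sample_rate_hz):
    """Magnitude of the filter's frequency response at one frequency."""
    w = 2 * math.pi * freq_hz / sample_rate_hz
    return abs(sum(h * cmath.exp(-1j * w * n) for n, h in enumerate(kernel)))


kernel = lowpass_kernel(cutoff_hz=22050, sample_rate_hz=96000, taps=255)
for freq in (10000, 20000, 24000, 30000):
    print(freq, "Hz gain:", round(gain(kernel, freq, 96000), 4))
```

Raising the tap count narrows the transition band around the cutoff at the cost of a longer filter.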

1629793777372.png


1629794550339.png


1629794368566.png


1629794992644.png



It's very likely that he is wrong here and that the real issue is high frequencies making an audible difference when mixed with lower frequencies. MQA would not fix that, in addition to being very questionable in other ways.

 

#6
How to pass sample-rate blind tests
First you need to find a track that actually contains usable ultrasonic frequencies. I recommend finding a track you like that uses a recording of an acoustic instrument.

You also need tweeters able to reproduce these high frequencies, but luckily there are inexpensive options for this, such as the following ribbon tweeter:

Fountek RD1.0 Ribbon Tweeter: madisoundspeakerstore.com/ribbon-tweeters/fountek-rd1.0-ribbon-tweeter/

The bigger issue is that having a decent tweeter isn't enough. It's really difficult to build a good speaker, and even very expensive ones tend to have severe flaws.

People have recommended using Audacity to downsample and then upsample back to the original sample rate (to compare). Audacity does use a very steep digital filter, so there should not be any audible difference below 20.5 kHz. I have tried subtracting the two versions (making a difference file), and the result was not audible to me even at high volume (when it wasn't mixed with the lower frequencies).
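The level of such a difference file can also be put into numbers. The sketch below assumes both versions are already decoded into equal-length, time-aligned lists of float samples in the range -1 to 1 (reading the audio files is left out), and uses a synthetic signal as a stand-in for real music:

```python
import math


def residual_dbfs(original, processed):
    """RMS level of (original - processed) in dB relative to full scale (1.0)."""
    diff = [a - b for a, b in zip(original, processed)]
    rms = math.sqrt(sum(d * d for d in diff) / len(diff))
    return -math.inf if rms == 0 else 20 * math.log10(rms)


# Synthetic stand-in: a 1 kHz tone plus a quiet 30 kHz tone at 96 kS/s.
FS = 96000
N = 9600
audible_part = [0.5 * math.sin(2 * math.pi * 1000 * n / FS) for n in range(N)]
ultrasonic_part = [0.01 * math.sin(2 * math.pi * 30000 * n / FS) for n in range(N)]
full_range = [a + u for a, u in zip(audible_part, ultrasonic_part)]

# Pretend the round trip removed only the ultrasonic tone.
print(round(residual_dbfs(full_range, audible_part), 1), "dBFS")
```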

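Then, when you start the listening test, I recommend putting the different versions in a loop and starting right away by playing one of them at random. You can just pick the version you like best and keep doing that until you get a statistically significant result. It is safer to fix the number of trials beforehand, since stopping the moment the p-value dips below 0.05 inflates the chance of a false positive.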
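The random assignment can be prepared in advance so that you do not know which version you are hearing. This is a small sketch with hypothetical helper names; the actual playback is not covered and depends on your player:

```python
import random


def make_schedule(n_trials, seed=None):
    """For each trial, pick at random which version is played as 'X'."""
    rng = random.Random(seed)
    return [rng.choice(("high", "low")) for _ in range(n_trials)]


def count_high_preferred(schedule, picks):
    """Count trials in which the pick ('X' or 'Y') was the high-rate version."""
    other = {"high": "low", "low": "high"}
    hits = 0
    for x_version, pick in zip(schedule, picks):
        chosen = x_version if pick == "X" else other[x_version]
        hits += chosen == "high"
    return hits


schedule = make_schedule(10, seed=1)  # keep this hidden until the test is over
print(len(schedule), "trials prepared")
```

The resulting count and the number of trials are what go into the significance calculation.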

What you will probably find is that when the ultrasonic frequencies are included, it sounds more like the instrument is physically being played in the room. Without these frequencies you can hear that something is missing.
 

#7
Why it was a mistake to assume "we cannot detect ultrasonic frequencies"
Human hearing is not linear. The mistake people make is assuming that the following would hold true for human hearing:

Listening(A+B) = Listening(A)+Listening(B)

If that were true, then Listening(B)=0 would imply Listening(A+B)=Listening(A)

But since that does not hold true when humans listen to music, we cannot assume that sounds that are themselves inaudible will not make an audible difference when mixed with other sounds.
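As a toy illustration of why additivity fails in a non-linear system (this is not a model of the ear, and the 0.1 coefficient is an arbitrary assumption), the sketch below passes two ultrasonic tones through a linear and a mildly non-linear system and measures what appears at their 3 kHz difference frequency:

```python
import cmath
import math

FS = 192000  # sample rate in Hz
N = 1920     # 10 ms, a whole number of cycles for every tone involved


def tone(freq_hz, amplitude):
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / FS) for n in range(N)]


def amplitude_at(signal, freq_hz):
    """Amplitude of the component at freq_hz (single-bin DFT)."""
    w = 2 * math.pi * freq_hz / FS
    return 2 * abs(sum(s * cmath.exp(-1j * w * n) for n, s in enumerate(signal))) / N


# Two tones that are both above the usual 20 kHz hearing limit.
mixed = [x + y for x, y in zip(tone(24000, 0.5), tone(27000, 0.5))]


def linear(x):
    return x


def nonlinear(x):
    return x + 0.1 * x * x  # hypothetical mild second-order non-linearity


for name, system in (("linear", linear), ("nonlinear", nonlinear)):
    out = [system(s) for s in mixed]
    print(name, "3 kHz amplitude:", round(amplitude_at(out, 3000), 4))
```

In the linear case nothing shows up at 3 kHz, while the non-linear case produces a difference tone (27 kHz minus 24 kHz) well inside the audible band.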
 

#8
Intermodulation distortion?
It has been suggested that intermodulation distortion (IMD) in speakers can let people tell the difference between high and low sample rates. I do think the human ear is the most likely place for IMD to have an effect, but we still need to properly rule this out. There are 2 options for that:

http://www.davidgriesinger.com/intermod.ppt

0. Do a test where the ultrasonics are played from a separate super tweeter only. This does have issues such as poor integration, but if successful it would pretty much completely rule out the "IMD in speakers" theory (amps should not have audible distortion, and neither should DACs, least of all high-end models). One option is to put the ultrasonic content in one channel and place the speakers close to each other. That is not ideal, but I cannot think of any good alternative.

1. Measure IMD and demonstrate that it would not be audible.

I do not currently have any good measurement equipment for my speakers, but even with a bad mic, measuring IMD should still be very possible. The issue with this approach, however, is that you can go wrong making assumptions about what is audible. For that reason option 0 is arguably the better one.
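For the measurement route it helps to know where to look. Given two test tones, the intermodulation products sit at |m*f1 + n*f2|. The sketch below lists the second- and third-order products that fall in the audible band; the 22 kHz and 25 kHz test tones and the 20 kHz limit are just example assumptions:

```python
def imd_products(f1, f2, max_order=3):
    """All |m*f1 + n*f2| with both tones involved and 2 <= |m|+|n| <= max_order."""
    freqs = set()
    for m in range(-max_order, max_order + 1):
        for n in range(-max_order, max_order + 1):
            if m != 0 and n != 0 and 2 <= abs(m) + abs(n) <= max_order:
                freqs.add(abs(m * f1 + n * f2))
    return sorted(freqs)


def audible(freqs, limit_hz=20000):
    """Keep only the products that land at or below the assumed hearing limit."""
    return [f for f in freqs if 0 < f <= limit_hz]


products = imd_products(22000, 25000)
print(products)           # [3000, 19000, 28000, 47000, 69000, 72000]
print(audible(products))  # [3000, 19000]
```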
 

#9
The human ear is pretty messy
Sound goes through several stages before finally being converted into nerve signals, and this could potentially result in audible IMD.


The issue with the notion of audible IMD produced in the human ear is that the energy that actually reaches the ear is far less than the energy used to generate the tone in the speaker.

120 dB is just 1 W/m²
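That figure follows from the standard reference intensity of 10^-12 W/m² for 0 dB. Here is a short sketch of the conversion, assuming the dB value is treated as a sound intensity level (in free-field air this is numerically close to SPL):

```python
import math

I_REF = 1e-12  # W/m^2, the standard reference intensity for 0 dB


def intensity_w_per_m2(level_db):
    """Sound intensity for a given level in dB."""
    return I_REF * 10 ** (level_db / 10)


def level_db(intensity):
    """Level in dB for a given sound intensity in W/m^2."""
    return 10 * math.log10(intensity / I_REF)


for db in (60, 90, 120):
    print(db, "dB ->", f"{intensity_w_per_m2(db):.0e}", "W/m^2")
```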
 

#10
Unconscious perception of ultrasonics?
What if humans can indeed detect ultrasonics, but not consciously (as might be the case with magnetic fields)? If that were the case, it should be possible to detect it, for example via an MRI scan.

As a mechanism underlying the effect of inaudible high-frequency sound components, we speculate that the brain may subconsciously recognize high-resolution audio that retains high-frequency components as being more natural, as compared with similar sounds in which such components are artificially removed. A link between alpha power and ratings of ‘naturalness’ of music has been reported. When listening to the same musical piece with different tempos, alpha-band EEG power increased for excerpts that were rated to be more natural, the ratings of which were not directly related to subjective arousal (Ma et al., 2012; Tian et al., 2013). As high-resolution audio replicates real sound waves more closely, it may sound more natural (at least on a subconscious level) and facilitate music-related psychophysiological responses.
https://www.frontiersin.org/articles/10.3389/fpsyg.2017.00093/full

The following study found a statistically significant difference even when using separate supertweeters for the ultrasonics:

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0095464



This study does in part replicate an earlier study demonstrating the "hypersonic effect"

https://journals.physiology.org/doi/full/10.1152/jn.2000.83.6.3548

From the study:

Most of the conventional audio systems that have been used to present sound for determining sound quality were found to be unsuitable for this particular study. In the conventional systems, sounds containing HFCs are presented as unfiltered source signals through an all-pass circuit and sounds without HFCs are produced by passing the source signals through a low-pass filter (Muraoka et al. 1978;Plenge et al. 1979). Thus the audible low-frequency components (LFCs) are presented through different pathways that may have different transmission characteristics, including frequency response and group delay. In addition, inter-modulation distortion may differentially affect LFCs. Therefore it is difficult to exclude the possibility that any observed differences between the two different sounds, those with and those without HFCs, may result from differences in the audible LFCs rather than from the existence of HFCs. To overcome this problem, we developed a bi-channel sound presentation system that enabled us to present the audible LFCs and the nonaudible HFCs either separately or simultaneously. First, the source signals from the D/A converter of Y. Yamasaki's high-speed, one-bit coding signal processor were divided in two. Then, LFCs and HFCs were produced by passing these signals through programmable low-pass and high-pass filters (FV-661, NF Electronic Instruments, Tokyo, Japan), respectively, with a crossover frequency of 26 or 22 kHz and a cutoff attenuation of 170 or 80 dB/octave, depending on the type of test. Then, LFCs and HFCs were separately amplified with P-800 and P-300L power amplifiers (Accuphase, Yokohama, Japan), respectively, and presented through a speaker system consisting of twin cone-type woofers and a horn-type tweeter for the LFCs and a dome-type super tweeter with a diamond diaphragm for the HFCs. The speaker system was designed by one of the authors (T. Oohashi) and manufactured by Pioneer Co., Ltd. (Tokyo, Japan). 
This sound reproduction system had a flat frequency response of over 100 kHz. The level of the presented sound pressure was individually adjusted so that each subject felt comfortable; thus the maximum level was approximately 80–90 dB sound pressure level (SPL) at the listening position.

A significant difference was evident between FRS and HCS in some elements of sound quality. Subjects felt that FRS was softer, more reverberant, with a better balance of instruments, more comfortable to the ears, and richer in nuance than HCS.
 

#11
Was there anything wrong with the original hypersonic effect paper?
You will see no shortage of people claiming the following paper is nonsense, but it's very hard to find anyone who actually points to anything wrong with it. It seems that people dismiss it merely because the results disagree with what they have always assumed to be true, and that's pretty much it.

https://journals.physiology.org/doi/full/10.1152/jn.2000.83.6.3548

They used completely separate supertweeters for the ultrasonics, driven by their own amplifier. That means there was no interaction between the ultrasonics and the lower frequencies until the sound had left the speakers. If there was an audible difference due to non-linearities, it must have come from non-linearities in the air (extremely unlikely) or inside the human body (such as the ear).

Note that the "hypersonic effect" only takes place when the ultrasonics are mixed with lower frequencies:

None of the subjects recognized the HFC as sound when it was presented alone. Nevertheless, the power spectra of the alpha frequency range of the spontaneous electroencephalogram (alpha-EEG) recorded from the occipital region increased with statistical significance when the subjects were exposed to sound containing both an HFC and an LFC, compared with an otherwise identical sound from which the HFC was removed (i.e., LFC alone). In contrast, compared with the baseline, no enhancement of alpha-EEG was evident when either an HFC or an LFC was presented separately.
This does confirm that ultrasonics alone do not make a difference, even when they are complex (rather than single sine waves).
 

#12
"pleasantness inactive" better when the ultrasonics were included (p=0.005)
Note that they ran 10 tests here and only one of them gave a statistically significant result. p = 0.005 means that, if there were no real effect, the probability of finding a difference that large or larger would be one in 200.
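Running ten tests matters when reading that p-value. The sketch below assumes the ten tests are independent (the source does not say so) and shows both the chance that at least one of them reaches p ≤ 0.005 with no real effect, and the Bonferroni-corrected threshold for a family-wise 5% level:

```python
def chance_of_any_hit(n_tests, alpha):
    """Probability that at least one of n independent true-null tests has p <= alpha."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, family_alpha=0.05):
    """Per-test threshold that keeps the family-wise error rate at family_alpha."""
    return family_alpha / n_tests


print(round(chance_of_any_hit(10, 0.005), 4))  # 0.0489
print(round(bonferroni_threshold(10), 4))      # 0.005
```

Under these assumptions the single p = 0.005 result sits right at the corrected threshold.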

Here "pleasantness inactive" refers to how pleasant their time was after having finished listening to the song.

https://www.frontiersin.org/articles/10.3389/fpsyg.2017.00093/full

They also measured the brain directly and found statistically significant differences there too:



For high-alpha EEG band, the Sound Type × Epoch × Hemisphere interaction was significant, F(5,17) = 7.06, p = 0.001, η²p = 0.67. Separate ANOVAs for each epoch revealed a significant Sound Type × Hemisphere interaction at the 200-300-s epoch, F(1,21) = 12.63, p = 0.002, η²p = 0.38, and a significant effect of sound type at the post-music period, F(1,21) = 6.99, p = 0.015, η²p = 0.25. Post hoc tests revealed that high-alpha EEG power was greater for the full-range excerpt than for the high-cut excerpt and that the sound type effect was found for the left but not right hemisphere at the 200-300-s epoch. No effects of sound type were obtained at the epochs before 200 s. The main effect of anterior-posterior was also significant, F(1,21) = 15.67, p = 0.001, η²p = 0.43, showing that the high-alpha EEG was dominant over posterior scalp sites.

For low-beta EEG band, the Sound Type × Anterior-Posterior × Hemisphere interaction and the main effect of sound type effect were significant, F(1,21) = 4.49, p = 0.046, η²p = 0.18; F(1,21) = 5.43, p = 0.030, η²p = 0.21. Low-beta EEG power was greater in the full-range condition than in the high-cut condition. Separate ANOVAs for anterior-posterior and hemisphere also revealed significant effects of sound type, for posterior region: F(1,21) = 7.07, p = 0.015, η²p = 0.25; for left hemisphere: F(1,21) = 5.26, p = 0.032, η²p = 0.20; for right hemisphere: F(1,21) = 5.27, p = 0.032, η²p = 0.20; except for anterior region: F(1,21) = 3.94, p = 0.060, η²p = 0.16. Although there were no significant interaction effects including epoch, Figure 2 shows that the difference between the full-range and high-cut excerpts seems to be more prominent at later epochs. Two-tailed t-tests revealed significant differences between the two excerpts at the 200-300-s, 300-400-s, and post epochs, ts(21) > 2.37, ps < 0.027; p > 0.114 at the epochs before 200 s.
 