Wednesday, April 29, 2020

His Master's Voice

AVI thought that dogs would pay more attention to recorded voices as the reproduction quality improved.


I wasn't so sure. Look at the frequency response thresholds above. Dogs can hear higher frequencies than we can, but miss out on the lower ones. When a dog hears your voice, it hears ultrasonic details we don't, and misses the lower register. It doesn't hear what you hear.

Below find the response curve for a "Wadia 861, Filter A, CD." The response curve is nice and flat--until it falls off. It keeps up with the low register, which the dogs don't notice anyway, but it misses out on the high end, which we don't care enough about to instrument. (Why bother? Dogs don't buy electronics.)
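The fall-off is consistent with a hard limit of the CD format itself. CD audio is sampled at 44.1 kHz, and a sampled signal cannot represent anything above half its sample rate (the Nyquist frequency, here 22.05 kHz). Ultrasonic content has to be filtered out before sampling, or it folds back ("aliases") into the audible band. A minimal sketch of that arithmetic, assuming an ideal sampler; the function names are just illustrative:

```python
def nyquist_frequency(sample_rate_hz: float) -> float:
    """Highest frequency a sampled signal can represent: half the sample rate."""
    return sample_rate_hz / 2.0


def aliased_frequency(tone_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a pure tone sampled without an anti-alias filter.

    The sampled spectrum repeats every sample_rate_hz, so the tone folds
    back into the band [0, sample_rate_hz / 2].
    """
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


CD_RATE_HZ = 44_100
print(nyquist_frequency(CD_RATE_HZ))          # 22050.0
print(aliased_frequency(30_000, CD_RATE_HZ))  # 14100
```

So a 30 kHz component that a dog could hear would, if not filtered out, reappear as a spurious 14.1 kHz tone that we could hear too.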


The bulk of the energy in human speech lies below 1000 Hz, but there's still about a percent or so above the human hearing range. Does 1% matter? I'd guess dogs wouldn't have that kind of sensitivity unless it made a difference (maybe in finding mice). I wonder if anybody has tested dogs' responses to human voices played back with a frequency response flat out to 30-40 kHz.
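To put "a percent or so" in perspective: 1% of the energy is 10 log10(0.01) = -20 dB relative to the whole signal, so the ultrasonic part would sit about 20 dB below the overall speech level. A minimal sketch of how such a fraction could be measured from a recording, assuming a sample rate high enough to capture the ultrasonic band (96 kHz here) and using a synthetic two-tone test signal rather than real speech; the function names are hypothetical:

```python
import numpy as np


def energy_fraction_above(x: np.ndarray, fs: float, cutoff_hz: float) -> float:
    """Fraction of a real signal's energy at frequencies above cutoff_hz."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    power = np.abs(spectrum) ** 2
    # The one-sided spectrum folds +f and -f together, so interior bins count twice;
    # the DC bin (and the Nyquist bin, for even lengths) appears only once.
    weights = np.full(power.shape, 2.0)
    weights[0] = 1.0
    if len(x) % 2 == 0:
        weights[-1] = 1.0
    power *= weights
    return float(power[freqs > cutoff_hz].sum() / power.sum())


def fraction_to_db(fraction: float) -> float:
    """Express an energy fraction as a level relative to the total, in dB."""
    return float(10.0 * np.log10(fraction))


# Synthetic stand-in for "speech": a strong 200 Hz tone plus a weak 25 kHz tone,
# one second sampled at 96 kHz so that both tones fall exactly on FFT bins.
fs = 96_000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 200 * t) + 0.1 * np.sin(2 * np.pi * 25_000 * t)

frac = energy_fraction_above(x, fs, cutoff_hz=20_000)
print(round(frac, 4))                  # 0.0099 (that is, 0.01 / 1.01)
print(round(fraction_to_db(0.01), 1))  # -20.0
```

Whether -20 dB is audible to a dog then depends on the overall level and on the dog's threshold at those frequencies.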

7 comments:

Korora said...

I can see how the data would be determined for dogs and chickadees, but how did they determine it for Brachiosaurus?

Douglas2 said...

Korora:

See here: https://www.ncbi.nlm.nih.gov/pubmed/15951620

Douglas2 said...

You've mapped the receiver (dog hearing vs. human hearing) but not considered the spectrum of the source: human speech. Relaxed speech might have a sound level of about 54 dB SPL (A-weighted). Now look at a more detailed dog threshold of audibility, such as this one: https://imgur.com/gallery/xldTgLR

It is easy to see that 54 dB is above threshold for all breeds from about 100 Hz upward, which is below the fundamental frequency of a typical male voice. So there really is nothing at the low end that we could hear but the dog would miss, assuming the speakers in the laptop computer or phone were adequately reproducing that energy in the first place.

As for the high end of the human voice spectrum: while I admit that fricatives, and especially sibilants, can have detectable energy above 15 kHz, that energy really doesn't carry any speaker-identifying information that isn't already present in lower parts of the spectrum.

I'd suggest the evidence for this is that we recognize the voices of familiar people just as easily in audio low-pass filtered at 8 kHz as in audio left at the full frequency range of a CD player, or of video camera recording and playback. We might describe the filtered sound as less crisp, but our ability to tell voices apart is not impaired. So the supposition that there is enough energy above 15 kHz that the dog hears and we don't, and that this energy carries 'information' lost to the dog when it listens through audio electronics band-limited to 20 kHz, seems implausible to me.
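The 8 kHz comparison is easy to try. A minimal sketch, assuming an idealized "brick-wall" low-pass done by zeroing FFT bins (a real listening test would more likely use a conventional filter, since a brick wall can ring on transients); the signal is a synthetic stand-in for speech, and the function name is hypothetical:

```python
import numpy as np


def brickwall_lowpass(x: np.ndarray, fs: float, cutoff_hz: float) -> np.ndarray:
    """Zero every FFT bin above cutoff_hz and transform back (an idealized low-pass)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


fs = 44_100
t = np.arange(fs) / fs
voice_band = np.sin(2 * np.pi * 440 * t)          # below 8 kHz: should survive
sibilance = 0.2 * np.sin(2 * np.pi * 12_000 * t)  # above 8 kHz: should be removed
y = brickwall_lowpass(voice_band + sibilance, fs, cutoff_hz=8_000)
print(np.max(np.abs(y - voice_band)) < 1e-9)      # True
```

Applied to a real recording, the same function gives the "less crisp" version for an A/B listening comparison.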

That the dog ignores TV voices but recognizes Zoom voices can, however, be easily explained: in recent times TV audio has applied rather extreme data-rate reduction to the original PCM-sampled source audio, and this compression uses a "receiver model" based on human hearing characteristics to decide what information is encoded and what is left to become noise.
https://en.wikipedia.org/wiki/MPEG-1_Audio_Layer_II#How_the_MP2_format_works
Zoom, by contrast, in most configurations of its Advanced Sound Settings page, uses a "speech codec", which relies on a "source model" for data-rate reduction: it uses the physical characteristics of human speech to decide which portions of the original PCM encoding of the microphone signal should be rendered in the best detail.
(For example Opus, whose speech-oriented SILK mode is built on linear prediction, the same principle behind CELP: https://en.wikipedia.org/wiki/Opus_(audio_format), https://en.wikipedia.org/wiki/Code-excited_linear_prediction)
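The "source model" idea can be illustrated with linear prediction, the core of CELP and of Opus's SILK layer: each speech sample is predicted as a weighted sum of the previous few, the weights describe the vocal-tract filter, and the codec transmits those weights plus a compact description of the excitation. A minimal sketch of estimating the weights with the autocorrelation method and the Levinson-Durbin recursion, tested on a synthetic second-order signal with known coefficients rather than real speech; this is not the actual Opus or Zoom implementation, and the function name is hypothetical:

```python
import numpy as np


def lpc(x: np.ndarray, order: int) -> np.ndarray:
    """Linear-prediction coefficients a[1..order] such that
    x[n] is approximated by sum_k a[k] * x[n - k] (autocorrelation method).
    """
    n = len(x)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(x[: n - lag], x[lag:]) for lag in range(order + 1)])
    a = np.zeros(order + 1)  # a[0] unused, kept so indices match the math
    err = r[0]               # prediction error energy of the order-0 predictor
    for i in range(1, order + 1):
        # Reflection coefficient for this stage of the Levinson-Durbin recursion.
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_prev = a.copy()
        a[i] = k
        for j in range(1, i):
            a[j] = a_prev[j] - k * a_prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


# Synthetic "vocal tract": white noise through a known 2nd-order all-pole filter.
rng = np.random.default_rng(0)
e = rng.standard_normal(50_000)
x = np.zeros_like(e)
for n in range(2, len(e)):
    x[n] = 1.3 * x[n - 1] - 0.6 * x[n - 2] + e[n]

print(lpc(x, order=2))  # expect values near 1.3 and -0.6
```

Because the model describes the talker rather than the listener, what it keeps is determined by the physics of speech, not by the limits of human ears.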

Douglas2 said...

Further to Korora:
The cochlea, the part of our ear containing the nerve endings that actually detect sound, is encased within the petrous part of the temporal bone.

So the size of the cochlea is directly reflected in the size of its cavity in the temporal bone, and we have fossil temporal bones we can X-ray to estimate, via this proxy, the size of the Brachiosaurus cochlea.

Further, we know that the size of the cochlea in animals that we can study is very closely related to the frequency range of sounds that they can hear.

Douglas2 said...

Further on TV vs Zoom for dogs:

In sum, human speech carried by a speech codec such as Zoom's is going to sound pretty much the same to any species, but human speech carried by receiver-model codecs such as those used for TV or YouTube is going to have a lot of clearly audible and distracting quantization noise for any creature whose ear physiology (cochlear size) differs much from ours.
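A receiver model starts from where human ears are insensitive. One common textbook approximation of the human threshold in quiet (Terhardt's formula) climbs steeply toward 20 kHz, from about 11 dB SPL at 10 kHz to around 66 dB at 16 kHz, so an encoder built on such a model is free to spend few or no bits up there, leaving noise a differently built ear might hear. Real codecs also exploit masking by louder nearby sounds, which this sketch ignores; the decision function is a hypothetical toy, not how any particular codec allocates bits:

```python
import math


def human_threshold_in_quiet_db(freq_hz: float) -> float:
    """Terhardt's approximation to the human absolute threshold of hearing (dB SPL)."""
    f = freq_hz / 1000.0  # the formula is written in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def inaudible_to_humans(freq_hz: float, level_db_spl: float) -> bool:
    """Toy receiver-model decision: could this component be dropped or coarsely quantized?"""
    return level_db_spl < human_threshold_in_quiet_db(freq_hz)


for f in (100, 1_000, 3_300, 10_000, 16_000, 20_000):
    print(f, round(human_threshold_in_quiet_db(f), 1))
```

The curve dips below 0 dB SPL around 3 to 4 kHz, where human ears are most sensitive, and rises past 100 dB near 20 kHz.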

james said...

Or cell phones.
And with respect to the quantization noise, even humans can hear some of the differences when low sampling rates are used.
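For what it's worth, quantization noise is governed mainly by how many bits are spent per sample (or, in a perceptual codec, per frequency band), while the sampling rate sets the bandwidth. The textbook rule of thumb for a uniform quantizer and a full-scale sine is SQNR of about 6.02 N + 1.76 dB for N bits. A minimal sketch checking that rule numerically, assuming an ideal rounding quantizer with no clipping and no dither; the function names are hypothetical:

```python
import numpy as np


def quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Uniform rounding quantizer with step 2 / 2**bits over [-1, 1] (clipping ignored)."""
    step = 2.0 / 2 ** bits
    return np.round(x / step) * step


def measured_sqnr_db(bits: int, fs: int = 48_000, tone_hz: float = 997.0) -> float:
    """Signal-to-quantization-noise ratio of a full-scale sine, in dB."""
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * tone_hz * t)
    noise = quantize(x, bits) - x
    return float(10 * np.log10(np.mean(x ** 2) / np.mean(noise ** 2)))


for bits in (8, 16):
    theory = 6.02 * bits + 1.76  # rule of thumb for a full-scale sine
    print(bits, round(measured_sqnr_db(bits), 1), round(theory, 1))
```

Each bit removed raises the noise floor by roughly 6 dB, which is why starved bitrates become audible.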

Anonymous said...

The difference between digital and analog recordings is easily audible. Of course, you need a system that is quite revealing, and to compete with a record (RIAA curve and all) you need a good DAC. They're available, almost cheaply, these days. ;)
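The "RIAA curve" is the equalization applied when cutting a record (bass cut, treble boost) and undone on playback by the phono preamp. Its playback response is defined by three time constants: 3180 µs, 318 µs and 75 µs. A minimal sketch of that playback curve, assuming the basic three-time-constant definition (some variants add further time constants, such as the IEC bass roll-off); the function name is just illustrative:

```python
import math

# RIAA time constants, in seconds.
T1, T2, T3 = 3180e-6, 318e-6, 75e-6


def riaa_playback_gain_db(freq_hz: float) -> float:
    """RIAA playback (de-emphasis) response, normalized to 0 dB at 1 kHz."""
    def magnitude(f: float) -> float:
        w = 2 * math.pi * f
        return math.hypot(1, w * T2) / (math.hypot(1, w * T1) * math.hypot(1, w * T3))

    return 20 * math.log10(magnitude(freq_hz) / magnitude(1000.0))


for f in (20, 1_000, 20_000):
    print(f, round(riaa_playback_gain_db(f), 1))
```

The result is roughly +19 dB of bass boost at 20 Hz and about -20 dB of treble cut at 20 kHz; any error in a phono stage's implementation of this curve is audible as tonal coloration.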