Auditory System

Auditory perception is the ability to identify and localise sound signals in the environment.

Physical Characteristics of Sound

Sound originates from a disturbance in the air - it's a mechanical wave, an oscillation of pressure, and we hear it when its frequencies fall within our auditory range.


We classify sound waves by their amplitude and their frequency (as well as wavelength, pressure, intensity, direction, etc).

Sound waves are made up of “condensations” (high pressure regions) and “rarefactions” (low pressure regions) of the air.


The amplitude of a sound wave is the magnitude of the change in pressure during oscillation. When we draw the waves, the height is the amplitude.


Decibel (dB): A unit of measure for the physical intensity of sound. Decibels express the difference between two sounds on a logarithmic scale, based on the ratio between their sound pressures.

We can hear a vast range of sounds, from 0 decibels (barely audible, relative amplitude 1), to the average speaking voice at 60 decibels (relative amplitude 1,000), to a jet engine taking off at 140 decibels (around the pain threshold, relative amplitude 10,000,000).
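The relative amplitudes quoted here follow from the usual definition of sound pressure level, dB = 20 · log10(p / p_ref). A minimal sketch of that conversion (the function name is only illustrative, it isn't from these notes):

```python
import math

def amplitude_ratio_to_db(ratio):
    """Convert a sound pressure (amplitude) ratio to decibels."""
    # 20 * log10 rather than 10 * log10, because intensity goes with pressure squared.
    return 20.0 * math.log10(ratio)

for ratio in (1, 1_000, 10_000_000):
    print(ratio, "->", round(amplitude_ratio_to_db(ratio)), "dB")
```

Every tenfold increase in amplitude adds 20 dB, which is why a ratio of 1,000 lands on 60 dB and a ratio of 10,000,000 on 140 dB.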



Frequency is the number of cycles per unit of time - it's determined by the rate of vibration at the source. In a diagram, it corresponds to how many waves fit into a given stretch of time.


Hertz (Hz): A unit of measure for frequency. One Hz equals one cycle per second

All animals have particular ranges of sensitivity to pitch that differ between species.

Loudness and Pitch

The amplitude and the frequency of a sound wave affect our perception of the loudness and pitch of a sound, respectively.

A higher amplitude equates to a louder sound, and a higher frequency to a higher pitch.
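To make the two properties concrete, here is a small sketch that samples a pure tone with amplitude and frequency as separate knobs. The function names, the 8000 Hz sample rate and the example values are all assumptions picked for illustration:

```python
import math

def sine_wave(amplitude, frequency_hz, duration_s=1.0, sample_rate=8000):
    """Sample a pure tone; amplitude and frequency are set independently."""
    n_samples = int(duration_s * sample_rate)
    return [
        amplitude * math.sin(2.0 * math.pi * frequency_hz * n / sample_rate)
        for n in range(n_samples)
    ]

def count_cycles(samples):
    """Count upward zero crossings, i.e. how many cycles begin in the samples."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a <= 0.0 < b)

quiet_low = sine_wave(amplitude=0.5, frequency_hz=5)
loud_low = sine_wave(amplitude=2.0, frequency_hz=5)
loud_high = sine_wave(amplitude=2.0, frequency_hz=10)
```

Changing the amplitude changes the peak height (loudness) but not the number of cycles per second, and changing the frequency changes the cycle count (pitch) but not the peak height.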

Perceiving Pitch

Simple vs Complex Sounds

Simple sounds are ones with a sinusoidal shape (pure tones) - there's not much more to say. Complex sounds are a combination of simple sounds with different frequencies:

  • The lowest frequency is called the fundamental frequency, and determines the perceived pitch
  • If the component frequencies are related to each other in a regular fashion (integer multiples of a “fundamental frequency”) they are referred to as a harmonic series
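As a rough sketch of the two points above, a harmonic series can be built by multiplying the fundamental by 1, 2, 3, ... and a complex tone by summing the corresponding sine waves. The function names and the equal-amplitude partials are assumptions for illustration only; real instruments weight their harmonics differently:

```python
import math

def harmonic_series(fundamental_hz, n_harmonics):
    """Frequencies at integer multiples of the fundamental (1x, 2x, 3x, ...)."""
    return [fundamental_hz * k for k in range(1, n_harmonics + 1)]

def complex_tone(frequencies_hz, t):
    """Value of a complex tone at time t (seconds): a sum of equal-amplitude sines."""
    return sum(math.sin(2.0 * math.pi * f * t) for f in frequencies_hz)

partials = harmonic_series(220, 4)  # 220, 440, 660 and 880 Hz
```

Because every partial is an integer multiple of 220 Hz, the summed waveform repeats every 1/220 of a second, which is the periodicity that goes with the perceived pitch.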


The image below shows how amplitude and frequency are independent of each other, as well as illustrating the different frequencies of a complex tone.


Missing Fundamental

Sometimes if the lowest frequency is taken out (known as a missing fundamental) we may still perceive the same pitch. This is because the brain determines pitch not just from the fundamental, but also from the periodicity implied by the other harmonics.
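A toy way to see why the pitch survives: the remaining harmonics are still all integer multiples of the same fundamental, so their greatest common divisor gives it back. This is only an arithmetic illustration assuming whole-number frequencies in Hz, not a model of what the auditory system actually computes, and the function name is made up for the example:

```python
from functools import reduce
from math import gcd

def implied_fundamental(harmonics_hz):
    """Largest frequency of which every component is an integer multiple."""
    return reduce(gcd, harmonics_hz)

full_tone = [200, 400, 600, 800]
missing_fundamental = [400, 600, 800]  # the 200 Hz component has been removed
```

Both lists imply the same 200 Hz fundamental, even though the second one contains no energy at 200 Hz.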

Shepard Scale

A Shepard Tone is a sound consisting of a superposition of sine waves, separated by octaves. A Shepard scale involves the base pitch of the tone moving upwards or downwards.

This has the effect of creating the auditory illusion of a tone that continually ascends/descends in pitch, but never actually gets higher or lower.

Pitch circularity is another term for this, where the components are continuously fading in at the lowest frequency and out at the highest.
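The circularity can be sketched by listing the components of a Shepard tone at each step of the scale. The 20 Hz floor, the 10 octaves and the raised-cosine loudness envelope below are assumptions chosen for illustration, not values from these notes:

```python
import math

def shepard_components(step, f_min=20.0, n_octaves=10, steps_per_octave=12):
    """Return (frequency_hz, amplitude) pairs for one Shepard tone."""
    components = []
    for k in range(n_octaves):
        # Position of this component, in octaves above f_min.
        pos = k + step / steps_per_octave
        freq = f_min * 2.0 ** pos
        # Raised-cosine envelope: silent at both edges, loudest mid-range.
        amp = 0.5 - 0.5 * math.cos(2.0 * math.pi * pos / n_octaves)
        components.append((freq, amp))
    return components

start = shepard_components(step=0)
one_octave_up = shepard_components(step=12)
```

After stepping up a full octave, the audible components (those with non-zero amplitude) are the same frequencies at the same loudness as at the start, which is why the scale can keep "rising" indefinitely without the overall pitch height ever changing.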


Timbre is the quality of a tone, and is what differentiates sounds that have the same frequency and amplitude but are noticeably different. These differences come from physical differences in spectrum and envelope.

Timbre is determined by the mixture and relative strength of the harmonics, and by the “build-up” (attack) and “decrease” (decay) of the sound.

For example, even though the harmonic structure is unaffected, a piano tone played backward does not sound like a piano, because its build-up and decrease are reversed.

Basic Structure of the Auditory System

The Auditory System



The inner ear is filled with endolymph & perilymph fluids (3 sections, 1 with endolymph, 2 with perilymph), whereas the middle and outer ears are filled with air. Because fluid is a much denser medium than air, vibrations transmit poorly from one to the other, which is why the tiny bone structures in the middle ear act to amplify the vibration.

The Inner Ear

The Cochlea is where all the magic takes place. It is a spiral-shaped cavity in the bony labyrinth.

The Organ of Corti is the sensory organ of hearing, and this is within the cochlea (distributed along the basilar membrane, the partition that separates the fluid chambers).

Hair Cells

Hair cells in the cochlea are mechanoreceptors which are stimulated by the relative movement of the basilar membrane and the tectorial membrane above it; they allow us to encode pitch.


Which hair cells respond allows us to encode by place, and how fast the hair cells fire allows us to encode by frequency. Different frequencies disturb different parts of the basilar membrane, allowing us to determine pitch by place - however at lower frequencies the entire basilar membrane vibrates, so the “place of maximum disturbance” is across the entire membrane.

This is where temporal, or frequency, theory is important.
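One commonly quoted approximation of the place-to-frequency mapping is the Greenwood function, f = A · (10^(a·x) - k). It is not part of these notes; the sketch below assumes the constants usually cited for the human cochlea (A = 165.4, a = 2.1, k = 0.88), with x running from 0 at the apex to 1 at the base:

```python
def greenwood_frequency_hz(position, A=165.4, a=2.1, k=0.88):
    """Characteristic frequency at a relative position along the basilar membrane.

    position: 0.0 at the apex (low frequencies) to 1.0 at the base (high frequencies).
    """
    return A * (10.0 ** (a * position) - k)

for position in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(position, round(greenwood_frequency_hz(position)), "Hz")
```

Under these assumptions the mapping spans roughly 20 Hz to 20 kHz, with the low frequencies squeezed into a small region near the apex, which fits with place coding being less useful there.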


Frequency Theory

Frequency Theory hypothesises that a higher frequency causes the hair cells to fire faster to keep up with the faster vibrations, and that this faster firing is perceived as a higher pitch.


Ascending Auditory Pathways

Sound is transduced in the cochlea, and the signal is passed to the cochlear nucleus. From there it travels mainly to the opposite superior olive (although some goes to the one on the same side), and then on to the inferior colliculus, the medial geniculate nucleus, and finally the auditory cortex.


What/Where Pathways

The ventral/what stream is responsible for identifying sounds, whereas the dorsal/where stream is responsible for locating sounds.


Each ear receives a single waveform, varying in amplitude and frequency, that is the sum of all the sounds around us, so the problem becomes analysing and segregating that waveform into distinct sonic events.


Auditory Perceptual Organisation

Grouping by Proximity

We group sounds by temporal and pitch similarity. For instance the following three sound clips show the difference in our perception of two different-frequency tones played together:

  1. Two slightly offset sounds - appear as one
  2. Two sounds with a large frequency gap - appear as two
  3. Two intermediately split sounds - alternate between hearing each


(Examples from lumiere).

Good Continuation

The Gestalt theory of mentally organising things by continuation applies to hearing as well as to vision - if we have a sound with obvious gaps we can recognise those, but if a second sound obscures the first we assume it continues.


Talked about here.

Auditory/Sound Localisation

Sound localisation is the process of determining from which direction a sound originated. The brain utilises subtle differences in intensity, spectral cues (from the filtering of the pinna), and timing cues to allow us to localise sound sources.


Interaural Intensity Differences

Sound from the right side has a higher level at the right ear than at the left ear, because the head shadows the left ear. These level differences are highly frequency dependent and they increase with increasing frequency.

Knowing this we can determine which region the sound is coming from.

Interaural Timing

Interaural Time Difference (i.e. the time difference between each ear hearing something) allows us to determine which ear is closer to the sound, and the approximate angle.
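A very simplified sketch of how an angle falls out of a time difference: treat the ears as two points a fixed distance apart and ignore the head in between, so the extra path to the far ear is spacing · sin(angle). The 0.18 m ear spacing, the 343 m/s speed of sound and the convention that positive angles are to the right are all assumptions for illustration:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 C (assumed)
EAR_SPACING = 0.18      # m, rough distance between the ears (assumed)

def itd_seconds(angle_deg):
    """Interaural time difference for a source at angle_deg from straight ahead."""
    return EAR_SPACING * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND

def angle_from_itd(itd_s):
    """Invert the model: estimate the source angle (degrees) from an ITD."""
    # Clamp so measurement noise can't push asin outside its valid range.
    ratio = max(-1.0, min(1.0, itd_s * SPEED_OF_SOUND / EAR_SPACING))
    return math.degrees(math.asin(ratio))
```

With these numbers the largest possible difference (source directly to one side) is only about half a millisecond, and the sign of the ITD says which ear is closer.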


Precedence Effect

The precedence effect is a binaural psychoacoustic effect. If the same sound comes from two speakers with a 0-1 ms delay between them, we hear it as one sound, and its perceived location shifts towards the speaker the sound came from first.

However if there is a 1-5 ms delay we only perceive the sound as coming from the first speaker, and with a delay greater than 5 ms we perceive both sounds separately (the echo effect).
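The three delay ranges can be written out as a simple lookup. The thresholds are the ones given in these notes; the function name, the labels and the choice to put exactly 1 ms and 5 ms in the lower range are assumptions:

```python
def precedence_percept(delay_ms):
    """Classify what is heard when two speakers play the same sound delay_ms apart."""
    if delay_ms < 0:
        raise ValueError("delay_ms must be non-negative")
    if delay_ms <= 1.0:
        # One sound, with its location shifted towards the leading speaker.
        return "fused"
    if delay_ms <= 5.0:
        # One sound, heard as coming from the leading speaker only.
        return "leading speaker only"
    # Both sounds are heard separately.
    return "echo"
```

For example, precedence_percept(0.5) falls in the fused range, precedence_percept(3) in the leading-speaker range, and precedence_percept(20) in the echo range.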



Localising Inside Rooms

Inside a room we have to take into account both direct and indirect (bouncing off objects) sounds when determining the origin.