A Perceptually Meaningful Audio Visualizer

Dec 21, 2016

Audioscope: What you see is what you hear.

I pay a lot of attention to details in sound. I wanted to be able to see these details, and also point them out while describing sounds to people. Unfortunately, most audio visualizers don’t reveal these details.
So I created Audioscope, and made a video and soundtrack to demonstrate how some of these fine sonic details are made visible and obvious:

Please watch this in HD, or else you’ll literally miss half the details.

How it Works

tl;dr it turns sine waves into circles

Technically: The y axis is the raw audio signal, and the x axis is the signal filtered such that every frequency is phase shifted by 90˚.

Here’s the visual explanation:

Sound Is Made of Sine Waves

We can decompose signals/sound waves into sine waves/pure frequency components. These components have an amplitude and phase.

a sawtooth wave with its component sine waves

By summing them together, we can get the original signal back.

summing many sine waves to create a square-ish wave
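As a small illustration of this summing step, here is a sketch that adds up the odd harmonics of a square wave (the standard Fourier series, not anything taken from Audioscope). The function name `square_partial_sum` is made up for this example.

```python
import math

def square_partial_sum(t, n_harmonics):
    """Sum the first n_harmonics odd harmonics of a unit square wave at phase t (radians)."""
    total = 0.0
    for i in range(n_harmonics):
        k = 2 * i + 1                  # odd harmonics only: 1, 3, 5, ...
        total += math.sin(k * t) / k   # each harmonic is a sine wave with amplitude 1/k
    return 4.0 / math.pi * total

# A quarter of the way through the cycle, the ideal square wave sits at +1.
# Adding more sine waves moves the sum closer to that value.
for n in (1, 5, 50, 2000):
    print(n, round(square_partial_sum(math.pi / 2, n), 4))
```

With a single harmonic the sum is just a sine wave that overshoots to 4/π; with many harmonics it settles toward the flat top of the square wave.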

Sine Waves Are Made from Circles

We can get a sine wave by tracing out a circle and plotting the y axis.

We can do the same using the x axis.

plotting a complementary sine wave from the x axis

These two waves are the same, except they are 90˚ out of phase.

If we put a sine wave on the y axis and combine it with a 90˚ phase-shifted version on the x axis, we trace out a circle.

plotting a circle with a sine wave and its complement
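The same idea in a few lines of code: a minimal sketch in which `circle_points` is a hypothetical helper that puts a sine wave on the y axis and a copy shifted by 90˚ on the x axis. Every point ends up the same distance from the origin, which is what makes the trace a circle.

```python
import math

def circle_points(amplitude, n_points):
    """Return (x, y) pairs: y is a sine wave, x is the same wave shifted by 90 degrees."""
    points = []
    for i in range(n_points):
        phase = 2 * math.pi * i / n_points
        y = amplitude * math.sin(phase)                # the sine wave itself
        x = amplitude * math.sin(phase + math.pi / 2)  # shifted by 90 degrees (a cosine)
        points.append((x, y))
    return points

points = circle_points(0.5, 8)
radii = [math.hypot(x, y) for x, y in points]
print(radii)  # every radius equals the amplitude, up to rounding
```

Shifting by -90˚ instead of +90˚ gives the same circle traced in the opposite direction.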

If we make phase/time a dimension on the z axis, we trace out a helix.

plotting a helix with a sine wave and its complement

Helices turn out to be a mathematically simple way of working with signals. In my opinion, they're also a more natural way of interpreting audio signals. Converting a pure sine wave into a helix requires adding an imaginary component to the signal, but the resulting helix better represents the purity of the sound, since its radius/magnitude is constant.

But let’s keep time in the time dimension and keep the visuals two dimensional.

Turning Waves into Circles

Because we can decompose signals into component sine waves, and convert sine waves into circles, we can convert every component sine wave of a signal into a circle and represent the signal as a sum of circles. The y axis is the original signal, and the x axis is the signal with every component sine wave phase shifted by 90˚.

successively adding frequency components to form a square wave

Notice how the component frequencies of this sawtooth wave trace out a U shape. Do you remember seeing U shapes in the demo video?

The sum of circles traces out the analytic representation of this square wave. source

This results in a visualization of the signal that is one part real and one part imaginary, but also perceptually meaningful:

Loud sounds have large shapes, and quiet sounds have small shapes. Near silence is a dot in the middle, and pure silence is a plain black screen.
A pure sine wave is just a circle, where the radius corresponds to amplitude.
Purer sounds are very round because they’re made of very few sine waves.
Brighter sounds end up looking spiky because they have many frequency components, and also because digital sound has limited resolution (it's “pixelated”).
Percussive/transient sounds flash on the screen because these signals are very short.
Sustained tones create sustained shapes because tones are periodic signals that have repeating parts that have the same shape, and these shapes keep getting traced out over and over again.
Multiple tones in perfect harmony also have sustained shapes because perfect harmony means the frequencies are integer ratios of each other. In other words, the combination of these periodic signals is also a periodic signal.
Multiple tones in imperfect harmony have shifting/vibrating shapes because of interference and beating: the combined signal just isn't periodic, so the same shape doesn't get repeated. Also, most music uses imperfect harmony, so every time there are multiple tones it's probably gonna look messy. Sorry, this deserves a dedicated post.
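The construction from "Turning Waves into Circles" and the first few properties in this list can be checked with a rough sketch. It uses a slow textbook DFT rather than a real-time filter, and it follows the post's description of dropping negative frequencies, DC and Nyquist. The names `dft`, `idft` and `analytic_signal` are made up for this example and are not Audioscope's API.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform (fine for a short demo)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def analytic_signal(x):
    """Double the positive frequencies; zero the negative ones, DC and Nyquist."""
    n = len(x)
    X = dft(x)
    for k in range(n):
        if 0 < k < n / 2:
            X[k] *= 2
        else:
            X[k] = 0
    return idft(X)

N = 64
AMPLITUDE = 0.8
signal = [AMPLITUDE * math.sin(2 * math.pi * 3 * t / N) for t in range(N)]
beam = analytic_signal(signal)

points = [(z.imag, z.real) for z in beam]  # x = phase-shifted copy, y = raw signal
radii = [abs(z) for z in beam]             # distance of the beam from the center
print(min(radii), max(radii))
```

For this pure sine wave the real part of `beam` reproduces the input, and every radius equals the amplitude: a circle whose size follows loudness.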

More Technicalities

Thickness, Hue, and Saturation

The beam of the Audioscope visualizer has variable thickness and color. These things are more subtle and unpredictable, but if you’re curious and ok with more math, read on.

Thickness: inversely proportional to speed.
Hue: instantaneous pitch, derived from angular velocity.
Saturation: inversely proportional to amount of noise.

The thickness decreases as the beam moves faster. This causes high-frequency sounds or loud sounds to appear thinner.
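A minimal sketch of that relationship, assuming "speed" means the distance the beam covers between consecutive samples. The constants `base` and `min_speed` are hypothetical tuning values, not numbers from Audioscope.

```python
import math

def beam_thickness(points, base=1.0, min_speed=1e-3):
    """Thickness of each segment between consecutive (x, y) points.

    base and min_speed are hypothetical tuning constants for this sketch.
    """
    out = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        speed = math.hypot(x1 - x0, y1 - y0)      # distance covered in one sample
        out.append(base / max(speed, min_speed))  # clamp so a still beam stays finite
    return out

def circle(amplitude, cycles, n=256):
    """A pure tone drawn as a circle: `cycles` revolutions over n samples."""
    return [(amplitude * math.cos(2 * math.pi * cycles * i / n),
             amplitude * math.sin(2 * math.pi * cycles * i / n)) for i in range(n)]

low = beam_thickness(circle(0.5, 2))[0]    # quiet, low frequency
high = beam_thickness(circle(0.5, 8))[0]   # same loudness, higher frequency
loud = beam_thickness(circle(1.0, 2))[0]   # same frequency, twice as loud
print(low, high, loud)
```

Both the higher-frequency circle and the louder circle move further per sample, so both come out thinner than the quiet, low one.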

The color is a lot more complicated. Using the HSV color space:

Hue relates to pitch (more technically, pitch class). Pitch is circular, and hue is circular, so this is a natural mapping to make.
Saturation corresponds to amount of noise, where: more noise → more white, less noise → purer colors.
Value is maxed out because I want only the brightest colors.

At a high level: the hue of the color roughly corresponds to the pitch of the locally largest frequency component. If we’re dealing with pure sine waves, it directly corresponds to the pitch of the sine wave. This means that, if a 440Hz (A4) sine wave is red, 220Hz (A3) and 880Hz (A5) are also red. A sine wave going from 440Hz to 880Hz would start at red, cycle through every color of the rainbow, and end up at red.

pitch ≈ log_2(frequency)
pitch class ≈ pitch mod 1
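Here is a small sketch of that mapping using Python's standard `colorsys` module. Pinning 440Hz to hue 0 (red) is an assumption borrowed from the example above, and `hue_from_frequency` and `beam_color` are hypothetical names.

```python
import colorsys
import math

REFERENCE_HZ = 440.0  # assumption: A4 sits at hue 0 (red), as in the example above

def hue_from_frequency(freq_hz):
    """Pitch class in [0, 1): the position of the frequency within its octave."""
    pitch = math.log2(freq_hz / REFERENCE_HZ)  # octaves above or below the reference
    return pitch % 1.0                         # drop the octave, keep the pitch class

def beam_color(freq_hz, saturation=1.0):
    """RGB color with value maxed out, as in the HSV description above."""
    return colorsys.hsv_to_rgb(hue_from_frequency(freq_hz), saturation, 1.0)

for freq in (220.0, 440.0, 880.0, 440.0 * 2 ** 0.5):
    print(freq, hue_from_frequency(freq), beam_color(freq))
```

220Hz, 440Hz and 880Hz all land on the same hue, while a tone half an octave above 440Hz lands on the opposite side of the color wheel.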

Technically: At a given point in the beam, we have the angular velocity ω, i.e. how fast the beam is turning at that point (this is distinct from instantaneous frequency). For a pure sine wave, ω corresponds to frequency: if the beam turns twice as fast, the frequency doubles. Interpreting ω as frequency, we can use the above formula to convert it to something corresponding to pitch (class), and use the result as the hue of the color at this point.
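One way to read "how fast the beam is turning" is as the change in the beam's direction of travel from one sample to the next. The sketch below takes that reading, which is an interpretation rather than something confirmed by the Audioscope source. `turning_rate`, `circle_beam` and the 48kHz sample rate are assumptions for illustration.

```python
import cmath
import math

SAMPLE_RATE = 48000.0  # assumed sample rate for this example

def turning_rate(beam):
    """Radians per sample by which the beam's direction of travel changes.

    beam is a list of complex points; the sign depends on the direction of rotation.
    """
    steps = [b - a for a, b in zip(beam, beam[1:])]  # direction of travel per sample
    return [cmath.phase(s1 * s0.conjugate()) for s0, s1 in zip(steps, steps[1:])]

def circle_beam(freq_hz, n=64):
    """A pure tone as a unit circle traced at freq_hz."""
    return [cmath.exp(2j * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]

omega = turning_rate(circle_beam(440.0))
estimated_hz = omega[0] * SAMPLE_RATE / (2 * math.pi)
print(estimated_hz)
```

For a pure tone the turning rate is constant and converts straight back to the tone's frequency, and a tone at twice the frequency turns twice as fast. For anything more complex, this quantity and the instantaneous frequency generally differ.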

Even more technically: For small values of ω, the effects of noise are much more prominent, so there’s actually a filtering step at the end that basically gets the average hue and amount of noise. However, this type of noise isn’t directly related to noise in the signal; it is related to the amount of noise in the angular velocity over time. Well, it should be related, but the current formula needs improvement.

After all this, the colors only have apparent meaning in exceptional cases (pure frequencies). But it does make for nice rainbows that entirely depend on the sound.

Filter Design and Implementation

I’m able to describe the concept of phase shifting every frequency by 90˚ while avoiding heavy mathematics. But actually creating the filter that does this for arbitrary signals requires domain knowledge. This is for those who are familiar with digital signal processing.

I created a generator for an FIR filter that removes all negative frequencies as well as DC and Nyquist. I could have just used the plain Hilbert transform, but I wanted the magnitude of the real part to roll off in roughly the same way as the imaginary part in the lower transition band, and likewise in the transition band near Nyquist, so that the results are as circular as possible (as opposed to vertically oriented ellipses). Low frequencies are very important in electronic music.
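To see why the plain Hilbert transform alone would squash low frequencies into ellipses, here is a sketch of a textbook Hamming-windowed Hilbert transformer. This is the "plain" variant, not the complex bandpass design described above; `hilbert_fir`, `gain` and the length of 101 taps are choices made for this example.

```python
import cmath
import math

def hilbert_fir(length):
    """Odd-length Hilbert transformer taps, shaped by a Hamming window."""
    assert length % 2 == 1, "odd length keeps the filter centered on a sample"
    mid = length // 2
    taps = []
    for i in range(length):
        n = i - mid
        ideal = 0.0 if n % 2 == 0 else 2.0 / (math.pi * n)  # ideal Hilbert response
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
        taps.append(ideal * window)
    return taps

def gain(taps, omega):
    """Magnitude response at omega radians per sample."""
    mid = len(taps) // 2
    return abs(sum(h * cmath.exp(-1j * omega * (i - mid)) for i, h in enumerate(taps)))

taps = hilbert_fir(101)
mid_band = gain(taps, math.pi / 2)  # a quarter of the sample rate
low_band = gain(taps, 0.01)         # a very low frequency
print(mid_band, low_band)
```

In the middle of the band the shifted copy keeps essentially its full amplitude, so paired with the raw signal it draws a circle. At very low frequencies the shifted copy shrinks while the raw signal does not, which is the ellipse problem the matched roll-off is meant to avoid.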

Rust is still relatively new and it seems no one has implemented an efficient FFT-based convolution yet, so I just implemented overlap-save on the spot, and made the filter length as large as possible (and also odd) for the given FFT size. I generated the impulse response for a bandpass filter whose real part removes DC and Nyquist and whose imaginary part is the Hilbert transform, and windowed it with a Hamming window.
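For readers who haven't met overlap-save, here is a compact sketch in Python rather than Rust. It illustrates the general technique, not the Audioscope implementation. The radix-2 `fft`, the `overlap_save` signature and the toy signal are all assumptions made for this example.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. Inverse is unnormalized."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + tw, even[k] - tw
    return out

def overlap_save(signal, taps, fft_size):
    """Causal FIR filtering of a real signal, one FFT-sized block at a time."""
    m = len(taps)
    hop = fft_size - (m - 1)                  # new samples consumed per block
    assert hop > 0 and fft_size & (fft_size - 1) == 0
    H = fft(list(taps) + [0.0] * (fft_size - m))
    padded = [0.0] * (m - 1) + list(signal)   # zero history before the first sample
    out = []
    for start in range(0, len(signal), hop):
        block = padded[start:start + fft_size]
        block += [0.0] * (fft_size - len(block))
        Y = fft([a * b for a, b in zip(fft(block), H)], inverse=True)
        # The first m - 1 outputs are circular wrap-around, so they are discarded.
        out.extend(y.real / fft_size for y in Y[m - 1:])
    return out[:len(signal)]

def direct(signal, taps):
    """Plain time-domain convolution, used only as a reference."""
    return [sum(taps[k] * signal[n - k] for k in range(len(taps)) if n - k >= 0)
            for n in range(len(signal))]

signal = [((i * 7) % 11 - 5) / 5 for i in range(50)]
taps = [0.25, 0.5, 0.25, -0.1, 0.3]
fast = overlap_save(signal, taps, 16)
slow = direct(signal, taps)
print(max(abs(a - b) for a, b in zip(fast, slow)))
```

Each block reuses the last `len(taps) - 1` input samples of the previous one, which is why a longer filter leaves fewer new samples per FFT.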

I could have used a pair of IIR filters, which use less memory and have a better magnitude response, but I saw the group delay for the lows and felt it was unacceptably long for an application that needs to be as responsive as possible. Also, I wasn’t okay with the idea of non-linear phase, which I imagine would ruin the integrity of the waveform.

As for the filters for getting the hue and saturation, I just implemented my own biquad lowpass (as in, I copypasted the formula). As I mentioned, I think there’s room for improvement. Currently, I take the angular velocity, take the logarithm of it, and then filter it; my reasoning was that taking the log would amplify the noise, so the filter would remove it more strongly. But isn’t there some invariance in that ordering? idk, I didn’t want to think too much about math tbh, and I was also too rushed to really take a good look at the waveform and spectrum of the angular velocity, BUT IT WOULD BE NICE
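On the ordering question, a quick experiment suggests the two orders are not interchangeable in general, because the logarithm is nonlinear. The sketch below assumes the widely used Audio EQ Cookbook lowpass formula (the post doesn't say which formula was copied). The cutoff, the sample rate and the alternating test input are all made up.

```python
import math

class BiquadLowpass:
    """Second-order lowpass using the Audio EQ Cookbook coefficients (an assumption)."""

    def __init__(self, cutoff_hz, sample_rate, q=math.sqrt(0.5)):
        w0 = 2 * math.pi * cutoff_hz / sample_rate
        alpha = math.sin(w0) / (2 * q)
        cos_w0 = math.cos(w0)
        a0 = 1 + alpha
        self.b0 = (1 - cos_w0) / 2 / a0
        self.b1 = (1 - cos_w0) / a0
        self.b2 = self.b0
        self.a1 = -2 * cos_w0 / a0
        self.a2 = (1 - alpha) / a0
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0

    def process(self, x):
        y = (self.b0 * x + self.b1 * self.x1 + self.b2 * self.x2
             - self.a1 * self.y1 - self.a2 * self.y2)
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y

SAMPLE_RATE = 48000.0  # hypothetical
CUTOFF_HZ = 100.0      # hypothetical
# A made-up "noisy" angular velocity that flips between two values around 0.1.
omega = [0.05 if i % 2 == 0 else 0.15 for i in range(5000)]

log_first = BiquadLowpass(CUTOFF_HZ, SAMPLE_RATE)
for w in omega:
    log_then_filter = log_first.process(math.log2(w))  # the ordering described above

filter_first = BiquadLowpass(CUTOFF_HZ, SAMPLE_RATE)
for w in omega:
    smoothed = filter_first.process(w)
filter_then_log = math.log2(smoothed)                  # the opposite ordering

print(log_then_filter, filter_then_log)
```

Filtering the logarithm settles on the average of the logs (a geometric mean of ω), while filtering first settles on the log of the average, so log-then-filter reads lower when ω is noisy.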

Audioscope Source Code

also, if you liked the music, I’m on SoundCloud