Synesthesia
An Audio Visualizer with QuickTime and SZG
Synesthesia is defined as "perceiving sensory data of one sense with another, eg, seeing sounds or hearing colors." [1] Synesthesia occurs naturally as a neurological condition in some individuals, and the same idea is applied in some cases of sensory deficit: for example, a deaf person can learn to hear through a microphone connected to electrodes on their tongue. My project is called Synesthesia because I aim to represent music as an animated computer graphic.
Audio visualization is common on modern PCs. iTunes, Windows Media Player, and Winamp, three popular audio players, all come preloaded with visualizations, and many more are available as plug-ins.
To translate audio into images, a mapping must be established. Several methods are commonly used, including beat detection, volume metering, and the Fourier transform. The Fourier transform is a natural way to represent audio because it describes the intensities of the frequencies present; in a sense, it describes which notes are playing.
Joseph Fourier hypothesized in the early 19th century that any periodic function can be exactly represented as a sum (or integral) of sine waves of various amplitudes and frequencies. The digital representation of sound on a computer is basically a series of numbers representing where the diaphragm of a speaker should be over time. This series can be treated as a discrete periodic function (the period can be taken to be the length of the sound sample), so a discrete Fourier transform can be applied to arrive at a notion of "how much of each frequency" there is in the sample.
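To make "how much of each frequency" concrete, here is a minimal sketch of a direct discrete Fourier transform in Python. The test signal (64 samples holding exactly 5 cycles of a sine wave) and the names `dft`, `signal`, and `peak_bin` are assumptions chosen for illustration; they are not part of the project.

```python
import cmath
import math

def dft(samples):
    """Direct O(N^2) discrete Fourier transform of a list of real samples."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

# Hypothetical test signal: 64 samples containing exactly 5 cycles of a sine wave.
N = 64
signal = [math.sin(2 * math.pi * 5 * t / N) for t in range(N)]

# The magnitude of each complex coefficient is the intensity of that frequency.
magnitudes = [abs(c) for c in dft(signal)]

# For real-valued input only the first half of the bins is distinct.
peak_bin = max(range(N // 2), key=lambda k: magnitudes[k])
```

Here the strongest bin is bin 5, matching the 5 cycles in the sample, and the other bins in the first half are close to zero.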
The discrete Fourier transform is most commonly computed with the FFT, or fast Fourier transform. The FFT is an algorithm that exploits symmetries in the transform to produce the same result as a direct evaluation of the discrete Fourier transform (up to floating-point rounding), using on the order of N log N operations for N samples instead of the N^2 needed by the direct method.
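As a rough illustration of how an FFT relates to the direct transform, the sketch below implements a textbook recursive radix-2 (Cooley-Tukey) FFT and compares it with a direct evaluation. This is an assumed teaching version, not the routine used in the project; it requires the input length to be a power of two, and the test samples are arbitrary.

```python
import cmath
import math

def naive_dft(x):
    """Direct O(N^2) evaluation of the discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

# Arbitrary 32-sample test input (an assumption for illustration).
samples = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(32)]

# Largest difference between the two results across all frequency bins.
max_error = max(abs(a - b) for a, b in zip(fft(samples), naive_dft(samples)))
```

The two results differ only by floating-point rounding; the speedup comes from reusing the half-size transforms rather than from discarding accuracy.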
Many FFT implementations exist. I chose to use QuickTime 7's FFT because using QuickTime also spared me the task of acquiring the audio.
The simplest way to visualize audio with the Fourier transform is to mimic a graphic equalizer, where several vertical bands bounce up and down, with low frequencies on the left and high frequencies on the right. More impressive visuals can be generated with two additions: first, a little premeditated human input into how the levels are animated, and second, a short history of levels kept on the screen so that the sound can be seen passing through time. These features are long-term goals for my project.
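The band and history ideas can be sketched in a few lines of Python. This is only an assumed outline, not the project's rendering code: `band_levels`, `HISTORY_LENGTH`, and the hand-written `frames` (standing in for successive FFT magnitude outputs) are hypothetical, and equal-width bands are just one simple choice of grouping.

```python
from collections import deque

def band_levels(magnitudes, num_bands):
    """Average FFT magnitudes into equal-width bands, low frequencies first."""
    size = len(magnitudes) // num_bands
    return [
        sum(magnitudes[b * size:(b + 1) * size]) / size
        for b in range(num_bands)
    ]

HISTORY_LENGTH = 3  # hypothetical number of past frames kept on screen

# A deque with maxlen silently drops the oldest frame once it is full.
history = deque(maxlen=HISTORY_LENGTH)

# Hypothetical magnitude frames standing in for successive FFT outputs.
frames = [
    [8, 8, 4, 4, 2, 2, 1, 1],
    [6, 6, 6, 6, 2, 2, 0, 0],
    [1, 1, 2, 2, 4, 4, 8, 8],
    [0, 0, 0, 0, 5, 5, 5, 5],
]

for mags in frames:
    history.append(band_levels(mags, num_bands=4))
```

After the loop, `history` holds the levels for the three most recent frames, oldest first, which a renderer could draw as rows receding into the screen.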
1. The Third Plateau: DXM FAQ - Glossary, http://www.third-plateau.org/faq/dxm_glossary.shtml
David Stolarsky . Math 198 . Spring 2006