Tutorial #30
Visualizing Audio #1 Time Domain Advanced   2013-12-09

Introduction

The Web Audio API allows you to do some really cool things with audio processing and creation using JavaScript. Tutorial 6 described the basic steps of playing an audio file using the API.

The people who created the Web Audio Specification made the inspired decision to include Audio Analysis functions which describe the Amplitude and Frequency variations. We can combine those with some simple Canvas graphics to create impressive visualizations of the audio. Most of the time these are for fun, but they also provide insight into the 'structure' of the sound being played.

In this series of four tutorials I'm going to describe four ways of visualizing audio. Here are preview images of what we will produce:

1: Real-time Time Domain visualization

Image 1 for this tutorial

2: Real-time Frequency Domain visualization

Image 2 for this tutorial

3: Min-Max Time Domain variation over time

Image 3 for this tutorial

4: Frequency Domain variation over time

Image 4 for this tutorial

Pretty impressive results from a few lines of JavaScript....

As of December 2013, the Web Audio features used here have only been implemented in Mozilla Firefox and Google Chrome browsers.

This first tutorial in the series covers Real-time Time Domain Visualization and you should read this first as I'm going to cover the code in a lot of detail. The other tutorials will focus on the unique parts of their code.

Demo 1 screenshot for this tutorial


Digital Audio

Consider standing in front of an orchestra. There are multiple sources of audio, each with unique patterns of frequencies and amplitudes. These are all merged into a single complex analog waveform that vibrates your ear drum. To represent this in digital form, computers sample this waveform at a high frequency (i.e. at very short intervals) and convert each sample to the nearest discrete value that can be represented (quantization). This way of representing audio is called Pulse Code Modulation (PCM). A typical sampling rate for digital audio is 44.1 kHz, which is roughly twice the highest frequency that humans can hear.

In the analysis described here, the software collects a number of these samples from the audio stream, say 1024 at a time, and processes them while continuing to collect the next batch, making this a real-time calculation.

There are two main types of analysis - Time Domain and Frequency Domain.

Time Domain analysis looks at the variation in Amplitude, or Volume, of the audio over time.

Frequency Domain analysis produces a Spectrum of the Frequencies that are present in the sample. It does this by performing a Fourier Transform on the data.

Both types offer insight into the structure of the audio and we can visualize the results in real time to produce graphics that change rapidly to reflect the changing audio.

We can also calculate a summary of the data when each batch of samples has been processed, such as the Maximum and Minimum Amplitude, and plot these against Time as a graph which gives a 'bigger picture' of how the audio varies.
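As a preview of how such a summary might be computed, here is a minimal sketch that finds the minimum and maximum amplitude in one batch of time domain samples. The array name amplitudeArray matches the variable used later in this tutorial, but the helper function name is just for illustration.

    // Sketch: summarize one batch of time domain samples (byte values 0-255)
    function getMinMax(amplitudeArray) {
        var minValue = 255;
        var maxValue = 0;
        for (var i = 0; i < amplitudeArray.length; i++) {
            if (amplitudeArray[i] < minValue) { minValue = amplitudeArray[i]; }
            if (amplitudeArray[i] > maxValue) { maxValue = amplitudeArray[i]; }
        }
        return { min: minValue, max: maxValue };
    }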


Understanding the Code

This is fairly complex code. I'm going to walk through the first of the four in great detail and in the other three just focus on the parts that differ from this one. I recommend that you read them in order.

The audio sample used in the demo is a clip of the Doctor Who theme by Delia Derbyshire and the BBC Radiophonic Workshop and was downloaded from Wikipedia.

The starting point for this series of tutorials was the excellent tutorial Exploring the HTML5 Web Audio: visualizing sound by Jos Dirksen - he updated his tutorial in November 2013 to reflect some of the changes in browser implementations.

Take a look at the Demo so you know what we are creating, scan through the code shown here and I'll walk you through it below.


There are three sections to this code. The first is the HTML, which consists of a canvas tag pair, into which we will draw the visualization, followed by two buttons that start and stop playing the audio. The style section contains the CSS for these elements.

The script section is where it happens. I am using a bit of jQuery here so you need to include that library in your page.

Before an HTML5 feature has been implemented everywhere, browser vendors, like Mozilla, use Vendor Prefixes to ensure that you use the correct version of the code for that browser. This is currently the case with Web Audio and some other browser APIs. The first two function definitions in our code provide a way to deal with this variation for the requestAnimFrame and AudioContext definitions. They are a bit of a hack but fortunately we just need to include them and not worry about them.
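For reference, here is a typical form of those two shims circa 2013 - the exact set of prefixes covered varies between examples, so treat this as a sketch rather than the definitive version used in the demo.

    // Fall back through vendor-prefixed versions of requestAnimationFrame
    window.requestAnimFrame = (function() {
        return window.requestAnimationFrame ||
               window.webkitRequestAnimationFrame ||
               window.mozRequestAnimationFrame ||
               function(callback) { window.setTimeout(callback, 1000 / 60); };
    })();

    // Pick up whichever AudioContext constructor this browser provides
    window.AudioContext = window.AudioContext || window.webkitAudioContext;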

The Global Variables are used in the various callback functions that follow. It would be neater to create a single global audio object that has all of these as attributes, but I want to keep the code as simple as possible here.

The $(document).ready(function() {...}) block is the standard jQuery container for code that runs once the page has loaded.


Canvas graphics functions are all performed on a graphics Context which you can think of as a container for all the objects and functions associated with a given canvas element on your page. The first statement within the .ready function gets this context from the element that we defined in the HTML.

Web Audio also uses an AudioContext as a container for all its associated objects. So next we try to create a new audio context. If your browser does not support Web Audio then this will fail and the user will receive an Error alert. If it succeeds then we have the audioContext object to work with.
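A minimal sketch of those two steps might look like this - the canvas element id ('canvas') is an assumption, as the actual markup is in the full code listing.

    // Get the 2D drawing context from the canvas element (id is assumed here)
    var canvas = document.getElementById('canvas');
    var canvasContext = canvas.getContext('2d');

    // Try to create the Web Audio context; warn the user if it is unsupported
    try {
        audioContext = new AudioContext();
    } catch (e) {
        alert('Web Audio API is not supported in this browser');
    }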

The Web Audio API is built around the concept of Nodes that are linked together to form a sort of workflow. For example, a SourceNode might contain the audio that we want to play. This is linked to an Analyser node, which extracts the data for our visualization, and this in turn is linked to a Destination node which outputs the audio to our speakers. The terminology can be a bit confusing when you're starting out but the concept is pretty straightforward.

We want the real processing to start when we click the Start button so we create a .on('click'... event handler and do the rest of the set up within this function.
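The handler itself is just a jQuery click binding; the button id used here ('start_button') is an assumption.

    $("#start_button").on('click', function(e) {
        e.preventDefault();
        setupAudioNodes();   // create and connect the nodes (described next)
        // ... the onaudioprocess callback and the call to loadSound follow below
    });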


setupAudioNodes creates three Nodes and connects these to the built-in Destination node to create the following network:

Image 5 for this tutorial

We need to have a way to store the audio clip in our context. This is our SourceNode and we create it with createBufferSource.

The Analyser node is what processes batches of audio samples from the Source Node and the JavascriptNode takes the output of that analysis and makes it available to our JavaScript code outside of the AudioContext.

You will see the terms ScriptProcessor and JavascriptNode used in different code examples and tutorials elsewhere. These should be interchangeable but the former seems to be preferred. The two names are the result of separate development efforts at different browser vendors.

We need an array for the data produced by the analysis. Regular JavaScript arrays lack the performance needed for the complex calculations involved here, so we use a Typed Array - specifically a Uint8Array, which can contain only Unsigned 8-bit Integers with values between 0 and 255. Typed arrays behave similarly to regular arrays in some ways but they lack some of the methods of regular arrays, such as sort.

We define our array to have the length of the analyser.frequencyBinCount, which happens to be 512.
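Inside setupAudioNodes, the node creation might look roughly like this. The fftSize value of 1024 is an assumption, chosen because it yields the frequencyBinCount of 512 mentioned above, and the channel counts passed to createScriptProcessor are also assumed.

    sourceNode     = audioContext.createBufferSource();         // holds the audio clip
    analyserNode   = audioContext.createAnalyser();             // processes batches of samples
    javascriptNode = audioContext.createScriptProcessor(1024, 1, 1);  // 1024 samples per batch

    // assumption: an fftSize of 1024 gives the frequencyBinCount of 512 used here
    analyserNode.fftSize = 1024;

    // typed array that will receive the time domain data from the analyser
    amplitudeArray = new Uint8Array(analyserNode.frequencyBinCount);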

Next we connect our nodes together into the network. We connect the Source directly to the Destination - this is what allows us to hear the audio through our output device (speakers or headphones).

In addition we connect the Source to the Analyser node so it can collect batches of samples from the PCM stream and process these. In order for us to access the results of that analysis within our JavaScript code, and use them to draw graphics, we connect the analyser to the JavascriptNode and finally we connect that to the Destination.
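The connection calls that build this network are short; a sketch, sitting at the end of setupAudioNodes and using the node variable names from above:

    // Source -> Destination : lets us hear the audio
    sourceNode.connect(audioContext.destination);

    // Source -> Analyser -> JavascriptNode -> Destination : lets us analyse it
    sourceNode.connect(analyserNode);
    analyserNode.connect(javascriptNode);
    javascriptNode.connect(audioContext.destination);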

You can build far more complex networks of audio nodes, especially if you are generating audio directly and/or applying filters to the audio.


Back in our Start button event handler, we now define the javascriptNode.onaudioprocess Callback function. This is called whenever the Analyser node tells the javascriptNode that a new batch of samples has been processed. The javascriptNode then fetches the time domain data from the Analyser with the getByteTimeDomainData call, placing it into the amplitudeArray typed array.

Now we have the data that we can use for visualization. If the audio is currently playing then we call our drawTimeDomain function which handles the graphics. We call this with requestAnimFrame which helps ensure smooth graphics animations.
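A sketch of that callback, assuming the global variable names used elsewhere in this tutorial:

    // Called each time the analyser has processed a new batch of samples
    javascriptNode.onaudioprocess = function() {
        // copy the current time domain data into our typed array
        analyserNode.getByteTimeDomainData(amplitudeArray);
        if (audioPlaying) {
            // schedule the drawing for the browser's next animation frame
            requestAnimFrame(drawTimeDomain);
        }
    };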

We still need to load the audio from the file and we need to play it back through the speakers. Our functions playSound and loadSound do this.

loadSound is called only once and fetches the encoded audio data with an Ajax XMLHttpRequest to a URL that can return the file. When that call succeeds it calls the request.onload function, which in turn calls audioContext.decodeAudioData to decode the audio file and place the data into our global variable audioData. At that point it can run playSound.

Two important things to note here. First, the audio URL must be on the same server that served this web page. If not, you will get a Cross-Origin Resource Sharing (CORS) error, as you are trying to violate the Same Origin Policy, which exists to close some big potential security holes.
 
Second, because the request to get the audio uses Ajax, it is Asynchronous, meaning that loadSound returns immediately, before the audio has actually been downloaded and decoded. So you cannot call playSound straight after loadSound - the data is not there yet. Instead, call it from within the onload callback, which only runs once the data is available.
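Here is a sketch of loadSound along those lines; the audio file URL is a placeholder and error handling is omitted.

    function loadSound(url) {
        var request = new XMLHttpRequest();
        request.open('GET', url, true);          // url is a placeholder, e.g. 'audio/clip.ogg'
        request.responseType = 'arraybuffer';    // we want the raw encoded bytes
        request.onload = function() {
            // decode the encoded audio into PCM data, then start playback
            audioContext.decodeAudioData(request.response, function(buffer) {
                audioData = buffer;
                playSound(audioData);
            });
        };
        request.send();
    }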

playSound assigns our decoded audioData object to the buffer of our SourceNode. start(0) starts playing the audio immediately, feeding the data to the Destination so we can hear it, and at the same time to the Analyser so we can process it. Setting loop to true means that it will keep playing.
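playSound itself is only a few lines; a sketch:

    function playSound(buffer) {
        sourceNode.buffer = buffer;   // hand the decoded audio to the source node
        sourceNode.loop = true;       // keep playing until we explicitly stop
        sourceNode.start(0);          // start immediately
        audioPlaying = true;
    }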

At this point we have defined what will happen when we click the Start button. The Stop button event handler simply calls stop(0) on the SourceNode to stop it feeding audio data into the network. It also sets the variable audioPlaying to false, which we use elsewhere to stop drawing the graphics.
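The Stop handler is the simplest piece of the page; the button id used here is an assumption.

    $("#stop_button").on('click', function(e) {
        e.preventDefault();
        sourceNode.stop(0);      // stop feeding audio into the node network
        audioPlaying = false;    // tells the callback to stop drawing
    });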


Are you still with me? There are a lot of moving parts and the Web Audio terminology can be a bit confusing to begin with.

You will have noticed that we have not actually drawn any graphics yet. We have set things up to call drawTimeDomain whenever there is a batch of data to visualize, but have not yet described what it does.

We are working with real-time Time Domain data, effectively the Amplitude of the audio signal. In this example, we want to draw a waveform in the canvas that reflects the amplitude changes within this batch of samples. Whenever a new batch comes through we want to clear the canvas and draw the new waveform.

drawTimeDomain first calls clearRect to remove anything that we have drawn. The underlying canvas element has a black background, so this is what we would see if we did nothing else.

Next we go through each of the 512 elements in the amplitudeArray typed array. We can use a regular for loop with this type of array. The values range between 0 and 255, so we can convert them into an X,Y coordinate pair within the canvas and then draw a 1x1 pixel rectangle, using a fillStyle set to white.
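A sketch of drawTimeDomain along these lines, assuming the canvas context and its dimensions are available as globals (canvasContext, canvasWidth and canvasHeight are assumed names):

    function drawTimeDomain() {
        // wipe the previous waveform, revealing the black canvas background
        canvasContext.clearRect(0, 0, canvasWidth, canvasHeight);
        for (var i = 0; i < amplitudeArray.length; i++) {
            var value = amplitudeArray[i] / 256;                 // scale 0-255 down to 0-1
            var y = canvasHeight - (canvasHeight * value) - 1;   // flip so larger values sit higher
            canvasContext.fillStyle = '#ffffff';
            canvasContext.fillRect(i, y, 1, 1);                  // one white pixel per sample
        }
    }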


So, in summary, when we click the Start button the first time, we set up the audio nodes, then we fetch and decode the audio file and start playing it. Every 1024 audio samples, the Analyser node triggers the Javascript node, which calls our graphics function to render the new data.

Image 1 for this tutorial

The result is the rapidly changing waveform that you see in the demo. In practice, because of human persistence of vision, you will see multiple waveforms superimposed on each other, but when you click the Stop button there will be only one.

This type of visualization is a good way to show the 'energy' in an audio track. It is impressive, but the rapidly changing graphic doesn't give us much information about the 'structure' of the audio and it is difficult to keep track of how it has changed over time, such as over a few seconds. The other visualizations that I will cover offer different approaches.

Thanks for sticking with this tutorial - it covers a lot of ground so I hope that you have been able to follow it, and that you have a better idea of how web audio works. Please check out the other tutorials.


More information

Here are some other useful guides, references, etc. on Web Audio:


Code for this Tutorial



Related Tutorials

6 : Play an Audio file using JavaScript   (Intermediate)

31 : Visualizing Audio #2 Frequency Domain   (Advanced)

32 : Visualizing Audio #3 Time Domain Summary   (Advanced)

33 : Visualizing Audio #4 Frequency Spectrogram   (Advanced)

