Build an Audio Waveform Display




With a little understanding of audio data formats, you can easily build a basic graphical audio display.

Representing audio visually is extremely useful. You can use waveform displays to quickly tell audio files apart, like a file thumbnail, or as a guide for non-linear editing tasks, such as deleting and processing parts of the file.

The figure below shows a waveform displayed in Audacity, a free, open source audio editing application. This hack shows you how to build a basic waveform display from raw audio data.

Audacity with an audio waveform displayed


The end result of this hack is shown in the figure below. You'll start by reading in the entire audio file using an AudioInputStream. Then you'll convert the raw data from the stream into useful audio samples, organized by channel. With the converted channel audio data, you'll create a single waveform panel. Then you'll wrap up the complete audio display by combining several waveform panels to display multi-channel audio.

The waveform display you'll build in this hack


Some Basic Definitions

You'll need to know a few basic terms and concepts about audio before you get started.


Sample

One measurement of audio data. For Pulse Code Modulated (PCM) encoding, a sample is an instantaneous representation of the voltage of the analog audio. There are other types of encoding, like μ-law and a-law, that are rarely used.


Sampling Rate

The number of samples in one second. Measured in Hertz (Hz) or kilo-Hertz (kHz). The most common sampling rate is 44.1 kHz (CD quality audio). Often, you'll find 22.05 kHz or 11.025 kHz on the Web, since the files are smaller and the conversion is easier.


Sample Size

The number of bits in one sample. It is typically a multiple of eight because data is stored in 8-bit bytes. The most common sample size is 16 bits, which is CD quality audio. Often you'll find 8-bit audio because the files are smaller. You'll rarely find anything less than 8-bit audio because the quality is pretty poor. Sample size is sometimes called bit depth.


Channel

A channel is an independent stream of audio. Stereo is the most common form of multi-channel audio: one independent left channel and one independent right channel. Higher-end audio formats include 5.1 surround sound (actually six channels) and up.


Frame

A frame is a cross section of samples across all channels in the audio file. So, a 16-bit stereo (two channel) audio file will have 32-bit frames (16 bits per sample * 2 channels per frame = 32 bits per frame).
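In the Java Sound API, these quantities map directly onto AudioFormat, which computes the frame size for you. A quick sketch (the FrameDemo class name is just for illustration):

```java
import javax.sound.sampled.AudioFormat;

public class FrameDemo {
	public static void main(String[] args) {
		// CD quality: 44.1 kHz, 16-bit samples, 2 channels, signed, little-endian
		AudioFormat format = new AudioFormat(44100f, 16, 2, true, false);

		// 16 bits per sample * 2 channels = 32 bits = 4 bytes per frame
		System.out.println(format.getFrameSize()); // prints 4
	}
}
```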

Load the Raw Data

Java reads raw audio data in 8-bit bytes, but most audio has a higher sample size. So, in order to represent the audio, you'll have to combine multiple bytes to create samples in the audio format. But first, you'll need to load all of the audio into a buffer before you combine the bytes into samples.

Start by getting an audio stream from a file:

	File file = new File(filename);
	AudioInputStream audioInputStream = AudioSystem.getAudioInputStream(file);

Now that you have the AudioInputStream, you can read in the audio data. AudioInputStream has a read() method that takes an unpopulated byte[] and reads in data the length of the byte[]. To read in the entire audio file in one shot, create a byte[] the length of the entire audio file. The complete length of the file in bytes is:

	total number of bytes = bytes per frame * total number of frames
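For example, one second of CD-quality stereo audio (16-bit samples, two channels, 44,100 frames per second) works out to:

	4 bytes per frame * 44,100 frames = 176,400 bytes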

You can get the number of frames for the whole file (frameLength) and the size of the frame (frameSize) from the AudioInputStream:

	int frameLength = (int) audioInputStream.getFrameLength();
	int frameSize = (int) audioInputStream.getFormat().getFrameSize();

You can create the byte[] with the length set to frameLength*frameSize:

	byte[] bytes = new byte[frameLength * frameSize];

Finally, you can read in the audio, passing the AudioInputStream the empty byte[] and catching the appropriate exceptions:

	int result = 0;
	try {
		result = audioInputStream.read(bytes);
	} catch (Exception e) {
		e.printStackTrace();
	}
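One caveat: a single call to read() isn't guaranteed to fill the whole buffer. It usually does for a file-backed stream, but a more defensive version loops until the buffer is full or the stream ends. Here's a sketch (the readFully() helper is an assumption for illustration, not part of the hack):

```java
import java.io.IOException;
import javax.sound.sampled.AudioInputStream;

public class FullReader {
	// Hypothetical helper: keeps calling read() until the buffer is
	// full or the stream ends, since one read() may return fewer bytes.
	public static int readFully(AudioInputStream in, byte[] bytes)
			throws IOException {
		int total = 0;
		while (total < bytes.length) {
			int n = in.read(bytes, total, bytes.length - total);
			if (n < 0) {
				break; // end of stream
			}
			total += n;
		}
		return total;
	}
}
```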

Convert to Samples and Channels

The raw audio data isn't very useful. It needs to be broken up into channels and samples. From there, it's easy to paint the samples.

The bytes will be converted to samples and represented as ints. You'll need a container to store the samples across all channels. So, create a two-dimensional int[][] indexed by channel and by sample within the channel. You've already seen how to get the frame length from the AudioInputStream, and you can get the number of channels the same way. Here is the code to initialize the int[][]:

int numChannels = audioInputStream.getFormat().getChannels();
int frameLength = (int) audioInputStream.getFrameLength();
int[][] toReturn = new int[numChannels][frameLength];

Now, you need to iterate through the byte[], convert the bytes to samples, and place the sample in the appropriate channel in the int[][]. The byte[] is organized by frames, meaning that you'll read in a sample for every channel rather than all of the samples for a specific channel in a row. So, the flow is to loop through the channels and add samples until the byte[] has been iterated completely:

int sampleIndex = 0;

for (int t = 0; t < eightBitByteArray.length;) {
	for (int channel = 0; channel < numChannels; channel++) {
		int low = (int) eightBitByteArray[t];
		t++;
		int high = (int) eightBitByteArray[t];
		t++;
		int sample = getSixteenBitSample(high, low);
		toReturn[channel][sampleIndex] = sample;
	}
	sampleIndex++;
}

This hack is going to deal exclusively with 16-bit samples. They are by far the most common. Plus, you can get an idea for how sample conversion works while still keeping things pretty straightforward. This code gets much trickier with multiple dynamic sample sizes.
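To give a flavor of why other sample sizes complicate things: 8-bit PCM data in WAV files is conventionally unsigned, while 16-bit data is signed, so an 8-bit converter has to re-center the values around zero. A minimal sketch, assuming unsigned 8-bit input (this helper is not part of the hack):

```java
public class EightBitSample {
	// 8-bit PCM in WAV files is typically unsigned (0..255), so mask
	// off Java's sign extension and shift the midpoint (128) down to 0.
	public static int getEightBitSample(byte b) {
		return (b & 0xff) - 128;
	}
}
```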


Now for the getSixteenBitSample() method. You can't simply add the bytes together using regular addition because the bits are displaced: in a 16-bit sample, the low byte represents bits 0 through 7, and the high byte represents bits 8 through 15. It's more like concatenation, so the type of math shown here won't work:

	  1010 1101 (high byte)
	+ 0011 0010 (low byte)
	-----------
	  1101 1111

What you want is more like this:

	  1010 1101           (high byte)
	+           0011 0010 (low byte)
	---------------------
	  1010 1101 0011 0010

And in order to get this to work with standard addition, you need to add two 16-bit values, with the bits shifted and placeholder 0s added where necessary. Then you get something like this:

	1010 1101 0000 0000 (high byte)
  + 0000 0000 0011 0010 (low byte)
  ---------------------
    1010 1101 0011 0010

The high byte needs to be bit shifted. Bit shifting, the process of sliding bits around, is rarely used in everyday Java; as a result, you may never have seen the bit-shift operators before (<< or >>, depending on the direction, followed by the number of bits to shift). Here, however, bit shifting is exactly what's needed, so you will shift the high byte 8 bits to the left:

high << 8 

Now, you need to clear everything above the low eight bits of the low byte. When Java widens the byte to an int, it sign-extends it, so a negative low byte would fill bits 8 through 31 with 1s. You can clear those bits with the bitwise AND operator and a mask whose low eight bits are all 1s (0x00ff). It works like this:

	  1111 1111 0011 0010 (sign-extended low byte)
	& 0000 0000 1111 1111 (mask, 0x00ff)
	---------------------
	  0000 0000 0011 0010

Here is the code for the sample conversion:

	private int getSixteenBitSample(int high, int low) {
		return (high << 8) + (low & 0x00ff);
	}
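To sanity-check the conversion, you can run it against a couple of concrete byte pairs. The SampleDemo wrapper below is just for illustration:

```java
public class SampleDemo {
	public static int getSixteenBitSample(int high, int low) {
		return (high << 8) + (low & 0x00ff);
	}

	public static void main(String[] args) {
		// 0x12 and 0x34 concatenate to 0x1234 = 4660
		System.out.println(getSixteenBitSample(0x12, 0x34)); // prints 4660

		// A byte of 0xFF widens to the int -1; the result 0xFF38
		// is the signed 16-bit value -200
		System.out.println(getSixteenBitSample(-1, 0x38)); // prints -200
	}
}
```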

Creating a Single Waveform Display

Now that you have the audio sample data organized by channels, it's time to get to painting. To keep everything modular, create a class called SingleWaveformPanel to paint one channel of audio data. In the next section, you'll write a WaveformPanelContainer to use multiple SingleWaveformPanels to handle multi-channel audio.

The waveform painting is going to be drawn by plotting points scaled to the sample data and drawing lines between them. This is simplistic, but it yields good results. Figures 10-4 and 10-5 show the same waveform in Audacity and the simulator for this hack; they're pretty close.

I'm going to gloss over the scaling code because I really want to concentrate on the conversion from audio information to visualization. But to understand why scaling is necessary, remember that CD quality audio has 44,100 samples per second. So, without scaling, you would need 44,100 horizontal pixels for every second of your audio file. Obviously, this is impractical. So, if you dig into the source code for this hack, you can see the scaling and how the scales are determined. Meanwhile, just assume that the waveform is always scaled to fit in the panel.

Start by drawing the center line at 0:

	g.setColor(REFERENCE_LINE_COLOR);
	g.drawLine(0, lineHeight, (int)getWidth(), lineHeight); 

Next, mark the origin to start drawing at 0,0:

	int oldX = 0;
	int oldY = (int) (getHeight() / 2);
	int xIndex = 0;

Now, you need to figure out the incremental jump between samples to adjust for the scale factor. This works out to be:

	number of samples / (number of samples * horizontal scale factor)
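This formula simplifies to one over the horizontal scale factor; in other words, the increment is the number of samples each pixel represents. Here's a sketch of how the scale helpers might look; the formulas are assumptions based on the description above, not the hack's actual source:

```java
public class ScaleSketch {
	// Assumed: the horizontal scale factor is chosen so the whole
	// clip fits the panel width (pixels per sample).
	public static int getIncrement(int numSamples, int panelWidth) {
		double xScale = (double) panelWidth / numSamples;
		// samples / (samples * xScale) == 1 / xScale == samples per pixel
		return (int) (numSamples / (numSamples * xScale));
	}

	// Assumed: the vertical scale maps the signed 16-bit sample range
	// onto half the panel height.
	public static double getYScaleFactor(int panelHeight) {
		return (panelHeight / 2.0) / 32768.0;
	}
}
```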

The following code grabs the increment and paints a line from the origin to the first sample:

	int increment = getIncrement();
	g.setColor(WAVEFORM_COLOR);

	int t = 0;

	for (t = 0; t < increment; t += increment) {
		g.drawLine(oldX, oldY, xIndex, oldY);
		xIndex++;
		oldX = xIndex;
	}

Finish up by iterating through the audio and drawing lines to the scaled samples:

	for (; t < samples.length; t += increment) {
		double scaleFactor = getYScaleFactor();
		double scaledSample = samples[t] * scaleFactor;
		int y = (int) ((getHeight() / 2) - scaledSample);
		g.drawLine(oldX, oldY, xIndex, y);

		xIndex++;
		oldX = xIndex;
		oldY = y;
	}

Create a Container

Now that you have the waveform painting under control, you need to create a container called WaveformPanelContainer for SingleWaveformPanels in order to show multi-channel audio. The figure below shows the waveform in the simulator.

Multi-channel (stereo) audio in the simulator for this hack


Here is the complete code for the WaveformPanelContainer. AudioInfo is a helper class that contains references to the loaded audio samples and the current channel.

Testing out the waveform display
public class WaveformPanelContainer extends JPanel {
	private ArrayList<SingleWaveformPanel> singleChannelWaveformPanels
		= new ArrayList<SingleWaveformPanel>();
	private AudioInfo audioInfo = null;

	public WaveformPanelContainer() {
		setLayout(new GridLayout(0, 1));
	}

	public void setAudioToDisplay(AudioInputStream audioInputStream) {
		singleChannelWaveformPanels = new ArrayList<SingleWaveformPanel>();
		audioInfo = new AudioInfo(audioInputStream);
		for (int t = 0; t < audioInfo.getNumberOfChannels(); t++) {
			SingleWaveformPanel waveformPanel
				= new SingleWaveformPanel(audioInfo, t);
			singleChannelWaveformPanels.add(waveformPanel);
			add(createChannelDisplay(waveformPanel, t));
		}
	}

	private JComponent createChannelDisplay(
			SingleWaveformPanel waveformPanel,
			int index) {
		JPanel panel = new JPanel(new BorderLayout());
		panel.add(waveformPanel, BorderLayout.CENTER);

		JLabel label = new JLabel("Channel " + (index + 1));
		panel.add(label, BorderLayout.NORTH);

		return panel;
	}
}

Seeing Is Believing

Now, you're ready to run the hack. The main() method shown here is the simulator code. Notice the creation of the AudioInputStream and the creation of the container with the stream. All painting and management of SingleWaveformPanels is encapsulated within the separate panel classes:

	public static void main(String[] args) {
		try {

			JFrame frame = new JFrame("Waveform Display Simulator"); 
			frame.setBounds(200,200, 500, 350);

			File file = new File(args[0]); 
			AudioInputStream audioInputStream 
				= AudioSystem.getAudioInputStream(file);
        
			WaveformPanelContainer container = new WaveformPanelContainer(); 
			container.setAudioToDisplay(audioInputStream);
       
			frame.getContentPane().setLayout(new BorderLayout());		
			frame.getContentPane().add(container, BorderLayout.CENTER);
		
			frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
       
			frame.setVisible(true);
			frame.validate();
			frame.repaint();

		} catch (Exception e){
			e.printStackTrace();
		}
	}

Then, just make sure to pass the audio filename in at the command line. Use something like this:

java WaveformDisplaySimulator chord.wav

This hack shows you how to do all of the sample conversion and painting you need to display a waveform in a very simple way. However, you should address a few key issues before using this in a real audio application. For example, this hack deals only with 16-bit audio; you'd probably want to build something more generic to handle other sample sizes. You may also want to deal with compression, so you can display waveforms for MP3 files. That said, this hack still gives you a good idea of how to dig into raw audio data and get your audio visualization on.


Jonathan Simon
