Building a USB audio interface

This is more of a proof-of-technology project than a useful one. The general objective is to create a foundation that future audio projects can be built upon. Several parts of this project can be useful for digital audio processing.

The core of the project is to send a PCM audio stream down the USB link to the Nexys development board and pass the data through a parallel-to-serial converter, which generates the serial audio stream that most audio DACs can use.
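The parallel-to-serial step can be modelled in software. This is a minimal sketch, not the converter used in the project: the 16-bit width, MSB-first bit order, left-channel-first framing and the function names are all assumptions made for illustration, and the exact framing the DAC expects is not described on this page.

```python
def to_bits_msb_first(sample, width=16):
    """Return the two's-complement bits of a signed sample, MSB first (assumed order)."""
    value = sample & ((1 << width) - 1)  # wrap negative values into two's complement
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]


def serialize_stereo(left, right, width=16):
    """Serialize one stereo frame as left bits followed by right bits (assumed framing)."""
    return to_bits_msb_first(left, width) + to_bits_msb_first(right, width)


if __name__ == "__main__":
    print(serialize_stereo(1, -1))
```

In hardware the same job is done by a shift register clocked at the bit rate; the list here only shows the order in which bits would leave it.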

The project started out fairly simple. It consisted of the USB to Wishbone bus interface block I like to use for most of my projects and an interface block to reformat the data into the serial stream the DAC is designed to work with. The DAC I selected for the project is the TI PCM1742. With a little work, I expect the code can be adapted to other types of DAC on the market.

As the project progressed, the need for a FIFO became apparent. The USB interface on the Nexys board cannot guarantee the arrival time of a data packet, as it uses a bulk endpoint instead of an isochronous endpoint. As a result, the audio samples arrive with a random delay, and the sound is pretty bad.

At first, a simple FIFO using the on-chip BRAM was used. The FIFO is configured as 4096x32. The data path is set to 32 bits so the left and right channel data can be stored in the same location. This dramatically improved the sound quality. However, there were still times when the sound stopped for a fraction of a second when the PC was busy. This has more to do with the poor USB driver Digilent supplies. Not only does it take 100% of the CPU cycles when running, the program also fails to fill the USB pipe when other programs are running. The board runs much better on a Linux machine, with the driver taking up only 2~3% of the CPU cycles.
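Storing both channels in one 32-bit location amounts to packing two 16-bit samples into a single word. The sketch below shows one way to do that; placing the left channel in the upper half is an assumption, as are the function names.

```python
def pack_stereo(left, right):
    """Pack two signed 16-bit samples into one 32-bit word (left in bits 31..16, assumed)."""
    return ((left & 0xFFFF) << 16) | (right & 0xFFFF)


def _to_signed16(value):
    """Interpret a 16-bit field as a two's-complement signed number."""
    return value - 0x10000 if value & 0x8000 else value


def unpack_stereo(word):
    """Split a 32-bit FIFO word back into (left, right) signed samples."""
    return _to_signed16((word >> 16) & 0xFFFF), _to_signed16(word & 0xFFFF)


if __name__ == "__main__":
    word = pack_stereo(-1, 2)
    print(hex(word), unpack_stereo(word))
```

Keeping both channels in one word means a single FIFO read always yields a matched left/right pair, so the channels cannot drift apart inside the buffer.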

To further improve the audio quality, I decided to put the onboard PSRAM to good use. With the PSRAM, it is possible to implement a 16MB FIFO, which should hold over a minute of audio data. This way, the USB latency should no longer be an issue. This led to the introduction of the PSRAM FIFO block to the project. The implementation of this part was left in the root of the design for this project. However, this function has also been pulled out into its own module for later reuse. The link to the separate module can be found on the front page of my web site.
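A quick calculation shows how much audio each FIFO can hold. It assumes 4 bytes per stereo sample pair (the 32-bit data path), that 16MB means 16 * 1024 * 1024 bytes, and that the whole PSRAM is used as buffer; the rates are the 44.1kS/s source rate and the 50MHz / 1024 DAC rate discussed on this page.

```python
BYTES_PER_FRAME = 4                    # one 32-bit word holds left + right
INPUT_RATE = 44_100                    # source material, samples per second
DAC_RATE = 50_000_000 / 1024           # DAC sample rate derived from the 50MHz crystal

bram_frames = 4096                                   # 4096x32 BRAM FIFO
psram_frames = 16 * 1024 * 1024 // BYTES_PER_FRAME   # assumes all 16MB is buffer

bram_ms = bram_frames / DAC_RATE * 1000   # drain time of a full BRAM FIFO
psram_s = psram_frames / INPUT_RATE       # playing time held by a full PSRAM FIFO

if __name__ == "__main__":
    print(f"BRAM FIFO:  {bram_frames} frames, about {bram_ms:.1f} ms")
    print(f"PSRAM FIFO: {psram_frames} frames, about {psram_s:.1f} s")
```

Under these assumptions the BRAM FIFO covers well under a tenth of a second, which fits the short dropouts described above, while the PSRAM holds about a minute and a half.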

The last touch-up to the whole project was an audio sample-rate scaler. This was introduced because of the clock available on the NEXYS2 development board. The onboard crystal is 50MHz; if 1024x oversampling is used to generate the DAC bit stream, the sample rate is 48.8kS/s. This is close enough to the 48kS/s Winamp is capable of generating with its raw audio stream writer, which is what I have been using to convert the source audio stream into the raw data format fed to the project. However, for most digital audio devices, including CD players, 44.1kS/s is a much more common sampling rate. Thus, it would be nice to have a way to convert the sampling rate from the incoming 44.1kS/s to the 48.8kS/s the DAC is running at.
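The numbers in this paragraph can be checked with a few lines. The variable names are illustrative only.

```python
CLOCK_HZ = 50_000_000      # onboard crystal
OVERSAMPLING = 1024        # clock cycles per output sample

dac_rate = CLOCK_HZ / OVERSAMPLING     # 48828.125 samples per second
speedup_48k = dac_rate / 48_000        # playing 48kS/s material unconverted
speedup_44k1 = dac_rate / 44_100       # playing 44.1kS/s material unconverted

if __name__ == "__main__":
    print(f"DAC rate: {dac_rate} S/s")
    print(f"48kS/s source plays {100 * (speedup_48k - 1):.1f}% fast")
    print(f"44.1kS/s source plays {100 * (speedup_44k1 - 1):.1f}% fast")
```

The 48kS/s case is off by less than 2%, while unconverted 44.1kS/s material would play more than 10% too fast, which is why a scaler is needed for that rate.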

For this, I first implemented a quick and dirty sample rate converter. The converter basically keeps a counter of the number of samples it has transferred from the PSRAM FIFO to the BRAM FIFO. For every 10 samples read from the PSRAM FIFO, 11 samples are written to the BRAM FIFO: the 10th sample is replicated and written into the BRAM twice. With this, the new sampling rate is calculated as: (44.1kS/s / 10) * 11 = 48.51kS/s, which is fairly close to the actual sampling rate of the DAC at 48.8kS/s.
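The duplication scheme is easy to model in software. This is a sketch of the idea only; the function name is made up and the FIFOs are replaced by plain lists.

```python
def duplicate_every_tenth(samples):
    """Copy samples to the output, writing every 10th sample twice (10 in, 11 out)."""
    out = []
    for count, sample in enumerate(samples, start=1):
        out.append(sample)
        if count % 10 == 0:
            out.append(sample)  # the 10th sample is replicated
    return out


if __name__ == "__main__":
    stretched = duplicate_every_tenth(list(range(20)))
    print(len(stretched), stretched)
```

Twenty input samples become twenty-two output samples, with samples 9 and 19 each appearing twice in a row. Those repeated values are the small steps in the waveform that the next paragraph describes as a metallic feel.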

This quick and dirty technique solved the audio-pitch-too-high problem (or you can call it the music-track-too-short problem). However, it introduced another problem. Although it is hard to hear, the technique adds a slight metallic feel to the music, and the human voice range (1~4kHz) seems to be the most affected. Well, this is surely not the best way to scale the sampling rate of an audio stream, is it?

Going back to my first DSP book, we know the best way to scale the sampling rate is to multiply each audio sample by a sinc(x) function, then extract the data points between each original sample pair. That sounds all nice and good, except it is rather difficult to implement a sinc(x) function on an FPGA... It would require a floating point DSP core to be constructed first, which I don't feel like doing just right now. My lazy solution is to use linear interpolation. This is easy enough to do on an FPGA using fixed-point math.

Let's start discussing the finer points of the project. Those of you who would rather download the source code and start playing with it can skip to the end of the page.


Quick overview


PCM1742 DAC

There is no particular reason I selected TI's PCM1742 DAC for this project. I just happened to have bought a batch of these tiny chips from Digikey several years ago and never got a good chance to put them to good use. The biggest downside of this chip is its tiny package, which makes it rather hard to work with unless you have an adaptor PCB made specially for it.

I elected to use thin magnet wire soldered directly to the leads of the chip. It did take several hours of working under a microscope to put this together. For people who don't want to work with this tiny package, I suggest trying the TDA1387 or CS4334 DAC, which come in an SO8 package and are much easier to work with.

The output of the PCM1742 chip is buffered by a pair of MCP802 OPAMPs to protect the DAC from having to connect directly to the AMP. The OPAMPs are powered by the 5V rail. As a future improvement, a 100 ohm resistor should be added to the output of each OPAMP, as the OPAMP drives too much current into the output.


Sample Rate Scaler

The sample rate scaler is definitely one of the most complicated blocks in this project. For this particular project, the objective is to convert the incoming samples at 44.1kS/s to 48.8kS/s. However, the concept and the functional blocks used in this scaler design should be adaptable to scaling samples to a different sampling frequency. One possible use of this block is to scale the incoming samples to a different frequency while keeping the sampling rate the same.

The basic concept of the scaler is fairly simple. Take 10 samples from the incoming data stream, and generate 11 samples for the outgoing stream. If the timing of an outgoing sampling point lands between two incoming sampling points, use linear interpolation to estimate the outgoing sample value. This concept is illustrated in the figure below:

The basic technique is to create memory locations to store 110 samples. The samples from the incoming data stream are placed in locations 0, 11, 22, 33 ... 110(0). Note that location 110 wraps back to location 0. After interpolation, the output samples are taken from locations 0, 10, 20, 30 ... 110(0). Doing the computation this way would require quite a large amount of logic used as memory elements, which is rather undesirable.

However, there is a shortcut that makes the same computation without using all that memory. The key is to realize that each output sample is computed from two consecutive input samples. The first computation requires the samples at locations 0 and 11; the extracted sample is at location 10. The second computation requires the samples at locations 11 and 22; the extracted sample is at location 20. Looking at the problem this way, the calculation can be simplified to the following equation:

Where Y1 is the current sample and Y2 is the previous sample.
n is the position where the sample should be extracted.

For the first sample, n = 11, then 10, 9, 8, and so on. The only exception is the last sample, where the input samples are at locations 99 and 110 while two output samples are taken, from locations 100 and 110.
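The description above can be turned into a small software model. The interpolation formula used here, Y2 + (Y1 - Y2) * n / 11, is an assumption reconstructed from the text (previous sample Y2, current sample Y1, n counting down from 11); it reproduces the locations quoted above but is not taken from scaler.v. Exact fractions are used so the result can be checked without rounding.

```python
from fractions import Fraction


def interpolate(y2, y1, n):
    """Assumed form: value n/11 of the way from the previous (y2) to the current (y1) sample."""
    return y2 + Fraction(n, 11) * (y1 - y2)


def resample_10_to_11(samples):
    """Inputs sit at locations 0, 11, 22, ...; outputs are taken at 0, 10, 20, ..."""
    out = []
    prev = None
    for index, cur in enumerate(samples):
        m = index % 10                 # position of this input within the 10-sample group
        if prev is None:
            out.append(Fraction(cur))  # very first output: n = 11, equals the current sample
        elif m == 0:
            out.append(interpolate(prev, cur, 1))   # location 100 (between 99 and 110)
            out.append(Fraction(cur))               # location 110(0), n = 11
        else:
            out.append(interpolate(prev, cur, 11 - m))  # n = 10, 9, ..., 2
        prev = cur
    return out


if __name__ == "__main__":
    ramp = [11 * k for k in range(21)]   # each sample value equals its own location
    print([int(v) for v in resample_10_to_11(ramp)])
```

Feeding in a ramp whose values equal their own locations is a convenient check: every output should then equal the location it was extracted from, that is 0, 10, 20 and so on.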

This is all nice and easy, except for the requirement of multiplying the incoming sample by 1/11. This is a rather interesting problem, as the Spartan FPGA (and most commercially available FPGAs) has embedded multiplier blocks. However, there is no division block. The reason for this is... division is hard. Think back to elementary school days. Addition and subtraction are fairly straightforward. Multiplication is simply addition done a number of times; with a bit of help from the multiplication table, it can be done easily. However, there is no simple way to do division. The reason is that division requires one to predict the quotient (or part of the quotient if long division is used). The predicted quotient is multiplied by the divisor and the result is compared to the dividend. If the result is too high or too low, the quotient is adjusted until the error is acceptable. This is why computing division digitally is slow. The simplest method is to start counting the quotient from 1 and keep looping until the product of quotient and divisor is larger than the dividend. Computing this on a 16-bit number can take up to 65536 cycles, not exactly fast.
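The counting method described above looks like this in software. It is a sketch for non-negative dividends and positive divisors only, with one loop iteration standing in for one clock cycle; the function name is invented for the example.

```python
def divide_by_counting(dividend, divisor):
    """Count the quotient upward until the next product would exceed the dividend.

    Returns (quotient, steps); steps is the number of loop iterations taken.
    """
    if dividend < 0 or divisor <= 0:
        raise ValueError("sketch handles non-negative dividend and positive divisor only")
    quotient = 0
    steps = 0
    while (quotient + 1) * divisor <= dividend:
        quotient += 1
        steps += 1
    return quotient, steps


if __name__ == "__main__":
    print(divide_by_counting(100, 11))
    print(divide_by_counting(65535, 1))
```

The number of iterations equals the quotient, so the worst case for a 16-bit dividend (dividing by 1) takes tens of thousands of steps.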

Fortunately, there is a way around this problem. In the digital world, it is fairly easy to compute a division by 2^n: it is just a matter of shifting the bits to the right. In an FPGA, shifting bits to the right costs nothing (it is absolutely free). Thus, it is possible to expand 1/11 into a different fraction whose denominator is 2^n. Well, there is a simple fraction that will do the trick: 5958/65536. (Well, not exactly -- 1/11 = 0.09090909... while 5958/65536 = 0.09091186... However, the error is less than the least significant bit, which is good enough in the digital world.)

With this in mind, the new 1/11 becomes: take the incoming data, multiply it by 5958, then shift the output to the right by 16 bits (2^16 = 65536). Using the multiplier blocks built into the Spartan FPGA, this can easily be done, and done pretty fast too.
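The multiply-and-shift trick can be checked in software. The sketch below is a plain integer model, not a translation of scaler.v; the function name and the choice of n between 1 and 10 are assumptions. Python's >> on integers is an arithmetic shift, so negative values round toward minus infinity.

```python
RECIP_11 = 5958   # 5958 / 65536 is slightly larger than 1/11
SHIFT = 16        # dividing by 2^16 = 65536 is a right shift by 16 bits


def scale_by_n_over_11(diff, n):
    """Approximate diff * n / 11 using one multiply chain and a right shift."""
    return (diff * n * RECIP_11) >> SHIFT


def worst_case_error():
    """Largest difference from exact floor division over all signed 16-bit inputs."""
    worst = 0
    for n in range(1, 11):
        for diff in range(-32768, 32768):
            error = abs(scale_by_n_over_11(diff, n) - (diff * n) // 11)
            worst = max(worst, error)
    return worst


if __name__ == "__main__":
    print(5958 / 65536, 1 / 11)
    print(scale_by_n_over_11(1100, 10), worst_case_error())
```

For signed 16-bit inputs and n from 1 to 10, the result never differs from exact floor division by more than one count, which matches the claim that the approximation error stays below the least significant bit.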

With a bit of pipelining, this is what was implemented in the scaler.v block:

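	// 5958/65536 approximates 1/11.
	// diff, mul_scale, mul_out and clk are declared outside this excerpt.
	reg signed [16:0] prescale_const;
	wire signed [15:0] mul_in;
	reg  signed [31:0] mul_out_int;

	assign mul_in = diff;
	// Arithmetic shift right by 16 divides the product by 65536.
	assign mul_out = mul_out_int >>> 16;

	always @ (posedge clk)
	begin
		// Stage 1: multiply the 1/11 constant by mul_scale.
		prescale_const <= 17'd5958 * mul_scale;
		// Stage 2: multiply the input by the scaled constant.
		mul_out_int <= (mul_in * prescale_const);
	end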
Notice that the shift to the right is done in an assign statement to ensure the shift is done simply by dropping the 16 least significant bits.
Here is the source code:
One big zip file: pcm1742.zip
View the individual files:
FIFO.v - BRAM FIFO
LED_7seg.v - 7 Segment LED driver
pcm1742.v - Project root!
scaler.v - Sampling rate scaler
USBinf.v - USB interface

Page created with VIM
Page last updated: November, 2008
Email: rihuang ([at]) gmail (*dot*) com