Numerical Analysis Project Notes

This is a work-in-progress wiki which will (hopefully) be updated rapidly as we get feedback. The project’s main goal is to take a piece of music and turn it into a sequence of notes. The music used will consist of i) the first 3.6 seconds of “Spirit of Radio” by Rush, in which a single guitar plays individual notes (not chords, which comprise multiple notes), and ii) music of your choice. (If you find it difficult to pick your own music, please check out Daniel Seeman's website for good music samples.)

The opening of “Spirit of Radio” has been ripped as a wav file and is available here. As previously mentioned, it runs about 3.6 seconds (after you discount the first 0.10 seconds, which are silent). For the rest of this wiki, we’re going to assume that the guitarist, Alex Lifeson, is playing 24 notes in total, which gives us an approximate value of 0.15 seconds per note. This may not in fact be correct, but we’ll run with it for now.
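
The timing figures above translate directly into sample counts. The following is a minimal sketch, assuming the 44,100 Hz sampling rate used later in these notes and the 24-note assumption stated above; the variable names are illustrative only.

```matlab
% Sketch: convert the timing assumptions into sample counts.
fs          = 44100;   % samples per second (rate used later in these notes)
silence     = 0.10;    % seconds of silence at the start of the clip
numNotes    = 24;      % assumed number of notes in the opening
musicLength = 3.6;     % seconds of music after the silence

noteLength     = musicLength / numNotes;   % seconds per note
silenceSamples = round(silence * fs);      % samples of leading silence
noteSamples    = round(noteLength * fs);   % samples per note

fprintf('%.2f s per note, %d samples per note, %d silent samples\n', ...
        noteLength, noteSamples, silenceSamples);
```

If the real note lengths turn out to differ, only `numNotes` (or `noteLength`) needs to change.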

You do not need to know music theory for this project, but a little may be helpful. If you want to know more about sharps and flats in basic music theory, please see the cnx website. Please focus on becoming somewhat familiar with the names of the 12 notes in an octave in Western music. For further reading on whole, half and quarter notes, which will help you break up your chosen piece of music into small chunks, please see the cnx writeup here.

During the course of this project, we’re going to compare our results with those obtained from Audacity 1.3.x (Beta), a popular open source, cross-platform audio editor and recorder. When you import Spirit.wav into Audacity, you’ll see the waveform corresponding to the first 3.6 seconds (plus 0.10 seconds of silence at the beginning, giving us a total of about 3.7 seconds).


Figure 1: Spirit of Radio opening in Audacity


Figure 1 also shows the cursor placed at the 0.25 second mark, and this is going to be significant: under the assumptions above, it marks the end of the first note (0.10 seconds of silence plus one 0.15 second note). When we select the first 0.25 seconds of “Spirit of Radio” (which includes the 0.10 seconds of silence at the beginning) and plot the spectrum using Audacity’s built-in tools, we get


Figure 2: Spectrum corresponding to the first 0.25 seconds using Audacity


Note the cursor placed at 2962 Hz, with Audacity reporting a local maximum of the spectrum at 2986 Hz, which corresponds to the note F#7. This will be quite important in your project.

In the simple case considered here, notes correspond to frequencies. In the Western music system that we are using, which is an equal temperament system, the correspondence between notes and frequencies is as seen here. You can see the relationship between F# and the corresponding frequency by examining the table. Of particular importance is that the main note may not in fact be the note that gets the greatest response in the spectrum. This is the case with F#7 here, which is only the third strongest note. When you analyze Audacity’s spectrum, you also see hits corresponding to F#4, F#5 and F#6 in addition to F#7, on which the cursor rests. Unfortunately, you’ll need music knowledge to figure out which note is dominant (since dominance shows up in the harmonics and not just the fundamental). If you’re a good singer, you can of course sing the notes and use a metronome with a note recognition feature (most of the good ones have one) to figure out the “actual” notes played. This is not required in our project. We’re merely seeking a comparison with Audacity.
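
The note-frequency lookup can also be done by formula rather than by table. The following is a minimal sketch, assuming the common convention that A4 is tuned to 440 Hz and notes are numbered MIDI-style (A4 = 69); neither assumption is stated in these notes, and your lookup table may use a slightly different reference pitch. It names the note nearest to the cursor and peak frequencies quoted for Figure 2.

```matlab
% Sketch: name the equal-temperament note nearest to a frequency.
% Assumes A4 = 440 Hz and MIDI-style numbering (A4 = 69, C4 = 60).
names = {'C','C#','D','D#','E','F','F#','G','G#','A','A#','B'};
freqs = [2962 2986];   % cursor and peak frequencies quoted for Figure 2

for f = freqs
    n       = round(69 + 12 * log2(f / 440));   % nearest note number
    octave  = floor(n / 12) - 1;                % octave index of that note
    name    = names{mod(n, 12) + 1};            % note name within the octave
    exactHz = 440 * 2^((n - 69) / 12);          % nominal frequency of that note
    fprintf('%6.0f Hz -> %s%d (nominal %.2f Hz)\n', f, name, octave, exactHz);
end
```

Under these assumptions both frequencies round to F#7, whose nominal frequency is about 2960 Hz. Rounding to the nearest note is what absorbs the difference between a measured peak and the table value.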

Next, let’s load the file in MATLAB® and run the FFT algorithm to produce an output similar to that of Audacity. When the wav file is loaded, we get a total of 162467 stereo samples, from which we take the first 11025 samples of one channel, corresponding to the first 0.25 seconds (including 0.10 seconds of silence at the beginning). Since this wav file is sampled 44,100 times a second, we will use an FFT of length 44,100. The FFT bins are then spaced exactly 1 Hz apart, which makes it easy to “read off” the frequency from a plot: MATLAB index k corresponds to k - 1 Hz, since index 1 is 0 Hz. Because the signal is real, the magnitude of the FFT is symmetric, so although the FFT output effectively spans -22050 to +22050 Hz, we will only use the non-negative frequencies. We will also only plot the FFT for the first 5000 Hz, since most of the meaningful information occurs in this range. A short diary of this procedure is given below.

>> y=wavread('Spirit.wav'); % Load the wav file into MATLAB as 'y' (recent MATLAB releases use audioread instead)
>> size(y)

ans =

162467 2
>> z=y(1:11025,1); % Take the first 0.25 seconds (11025 samples) of the first channel of 'y' as 'z'
>> zfft=fft(z,44100); % Run the Fast Fourier Transform (FFT) of 'z', zero-padded to an FFT length of 44,100
>> abszfft=abs(zfft); % The FFT is complex, so compute its absolute value and assign it to 'abszfft'
>> abszfft=abszfft(1:22050); % The magnitude is symmetric, so keep only the non-negative frequencies
>> plot(abszfft(1:5000)) % Plot the spectrum up to roughly 5000 Hz (index k is k-1 Hz)
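
The index-to-frequency correspondence used in the diary can be checked on a signal whose frequency is known in advance. The following is a minimal sketch using a synthetic 1000 Hz sine wave in place of the recording; the test tone and the variable names are assumptions for illustration and are not part of the project files.

```matlab
% Sketch: check the index-to-frequency mapping on a synthetic tone.
fs = 44100;                      % sampling rate in Hz
N  = 44100;                      % FFT length, so bin spacing is fs/N = 1 Hz
t  = (0:11024)' / fs;            % 11025 samples = 0.25 seconds
z  = sin(2*pi*1000*t);           % hypothetical 1000 Hz test tone

mag = abs(fft(z, N));            % magnitude of the zero-padded FFT
mag = mag(1:N/2);                % keep the non-negative frequencies
[~, k] = max(mag);               % MATLAB index of the strongest bin
peakHz = (k - 1) * fs / N;       % index 1 is 0 Hz, so subtract one

fprintf('Peak at index %d, i.e. %g Hz\n', k, peakHz);
```

The reported peak should land at or very near 1000 Hz. The same `(k - 1) * fs / N` conversion applies if you choose an FFT length different from the sampling rate.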

When we plot the absolute value of the FFT in the range from 0-5000Hz, we get


Figure 3: MATLAB® plot of the spectrum corresponding to the first 0.25 seconds.


Note the local maximum near 3000 Hz, which matches the F#7 note that we obtained via Audacity. Do you see other similarities between the two plots? You will be expected to compare the two results on both of the music files. This basic approach should be extended to the remaining 3.45 seconds of “Spirit of Radio.” Take 0.15 second chunks, analyze them in both Audacity and MATLAB®, and use the note-frequency lookup table to determine the notes corresponding to the local peaks in the spectrum. Do this for your own piece of music as well. The milestones for this project are:

Milestone 1:
“Due” April 10th, 2009. Read both “Spirit.wav” and your own ripped wav file in MATLAB® and in Audacity. If you need help picking your own wav file, please check out Daniel Seeman's website for some examples. Plot the mono part of the wav file in MATLAB® and screen capture the Audacity output.
Milestone 2:
“Due” April 17th, 2009. Break up each wav file into short segments containing one note each. For “Spirit” you may use intervals of 0.15 seconds as described above. You’ll have to figure out the intervals for your piece of music. Run the FFT algorithms on both series of intervals. Process the same intervals in Audacity and record the notes corresponding to the top few peaks.
Milestone 3:
Due at the end of the semester, with an update “due” April 21st, 2009. Analyze the Audacity and MATLAB® FFT outputs and produce a sequence of notes from each. Compare the two sets of notes and comment on the harmonics which make it difficult to assign a winner in each interval. You may present more than one solution, for example by also picking the second or third strongest candidate in each interval.
Extra Credit:
Coming soon. Turn the set of notes into a MIDI file which can be played on a MIDI sequencer such as Rosegarden. One way to do this is to learn just enough of Lilypond and use its MIDI output. We’ll be providing a Lilypond to MIDI solution for “Spirit of Radio” which you may be able to adapt.
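
The per-segment processing described in Milestone 2 can be sketched as a loop over fixed-length segments. The following is a minimal sketch, not a reference solution: a synthetic two-note signal (440 Hz followed by 660 Hz) stands in for the recording, and only the single strongest peak in each segment is recorded rather than the top few.

```matlab
% Sketch: strongest spectral peak in each 0.15 s segment.
% A synthetic two-note signal stands in for the real recording here.
fs     = 44100;                                % sampling rate in Hz
segLen = round(0.15 * fs);                     % samples per segment
t      = (0:segLen-1)' / fs;                   % time axis for one segment
y      = [sin(2*pi*440*t); sin(2*pi*660*t)];   % hypothetical test notes

numSegs = floor(length(y) / segLen);           % number of whole segments
peakHz  = zeros(numSegs, 1);                   % strongest frequency per segment
for s = 1:numSegs
    seg       = y((s-1)*segLen + 1 : s*segLen);   % samples for segment s
    mag       = abs(fft(seg, fs));                % FFT length fs gives 1 Hz bins
    mag       = mag(1:5000);                      % look at 0 to 4999 Hz only
    [~, k]    = max(mag);                         % index of the strongest bin
    peakHz(s) = k - 1;                            % index 1 is 0 Hz
end
disp(peakHz')
```

For the real recording, `y` would instead be one channel of the loaded wav file with the leading silence skipped, and the segment length for your own piece would need to be chosen to suit it. Recording the top few peaks per segment, as the milestone asks, requires a local-maximum search in place of the single `max` call.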