The opening of “Spirit of Radio” has been ripped as a wav file and is available here. As previously mentioned, it runs about 3.6 seconds (not counting the first 0.10 seconds, which are silent). For the rest of this wiki, we’re going to assume that the guitarist, Alex Lifeson, is playing 24 notes in total, which gives us approximately 3.6/24 = 0.15 seconds per note. This may not in fact be correct, but we’ll run with it for now.
While you do not really need to know music theory for this project, a little may be helpful. If you want to know more about sharps and flats in basic music theory, please see the cnx website. Focus on getting somewhat familiar with the names of the 12 notes in an octave in Western music. For further reading on whole, half and quarter notes, which will help you break up your chosen piece of music into small chunks, please see the cnx writeup here.
During the course of this project, we’re going to compare our results with those obtained from Audacity 1.3.x (Beta), a popular open source, cross-platform audio editor and recorder. When you import Spirit.wav into Audacity, you’ll see the waveform corresponding to the first 3.6 seconds (plus 0.10 seconds of silence at the beginning, giving us a total of about 3.7 seconds).
Figure 1 also shows the cursor placed at the 0.25 second mark, which is going to be significant. When we select the first 0.25 seconds of “Spirit of Radio” (which includes the 0.10 seconds of silence at the beginning) and plot the spectrum using Audacity’s built-in tools, we get
Note the cursor placed at 2962 Hz, with Audacity indicating a local maximum of the spectrum at 2986 Hz, corresponding to the note F#7. This will be quite important in your project.
In the simple case considered here, notes correspond to frequencies. In the Western music system that we are using, which is an equal temperament system, the correspondence between notes and frequencies is as seen here. You can see the relationship between F# and the corresponding frequency by examining the table. Of particular importance is that the main note may not in fact be the note that gets the greatest response in the spectrum. This is the case with F#7 here, which is in fact only the third-strongest peak. However, when you analyze Audacity’s spectrum, you also see hits corresponding to F#4, F#5 and F#6, in addition to F#7 on which the cursor rests. Unfortunately, you’ll need music knowledge to figure out which note is dominant (since dominance shows up in the harmonics and not just the fundamental). If you’re a good singer, you can of course sing the notes and use a metronome with a note recognition feature (most of the good ones have one) to figure out the “actual” notes played. This is not required in our project; we’re merely seeking a comparison with Audacity.
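The lookup table can also be reproduced numerically. The sketch below assumes the usual equal temperament convention in which A4 is tuned to 440 Hz and notes are numbered as in MIDI (A4 = 69); neither assumption comes from the table itself, and the variable names are ours. It finds the note nearest to the 2986 Hz peak reported by Audacity and lists the frequencies of F#4 through F#7.

```matlab
% Assumptions (ours, not from the lookup table): A4 = 440 Hz, MIDI numbering.
names = {'C','C#','D','D#','E','F','F#','G','G#','A','A#','B'};
a4    = 440;                                   % assumed tuning reference in Hz
fPeak = 2986;                                  % local maximum reported by Audacity

% Nearest equal-temperament note: 12 semitones per octave, A4 is MIDI note 69.
midi     = round(69 + 12*log2(fPeak/a4));
noteName = sprintf('%s%d', names{mod(midi,12)+1}, floor(midi/12)-1);
fNote    = a4 * 2^((midi-69)/12);              % exact frequency of that note
fprintf('%g Hz is closest to %s (%.2f Hz)\n', fPeak, noteName, fNote);

% F#4, F#5, F#6 and F#7 are MIDI notes 66, 78, 90 and 102 (one octave apart).
fSharp = a4 * 2.^(((66:12:102) - 69)/12);
fprintf('F#%d = %.2f Hz\n', [4:7; fSharp]);
```

Under these assumptions the nearest note to 2986 Hz is F#7 at about 2959.96 Hz, and each F# is exactly a factor of 2 above the previous one, which is why all four show up together in the spectrum.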
Next, let’s load the file in MATLAB and run the FFT algorithm to produce an output similar to that of Audacity. When the wav file is loaded, we get a total of 162467 stereo samples, from which we cull 11025 mono samples corresponding to the first 0.25 seconds (including the 0.10 seconds of silence at the beginning). Since this wav file is sampled 44,100 times a second, we will use an FFT of length 44,100; the 11025 samples are zero-padded to that length. The frequency spacing is then 44100/44100 = 1 Hz per FFT sample, which gives us an exact correspondence between the frequency and the sample index and makes it easy to “read off” the frequency from a plot. Since the input is real, the magnitude of the FFT is symmetric about 0 Hz: the FFT output actually spans -22050 to +22050 Hz, and we will only use the positive frequencies. We will also only plot the FFT up to 5000 Hz, since most of the meaningful information occurs in this range. A short diary of this procedure is given below.
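The original diary is not reproduced here, so what follows is only a minimal sketch of the steps just described, not the author's session. It assumes Spirit.wav is in the current folder and reads it with audioread; when the file is missing, it substitutes a made-up test signal (0.10 seconds of silence followed by a 2960 Hz tone) so that the script still runs. All variable names are ours.

```matlab
fs = 44100;                                 % sampling rate assumed for Spirit.wav
if exist('Spirit.wav', 'file')
    [y, fs] = audioread('Spirit.wav');      % y is N-by-2 for a stereo file
else
    % Stand-in signal (NOT the real recording): silence, then a 2960 Hz tone.
    t = (0:round(0.15*fs)-1)' / fs;
    y = [zeros(round(0.10*fs), 1); sin(2*pi*2960*t)];
end

x    = y(1:round(0.25*fs), 1);              % first 0.25 s of channel 1 (mono)
nfft = fs;                                  % FFT length = sampling rate -> 1 Hz bins
X    = abs(fft(x, nfft));                   % fft zero-pads x up to nfft samples

f = 0:5000;                                 % positive frequencies of interest, in Hz
plot(f, X(f+1));                            % MATLAB indexes from 1: k Hz is X(k+1)
xlabel('Frequency (Hz)'); ylabel('|FFT|');

[~, k] = max(X(f+1));                       % strongest response below 5000 Hz
peakHz = f(k);
fprintf('Strongest peak at %d Hz\n', peakHz);
```

Because MATLAB indexes from 1, the magnitude at k Hz sits in X(k+1), which is why the plot uses X(f+1). The strongest peak need not be the note you are after, so inspect the other local maxima in the plot as well.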
When we plot the absolute value of the FFT in the range from 0 to 5000 Hz, we get
Note the local maximum around 3000 Hz, which is equivalent to the F#7 note that we obtained via Audacity. Do you see other similarities between the two plots? You will be expected to compare the two results on both of the music files. This basic approach should be extended to the remaining 3.45 seconds of “Spirit of Radio.” Take 0.15 second chunks, analyze them in both Audacity and MATLAB, and use the note-frequency lookup table to determine the notes corresponding to the local peaks in the spectrum. Do this for your own piece of music as well. The milestones for this project are:
and in Audacity. If you need help picking your own wav file, please check out Daniel Seeman's website for some examples. Plot the mono part of the wav file in MATLAB and screen capture the Audacity output.
FFT outputs and produce a sequence of notes from both. Compare the two sets of notes and comment on the harmonics which make it difficult to assign a winner in each interval. You may present not just one solution but multiple ones by picking the second, third winner in each interval etc.
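The chunk-by-chunk analysis behind these milestones could be organized along the following lines. This is a sketch under our own assumptions, not a required solution: it assumes A4 = 440 Hz, skips the first 0.10 seconds of silence, uses 0.15 second chunks, and ranks notes by the largest FFT magnitude among the 1 Hz bins nearest to each note (a design choice of ours so that neighbouring bins of one peak count as a single note). When Spirit.wav is not found, it falls back to a made-up two-note test signal.

```matlab
fs    = 44100;                               % sampling rate assumed throughout
names = {'C','C#','D','D#','E','F','F#','G','G#','A','A#','B'};
if exist('Spirit.wav', 'file')
    [y, fs] = audioread('Spirit.wav');
    x = y(:, 1);                             % channel 1 as the mono signal
else
    % Stand-in (NOT the real recording): silence, then a 2960 Hz and a 440 Hz note.
    t = (0:round(0.15*fs)-1)' / fs;
    x = [zeros(round(0.10*fs), 1); sin(2*pi*2960*t); sin(2*pi*440*t)];
end

lead    = round(0.10*fs);                    % samples of leading silence
hop     = round(0.15*fs);                    % samples per assumed note
nChunks = floor((numel(x) - lead) / hop);    % complete chunks only

f    = (60:5000)';                           % analysis band in Hz (skips DC)
midi = round(69 + 12*log2(f/440));           % nearest note for every 1 Hz bin
top  = cell(nChunks, 3);                     % first, second, third winner per chunk
for c = 1:nChunks
    seg  = x(lead + (c-1)*hop + (1:hop));
    X    = abs(fft(seg, fs));                % zero-padded to fs samples: 1 Hz bins
    best = accumarray(midi - min(midi) + 1, X(f+1), [], @max);
    [~, order] = sort(best, 'descend');      % strongest note first
    for r = 1:3
        m = order(r) + min(midi) - 1;        % back to a MIDI note number
        top{c, r} = sprintf('%s%d', names{mod(m,12)+1}, floor(m/12)-1);
    end
end
disp(top)                                    % one row per 0.15 s chunk
```

Each row lists three candidate notes for one chunk, which you can set beside the peaks Audacity reports for the same interval. For a real recording the lower-ranked entries are often harmonics of the same note, or leakage into a neighbouring note, so they still need to be interpreted by hand.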