[vox-tech] Marking Audio file based on Freq.

Bill Broadley bill at cse.ucdavis.edu
Wed Mar 12 23:30:07 PDT 2008


 >I think I'm on the right track now. What I figured out is that I can use
 >whatever application I want to find the start times for the region of
 >interest and easily write those as a label track for Audacity (see the .aup
 >file; it's XML-ish).

Cool.  Not sure what Audacity would add at that point; I'm pretty sure it
would just take a line or two to start playing audio from the .wav file at a
detection point, or even from a cursor.
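For what it's worth, here's a minimal sketch of the seeking part using Python's standard wave module.  It assumes an uncompressed PCM .wav file; extract_clip is a made-up name for illustration.  Rather than play the audio directly (that needs a sound library or an external player), it copies a few seconds starting at a detection time into a new .wav file that any player, or Audacity, can open.

```python
import wave

def extract_clip(src_path, dst_path, start_s, duration_s):
    """Copy duration_s seconds of audio, starting at start_s, into a new .wav file.

    Uncompressed PCM lets us jump straight to the frame we want instead of
    decoding everything before it.  Returns the number of frames copied.
    """
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        # Clamp the seek so a detection near the end doesn't raise an error.
        first = min(int(start_s * rate), src.getnframes())
        src.setpos(first)
        frames = src.readframes(int(duration_s * rate))
        params = src.getparams()
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # same channels, sample width and rate
        dst.writeframes(frames)  # header frame count is corrected on write
    return len(frames) // (params.sampwidth * params.nchannels)
```

Usage would be something like extract_clip("owl.wav", "hit_0001.wav", 12.5, 3.0), where the file names and times are hypothetical.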

 >So Python is at the top of the list now. I also poked around and took a
 >look at the new Vamp plugin system, which is aimed at analysis and
 >could be integrated into Audacity or Sonic Visualiser
 >(http://www.sonicvisualiser.org/). The other application I've tinkered with
 >is Praat (http://www.fon.hum.uva.nl/praat/), which is for speech analysis,
 >but that might be more hassle than it's worth.

Interesting, I hadn't known about those.

> This linked file has:
> A sound sample

Was the sample recorded in stereo?  It makes it harder to work with if it's
fake stereo (the same mono signal copied into both channels).  I was hoping
for a mono .wav file or similar so I could suck it into a program easily.
Uncompressed data also makes it much easier to seek around to the
interesting parts.
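A rough sketch of what I mean, assuming 16-bit PCM and NumPy.  load_mono is a hypothetical helper: it reads the file, checks whether every channel is sample-for-sample identical to the first (fake stereo), and either keeps one channel or averages them down to mono.

```python
import wave
import numpy as np

def load_mono(path):
    """Read a 16-bit PCM .wav file and return (samples, rate, fake_stereo)."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = w.getframerate()
        channels = w.getnchannels()
        raw = w.readframes(w.getnframes())
    # Interleaved little-endian int16: one row per frame, one column per channel.
    data = np.frombuffer(raw, dtype="<i2").reshape(-1, channels)
    if channels == 1:
        return data[:, 0].astype(np.float64), rate, False
    # Fake stereo: every channel is an exact copy of the first one.
    fake = all(np.array_equal(data[:, 0], data[:, c]) for c in range(1, channels))
    if fake:
        return data[:, 0].astype(np.float64), rate, True
    # Real multichannel audio: average the channels into one mono signal.
    return data.astype(np.float64).mean(axis=1), rate, False
```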

> An audacity project with label track
> A spectrogram screenshot 

Normally I'd display frequency on a log-based graph.  I doubt you are
really seeing over 50 dB at 0 Hz.  Hell, I don't think MP3s even encode
anything below 20 Hz or so.  I wouldn't usually use a lossy compression
designed to fool the psychoacoustics of the human ear for research data,
but in this case I don't think it would make much difference.  Not to
mention there may well be ultrasonics involved, though of course they
aren't necessary for simple recognition.

Here's a log based graph of the same data:
  http://cse.ucdavis.edu/~bill/out.png
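If you want to make that kind of graph yourself, here's a sketch assuming the audio is already a mono NumPy array (for example from a helper like the load_mono idea above).  spectrum_db and plot_log_spectrum are hypothetical names, and the 4096-point FFT is an arbitrary choice.  It averages Hann-windowed power spectra and drops the 0 Hz bin, since a log axis can't show it anyway.  The dB values are relative, not calibrated sound pressure levels.

```python
import numpy as np

def spectrum_db(x, rate, nfft=4096):
    """Average power spectrum of a mono signal in dB, without the 0 Hz bin."""
    x = np.asarray(x, dtype=np.float64)
    window = np.hanning(nfft)
    hop = nfft // 2  # 50% overlap between frames
    power = np.zeros(nfft // 2 + 1)
    count = 0
    for s in range(0, len(x) - nfft + 1, hop):
        power += np.abs(np.fft.rfft(x[s:s + nfft] * window)) ** 2
        count += 1
    if count == 0:
        raise ValueError("signal shorter than one FFT frame")
    power /= count
    freqs = np.fft.rfftfreq(nfft, d=1.0 / rate)
    db = 10 * np.log10(power + 1e-12)  # small floor avoids log10(0)
    return freqs[1:], db[1:]

def plot_log_spectrum(freqs, db, out_png="spectrum.png"):
    """Plot the spectrum with a logarithmic frequency axis and save it as a PNG."""
    # Imported here so spectrum_db works even without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.semilogx(freqs, db)
    ax.set_xlabel("Frequency (Hz, log scale)")
    ax.set_ylabel("Power (dB, relative)")
    fig.savefig(out_png)
    plt.close(fig)
```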

> A spectrogram text dump (looks like 10-60Hz is the region of interest)
> http://ftp.dfg.ca.gov/Public/RAP/Projects/GGOW/GGOWspectrum.zip

Looks like 42 Hz to 900 Hz on the log-based graph.  Hard to say if the
20 Hz content is real, background noise, microphone limitations, or the
result of an MP3 encoder.
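Going back to the original goal of finding start times: one crude approach, swapped in here as an illustration rather than anything settled in this thread, is to track the energy in the 42 to 900 Hz band frame by frame and flag frames that rise well above the file's median band level.  The band edges come from my reading of the graph above; threshold_db=10, the FFT sizes and the 1 second label length are guesses to tune, and both function names are hypothetical.  Onsets are only resolved to roughly one FFT frame and tend to come out slightly early, which is fine for playback.  As far as I know, Audacity can also import a plain-text label file of tab-separated start, end and label text (File > Import > Labels), which avoids editing the .aup XML directly.

```python
import numpy as np

def band_onsets(x, rate, lo_hz=42.0, hi_hz=900.0, nfft=2048, hop=512,
                threshold_db=10.0):
    """Return start times (seconds) where band energy rises threshold_db
    above the median band level of the whole signal."""
    x = np.asarray(x, dtype=np.float64)
    window = np.hanning(nfft)
    freqs = np.fft.rfftfreq(nfft, d=1.0 / rate)
    in_band = (freqs >= lo_hz) & (freqs <= hi_hz)
    levels = []
    for s in range(0, len(x) - nfft + 1, hop):
        spec = np.abs(np.fft.rfft(x[s:s + nfft] * window)) ** 2
        levels.append(10 * np.log10(spec[in_band].sum() + 1e-12))
    if not levels:
        return []
    levels = np.array(levels)
    active = levels > np.median(levels) + threshold_db
    onsets = []
    for i, on in enumerate(active):
        # Record only the first frame of each run of active frames.
        if on and (i == 0 or not active[i - 1]):
            onsets.append(i * hop / rate)
    return onsets

def write_audacity_labels(path, times, duration_s=1.0, text="call"):
    """Write one tab-separated 'start end label' line per detection."""
    with open(path, "w") as f:
        for t in times:
            f.write(f"{t:.6f}\t{t + duration_s:.6f}\t{text}\n")
```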

> Thanks for the help, I guarantee at least one lugod talk later this year (after I finish my thesis) in exchange.

Cool, I'd like to see more science-based talks.  I'm pretty sure with a .wav
file I could put together a pretty, colorful side-scrolling spectrogram like
http://www.onlamp.com/python/2001/01/31/graphics/num_py_2.gif with a few lines
of code.
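Something along these lines, assuming NumPy and a mono signal already loaded as an array.  To avoid depending on a plotting library it writes a static binary PPM image (a trivial format most image viewers open), with time running left to right and a crude blue-to-red color ramp; a true side-scrolling display would redraw the columns as new frames arrive.  spectrogram_db, write_ppm, the FFT size and the 60 dB display range are all my own assumptions.

```python
import numpy as np

def spectrogram_db(x, nfft=1024, hop=256):
    """Short-time power spectrum in dB: rows are frequency bins, columns are frames."""
    x = np.asarray(x, dtype=np.float64)
    if len(x) < nfft:
        raise ValueError("signal shorter than one FFT frame")
    window = np.hanning(nfft)
    cols = [np.abs(np.fft.rfft(x[s:s + nfft] * window)) ** 2
            for s in range(0, len(x) - nfft + 1, hop)]
    return 10 * np.log10(np.array(cols).T + 1e-12)

def write_ppm(path, spec_db, floor_db=60.0):
    """Save the spectrogram as a binary PPM, 0 Hz at the bottom, quiet blue, loud red."""
    top = spec_db.max()
    # Map the loudest floor_db decibels onto 0..1; anything quieter clips to 0.
    level = np.clip((spec_db - (top - floor_db)) / floor_db, 0.0, 1.0)
    level = level[::-1]  # flip rows so low frequencies end up at the bottom
    rgb = np.stack([level, 0.5 * (1 - np.abs(2 * level - 1)), 1 - level], axis=-1)
    pixels = (rgb * 255).astype(np.uint8)
    height, width, _ = pixels.shape
    with open(path, "wb") as f:
        f.write(b"P6\n%d %d\n255\n" % (width, height))
        f.write(pixels.tobytes())
```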
