[vox-tech] Marking Audio file based on Freq.
Alex Mandel
tech_dev at wildintellect.com
Thu Mar 13 11:02:39 PDT 2008
Bill Broadley wrote:
>> I think I'm on the right track now. What I figured out is that I can use
>> whatever application I want to find the start times for the region of
>> interest and easily write those as a label track for Audacity (see the .aup
>> file; it's XML-ish).
>
> Cool. Not sure what good Audacity would be doing at that point; I'm pretty
> sure it would just be a line or two to start playing audio from the .wav
> file at a detection point or even from a cursor.
>
>> So Python is at the top of the list now. I also poked around and took a
>> look at the new VAMP plugin system, which is directed towards analysis and
>> could be integrated into Audacity or Sonic Visualiser (
>> http://www.sonicvisualiser.org/ ). The other application I've tinkered with
>> is Praat ( http://www.fon.hum.uva.nl/praat/ ), which is for speech recognition,
>> but that might be more hassle than it's worth.
>
> Interesting, I hadn't known about those.
>
>> This linked file has:
>> A sound sample
>
> Was the sample recorded in stereo? It's harder to work with if it's
> fake stereo. I was hoping for a mono .wav file or similar so I could suck it
> into a program easily. Uncompressed data also makes it much easier to seek
> around to the interesting parts.
>
>> An audacity project with label track
>> A spectrogram screenshot
>
> Normally I'd display frequency on a log based graph. I doubt you are
> really seeing over 50 dB at 0 Hz. Hell, I don't think MP3s even encode at less
> than 20 Hz or so. I wouldn't usually use a lossy compression designed to
> exploit the psychoacoustic characteristics of the human ear for research
> related data, but in this case I don't think it would make much difference.
> Not to mention there may well be ultrasonics involved, but those aren't
> necessary, of course, for simple recognition.
>
> Here's a log based graph of the same data:
> http://cse.ucdavis.edu/~bill/out.png
>
>> A spectrogram text dump (looks like 10-60Hz is the region of interest)
>> http://ftp.dfg.ca.gov/Public/RAP/Projects/GGOW/GGOWspectrum.zip
>
> Looks like 42 Hz to 900 Hz on the log based graph. Hard to say if 20 Hz
> is real, background noise, microphone limitations, or the result of an
> MP3 encoder.
>
>> Thanks for the help, I guarantee at least one lugod talk later this year (after I finish my thesis) in exchange.
>
> Cool, I'd like to see more science based talks. I'm pretty sure with a .wav
> file I could put together a pretty side scrolling colorful spectrogram like
> http://www.onlamp.com/python/2001/01/31/graphics/num_py_2.gif with a few lines
> of code.
Unfortunately I don't have any uncompressed samples of that species yet,
nor do I know the real origin of that file. The devices I built are going
to deliver FLAC files specifically because of that issue (Cowon A3).
After looking at the text dump some more and at a spectrogram, I actually
found the base frequency of the call is more in the 150-250 Hz range,
but I need to play with it this weekend, run some real descriptive
stats, and try those calculations.
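To make the 150-250 Hz check concrete, here is a minimal sketch of how the strongest frequency in that band could be picked out of a block of mono samples. It is not what I have actually run: the function names, the 1 Hz scan step, and the use of the Goertzel algorithm are all assumptions for illustration, and it sticks to the Python standard library so nothing extra has to be installed.

```python
import math

def band_power(samples, sample_rate, freq_hz):
    """Signal power at freq_hz, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s_prev = 0.0
    s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2 = s_prev
        s_prev = s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def peak_frequency(samples, sample_rate, lo_hz=150.0, hi_hz=250.0, step_hz=1.0):
    """Scan lo_hz..hi_hz (band taken from the text dump) for the strongest tone."""
    best_freq, best_power = lo_hz, -1.0
    n_steps = int(round((hi_hz - lo_hz) / step_hz))
    for i in range(n_steps + 1):
        freq = lo_hz + i * step_hz
        power = band_power(samples, sample_rate, freq)
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq

# Synthetic check: one second of a pure 200 Hz tone sampled at 8000 Hz
# (both numbers are made up for the example, not taken from the recording).
tone = [math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(8000)]
print(peak_frequency(tone, 8000))
```

On real recordings the samples would come from the decoded file, one short window at a time, and the per-window peak could then feed the descriptive stats.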
The key to the playback part is that I'm not the one who will be
listening to them. So it's important that I flag the spots of interest
and then pass the info on to less technical people, who will listen to
each spot and look at the graphs. So giving them an interface that I
don't have to write is relatively important. You'll notice both Audacity
and Sonic Visualiser have nice color spectrogram options.
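As a rough sketch of the flagging step, the code below turns per-window detections into time regions and formats them as a label list. Note the swap: instead of editing the .aup XML directly, it assumes the plain tab-separated label text format (start, end, label per line) that, as far as I know, Audacity can import as a label track. The function names, the window length, and the "call" label are all made up for the example.

```python
def merge_flags(flags, window_s):
    """Turn per-window True/False detections into (start, end) times in seconds."""
    regions = []
    start = None
    for i, hit in enumerate(flags):
        if hit and start is None:
            start = i * window_s            # a flagged region begins here
        elif not hit and start is not None:
            regions.append((start, i * window_s))
            start = None
    if start is not None:                   # region runs to the end of the file
        regions.append((start, len(flags) * window_s))
    return regions

def audacity_labels(regions, name="call"):
    """Format regions as tab-separated 'start<TAB>end<TAB>label' lines."""
    lines = []
    for n, (start, end) in enumerate(regions, 1):
        lines.append("%.6f\t%.6f\t%s %d" % (start, end, name, n))
    return "\n".join(lines) + "\n"

# Hypothetical detections for five half-second windows.
flags = [False, True, True, False, True]
print(audacity_labels(merge_flags(flags, 0.5)), end="")
```

Writing that text to a file gives the listeners one label per spot to click through, without me having to build any playback interface.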
Alex