Just like images, we can extract features that can be used to get a higher-level understanding of the audio. There are some features that have become de-facto in audio processing, and one of these is the Mel-Frequency Cepstrum Coefficients (MFCCs). They give an overview of the shape (or envelope) of the frequency components of the audio based on some perceptual scaling of the frequency domain.
OpenIMAJ provides an MFCC class based around the jAudio
implementation. Unsurprisingly, it's called MFCC
! We can use it in exactly the
same way as the FFT processor, so if you take the code from FFT example in the previous chapter
you can change the FFT processor to be the MFCC processor.
MFCC mfcc = new MFCC( xa ); ... while( (sc = mfcc.nextSampleChunk()) != null ) { double[][] mfccs = mfcc.getLastGeneratedFeature(); vis.setData( mfccs[0] ); }
MFCCs were specifically developed for speech recognition tasks and so are much
more suitable for describing speech signals than a sine wave sweep. So, let's
switch to using the JavaSoundAudioGrabber
so we can speak into the
computer. Secondly, we'll fix the analysis window that we're using. The literature
shows that 30ms windows with 10ms overlaps are often used in speech processing.
At 44.1KHz, 10ms is 441 samples, so we'll use the FixedSizeSampleAudioProcessor
to deal with giving us the appropriate size sample chunks.
JavaSoundAudioGrabber jsag = new JavaSoundAudioGrabber( new AudioFormat( 16, 44.1, 1 ) ); FixedSizeSampleAudioProcessor fssap = new FixedSizeSampleAudioProcessor( jsag, 441*3, 441 );
You should see that when you speak into the computer, the MFCCs show a noticeable change compared to the sine wave sweep.
Tip | |
---|---|
You might want to try fixing the axis of the visualisation bar graph using
the method |