

Capture Speech with DirectSound Problem


10 replies to this topic

#1 CrazyPlaya

    New Member

  • Members
  • Pip
  • 2 posts

Posted 10 December 2008 - 09:53 AM

Hi @ all,

I'm a newbie at audio programming.
I'm trying to capture speech from the microphone and stream fragments of the capture buffer over the network.
I send each fragment when its notification event fires.
My capture buffer is 2560 bytes and is separated into 4 segments.
Every segment holds 20 ms of speech. The sample rate is 16 kHz with 16 bits per sample.
Here is the code of my notification thread:

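    // Find out how far the safe-to-read cursor is ahead of our next offset.
    DWORD capturePos = 0, readPos = 0;
    if (FAILED(m_captureBuffer->GetCurrentPosition(&capturePos, &readPos))) return;
    const DWORD diff = (readPos >= m_NextOffset)
        ? readPos - m_NextOffset
        : readPos + m_BufferSize - m_NextOffset;   // cursor has wrapped around

    // Not a full segment available yet.
    if (diff < m_NotifySize) return;

    // Lock one segment of the capture buffer. The second pointer/size pair is
    // left NULL, which relies on the locked region never wrapping (i.e. on
    // m_BufferSize being a multiple of m_NotifySize).
    VOID* lockedBufferPointer = NULL;
    DWORD lockedBufferSize = 0;
    if (FAILED(m_captureBuffer->Lock(m_NextOffset, m_NotifySize,
                                     &lockedBufferPointer, &lockedBufferSize,
                                     NULL, NULL, 0L))) return;

    // Number of 16-bit samples in the locked region.
    const int sampleCount = lockedBufferSize / sizeof(short);

    // Write to the wave file.
    write((BYTE*)lockedBufferPointer, lockedBufferSize);
    // Hand the samples to the output sink.
    pSink->on_NewAudioDataCaptured((short*)lockedBufferPointer, sampleCount);

    // Unlock the capture buffer.
    m_captureBuffer->Unlock(lockedBufferPointer, lockedBufferSize, NULL, 0);

    // Move the capture offset along, wrapping at the end of the buffer.
    m_NextOffset += lockedBufferSize;
    m_NextOffset %= m_BufferSize;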
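
For readers following along: the wrap-around arithmetic in the code above can be tried out without DirectSound. This is only a minimal sketch. The helper names `bytesAvailable` and `advanceOffset` are made up for illustration and are not part of the poster's class or of the DirectSound API.

```cpp
#include <cassert>
#include <cstdint>

// Bytes that can be read between `nextOffset` (where we read next) and
// `readPos` (the safe-to-read cursor) in a circular buffer of `bufferSize` bytes.
std::uint32_t bytesAvailable(std::uint32_t readPos,
                             std::uint32_t nextOffset,
                             std::uint32_t bufferSize)
{
    assert(bufferSize > 0 && readPos < bufferSize && nextOffset < bufferSize);
    return (readPos >= nextOffset)
        ? readPos - nextOffset                 // cursor is ahead of us
        : readPos + bufferSize - nextOffset;   // cursor has wrapped around
}

// Advance the read offset by `consumed` bytes, wrapping at the end of the buffer.
std::uint32_t advanceOffset(std::uint32_t nextOffset,
                            std::uint32_t consumed,
                            std::uint32_t bufferSize)
{
    assert(bufferSize > 0);
    return (nextOffset + consumed) % bufferSize;
}
```

With the figures from the post (a 2560-byte buffer in 640-byte segments), `bytesAvailable(0, 1920, 2560)` is 640 and `advanceOffset(1920, 640, 2560)` wraps back to 0.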

My problem is that the speech I'm sending is very noisy after capturing.
What could I be doing wrong when capturing?
I don't know, and it's hard to find good information about capturing with DirectSound.

Greetings
Karsten

#2 CrazyPlaya

    New Member

  • Members
  • Pip
  • 2 posts

Posted 10 December 2008 - 03:19 PM

OK, I've got it. The fragment was too small at 20 ms. Now I use 60 ms and it sounds good.
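
As a quick check of the sizes discussed here, the number of bytes per fragment follows from the sample rate, sample width, channel count, and fragment duration. A small sketch, where `pcmBytes` is a hypothetical helper and mono capture is inferred from the 2560-byte figure in the first post rather than stated by the poster:

```cpp
#include <cassert>
#include <cstdint>

// Bytes of uncompressed PCM audio produced in `millis` milliseconds.
std::uint32_t pcmBytes(std::uint32_t sampleRateHz,
                       std::uint32_t bitsPerSample,
                       std::uint32_t channels,
                       std::uint32_t millis)
{
    assert(bitsPerSample % 8 == 0 && channels > 0);
    const std::uint64_t bytesPerSecond =
        std::uint64_t(sampleRateHz) * (bitsPerSample / 8) * channels;
    return static_cast<std::uint32_t>(bytesPerSecond * millis / 1000);
}
```

At 16 kHz, 16 bits, one channel, a 20 ms fragment is `pcmBytes(16000, 16, 1, 20)` = 640 bytes, so four fragments give the original 2560-byte buffer. A 60 ms fragment is 1920 bytes, so keeping four segments would need a 7680-byte buffer.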

#3 rouncer

    Senior Member

  • Members
  • PipPipPipPip
  • 2722 posts

Posted 10 December 2008 - 04:05 PM

ive made music programs before, split it up into sinewaves with the furior transform :)

don't worry, loading a wav yourself is heaps easier than using DirectX products.

[EDIT] just learn off the samples, it's what i do [/EDIT]

but theres no dolby in direct sound.

#4 alphadog

    DevMaster Staff

  • Moderators
  • 1716 posts

Posted 10 December 2008 - 05:54 PM

I've sometimes used furious transforms. It never turns out well... :)

#5 imerso

    Senior Member

  • Members
  • PipPipPipPip
  • 431 posts
  • LocationBrasil

Posted 10 December 2008 - 06:15 PM

it is very easy to get 'clicky' sounds with improper sampling and/or mixing.

in this specific case, i think the playback quality is being affected by network latency. good to see you've got a solution already.

#6 rouncer

    Senior Member

  • Members
  • PipPipPipPip
  • 2722 posts

Posted 11 December 2008 - 11:08 AM

ive implemented it, with no phase smearing, or very little.
very good results. :)

i also coded the furious transform myself, even tho i cant even
say the name properly. hehe

im actually implementing a new wave file format that stores the oscillator
positions of every harmonic over time - so it will help with latency for
pre recorded sounds (samples), just not live inputs - they are the problem.

#7 rouncer

    Senior Member

  • Members
  • PipPipPipPip
  • 2722 posts

Posted 11 December 2008 - 11:24 AM

i dont recommend using the fft, the dft is simpler and does a better job.

#8 Reedbeta

    DevMaster Staff

  • Administrators
  • 5307 posts
  • LocationSanta Clara, CA

Posted 11 December 2008 - 05:44 PM

Rouncer, the FFT is just a fast algorithm to compute the DFT. It gives the same results.

The FFT is more complex to implement, but you can also just use libfftw, which is very easy.
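
To make the point concrete, here is a minimal sketch that evaluates the DFT definition directly and also with a textbook recursive radix-2 FFT, so the two spectra can be compared. It is an illustration only, not how libfftw works internally, and the function names are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;
const double kPi = 3.14159265358979323846;

// Direct O(N^2) evaluation of the DFT definition.
std::vector<cd> naiveDft(const std::vector<cd>& x)
{
    const std::size_t n = x.size();
    std::vector<cd> out(n);  // value-initialised to zero
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t t = 0; t < n; ++t)
            out[k] += x[t] * std::polar(1.0, -2.0 * kPi * double(k * t) / double(n));
    return out;
}

// Recursive radix-2 Cooley-Tukey FFT; the length must be a power of two.
std::vector<cd> fft(const std::vector<cd>& x)
{
    const std::size_t n = x.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    if (n == 1) return x;
    std::vector<cd> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = x[2 * i];
        odd[i] = x[2 * i + 1];
    }
    const std::vector<cd> e = fft(even), o = fft(odd);
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const cd t = std::polar(1.0, -2.0 * kPi * double(k) / double(n)) * o[k];
        out[k] = e[k] + t;          // butterfly: lower half
        out[k + n / 2] = e[k] - t;  // butterfly: upper half
    }
    return out;
}

// Largest absolute difference between two spectra of equal length.
double maxDifference(const std::vector<cd>& a, const std::vector<cd>& b)
{
    assert(a.size() == b.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        worst = std::max(worst, std::abs(a[i] - b[i]));
    return worst;
}
```

Calling `maxDifference(naiveDft(x), fft(x))` on any power-of-two-length input should give a value at the level of floating-point rounding, which is the sense in which the two give the same results.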
reedbeta.com - developer blog, OpenGL demos, and other projects

#9 rouncer

    Senior Member

  • Members
  • PipPipPipPip
  • 2722 posts

Posted 12 December 2008 - 12:52 PM

maybe it does change the sound...

#10 Nils Pipenbrinck

    Senior Member

  • Members
  • PipPipPipPip
  • 597 posts

Posted 12 December 2008 - 01:57 PM

rouncer said:

maybe it does change the sound...

No, it does not. As Reedbeta said, it's the same thing, just a faster algorithm. Think bubblesort and quicksort. Both algorithms compute the same result eventually, but quicksort is faster (most of the time at least).

FFTs are a little tricky to use if you don't fully understand which one to pick. An FFT that is speed-optimized to give only 8-bit results but takes 16-bit inputs will sound bad due to round-off errors and truncation, for example. These are fine for graphics but suck for audio.
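
A rough way to see why 8-bit results hurt audio: the sketch below truncates a near-full-scale sine to a given number of bits and measures the resulting signal-to-noise ratio. It only models truncation of the samples themselves, not any particular fixed-point FFT, and `truncationSnrDb` is a hypothetical helper written for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Signal-to-noise ratio in dB after truncating a sine wave to `bits` of precision.
double truncationSnrDb(int bits, std::size_t samples = 4096)
{
    assert(bits >= 2 && bits <= 24 && samples > 0);
    const double pi = 3.14159265358979323846;
    const double scale = std::ldexp(1.0, bits - 1);  // 2^(bits-1) steps per unit amplitude
    double signalPower = 0.0, noisePower = 0.0;
    for (std::size_t i = 0; i < samples; ++i) {
        const double x = 0.99 * std::sin(2.0 * pi * 37.0 * double(i) / double(samples));
        const double q = std::floor(x * scale) / scale;  // truncate toward -infinity
        signalPower += x * x;
        noisePower += (x - q) * (x - q);
    }
    return 10.0 * std::log10(signalPower / noisePower);
}
```

Comparing `truncationSnrDb(8)` with `truncationSnrDb(16)` shows the usual rule of thumb of roughly 6 dB per bit, so dropping from 16 to 8 bits costs on the order of 48 dB of signal-to-noise ratio.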
My music: http://myspace.com/planetarchh <-- my music

My stuff: torus.untergrund.net <-- some diy electronic stuff and more.

#11 rouncer

    Senior Member

  • Members
  • PipPipPipPip
  • 2722 posts

Posted 12 December 2008 - 07:12 PM

I know you're probably right, but what I found in my experiments making a phase vocoder (which turned out successful, if you remember me asking silly questions about the fft a while back) is that the longer a section you pass to it, the more wooshy the sound comes back out of it. Coding the dft myself, I could finally pick off each harmonic one by one, and that's what gave it to me.

but I'll definitely learn the fft eventually, so I don't really know what I'm talking about yet.

the woosh you get out of it sounds a little like reverb, except it's a bit different and it sounds real nasty, and I'm making music with it now, it's totally awesome. harmonic domain effects are the future of digital.

turning every sine wave into a square wave individually is a new kind of expensive distortion, and there's opportunity for chorusing also.




