Generating Textures from Video Frames

corey 101 Oct 16, 2005 at 14:00

I was originally going to develop a code gem that included arbitrary video file types and webcam devices. However, I’ve limited this first piece to the concept of generating a texture from a video frame. I will leave more discussion for later.

There are several caveats for this discussion:
* This approach requires DirectShow. The DirectX SDK up to 9.0b included DirectShow but an extras package is now required for 9.0c and later.
* This approach is not optimal. There are some memory copies and design trade-offs that I accepted in order to keep the code simple for this context.
* The code supplied uses some ATL wrapper templates to keep the code concise. CComPtr and CComQIPtr are reference-counting aware and provide clean wrappers around COM interfaces. They release the interface on destruction or when they go out of scope.
* The code supplied uses one G3D class (Texture) for code clarity. This gem assumes you already know how to generate textures.
* This does not load WMV files properly. This is because the special WMV reader filter should be used. This can be added by doing a simple file extension test and replacing the source filter loading with another filter load.
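
The extension test mentioned in the last caveat could look something like the sketch below. The helper name needsWmvReader is hypothetical and not part of the code in this gem, and loading the WMV reader filter itself is not shown.

```cpp
#include <cassert>
#include <cwctype>
#include <string>

// Hypothetical helper: returns true when the file name ends in ".wmv"
// (case-insensitive), meaning the special WMV reader filter should be
// loaded instead of the default source filter.
bool needsWmvReader(const std::wstring& filename) {
    const std::wstring::size_type dot = filename.find_last_of(L'.');
    if (dot == std::wstring::npos) {
        return false;
    }
    std::wstring ext = filename.substr(dot);
    for (std::wstring::size_type i = 0; i < ext.size(); ++i) {
        ext[i] = static_cast<wchar_t>(std::towlower(ext[i]));
    }
    return ext == L".wmv";
}

// Usage: decide which loading path to take before building the graph.
void exampleUsage() {
    assert(needsWmvReader(L"C:\\video\\clip.WMV"));
    assert(!needsWmvReader(L"C:\\video\\clip.avi"));
}
```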

I don’t want to make this very long, which is partly why I left out cameras (I also didn’t have test equipment or time available this weekend). What I am going to show is how to load an arbitrary video file, convert each frame to a usable 24-bit RGB format, and then play the file. Once the file is playing, I will show you one approach to grabbing the current frame.

Because we are typically dealing with compressed video formats, decompression carries an inherent performance cost. There is usually a second hit when converting to 24-bit RGB before grabbing the frame, because most DirectShow filters prefer to pass data in the YUV or DX color space. For that reason, I suggest a small video resolution or a low playback rate. I ran basic tests at simulated playback rates of 10, 15 and 25 fps. Any of those will do; the key is to limit the number of times per second that a frame is converted into a texture. My tests were all in G3D, so I don’t want to add that here.
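
One way to limit how often a frame becomes a texture is to gate the grab on elapsed time. This is only a minimal sketch, assuming the caller supplies the current time in seconds; the FrameThrottle class and its members are hypothetical names and are not part of the gem.

```cpp
#include <cassert>

// Hypothetical helper that limits texture updates to a target rate.
class FrameThrottle {
    double interval;    // Minimum number of seconds between updates
    double lastUpdate;  // Time of the last accepted update
    bool   first;       // True until the first update has been accepted

public:
    explicit FrameThrottle(double updatesPerSecond)
        : interval(1.0 / updatesPerSecond), lastUpdate(0.0), first(true) {
        assert(updatesPerSecond > 0.0);
    }

    // Returns true when enough time has passed to convert another frame.
    bool shouldUpdate(double nowSeconds) {
        if (first || (nowSeconds - lastUpdate) >= interval) {
            first = false;
            lastUpdate = nowSeconds;
            return true;
        }
        return false;
    }
};
```

In a render loop you would call grabFrameTexture() only when shouldUpdate() returns true and keep drawing the previous texture otherwise.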

The class setup
This is a dummy class I am creating for the gem out of my own code. The benefit of a class is that it encapsulates the main interfaces. I did not test playing different files one after the other. The helper methods I provide are ConnectPins and FindPin.

#include <windows.h>
#include <atlcomcli.h>
#include <dshow.h>
#include <qedit.h>
#include <string>
#include <G3DAll.h>  // G3D: Texture, TextureRef

class VideoWrapper {
private:
    CComPtr<ICaptureGraphBuilder2> graphBuilder;
    CComPtr<IFilterGraph2> filterGraph;
    CComPtr<ISampleGrabber> sampleGrabber;
    bool videoInitialized;
    int videoWidth;
    int videoHeight;
    long* frameBuffer;
    long bufferSize;

    /** G3D Texture object */
    TextureRef videoTexture;

public:
    VideoWrapper() {
        // This must be called before
        // interfaces can be accessed
        CoInitialize(NULL);
        videoInitialized = false;
        videoWidth = 0;
        videoHeight = 0;
        frameBuffer = NULL;
        bufferSize = 0;
    }

    ~VideoWrapper() {
        // Technically, uninit every init
        uninitVideo();

        // Release the interfaces before COM itself shuts down
        sampleGrabber.Release();
        graphBuilder.Release();
        filterGraph.Release();
        CoUninitialize();
    }

    bool loadVideoFile(const std::wstring& filename);

    bool loadVideoCamera();

    TextureRef grabFrameTexture();

    void uninitVideo();

    bool ConnectPins(IBaseFilter* outputFilter, 
                     unsigned int outputNum,
                     IBaseFilter* inputFilter,
                     unsigned int inputNum);

    void FindPin(IBaseFilter* baseFilter,
                 PIN_DIRECTION direction,
                 int pinNumber,
                 IPin** destPin);
};

Video Setup and Render
This takes a std::wstring to simplify the code since DirectShow takes Unicode strings. I originally wrote some code to convert to Unicode but didn’t want to confuse the topic.

bool VideoWrapper::loadVideoFile(const std::wstring& filename) {
    // Tear down any previous graph (safe when nothing is loaded)
    uninitVideo();

    // Create the main object that runs the graph
    if (FAILED(graphBuilder.CoCreateInstance(CLSID_CaptureGraphBuilder2)) ||
        FAILED(filterGraph.CoCreateInstance(CLSID_FilterGraph))) {
        return false;
    }
    graphBuilder->SetFiltergraph(filterGraph);

    CComPtr<IBaseFilter> sourceFilter;

    // This takes the absolute filename path and
    // loads the appropriate file reader and splitter
    // depending on the file type.
    if (FAILED(filterGraph->AddSourceFilter(filename.c_str(),
                                 L"Video Source",
                                 &sourceFilter))) {
        return false;
    }

    // Create the Sample Grabber which we will use
    // to take each frame for texture generation
    CComPtr<IBaseFilter> grabberFilter;
    if (FAILED(grabberFilter.CoCreateInstance(CLSID_SampleGrabber))) {
        return false;
    }
    grabberFilter->QueryInterface(IID_ISampleGrabber, reinterpret_cast<void**>(&sampleGrabber));
    if (!sampleGrabber) {
        return false;
    }

    filterGraph->AddFilter(grabberFilter, L"Sample Grabber");

    // We have to set the 24-bit RGB desire here
    // so that the proper conversion filters
    // are added automatically.
    AM_MEDIA_TYPE desiredType;
    memset(&desiredType, 0, sizeof(desiredType));
    desiredType.majortype = MEDIATYPE_Video;
    desiredType.subtype = MEDIASUBTYPE_RGB24;
    desiredType.formattype = FORMAT_VideoInfo;
    sampleGrabber->SetMediaType(&desiredType);

    // GetCurrentBuffer only works when sample buffering is on
    sampleGrabber->SetBufferSamples(TRUE);

    // Use pin connection methods instead of 
    // ICaptureGraphBuilder::RenderStream because of
    // the SampleGrabber setting we're using.
    if (!ConnectPins(sourceFilter, 0, grabberFilter, 0)) {
        return false;        
    }

    // A Null Renderer does not display the video
    // but it allows the Sample Grabber to run
    // and it will keep proper playback timing
    // unless specified otherwise.
    CComPtr<IBaseFilter> nullRenderer;
    if (FAILED(nullRenderer.CoCreateInstance(CLSID_NullRenderer))) {
        return false;
    }

    filterGraph->AddFilter(nullRenderer, L"Null Renderer");

    if (!ConnectPins(grabberFilter, 0, nullRenderer, 0)) {
        return false;
    }

    // Just a little trick so that we don't have to know
    // the video resolution when calling this method.
    bool mediaConnected = false;
    AM_MEDIA_TYPE connectedType;
    if (SUCCEEDED(sampleGrabber->GetConnectedMediaType(&connectedType))) {
        if (connectedType.formattype == FORMAT_VideoInfo &&
            connectedType.pbFormat != NULL &&
            connectedType.cbFormat >= sizeof(VIDEOINFOHEADER)) {
            VIDEOINFOHEADER* infoHeader = (VIDEOINFOHEADER*)connectedType.pbFormat;
            videoWidth = infoHeader->bmiHeader.biWidth;
            videoHeight = infoHeader->bmiHeader.biHeight;
            mediaConnected = true;
        }

        // The caller owns the format block of the returned media type
        if (connectedType.pbFormat != NULL) {
            CoTaskMemFree(connectedType.pbFormat);
        }
        if (connectedType.pUnk != NULL) {
            connectedType.pUnk->Release();
        }
    }

    if (!mediaConnected) {
        return false;
    }

    // Tell the whole graph to start sending video.
    // Apart from making sure the source filter can load,
    // this is the only failure point we care about unless
    // you need to do more extensive development and debugging.
    CComQIPtr<IMediaControl> mediaControl(filterGraph);
    if (mediaControl && SUCCEEDED(mediaControl->Run())) {
        videoInitialized = true;
        return true;
    } else {
        return false;
    }
}

/** For a later time but probably faster displays. */
bool VideoWrapper::loadVideoCamera() {
    return false;
}

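TextureRef VideoWrapper::grabFrameTexture() {
    if (videoInitialized) {
        // Only need to do this once
        if (!frameBuffer) {
            // The Sample Grabber reports the required buffer size
            // at runtime. Do not assume (width * height * 3) bytes,
            // since each row may be padded.
            if (FAILED(sampleGrabber->GetCurrentBuffer(&bufferSize, NULL)) ||
                bufferSize <= 0) {
                return NULL;
            }
            // bufferSize is in bytes, so round up to whole longs.
            frameBuffer = new long[(bufferSize + sizeof(long) - 1) / sizeof(long)];
        }
        if (FAILED(sampleGrabber->GetCurrentBuffer(&bufferSize, frameBuffer))) {
            return NULL;
        }
        // G3D Texture creation for code simplification, the format is obvious.
        // Assumed arguments: RGB8 source and destination formats. Check the
        // fromMemory signature in your G3D version. Note that RGB24 samples
        // are stored blue-first and bottom-up.
        return Texture::fromMemory(
            "Video Frame",
            (const uint8*)frameBuffer,
            TextureFormat::RGB8,
            videoWidth, videoHeight,
            TextureFormat::RGB8);
    }

    return NULL;
}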

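void VideoWrapper::uninitVideo() {
    // Stop the graph before clearing the flag
    if (videoInitialized) {
        CComQIPtr<IMediaControl> mediaControl(filterGraph);
        if (mediaControl) {
            mediaControl->Stop();
        }
    }
    videoInitialized = false;

    // Release the graph so another file can be loaded
    sampleGrabber.Release();
    graphBuilder.Release();
    filterGraph.Release();

    delete[] frameBuffer;
    frameBuffer = NULL;
}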

bool VideoWrapper::ConnectPins(IBaseFilter* outputFilter,
                       unsigned int outputNum,
                       IBaseFilter* inputFilter,
                       unsigned int inputNum) {

    CComPtr<IPin> inputPin;
    CComPtr<IPin> outputPin;

    if (!outputFilter || !inputFilter) {
        return false;
    }

    FindPin(outputFilter, PINDIR_OUTPUT, outputNum, &outputPin);
    FindPin(inputFilter, PINDIR_INPUT, inputNum, &inputPin);

    if (inputPin && outputPin) {
        return SUCCEEDED(filterGraph->Connect(outputPin, inputPin));
    } else {
        return false;
    }
}

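void VideoWrapper::FindPin(IBaseFilter* baseFilter,
                   PIN_DIRECTION direction,
                   int pinNumber,
                   IPin** destPin) {

    CComPtr<IEnumPins> enumPins;

    *destPin = NULL;

    if (SUCCEEDED(baseFilter->EnumPins(&enumPins))) {
        ULONG numFound;
        IPin* tmpPin;

        // Next() returns S_FALSE, which still passes SUCCEEDED,
        // once the pins run out, so compare against S_OK.
        while (enumPins->Next(1, &tmpPin, &numFound) == S_OK) {
            PIN_DIRECTION pinDirection;
            tmpPin->QueryDirection(&pinDirection);

            if (pinDirection == direction) {
                if (pinNumber == 0) {
                    // Return the pin's interface
                    // (the caller now owns this reference)
                    *destPin = tmpPin;
                    return;
                }
                pinNumber--;
            }

            // Not the pin we want, so drop our reference
            tmpPin->Release();
        }
    }
}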

Libraries and Includes needed
strmiids.lib - From DirectX’s lib directory.
windows.h - From the PlatformSDK or Visual C++
atlcomcli.h - (Part of ATL) From the Platform SDK or Visual C++
dshow.h - From DirectX’s include directory
qedit.h - From DirectX’s include directory

Follow up
This is only for displaying existing video on a texture. This allows for perspective projection of the video instead of always just displaying 2D. If you want to just display the video in a 2D box over a window, this can be done in a much more efficient manner without textures. If you want to actually take a texture or frame buffer and convert it into a video file then that requires much more extensive COM and DirectShow Filter creation that is out of the scope of this gem entirely.

I look forward to any questions and more development.

Corey Taylor
G3D 6.07 3D Engine

11 Replies


Trip99 101 Oct 16, 2005 at 17:25

Looks good. I believe DirectShow has now moved to the Platform SDK.

corey 101 Oct 16, 2005 at 18:58


> Looks good. I believe DirectShow has now moved to the Platform SDK.

Correct, but I thought it best to keep in context of a generic DirectX setup.

This is just for explaining the concept really anyway.


corey 101 Oct 17, 2005 at 16:45

If you decide that you don’t want to buffer each sample, and want to implement the ISampleGrabberCB interface, then there is yet another caveat.

Even after the simple implementation, you have to be aware of the threading model of a rendering graph. Each sample is going to be delivered on a different thread than your rendering loop. So, if you’re trying to generate textures during a drawing routine, you’re going to get an invalid operation in OpenGL unless you sync up properly.
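
One common way to sync up is to have the callback thread do nothing but copy bytes into a shared buffer under a lock, and leave texture creation to the rendering thread. The sketch below uses standard C++11 threading primitives (which postdate this post) rather than anything from DirectShow. FrameMailbox, publish and consume are hypothetical names, and the ISampleGrabberCB implementation that would call publish is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical hand-off point between the streaming thread and the
// rendering thread. Only raw bytes cross the boundary.
class FrameMailbox {
    std::mutex lock;
    std::vector<unsigned char> pending;  // Most recent frame bytes
    bool hasNew = false;                 // True if pending is unread

public:
    // Call from the sample callback thread: copy only, no OpenGL calls.
    void publish(const unsigned char* data, std::size_t size) {
        assert(data != nullptr || size == 0);
        std::lock_guard<std::mutex> guard(lock);
        pending.assign(data, data + size);
        hasNew = true;
    }

    // Call from the rendering thread. Returns true and moves the newest
    // frame into 'out' if one arrived since the previous call.
    bool consume(std::vector<unsigned char>& out) {
        std::lock_guard<std::mutex> guard(lock);
        if (!hasNew) {
            return false;
        }
        out.swap(pending);
        hasNew = false;
        return true;
    }
};
```

The rendering loop would call consume() once per frame and build the texture from the returned bytes only when it reports a new frame, so all OpenGL work stays on one thread.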


john 102 Oct 17, 2005 at 17:00


Thanks for sharing this code gem.

Are we allowed to use this code in our projects without any restrictions or constraints? I’m going to test the code out and let you know how it goes. I’ll make sure I give due credit though.

corey 101 Oct 17, 2005 at 17:03



> Thanks for sharing this code gem.
>
> Are we allowed to use this code in our projects without any restrictions or constraints? I’m going to test the code out and let you know how it goes. I’ll make sure I give due credit though.

Yes, of course. When I get some more time this week, I’m going to work on an optimized rendering layout including webcams. This seems a little too slow for me.


corey 101 Oct 19, 2005 at 00:23

There is another method that I will compare against this one.

If you render to a normal Video Renderer filter or the VMR filter, there is a method on the IBasicVideo interface that allows you to grab the current frame in DIB format. VMR is the only reliable filter for this, so be careful to use that one.

As you know, an old OpenGL extension, EXT_bgra, added texture formats that match the DIB internal representation exactly. This should allow you to remove a fragmented memory copy (fragmented because the conversion is done in parts) from the equation.
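
To make the layout concrete, here is roughly what the CPU-side conversion looks like when the BGR formats cannot be used. It is a sketch only, assuming a 24-bit bottom-up DIB whose rows are padded to a multiple of 4 bytes (the standard DIB layout); dibStride and dibToRgbTopDown are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bytes per row of a 24-bit DIB. Rows are padded to a multiple of 4 bytes.
std::size_t dibStride(int width) {
    assert(width > 0);
    return (static_cast<std::size_t>(width) * 3 + 3) & ~static_cast<std::size_t>(3);
}

// Converts a bottom-up, BGR-ordered 24-bit DIB into a tightly packed,
// top-down RGB buffer. This is the per-pixel copy that a BGR texture
// format lets you skip.
std::vector<unsigned char> dibToRgbTopDown(const unsigned char* dib,
                                           int width, int height) {
    assert(dib != nullptr && height > 0);
    const std::size_t stride = dibStride(width);
    std::vector<unsigned char> rgb(static_cast<std::size_t>(width) * height * 3);

    for (int y = 0; y < height; ++y) {
        // Row 0 of the DIB is the bottom row of the image.
        const unsigned char* src =
            dib + static_cast<std::size_t>(height - 1 - y) * stride;
        unsigned char* dst = &rgb[static_cast<std::size_t>(y) * width * 3];

        for (int x = 0; x < width; ++x) {
            dst[x * 3 + 0] = src[x * 3 + 2];  // Red
            dst[x * 3 + 1] = src[x * 3 + 1];  // Green
            dst[x * 3 + 2] = src[x * 3 + 0];  // Blue
        }
    }
    return rgb;
}
```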

I have not tested yet if the buffered sample grabber’s memory copies are slower or faster than the DIB creation. I would assume that the DIB would have to be generated when the VMR filter is rendering in a hardware-accelerated mode. The DIB will *not* generate in a normal video renderer under accelerated hardware modes.


corey 101 Oct 25, 2005 at 05:59

If anyone’s still interested, I will complete the capability and include a demo program.

Does anyone like any of the ideas above or would like to see more in some area?


dk 158 Oct 25, 2005 at 21:45

That would be great in fact. Perhaps you can even extend this into an article…

Hodge 101 Oct 28, 2005 at 04:05

Very nice. I’m not a big fan of DirectX but this code gem might prove to be very helpful in the future. Thanks Corey for all the code work.

Unconnected 101 Jan 13, 2009 at 17:03

Thank you very much for the presented approach to grabbing frames via DirectShow! It works nicely even with Windows 2003 and Windows Vista!

Sascha88 101 Sep 01, 2011 at 08:36

Thanx)) i’m using Vista cell phone spy now and everything works perfect too)) and don’t think that will change it for something else!! good job