Both ATI and Nvidia have offered multi-GPU solutions for some years now (CrossFire and SLI). There are three common modes of operation available:
- Super AA or SLI AA mode (for improved image quality with increased AA);
- Scissor or split frame rendering (SFR) mode;
- Alternate frame rendering (AFR) mode.
As outlined in some talks from both IHVs, for a modern game engine that requires some sort of post-processing effect (all of them these days), the only usable mode is AFR. Note that even with AFR, care must be taken over when render targets (DirectX) or render buffers (OpenGL) are updated and cleared.
Here are a few tips from vendors:
- To avoid GPU starvation, make sure at least 2 frames are buffered. I think the default is 3 for Nvidia and 2 for ATI. This way, at the end of a frame (SwapBuffers with OpenGL), the CPU does not wait for one GPU to finish processing before issuing new commands to the other GPU.
- Disable VSync (also called "swap on vsync" or "wait for vertical refresh") to maximize FPS, if tearing is acceptable.
- Never call glFinish (OpenGL), since this would kill the asynchronous work between the CPU and GPUs. The CPU would have to wait for one GPU to finish processing before submitting any new commands to the other GPU.
Obviously those tips will maximize parallelism between the CPU issuing commands and the GPUs. But how is it possible to render a smooth animation this way? If the CPU is totally asynchronous with the GPUs, then at the time it computes positions the CPU has no idea when the frame will actually be displayed on screen.
For my application tearing is not an option, so I keep VSync enabled; at least this way I know that frames are displayed at a constant rate (the display refresh rate). But since frames are still buffered, the CPU still doesn't know when the computed frames will be displayed! The lag may also become a problem: with 3 frames buffered and 2 frames being the maximum time taken by a GPU to render a scene, the latency could reach 5 frames. At 60 Hz this would mean 83 ms…
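To make the arithmetic behind that worst-case figure explicit, here is a minimal sketch. The function name `latency_ms` and the variable names are only illustrative; the inputs (3 buffered frames, 2 frames of GPU time, 60 Hz) are the numbers quoted above.

```python
def latency_ms(frames_of_lag: float, refresh_hz: float) -> float:
    """Convert a lag expressed in display refresh periods to milliseconds."""
    return frames_of_lag * 1000.0 / refresh_hz


buffered_frames = 3  # frames queued ahead of the GPU (figure from the post)
gpu_frames = 2       # worst-case GPU render time, in refresh periods

worst_case = latency_ms(buffered_frames + gpu_frames, 60.0)
print(f"worst-case latency: {worst_case:.1f} ms")
```

At 60 Hz one refresh period is about 16.7 ms, so 5 periods come to roughly 83 ms.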
I found that on Vista it is possible, using DirectX 9 or 10, to call a function named WaitForVBlank that would solve my problem.
Has anybody found a way to address this on Windows XP with DirectX, or using OpenGL?
/Cubitus
Multi-GPUs AFR vs frame lag & smooth animation
Started by Cubitus, Mar 03 2008 01:59 PM
5 replies to this topic
#1
Posted 03 March 2008 - 01:59 PM
#2
Posted 03 March 2008 - 04:46 PM
You can of course tell the GPU to wait for v-blank to display the next frame in either XP or Vista, with either Direct3D or OpenGL. In fact this is normally the default to prevent tearing. But I don't think that really addresses your lag issue. While it prevents tearing it also potentially makes lag *worse* because it limits the framerate to the refresh rate of the display.
reedbeta.com - developer blog, OpenGL demos, and other projects
#3
Posted 03 March 2008 - 04:54 PM
Cubitus said:
For my application tearing is not an option, so at least this way I know that frames are displayed at a constant rate (the display refresh rate). But since frames are still buffered, CPU still doesn’t know when the computed frames will be displayed!? The lag may also become a problem… with 3 frames buffered and 2 frames being the maximum time taken by a GPU to render a scene, the latency could max out to 5 frames! @ 60 Hz this would mean 83 ms…
As far as I understand it, the graphics card is allowed to buffer up to 3 frames of rendering data (in practice I'm pretty sure the IHVs sometimes allow more than 3 frames to buffer).
Normally it will accept a frame's worth of rendering data, and then Present is called and the GPU kicks into action (this ISN'T actually how they do it, but it's easier to visualise this way). The CPU can keep pushing data into this buffer until 3 frames' worth is there, and then it has to stall to wait for the GPU to catch up.
With a dual SLI (or CrossFire) system, instead of waiting for 1 frame's worth of data to buffer before starting rendering, it needs to wait for 2 frames' worth of data. Then the 3rd frame is readied and Present stalls. I will admit I don't know exactly how SLI/CrossFire works, but the above would seem pretty logical to me ...
Either way you have a 3 frame latency ... So I don't see the problem ;)
#4
Posted 03 March 2008 - 06:12 PM
The CPU isn't totally asynchronous to the GPU. If you are GPU bound and try to push too much data (e.g. over 3 frames) to the command buffer, the CPU will stall in the Present() call (D3D) until the GPU has processed a frame. This synchronizes the CPU with the GPU, so your CPU frame time will match the GPU frame time. Also, in your case the lag will only be 3 frames, not 5, because the GPU will be at most 3 frames behind the CPU, i.e. there can be only 3 frames' worth of data in the command buffer.
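A toy simulation can illustrate the stall described above. This is only a sketch under simplifying assumptions: a single GPU consuming frames in order, a fixed CPU and GPU cost per frame, and a hard limit of `max_queued` frames in flight. The function and parameter names are made up for the example and do not model any real driver.

```python
def simulate(cpu_ms, gpu_ms, max_queued, n_frames):
    """Return (submit, gpu_done) times for a bounded render-ahead queue."""
    submit = []    # time at which the simulated Present() returns, per frame
    gpu_done = []  # time at which the GPU finishes each frame
    t = 0.0
    for i in range(n_frames):
        t += cpu_ms  # CPU builds the commands for frame i
        # Present() blocks until frame (i - max_queued) has left the queue,
        # so at most max_queued frames are ever in flight.
        if i >= max_queued:
            t = max(t, gpu_done[i - max_queued])
        submit.append(t)
        # The GPU starts frame i once it is submitted and the GPU is free.
        start = max(t, gpu_done[-1]) if gpu_done else t
        gpu_done.append(start + gpu_ms)
    return submit, gpu_done


# GPU-bound case: CPU needs 5 ms per frame, GPU needs 20 ms per frame.
submit, gpu_done = simulate(cpu_ms=5.0, gpu_ms=20.0, max_queued=3, n_frames=20)
cpu_frame_time = submit[-1] - submit[-2]
max_lag = max(done - sub for sub, done in zip(submit, gpu_done))
print(cpu_frame_time, max_lag)
```

In this model the CPU's Present-to-Present interval settles at the GPU frame time, and the lag between submitting a frame and the GPU finishing it never exceeds `max_queued` GPU frame times.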
#5
Posted 03 March 2008 - 09:37 PM
For me no image tearing is a requirement... I don't care about maximum FPS but I do care about maximizing usage of a second or third GPU (tri-way SLI).
First, I understand that the GPU's buffer swap can be synchronized with the video vsync signal, either through the API (OpenGL or DirectX) or by forcing it in the Control Panel.
Frame latency: on paper, when both GPUs swap during vsync, I found a case where, with three buffered frames and a GPU processing time of 2 frames, the latency becomes 5 frames. I didn't find a way to quickly paste an image in my posting...
In fact my main concern is that, since frames are buffered, the CPU doesn't know when frames will be displayed. So how can a smooth animation be computed on the CPU? The frame currently being computed may be displayed 2 frames ahead, or maybe 3 frames... I don't see any way to know.
WaitForVBlank is available on Vista (DirectX 9 Ex & DirectX 10). With this call it all becomes deterministic if I use it with "swap on vsync". If GPU processing takes a bit less than 2 frames, then the frame currently being computed will be displayed after two vertical blank signals. Combined with Nvidia OpenGL fences or DirectX events, I think I could manage cases where GPU processing takes more than 2 frames (overload).
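As a rough sketch of that idea, the animation time can be derived from the last vertical blank plus a fixed number of refresh periods. This assumes the frame really is shown exactly `vblanks_ahead` blanks later (the non-overloaded case); the function names are hypothetical, and `last_vblank_s` stands in for a timestamp that a real application would take right after the vblank wait returns.

```python
def predicted_display_time(last_vblank_s, refresh_hz, vblanks_ahead):
    """Time at which a frame started now is expected to reach the screen."""
    return last_vblank_s + vblanks_ahead / refresh_hz


def position_at(t_s, speed_units_per_s, x0=0.0):
    """Position of an object moving at constant speed, evaluated at time t_s."""
    return x0 + speed_units_per_s * t_s


# Assumed values for illustration: last vblank at t = 1.0 s, 60 Hz display,
# frame expected on screen two vertical blanks later.
display_t = predicted_display_time(last_vblank_s=1.0, refresh_hz=60.0, vblanks_ahead=2)
x = position_at(display_t, speed_units_per_s=120.0)
print(display_t, x)
```

The point is that positions are evaluated at the predicted display time rather than at the time the CPU happens to run the update.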
Any better/simpler idea?
/Cubitus
#6
Posted 04 March 2008 - 06:37 PM
If tearing isn't acceptable, why don't you just enable VSync in the present parameters? I don't see what the big problem is. You don't need to fool around with WaitForVBlank or such. Regarding smooth animation: because the CPU is stalled at Present, the time from Present to Present on the CPU is the time it takes the GPU to render a frame. It's reasonable to assume that this time is constant across a few successive frames. Also, if D3D prevents the CPU from proceeding until the GPU has presented a frame, there is no way you can have a 5 frame latency. It sounds to me like you are over-complicating the situation.
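One way to act on that assumption is to average the Present-to-Present interval over a few recent frames and use it as the animation timestep. The sketch below is only an illustration: the class name `FrameTimeEstimator`, the window size, and the timestamps are all made up, and in a real application the timestamps would come from a high-resolution timer read right after Present returns.

```python
from collections import deque


class FrameTimeEstimator:
    """Moving average of the interval between successive Present() returns."""

    def __init__(self, window=4):
        self.samples = deque(maxlen=window)  # most recent intervals, in seconds
        self.last = None                     # timestamp of the previous return

    def on_present_returned(self, now_s):
        if self.last is not None:
            self.samples.append(now_s - self.last)
        self.last = now_s

    def estimate(self):
        """Average interval in seconds, or None before two timestamps exist."""
        if not self.samples:
            return None
        return sum(self.samples) / len(self.samples)


est = FrameTimeEstimator(window=4)
for stamp in (0.000, 0.016, 0.033, 0.050):  # illustrative timestamps
    est.on_present_returned(stamp)
print(est.estimate())
```

Averaging over a short window hides small jitter in individual frames while still following real changes in GPU load within a few frames.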