VRAM caching?


16 replies to this topic

#1 Mazter

    New Member

  • Members
  • Pip
  • 5 posts

Posted 25 September 2005 - 11:23 AM

Hi!

Has anyone done a caching mechanism for VRAM?

Having some sort of manager that distributes VertexBuffers using a LRU scheme, keeping track of which VRAM memory chunk to fill?

I would be very interested in discussing this with someone experienced :)

#2 Alex

    Valued Member

  • Members
  • PipPipPip
  • 152 posts

Posted 25 September 2005 - 02:56 PM

I had something like that running in my last project. It was a pretty controversial subject: coworkers either hated it or loved it. I allocated several static and dynamic default pool vertex and index buffers at startup (a certain percentage of the available VRAM). The rest was left for textures, shaders etc. It was pretty much a virtual memory manager that swapped stuff in and out on demand (and on device reset), so you could stop worrying about eating up all the video memory. IMHO this is especially useful if many people work on the 3D engine. More often than not, one of us would grab an insane amount of video memory for a special effect, or by accident. Without the memory manager this killed performance immediately. You also get a pretty accurate indication of how much video/AGP memory is still available and how badly you're thrashing the memory (paging per frame). There is a lot of discussion going on about how to use buffers correctly for best performance. I think Microsoft pretty much failed to address this. The memory API is fine if you're writing a demo for one specific graphics card. But if you want to release a big project to the public, you need scalability and predictability of the resources available and the resources left. Here the API fails completely, IMHO.
The biggest problem I had was that I wanted the memory manager to restore lost buffers automatically while keeping a malloc-style interface. This didn't work out well, because for dynamically created content the manager needs a way to poll the data through some kind of interface in order to restore lost buffers.

Well, I'd also be interested to hear what people think about this kind of thing. As I said, some people totally rejected it because they thought it was the "wrong way" of using vertex/index buffers. IMHO the published NVIDIA/ATI docs support this approach.
Oh, and it had pretty decent performance and guaranteed proper usage of lock flags etc.

Alex

#3 Nick

    Senior Member

  • Members
  • PipPipPipPip
  • 1225 posts

Posted 25 September 2005 - 04:49 PM

Wait for DirectX 10. It will require drivers to virtualize all memory, so the problem is solved at the root. If I understand correctly, most current generation DirectX 9 cards will also get a DirectX 10 driver.

Until then the only solution seems to be to keep track of memory resources yourself. Some engines use a layer directly on top of the Direct3D API which does this. That way you can't accidentally call an API function directly and mess up the memory management.

#4 Nils Pipenbrinck

    Senior Member

  • Members
  • PipPipPipPip
  • 597 posts

Posted 25 September 2005 - 06:38 PM

I never had problems allocating video memory resources. I just allocate all resources as managed. Textures, vertex and index buffers, everything that's possible. Sure - they eat up system memory, but in case of a device reset you need to reload them anyways. Why load them from disk? I'd rather leave the main memory copy and let the memory manager swap them out if I run short on memory.

The only exceptions are dynamic vertex/index buffers, dynamic textures (for movies) and other surfaces. These are handled differently, but they don't eat up that much memory, so no problem here.

I have also shipped several titles, and we actually test our games on video cards that definitely don't have enough VRAM for our needs. The Matrox G400 is my favorite graphics card to test on. It's a bad joke feature- and memory-wise nowadays, but if your game at least remotely works on it, you can be sure it will work everywhere else.

In case you use way more memory than the graphics card can provide, the DirectX debug runtime will tell you that the internal memory manager goes into panic mode. In my case the DX runtime discarded the entire video memory (except frame and z-buffer) three times per frame, but it worked, and considering how much memory was moved per frame it was anything but slow.

So why reinvent the wheel and add another caching layer when the existing one already does a very good job?

#5 Alex

    Valued Member

  • Members
  • PipPipPip
  • 152 posts

Posted 25 September 2005 - 07:10 PM

Nils>
The problem is that if you code your game on a specific card that has x amount of VRAM and things work fine, then switching to a card with a tiny bit less VRAM can totally kill your performance, because suddenly you use more memory per frame than is available. With the D3D API there is no way of telling that you will or might hit this boundary, apart from noticing that your performance just goes nuts. If you know how much VRAM is left, you can start using lower-res models etc. to avoid hitting this case at all.
Current D3D basically expects you to start allocating (most important resources first) until you're done and pray it works out. There is no chance to predict how much memory you should actually use to get decent performance. This is OK if you have abundant memory, but on most customer PCs the VRAM is very limited (check the Steam statistics page to see how many graphics cards are still in use that have virtually no VRAM left after the frame buffer has been set up).
Sure, use managed for everything and have people upgrade their hardware. That works.
But how many customers are pissed off and return the product just because you used n bytes of VRAM too much without even knowing?
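
The idea of falling back to lower-res models once you know how much VRAM is left can be sketched roughly like this. It is only an illustration: `pickLod`, the size list and the `bytesLeft` figure are hypothetical names, and the remaining-memory number is assumed to come from your own manager, since the post's point is that D3D does not report it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Picks the most detailed level of detail whose size fits the remaining
// budget. `lodSizes` must be ordered from most detailed (index 0, largest)
// to least detailed (smallest). Returns the chosen index, or -1 if not even
// the smallest level fits.
inline int pickLod(const std::vector<std::size_t>& lodSizes,
                   std::size_t bytesLeft) {
    // Precondition: sizes are in descending order.
    assert(std::is_sorted(lodSizes.rbegin(), lodSizes.rend()));
    for (std::size_t i = 0; i < lodSizes.size(); ++i) {
        if (lodSizes[i] <= bytesLeft) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

A caller would subtract the chosen size from its own budget counter before moving on to the next model, so that later models see the reduced figure.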

Alex

#6 Mazter

    New Member

  • Members
  • Pip
  • 5 posts

Posted 25 September 2005 - 08:48 PM

Nils, Alex and Nick, thx for your input :)

Nils >
Consider this:

A player rotates from position A -> B -> C, then back again C -> B -> A.
At each of the stops there are totally new models and textures.

So preferably you want all the models from A, B and C in VRAM,
so that you don't need to send anything over the bus. Unfortunately
you can't, since there is more data than the VRAM can hold :(

Let's assume:
A = 32 MB vertex data + textures
B = 32 MB vertex data + textures
C = 32 MB vertex data + textures
and let's say you can have two scenes in VRAM simultaneously.

You move from A -> B and both are cached in VRAM
Then you move to C, so you need to remove either A or B's data
from VRAM in order to upload the data from C to the VRAM and render
it.

So will DX / OpenGL automatically know to discard the least recently used
buffers, which is the data from scene A, and keep the data from scene B?

Because then you go back to B, which with your own LRU scheme would still be
cached in VRAM, so you don't need to upload that data ;)

Then you move to A again, and need to remove C's data from VRAM
in order to render A.

Then we go back to B...

So, if DX and OpenGL already have this mechanism built in, I see no great
point in making your own :)

But as far as I know they don't, or do they?

If they did, there should at least be some way of knowing whether a VertexBuffer has been invalidated and
needs to be uploaded again, and I can't see that there is.
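
The A/B/C walk above can be played through with a small LRU bookkeeping sketch. It assumes a 64 MB budget and 32 MB per scene, as in the example. `LruVramCache` and `touch` are hypothetical names, and the class only tracks what is resident; the actual upload and release of buffers would be up to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical LRU tracker for a fixed VRAM budget (bookkeeping only).
class LruVramCache {
public:
    explicit LruVramCache(std::size_t budgetBytes) : budget_(budgetBytes) {}

    // Marks `id` as used. Returns true if it was already resident (no
    // upload needed), false if it had to be brought in, evicting the
    // least recently used entries until it fits.
    bool touch(const std::string& id, std::size_t sizeBytes) {
        assert(sizeBytes <= budget_);  // oversized resources are not handled
        auto it = index_.find(id);
        if (it != index_.end()) {
            // Hit: move the entry to the front (most recently used).
            order_.splice(order_.begin(), order_, it->second);
            return true;
        }
        // Miss: evict from the back (least recently used) until it fits.
        while (used_ + sizeBytes > budget_ && !order_.empty()) {
            used_ -= order_.back().size;
            index_.erase(order_.back().id);
            order_.pop_back();
            ++evictions_;
        }
        order_.push_front(Entry{id, sizeBytes});
        index_[id] = order_.begin();
        used_ += sizeBytes;
        return false;
    }

    bool resident(const std::string& id) const {
        return index_.count(id) != 0;
    }
    std::size_t evictions() const { return evictions_; }

private:
    struct Entry {
        std::string id;
        std::size_t size;
    };
    std::size_t budget_;
    std::size_t used_ = 0;
    std::size_t evictions_ = 0;
    std::list<Entry> order_;  // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

Touching A, B, C, B, A, B in that order evicts A when C arrives and C when A returns, while both visits back to B are hits, which is the behaviour the question asks about.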

C ya

#7 Alex

    Valued Member

  • Members
  • PipPipPip
  • 152 posts

Posted 26 September 2005 - 07:48 AM

Managed buffers pretty much work the way you described. D3D keeps a system memory copy of your data. Using your flags and some heuristics (maybe more sophisticated than LRU, depending on the graphics driver) it copies the data from system memory to AGP memory or VRAM (if there is enough space in either). This typically happens the first time you use the resource. So you should "touch" the resource (render a single textured triangle with it or so) before you start real rendering. Otherwise you might get noticeable latency the first time the resource is rendered, because the driver has to copy it around. If you run into a memory-limited situation, the driver will start putting your data into less optimal memory: if video memory is full and there are no unused managed resources in video memory that can be evicted, the driver might use AGP or even system memory for static geometry, which is suboptimal.
To summarize, the driver takes care of swapping managed resources in and out.

Default pool resources are a totally different story. You have to swap them and restore them yourself in case of a lost device. They should be allocated before the managed resources, because for some reason they interfere with the managed resources in a way the runtime cannot handle properly (and you might be refused a default pool resource because the managed ones have eaten up all video memory). I guess the managed resource manager cannot handle fragmentation properly. You have to evict all managed resources if you create a default pool one, or the whole managed memory management might go nuts.

I still wonder what they were thinking when they introduced this to D3D...
I don't see a reason why the driver cannot save and restore "lost" memory from lost devices. It might be slow, but that doesn't matter in case of a device reset.
Nor can I see a good reason why the user has to synchronize managed and default pool resources using EvictManagedResources() in case of a non-standard allocation order. Finally, I don't see why D3D doesn't supply some sort of statistics that help you decide what resolution of data to use, to avoid scenes that use more memory than is available.
The end user is forced to experiment with the detail/model/texture resolution options of the 3D engine to get acceptable performance. With the (unavailable) memory statistics the engine could dynamically adapt to the current scene being rendered, showing the best possible quality at a fairly constant speed.

Alex

#8 Axel

    Valued Member

  • Members
  • PipPipPip
  • 119 posts

Posted 26 September 2005 - 09:59 AM

Nick said:

Wait for DirectX 10. It will require drivers to virtualize all memory, so the problem is solved at the root. If I understand correctly, most current generation DirectX 9 cards will also get a DirectX 10 driver.

AFAIK DX10 doesn't have caps anymore, so I don't think so. But the LDDM will virtualize memory for DX9 as well.

#9 kusma

    Valued Member

  • Members
  • PipPipPip
  • 163 posts

Posted 27 September 2005 - 12:29 AM

also remember that d3d has a preload-feature for managed textures. now, be a nice boy, compress all your textures, try to do sensible reductions on the vertex-data, preload every now and then, and your driver will treat you right ;)

#10 XORcist

    Member

  • Members
  • PipPip
  • 39 posts

Posted 27 September 2005 - 08:22 AM

I think there is hardly anything left for you to gain by doing this stuff yourself. It's best to leave it to D3D and the driver (at least on PC).

As Alex suggested, precache all your geometry before you start presenting frames.

I've written my own video memory manager, but that was on Xbox, and it proved
useful because of the Xbox's architecture and hardware access (UMA).
On Xbox you can allocate your own contiguous memory block (UMA) and then point the D3DResource header at it. This saved memory and also gave proper alignment, especially when there are lots of small textures (lightmaps).

On PC I did not do any mem management as that was the best option IMO.

#11 Mazter

    New Member

  • Members
  • Pip
  • 5 posts

Posted 27 September 2005 - 09:20 AM

Ok thx guys, and especially to Alex :)

So, if my VRAM is full the data is uploaded to AGP memory. But is it then
copied to VRAM if I use it a lot more than some junk that is still
in VRAM?

Thx

#12 Alex

    Valued Member

  • Members
  • PipPipPip
  • 152 posts

Posted 27 September 2005 - 10:41 AM

AFAIK that totally depends on the driver. Managed resources allow you to set a priority, which is a hint to the driver about how important the resource is. High-priority resources will be evicted later than low-priority ones. That's about all the control you have over managed resources. If scalability is no issue for you, then just use managed and let the driver worry about things :)

I just had a quick look at the docs again. There is an IDirect3DResource9::PreLoad() function that precaches the resource if that makes sense under the current memory situation. So you don't need to "touch" resources manually by rendering a subset.

Alex

#13 Mazter

    New Member

  • Members
  • Pip
  • 5 posts

Posted 27 September 2005 - 12:47 PM

There is also one more concern I had about this.

Usually GPUs don't like small buffers being sent to them. It's kind of a disappointment for them to see only a few hundred vertices, right?

And according to some NVIDIA papers the optimal size of a VertexBuffer
should be around 64000 vertices ~ 4 MB.

So you should batch the vertices in buffers of that size, right?

Then I ask myself how that would work together with the managed buffers.

There will probably be a scenario where you batch loads of models
into one buffer and let it stay in VRAM, but then after some time
you only want to render 10% of that buffer. Even though you
only render 10%, the driver will give this buffer higher priority than others
according to the LRU scheme.

So 90% of the buffer might actually be wasted, memory which could in reality
be used by some other active geometry?

#14 Alex

    Valued Member

  • Members
  • PipPipPip
  • 152 posts

Posted 27 September 2005 - 02:32 PM

Small buffers are not really the problem; submitting small batches is. But to have big batches you mostly need big buffers. To avoid the problem you have identified, you should batch together stuff that is very close in 3D space and normally rendered together, like different sub-batches of one model or several trees located close to each other.
By using a custom memory manager you can completely circumvent this problem, but you have to take care of fragmentation etc.
So you are right, and you have to code around this hidden issue. Which again hints at the crappiness of the D3D memory interface... if I may say so :)

Alex

#15 Mazter

    New Member

  • Members
  • Pip
  • 5 posts

Posted 27 September 2005 - 05:52 PM

Well, finally I got the answer I wanted, lol :)

No really, I'll do it just for the sake of learning, since I have wanted to learn
about memory schemes anyway ;)

The thing is, I have no clue which memory cache strategy would be best
for low fragmentation while still being very fast.
If anyone has a book recommendation or any web resources that
are up to date, I would really appreciate some tips.

#16 Nils Pipenbrinck

    Senior Member

  • Members
  • PipPipPipPip
  • 597 posts

Posted 27 September 2005 - 07:21 PM

I think Alex already answered most questions and gave enough info, but I have to disagree:

Small batches:

They are indeed evil, but they aren't that evil either. I draw a lot of small batches in my gui code (really a lot) and the performance is still great.

What I do is to use large dynamic vertex buffers (non managed) and use a circular allocation scheme.

That means I can allocate, say, 10 vertices from a big VB (enough for 60k vertices), lock and fill them, and then do the DrawIndexedPrimitive call. I do that several hundred times per frame (sometimes for each word of text when I display a big HTML table with statistics. It's the worst case, but it happens in my GUI code all the time).

That sounds like a worst-case performance problem if you've read NVIDIA's or ATI's guidelines, but in reality it's not. The drivers will merge DIP calls whenever possible, and in the case of dynamic vertex buffers the data has to be copied to video RAM anyway (push buffers!), so it is very possible to merge the calls unless you change materials or the vertex layout.
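
A minimal sketch of the bookkeeping behind such a circular allocation scheme might look like the following. `VertexRing`, `RingSlice` and `LockHint` are hypothetical names. The hint that appends should be locked without overwriting and that a wrap-around should discard the buffer is an assumption (in Direct3D 9 it would correspond to D3DLOCK_NOOVERWRITE and D3DLOCK_DISCARD); the post does not say which lock flags were used.

```cpp
#include <cassert>
#include <cstddef>

// How the caller should lock the buffer for this allocation. Mapping these
// to D3DLOCK_NOOVERWRITE / D3DLOCK_DISCARD is an assumption of this sketch.
enum class LockHint { NoOverwrite, Discard };

struct RingSlice {
    std::size_t firstVertex;  // offset into the big buffer, in vertices
    LockHint hint;
};

// Hypothetical offset tracker for one large dynamic vertex buffer. It hands
// out consecutive ranges and wraps to the start when the end is reached.
class VertexRing {
public:
    explicit VertexRing(std::size_t capacity) : capacity_(capacity) {}

    RingSlice allocate(std::size_t count) {
        assert(count > 0 && count <= capacity_);
        if (head_ + count > capacity_) {
            // Not enough room at the tail: restart at offset 0 and ask for
            // a discard so data still in use by the GPU is not overwritten.
            head_ = count;
            return RingSlice{0, LockHint::Discard};
        }
        const std::size_t first = head_;
        head_ += count;
        return RingSlice{first, LockHint::NoOverwrite};
    }

private:
    std::size_t capacity_;
    std::size_t head_ = 0;
};
```

Each small GUI batch would call `allocate`, lock the returned range with the suggested mode, fill it, and issue its draw call starting at `firstVertex`.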

Dynamic data:

In case of a device lost there is no need to restore the data inside dynamic textures/buffers at all. D3D will ignore DIP-calls anyways, so why care about the data? There is *no* need to cache anything dynamic since D3D won't display any frame unless you've restored your device. And after you did so, you'll fill the buffers with fresh new data.

Managed data:

Our current game has about 100 MB of vertex and texture data, and those are grouped together locally. That means any time I render a frame from the game I need just a small subset of the lightmap texture, vertex and index data. I haven't done any statistics, but a good guess would be that a frame needs somewhere between 40 and 60 MB of raw data. A simple portal visibility scheme is all you need.

The game runs very well on 32 MB cards (1 GHz CPU + GeForce2 MX was our minimum requirement because the game targeted the casual gamer market). The game is playable (that means > 20 fps) on those totally underpowered, 5-year-old cards. Resource thrashing inside the VRAM happens, but it is no real issue if you allocate your resources wisely (e.g. group VB/IB and texture data locally, don't spread them across the entire scene).

That said, I doubt the game would run much faster *if* the old graphics cards had enough memory. The amount of video RAM just reflects the fillrate power of the cards. I'm pretty sure the time necessary to upload textures and geometry overlaps nicely with the time to render the remaining batches.

IMHO there is no need to write a custom VRAM memory handler unless you work on consoles (and those have special requirements for everything anyway). Better to spend your time on a shading system that makes it possible to get great-looking scenes without the video memory killer number one: textures. Clever use of multi-texturing (detail textures and other simple multi-layer techniques) can reduce your memory requirements a lot.

#17 Alex

    Valued Member

  • Members
  • PipPipPip
  • 152 posts

Posted 27 September 2005 - 09:44 PM

Small batches:
They eat CPU like hell. I profiled that you can do about 1k batches per frame on a 1.5 GHz CPU at about 30 fps, and that means using 100% CPU. I measured similar values on different systems. The DIP calls were not mergeable, though, since they used different textures and other states. Of course the amount of CPU you can afford to spend on DIPs depends on the application. But IMHO they should be minimized as much as possible. OpenGL does a lot better here: the equivalent of a "DIP" call in OpenGL is a lot less costly CPU-wise.
So in a game where even sub-objects of a single object often use several materials (unmergeable DIPs) and that might be displaying many different objects, your DIP calls can really eat into your CPU budget. Add some culling, AI, user input etc. on top and you quickly arrive at unplayable frame rates...
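
As a back-of-the-envelope reading of those figures, assuming the whole CPU really is spent on draw calls as stated, the per-batch cost can be worked out like this. `cyclesPerBatch` is a hypothetical helper for illustration, not a measurement.

```cpp
#include <cstdint>

// CPU cycles available per batch if the entire CPU is spent submitting
// batches: clock rate divided by batches per second.
constexpr std::uint64_t cyclesPerBatch(std::uint64_t cpuHz,
                                       std::uint64_t batchesPerFrame,
                                       std::uint64_t fps) {
    return cpuHz / (batchesPerFrame * fps);
}
```

With 1.5 GHz, 1000 batches per frame and 30 fps this gives 1,500,000,000 / 30,000 = 50,000 cycles per batch, which shows why unmergeable draw calls add up so quickly.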

As I said before, if your target spec doesn't need your game to scale well from high end to low end while using the maximum of what each end has to give (feature- and performance-wise), then you should use the D3D managed stuff and dynamic buffers where applicable.

If your game is developed on high-end hardware and even pushes that to the limit because object count and variety are just insane (maybe not the best design decision, but that's not a programmer decision), and it has to scale down to the low end while staying playable, then the amount of information and control given by D3D is IMHO not sufficient.

Alex




