Z-Buffer, C-Buffer, Span-Buffer, ...?

rarefluid 101 Jun 11, 2007 at 13:55

I’m in the process of implementing a software render engine for a low-power device.
Have any of you implemented C-Buffers, Span-Buffers or some other culling scheme? IMHO:
- Z-Buffers are simple, but they suffer from overdraw and need too much memory.
- Span-Buffers need no pre-sorting of polygons and have zero overdraw, but need some memory and are harder to implement.
- C-Buffers need pre-sorting, are simple to implement and need almost no memory.
I don’t really know what to choose…

10 Replies

Nick 102 Jun 11, 2007 at 21:05

What low-power device are you talking about exactly? My laptop with Core 2 Duo is low-power too. ;)

The choice of algorithm also depends on the resolution and the number of polygons. It also matters how much control you have over the application side. If you’re rendering a BSP then most likely a C-buffer is the best choice… unless you also have other, more dynamic geometry to render. If you’re implementing OpenGL|ES then a z-buffer is likely the only option.

Also, what are the performance and quality requirements? Is it acceptable to have some artifacts (cfr. painter’s algorithm)?

rarefluid 101 Jun 11, 2007 at 21:28

The requirements are for Gameboy Advance’ish hardware meaning:
- few polygons (max. 1k?), resolution max. 240x160
- Not really enough memory for a decent-resolution z-Buffer (though z-values are nice-to-have…). Actually not much memory at all (96k vram, 256+32k ram :) )…
- Subpixel-correctness (like 2-4 bits or something)
- Low power (16MHz), so overdraw is expensive depending on per-pixel operations
- Should handle BSPs (easy) as well as dynamic objects

We have:
- many registers
- RISC, conditional instructions
- no hardware-divide, sqrt!

…painter’s algorithm sounds quite wasteful to me…

I know this has probably all been done before, but now I want to do it ;)

Nick 102 Jun 11, 2007 at 22:08

@rarefluid

…painter’s algorithm sounds quite wasteful to me…

Sort in reverse and use a c-buffer, and you have the same result without overdraw. ;) I was mainly referring to the artifacts you’re willing to allow. But considering the extremely low processor power I guess that’s not really a problem.

So a c-buffer is probably going to be your best bet. For BSPs just render from front to back. For other geometry sort polygons front to back. Both can make use of the c-buffer so it’s a pretty uniform way of rendering.
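The front-to-back ordering for non-BSP geometry could be sketched like this. It is only an illustration: the `Poly` struct, its `depth` key (assumed to be a fixed-point view-space distance where smaller means nearer) and the function names are hypothetical, and a real renderer might prefer a bucket or radix sort over `qsort`.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical polygon record: only the sort key matters here. */
typedef struct {
    int32_t depth;  /* assumed: fixed-point distance from the camera, smaller = nearer */
    int     id;     /* stand-in for the real polygon data */
} Poly;

/* qsort comparator: nearer polygons come first. */
static int nearer_first(const void *a, const void *b)
{
    int32_t da = ((const Poly *)a)->depth;
    int32_t db = ((const Poly *)b)->depth;
    return (da > db) - (da < db);   /* avoids overflow of da - db */
}

/* Sort polygons front to back so a c-buffer can reject hidden pixels. */
static void sort_front_to_back(Poly *polys, size_t n)
{
    qsort(polys, n, sizeof polys[0], nearer_first);
}
```

A single key per polygon cannot order overlapping or intersecting polygons correctly in every case, which is where the sorting artifacts mentioned in this thread come from.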

The c-buffer itself can be implemented either with spans (requires some basic memory management), or 1 bit per pixel (requires a fast way to count bits).
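A minimal sketch of the 1-bit-per-pixel variant for a single scanline, assuming a 240-pixel-wide screen as mentioned earlier in the thread. The names `CB_WIDTH`, `bit_range` and `cbuf_claim` are made up for illustration; this is one possible layout, not a reference implementation.

```c
#include <assert.h>
#include <stdint.h>

#define CB_WIDTH 240                     /* assumed scanline width in pixels */
#define CB_WORDS ((CB_WIDTH + 31) / 32)  /* one coverage bit per pixel */

/* Mask with bits [lo, hi) set, for 0 <= lo < hi <= 32. */
static uint32_t bit_range(int lo, int hi)
{
    assert(0 <= lo && lo < hi && hi <= 32);
    uint32_t below_hi = (hi == 32) ? 0xFFFFFFFFu : ((1u << hi) - 1u);
    uint32_t below_lo = (1u << lo) - 1u;
    return below_hi & ~below_lo;
}

/* Try to claim pixels [x0, x1) on one scanline.
   row:   coverage bits of the scanline (1 = already drawn)
   fresh: receives the bits this span newly covers; only those pixels
          need to be shaded
   Returns the number of newly covered pixels. */
static int cbuf_claim(uint32_t row[CB_WORDS], int x0, int x1,
                      uint32_t fresh[CB_WORDS])
{
    int count = 0;
    for (int w = 0; w < CB_WORDS; w++) fresh[w] = 0;
    if (x0 < 0) x0 = 0;
    if (x1 > CB_WIDTH) x1 = CB_WIDTH;
    if (x0 >= x1) return 0;

    int first = x0 >> 5, last = (x1 - 1) >> 5;
    for (int w = first; w <= last; w++) {
        int lo = (w == first) ? (x0 & 31) : 0;
        int hi = (w == last) ? ((x1 - 1) & 31) + 1 : 32;
        uint32_t open = bit_range(lo, hi) & ~row[w];  /* not yet covered */
        fresh[w] = open;
        row[w] |= open;
        for (uint32_t m = open; m; m &= m - 1) count++;  /* count set bits */
    }
    return count;
}
```

With polygons submitted front to back, a span that returns 0 is fully hidden and can be skipped without touching the framebuffer. At one bit per pixel, a 240x160 screen needs 8 words per line, or 5120 bytes in total.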

rarefluid 101 Jun 11, 2007 at 22:24

The GBA has no count-leading-zeros instruction, which is bad especially after reading Nils’ article on math tricks with fixed-point…

I wanted to take a shot at the c-buffer, but I’m afraid that it needs perfect front-to-back sorting or subdivision of polygons to avoid artifacts. Is this only an issue if you have overlapping polygons?

And thanks for all the replies Nick! :)

Nils_Pipenbrinck 101 Jun 11, 2007 at 22:36

hi rarefluid,

You would be surprised how good 3D graphics can look with just the painter’s algorithm. That was the only way to get things on the screen on the PS one, and it worked quite well.

Your hardware seems to be really low-end. I would suggest that you try a c-buffer and rely on c-buffer-friendly geometry. The Z- and S-buffer overhead is significant if you only run at 16 MHz.

btw - if you have no clz but fast memory accesses (likely on a 16 MHz machine) you can do a bit of table work. Take a look at this method: http://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightMultLookup
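For reference, the multiply-and-lookup method on that page looks roughly like this in C. It counts zero bits from the right (least significant) end, so it finds the lowest set bit rather than the highest; the function name `ctz32` is only a label chosen here, and the input must be non-zero.

```c
#include <stdint.h>

/* Position of the lowest set bit of v (number of trailing zeros).
   Uses a de Bruijn sequence multiply plus a 32-entry table, so it needs
   no clz/ctz instruction, only a multiply, a shift and one table load.
   The result is meaningless for v == 0; check for that case first. */
static int ctz32(uint32_t v)
{
    static const uint8_t position[32] = {
         0,  1, 28,  2, 29, 14, 24,  3, 30, 22, 20, 15, 25, 17,  4,  8,
        31, 27, 13, 23, 21, 19, 16,  7, 26, 12, 18,  6, 11,  5, 10,  9
    };
    uint32_t lowest = v & (0u - v);               /* isolate the lowest set bit */
    return position[(lowest * 0x077CB531u) >> 27];
}
```

In a 1-bit-per-pixel c-buffer this can be applied to the inverted coverage word to jump straight to the first uncovered pixel, provided pixel 0 is stored in bit 0.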

Nils

Nick 102 Jun 11, 2007 at 22:43

It’s hard to avoid all artifacts. However, given the platform, I think it’s going to be acceptable. Remember that the original Unreal game used polygon sorting… :yes:

Does the processor have a way to shift a register and take a conditional jump depending on the bit shifted out? Or maybe a ‘test’ instruction which can single out a bit?

_oisyn 101 Jun 11, 2007 at 23:26

@Nick

Does the processor have a way to shift a register and take a conditional jump depending on the bit shifted out? Or maybe a ‘test’ instruction which can single out a bit?

The ARM7TDMI in the GBA is very cool. *Every* instruction can be conditional. Also, the so-called barrel shifter lets you apply a shift or rotate to one of the source operands of most instructions.

rarefluid 101 Jun 12, 2007 at 07:07

Shifting is a pseudo-instruction on the ARM. You do a move-register-to-register while specifying a shift value. The flag-setting forms of the shift and rotate operations (e.g. MOVS) update the carry flag with the last bit shifted out.
It also has a TeST instruction (non-destructive AND) and a Test for EQuality (non-destructive XOR).
It also has a 16-bit mode (Thumb) that you can switch to and from as you like, with quite powerful instructions too.
For the interested: http://eceserv0.ece.wisc.edu/~morrow/ECE353/arm7tdmi_instruction_set_reference.pdf

@Nils: That page has some excellent tricks! :D

DanDanger 101 Jun 12, 2007 at 09:53

Hi,

I recently implemented a software renderer which runs on Symbian mobile phones.

I implemented a simple Z buffer in our system. It really doesn’t take up that much memory (for a 176x208 screen, memory used == 72k), and it works fast enough that one of our games runs at about 30fps on some mobiles.

rarefluid 101 Jun 12, 2007 at 10:23

With 96k of VRAM on GBA you can do:
240x160 indexed mode with backbuffer, 21k left
240x160 16bit mode w/o backbuffer, 21k left
160x128 indexed mode with backbuffer, 56k left (16bit backbuffer possible)
160x128 16bit mode with backbuffer, 16k left

There isn’t even space for some hierarchical z-Buffer tricks…
And IMHO a z-buffer should cost at most one memory access per pixel if you’re wasting so much memory and bandwidth on it… :)
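The VRAM budget above can be checked with a few lines of arithmetic. This is only a sketch in bytes: the helper names are invented here, and it assumes the full 96k of VRAM is available to the colour buffers and a hypothetical z-buffer.

```c
#include <stdint.h>

#define GBA_VRAM_BYTES (96u * 1024u)  /* 96k of VRAM, as stated above */

/* Bytes left after the colour buffers.
   bytes_per_pixel: 1 for indexed, 2 for 16-bit colour
   buffers:         1 without a backbuffer, 2 with one */
static int32_t vram_left(uint32_t w, uint32_t h,
                         uint32_t bytes_per_pixel, uint32_t buffers)
{
    return (int32_t)GBA_VRAM_BYTES
         - (int32_t)(w * h * bytes_per_pixel * buffers);
}

/* Non-zero if a z-buffer with z_bytes per pixel fits in the remainder. */
static int zbuffer_fits(uint32_t w, uint32_t h, uint32_t bytes_per_pixel,
                        uint32_t buffers, uint32_t z_bytes)
{
    return vram_left(w, h, bytes_per_pixel, buffers)
        >= (int32_t)(w * h * z_bytes);
}
```

For example, `vram_left(240, 160, 1, 2)` gives 21504 bytes, while a 16-bit z-buffer at 240x160 would need 76800 bytes, so `zbuffer_fits(240, 160, 1, 2, 2)` is 0.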