HDR Texture Compression. How to.

AGPX 101 Sep 08, 2005 at 22:23

Hi to all,

I need to add HDR textures to my DX9 engine. IMHO the best texture format for HDR images is A16R16G16B16. The problem is that every texel occupies 8 bytes, which is a huge amount of memory.
Compression is necessary. I looked at the DX9 demos and saw a nice method called RGBE8. Basically, it rewrites every channel (r, g, b) in the form m * 2^e, where m is the mantissa of the channel and e is the exponent (stored in the alpha channel). The exponent is shared by all three channels.
This method requires 4 bytes per texel, which is an improvement over the naive format.
I can push this down to 1.5 bytes per texel in the following way: I compress the RGB channels with DXT1 (0.5 bytes per texel), then use another texture to store the exponent uncompressed (this keeps the exponent exact, and exponent precision is crucial). Total: 1.5 bytes per texel.
Quite good compression.
However, there is a problem: this method works only with point filtering. Other filtering modes, such as bilinear filtering, make things go wrong. :sad:
I have tried applying point filtering to the exponent texture (to fetch it exactly) and bilinear filtering to the RGB channels… but this still shows too many artifacts.
I have another idea: split the 16-bit-per-channel texture into two 8-bit-per-channel textures (a low-byte and a high-byte texture) and apply DXT1 compression to both. Total: 1 byte per texel! Problem: an error in the “low byte” texture is not very noticeable, but an error in the “high byte” texture (DXT1 is lossy, as you know) is quite devastating… HELL! :angry:
A solution is to store the low-byte texture compressed with DXT1 and the high-byte texture uncompressed. Total: 3.5 bytes per texel. :huh:
Here bilinear filtering doesn’t cause problems.
Good quality, but it’s not great compression… I would be happy if I could compress it a bit more (down to 2 bytes per texel or less :).
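To make the byte counts above easy to check, here is a minimal C++ sketch of the arithmetic. The constant and function names are made up for illustration, and the 3.5 figure assumes the uncompressed high-byte texture is stored as 24-bit RGB (a 32-bit A8R8G8B8 texture would give 4.5 instead).

```cpp
#include <cstddef>

// Bytes per texel for the formats discussed in the post.
// DXT1 stores each 4x4 block in 8 bytes, i.e. 0.5 bytes per texel.
constexpr double kA16R16G16B16 = 8.0;
constexpr double kRGBE8        = 4.0;
constexpr double kDXT1         = 0.5;
constexpr double kE8           = 1.0;  // one uncompressed exponent byte
constexpr double kR8G8B8       = 3.0;  // assumption: 24-bit uncompressed high-byte texture

// Combined schemes.
constexpr double kDxt1PlusExponent = kDXT1 + kE8;      // RGB in DXT1 + raw exponent
constexpr double kLowHighBothDxt1  = kDXT1 + kDXT1;    // low and high bytes both in DXT1
constexpr double kLowDxt1HighRaw   = kDXT1 + kR8G8B8;  // low in DXT1, high uncompressed

// Memory needed by a width x height texture without mipmaps.
constexpr double textureBytes(std::size_t width, std::size_t height, double bytesPerTexel)
{
    return static_cast<double>(width) * static_cast<double>(height) * bytesPerTexel;
}
```

For a 1024x1024 lightmap this gives 8 MiB for A16R16G16B16 against 1.5 MiB for the DXT1 plus exponent scheme.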
I’d be very happy if you cool guys could suggest other methods.

Thanks in advance,

  • AGPX

22 Replies

geon 101 Sep 09, 2005 at 07:40

It should work to use a lower resolution for the exponent. I am not sure about the artifacts, but they should only be noticeable at high-contrast edges.

AGPX 101 Sep 09, 2005 at 20:56

@geon

It should work to use a lower resolution for the exponent. I am not sure about the artifacts, but they should only be noticeable at high-contrast edges.

Actually I do the RGBE encoding with the following code:

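#include <algorithm>
#include <cmath>

// Splits (r, g, b) into three mantissas in [0, 1] and one shared exponent,
// so that r = encodedR * 2^exp (likewise for g and b).
int encode(float r, float g, float b,
           float &encodedR,
           float &encodedG,
           float &encodedB)
{
  const float maxComponent = std::max(std::max(r, g), b);
  if (maxComponent <= 0.0f) {
    // log2f(0) is -infinity, so encode black explicitly.
    encodedR = encodedG = encodedB = 0.0f;
    return 0;
  }
  const int exp = (int)ceilf(log2f(maxComponent));
  const float divisor = powf(2.0f, (float)exp);
  encodedR = r / divisor;
  encodedG = g / divisor;
  encodedB = b / divisor;
  return exp;
}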

So: r = encodedR * 2^exp, g = encodedG * 2^exp and b = encodedB * 2^exp.
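The decode direction is not shown in the thread, so here is a minimal C++ round-trip sketch of the same shared-exponent scheme. The names `Rgb`, `encodeShared` and `decodeShared` are hypothetical, and `std::ldexp` is used in place of `powf` only because scaling by a power of two is exact; this is an illustration, not the engine's actual code.

```cpp
#include <algorithm>
#include <cmath>

struct Rgb { float r, g, b; };

// Shared-exponent encode: mantissas end up in [0, 1], exponent is returned.
int encodeShared(const Rgb& in, Rgb& mantissa)
{
    const float maxComponent = std::max({in.r, in.g, in.b});
    if (maxComponent <= 0.0f) {          // black: avoid log2(0)
        mantissa = {0.0f, 0.0f, 0.0f};
        return 0;
    }
    const int e = static_cast<int>(std::ceil(std::log2(maxComponent)));
    // ldexp(x, -e) computes x / 2^e without rounding error.
    mantissa = {std::ldexp(in.r, -e), std::ldexp(in.g, -e), std::ldexp(in.b, -e)};
    return e;
}

// Decode: channel = mantissa * 2^e.
Rgb decodeShared(const Rgb& mantissa, int e)
{
    return {std::ldexp(mantissa.r, e), std::ldexp(mantissa.g, e), std::ldexp(mantissa.b, e)};
}
```

Before the mantissas are quantized to 8 bits the round trip is lossless; the precision loss only appears once they are stored in a texture.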

I found that it works better if I return exp * 16 (instead of exp), but at high-contrast edges some halo effects appear.

Anyway, I don’t know whether it is practical to do bilinear filtering manually in the pixel shader…
It sounds too nasty and slow… :dry:

The following image is a shot from my editor, using plain RGBE (without multiplying by 16):

pic18cx.png

The following is with the exponent multiplied by 16:

pic29co.png

The last one shows halos and artifacts (again with multiplication by 16):

pic34wj.png

The exponent is clamped to the range [-8, 7] and then multiplied by 16, so it lies in the range [-128, 112]. I also add 128 to remove the sign ([0, 240]), so it fits in a single byte. Yes, the dynamic range is smaller than before, but the images look much better.
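The clamp, scale and bias steps described above can be sketched in C++ as follows. The function names are hypothetical; only the constants (-8, 7, 16, 128) come from the post.

```cpp
#include <algorithm>
#include <cstdint>

// Clamp the exponent to [-8, 7], scale by 16 and bias by 128,
// giving a stored byte in [0, 240].
std::uint8_t packExponent(int e)
{
    const int clamped = std::min(std::max(e, -8), 7);
    return static_cast<std::uint8_t>(clamped * 16 + 128);
}

// Inverse of packExponent for bytes that packExponent produced.
int unpackExponent(std::uint8_t stored)
{
    return (static_cast<int>(stored) - 128) / 16;
}
```

Exponents outside [-8, 7] are clamped, which is where the reduced dynamic range mentioned above comes from.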

I don’t understand the nature of the artifacts at all. The halo should be due to bilinear filtering: the problem is that filtering should happen AFTER decoding, not before.

Is this what you mean by “use lower resolution for the exponent”?

Thanks,

  • AGPX
AGPX 101 Sep 09, 2005 at 21:52

Hi again,

I have fixed the artifact problem. It was due to the DXT1 compression of the RGB channels (not the exponent). However, the halo still occurs… (thanks to mikez for his hints).

  • AGPX
geon 101 Sep 09, 2005 at 21:53

@AGPX

Is this what you mean by “use lower resolution for the exponent”?

Not at all…

As you showed in your code, you can encode several values with the same exponent (hence RGBE). There is nothing stopping you from encoding even more values with a single exponent.

A neat use of this would be to store the color and the exponent in separate textures, as mentioned, BUT why store so many of those exponents? If you take a block of, say, 4x4 pixels, the exponents should generally be virtually the same (except at very high contrast). So just store the exponent once and reuse it.

If you use this solution, you must of course use no interpolation for the E channel, since the RGB values “expect” exactly the original exponent. The RGB channels should be fine to interpolate, though, except at the seams between 4x4 blocks. (This could be really ugly.)

One could, however, take this into account. Let’s say you use linear interpolation. (Can we assume the interpolation will return exactly the same values on every card/driver? I think this is very well defined by D3D… ?) Then downsample the original exponent texture first, and encode the RGB values based on the interpolated exponent values. Theoretically, this should work perfectly in most cases. (You might need to run an extra pass on the downsampled exponent texture to give each pixel the highest value of its neighbours, so that no interpolated exponent is lower than the maximum RGB value requires. Otherwise the final color would come out too dark.)

Also, if you match the block size to the DXT block size (4x4), the RGB values should still compress nicely.
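A minimal C++ sketch of the "one exponent per 4x4 block" idea might look like this. It handles a single channel for brevity, leaves out the interpolation and neighbour-maximum pass discussed above, and all names (`EncodedBlock`, `encodeBlock`, `decodeTexel`) are invented for the example.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

constexpr int kBlockTexels = 16;  // one 4x4 block

struct EncodedBlock {
    int exponent;                              // shared by all 16 texels
    std::array<float, kBlockTexels> mantissa;  // each in [0, 1]
};

// Encode one channel of a 4x4 block with a single shared exponent,
// chosen from the brightest texel so no mantissa exceeds 1.
EncodedBlock encodeBlock(const std::array<float, kBlockTexels>& values)
{
    EncodedBlock out{};  // zero exponent, zero mantissas
    const float maxValue = *std::max_element(values.begin(), values.end());
    if (maxValue <= 0.0f) {
        return out;      // all-black block
    }
    out.exponent = static_cast<int>(std::ceil(std::log2(maxValue)));
    for (int i = 0; i < kBlockTexels; ++i) {
        out.mantissa[i] = std::ldexp(values[i], -out.exponent);
    }
    return out;
}

float decodeTexel(const EncodedBlock& block, int index)
{
    return std::ldexp(block.mantissa[index], block.exponent);
}
```

Note the cost in a high-contrast block: a dark texel next to a bright one gets a very small mantissa, so it keeps only a few distinct levels once the mantissa is quantized to 8 bits.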

AGPX 101 Sep 09, 2005 at 22:03

Thanks to everyone who replied.

AGPX 101 Sep 09, 2005 at 22:32

I have two textures: one for RGB and the other for the exponent.
I have tried applying bilinear filtering only to the RGB texture and disabling it for the exponent. Here is the result:

pic45et.png

The halo is more evident.

The problem is that mantissas belonging to two quite different exponents cannot be interpolated linearly. :sad:
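A small numeric C++ sketch shows why. It compares three ways of filtering halfway between a dark and a bright texel of one channel; the struct and function names are hypothetical, and the two "wrong" variants are simplified models of the setups tried in this thread, not measurements.

```cpp
#include <cmath>

struct Texel { float mantissa; float exponent; };  // one channel only

float mix(float a, float b, float w) { return a + (b - a) * w; }

float decode(const Texel& t) { return t.mantissa * std::exp2(t.exponent); }

// Reference: decode both texels first, then interpolate.
float filterAfterDecode(const Texel& a, const Texel& b, float w)
{
    return mix(decode(a), decode(b), w);
}

// Bilinear filtering on both textures: mantissa and exponent are
// interpolated separately and decoded afterwards.
float filterBothThenDecode(const Texel& a, const Texel& b, float w)
{
    return decode({mix(a.mantissa, b.mantissa, w), mix(a.exponent, b.exponent, w)});
}

// Bilinear filtering on the mantissa, point sampling on the exponent.
float filterMantissaOnly(const Texel& a, const Texel& b, float w)
{
    const float nearestExponent = (w < 0.5f) ? a.exponent : b.exponent;
    return mix(a.mantissa, b.mantissa, w) * std::exp2(nearestExponent);
}
```

With a dark texel of 0.5 (mantissa 1, exponent -1) next to a bright texel of 8 (mantissa 1, exponent 3), the reference midpoint is 4.25, filtering both textures gives 2, and filtering only the mantissa jumps to 8 as soon as the nearest exponent switches, which matches the halo becoming more evident.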

My research continues…

Any help is always appreciated, thanks.

  • AGPX
davepermen 101 Sep 10, 2005 at 16:24

the only way to get it working correctly would be to do the linear sampling manually, using 4 point samples…

on ps3.0 you could optimise it by determining whether you are actually at an edge in the exponent, and do the full bilinear only where needed.

this way you could gain a lot of speed and get close to point-sample performance.
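The suggestion above can be sketched on the CPU in C++ roughly as follows. This is only an illustration of the branching logic for one channel, with hypothetical names; a real ps3.0 shader would fetch the four texels with point sampling and use dynamic branching.

```cpp
#include <cmath>

struct RgbeTexel { float mantissa; int exponent; };  // one channel only

float mix(float a, float b, float w) { return a + (b - a) * w; }

float decode(const RgbeTexel& t) { return std::ldexp(t.mantissa, t.exponent); }

// t00/t10 are the top-left/top-right texels, t01/t11 the bottom ones;
// fx and fy are the bilinear weights in [0, 1].
float filterRgbe(const RgbeTexel& t00, const RgbeTexel& t10,
                 const RgbeTexel& t01, const RgbeTexel& t11,
                 float fx, float fy)
{
    const bool sameExponent = t00.exponent == t10.exponent &&
                              t00.exponent == t01.exponent &&
                              t00.exponent == t11.exponent;
    if (sameExponent) {
        // Fast path: away from exponent edges, filtering the mantissas and
        // scaling once gives the same result as decoding every texel.
        const float top    = mix(t00.mantissa, t10.mantissa, fx);
        const float bottom = mix(t01.mantissa, t11.mantissa, fx);
        return std::ldexp(mix(top, bottom, fy), t00.exponent);
    }
    // Slow path: decode each of the four point samples, then filter.
    const float top    = mix(decode(t00), decode(t10), fx);
    const float bottom = mix(decode(t01), decode(t11), fx);
    return mix(top, bottom, fy);
}
```

The fast path corresponds to a single hardware-filtered fetch of the mantissa texture, so the four-sample cost is only paid where the exponent actually changes.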

AGPX 101 Sep 10, 2005 at 20:29

Hi again,

I will go crazy… without a doubt. :wacko:

I have changed my method. I split my 16-bit integer texture (A16R16G16B16) into two 8-bit-per-channel textures (one for the low byte and the other for the high byte).
The low texture is compressed with DXT3. The other one is stored uncompressed (A8R8G8B8). According to my calculations, this method should be immune to the bilinear filtering issue. And it is… but only with the Reference Rasterizer! With the HAL device, it doesn’t work. :blink:
WHY??????

Take a quick look at the pixel shader I use to reassemble the two 8-bit textures into a 16-bit one:

// Low and high bytes of the 16-bit lightmap, stored as two 8-bit textures.
sampler2D lightMapLow;
sampler2D lightMapHigh;

// Simple exponential exposure curve.
float4 Expose(in float4 rgba, float exposure)
{
  return 1 - exp(-rgba * exposure);
}

// Rebuilds the 16-bit value: low byte plus high byte scaled by 256.
float4 texHDR2D(in sampler2D texLow, in sampler2D texHigh, in float2 tex)
{
  return tex2D(texLow, tex) + tex2D(texHigh, tex) * 256.0;
}

float4 ps_main(float2 inTex: TEXCOORD0) : COLOR0
{
  return Expose(texHDR2D(lightMapLow, lightMapHigh, inTex), 1.2);
}
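The reasoning behind "immune to bilinear filtering" can be checked with a small C++ sketch: because the reassembly is linear, filtering the two byte textures separately gives the exact result as long as the filter itself is exact. The second function is a purely hypothetical model in which each filtered fetch is truncated to a whole 8-bit step; it is not a description of what any particular GPU does, only an illustration of how sensitive the high byte is to filter precision. All names are invented for the example.

```cpp
#include <cmath>
#include <cstdint>

struct Split { std::uint8_t low, high; };

// Split a 16-bit value into its low and high bytes.
Split split16(std::uint16_t v)
{
    return {static_cast<std::uint8_t>(v & 0xFF), static_cast<std::uint8_t>(v >> 8)};
}

float mix(float a, float b, float w) { return a + (b - a) * w; }

// Mirrors texHDR2D: each byte texture is filtered on its own with full
// precision, then recombined as low + high * 256.
float filterExact(Split a, Split b, float w)
{
    return mix(a.low, b.low, w) + mix(a.high, b.high, w) * 256.0f;
}

// Hypothetical low-precision filter: each filtered fetch is truncated
// to an integer byte value before recombination.
float filterTruncated(Split a, Split b, float w)
{
    return std::floor(mix(a.low, b.low, w)) +
           std::floor(mix(a.high, b.high, w)) * 256.0f;
}
```

Halfway between neighbouring values 255 and 256, the exact filter returns 255.5 while the truncated model returns 127, because the half step lost in the high byte is worth 128.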

And now take a look at the result using the Reference Rasterizer:

picture13ki.png

And the following is with the HAL device on a Sapphire Radeon 9800 Pro (128 MB, 256-bit):

picture24gg.png

HOW THE HELL IS THIS POSSIBLE?

Please help me!

P.S.: I have tried using A8R8G8B8 for the low texture instead of DXT3, but NOTHING changes!

P.S.2: It works well on HAL only if I switch from bilinear filtering to point filtering!

P.S.3: Here you can see the low and high textures generated by my lightmapper:

LOW:

ltbox010low0rf.png

HIGH:

ltbox010high6in.png

The high texture is not entirely 0; it has some pixels set to 1, 2 and 3. The textures appear to be OK.

AGPX 101 Sep 10, 2005 at 23:27

Hi,

all the problems are now solved.

I’m posting my solution here to help other people with a similar problem.
Basically, the problem is due to the low precision of the hardware texture interpolators.
The Reference Rasterizer is more precise than the HAL device.
So basically, there is nothing else to do: you have to use POINT filtering and do the bilinear filtering in the pixel shader. Here is the code to perform the filtering (kindly given to me by a guy on #flipcode. Thanks to you!)

Here is the pixel shader:

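// Manual bilinear filter; the sampler s must be set to POINT filtering.
float4 bilinear(in sampler2D s, in float2 t)
{
  const float2 wh = {256.0, 256.0};   // texture size in texels
  const float2 dwh = 1 / wh;          // size of one texel in texture coordinates
  float2 dxy = frac(t * wh);          // fractional position inside the texel
  // Texel centers are not accounted for here, so the result is shifted by half a texel
  // compared with hardware bilinear filtering; offsetting t by -0.5 * dwh should compensate.
  float4 a = lerp(tex2D(s, t), tex2D(s, t + float2(dwh.x, 0)), dxy.x);
  float4 b = lerp(tex2D(s, t + float2(0, dwh.y)), tex2D(s, t + float2(dwh.x, dwh.y)), dxy.x);
  return lerp(a, b, dxy.y);
}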
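For checking the shader's output offline, a CPU port of the same filter can be handy. The following C++ sketch follows the shader's convention (weights taken from the fractional part of t * size, neighbours one texel to the right and below) on a single-channel float image; the `Image` type, the function names and the clamp addressing at the borders are assumptions made for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Image {
    int width;
    int height;
    std::vector<float> texels;  // row-major, one channel
};

float mix(float a, float b, float w) { return a + (b - a) * w; }

// Point sample with clamp addressing (an assumption for this sketch).
float fetchPoint(const Image& img, int x, int y)
{
    x = std::min(std::max(x, 0), img.width - 1);
    y = std::min(std::max(y, 0), img.height - 1);
    return img.texels[static_cast<std::size_t>(y) * img.width + x];
}

// Same arithmetic as the shader: four point samples blended with the
// fractional texel position.
float sampleBilinear(const Image& img, float u, float v)
{
    const float px = u * static_cast<float>(img.width);
    const float py = v * static_cast<float>(img.height);
    const int x0 = static_cast<int>(std::floor(px));
    const int y0 = static_cast<int>(std::floor(py));
    const float fx = px - static_cast<float>(x0);
    const float fy = py - static_cast<float>(y0);
    const float a = mix(fetchPoint(img, x0, y0),     fetchPoint(img, x0 + 1, y0),     fx);
    const float b = mix(fetchPoint(img, x0, y0 + 1), fetchPoint(img, x0 + 1, y0 + 1), fx);
    return mix(a, b, fy);
}
```

For RGBE data, `fetchPoint` would decode the texel before returning it, so that the blend happens on decoded values.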

This way the RGBE encoding also works well, so I’m switching to it because it requires less memory. Oh, well, at least… only if I can compress it with DXT1-5 without introducing artifacts… I will investigate that tomorrow and, finally, I think I’ll write a tutorial on HDR compression… it could be helpful until hardware manufacturers introduce compression for 16-bit textures! (To tell the truth, some 16-bit FourCC formats exist, but they are largely not implemented.)

Thanks to all for help. :D

kusma 101 Sep 11, 2005 at 21:25

thanks a lot for the posts, really nice to see that someone else is struggling with bilinear interpolation of hdr-textures ;)

Axel 101 Sep 12, 2005 at 00:55

The Reference Rasterizer is more precise than the HAL device.

Should read: The Reference Rasterizer is more precise than the ATi HAL device. :rolleyes:

Reedbeta 167 Sep 12, 2005 at 01:59

Yup. ATI skimps on precision, using only 24-bit floats for the internal pipeline. But nVidia uses the full 32 bits.

bonzaj 101 Sep 13, 2005 at 15:27

Hello. Really nice topic :). And what about RGBE cubemaps?

Axel 101 Sep 13, 2005 at 20:47

Yup. ATI skimps on precision, using only 24-bit floats for the internal pipeline. But nVidia uses the full 32 bits.

AFAIK the interpolators are FX8 only.

Reedbeta 167 Sep 13, 2005 at 22:37

Yeah, the bilinear interpolation in the texture fetch may well be only 8-bit. But even in a pixel shader, the temp registers are only 24-bit rather than 32-bit.

Axel 101 Sep 13, 2005 at 22:49

No, I mean I’m pretty sure that the interpolators on Radeons are not floating point. The bilinear interpolation is only 6 bit btw…

bonzaj 101 Sep 13, 2005 at 22:57

yes, but what about bilinear filtering on cubemaps? How do you fetch the neighbours??

bonzaj 101 Sep 14, 2005 at 00:02

ok I’ve got it - It’s on ATI’s page.

kusma 101 Sep 14, 2005 at 09:58

@bonzaj

ok I’ve got it - It’s on ATI’s page.

a link would rule.. ;)

Reedbeta 167 Sep 14, 2005 at 16:09

Radeon 9500 and up support floating-point pbuffers, so I’m pretty sure they must have floating-point temp registers. Unless they are doing some kind of fancy bit-packing stuff…

Axel 101 Sep 15, 2005 at 15:55

@Reedbeta

Radeon 9500 and up support floating-point pbuffers, so I’m pretty sure they must have floating-point temp registers. Unless they are doing some kind of fancy bit-packing stuff…

They have FP24 temp registers, but the interpolators are not FP.

bonzaj 101 Sep 15, 2005 at 18:45