This has to do with optimal memory segment alignment, and a number of other things whose details are never told to us.
The 256x256 textures 'claim' dates back to the early days of 3D hardware acceleration, when cards had 2 megs of video RAM instead of half a gig like today.
And it hasn't changed over the years, because changing it would also mean breaking the performance of the older games developed with 256x256 textures in mind (still widely played, you know).
Who is gonna take responsibility for this change? nVIDIA? ATI?
They aren't rich enough to face the consequences.
Many people believe there is a sort of standard set of rules followed by the various hardware manufacturers.
The truth is, there isn't.
It's nonsense, really. Today we have just ATI and nVIDIA. They rule the market. Other brands barely survive.
Why don't ATI and nVIDIA sit down together and decide on a little standard to follow?
As programmers, we would be the first to benefit from it.
Every card works in its own way.
Every card segments its own on-board memory into a number of blocks (or clusters) whose size is unknown to us.
These clusters of video memory work similarly to the clusters on your HD.
Optimal video memory usage means allocating in multiples of the size of these video clusters.
Let me give you an example.
Create a new *.txt file on your disk.
Open it, write in 1 single character, then save and close.
Now, how much space do you believe the file takes up on your disk: 1 byte, or something like 32 kilobytes?
On a drive with 32 KB clusters, the answer is 32 KB (right-click on the file, choose Properties, and look at "Size on disk" if you want proof).
It could take up more or less space than the 32 KB of my example, of course.
It depends on how big the partition is and how the drive is formatted (FAT32 versus NTFS), but you get the idea.
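By the way, for the hard disk this stuff is not a secret: Windows will tell you the cluster size if you ask. Here's a minimal sketch (the drive letter and the 1-byte file size are just my example) that queries the cluster size of C: via the Win32 GetDiskFreeSpace call and rounds our 1-character file up to the space it really occupies:

#include <windows.h>
#include <stdio.h>

int main ()
{
    DWORD sectorsPerCluster, bytesPerSector, freeClusters, totalClusters;

    // Ask Windows for the allocation unit ("cluster") size of drive C:.
    if (!GetDiskFreeSpaceA ("C:\\", &sectorsPerCluster, &bytesPerSector,
                            &freeClusters, &totalClusters))
        return 1;

    DWORD clusterSize = sectorsPerCluster * bytesPerSector;

    // Our 1-character *.txt file: its size on disk is rounded up
    // to a whole number of clusters.
    DWORD fileSize   = 1;
    DWORD sizeOnDisk = ((fileSize + clusterSize - 1) / clusterSize) * clusterSize;

    printf ("Cluster size: %lu bytes\n", (unsigned long)clusterSize);
    printf ("A 1-byte file occupies %lu bytes on disk\n", (unsigned long)sizeOnDisk);
    return 0;
}

Keep this in mind, because the whole point of what follows is that the video card gives you no equivalent of this call.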
The same goes for video cards, but it gets more complicated.
When you allocate video memory (i.e.: upload a texture) you use space in multiples of the size of a video cluster.
Question: how big is this cluster?
Answer: you do not know it. And they won't tell you.
The card knows it. And the driver knows it.
DirectX and OpenGL, however, do not.
And neither do you.
To solve this problem we would need either a unified database of video cluster sizes for every existing card, or (even better) a known standard followed by the card manufacturers.
Neither of the two exists. And we have no reliable means to determine the exact amount of free video memory.
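The closest thing you get (assuming Direct3D 9, the same API as the SetTexture() example further down) is a rough estimate: IDirect3DDevice9::GetAvailableTextureMem() returns the available texture memory rounded to the nearest megabyte. Good enough for a sanity check, useless for knowing how many bytes your next texture will really eat:

// Rough estimate only: the value is rounded to the nearest MB and tells
// you nothing about cluster sizes or how the driver lays textures out.
// (pD3Device is assumed to be a valid IDirect3DDevice9 pointer.)
UINT approxBytes = pD3Device->GetAvailableTextureMem ();
printf ("Roughly %u MB of texture memory available\n", approxBytes / (1024 * 1024));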
The problem is further aggravated by the fact that cards may or may not silently employ texture 'joining' when managing the images you upload to their memory.
I'm not referring to DXT compression and such.
It's like when you use WinZip to create an archive composed of 2 files.
You don't necessarily need to employ compression. You have the option to simply store the bytes of the two files, uncompressed.
Is it useful? Yes. It allows you to create an uncompressed entity representing, say, 32 files of 1024 bytes each, yet keeps the data very fast to access.
Assuming you have clusters of 32 KB, this archive would fit into 1 single cluster (instead of 32 separate ones).
(For simplicity I'm not accounting for the archive's header)
This is the silent joining I meant.
You do not need to know when/if it is employed.
It may silently join one or more textures so that they fit in 1 video cluster (saving memory), without actually compressing the pixel array.
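To put toy numbers on this joining idea (the 32 KB cluster size is completely made up, which is exactly the problem): 32 small textures of 1024 bytes each would cost 32 clusters stored one per cluster, but only 1 cluster if the driver silently packs them together, just like the uncompressed archive above:

#include <stdio.h>

// Purely illustrative figures: the real cluster size is never disclosed.
const unsigned CLUSTER_SIZE  = 32 * 1024; // hypothetical 32 KB video cluster
const unsigned TEXTURE_SIZE  = 1024;      // 32 tiny textures of 1 KB each
const unsigned TEXTURE_COUNT = 32;

// Round a byte count up to a whole number of clusters.
unsigned ClustersFor (unsigned bytes)
{
    return (bytes + CLUSTER_SIZE - 1) / CLUSTER_SIZE;
}

int main ()
{
    // No joining: each tiny texture wastes most of its own cluster.
    unsigned separate = TEXTURE_COUNT * ClustersFor (TEXTURE_SIZE);

    // Silently joined (like files stored in an uncompressed archive):
    // the 32 KB total fits in a single cluster.
    unsigned joined = ClustersFor (TEXTURE_COUNT * TEXTURE_SIZE);

    printf ("Separate: %u clusters, joined: %u cluster(s)\n", separate, joined);
    return 0;
}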
But when do they do it? On what do they base this decision?
You do not know it (I'm repetitive, I know).
You may upload 10 small textures and use 10 separate video clusters, or you may upload 100 bigger textures and use 50 or fewer video clusters.
The duo card/driver decides it.
Also, cards may upload a variable amount of extra information along with the texture (for example, to handle the texture joining). How many bytes of extra info is that?
You see, we are not told any of this.
Now, when you want to ready a texture for rendering, using this command:
// I'm assuming the API is DirectX (Direct3D 9 here).
// 'stage' is the texture stage index, 'texture' the texture interface to bind.
pD3Device->SetTexture (stage, texture);
what happens is that the card makes a copy of the actual texture contents.
Where does it copy them? Into a fast-access buffer: the video RAM.
Question: how many bytes are gonna be copied?
Answer: a multiple of the cluster size.
A 256x256 texture would likely fit into 1 cluster, thus minimizing the amount of bytes to copy, and there would still be room for other textures.
What if more than one texture is actually joined together?
All of those textures would be moved together in one single operation.
So you'd end up with multiple textures ready for rendering.
What happens, then, if you need one or more of those other textures once you're finished with the one you called SetTexture() for?
Calling SetTexture() again to ready a different texture would likely make the card discover that the needed texture is already in place.
Therefore, to render with the new texture, no real movement of data would be needed.
Ok, now, this is a happy case.
But the chances that it occurs increase with the use of 256x256 textures: when they are assigned similar priority, the textures end up tied and moved together, which means fewer real swaps when you render, and therefore faster rendering.
Bigger textures would (likely) require more clusters to move around, effectively resulting in more copy operations performed during rendering. (For illustration only, assuming a 256 KB cluster: a 256x256 texture at 32 bits per pixel is exactly 256 x 256 x 4 = 262,144 bytes, i.e. one cluster, while a 1024x1024 texture at the same depth is 4 MB, i.e. sixteen clusters to copy whenever it gets swapped in.)
Look, I suck when it comes down to explanations. I hope you can follow me.
The bottom line is that, today like yesterday, textures of 256x256 pixels are the fastest to manage, because one way or the other they happen to keep fitting nicely into video clusters.
Call it a happy coincidence, or call it an attempt by the hardware manufacturers to give us something we can count on.
Either way this is all we're gonna be told.
I know your next question: BUT WHY does it have to be 256x256 pixels?
Why not 199x199 or 512x512 pixels?
For the same reason that 1 Byte is made up of 8 bits.
Why can't it be 9, or 12 bits?
No real reason, today, other than historical (and, well, financial, but that's another matter).
In the past they found out that 8 bits were enough yet not too many.
They were a good compromise between quantity and complexity, and so they adopted it, basing everything on it (the history of the ASCII char set explains this better than I do).
Today we have no real reason to maintain this amount.
We could change it to something better (something that would solve the big ASCII-to-Unicode problem in a better way, for example).
But we don't.
Same goes for 256x256 textures. They were ok.
And everything has been based upon them.
Today we could change it. But we don't.
Hope I haven't confused you.
Ciao ciao : )