As I mentioned in the article, here is my little attempt to explain those "why's". I'd be glad if anyone replied with whether I should continue and extend the explanation (e.g. explain things like how triangle rendering & shading work and why, how to access buffers, etc. - there is really a TON of stuff to explain). Now a few little notes: I'm really NOT writing this, or the accompanying application, as something high-performance - not that it can't be turned into a high-performance renderer in the end - this whole thing exists to explain how rendering works and mainly WHY it works like that.
So what do we need (for compiling & using the application)? Just Gtk, a C99-compliant compiler and ehm… that's probably all.
What is our goal in this part? Our only goal is to add two basic types to the renderer - the framebuffer and the (vertex) buffer - plus additional procedures for clearing the framebuffer and drawing points (done in a really easy and extremely useless way for now, just to show something on screen).
The source + headers + makefile with explanation are here. Now the details (e.g. the article):

1. The Framebuffer object
The word framebuffer is quite self-explanatory - it's the buffer that holds the frame (e.g. the result of rendering), which of course explains why it has to have a width, a height and some data. Additional members hold things like the number of channels (useful when you have different types of framebuffer, like RGBA (4 channels) or DEPTH (1 channel)) and the size of one pixel in bytes (useful for simply stepping through the buffer pixel by pixel).
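To make that concrete, here is a minimal sketch of such a type in C. The names (framebuffer_t, framebuffer_create) are illustrative assumptions, not necessarily the ones used in the accompanying source:

```c
#include <stdlib.h>

/* A minimal framebuffer: dimensions, channel layout and raw pixel data. */
typedef struct
{
    unsigned int width;       /* frame width in pixels */
    unsigned int height;      /* frame height in pixels */
    unsigned int channels;    /* e.g. 4 for RGBA, 1 for DEPTH */
    unsigned int pixel_size;  /* size of one pixel in bytes */
    unsigned char *data;      /* width * height * pixel_size bytes */
} framebuffer_t;

static framebuffer_t *framebuffer_create(unsigned int w, unsigned int h,
                                         unsigned int channels,
                                         unsigned int pixel_size)
{
    framebuffer_t *fb = malloc(sizeof(framebuffer_t));
    fb->width = w;
    fb->height = h;
    fb->channels = channels;
    fb->pixel_size = pixel_size;
    fb->data = malloc((size_t)w * h * pixel_size);
    return fb;
}
```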
So why do we need a framebuffer? Basically, the framebuffer is what allows us to draw to a texture (the stored dimensions + pixel size + channels are actually texture information) - and textures are good. Our resulting image is a texture, and if we want, for example, shadow mapping for shadows, we need render-to-texture - this is where framebuffers are quite useful. Note that we can bind different textures to the framebuffer sequentially (for now by changing the framebuffer's parameters and data pointer, but I'd like to point out that there should really be a pointer (or pointers) to some texture_t type). If this thread has some success, I'll extend this so we'll be able to attach e.g. multiple textures to a framebuffer (and thereby explain how MRT works).
2. The (Vertex)Buffer object
Further I’ll call this just buffer. This one is quite similar to
Framebuffer, but stores just 1D data. The buffers mostly stores vertices
(4D floats) and also other parameters for them (texture coordinates,
normals, etc. etc.). So…
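Mirroring the framebuffer above, a minimal sketch could look like this (again, the name buffer_t is an assumption for illustration):

```c
/* A minimal 1D buffer: packed floats, where a vertex is four consecutive
   floats (x, y, z, w) and other attributes follow the same scheme. */
typedef struct
{
    unsigned int count;  /* number of floats stored */
    float *data;         /* contiguous 1D element data */
} buffer_t;
```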
So why vertex buffers? Why don't we just call Vertex(x, y, z, w) all the time? Because calling that is a huge waste of resources, and for static meshes an even bigger one. Each call costs only a very, very little computing power, but doing it a million times per frame adds up to quite a bit. For static objects the win is even larger, because we can create the buffer once at startup and then never touch it again; for dynamic objects we can update just the vertices that need updating (the rest won't even be touched). On top of that, updating a buffer in a loop is faster than calling Vertex(x, y, z, w) a zillion times (you save the call overhead multiplied by a zillion).
But this isn’t all - vertex buffers are stored in a single memory block
- so it means, that transforming vertex buffer in a loop is a lot faster
than doing it per vertex (memory accessing is a lot faster - because
next vertex to transform will most likely be in cache - be it on CPU or
GPU). Also the matrix-vertex multiplication (and practically whole
vertex processing) can be batch-processed - and thats a win (in
performance of course)!
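As a rough illustration of that batch processing, here is a sketch of transforming a whole block of packed 4D vertices by one 4x4 matrix in a single tight loop (the row-major matrix layout and the function name are assumptions, not taken from the accompanying source):

```c
/* Transform `count` packed 4D vertices in place by a row-major 4x4 matrix.
   Because the vertices sit in one contiguous block, the loop streams
   through memory and the next vertex is very likely already in cache. */
static void transform_vertices(float *v, unsigned int count, const float m[16])
{
    for (unsigned int i = 0; i < count; ++i, v += 4)
    {
        float x = v[0], y = v[1], z = v[2], w = v[3];
        v[0] = m[0]  * x + m[1]  * y + m[2]  * z + m[3]  * w;
        v[1] = m[4]  * x + m[5]  * y + m[6]  * z + m[7]  * w;
        v[2] = m[8]  * x + m[9]  * y + m[10] * z + m[11] * w;
        v[3] = m[12] * x + m[13] * y + m[14] * z + m[15] * w;
    }
}
```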
So how do we work with buffers? Because this renderer is a state machine, we work with a single framebuffer/vertex buffer at a time - and that single one is the bound one. I mostly followed the way OpenGL does it (but DirectX isn't far off either - in the end they're both quite similar): you generate the object, bind the object, and then you can work with the bound object (e.g. fill it with data, read parameters from it and write into its data). Writing to buffers is quite simple now; that will hold until we meet parallelization (then it becomes not-as-simple) - I hope I'll make it that far, to explain why we need to map/unmap buffers in OpenGL.
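A minimal sketch of that state-machine pattern, building on the buffer_t sketch above (the function names are hypothetical, modeled on the generate/bind/fill flow described here rather than copied from the actual headers):

```c
#include <stdlib.h>
#include <string.h>

static buffer_t *bound_buffer = NULL;   /* the renderer's current state */

/* Select the buffer that all following operations act on. */
static void bind_buffer(buffer_t *buf)
{
    bound_buffer = buf;
}

/* Fill whichever buffer is currently bound - this indirection through
   global state is exactly what makes the renderer a state machine. */
static void buffer_data(const float *src, unsigned int count)
{
    bound_buffer->data = malloc(count * sizeof(float));
    memcpy(bound_buffer->data, src, count * sizeof(float));
    bound_buffer->count = count;
}
```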
The last things are the basic operations: clearing the currently bound framebuffer (just see the file basic_ops.c) and a draw operation with a brain-dead simple point write (warning: no clipping occurs, to really keep it brain-dead simple).
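For reference, both operations boil down to little more than an address computation over the framebuffer sketched earlier (again with illustrative names; note the deliberate lack of clipping - coordinates outside the framebuffer simply write out of bounds):

```c
#include <string.h>

/* Clear the framebuffer to a single byte value (a real clear would take a
   full pixel color; memset keeps the sketch brain-dead simple). */
static void clear_framebuffer(framebuffer_t *fb, unsigned char value)
{
    memset(fb->data, value, (size_t)fb->width * fb->height * fb->pixel_size);
}

/* Write one pixel. No clipping: x and y MUST lie inside the framebuffer. */
static void write_point(framebuffer_t *fb, unsigned int x, unsigned int y,
                        const unsigned char *pixel)
{
    unsigned char *dst = fb->data
                       + ((size_t)y * fb->width + x) * fb->pixel_size;
    memcpy(dst, pixel, fb->pixel_size);
}
```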
Well, this should probably be everything for now. I hope I'll get at least a bit of feedback on whether it is helpful and whether I should continue, or whether I should rather stick to programming (instead of explaining - I know that I'm not a good teacher). If this is understandable and explains things at least a bit - and if you allow me to continue - I'd like to get to lines, triangles, vertex processing and pixel processing next time(s). And then I might gather enough courage to post it as an article (and feel like a real ninja on devmaster :ph34r:).
And by the way, thanks for reading this. :)
WOW, nice! Thank you, and YES please keep going like a Ninja :ph34r:
You are perhaps jumping to conclusions too fast - maybe you try to explain too many things at once? Dunno; I understand it all, but I'm not sure a beginner will.
What I think is: first explain how the GPU works. For example, does it scan the viewport pixels, convert them to vertices and call the vertex shader? Or does it loop over the list of triangles and call the vertex shader for each vertex? When you call a gl function, is it executed right away, or does it go into a statement buffer until you call another gl function? You know, the basic operation of the GPU, so we know what it does. Like explaining how a car works: it needs gas for the pistons to go up and down, the front wheels do the steering, not the back wheels (they steer too fast), etc. Not going into so much technical detail that it gets boring.
Once we have a good base on the GPU, then explain what it expects from the programmer in order to display his model, including limitations, such as 32 textures max or whatever, so the programmer doesn't start thinking "oh, I can fit the whole thing in 2,000 textures and draw the whole thing at once" kinds of things. What I mean by "what it expects" is some of the gl functions needed to get going: why they are needed, what they do, and why in that specific order. That kind of thing.
I’m probably missing tons of stuff, but that’s just a geneal idea of
what I think the layout of a tutorial should be for everyone, not just
for the genius all of you are. If I had that tutorial when I first
started, I would never have posted on this board those dumb questions of
Basically I’d like to continue in this into really full article
explaining how today’s GPUs work - because in the old times (what most
articles are talking about) we had some bunch of vertices, set texture
and light, send it to GPU and it magically created a scene out of it
(thats the all time blamed fixed function). Today the GPUs works in very
similar way to our multicore CPUs (e.g. not even close to what they did
before) - e.g. they’re general, parallel and vectorized (like SSE in our
CPU). So explaining it whole in single article is quite a TON of
information (and most people would run away from it).
Still, lots of people keep learning from articles that discuss fixed function and how to use it, and in my opinion that's very wrong - it's unnatural for a GPU to work that way today. It's like writing a 386 emulator for our Core i7 and then using it to actually do stuff (but it's even worse, because the 386 was architecturally closer to the Core i7 than an old GPU is to a new programmable one).
I could also provide OpenGL code with each article - that could give an idea of how much shorter it is to just ask the GPU to do the work (e.g. to see how much work the OpenGL library actually does for us). And it could also help beginners in OpenGL understand what is going on inside the library. (Thanks for pointing me here.)
Anyway, your point that there should be some introduction where I explain how the GPU works today, and its basics, before I drop the implementation on the reader, is very good (the purpose of this thread is mainly to find a good way to structure the article(s); if they turn out usable, and if the staff here agrees, I'd like to publish them here on DevMaster).
Okay… time to play Nightcore and start coding :D
And another thing came to my mind - I'm one of the Linux guys, though I think it would be good to also provide a Windows binary + source (because I doubt that most beginners work on Linux).
Very good points. The new GPU stuff is a must, and drop the old stuff, as there are many tutorials on it already. Windows, yes; Linux, yes; but what about a more general version? Pseudo-code, so everyone can understand it, not just C++ users - and pseudo-code is platform independent.
Maybe write a table of contents first, to make sure it's well organized and covers everything you want it to cover?
How about a title for the tutorial?…
In-depth tutorial - Today's GPU - by Vilem Otte
On the one hand, pseudo-code is fine (for description), but in my opinion it's also important to give some real C code that actually shows the stuff in motion.
So far my "pseudo" table of contents looks like this:
1.) The basics - how CPU <-> GPU interaction actually works - basically all the stuff a graphics developer should know about the GPU.
2.) Implementing a basic software renderer (e.g. emulating the top-level library + driver + GPU) - the "first triangle". Of course it will be a lot simplified (we'll fuse those three together to keep it simple - it'd really get quite complicated if we didn't - although one can still get a picture of how much work is needed to draw a single triangle).
3.) Doing the same magic as the GPU (e.g. rendering some actual scene with textures, lighting and maybe shadows) - e.g. to see that the created library is capable of actually rendering stuff (and I hope I'll manage to get at least some fps on the Core i3 here in my laptop).
4.) … (Any hints here?)
5.) Profit :D
Basically I will break 2.) into a few pieces, and 3.) into two or three. The whole thing should show how much is happening internally (1), how rendering actually works (2), and that the whole thing wasn't as useless as it seems (3).
Note: Fusing those three together is also necessary for getting reasonably good performance. I'd recommend looking at Mesa for an actual re-implementation of EXACTLY what the graphics library, driver and hardware do… after 5 minutes it should be clear that simplifying things is really necessary (especially for people who don't know much about how exactly rendering works - dropping an implementation, a simulation and a description of virtual hardware on them would be really scary, in my opinion). And the last thing is performance - doing it the Mesa way is slower on the CPU, and I really mean a lot slower.
As far as I know, the main things to avoid with DirectX are overloading the bus with too many state sets or matrix sends when animating objects, relative to the number of points you project. If you hit A before B then you've completely screwed it up and you're not getting the maximum number of projections. Another no-no is using the geometry shader for instancing :) - whoops, that'll pump three times too many projections into it and slow you to a halt as well. This idiot did all of the above, so I definitely know what to avoid.
A nifty thing to do is to run the whole thing on the GPU - doing all the animation on the GPU circumvents the bus-send problem, for approximately 4 times the speed on my computer, in an experiment.
All that just gives you the maximum achievable instance count, so if you want to render a detailed city, you need to render appropriately - I bet all this is in your doc; I'll scan-read it over. 2 cents added.
Forgive my French, but that pattern is extremely idiotic and I prefer to call it rooting the bus - avoid it at all costs; it's the lamest excuse for render code there is.
Okay, so far I’ve got implemented the whole thing (just few little
details remain) - it’s not optimal and written in plain C (thats why it
went THAT fast). So far it seems that it will really be big and I mean
I asked myself a few questions about what I do and don't want to include (in the article, I mean). Because, well… it's quite huge (and I'm glad I decided to merge the whole thing together and not split it into a virtual machine, driver, top-level library and client application - because, well, then it would really be a lot bigger than it is now).
So far I’ve implemented (and thats probably everything what will be in
the description) - rasterization as fixed function (e.g. client
application can’t re-work this one - like it’s in OpenGL/Direct3D),
programmable vertex & pixel shading (when you get idea how it’s working
here, you’ll most like get idea that adding geometry shader there isn’t
that hard (same goes for tessellation shaders)) - e.g. to deliver user
the idea that HE is writing most of the stuff, GPU just does what it’s
said like the normal CPU (ehm… in case of this project it actually is
normal CPU), and calls for “driver/toplevel library”. It’s actually in
single project and everything is using the single one same CPU - but the
code is structured (partly commented - need to finish this one) and uses
different naming conventions to get user idea what is done on device and
in driver in real scenario.
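One plausible way to express that programmability in plain C - a hedged sketch; the actual project may wire it differently - is to let the client application hand the renderer function pointers for the per-vertex and per-pixel stages, which the fixed-function rasterizer then calls:

```c
/* Hypothetical programmable-pipeline hooks: the client supplies the
   vertex and pixel functions; the renderer's fixed-function rasterizer
   invokes them - just like a GPU runs user shaders. */
typedef void (*vertex_shader_fn)(const float in[4], float out[4],
                                 const void *uniforms);
typedef void (*pixel_shader_fn)(const float varyings[4], float color[4],
                                const void *uniforms);

typedef struct
{
    vertex_shader_fn vs;   /* run once per vertex   */
    pixel_shader_fn  ps;   /* run once per fragment */
    const void *uniforms;  /* shared constants (matrices, textures, ...) */
} pipeline_t;
```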
Now it's time to build some "real scene" with this (one with texture & per-pixel lighting (maybe even shadow maps, if my Core i3 manages it interactively), for example) and to write the article (with the purpose of helping people understand what is actually going on when they render something (at least I hope I'll help them understand)).
We all can’t wait :ph34r:
It’s quite time to tell my progress, right now I got like 10 pages in
word processor written, thought still covering like two thirds of the
stuff I’d like to cover there. An application is complete and needs just
few commentaries right now, and I’m happy it works quite fast on my Core
i3, and very fast on my Core i7 (even though it’s not written to be
I’ll also upload application in few days :) (maximally - I need to make
and debug Windows port).
Sounds very exciting! I hope the sysop will put it in its own section.
Okay, the article is structured and mostly done (as of now it has some 17 pages of text, including a few images and math)… now I just need to prepare the 2 sample applications (and add some tech notes about them to the article), and then write some final words. Note that I actually have no idea what to do with it; I'll definitely upload it to my server (as a PDF) and post the link here, hopefully getting some feedback on errors (if there are any - nobody is perfect and we all make mistakes, especially in work done "overnight" after a day job)… and if it turns out good enough, I'd be very honored to post it here on devmaster (somehow, if that is possible).
Vilem, we would be happy to have it hosted here on DevMaster as an article, or perhaps split up into an article series, given the length. On the articles page, can you see the "Submit Article" button? (I'm not sure if it's only there for mods, or for all users.) Once you're ready, you can submit there, or just PM me or Dia and we can put the stuff up there.
Okay, after like 2 days I managed to get back to this project - and so far everything is checked off as complete (except the demonstration programs). Right now I'm working on them, and today (meaning Wednesday) or tomorrow I'll most probably finish them. :)
:D can’t wait :D
Okay, the first sample application is done (sorry for the delay, I've actually been sleeping between these posts). It still needs a code cleanup & a bit more commenting, but it's finally working. It has been heavily inspired by cubes with a very old NeHe texture :D
Here is an image of the wonderful application! Note that it features everything from clipping and perspective-correct texture coordinates (e.g. the stuff the device does these days without our interference), through vertex shading (where the matrix-vector multiplications happen), to pixel shading (where the texture is sampled).
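For readers wondering what "perspective correct" means in code, the usual trick - sketched here from the standard technique, not lifted from the sample's source - is to interpolate u/w and 1/w linearly in screen space and divide per pixel:

```c
/* Perspective-correct interpolation of a texture coordinate u across a
   triangle, given the barycentric weights (b0, b1, b2) of the current
   pixel and the clip-space w of each vertex. Interpolating u/w and 1/w
   linearly in screen space, then dividing, recovers the correct u. */
static float interp_perspective(float b0, float b1, float b2,
                                const float u[3], const float w[3])
{
    float inv_w = b0 / w[0] + b1 / w[1] + b2 / w[2];
    float u_over_w = b0 * u[0] / w[0]
                   + b1 * u[1] / w[1]
                   + b2 * u[2] / w[2];
    return u_over_w / inv_w;
}
```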
And no project would be complete without a glitch during development. In this one, the clipping actually went crazy (I'm not totally sure what happened there, but I was working with our beloved W coordinate of the vertex and messed something up).
I like the glitch! :ph34r:
Note. It’s though a bit slow - although it could be made at least 10
times faster with a while spent on optimizations (using more intrinsics
in the code, more static memory instead of dynamic, modifying half-space
triangle raster procedures to work on NxN blocks instead of single
pixel, also interpolating perspective-correct texture coordinates is
done from barycentric coordinates per pixel (using deltas would make it
a lot faster), etc.) - basically I think it’s quite descriptive (the
code) and it can be seen whats going on there and that was my point.
Also this was actually created during evenings and free time in quite a
little time - optimizing it would need at least another week or so, and
the code would be a lot more messy than it’s now. :) Okay time to
cleanup code, comment and do the second sample (scene model and textures
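As background for that half-space rasterization, its core is the signed edge function: a pixel is inside the triangle when all three edge functions agree in sign, and the NxN block optimization evaluates them only at block corners first. A minimal sketch of the well-known technique (not the project's actual rasterizer):

```c
/* Signed edge function: its sign tells on which side of the directed
   edge (ax, ay) -> (bx, by) the point (px, py) lies; its magnitude is
   twice the area of the triangle a-b-p. A pixel is inside the triangle
   when all three edge functions agree in sign. */
static float edge_function(float ax, float ay, float bx, float by,
                           float px, float py)
{
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}
```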
Okay, so I’m gonna get a little break - anyway - the paper & sample are
Any feedback is welcome, if there is some mistake, or anything missing,
please write here and I’ll add it to the article (or sample). If it’s
okay, please write it and I’ll try contact Reedbeta about putting the
document here on DevMaster.
I’ll add second sample later in the process :)
Very well done! I can’t wait to see what’s next :)
It would be better if ReedBeta pinned this thread to the top so it doesn't keep going down the list - like the link on how to search the site.
Hold your horses, Alienizer. Vilem and I are going to be working on an
article version of this. :)
Oh, OK, sorry - I thought he was being ignored, since nobody had posted anything about what he did!