Advanced Rasterization
#161
Posted 22 July 2010 - 11:56 PM
https://sourceforge....cts/phenomenon/
#162
Posted 23 July 2010 - 06:55 AM
I had abandoned the idea in favour of a PVS calculation system, but I am starting to rethink it again.
My main problem was basically that I had to render everything twice, even though the occlusion buffer was at a lower resolution.
Are things mature enough to work on it again? Isn't real-time ray tracing approaching fast? Will all the accumulated expertise be wasted in 3-4 years?
Basically, I mean: is it worth writing a 'manual' occlusion culling system right now?
Gurus, please respond...
#163
Posted 23 July 2010 - 10:15 AM
Herrcoolness said:
https://sourceforge....cts/phenomenon/
Try using RCPPS followed by one or two Newton-Raphson iterations: if x is an approximation of 1/d (which is what the RCPPS instruction gives you), then x*(2 - d*x) is a better one.
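To make the refinement step concrete, here is a minimal scalar sketch in Python. The names `crude_reciprocal` and `refine_reciprocal` are made up for illustration, and `crude_reciprocal` is only a stand-in for RCPPS: it truncates the exact reciprocal to 13 mantissa bits so that its relative error stays below 2^-12, which is in the same range as the RCPPS estimate. It is not how the hardware computes its estimate.

```python
import math


def crude_reciprocal(d, bits=13):
    """Hypothetical stand-in for RCPPS: 1/d truncated to `bits` mantissa bits."""
    mantissa, exponent = math.frexp(1.0 / d)  # mantissa magnitude in [0.5, 1)
    scale = 1 << bits
    return math.ldexp(math.floor(mantissa * scale) / scale, exponent)


def refine_reciprocal(d, x, iterations=1):
    """Apply the Newton-Raphson step x <- x * (2 - d * x) `iterations` times."""
    for _ in range(iterations):
        x = x * (2.0 - d * x)
    return x


if __name__ == "__main__":
    d = 3.0
    x0 = crude_reciprocal(d)
    x1 = refine_reciprocal(d, x0)
    print("relative error before:", abs(d * x0 - 1.0))
    print("relative error after: ", abs(d * x1 - 1.0))
```

If x = (1 - e)/d, the step gives (1 - e*e)/d, so each iteration roughly squares the relative error. That is why a single iteration on a roughly 12-bit estimate already lands close to full single precision.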
-
Currently working on: the 3D engine for Tomb Raider.
#164
Posted 23 July 2010 - 10:31 AM
v71 said:
I had abandoned the idea in favour of a PVS calculation system, but I am starting to rethink it again.
My main problem was basically that I had to render everything twice, even though the occlusion buffer was at a lower resolution.
Are things mature enough to work on it again? Isn't real-time ray tracing approaching fast? Will all the accumulated expertise be wasted in 3-4 years?
Basically, I mean: is it worth writing a 'manual' occlusion culling system right now?
Gurus, please respond...
#165
Posted 24 July 2010 - 10:43 AM
.oisyn said:
Thnx for the tip ;)
#166
Posted 24 July 2010 - 10:09 PM
For a software rasterizer, basically you have to write two renderers: one using OpenGL or DirectX and the other running entirely on the CPU.
I mean everything: vertex rotation, perspective division, and a fast triangle filler.
I know that it is sufficient to use only the z-buffer, a lower screen resolution, and other optimizations, but I am asking myself: is it worth writing a system like this? Isn't ray tracing approaching fast?
Even if ray tracing won't be used to render a complete scene with lighting, will the new multicore GPU boards allow us to write a visibility system running entirely on hardware within 2-3 years?
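For readers unfamiliar with what the CPU-side z-buffer is used for, here is a rough sketch of the visibility query itself. It assumes occluders have already been rasterized into a low-resolution depth buffer stored as a list of rows, that smaller depth means closer to the camera, and that the object being tested has been reduced to a screen-space rectangle plus its nearest depth. The function name and parameters are hypothetical and are not taken from any engine discussed in this thread.

```python
def is_occluded(depth, x0, y0, x1, y1, nearest_z):
    """Return True if the rectangle [x0, x1) x [y0, y1) is fully hidden.

    depth     -- low-resolution depth buffer, depth[y][x], smaller = closer
    nearest_z -- closest depth of the object's bounding volume
    """
    for y in range(y0, y1):
        for x in range(x0, x1):
            # If the stored occluder depth is not strictly closer than the
            # object here, the object may be visible in this cell.
            if depth[y][x] >= nearest_z:
                return False
    return True


if __name__ == "__main__":
    far = 1.0
    depth = [[far] * 4 for _ in range(4)]
    for y in range(2):
        for x in range(2):
            depth[y][x] = 0.2  # a near occluder covers the top-left 2x2 cells
    print(is_occluded(depth, 0, 0, 2, 2, 0.5))  # object behind the occluder
    print(is_occluded(depth, 0, 0, 3, 2, 0.5))  # object sticks out to the right
```

The test is conservative in one direction only: a False result means "possibly visible", so the object is still sent to the GPU. An empty rectangle returns True here, so callers would need to clamp and validate the rectangle first.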
#167
Posted 25 July 2010 - 05:39 AM
v71 said:
For a software rasterizer, basically you have to write two renderers: one using OpenGL or DirectX and the other running entirely on the CPU.
I mean everything: vertex rotation, perspective division, and a fast triangle filler.
I know that it is sufficient to use only the z-buffer, a lower screen resolution, and other optimizations, but I am asking myself: is it worth writing a system like this? Isn't ray tracing approaching fast?
Even if ray tracing won't be used to render a complete scene with lighting, will the new multicore GPU boards allow us to write a visibility system running entirely on hardware within 2-3 years?
#168
Posted 25 July 2010 - 04:52 PM
.oisyn said:
I used this method, which added 4 SSE instructions: 2 muls, 1 mov (constant read), and 1 sub.
The speed was the same as using divps. :huh:
#169
Posted 25 July 2010 - 08:35 PM
.edit: no, it's still there: http://www.intel.com...nual/248966.pdf
Chapter 6.1:
Quote
— If reduced accuracy is acceptable, use them with no iteration.
— If near full accuracy is needed, use a Newton-Raphson iteration.
— If full accuracy is needed, then use divide and square root which provide more accuracy, but slow down performance.
If you google "rcpps newton raphson", a lot of sites say it's faster as well.
#170
Posted 26 July 2010 - 04:44 AM
.oisyn said:
.edit: no, it's still there: http://www.intel.com...nual/248966.pdf
Chapter 6.1:
If you google "rcpps newton raphson", a lot of sites say it's faster as well.
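I used:
// xmm1 - input value (d)
RCPPS xmm0,xmm1   // xmm0 = x, approximation of 1/d
mulps xmm1,xmm0   // xmm1 = d*x
mulps xmm1,xmm0   // xmm1 = d*x*x
addps xmm0,xmm0   // xmm0 = 2*x
subps xmm0,xmm1   // xmm0 = 2*x - d*x*x
// xmm0 - output value
Now there is no constant load.
But it is still almost the same speed as using DIVPS: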
*DIVPS - 32.50 fps
*RCPPS + Newton-Raphson iteration - 32.40 fps
*RCPPS - 33.70 fps
I have an Intel Core 2 Quad Q8300. It may be true that, after 4 years, DIVPS has become faster. But I will use the iteration for older CPUs. :) Still, thanks for the tip :).
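For anyone checking the register shuffling in the sequence above, here is a scalar Python walk-through with one assignment per instruction after RCPPS. It is only a sketch: the variable names mirror the registers, the RCPPS estimate is passed in as a plain argument, and the function name is made up. It shows that 2*x - d*x*x is the expanded form of x*(2 - d*x), which is why the constant 2 never has to be loaded.

```python
def rcp_refine_no_constant(d, x):
    """Mirror the four instructions that follow RCPPS, one step per line."""
    xmm0 = x            # result of RCPPS: estimate of 1/d
    xmm1 = d            # input value
    xmm1 = xmm1 * xmm0  # mulps xmm1,xmm0 -> d*x
    xmm1 = xmm1 * xmm0  # mulps xmm1,xmm0 -> d*x*x
    xmm0 = xmm0 + xmm0  # addps xmm0,xmm0 -> 2*x
    xmm0 = xmm0 - xmm1  # subps xmm0,xmm1 -> 2*x - d*x*x
    return xmm0


if __name__ == "__main__":
    d, x = 4.0, 0.24  # 0.24 is a deliberately rough estimate of 1/4
    print(rcp_refine_no_constant(d, x), x * (2.0 - d * x))
```

The two forms are algebraically identical but can differ in the last bits in floating point, because the rounding happens at different points.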
#171
Posted 26 July 2010 - 01:14 PM
Herrcoolness said:
divps still has a high latency of up to 15 cycles, but if you have other instructions that can execute independently, that's no problem. If you instead use rcpps and a Newton-Raphson iteration, the total latency is nearly identical, but you're executing more instructions (when you could have done other work instead).
So indeed, on newer processors it's faster to use divps, and you even get full precision!
#172
Posted 26 July 2010 - 02:26 PM
#173
Posted 13 August 2010 - 02:46 PM
I uploaded 2 demos: one with colored debug info and one without the coloring, to see how it normally works.
*black tiles - skipped tiles of the hidden small quad
*green tiles - tiles drawn with the fast write function (no z comparison) and not compared against the triangle edges
*cyan tiles - tiles drawn with the fast write function (no z comparison) but compared against the triangle edges
*gray tiles - tiles drawn with the function that compares the z-values against the z-buffer, and compared against the triangle edges
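The four colours above correspond to a per-tile decision that could look roughly like the sketch below. This is a guess at the logic, not the demo's actual code: it assumes smaller z means closer, that a min/max depth is tracked per tile, and that a separate edge test has already said whether the tile lies fully inside the triangle. The list above names no colour for a tile that is fully inside but still needs the z comparison, so the sketch simply reports gray for that case too.

```python
def classify_tile(tri_zmin, tri_zmax, tile_zmin, tile_zmax, fully_inside):
    """Return the debug colour for one tile (hypothetical helper).

    tri_zmin/tri_zmax   -- depth range of the triangle over this tile
    tile_zmin/tile_zmax -- depth range already stored in the z-buffer tile
    fully_inside        -- True if all tile corners pass every edge test
    """
    if tri_zmin > tile_zmax:
        return "black"  # triangle is behind everything stored: skip the tile
    if tri_zmax < tile_zmin:
        # Triangle is in front of everything stored: write without z compare.
        return "green" if fully_inside else "cyan"
    return "gray"  # depth ranges overlap: per-pixel z comparison is needed


if __name__ == "__main__":
    print(classify_tile(0.8, 0.9, 0.1, 0.5, True))   # hidden
    print(classify_tile(0.1, 0.2, 0.5, 1.0, True))   # in front, fully covered
    print(classify_tile(0.1, 0.2, 0.5, 1.0, False))  # in front, partial
    print(classify_tile(0.3, 0.7, 0.5, 1.0, False))  # overlapping depth
```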
Next stop... the clipping and transform pipeline... and the first rotated cube? :happy:
https://sourceforge....cts/phenomenon/
#174
Posted 29 August 2010 - 02:55 PM
Now the triangle input coordinates are in NDC (normalized device coordinates), so the x and y positions need to be in the [-1, +1] interval. Why? Because this is what graphics cards use, and it helped me solve the problem of what happens when you change the size of the window. Now the size of the triangles changes too and stays proportional to the rendering window.
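A minimal sketch of the NDC-to-window mapping this implies, in Python. The function name is made up, and the y flip (NDC +1 at the top of the window, pixel rows growing downwards) is an assumption about the convention, not something stated in the post.

```python
def ndc_to_screen(x_ndc, y_ndc, width, height):
    """Map NDC coordinates in [-1, +1] to window coordinates in pixels."""
    sx = (x_ndc + 1.0) * 0.5 * width
    sy = (1.0 - y_ndc) * 0.5 * height  # assumed: y = +1 is the top row
    return sx, sy


if __name__ == "__main__":
    # The same NDC triangle scales with the window size.
    for size in ((640, 480), (1280, 960)):
        print(size, ndc_to_screen(0.5, 0.5, *size))
```

Because the window size only appears in this last step, resizing the window rescales every triangle without touching the input coordinates.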
And I added a third texture filtering method for low-end PCs. It's almost as fast as nearest texture filtering (because it needs only 1 texture fetch) but looks almost like bilinear. Yes, yes, you saw this method in Unreal. I found a description of this technique in the old flipcode archive on the net (http://www.flipcode....In_Unreal.shtml).
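The idea behind this kind of dithered filtering can be sketched as follows: do a single nearest-texel fetch, but first nudge the texture coordinates by a small offset that depends on whether the screen pixel's x and y are odd or even. The offset table below is the 2x2 pattern usually quoted for this technique; treat the exact values as an assumption, since they have not been checked against the linked article or against this project's code. The function name is hypothetical, and u and v are assumed to be in texel units.

```python
import math

# Assumed (du, dv) offsets in texel units, indexed [screen_y & 1][screen_x & 1].
DITHER_KERNEL = (
    ((0.25, 0.00), (0.50, 0.75)),
    ((0.75, 0.50), (0.00, 0.25)),
)


def sample_dithered(texture, u, v, screen_x, screen_y):
    """Fetch one texel with a per-pixel dither offset (wrapping addressing)."""
    height = len(texture)
    width = len(texture[0])
    du, dv = DITHER_KERNEL[screen_y & 1][screen_x & 1]
    tx = math.floor(u + du) % width
    ty = math.floor(v + dv) % height
    return texture[ty][tx]


if __name__ == "__main__":
    texture = [[0, 1], [2, 3]]
    # Same texture coordinate, four neighbouring screen pixels.
    for sy in range(2):
        print([sample_dithered(texture, 0.6, 0.0, sx, sy) for sx in range(2)])
```

Neighbouring pixels end up fetching different texels for the same coordinate, so the eye averages them into something that resembles bilinear blending at the cost of one fetch per pixel.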
There are 2 demos:
- a static one, to see how fast all 3 techniques are (press 1, 2, 3 to change the filtering technique)
- a dynamic one, to see the dither-bilinear technique in action (press 1, 2, 3 to change the filtering technique)
#175
Posted 18 September 2010 - 03:09 PM
About the demo:
q,e - moving in y direction
a,d - moving in x direction
w,s - moving in y direction
1,2,3- filtering method
9,0 - vsync on-off
https://sourceforge....cts/phenomenon/
#176
Posted 18 September 2010 - 04:36 PM
An unhandled exception occurred at $00402437 :
EAccessViolation : Access violation
  $00402437
  $004183A4 DDRAWFLIPWINDOWED, line 45 of ddrawwindowed.inc
  $0041872C GS_WNDPROC, line 144 of gs_screen.inc
  $0041D87E WNDKEYBPROC, line 29 of fenomenon_keyboard.pas
  $0042DAA2 WNDMOUSEPROC, line 62 of fenomenon_mouse.pas
  $7E418734 $7E418816 $7E428EA0 $7E428EEC $7C90E473 $7E4196C7 $00411530 $00401D0D
Heap dump by heaptrc unit
97 memory blocks allocated : 13722380/13722720
86 memory blocks freed : 13616288/13616616
11 unfreed memory blocks : 106092
True heap size : 5373952 (128 used in System startup)
True free heap : 5266832
Should be : 5267016
Call trace for block $0007DEE8 size 64
  $004097E8 $00408381 $0041C375 $004092AE $004183A4 $0041872C $0041D87E $0042DAA2
Call trace for block $00067158 size 24
  $00408381 $0041C375 $004092AE $004183A4 $0041872C $0041D87E $0042DAA2 $7E418734
Call trace for block $000670F8 size 16
  $0041C187 $004092AE $004183A4 $0041872C $0041D87E $0042DAA2 $7E418734 $7E418816
Call trace for block $0011E410 size 391
  $00417D0C $004119B8 $00401CD0 $0040D111
Call trace for block $0011E240 size 391
  $00417D0C $004119B8 $00401CD0 $0040D111
Call trace for block $020299A0 size 1159
  $00417D0C $004119B8 $00401CD0 $0040D111 $F0F0F0F0 $F0F0F0F0 $F0F0F0F0 $F0F0F0F0
Call trace for block $02028FC0 size 2439
  $00417D0C $004119B8 $00401CD0 $0040D111 $F0F0F0F0 $F0F0F0F0 $F0F0F0F0 $F0F0F0F0
Call trace for block $020275E0 size 6535
  $00417D0C $004119B8 $00401CD0 $0040D111
Call trace for block $02022400 size 20871
  $00417D0C $004119B8 $00401CD0 $0040D111
Call trace for block $027A0198 size 74119
  $004178D2 $004119B8 $00401CD0 $0040D111
Call trace for block $00116238 size 83
  $004119B8 $00401CD0 $0040D111
#177
Posted 19 September 2010 - 06:55 AM
#178
Posted 19 September 2010 - 09:04 AM
512 RAM
GeForce4 MX 440 with 64 MB
PS/2 Mouse + USB Keyboard
An unhandled exception occurred at $00403247 :
EAccessViolation : Access violation
  $00403247
  $0041A4B4 DDRAWFLIPWINDOWED, line 45 of ddrawwindowed.inc
  $0041A7DE GS_WNDPROC, line 131 of gs_screen.inc
  $0041F98E WNDKEYBPROC, line 29 of fenomenon_keyboard.pas
  $00432382 WNDMOUSEPROC, line 62 of fenomenon_mouse.pas
  $7E418734 $7E418816 $7E42C03D $7E42C228 $7E42C1D5 $004122E7 $0041A820 $0041F98E $00432382 $7E418734 $7E418816 $7E428EA0
Heap dump by heaptrc unit
84 memory blocks allocated : 8439373/8439688
77 memory blocks freed : 8434761/8435056
7 unfreed memory blocks : 4612
True heap size : 1867776 (80 used in System startup)
True free heap : 1862528
Should be : 1862616
Call trace for block $00085DD8 size 64
  $0040A5F8 $00409191 $0041E485 $0040A0BE $0041A4B4 $0041A7DE $0041F98E $00432382
Call trace for block $00067068 size 24
  $00409191 $0041E485 $0040A0BE $0041A4B4 $0041A7DE $0041F98E $00432382 $7E418734
Call trace for block $00067008 size 16
  $0041E297 $0040A0BE $0041A4B4 $0041A7DE $0041F98E $00432382 $7E418734 $7E418816
Call trace for block $000E96B0 size 147
  $0040DF21
Call trace for block $000A96B8 size 3859
  $00402A45 $0040DF21
Call trace for block $000A16A0 size 403
  $00402A45 $0040DF21
Call trace for block $00099698 size 99
  $00402A45 $0040DF21
#179
Posted 19 September 2010 - 03:12 PM
Mihail121 said:
512 RAM
GeForce4 MX 440 with 64 MB
PS/2 Mouse + USB Keyboard
#180
Posted 19 September 2010 - 04:20 PM
Herrcoolness said:
SSE2 is not supported, MMX and 3DNow! only. Desktop resolution is 1024x768@16.