asm engine

Sephir 101 May 05, 2006 at 11:28

Well, I’ve been programming in C++ for some time now and honestly I got tired of it. There’s basically nothing in C++ that attracts me anymore, and since programming is a hobby for me, I wish to take a new, different path. I decided to go with assembly.

I’ve been googling and looking around Devmaster.net, but I can’t really find a game (or just rendering) engine written in assembly. Maybe people are afraid of programming in assembler because of the time it takes to write and debug, or maybe people are really only interested in C++ because for some it is simpler.

I also took a look at the revamped TripleBuffer, and the asm resources there are okay, but none really fulfills my needs (for description’s sake).

My question is whether anyone interested in programming in asm knows of any sort of asm engine out there. If there is one, I think it would be a great addition to the programming community. If there isn’t, then I think I will have to drop my hopes or start everything from the ground up, which would take a lot of time.

Any useful information would be great.

Thanks for your time.

\~Sephir.

83 Replies


Mihail121 102 May 05, 2006 at 15:20

Be my guest, clicky

Nick 102 May 05, 2006 at 15:49

I’m an assembly nut, but I still value C++ very highly. I regard C++ as a tool to write assembly code in situations where absolute control and performance are not required, which is the case most of the time. So I don’t really understand how you got ‘tired’ of C++. Can I ask how many years of experience you have? Personally I have about six years of experience in both C++ and assembly, and I use both almost daily, but I can’t imagine writing everything in assembly. That’s just tedious, error-prone, ugly and very hard to manage. I only use it where it’s more suitable for the task.

I think all you need is an exciting project to work on, not another language.

Ed_Mack 101 May 05, 2006 at 19:26

I think all you need is an exciting project to work on, not another language.

Hear, hear.

cypher543 101 May 05, 2006 at 20:37

If you are still interested, here is an open source 3D RTS created entirely in 32bit Assembly: http://www.oby.ro/rts/index.html

monjardin 102 May 05, 2006 at 20:49

@”Hostile Encounter RTS”

It is written in FULL 32bit Win32ASM for great speed and effects

* We want to make XBox, Linux and MacOS versions also.

Why the hell would you write all of a game in assembly if you are planning on porting it to completely different architectures? :wallbash: What a porting nightmare!

When they say MacOS, do they mean after Apple switches to Intel chips? :surrender

kariem2k 101 May 06, 2006 at 12:39

Why again are you bored of C++? That’s OK, but moving to assembly :blink: Sorry to all assembly fans (I am one of them and I have experience in asm) :) but assembly is not practical for today’s gaming requirements, and certainly not for an indie who wants to develop a game.
There is a point I want to clarify. This is the learning path of assembly: after learning some 8086 assembly for DOS you will need to move on to Win32 assembler, and after a bit you will find that you are bound to the same Windows APIs you have in C++, but with much more work to do and nearly no performance boost, because the compiler makers are much more experienced than you (since you are a hobbyist), so compiler-generated code will be better optimized than yours.
Assembly is interesting if you want to explore things that are not exposed to you by any higher-level programming language, and if you want to learn reverse engineering (that’s why I learned assembly). But it is not practical for medium to large applications, and it will be a waste of time.
If you want to learn assembly there is http://win32asm.cjb.net, the free ebook “The Art of Assembly”, and reading source code.
If you want engines that use assembly: I have not heard of any new engine that uses assembly, but there is TGE, which uses assembly for some texture blending, Doom (GPL), Quake (GPL), and I think Quake 3 (GPL).
Sorry for the negative feedback :).
bye

juhnu 101 May 07, 2006 at 05:03

Why all in ASM?
This is a complex matter of opinion. Here are some of my arguments IMHO:
* We know ASM well. ASM is also very easy to learn
* Speed is of the essence in GAMES. ASM is 100% up to 300% faster than today’s “optimized” compilers
* There will always be ASM code in a GAME so: Why NOT write all in ASM?
* ASM is free, can’t say that about most HLL compilers
* We don’t like BLOATWARE generated by most of today’s HLL compilers
* Very good tutorials at Iczelion’s site and web support from Hiroshimator’s Win32ASM messageboard make the difference
* A GAME requires maximum control over the hardware and software
* ASM gives total power and maximum flexibility
* It is FUN!

It’s interesting how people assume hand-written assembly is always faster. Not that the other reasons why they chose asm are any better.. :wallbash:

Axel 101 May 07, 2006 at 08:48

Let them learn it the hard way ;)

Alex 101 May 07, 2006 at 16:37

I don’t know about you, but C/C++/asm or any other language is just a tool to build whatever you want. The fun thing for me is to build something interesting (a neural net sim or whatever) using whatever tool is most appropriate. It doesn’t hurt to know some asm, because it can be helpful when debugging and it gives you some clues about how your system works. But apart from some exotic problems you’ll hardly ever code in asm… unless just for hobby fun…
Alex

Nick 102 May 07, 2006 at 16:50

@juhnu

It’s interesting how people assume hand-written assembly is always faster.

It’s true actually. It’s just not always easy.

_oisyn 101 May 08, 2006 at 01:00

Are things like instruction pipelining still doable on today’s wide market of different x86 architectures? I mean, in the old days you knew the CPU your code was going to run on. Today, AMD and Intel processors differ totally from each other, and with things like hyperthreading, good pipelining seems like a waste of time (as stalls are filled up by the other thread, according to the theory at least).

That was more of a serious question than a rhetorical one btw; I haven’t done any hand-written ASM in ages. The last time I checked an instruction reference manual was when AMD introduced their 3DNow! :wub:

juhnu 101 May 08, 2006 at 02:40

@Nick

It’s true actually. It’s just not always easy.

My point exactly. What I meant was that it’s true only if you know what you are doing and have enough time to actually do it. Just writing everything in asm won’t automatically give you 100%-300% performance, which they think it does (especially when asm encourages micro-level optimizations instead of algorithmic ones).

Nick 102 May 08, 2006 at 08:23

@.oisyn

Are things like instruction pipelining still doable on today’s wide market of different x86 architectures?

Ever since the Pentium Pro (1995), desktop processors have been capable of out-of-order execution. This means they look for the next (independent) instruction to fill their pipelines themselves.

Dias 101 May 08, 2006 at 09:07

Why stop at asm? write 1´s and 0´s for ultimate performance :P

Seriously though, if you want to do it just for fun/learning then go ahead, but I doubt very strongly that the speed gain outweighs anything else that higher-level languages offer.

SmokingRope 101 May 08, 2006 at 09:16

This thread is awesome! I wouldn’t have ever thought to start reading win32asm source-code at 5:15 AM but thanks to this thread that’s exactly what i’m doing!!!

Nick 102 May 08, 2006 at 12:29

@Dias

Seriously though, if you want to do it just for fun/learning then go ahead, but I doubt very strongly that the speed gain outweighs anything else that higher-level languages offer.

Do you know assembly or are you just guessing? :mellow:

High-level languages use the lowest common denominator, which is just basic 32-bit operations. Modern processors offer a whole lot more, like complex operations and vector operations.

One example of a complex x86 instruction is ‘bsr’, which stands for ‘bit scan reverse’. It returns the position of the first 1 bit, starting from the most significant bit. This is equivalent to an integer logarithm, and can be very useful in some projects. In a high-level language it takes many operations to compute this value. Vector operations use MMX and SSE. SSE can make many floating-point intensive calculations up to four times faster. Examples of real-life software that uses (some) manually written assembly code are: audio and video codecs, physics engines, raytracers, drivers, low-level libraries, etc.

Obviously I agree that writing everything in assembly is useless, but using it in some hotspots can be very rewarding and take the application to a whole new performance level. And even if you have no direct use for it, it’s still very valuable for debugging. Issues that can take days to debug at a high level can become almost trivial when looking through the assembly code.
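
As a rough illustration of the vector-operation point, here is a minimal sketch of a single SSE instruction adding four floats at once, written with compiler intrinsics instead of raw assembly. The helper name add4 and the SKETCH_HAVE_SSE macro are made up for this example, and the scalar fallback is only there so it still builds on targets without SSE.

```cpp
#include <cassert>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>   // SSE intrinsics (_mm_add_ps and friends)
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for i = 0..3.
inline void add4(const float* a, const float* b, float* out)
{
    assert(a && b && out);
#if SKETCH_HAVE_SSE
    __m128 va = _mm_loadu_ps(a);             // load four floats (no alignment required)
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  // one addps performs all four additions
#else
    for (int i = 0; i < 4; ++i)              // plain scalar fallback
        out[i] = a[i] + b[i];
#endif
}
```

Whether this actually beats a scalar loop depends on the compiler and the processor, so it is worth measuring rather than assuming.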

Nick 102 May 08, 2006 at 12:44

For those interested in learning assembly code, here’s an excellent tutorial: PC Assembly Language.

But avoid using an assembler directly (and skip the assembler specific topics in the tutorial). Instead, use inline assembly in C++ to make life easier:

#include <stdio.h>

int main()
{
    int a = 345;
    int b;

    __asm   // Start inline assembly block
    {
        mov eax, a     // Load 'a' from memory into eax register
        bsr ebx, eax   // Compute integer logarithm (base 2)
        mov b, ebx     // Store result to memory
    }

    printf("log2(%d) = %d\n", a, b);
}
_oisyn 101 May 08, 2006 at 14:00

A fast equivalent in pure C++, without __asm blocks:

#include <stdio.h>

int main()
{
    int a = 354;
    int b;

    unsigned value = a;
    // replicate top bit into all lower bits
    value |= value >> 1;
    value |= value >> 2;
    value |= value >> 4;
    value |= value >> 8;
    value |= value >> 16;

    // count bits
    value = ((value & 0xaaaaaaaa) >>  1) + (value & 0x55555555);
    value = ((value & 0xcccccccc) >>  2) + (value & 0x33333333);
    value = ((value & 0xf0f0f0f0) >>  4) + (value & 0x0f0f0f0f);
    value = ((value & 0xff00ff00) >>  8) + (value & 0x00ff00ff);
    value = ((value & 0xffff0000) >> 16) + (value & 0x0000ffff);

    b = value - 1;
    printf("log2(%d) = %d\n", a, b);
}
Dias 101 May 08, 2006 at 14:23

@Nick

Obviously I agree that writing everything in assembly is useless, but using it in some hotspots can be very rewarding

I guess I could agree with that though :)
However, I feel there are too many discussions about speed around development forums when there are many other important factors to consider as well.

Nodlehs 101 May 08, 2006 at 17:31

Premature optimization is worthless. If you want to write in ASM just for the heck of it, for a challenge, or whatever, go ahead and have a blast; sounds fun.

Please don’t do it for performance reasons. If you want something to ‘interest’ you more, first read Code Complete, it may prompt you to evaluate your outlook on C++ and other languages. Finally, as others have pointed out, maybe find an exciting project instead of an exciting language, sounds like a lack of motivation more than anything else.

SmokingRope 101 May 08, 2006 at 17:46

Need More Links …

Nick 102 May 08, 2006 at 19:39

@SmokingRope

Need More Links …

Intel Developer’s Manuals (The Bible)
MMX/SSE Primers (Nice quick reference too)
x86 Architecture (The alphabet starts with ACDB)

Nils_Pipenbrinck 101 May 10, 2006 at 01:06

Integer Log2 with *that* much code?

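// taken from http://www.stereopsis.com/log2.html
#include <stdint.h>
#include <string.h>

// Note: integers above 2^24 are rounded when converted to float, so the
// result can be one too high just below a power of two.
static inline int32_t ilog2(float x)
{
    uint32_t ix;
    memcpy(&ix, &x, sizeof ix);           // read the IEEE 754 bit pattern
    uint32_t exp = (ix >> 23) & 0xFF;     // biased 8-bit exponent
    int32_t log2 = int32_t(exp) - 127;    // remove the bias

    return log2;
}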
Nils_Pipenbrinck 101 May 10, 2006 at 01:52

Ok, here is something unexpected:

I timed bsr against the float version.. guess which one was faster (at least on my 1.6 GHz Athlon). I didn’t even try to optimize the float version.

#include <stdio.h>
#include <windows.h>   // LARGE_INTEGER, QueryPerformanceCounter

int bsr_test (void)
{
  int ret = 0;
  _asm {
    mov ecx, 0x00400000
    xor edx, edx
again:
    bsr eax, ecx
    add edx, eax
    dec ecx
    jnz again
    mov ret, edx
  }
  return ret;
}

int flt_test (void)
{
  int ret = 0;
  float temp;
  _asm {
    mov ecx, 0x00400000
    xor edx, edx
again:
    mov  temp, ecx
    fild temp
    fstp temp
    mov eax, [temp]
    shr eax, 23
    and eax, 255
    sub eax, 127
    add edx, eax
    dec ecx
    jnz again
    mov ret, edx
  }
  return ret;
}


int main (void)
{
  LARGE_INTEGER t1,t2,t3;
  int i=0;
  int j=0;

  QueryPerformanceCounter (&t1);
  for (int k=0; k<10; k++)
    i += bsr_test();
  QueryPerformanceCounter (&t2);
  t2.QuadPart -= t1.QuadPart;

  QueryPerformanceCounter (&t1);
  for (int k=0; k<10; k++)
    j += flt_test();
  QueryPerformanceCounter (&t3);
  t3.QuadPart -= t1.QuadPart;

  printf ("time bsr = %d, result = %d\n", (int)t2.QuadPart,i);
  printf ("time flt = %d, result = %d\n", (int)t3.QuadPart,j);
}
Nick 102 May 10, 2006 at 12:53

@Nils Pipenbrinck

I timed bsr against the float version.. guess which one was faster (at least on my 1.6 GHz Athlon). I didn’t even try to optimize the float version.

Yeah I get the same results on an Athlon 64. Even a simple loop to shift the value one bit at a time is faster…

I checked the AMD manuals and it turns out the bsr instruction has a high latency and uses a slow decoder. So it was a bad example (at least for Athlon processors). But the results could have been totally opposite. I should have checked. :blush:

juhnu 101 May 10, 2006 at 13:32

@Nick

Yeah I get the same results on an Athlon 64. Even a simple loop to shift the value one bit at a time is faster… I checked the AMD manuals and it turns out the bsr instruction has a high latency and uses a slow decoder. So it was a bad example (at least for Athlon processors). But the results could have been totally opposite. I should have checked. :blush:

Well, at least it was a good example that hand-written asm is not necessarily 300% faster ;)

_oisyn 101 May 10, 2006 at 14:05

Touché :wub:

BTW, anyone care to test my piece of code as well? I could type it in myself, if only I weren’t that lazy.

Nick 102 May 10, 2006 at 19:27

@juhnu

Well, at least it was a good example that hand-written asm is not necessarily 300% faster ;)

Hehe. :worthy:

Well, it doesn’t prove anything either. The most important thing about low-level performance tuning is to profile. That counts for C++ as well. For example bubble sort can be fast for a small number of elements. And I could swear bsr was fast on older processors (and maybe it still is on Intel processors). :closedeye

Nick 102 May 10, 2006 at 19:32

@.oisyn

BTW, anyone care to test my piece of code as well? I could type it in myself, if only I weren’t that lazy.

It’s about two times slower than bsr on an Athlon 64. Oddly a non-branchless version is the fastest I got for now (about two times faster - using random input):

int log2(int value)
{
    int log = 0;

    if(value > 0x0000FFFF) {value = value >> 16; log += 16;}
    if(value > 0x000000FF) {value = value >> 8;  log += 8;}
    if(value > 0x0000000F) {value = value >> 4;  log += 4;}
    if(value > 0x00000003) {value = value >> 2;  log += 2;}
    if(value > 0x00000001) {                     log += 1;}

    return log;
}

Haven’t compared with the floating-point version yet…
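
With this many variants floating around, a quick cross-check that they return the same values is worth doing before comparing timings. A minimal sketch, assuming a 32-bit unsigned int; log2_branchy and log2_bitcount are renamed copies of the two C++ versions posted in this thread (taking unsigned input), and variants_agree is a made-up helper.

```cpp
#include <cassert>

// Branching version (renamed so it cannot clash with ::log2 from <cmath>).
inline int log2_branchy(unsigned int value)
{
    int log = 0;
    if (value > 0x0000FFFF) { value >>= 16; log += 16; }
    if (value > 0x000000FF) { value >>= 8;  log += 8;  }
    if (value > 0x0000000F) { value >>= 4;  log += 4;  }
    if (value > 0x00000003) { value >>= 2;  log += 2;  }
    if (value > 0x00000001) {               log += 1;  }
    return log;
}

// Branchless version: smear the top bit downwards, then count the set bits.
inline int log2_bitcount(unsigned int value)
{
    value |= value >> 1;
    value |= value >> 2;
    value |= value >> 4;
    value |= value >> 8;
    value |= value >> 16;
    value = ((value & 0xaaaaaaaa) >>  1) + (value & 0x55555555);
    value = ((value & 0xcccccccc) >>  2) + (value & 0x33333333);
    value = ((value & 0xf0f0f0f0) >>  4) + (value & 0x0f0f0f0f);
    value = ((value & 0xff00ff00) >>  8) + (value & 0x00ff00ff);
    value = ((value & 0xffff0000) >> 16) + (value & 0x0000ffff);
    return static_cast<int>(value) - 1;
}

// Hypothetical helper: true if both variants agree on every value in [first, last].
inline bool variants_agree(unsigned int first, unsigned int last)
{
    assert(first >= 1 && first <= last);
    for (unsigned int v = first; ; ++v) {
        if (log2_branchy(v) != log2_bitcount(v)) return false;
        if (v == last) break;   // checked here to avoid overflow when last is UINT_MAX
    }
    return true;
}
```

Calling variants_agree(1, 1u << 20) covers the small values cheaply; zero is left out on purpose, since the variants are not expected to agree on it.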

Nils_Pipenbrinck 101 May 10, 2006 at 22:46

@Nick

Hehe. :worthy: Well, it doesn’t prove anything either. The most important thing about low-level performance tuning is to profile. That counts for C++ as well. For example bubble sort can be fast for a small number of elements. And I could swear bsr was fast on older processors (and maybe it still is on Intel processors). :closedeye

This evening I tested the code on a P4 (hyperthreading) machine at work.. the bsr-version was 20% faster than the float version :)

Back when I did assembly programming on the 386 (when bsr made its debut) it was a horribly slow instruction as well. Almost as slow as a multiply, but still the fastest way to get the highest set bit. I’ve never found a good use for it though. I’ve used it once to detect mipmap levels for a software rasterizer.

On XScale CPUs, however, there’s a very important instruction that counts the leading zeros of a dword (almost the same as bsr, just the reverse). It’s damn important for this CPU since the XScale doesn’t have a hardware divide.

tbp 101 May 11, 2006 at 07:05

Quick note: if you’re doing a micro-benchmark that takes a couple of cycles, then you’d better steer clear of system calls like PerfCounters (context switch, PIC resolution, >5000 cycles last time I measured… OK, that depends on your HAL, but still…).
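
One way to sidestep the timer overhead is to time a large batch of iterations with a single pair of clock reads, so the cost of the timer calls is spread over the whole batch. A minimal sketch using std::chrono; the names work and time_batch_ns are invented for this example, and the loop body is just a placeholder for whatever is being measured.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Placeholder workload (hypothetical): position of the highest set bit,
// computed with a simple shift loop.
inline int work(std::uint32_t v)
{
    int log = 0;
    while (v >>= 1) ++log;
    return log;
}

// Runs 'iterations' calls between two clock reads and returns the average
// time per call in nanoseconds. The checksum goes out through 'sum' so the
// compiler cannot discard the loop as dead code.
inline double time_batch_ns(std::uint32_t iterations, std::uint64_t& sum)
{
    assert(iterations > 0);
    typedef std::chrono::steady_clock steady;
    sum = 0;
    const steady::time_point start = steady::now();
    for (std::uint32_t i = 1; i <= iterations; ++i)
        sum += static_cast<std::uint64_t>(work(i));
    const steady::time_point stop = steady::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

The batch should be large enough that the measured time dwarfs the clock resolution; a few million iterations, as in the test above, is in that range on typical desktop hardware.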

PS: Writing a modern raytracer in asm? :wacko:

Nick 102 May 11, 2006 at 08:57

@tbp

PS: Writing a modern raytracer in asm? :wacko:

What do you mean?

tbp 101 May 11, 2006 at 09:13

I mean it doesn’t make sense. But we’ve already had that discussion on the defunct flipcode.

Much like you’re better off using _BitScanForward than some inline asm bsr with MSVC (or anything that follows its silly inline asm mechanism), it’s much more productive to use intrinsics if you’re trying to write a modern SSE-aware raytracer.

Because I haven’t yet seen any serious discussion about how that asm integrates with the rest (calling conventions, inlining etc).
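
For reference, a sketch of what the intrinsic route could look like for the bsr case discussed earlier in the thread. _BitScanReverse (MSVC, declared in <intrin.h>) and __builtin_clz (GCC/Clang) are real compiler intrinsics; the wrapper name highest_set_bit is invented here, and the 31 - clz arithmetic assumes a 32-bit unsigned int.

```cpp
#include <cassert>
#if defined(_MSC_VER)
#include <intrin.h>      // _BitScanReverse
#endif

// Hypothetical wrapper: index of the highest set bit of a non-zero value,
// i.e. the same number the bsr instruction produces.
inline int highest_set_bit(unsigned int value)
{
    assert(value != 0);  // like bsr itself, the result is undefined for zero
#if defined(_MSC_VER)
    unsigned long index;
    _BitScanReverse(&index, value);
    return static_cast<int>(index);
#elif defined(__GNUC__)
    return 31 - __builtin_clz(value);   // assumes unsigned int is 32 bits wide
#else
    int index = 0;                      // portable fallback: shift loop
    while (value >>= 1) ++index;
    return index;
#endif
}
```

Because the compiler sees an ordinary inline function rather than an opaque asm block, it stays free to inline the call and pick registers itself.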

I’m not saying that you shouldn’t look at the generated code (hence requiring at least some notion of asm) or fix things with ad-hoc asm when the compiler derails.
But those broad 3x speedup claims are just utter bollocks; of course, as someone said: you can write Fortran in any language.

Nick 102 May 11, 2006 at 12:40

Intrinsics are still assembly. It just integrates with the compiler pipeline and performs register allocation. But you still need ‘pure’ assembly knowledge to make good use of it. And like you said, sometimes the compiler derails, so writing inline assembly still has its merits. If you write whole loops in inline assembly it doesn’t matter how it integrates anyway, and you can control register allocation (and spilling) yourself for maximum control. You can even use __declspec(naked)…

So is assembly useful for raytracers? No doubt about it, using either intrinsics or inline assembly.

And 3x speedup is definitely possible for some classes of applications. Think about the psadbw operation, which can compute the sum of the absolute difference of 16 bytes in just a couple clock cycles. This is extremely useful for matching operations, like motion estimation in video encoding (cfr. webcam). Also, with Intel Core 2 Duo released this summer, SSE will be four times faster than the FPU in every situation. Furthermore, the extra registers lower pressure on the cache.
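
A minimal sketch of the psadbw idea using the SSE2 intrinsic _mm_sad_epu8, with a scalar loop as reference. The function names sad16 and sad16_scalar are made up for this example; on targets without SSE2 the sketch simply falls back to the scalar loop.

```cpp
#include <cassert>
#include <cstdlib>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>   // SSE2 intrinsics, including _mm_sad_epu8 (psadbw)
#define SKETCH_HAVE_SSE2 1
#else
#define SKETCH_HAVE_SSE2 0
#endif

// Reference version: sum of absolute differences of 16 bytes, one at a time.
inline int sad16_scalar(const unsigned char* a, const unsigned char* b)
{
    int sum = 0;
    for (int i = 0; i < 16; ++i)
        sum += std::abs(static_cast<int>(a[i]) - static_cast<int>(b[i]));
    return sum;
}

// Same result with one psadbw: it yields two partial sums, one per 8-byte half.
inline int sad16(const unsigned char* a, const unsigned char* b)
{
    assert(a && b);
#if SKETCH_HAVE_SSE2
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    const __m128i s  = _mm_sad_epu8(va, vb);
    // Low half's sum sits in bits 0..15, high half's sum in bits 64..79.
    return _mm_cvtsi128_si32(s) + _mm_extract_epi16(s, 4);
#else
    return sad16_scalar(a, b);
#endif
}
```

In a motion-estimation loop this would be called for every candidate block position, which is where the single-instruction version pays off.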

tbp 101 May 11, 2006 at 13:44

@Nick

Intrinsics are still assembly. It just integrates with the compiler pipeline and performs register allocation. But you still need ‘pure’ assembly knowledge to make good use of it.

’++x’, ‘a += b’, ‘p->’, ‘m’ etc… the whole - ok most - C language directly maps to “asm” by design (and by extension C++).
So that’s not news.
The real problem is indeed
a) to know what really flies on a given hardware
b) express it

@Nick

And like you said sometimes the compiler derails so writing inline assembly still has its merits. If you write whole loops in inline assembly it doesn’t matter how it integrates anyway and you can control register allocation (and spilling) yourself for maximum control. You can even use __declspec(naked)…

That’s where we disagree. Unless you have access to better asm inlining mechanisms (ie constraints with gcc, which are a freaking pain to deal with), there’s no way your asm can properly integrate with the rest of the flow: you *will* step on the toes of the optimizer. You force it to kludge around the registers you clobber (remember it’s its code that surrounds you, not the other way around). You enforce a calling convention, naked, which prohibits any inlining or shortcuts through prologue & epilogue. No dead store removal can happen. Etc… In fact it’s just a big opaque blob of untouchable bits.
So, you’d better be sure you’re going to reap enough benefits from your enlightened hand coding to pay for the pessimization happening all over.
I’m not saying it’s not possible or probable, just that it’s no panacea. And that I find it more productive to work with the compiler :)

@Nick

Also, with Intel Core 2 Duo released this summer, SSE will be four times faster than the FPU in every situation. Furthermore, the extra registers lower pressure on the cache.

Hmmk. But most compilers already produce excellent scalar SSE code.
Granted, they auto-vectorize like crap; but unless you switch to SoA, your vector code will perform like <bleep> anyway, because fundamentally that’s not how things are meant to be.
And with higher level code you don’t need to rewrite all your code because your register file has doubled, or not to the same extent.

Granted, the situation isn’t as rosy with the integer side of SSE.
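
To illustrate the SoA (structure of arrays) layout mentioned above, here is a small sketch next to the usual AoS (array of structures) form. The type and function names are hypothetical; the point is only that the SoA loop reads contiguous floats with independent iterations, and whether a given compiler actually vectorizes it is not guaranteed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// AoS: the x, y and z of one vector sit next to each other in memory.
struct Vec3 { float x, y, z; };

// SoA: all x values are contiguous, then all y values, then all z values.
struct Vec3Array
{
    std::vector<float> x, y, z;
};

// Squared length of every vector in the SoA container. Each iteration is
// independent and touches consecutive floats in three streams, which maps
// naturally onto four-wide SIMD registers.
inline void squared_lengths(const Vec3Array& v, std::vector<float>& out)
{
    assert(v.x.size() == v.y.size() && v.y.size() == v.z.size());
    out.resize(v.x.size());
    for (std::size_t i = 0; i < v.x.size(); ++i)
        out[i] = v.x[i] * v.x[i] + v.y[i] * v.y[i] + v.z[i] * v.z[i];
}
```

With the AoS Vec3, the same computation needs shuffles to get four x values into one register, which is the overhead that tends to eat the expected speedup.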

Nick 102 May 11, 2006 at 14:42

@tbp

That’s where we disagree.

Do we? ;) I never said inline assembly is superior or anything. In fact my opinion is to use high-level code as much as possible.

But all code has its benefits and so does inline assembly. When manual register allocation matters (the programmer mostly knows best which data needs fast access and what can be spilled) it’s a useful option. And by including the surrounding code into the assembly block (mostly loops and function prologue/epilogue) the overhead can be minimized. Also, intrinsics for SSE use a general-purpose register for a 16-byte aligned stack frame, so they also have an overhead. I mostly set up an aligned stack frame myself using ebp.

I absolutely agree it’s no panacea at all. And I won’t hesitate to use another approach if that’s more suited for the situation. Simple as that. In fact my performance critical application uses extremely little inline assembly. I do use tons of SIMD run-time intrinsics (dynamic code generation) though, because that’s what makes my application tick. I’d avoid that as well if it was an option. It always comes down to using the best tool for the job, and inline assembly is definitely in my toolbox.

But most compilers already produce excellent scalar SSE code.

Indeed, but that doesn’t gain us much. There’s huge potential in future processors for both integer and floating-point SIMD processing that compilers will not be able to use efficiently any time soon. So using assembly, in whichever form most applicable, is very valuable.

Personally I think it would be most useful to add SIMD support directly to the C++ language, using the syntax from HLSL. So we’d primarily have a float4 type, which automatically takes care of all alignment requirements. Intrinsics can take care of exotic instructions (or sequences of instructions) but they need a proper namespace and simple syntax, like simd::sad(byte8, byte8) instead of _mm_sad_pu8(__m64, __m64). And HLSL swizzling and masking syntax would be very useful as well. But now I’m just daydreaming I think… :whistle:

tbp 101 May 11, 2006 at 15:23

Right. I still don’t think that generally going for straight asm is worth the cost you pay for annoying the compiler, but of course that’s the same kind of broad statement i was whining about to begin with ;)

@Nick

Also, intrinsics for SSE use a general-purpose register for a 16-byte aligned stack frame, so they also have an overhead. I mostly set up an aligned stack frame myself using ebp.

Err, nope, the whole stack is promoted to 16-byte alignment, on doze/linux with msvc/gcc/icc, on proper condition.

Like you i find those intrinsics incredibly verbose, they make my eyes go on strike. It’s almost as noisy as Java *ducks*
So i never get out without my macro kit. Macros, now that’s bleeding edge!

GCC has the kind of builtin float4 type you’re looking for. But that doesn’t solve the fundamental problem. I mean it will only make the 3-component-vector-class-mapped-to-SSE syndrome more prevalent than it is (with its usual corollary question: why isn’t it faster?). On that front only better auto-vectorization will help i think.
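
For the curious, a minimal sketch of the GCC built-in vector type, assuming GCC or Clang (the vector_size attribute is a compiler extension, not standard C++, so this will not build with MSVC). The names float4, madd and hsum are chosen for this example.

```cpp
// GCC/Clang extension: a vector of four floats, 16 bytes wide and
// aligned accordingly by the compiler.
typedef float float4 __attribute__((vector_size(16)));

// Element-wise multiply-add on all four lanes: a * b + c.
inline float4 madd(float4 a, float4 b, float4 c)
{
    return a * b + c;
}

// Horizontal sum of the four lanes, using element subscripts.
inline float hsum(float4 v)
{
    return v[0] + v[1] + v[2] + v[3];
}
```

The arithmetic operators work lane by lane, so madd reads like scalar code; the horizontal sum is the kind of operation that a 3-component vector class mapped onto such a type ends up doing a lot, which is part of why it often isn’t faster.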

Sephir 101 May 11, 2006 at 16:05

Well, I am sorry if I was unclear, or not clear enough for that matter…
As I said, I got tired of C++. As I mentioned, I also code as a hobby, so here are the answers:
Q.: “Do you want asm for performance?”
A.: No. I code it just to read something other than C++.

Q.: “Why not JAVA, Csharp, other crap?”
A.: Well, a hobbyist programmer has the right and privilege to choose the language s/he wants to code in. I choose asm. That’s all. Not only that, but the others are very much alike, which would defeat the whole point of programming in a different language.

Q.: “How long have you been programming?”
A.: C++? Just 2 years. It was enough tho. I can say it’s a perfect tool. But I still got tired of it.

Q.: “Want to learn how to program assembly?”
A.: Actually not. I know what I need. I just wanted something to get going without having to go all that long way before blitting something useful to the screen. That’s why I asked for an engine or just a simple renderer, not tutorials.
I don’t know everything I should, and I am a master of neither asm nor C++, but since I don’t do it ‘for a living’ I can actually endure bugs.

@cypher543

If you are still interested, here is an open source 3D RTS created entirely in 32bit Assembly: http://www.oby.ro/rts/index.html

Thank you very much I will check it out. Looks promising.

Anyway, thank you all for your input. Very nice to see people active.

_oisyn 101 May 11, 2006 at 16:31

I’ve been programming C++ myself for over 10 years now and I’m still not tired of it ;)

Nick 102 May 12, 2006 at 07:35

@tbp

Err, nope, the whole stack is promoted to 16-byte alignment, on doze/linux with msvc/gcc/icc, on proper condition.

Compiling the following in Visual C++ 2005 in Debug mode I get “warning C4731: ‘main’ : frame pointer register ‘ebx’ modified by inline assembly code”.

void main()
{
    __declspec(align(16)) float x[4] = {0, 0, 0, 0};

    __asm
    {
        mov ebx, 0
        movaps xmm0, x
    }
}

Ironically it doesn’t really use ebx and just aligns esp like you said. So it appears to be a legacy limitation. I’m positive this caused trouble a couple years ago. Good to know it has been resolved! :yes:

SigKILL 101 May 12, 2006 at 17:07

@Nick

Compiling the following in Visual C++ 2005 in Debug mode I get “warning C4731: ‘main’ : frame pointer register ‘ebx’ modified by inline assembly code”.

void main()
{
    __declspec(align(16)) float x[4] = {0, 0, 0, 0};

    __asm
    {
        mov ebx, 0
        movaps xmm0, x
    }
}

Ironically it doesn’t really use ebx and just aligns esp like you said. So it appears to be a legacy limitation. I’m positive this caused trouble a couple years ago. Good to know it has been resolved! :yes:

Just curious, but could you explain this? I clearly see a “mov ebx,0”, but you claim it doesn’t use ebx.

monjardin 102 May 12, 2006 at 17:39

I think he is referring to the compiled C code that wraps the assembly block.

tbp 101 May 12, 2006 at 18:20

No. That ‘x’ is allocated on the stack, and it’s the compiler that resolves that C/C++ symbol from the inline asm part, but through esp, not ebx… because the whole stack is properly aligned as said earlier.

BTW Nick, that’s

int main()

;)

Nick 102 May 12, 2006 at 18:23

@SigKILL

Just curious, but could you explain this? I clearly see a “mov ebx,0”, but you claim it doesn’t use ebx.

It doesn’t use ebx as an aligned stack pointer. On older Visual C++ compilers the “movaps xmm0, x” translates to “movaps xmm0, [ebx+<offset of ‘x’ on aligned stack>]”. So that would cause trouble if I used ebx myself (like assigning zero in my example). Luckily Visual Studio 2005 uses the standard esp and ebp registers for stack access, so ebx is available for us. But apparently the warning still exists.

monjardin 102 May 12, 2006 at 18:27

Well, that’s what I meant. I’m a bit out of my element. :wacko: I haven’t done much x86 assembly in years.
I did get a 3-4x speed-up in some code I rewrote in assembly for a microcontroller earlier this week. :clapping:

Nick 102 May 12, 2006 at 18:36

@tbp

BTW Nick, that’s

int main()

;)

I like consistency. :whistle:

karligula 101 May 12, 2006 at 19:05

One good and undeniable reason to learn assembler is to write compilers… somebody’s got to do it, they don’t write themselves!

monjardin 102 May 12, 2006 at 19:11

Oddly enough, I think most C compilers are written in C. It’s not unusual for a compiler to be written in the language that it compiles.

tbp 101 May 12, 2006 at 19:22

@monjardin

Oddly enough, I think most C compilers are written in C. It’s not unusual for a compiler to be written in the language that it compiles.

Yeah, so it can bootstrap/compile itself. Many are written in C because historically there was no C++ when they first saw the light.
Unlike http://llvm.org/

_oisyn 101 May 12, 2006 at 21:14

@monjardin

Oddly enough, I think most C compilers are written in C. It’s not unusual for a compiler to be written in the language that it compiles.

And what kind of output do you think a C compiler has? ;)

SigKILL 101 May 12, 2006 at 21:50

@Nick

It doesn’t use ebx as an aligned stack pointer. On older Visual C++ compilers the “movaps xmm0, x” translates to “movaps xmm0, [ebx+<offset of ‘x’ on aligned stack>]”. So that would cause trouble if I used ebx myself (like assigning zero in my example). Luckily Visual Studio 2005 uses the standard esp and ebp registers for stack access, so ebx is available for us. But apparently the warning still exists.

Well, Google didn’t give me an answer, so please correct me if I’m wrong. But if I understand you correctly, VC will use an aligned stack (in addition to the ‘regular’ stack) when using non-standard alignment (using ebx as the stack pointer). But why doesn’t the compiler use the ‘regular’ stack and simply skip bytes to get everything properly aligned? It also seems likely that having a lot of local variables with different alignments would make this quite messy… I have a feeling there is something I’ve misunderstood in this matter…

6b7e1a4b42e4b47d92fdef8bf2bd8e2c
0
Jare 101 May 12, 2006 at 23:22

@SigKILL

why doesn’t the compiler use the ‘regular’ stack and simply skip bytes to get everything properly aligned?

I suppose alignment happens at runtime, and therefore the amount of padding bytes in the stack varies, even if the total allocation doesn’t - it would just be the largest amount of padding needed. The compiler would emit a code sequence that assigns ebx to the correctly aligned address, and index from there.
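
To make that concrete, here is a minimal C++ sketch of the same idea: reserve the worst-case padding up front, then round the address up at run time. The names `align_up` and `aligned_slot` are made up for illustration; this is not what Visual C++ actually emits, just the arithmetic behind it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round an address up to the next multiple of `alignment`.
// The alignment must be a non-zero power of two.
inline std::uintptr_t align_up(std::uintptr_t address, std::uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (address + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: given a raw buffer that was over-allocated by
// `alignment - 1` bytes, return the first suitably aligned address in it.
// How many bytes get skipped depends on where the buffer starts,
// but the total reservation stays the same.
inline void* aligned_slot(void* raw, std::size_t alignment)
{
    std::uintptr_t address = reinterpret_cast<std::uintptr_t>(raw);
    return reinterpret_cast<void*>(align_up(address, alignment));
}
```

For a 16-byte aligned float[4] you would reserve 16 + 15 bytes and index from the pointer that `aligned_slot` returns, which is roughly the role ebx plays in the description above.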

A8433b04cb41dd57113740b779f61acb
0
Reedbeta 167 May 13, 2006 at 01:24

Compilers (well, at least gcc) usually tend to align new stack frames on 16-byte or 32-byte boundaries anyway, right? In looking at the assembly from gcc you frequently see “sub $8, %esp” or something similar immediately before parameters are pushed and a call made. I always assumed that extra stack space was padding for alignment reasons…

2fcd95b0b62d18275c6b5a6f23f29791
0
tbp 101 May 13, 2006 at 01:44

For gcc it’s a bit different as -mpreferred-stack-boundary has been there for ages.

http://gcc.gnu.org/onlinedocs/gcc-2.95.3/gcc_2.html#SEC31

Works better with a saner ABI :whistle:

Da26e799270ce5e8b62659ed77b11cef
0
Axel 101 May 13, 2006 at 19:57

@Nick

I like consistency. :whistle:

It is consistent. The spec says main has an implicit zero return value if you don’t return anything yourself.

99f6aeec9715bb034bba93ba2a7eb360
0
Nick 102 May 13, 2006 at 23:36

@Axel

It is consistent. The spec says main has an implicit zero return value if you don’t return anything yourself.

That’s exactly what I meant. The C++ specifications say the return type is int, yet it’s not required to return anything (not consistent). So specifying void is strictly speaking out of spec although lots of compilers support it. But I like consistency more than obliging some stupid standard. ;)

A8433b04cb41dd57113740b779f61acb
0
Reedbeta 167 May 14, 2006 at 02:03

Why not just declare int main() and throw in the return 0 at the end? Then you’d have consistency, and keep the standard-yappers quiet too ;)

Da26e799270ce5e8b62659ed77b11cef
0
Axel 101 May 14, 2006 at 08:56

Especially because every C-program returns a value to the operating system, so “void” makes no sense either.

99f6aeec9715bb034bba93ba2a7eb360
0
Nick 102 May 14, 2006 at 09:27

@Reedbeta

Why not just declare int main() and throw in the return 0 at the end? Then you’d have consistency, and keep the standard-yappers quiet too ;)

Because that would make every mini-example two lines longer for no practical reason.

99f6aeec9715bb034bba93ba2a7eb360
0
Nick 102 May 14, 2006 at 09:36

@Axel

Especially because every C-program returns a value to the operating system, so “void” makes no sense either.

Specifying void just means you don’t care about that.

And it’s not part of the language, nor does it alter the program’s behaviour. If an O.S. decided that every program has to reset every register to zero, is that our concern? Likewise, if there’s an O.S. that doesn’t use the return value mechanism then I don’t want it to be forced in the language.

340bf64ac6abda6e40f7e860279823cb
0
_oisyn 101 May 15, 2006 at 09:44

How is it not part of the language? The specification specifically says: main() shall return int.

main isn’t just a function like any other. You are not allowed to call main() yourself, and compilers are allowed to interpret the ‘return x’ as ‘exit(x)’ (although local objects will be destroyed first, as opposed to when calling exit() yourself)

C4b4ac681e11772d2e07ed9a84cffe3f
0
kusma 101 May 15, 2006 at 10:05

it’s not part of the grammar, but it is indeed part of the language.

99f6aeec9715bb034bba93ba2a7eb360
0
Nick 102 May 15, 2006 at 13:00

@.oisyn

How is it not part of the language? The specification specifically says: main() shall return int.

Sure, it’s in the specification. But it’s in there because the O.S. expects it, not because the language would be impossible without it. As kusma suggests, I should probably have said “it’s not part of the grammar”. You can write a perfectly valid program without returning anything to the O.S. (even if behind our backs the compiler adds a ‘return 0’ to keep the O.S. happy). There are plenty of languages that don’t return anything. And for example C# allows either void or int in a consistent way. I see little reason why I can’t expect the same from C++, even if that means breaking compatibility with some legacy compilers. Deprecation is a necessary step to improve things. I just like to give it a push. :happy:

340bf64ac6abda6e40f7e860279823cb
0
_oisyn 101 May 15, 2006 at 14:07

It’s there because the specification describes how a well-formed program entrypoint should be defined, not whether you need one (one can be supplied by the implementation, for example). You are not allowed to predefine main yourself, nor are you allowed to use it for any other entity (such as a variable) in the global namespace. If a framework needs an entry point that doesn’t comply to the definition of main, it shouldn’t use main, it’s as simple as that :). Main is reserved for a function that returns int, basta.

And grammar isn’t the correct term, that’s only about parsing C++. That void* can’t be implicitly cast to MyType* isn’t in the grammar either, because it is part of the semantics ;)
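
A small sketch of that void* point, assuming a made-up `MyType` and a hypothetical helper `read_value`. Both forms of the assignment parse fine; it is the semantic rules that reject the implicit one in C++.

```cpp
#include <cassert>

// Hypothetical type, for illustration only.
struct MyType { int value; };

inline int read_value(void* opaque)
{
    assert(opaque != nullptr);
    // MyType* p = opaque;                     // parses, but is ill-formed in C++ (accepted in C)
    MyType* p = static_cast<MyType*>(opaque);  // the conversion has to be spelled out
    return p->value;
}
```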

2fcd95b0b62d18275c6b5a6f23f29791
0
tbp 101 May 15, 2006 at 15:44

@Nick

I see little reason why I can’t expect the same from C++, even if that means breaking compatibility with some legacy compilers. Deprecation is a necessary step to improve things.

Your honour,
given that it wasn’t mandated by the C spec but it is by the C++ spec, wouldn’t that rather be superprecation? Or undeprecation? :blink:

87e614b8b888bb2c4485c1ac16d8c779
0
moe 101 May 15, 2006 at 18:40

Good point tbp.

I’m not sure about the
__declspec(align(16)) float x[4] = {0, 0, 0, 0};
but I think Nick’s code would compile as both C and C++. Since there is the same asm-code in both cases, it is safe to say that .oisyn as well as Nick are right even though they are having opposite opinions. It’s a perfect example how you can literally create anything with C or C++. Even a paradox. You could not have done the same directly in asm!

Sorry, I couldn’t resist… :)

99f6aeec9715bb034bba93ba2a7eb360
0
Nick 102 May 15, 2006 at 23:37

I just love pointless discussions… :wub:

340bf64ac6abda6e40f7e860279823cb
0
_oisyn 101 May 16, 2006 at 00:33

@tbp

Your honour,
given that it wasn’t mandated by the C spec but it is by the C++ spec, wouldn’t that rather be superprecation? Or undeprecation? :blink:

btw, main is restricted to return int in C as well since C99 ;)

moe: Given the fact that implementation-specific features are used anyway (__declspec, __asm), I don’t think whether or not it should compile is really an issue, as most compilers allow main returning void anyway :P

340bf64ac6abda6e40f7e860279823cb
0
_oisyn 101 May 16, 2006 at 00:51

Anyway, maybe a more practical argument not to use void (or anything other than int for that matter) is that the runtime expects main to return an int, to pass the return value to the exit() function. If you define main as returning void, the result is actually undefined (the eax register is not set to 0 on x86 for example), which may lead to unexpected behaviour when another program is monitoring the exit code of your process :blink:

A8433b04cb41dd57113740b779f61acb
0
Reedbeta 167 May 16, 2006 at 05:48

@.oisyn

If you define main as returning void, the result is actually undefined (the eax register is not set to 0 on x86 for example)

@Axel

The spec says main has an implicit zero return value if you don’t return anything yourself.

:whistle:
(this message is at least 10 characters)

340bf64ac6abda6e40f7e860279823cb
0
_oisyn 101 May 16, 2006 at 08:44

Why would it return 0 if the return-type is void? The implicit return value only holds when the return-type is int, you can’t return 0 from a void function :blink:

99f6aeec9715bb034bba93ba2a7eb360
0
Nick 102 May 16, 2006 at 12:30

@.oisyn

Why would it return 0 if the return-type is void? The implicit return value only holds when the return-type is int, you can’t return 0 from a void function :)

Yet that’s exactly what C# does.

For a C++ console project in Visual C++ ‘main’ is not the actual entry point, it’s ‘mainCRTStartup’. So if ‘main’s return type is void then ‘mainCRTStartup’ can still return a useful value to the O.S. Every compiler supporting ‘void main’ has to use a similar mechanism. Not the slightest problem. And fully consistent.

Besides, let’s face it, what’s the worst that could happen? If a compiler doesn’t support it, and these are practically extinct, then it will give you an error message. In this exceptional situation you can simply write ‘int main’ and you’re done. People using such strict compilers are always aware of this. And I’m perfectly ok if anyone writes ‘int main’, I just think it’s silly to argue that it’s wrong when I write ‘void main’.

Heck, only GCC truly attempts to be ANSI compliant, and only when including the -ansi option.

99f6aeec9715bb034bba93ba2a7eb360
0
Nick 102 May 16, 2006 at 12:51

Trying to get back on topic, at least a little, here’s some intriguing C/C++ code:

int i = 0;
function(i++, i++);

Will this call function(0, 0), function(0, 1) or function(1, 0)? The standards do not specify! Talking about consistency…

Assembly makes everything explicit. Even though that’s not a reason to write everything in assembly, it’s extremely useful to know assembly and step into the assembly debugger when you don’t understand what a C/C++ statement is really doing. It also helps getting a deeper understanding of high-level concepts like virtual function calls. So you’ll make a lot less mistakes, write more efficient C/C++ code, and it speeds up debugging.

340bf64ac6abda6e40f7e860279823cb
0
_oisyn 101 May 16, 2006 at 15:19

@Nick

For a C++ console project in Visual C++ ‘main’ is not the actual entry point, it’s ‘mainCRTStartup’. So if ‘main’s return type is void then ‘mainCRTStartup’ can still return a useful value to the O.S. Every compiler supporting ‘void main’ has to use a similar mechanism. Not the slightest problem. And fully consistent.

No, because the runtime can’t detect whether you specify main returning an int or a void. So the compiler should return a 0 anyway (VC++ actually does this)

Besides, let’s face it, what’s the worst that could happen? If a compiler doesn’t support it, and these are practically extinct, then it will give you an error message. In this exceptional situation you can simply write ‘int main’ and you’re done. People using such strict compilers are always aware of this. And I’m perfectly ok if anyone writes ‘int main’, I just think it’s silly to argue that it’s wrong when I write ‘void main’.

You can find it silly all you want, but you can find it just as silly that, for example, void pointers can’t be implicitly cast to any other pointer even if your compiler supports it. The fact is that there is a standard, and that strictness counts. If you don’t agree with the standard, that’s fine by me, and by all means write a letter to the committee, but until it’s changed writing void main() isn’t pure C++ and you’ll get remarks on forums like these when you do :blink:

Heck, only GCC truly attempts to be ANSI compliant, and only when including the -ansi option.

*cough*Comeau*cough*
A very nice fully compliant compiler that compiles to C code compatible with all the major compilers. It even supports export on templates (now that is something that should be deprecated).

@Nick

Trying to get back on topic, at least a little, here’s some intriguing C/C++ code:

int i = 0;
function(i++, i++);

Will this call function(0, 0), function(0, 1) or function(1, 0)? The standards do not specify! Talking about consistency…

It has nothing to do with consistency (what actually isn’t it consistent with?), and besides, the standard does specify it: it specifies that a call like that is undefined. It may as well format your harddrive :P, so don’t use it. And assembly has its undefinedness as well, like doing a bsf or bsr on a zero input.
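
For the bsf case, the usual fix is to handle zero yourself before scanning. A minimal portable sketch, with `lowest_set_bit` as a made-up name and a plain loop standing in for the instruction (an intrinsic would be faster, but the zero check is the point here):

```cpp
#include <cassert>
#include <cstdint>

// Index of the lowest set bit, or -1 when no bit is set.
inline int lowest_set_bit(std::uint32_t value)
{
    if (value == 0)
        return -1;              // the input bsf leaves undefined
    int index = 0;
    while ((value & 1u) == 0) { // shift until the lowest bit is set
        value >>= 1;
        ++index;
    }
    return index;
}

// Quick self-check of the helper.
inline void lowest_set_bit_check()
{
    assert(lowest_set_bit(0) == -1);
    assert(lowest_set_bit(8) == 3);
}
```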

B91eae75cd6245bd8074bd0c3f1cc495
0
Nils_Pipenbrinck 101 May 16, 2006 at 18:45

@.oisyn

And assembly has its undefinedness as well, like doing a bsf or bsr on a zero input.

Exactly. Ever tried bswap eax with an operand-size override prefix?

db 0x66
bswap eax

It does funny things on different cpus.

340bf64ac6abda6e40f7e860279823cb
0
_oisyn 101 May 16, 2006 at 19:36

Isn’t that just ‘bswap ax’? Or do you mean it is the top 16 bits of eax that are undefined?

B91eae75cd6245bd8074bd0c3f1cc495
0
Nils_Pipenbrinck 101 May 16, 2006 at 19:58

@.oisyn

Isn’t that just ‘bswap ax’? Or do you mean it is the top 16 bits of eax that are undefined?

It is just bswap ax. Sometimes it swaps the upper bytes of eax, and zeroes out the lower two bytes.. sometimes it swaps ah with al… sometimes it’s the same as bswap eax, sometimes it does nothing at all..

Highly undefined behaviour.
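
If all you want is the two bytes of a 16-bit value swapped, you can avoid the prefixed bswap entirely (in assembly, xchg al, ah does it). A minimal C++ sketch with a made-up helper name, using plain shifts and leaving the instruction choice to the compiler:

```cpp
#include <cassert>
#include <cstdint>

// Swap the two bytes of a 16-bit value without relying on bswap.
inline std::uint16_t byte_swap16(std::uint16_t value)
{
    // The operands are promoted to int, so the shift cannot overflow;
    // the cast truncates the result back to 16 bits.
    return static_cast<std::uint16_t>((value << 8) | (value >> 8));
}

// Quick self-check of the helper.
inline void byte_swap16_check()
{
    assert(byte_swap16(0x1234) == 0x3412);
}
```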

68b0a5876289e2b5351cc1956ba80dc8
0
Almos 101 May 17, 2006 at 07:31

As a side note… Alessandro Ghignola created a 3d engine in assembly by embedding asm into C++ code. Good news for us is that he released the code, which can be downloaded HERE:

http://anywherebb.com/postline/posts.php?t=409&p=2

Look under “Noctis IV source code”.

Good luck trying to figure it out.

647e430ca6b2c38d008dc55b1c3a7ecc
0
karligula 101 May 26, 2006 at 17:55

Regarding the

int i=0;
function(i++,i++);

thing… isn’t the comma a sequence point, meaning that the expressions are evaluated in that sequence… so it should become function(0,1) and leave i=2…?

6f0a333c785da81d479a0f58c2ccb203
0
monjardin 102 May 26, 2006 at 18:49

No, the compiler has leeway to rearrange the order in which function parameters are evaluated for optimization purposes.
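
A minimal sketch that makes the leeway visible without touching undefined behaviour. The names `trace`, `add` and `g_order` are made up for the example. Each argument is a separate function call that logs itself, so the program is valid either way, but the log order depends on the compiler.

```cpp
#include <cassert>
#include <vector>

// Log of the order in which the arguments were evaluated.
std::vector<int> g_order;

int trace(int id)
{
    g_order.push_back(id);
    return id;
}

int add(int a, int b) { return a + b; }

// Well defined, but g_order may end up {1, 2} or {2, 1}.
int unspecified_order()
{
    g_order.clear();
    return add(trace(1), trace(2));
}

// Naming the intermediate values pins the order down.
int fixed_order()
{
    g_order.clear();
    int first = trace(1);
    int second = trace(2);
    assert(g_order.size() == 2 && g_order[0] == 1 && g_order[1] == 2);
    return add(first, second);
}
```

If the order matters, write it the second way and the question goes away.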

99f6aeec9715bb034bba93ba2a7eb360
0
Nick 102 May 26, 2006 at 19:20

As a matter of fact Visual C++ 2005 evaluates them from right to left. The most probable reason is that most calling conventions require arguments to be pushed on the stack right to left. And the reason for that is so the arguments are in a logical left to right order on the stack, because the stack ‘grows’ downward. :surrender

6f0a333c785da81d479a0f58c2ccb203
0
monjardin 102 May 26, 2006 at 20:24

It also allows functions like sprintf to work since the first argument is always on top of the stack.
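
A minimal sketch of the same mechanism from the C++ side, using the standard <cstdarg> macros. `sum_ints` is a made-up function: like the format string in sprintf, its fixed first argument is what tells the callee how to read the rest.

```cpp
#include <cassert>
#include <cstdarg>

// Adds up `count` int arguments that follow the fixed first argument.
int sum_ints(int count, ...)
{
    assert(count >= 0);
    va_list args;
    va_start(args, count);          // start right after the last named parameter
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += va_arg(args, int); // fetch the next variable argument
    va_end(args);
    return total;
}
```

Calling sum_ints(3, 1, 2, 3) gives 6. The callee never learns how many arguments were actually passed, so it has to trust that first argument.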

340bf64ac6abda6e40f7e860279823cb
0
_oisyn 101 May 26, 2006 at 21:24

@karligula

Regarding the

int i=0;
function(i++,i++);

thing… isn’t the comma a sequence point, meaning that the expressions are evaluated in that sequence… so it should become function(0,1) and leave i=2…?

No. While the comma operator is indeed a sequence point, the comma in the function call expression does not behave like the comma operator.
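
A minimal sketch of the difference, with made-up function names. An extra pair of parentheses turns the comma into the comma operator, which does sequence its operands left to right and yields the right-hand value.

```cpp
#include <cassert>

int identity(int x) { return x; }

int comma_operator_result()
{
    int i = 0;
    // Comma operator: the left i++ completes (i becomes 1),
    // then the right i++ yields 1 and leaves i at 2.
    int result = (i++, i++);
    assert(i == 2);
    return result;
}

int single_argument_call()
{
    int i = 0;
    // The inner parentheses make this ONE argument built with the
    // comma operator, not two separate arguments.
    return identity((i++, i++));
}
```

Both functions return 1. Without the inner parentheses the commas merely separate arguments and impose no such ordering.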