0
103 Jan 25, 2009 at 19:01

Greetings. Long time no post. Long post.

I’m writing a routine to convert a float number from float to string format.
I’m aware there’s sprintf() to do the job already, no worry.

My goal is to make conversion from float to string without calling sprintf(), AND without using any float datatype nor the pow() function.

My function is actually ready and working.
Now I would like to make it faster, because there’s lot of room for improvement.
Full code is very long to show, so I won’t, unless you ask.

My first issue was to calculate a power of 2, or an inverse power of 2 where required, by using only integer math.
Given the limited amount of significant digits to show, for the final rapresentation of a float, I concluded that using unsigned __int64 to calculate these powers was enough.

Power of 2 are calculated in a simple manner: keep multipling by 2 so long you are not about to cause an overflow. If overflow is about to happen, discard the last digits of the current power ( Pow2 /= 10; ), take note that you lost a digit, perform rounding if needed, and resume the multiplications by 2.

The inverse of a power of 2 is identical in mechanics, with the differece that I calculate powers of 5 and then adjust the exponent by dividing for the corresponding power of 10.
In other words:

//     -n    n     n
//    2   = 5  / 10


Of course I adopt basic optimizations to cut the calculation times of the first powers.
For example the first 27 powers of 5 are pre-generated and held in a lookup array of DWORD64.

The first place to optimize is where I discard the last digit of a power, then perform the rounding.

// If current power is any greater than this, the next power will overflow.
#define kThreshold ((DWORD64 (1) << 63) - 1)

DWORD64 Pow2;
.
.
DWORD Lost = 0;
while (Exponent--)
{
// Are we about to overflow?
if (Pow2 > kThreshold)
{
//////////////////////////////////
// Yes, discard the last digit.

// Reduce current power to 1/10th.
DWORD64 _10th = Pow2 / 10;

// Retrieve the lost digit.
const DWORD Digit = DWORD (Pow2 - (_10th * 10));

// Perform rounding (if necessary).
if (Digit > 4) ++_10th;

++Lost; // Keep note we lost another digit.
Pow2 = _10th;
}

// Calculate next power.
Pow2 *= 2;
}

#undef kThreshold


Very basic, as you see.
I know “Pow2 / 10;” can be rewritten as “(Pow2 >> 1) / 5;”, which should (in theory) speed up the division, but it also creates 1 extra temporary value from “(Pow2 >> 1)”.
Profiling showed no sensible performance gain =/
The same goes for the “(_10th * 10)” which may become “((_10th * 5) << 1)”.

What bothers me most, here, is the approach I follow to get the value of Digit.
I could obtain it by doing “Pow2 % 10”, but that would be a(nother) division.
And I need to perform “Pow2 / 10” anyway, so I prefer the multiplication method.

I was wondering if there’s a clever trick to isolate the last digit of a number in base 10.
Can’t think of any…

Another candidate for optimization is where I actually sum the calculated powers to reconstruct the Integer and Fractional parts of the float.
Using DWORD64 to store the powers, give me only 19 digits of precision.

So I thought to convert those powers to string form, and sum them, digit by digit.
The sum process itself is not as slow as one might imagine.
The real problem is the conversion from integers to string form.

The idea to use strings was born to finish the routine quickly.
Now I need to make things better.

Any suggestions?

Ciao ciao : )

[EDIT]
Oops… after writing this I went back to the program and ‘noticed’ I was doing a big stupid thing.
There’s no need to sum the individual powers one by one. I get the same result by offsetting the bits for the integer or fractional part.

I got carried away with the calculation of powers and lost sight of the whole :p

Now the times required for converting to string format have decreased (roughly) from 1/3rd to 1/7th.

The problem of extracting the least significant digit from a number in base 10 remains, though.
Can’t stop thinking there must be a faster way than what I do now…

Ciao ciao : )
[/EDIT]

#### 5 Replies

0
103 Jan 26, 2009 at 22:05

I played with bits and Calc.exe. That is, I experimented loosely like I do when I’m lost (everyone has his own way to deal with problems).
Looking for a way to simplify the >> 1, then / 5 thing, I noticed a bizarre property.
(I call it ‘property’ now that I see it is not the coincidence I thought in the beginning).

Suppose you want to divide a 16 bits number by 10.

Example:
59561 (binary: 1110.1000 | 1010.1001)

Goal is: 59561 / 10 = 5956 (binary: 1.0111 | 0100.0100)

If we take 16 bits all ON, we obtain this value: 65535 (such a discovery :p )
Divide this number by 5, and obtain 13107. A magic number.

Now, if we multiply 59561 by 13107, we obtain:
780666027 (binary: 10.1110 | 1000.1000 | 0000.0100 | 1010.1011).

We can almost recognize in this thing the bits for the 5956 we are after…
In fact if we add 1, and then divide by 2 (or shift right by 1), we obtain:
390333014 (binary: 1.0111 | 0100.0100 | 0000.0010 | 0101.0110)

Divide this by 65535 (or shift it right by 16 bits), and what remain will spell: 5956.1000etc, the integer part of which is 5956 : )
And the division by 10 is done without using division.

Honestly I don’t understand why it is so. But I see it is so every time.

What I described works for any 16 bit number to divide by 10.

If I want to use it for 8 bits numbers, the magic number is: ((1 << 8) - 1) / 5
And then take the higher 8 bits of the final 16 bits value.

Similarly for a 32 bits number to divide by 10, the magic number is: ((1 << 32) - 1) / 5
And then take the higher 32 bits of the final 64 bits value.

So… for a 64 bits number to divide by 10 (what I’m after), I have to take the higher 64 bits of the final 128 bits value.
Switching to 128 bits integers is too much, I think.

I’m lost again. Help?

Ciao ciao : )

0
165 Jan 26, 2009 at 22:39

I think you’ve (re)discovered the fact that in modular arithmetic, division can be expressed as multiplication (just like subtraction can be expressed as addition). The number 13107 is the multiplicative inverse of 5, in mod 2\^16 arithmetic.

One curious thing about this kind of inverse is that in mod p arithmetic, inverses only exist for numbers relatively prime to p. So in this case p = 2\^16, and so inverses exist for all odd numbers, but no even ones. Therefore you can divide by any natural number with a combination of bit-shifting and multiplication. Isn’t math fun? :)

0
103 Jan 26, 2009 at 23:16

I had the strong feeling this wasn’t new. It just couldn’t be : )
I happened to find out, by luck, something I don’t comprehend.

Modular Multiplicative Inverse???
All I knew was that by multiplying a number by some other big number, eventually an overflow would lead somewhere interesting.

Shocking reasoning to hear, I suppose, but Math was never in my school program.
And I have no math-geek close friend, everyone seem to have, I can ask to.

I need to stick to 64 bits integers.
So I need to find those higher 64 bits I’m after, already placed in the lower 64 bits of the integer used for calculations.

I imagine there’s a magic number that does the trick.
In a decade or so I will find it…

Ciao ciao : )

0
101 Jan 27, 2009 at 10:37

Hey Nautilus,

Have you taken a look at the compiler output? Most likely the compiler will already do the multiplicative inverse trick for you if you divide by a constant.

A recent GCC for example turns the following oode.

int t1 (unsigned int n)
{
return n/10;
}


Into this assembler code:

_t1     LABEL NEAR
push    ebp                                     ; 0000 _ 55
mov     eax, 3435973837                         ; 0001 _ B8, CCCCCCCD
mov     ebp, esp                                ; 0006 _ 89. E5
mul     dword ptr [ebp+8H]                      ; 0008 _ F7. 65, 08
pop     ebp                                     ; 000B _ 5D
shr     edx, 3                                  ; 000C _ C1. EA, 03
mov     eax, edx                                ; 000F _ 89. D0
ret                                             ; 0011 _ C3


So x/10 is the same as:

 (x*0xCCCCCCCD) >> (32+3)


(little hint for x86 assembly: after mul the top 32 bit of the result end up in EDX, and the low 32 bit in EAX.. That’s where the shift by 32 comes from that you won’t find in the asm-code. The compiler picks the correct register at the first place).

0
103 Jan 27, 2009 at 15:10

@ Nils:

Aye, you are right.
Here is my output for division of a 32 bits integer.

//-------
// source
DWORD DivideBy10 (DWORD n)
{
return n / 10;
}

//-------------------
// code inside main()
DWORD In  = DWORD (::rand ());
DWORD Out = DivideBy10 (In);

//-----------
// asm output
PUBLIC  ?DivideBy10@@YAKK@Z             ; DivideBy10
; Function compile flags: /Ogtpy
; File e:\programs\vs 2005\projects\test\main.cpp
;   COMDAT ?DivideBy10@@YAKK@Z
_TEXT   SEGMENT
_n$= 8 ; size = 4 ?DivideBy10@@YAKK@Z PROC ; DivideBy10, COMDAT ; 11 : return n / 10; mov eax, -858993459 ; cccccccdH mul DWORD PTR _n$[esp-4]
mov eax, edx
shr eax, 3

; 12   : }

ret 0
?DivideBy10@@YAKK@Z ENDP                ; DivideBy10


But look what happens when it’s a division of a 64 bits integer:

//-------
// source
DWORD64 DivideBy10 (DWORD64 n)
{
return n / 10;
}

//-------------------
// code inside main()
DWORD64 In  = DWORD64 (::rand ());
DWORD64 Out = DivideBy10 (In);

//-----------
// asm output
PUBLIC  ?DivideBy10@@YA_K_K@Z               ; DivideBy10
; Function compile flags: /Ogtpy
; File e:\programs\vs 2005\projects\test\main.cpp
;   COMDAT ?DivideBy10@@YA_K_K@Z
_TEXT   SEGMENT
_n$= 8 ; size = 8 ?DivideBy10@@YA_K_K@Z PROC ; DivideBy10, COMDAT ; 11 : return n / 10; mov eax, DWORD PTR _n$[esp]
mov ecx, DWORD PTR _n\$[esp-4]
push    0
push    10                  ; 0000000aH
push    eax
push    ecx
call    __aulldiv

; 12   : }

ret 0
?DivideBy10@@YA_K_K@Z ENDP              ; DivideBy10


It calls a __aulldiv function (“ull” = Unsigned Long Long, like the suffix for literals?).
I’m sure it’s a generic function, in that it doesn’t specialize in any division.
I changed source code to divide by 17 or 73 (prime numbers), the asm output was the same (save for the “push 10”).

Searching for __aulldiv, I have found this [ link ]
Looks interesting, but it’s all greek to me. How sad…

Ciao ciao : )

[EDIT]
I see the asm invokes an __allmul even when multiplying a DWORD64 by 5, like I do when calculating powers of 5:

DWORD64 Pow5 = /*...*/
.
.
Pow5 *= 5;


I tried to substitute the multiplication with “Pow5 = (Pow5 << 2) + Pow5;”, but the compiler sees what I’m doing and reverts to __allmul again.

But if I do this (in place of “Pow5 *= 5;” or “Pow5 = (Pow5 << 2) + Pow5;”):

DWORD64 Pow5 = /*...*/
.
.

union xPOW5
{
DWORD64 _64;
struct sPOW5
{
DWORD l;
DWORD h;
} _32;
};

xPOW5&  pow5 = (xPOW5&) Pow5; // Reference the actual Pow5.
DWORD64 _t64 = Pow5;          // Backup of actual Pow5.

//   doing     this
// |----------------|
// Pow5 = (Pow5 << 2) + Pow5;
pow5._32.h <<= 2;
pow5._32.h |= pow5._32.l >> 30;
pow5._32.l <<= 2;

//                    now this
//                    |-----|
// Pow5 = (Pow5 << 2) + Pow5;
pow5._64 += _t64;


The compiler won’t see I’m multiplying by 5, won’t call __allmul, and will produce a faster asm.
Only a slight improvement, TBH, but consistent.

Now, if I can get rid of __aulldiv as well…

Ciao ciao : )
[/EDIT]