Jump to content


Fast and accurate sine/cosine


103 replies to this topic

#61 BabyUniverse

    New Member

  • Members
  • Pip
  • 1 posts

Posted 24 February 2009 - 03:01 PM

For the arcsine guy, one big problem is that the derivative of arcsin/arccos(which is pi/2-arcsin) goes to infinity at the boundary, so polynomials will never work there. One trick is to let yourself take a square root if you're close to the edge, then those evaulations will be slow (but still ~4 times faster than an arcsine), but for most of the range you are much faster. Yes, ugly if, but not much of a way around it...

Sample code, good to <10^-4, average abs err of 10^-5, ~10 times faster on average than arcsine, and apologies for the line breaks:

typedef double mytype;  //try float here too, if you like

const mytype asin4_params1[5]={   6.32559537178112e-05,   9.97002719101181e-01,   3.23729856176963e-02,   3.89287300071597e-02,   1.93549238398372e-01};
const mytype asin4_params2[7]={   2.09625797161885e+01,  -1.74835553411477e+02,   6.13575281494908e+02,  -1.14033116228467e+03,   1.19159992307311e+03,  -6.63957441058529e+02,   1.54421991537526e+02};
const mytype asin4_params3[4]={   1.57080010233116e+00,  -1.41437401362252e+00,   1.84777752400778e-03,  -1.24625163381900e-01};
const mytype asin4_split1=0.6;
const mytype asin4_split2=0.925;
static inline mytype asin4(mytype x)
{
  if (x<asin4_split1)
    return asin4_params1[0]+x*(asin4_params1[1]+x*(asin4_params1[2]+x*(asin4_params1[3]+x*(asin4_params1[4]))));
  if (x<asin4_split2)
    return asin4_params2[0]+x*(asin4_params2[1]+x*(asin4_params2[2]+x*(asin4_params2[3]+x*(asin4_params2[4]+x*(asin4_params2[5]+x*asin4_params2[6])))));
  mytype xx=sqrt(1-x);
  return asin4_params3[0]+xx*(asin4_params3[1]+xx*(asin4_params3[2]+xx*asin4_params3[3]));
}


#62 Nick

    Senior Member

  • Members
  • PipPipPipPip
  • 1227 posts
  • LocationOttawa, Ontario, Canada

Posted 24 February 2009 - 07:16 PM

Great idea to use a square root in just the steep part of the curve!

#63 aPackOfWankers

    New Member

  • Members
  • Pip
  • 1 posts

Posted 09 March 2009 - 02:33 AM

Am a bit late to the party here. Nicks sin() is great.

Ive implemented it for the iPhone in ARM assembler using their VFP SIMD unit. Getting about 10.5 times the std lib sinf() performance.

You can save some work by combining range reduction with the evaluation, and by making the evaluation operate on the range -0.5...0.5 rather than -pi/2...pi/2.

The ARM VFP unit is heavily pipelined, and one issue with this code is that its not very friendly on deep pipelines - lots of dataflow dependencies in there.

float fast_sin(float x)
{
	// simplify the range reduction
	// reduce to a -0.5..0.5 range, then apply polynomial directly
	const float P = 0.225; // const float Q = 0.775;
	
	x = x * M_1_PI;
	int k = (int) round(x);
	x = x - k;

	float y = (4 - 4 * abs(x)) * x;

	y = P * (y * abs(y) - y) + y;

	return (k&1) ? -y : y;
}


#64 Andy201

    New Member

  • Members
  • Pip
  • 3 posts

Posted 26 March 2009 - 02:25 PM

I ended up using this in the end:

inline float fastAcos(float x)
{
return sqrt(1-x)*(1.5707963267948966192313216916398f + x*(-0.213300989f + x*(0.077980478f + x*-0.02164095f)));
}


When interleaved with other code the Arm/VFP assembly version pipelines pretty good.

#65 Reedbeta

    DevMaster Staff

  • Administrators
  • 5309 posts
  • LocationSanta Clara, CA

Posted 26 March 2009 - 04:38 PM

Andy201 said:

1.5707963267948966192313216916398f

LOL. You realize float only has about 7 or 8 decimal digits of precision, right?
reedbeta.com - developer blog, OpenGL demos, and other projects

#66 Andy201

    New Member

  • Members
  • Pip
  • 3 posts

Posted 28 March 2009 - 01:13 PM

Instead of LOL, the compiler will take that into account... The thing is the code works, it gives the correct results. I dare you to try it. Jesus, you are a prized prick.

If you can come up with anything as fast as that, which will pipeline with an ARM vfp, then go ahead - that square root is almost free when pipelined. But I guess you can't...

#67 Reedbeta

    DevMaster Staff

  • Administrators
  • 5309 posts
  • LocationSanta Clara, CA

Posted 28 March 2009 - 04:34 PM

Andy201 said:

the compiler will take that into account...

Of course it will. I'm not claiming the code doesn't work. Don't take it so personally, I was just amused by the unnecessary number of digits. ;)
reedbeta.com - developer blog, OpenGL demos, and other projects

#68 UnrealSolo

    Member

  • Members
  • PipPip
  • 30 posts

Posted 28 May 2010 - 02:42 PM

I have a sincos routine using 16bit fractional integers, and using the extra term mentioned here the results are quite impressive.

it was also based on code i found elsewhere that ive lost the reference to,
using a method to calculate sin and cos at the same time sharing some of the calculations, I dont claim to know in detail how the underling thoery of it works.

it gives close to 4 bit error when measured RMS, in fact strangly enough, the fractional int scores a lower rms error than with double.

as its for FFT on 10 bit adc this should be sufficient.

I tested the accuracy of the code using c# for ease, c# is a bit picky about casting, if you multiply two 16 bit numbers it becomes 32 bits so you have to cast it back to 16 bits to make sure. I also had run time check for overflow turned on for thorough checking

ive not done timing analysys yet on the target, as im porting some code to a different cpu.



/// <summary>

/// test for fast sincos functions using signed fractional ints,

/// results compared well to using double.

/// fractional int is stored as n*0x8000 where n is from -1 to less than 1 

/// ie fixed point where there are no places before the point.

/// +1.0 cant be stored, but -1.0 can , so the functions have to be adjusted slightly so they dont ever return +1.0

/// multiply two fractional ints and you have a 30bit number plus 2 sign bits

/// so you need to left shit to get 31 bits+sign, then take the top 16 bits to get a signed fractional int.

/// </summary>

static bool accurate = true;

/// <summary>

/// based on  sin=3/4 -x^2 +x (where x=angle-1/4 turn)

/// </summary>

/// <param name="phase">angle 0-0xffff for 0-360' </param>

/// <returns>maxint-minint</returns>

public static SinCos sin_i(Int16 phase)  

{

    Int16 x;

    x = (Int16)((phase & 0x3fff) - 0x2000);

    x *=2;

    Int16 x2 = (Int16)((x * (Int32)x)/0x8000);  

    //instead of dividing by 0x8000 efficiency can be made by shifting left 1 then taking high word.

    Int16 y = (Int16)(0x5fff - x2);

    SinCos r;

    if ((phase & 0x4000) != 0)

        r = new SinCos(y - x, -x - y);

    else

        r = new SinCos(y + x, -x + y);

    if (accurate)

    {   //efficiences can be made by putting this in the above, but its clearer here

        if ((phase & 0x4000) != 0)

            r.cos = (int)((r.cos * (((Int16)(0.225 * 0x10000)) * -r.cos / 0x10000 + (Int16)(0x8000 * .775))) / 0x8000);

        else

            r.cos = (int)((r.cos * (((Int16)(0.225 * 0x10000)) * r.cos / 0x10000 + (Int16)(0x8000 * .775))) / 0x8000);

        r.sin = (int)((r.sin * (((int)(0.225 * 0x10000)) * r.sin / 0x10000 + (int)(0x8000 * .775))) / 0x8000);

        //instead of dividing by 0x8000 efficiency can be made by shifting right 15 bits

        //(not enough bits to do shift left first as above)

    }

    if ((phase & 0x8000) != 0)

    {

        r.sin = -r.sin;

        r.cos = -r.cos;

    }

    return r;

}


#69 BrettP

    New Member

  • Members
  • Pip
  • 1 posts

Posted 22 July 2010 - 03:41 AM

Here is a version of fast_sin in ARM NEON Assembly. Please feel to point out improvements.

float scalers2[4] = {-1.f, 1.f / 16.f, 0.225f, M_1_PI};
float fours[4] = {4.f, 4.f, 4.f, 4.f};
int addOne[4] = {1, 1, 1, 1};
int negativeOne[4] = {-1, -1, -1, -1};
float fin[4] = {x, 0.f,0.f,0.f};
float fout[4];
float toRet;
asm volatile(		 
//Load in data to start
	"vld1.32 {q6}, [%[scalers2]]		\n\t"
	"vld1.32 {q3}, [%[fours]]		\n\t"
	"vld1.32 {q10}, [%[addone]]		\n\t"
	"vld1.32 {q11}, [%[negativeOne]]		\n\t"
	//"vld1.32 {q2}, [%[toRet]]		\n\t"
	"vld1.32 {q1}, [%[input]]		\n\t"
	//Need sin of PMt
	//P = d13[0]
	//M_1_PI = d13[1]

	//x = x * M_1_PI; (scalers2[3])
	"vmul.f32 q0, q1, d13[1]		\n\t"

	//int k = (int) x; //Was rounded
	//Convert to int and back to float (q9 retains int version)
	"vcvt.s32.f32 q9, q0		\n\t"

	//Convert to float 
	"vcvt.f32.s32 q2, q9		\n\t"

	//x = x - k;
	"vsub.f32 q0, q0, q2		\n\t"

	//4 * fabsf2(x)  we can ignore this fabsf as we know that our x is always positive
	"vmul.f32 q2, q0, q3		\n\t"

	//(4 - 4 * fabsf2(x))  
	"vsub.f32 q2, q3, q2		\n\t"

	//(4 - 4 * fabsf2(x)) * x;
	"vmul.f32 q0, q0, q2		\n\t"

	//fabsf2(y)
	"vabs.f32 q2, q0		\n\t"

	//y * fabsf2(y)
	"vmul.f32 q2, q0, q2		\n\t"

	//(y * fabsf2(y) - y)
	"vsub.f32 q2, q2, q0		\n\t"

	// P * (y * fabsf2(y) - y) (scalers[2]
	"vmul.f32 q2, q2, d13[0]		\n\t"

	//P * (y * fabsf2(y) - y) + y;
	"vadd.f32 q0, q0, q2				\n\t"


	////If K is odd, then we have a negative number, else positive 
	//convert to int
	//"vcvt.s32.f32 q8, q0		\n\t"

	//Perform an And 
	"vand.I32 q2, q9, q10				\n\t"

	//Negate
	"vmul.I32 q2, q2, q11		\n\t"

	//Make sure it is +1 or -1
	"vorr.i32 q2, q2, q10		\n\t"

	//if we are 1, we want a negative (convert to float)
	"vcvt.f32.s32 q2, q2			\n\t"

	//multiply by -1 (scalers2[0])	
	//"vmul.f32 q9, q9, d12[0]		\n\t"

	//return (k&1) ? -y : y;
	"vmul.f32 q0, q0, q2			\n\t"

	"vst1.f32 {q0}, [%[toRet]]				\n\t"

	: 
	:[scalers2] "r" (scalers2),
	[fours] "r" (fours),
	[addone] "r" (addOne),
	[negativeOne] "r"  (negativeOne),
	[toRet] "r" (fout),
	[input] "r" (fin)

	:"q0","q1", "q2", "q3", "q4", "q6", "q8", "q9", "q10", "q11", "memory", "cc"
	);


#70 akreider

    New Member

  • Members
  • Pip
  • 1 posts

Posted 11 October 2010 - 08:16 PM

I tried implementing the cos() faster/less accurate function in php (5.3.0) and it was actually 5-20% slower.

I used the implementation from: http://lab.polygonal...-approximation/

Using Windows Vista. Wampserver (php/mysql/apache). CPU: Intel Q6600.

#71 Nick

    Senior Member

  • Members
  • PipPipPipPip
  • 1227 posts
  • LocationOttawa, Ontario, Canada

Posted 11 October 2010 - 10:35 PM

akreider said:

I tried implementing the cos() faster/less accurate function in php (5.3.0) and it was actually 5-20% slower.
If the script is interpreted then that's to be expected. It takes longer to interpret many fast operations than to interpret one slow operation.

You probably want to look at ASP.NET for higher performance.

#72 DroidFan1

    New Member

  • Members
  • Pip
  • 7 posts

Posted 27 October 2010 - 03:01 PM

Hi Nick,

Thank you very much for posting this great tip. Sorry for this late reply to such an old post, but I just came across it. I've been trying to reproduce your results but I'm getting some strange output and I'm hopeful that you will take a look and let me know if you see my error.

When I plug your equations into an Excel spreadsheet, the graph looks great. Here is a link to a screenshot of the Excel graph:
http://www.flickr.co...an1/5120869190/

But when I write to a csv file from a C program, then graph the output, I get this strange output:
http://www.flickr.co...an1/5120869214/

I'm using gcc as my compiler. Here is the C program:
http://www.flickr.co...an1/5120266081/

Any suggestions are very much appreciated.

Best regards,
Rich

#73 DroidFan1

    New Member

  • Members
  • Pip
  • 7 posts

Posted 27 October 2010 - 04:56 PM

DroidFan1 said:

Hi Nick,

Thank you very much for posting this great tip. Sorry for this late reply to such an old post, but I just came across it. I've been trying to reproduce your results but I'm getting some strange output and I'm hopeful that you will take a look and let me know if you see my error.

When I plug your equations into an Excel spreadsheet, the graph looks great. Here is a link to a screenshot of the Excel graph:
http://www.flickr.co...an1/5120869190/

But when I write to a csv file from a C program, then graph the output, I get this strange output:
http://www.flickr.co...an1/5120869214/

I'm using gcc as my compiler. Here is the C program:
http://www.flickr.co...an1/5120266081/

Any suggestions are very much appreciated.

Best regards,
Rich


My previous post included only a screenshot of the C source code. Here is a link to the actual text of the source file:

http://db.tt/EwmS5Ed

Rich

#74 Nick

    Senior Member

  • Members
  • PipPipPipPip
  • 1227 posts
  • LocationOttawa, Ontario, Canada

Posted 28 October 2010 - 07:12 AM

Hi Rich, Your code works fine here. The csv file contains a beautiful sine.

So something else must be going on. Does your csv file contain 0.5 at 30 degrees, 1.0 at 90 degrees and 0.5 at 150 degrees? If not then something must have gone wrong in this code on your platform. Else it's an issue with displaying it in Excel.

#75 DroidFan1

    New Member

  • Members
  • Pip
  • 7 posts

Posted 28 October 2010 - 12:52 PM

Nick said:

Hi Rich, Your code works fine here. The csv file contains a beautiful sine.

So something else must be going on. Does your csv file contain 0.5 at 30 degrees, 1.0 at 90 degrees and 0.5 at 150 degrees? If not then something must have gone wrong in this code on your platform. Else it's an issue with displaying it in Excel.

Thanks for the reply Nick.

No my csv file does not contain the expected data. Here it is:

0,0.000000
1,0.017222
2,0.034444
3,0.051667
4,0.068889
5,0.086111
6,0.103333
7,0.120556
8,0.137778
9,0.155000
10,0.172222
11,0.189444
12,0.206667
13,0.223889
14,0.241111
15,0.258333
16,0.275556
17,0.292778
18,0.310000
19,0.327222
20,0.344444
21,0.361667
22,0.378889
23,0.396111
24,0.413333
25,0.430556
26,0.447778
27,0.465000
28,0.482222
29,0.499444
30,0.516667
31,0.533889
32,0.551111
33,0.568333
34,0.585556
35,0.602778
36,0.620000
37,0.637222
38,0.654444
39,0.671667
40,0.688889
41,0.706111
42,0.723333
43,0.740556
44,0.757778
45,1.000000
46,1.022222
47,1.044444
48,1.066667
49,1.088889
50,1.111111
51,1.133333
52,1.155556
53,1.177778
54,1.200000
55,1.222222
56,1.244444
57,1.266667
58,0.680933
59,0.692673
60,0.704413
61,0.716153
62,0.727894
63,0.739634
64,0.751374
65,0.763114
66,0.774854
67,1.014961
68,1.030110
69,1.045258
70,1.060407
71,1.075556
72,1.090704
73,1.105853
74,1.121001
75,1.136150
76,1.151299
77,1.166448
78,1.181596
79,1.196745
80,1.211894
81,1.227042
82,1.242191
83,1.257339
84,1.272488
85,1.287637
86,1.302786
87,1.317934
88,1.333083
89,1.348232
90,1.363380
91,1.378529
92,1.393678
93,1.408826
94,1.423975
95,1.439124
96,1.454272
97,1.469421
98,1.484570
99,1.499718
100,1.514867
101,1.530016
102,1.545164
103,1.560313
104,1.575462
105,1.590611
106,1.605759
107,1.620908
108,1.636056
109,1.651205
110,1.666354
111,1.681502
112,1.696651
113,1.711800
114,1.726948
115,0.719695
116,0.725953
117,0.732211
118,0.738469
119,0.744728
120,0.750986
121,0.757244
122,0.763502
123,0.769760
124,1.001315
125,1.009390
126,1.017465
127,1.025540
128,1.033615
129,1.041690
130,1.049765
131,1.057840
132,1.065915
133,1.073990
134,1.082066
135,1.090141
136,1.098216
137,1.106291
138,1.114366
139,1.122441
140,1.130516
141,1.138592
142,1.146667
143,1.154742
144,1.162817
145,1.170892
146,1.178967
147,1.187042
148,1.195117
149,1.203192
150,1.211267
151,1.219343
152,1.227418
153,1.235493
154,1.243568
155,1.251643
156,1.259718
157,1.267793
158,1.275868
159,1.283944
160,1.292019
161,1.300094
162,1.308169
163,1.316244
164,1.324319
165,1.332394
166,1.340469
167,1.348545
168,1.356620
169,1.364695
170,1.372770
171,1.380845
172,0.133508
173,0.134285
174,0.135061
175,0.135837
176,0.136613
177,0.137390
178,0.138166
179,0.138942
180,0.139718


Rich

#76 Nick

    Senior Member

  • Members
  • PipPipPipPip
  • 1227 posts
  • LocationOttawa, Ontario, Canada

Posted 28 October 2010 - 01:18 PM

You'll have to use a debugger to see where the calculation goes wrong on your machine.

#77 DroidFan1

    New Member

  • Members
  • Pip
  • 7 posts

Posted 28 October 2010 - 01:51 PM

Nick said:

Hi Rich, Your code works fine here. The csv file contains a beautiful sine.

So something else must be going on. Does your csv file contain 0.5 at 30 degrees, 1.0 at 90 degrees and 0.5 at 150 degrees? If not then something must have gone wrong in this code on your platform. Else it's an issue with displaying it in Excel.


Nick,

I changed the two abs() function calls to fabs(), and it's working great. Thanks for the feedback. It was helpful to know that it was working properly on another machine.

Best regards,
Rich

#78 DroidFan1

    New Member

  • Members
  • Pip
  • 7 posts

Posted 28 October 2010 - 01:57 PM

Nick said:

You'll have to use a debugger to see where the calculation goes wrong on your machine.

Nick,

I'm using gcc. Which compiler are you using?

Rich

#79 Nick

    Senior Member

  • Members
  • PipPipPipPip
  • 1227 posts
  • LocationOttawa, Ontario, Canada

Posted 28 October 2010 - 02:13 PM

DroidFan1 said:

I changed the two abs() function calls to fabs(), and it's working great. Thanks for the feedback. It was helpful to know that it was working properly on another machine.
Oh, I get it now. You're compiling it as C code while I'm compiling as C++. Because C doesn't support function overloading, abs() is only defined for integers.

Quote

I'm using gcc. Which compiler are you using?
I'm using Visual Studio 2005. If I set it to compile as C code I can replicate the results you're seeing.

I also sometimes use GCC, for which I prefer the Code::Blocks IDE. It conventiently integrates the GDB debugger.

But if you're developing on and for Windows I would definitely recommend using the free Visual Studio Express compiler.

#80 DroidFan1

    New Member

  • Members
  • Pip
  • 7 posts

Posted 28 October 2010 - 02:41 PM

Thanks Nick, yes that makes sense. I will check out Code::Blocks and Visual Studio Express. Thanks for the tips.

Rich





1 user(s) are reading this topic

0 members, 1 guests, 0 anonymous users