[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: long double broken on i386?



On Wed, Oct 03, 2007 at 11:13:38AM +1000, Bruce Evans wrote:
> On Tue, 2 Oct 2007, Steve Kargl wrote:
> 
> >On Tue, Oct 02, 2007 at 01:23:18PM -0400, David Schultz wrote:
> >>...
> >>Anyway, my point is that if you have something that works
> >>reasonably well and doesn't have egregious errors, my suggestion
> >>is to just commit it and not kill yourself over whether the
> >>argument reduction is correct in the last ulp.
> 
> >There are a few problems: 1) I don't have commit access.  2) I
> >need to do style(9) clean-up pass.  3) I've only tested these
> >on i386 and amd64, and I know these fail for sparc64. 4) Most
> >importantly, I don't know how the project wants to handle the
> >53 vs 64 bit default fpu setting on i386.
> 
> (3) is most important.  The C versions will never even be used for
> i386 or amd64, since these arches have sin and cos in hardware and (I
> believe) it is impossible to beat the hardware on these arches.  The
> asm versions have always been trivial to implement (change 2 bytes in
> s_sin.S (compare e_sqrt.S with e_sqrtf.S; the long double changes go
> the other way).  amd64 actually uses the C versions for double precision
> becase the C versions are competitive with the hardware for speed and
> beat it for accuracy on large args and on small args aear a multiple
> of pi/2.

I've given up on i386.  All of my attempts to make my C implementations
work on both amd64 and i386 have failed.  The code when run on i386 
results in ULP's that exceed 10^5.  Yes, that is 1E5.  Results for amd64
are reported below.  As far as assembly, someone else will need to write
those routines because I do not know assembly.

> (4): These functions are useless unless the default is changed.  Users
> have always been able to get equivalent _in_accuracy on i386 by just
> using the double precision versions.  Hacks like #define sinl sin
> almost work for just getting things to compile.

AFAIK, C99 says the prototype in math.h is "long double sinl(long double);".
The #define hack simply won't cut it.

> In fact, on i386, due
> to the bugfeatures outlined in my previous mail, users can even get
> full long double accuracy using hacks like that, even if the rounding
> precision is 53 bits:

(code removed)

> On i386, this stores the full long double result for sinl(1) in x, due
> to the bugfeatures -- sin(1) returns extra precision in a register, and
> gcc doesn't understand the extra precision so it just stores all bits
> of the register in x.  The behaviour would be similar if sin(1) were used
> in an expression.

Given GCC's history, relying on the accidental full long double
accuracy because FP registers are 80 bits seems to be folly.  It
also isn't clear to me what sin(x) will do with x > DBL_MAX.  Will
1e400L be interpreted as +Inf or the long double value 1e400?  Yes,
I'm fully aware that such a large argument may not be the greatest
idea due to argument reduction.


> >PS: There is also the minor possibility of giving bde either
> >a heartache or death via laughter when he finally sees what I
> >have done. :)
> 
> I would be unhappy with any versions that aren't either:
> - as identical as possible in algorithm and style to the double precision
>   versions, in the same way that the float versions are as identical as
>   possible, or
> - much better in algorithm and style and efficiency and accuracy than the
>   double precision versions.  I don't want to maintain different versions.


I'll go with your second option see comments below and attached
code.  BTW, if I add

#ifdef sinl
#undef sinl
#endif
#define sinl sin

to the top of my sin_test.c code, I get ULPs that exceed 10^9.

>   I tried to turn your original efforts for logl() into the latter for
>   logf(), log() and logl(), but so far they are only a little better.

I lost my copy of logl() due to fat fingering a rm command and
various hard drive/file system issues.  It probably is contained
on one of my several back up tapes, but I've been too lazy to 
do the restore.

Back to sinl, cosl, tanl.  The attached diff contains everything
that one needs for these functions on an architecture with 64-bit
significand long double.  Note that portions of the diff for Makefile
and math.h contain irrelevant changes that are only relevant for
my ccosh, sqrtl, and cbrtl changes.  Also note that the Makefile only
includes my long double routines when LDBL_PREC == 64.  My testing on
FreeBSD-amd64 shows

troutmask:sgk[219] ./sin_test 4000000 10 400000000
      Number of test loops to run: 10
Number of PRN in [0,4.000000e+08] per run: 4000000
  ULP
1.0000 (5)
1.0000 (4)
1.0000 (6)
1.0000 (4)
1.0000 (3)
1.0000 (7)
1.0000 (3)
1.0000 (7)
1.0000 (5)
1.0000 (5)
       ---
        49

The number in parenthesis is the count of values that give a sinl(x)
that exceeds 0.5 ULP.  On the reduced range of [0,pi/4] where
__kernel_cosl and __kernel_sinl are used, in all my testing no value
of x in [0,pi/4] generated a sinl(x) that exceeded 0.5 ULP.  Thus, I
suspect (but haven't tried to confirm), the 1.0 ULP observed above is
due to the argument reduction.

My timings on the basic routines using gprof are:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
  0.6     804.42     5.67 40000000     0.00     0.00  sinl [18]
  0.1     934.41     0.74 19998751     0.00     0.00  __kernel_sinl [95]
  0.1     935.82     0.66 20001249     0.00     0.00  __kernel_cosl [96]

where the system contains a dual-cpu opteron tyan motherboard running
at 2.2 GHz.


The results for cosl() are

troutmask:sgk[224] ./cos_test 4000000 10 400000000
      Number of test loops to run: 10
Number of PRN in [0,4.000000e+08] per run: 4000000
  ULP
1.0000 (3)
1.0000 (4)
1.0000 (2)
1.0000 (1)
1.0000 (7)
1.0000 (5)
0.5000 (0)
1.0000 (3)
1.0000 (9)
1.0000 (5)
       ---
        39

Note, the 40 million PRN are not the same as those used in the
the testing of sinl().

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
  0.6     829.38     5.73 40000000     0.00     0.00  cosl [18]
  0.1     936.85     0.70 20000178     0.00     0.00  __kernel_sinl [80]
  0.1     938.20     0.67 19999822     0.00     0.00  __kernel_cosl [81]


The testing of tanl() is a little more interesting.

troutmask:sgk[227] ./tan_test 4000000 10 400000000
      Number of test loops to run: 10
Number of PRN in [0,4.000000e+08] per run: 4000000
  ULP
1.0000 (24788)
1.0000 (24633)
1.5000 (24626)
1.0000 (24820)
1.0000 (24652)
1.5000 (24526)
1.0000 (24749)
1.0000 (24873)
1.0000 (24656)
1.0000 (24701)
       -------
        247024

  0.5     911.88     5.80 40000000     0.00     0.00  tanl [16]
  0.5     933.18     5.01 40000000     0.00     0.00  __kernel_tanl [53]

At first glance, the 1.5 ULP and the total count of values of x
that generate a result with a ULP that exceeds 0.5 seems disconcerting.
However, testing of __kernel_tanl() shows that for all tested values of x
in [0,pi/4] the maximum ULP did exceed 0.5.  This suggests that tanl() is
more sensitive to errors in the argument reduction than sinl() and cosl().

Do with the code as you see fit.  I'll be moving on to hypothl and
other functions as I have time.

-- 
Steve
Index: msun/Makefile
===================================================================
RCS file: /home/ncvs/src/lib/msun/Makefile,v
retrieving revision 1.78
diff -u -p -r1.78 Makefile
--- msun/Makefile	21 May 2007 02:49:08 -0000	1.78
+++ msun/Makefile	9 Oct 2007 21:52:32 -0000
@@ -54,7 +54,8 @@ COMMON_SRCS= b_exp.c b_log.c b_tgamma.c 
 	s_nexttowardf.c s_remquo.c s_remquof.c \
 	s_rint.c s_rintf.c s_round.c s_roundf.c s_roundl.c \
 	s_scalbln.c s_scalbn.c s_scalbnf.c s_signbit.c \
-	s_signgam.c s_significand.c s_significandf.c s_sin.c s_sinf.c s_tan.c \
+	s_signgam.c s_significand.c s_significandf.c s_sin.c s_sinf.c \
+	s_tan.c \
 	s_tanf.c s_tanh.c s_tanhf.c s_trunc.c s_truncf.c s_truncl.c \
 	w_cabs.c w_cabsf.c w_drem.c w_dremf.c
 
@@ -68,13 +69,20 @@ SYMBOL_MAPS=	${SYM_MAPS}
 
 # C99 long double functions
 COMMON_SRCS+=	s_copysignl.c s_fabsl.c s_modfl.c
+
 .if ${LDBL_PREC} != 53
 # If long double != double use these; otherwise, we alias the double versions.
 COMMON_SRCS+=	s_fmal.c s_frexpl.c s_nextafterl.c s_nexttoward.c s_scalbnl.c
 .endif
 
+.if ${LDBL_PREC} == 64
+COMMON_SRCS+=	e_sqrtl.c k_cosl.c k_sinl.c k_tanl.c s_cbrtl.c \
+	s_cosl.c s_sinl.c s_tanl.c
+.endif
+
 # C99 complex functions
-COMMON_SRCS+=	s_cimag.c s_cimagf.c s_cimagl.c s_conj.c s_conjf.c s_conjl.c \
+COMMON_SRCS+=	s_ccosh.c \
+	s_cimag.c s_cimagf.c s_cimagl.c s_conj.c s_conjf.c s_conjl.c \
 	s_creal.c s_crealf.c s_creall.c
 
 # FreeBSD's C library supplies these functions:
Index: msun/Symbol.map
===================================================================
RCS file: /home/ncvs/src/lib/msun/Symbol.map,v
retrieving revision 1.4
diff -u -p -r1.4 Symbol.map
--- msun/Symbol.map	29 Apr 2007 14:05:20 -0000	1.4
+++ msun/Symbol.map	9 Oct 2007 21:52:32 -0000
@@ -76,6 +79,7 @@ FBSD_1.0 {
 	copysignl;
 	cos;
 	cosf;
+	cosl;
 	creal;
 	crealf;
 	creall;
@@ -168,8 +172,10 @@ FBSD_1.0 {
 	significandf;
 	sin;
 	sinf;
+	sinl;
 	tan;
 	tanf;
+	tanl;
 	tanh;
 	tanhf;
 	trunc;
Index: msun/man/cos.3
===================================================================
RCS file: /home/ncvs/src/lib/msun/man/cos.3,v
retrieving revision 1.12
diff -u -p -r1.12 cos.3
--- msun/man/cos.3	9 Jan 2007 01:02:05 -0000	1.12
+++ msun/man/cos.3	9 Oct 2007 21:52:32 -0000
@@ -33,7 +33,8 @@
 .Os
 .Sh NAME
 .Nm cos ,
-.Nm cosf
+.Nm cosf ,
+.Nm cosl
 .Nd cosine functions
 .Sh LIBRARY
 .Lb libm
@@ -43,11 +44,14 @@
 .Fn cos "double x"
 .Ft float
 .Fn cosf "float x"
+.Ft "long double"
+.Fn cosl "long double x"
 .Sh DESCRIPTION
 The
-.Fn cos
-and the
+.Fn cos ,
 .Fn cosf
+and the
+.Fn cosl
 functions compute the cosine of
 .Fa x
 (measured in radians).
@@ -57,9 +61,10 @@ For a discussion of error due to roundof
 .Xr math 3 .
 .Sh RETURN VALUES
 The
-.Fn cos
-and the
+.Fn cos ,
 .Fn cosf
+and the
+.Fn cosl
 functions return the cosine value.
 .Sh SEE ALSO
 .Xr acos 3 ,
Index: msun/man/sin.3
===================================================================
RCS file: /home/ncvs/src/lib/msun/man/sin.3,v
retrieving revision 1.10
diff -u -p -r1.10 sin.3
--- msun/man/sin.3	9 Jan 2007 01:02:06 -0000	1.10
+++ msun/man/sin.3	9 Oct 2007 21:52:32 -0000
@@ -34,7 +34,8 @@
 .Os
 .Sh NAME
 .Nm sin ,
-.Nm sinf
+.Nm sinf ,
+.Nm sinl
 .Nd sine functions
 .Sh LIBRARY
 .Lb libm
@@ -44,11 +45,14 @@
 .Fn sin "double x"
 .Ft float
 .Fn sinf "float x"
+.Ft "long double"
+.Fn sinf "long double x"
 .Sh DESCRIPTION
 The
-.Fn sin
-and the
+.Fn sin ,
 .Fn sinf
+and the
+.Fn sinl
 functions compute the sine of
 .Fa x
 (measured in radians).
@@ -56,9 +60,10 @@ A large magnitude argument may yield a r
 or no significance.
 .Sh RETURN VALUES
 The
-.Fn sin
-and the
+.Fn sin ,
 .Fn sinf
+and the
+.Fn sinl
 functions return the sine value.
 .Sh SEE ALSO
 .Xr acos 3 ,
Index: msun/man/tan.3
===================================================================
RCS file: /home/ncvs/src/lib/msun/man/tan.3,v
retrieving revision 1.10
diff -u -p -r1.10 tan.3
--- msun/man/tan.3	9 Jan 2007 01:02:06 -0000	1.10
+++ msun/man/tan.3	9 Oct 2007 21:52:32 -0000
@@ -33,7 +33,8 @@
 .Os
 .Sh NAME
 .Nm tan ,
-.Nm tanf
+.Nm tanf ,
+.Nm tanl
 .Nd tangent functions
 .Sh LIBRARY
 .Lb libm
@@ -43,11 +44,14 @@
 .Fn tan "double x"
 .Ft float
 .Fn tanf "float x"
+.Ft "long double"
+.Fn tanf "long double x"
 .Sh DESCRIPTION
 The
-.Fn tan
-and the
+.Fn tan ,
 .Fn tanf
+and the
+.Fn tanl
 functions compute the tangent of
 .Fa x
 (measured in radians).
@@ -57,8 +61,11 @@ For a discussion of error due to roundof
 .Xr math 3 .
 .Sh RETURN VALUES
 The
-.Fn tan
-function returns the tangent value.
+.Fn tan ,
+.Fn tanf
+and
+.Fn tanl
+functions return the tangent value.
 .Sh SEE ALSO
 .Xr acos 3 ,
 .Xr asin 3 ,
Index: msun/src/k_cosl.c
===================================================================
RCS file: msun/src/k_cosl.c
diff -N msun/src/k_cosl.c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ msun/src/k_cosl.c	9 Oct 2007 21:52:32 -0000
@@ -0,0 +1,62 @@
+/*-
+ * Copyright (c) 2007 Steven G. Kargl
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice unmodified, this list of conditions, and the following
+ *    disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * Compute cos(x) on the interval [-1:1] with a polynomial approximation.
+ * The soefficients are determined from a Chebyshev interpolation scheme,
+ * which claims to provide a nearly minimax polynomial approximation.
+ * The scheme was implemented with GMP/MPFR at 512 bits of precision.
+ *
+ * cos(x) = c0 + c1 * x**2 + c2 * x**4 + c3 * x**6 + ... 
+ *        = c0 + x**2 * (c1 + x**2 * (c2 + x**2 * (c3 + ...
+ *
+ * In practice, this routine is called only for x in the interval [0:pi/4].
+ * Limited testing on pseudorandom numbers drawn within [0:pi/4] shows
+ * an accuracy of <= 0.5 ULP.
+ */
+#define C0  (long double)  0x1.000000000000000p0L
+#define C1  (long double) -0x8.000000000000000p-4L
+#define C2  (long double)  0xa.aaaaaaaaaaaaaabp-8L
+#define C3  (long double) -0x5.b05b05b05b05b05p-12L
+#define C4  (long double)  0x1.a01a01a01a019dbp-16L
+#define C5  (long double) -0x4.9f93edde27cbed7p-24L
+#define C6  (long double)  0x8.f76c77fc4bbff37p-32L
+#define C7  (long double) -0xc.9cba54236b4e97fp-40L
+#define C8  (long double)  0xd.73f9aa92060766ap-48L
+#define C9  (long double) -0xb.4105ba69deeed1ap-56L
+
+long double
+__kernel_cosl(long double x)
+{
+	long double ys, fs;
+
+	ys = x * x;
+
+	fs = C0 + ys * (C1 + ys * (C2 + ys * (C3 + ys * (C4 +
+	    ys * (C5 + ys * (C6 + ys * (C7 + ys * (C8 + ys * C9))))))));
+
+	return (fs);
+}
Index: msun/src/k_sinl.c
===================================================================
RCS file: msun/src/k_sinl.c
diff -N msun/src/k_sinl.c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ msun/src/k_sinl.c	9 Oct 2007 21:52:32 -0000
@@ -0,0 +1,64 @@
+/*-
+ * Copyright (c) 2007 Steven G. Kargl
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice unmodified, this list of conditions, and the following
+ *    disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * Compute sin(x) on the interval [-1:1] with a polynomial approximation
+ * for sin(x)/x.  Coefficients are determined from a Chebyshev interpolation
+ * scheme, which claims to provide a nearly minimax polynomial approximation.
+ * The scheme was implemented with GMP/MPFR at 512 bits of precision.
+ *
+ * sin(x)
+ * ------ = c0 + c1 * x**2 + c2 * x**4 + c3 * x**6 + ... 
+ *   x
+ *        = c0 + x**2 * (c1 + x**2 * (c2 + x**2 * (c3 + ...
+ *
+ * In practice, this routine is called only for x in the interval [0:pi/4].
+ * Limited testing on pseudorandom numbers drawn within [0:pi/4] shows
+ * an accuracy of <= 0.5 ULP.
+ */
+#define C0  (long double)  0x1.000000000000000p0L
+#define C1  (long double) -0x2.aaaaaaaaaaaaaabp-4L
+#define C2  (long double)  0x2.222222222222222p-8L
+#define C3  (long double) -0xd.00d00d00d00cf21p-16L
+#define C4  (long double)  0x2.e3bc74aad8e0742p-20L
+#define C5  (long double) -0x6.b99159fd3ad8f13p-28L
+#define C6  (long double)  0xb.092309a1577e374p-36L
+#define C7  (long double) -0xd.73f9ac01491043cp-44L
+#define C8  (long double)  0xc.a926cdf5d22710ep-52L
+#define C9  (long double) -0x9.5d953b89a4d49b1p-60L
+
+long double
+__kernel_sinl(long double x)
+{
+	long double ys, fs;
+
+	ys = x * x;
+
+	fs = C0 + ys * (C1 + ys * (C2 + ys * (C3 + ys * (C4 + ys * (C5 +
+	    ys * (C6 + ys * (C7 + ys * (C8 + ys * C9))))))));
+
+	return (x * fs);
+}
Index: msun/src/k_tanl.c
===================================================================
RCS file: msun/src/k_tanl.c
diff -N msun/src/k_tanl.c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ msun/src/k_tanl.c	9 Oct 2007 21:52:32 -0000
@@ -0,0 +1,88 @@
+/*-
+ * Copyright (c) 2007 Steven G. Kargl
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice unmodified, this list of conditions, and the following
+ *    disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * Compute tan(x) for x in the interval [-1:1].  In practice, this function
+ * is only called for x in the interval [0:pi/4].  The coefficients in the
+ * nearly minimax polynomial approximation were determined from a Chebyshev
+ * interpolation scheme.  The scheme was implemented with GMP/MPFR at 512
+ * bits of precision.  Limited testing on pseudorandom numbers drawn within
+ * [0:pi/4] shows an accuracy of <= 0.5 ULP.
+ */
+
+#define C0  (long double)  0x1.000000000000000p0L
+#define C1  (long double)  0x5.555555555555555p-4L
+#define C2  (long double)  0x2.222222222222222p-4L
+#define C3  (long double)  0xd.d0dd0dd0dd0dd0ep-8L
+#define C4  (long double)  0x5.993d220b043e7c9p-8L
+#define C5  (long double)  0x2.44dc6abcd8479d4p-8L
+#define C6  (long double)  0xe.b69e870abed7bf1p-12L
+#define C7  (long double)  0x5.f68d914adfc47a3p-12L
+#define C8  (long double)  0x2.6ab0490042f9ab5p-12L
+#define C9  (long double)  0xf.abebb9cb8cc2036p-16L
+#define C10  (long double)  0x6.59f8612142fae16p-16L
+#define C11  (long double)  0x2.92fb2c6db573835p-16L
+#define C12  (long double)  0x1.0b12c13038ec343p-16L
+#define C13  (long double)  0x6.c40627af9caa355p-20L
+#define C14  (long double)  0x2.bd109c09597d8adp-20L
+#define C15  (long double)  0x1.2014a0592e18316p-20L
+#define C16  (long double)  0x6.5fa57993c4c813ap-24L
+#define C17  (long double)  0x5.896339c2dabba90p-24L
+#define C18  (long double) -0x5.dc121de49c2d260p-24L
+#define C19  (long double)  0x1.0bf745f0cc96f03p-20L
+#define C20  (long double) -0x2.0141b9a2c7a07fap-20L
+#define C21  (long double)  0x3.70ed6f452e482fcp-20L
+#define C22  (long double) -0x5.04879bfb1589212p-20L
+#define C23  (long double)  0x6.456251926b172eap-20L
+#define C24  (long double) -0x6.aac0ab6dc0588a1p-20L
+#define C25  (long double)  0x5.ff08654751186f8p-20L
+#define C26  (long double) -0x4.84ff6f8e7414ddap-20L
+#define C27  (long double)  0x2.d1d7cdc3082c3b2p-20L
+#define C28  (long double) -0x1.6e486511b1fb457p-20L
+#define C29  (long double)  0x9.36f96f36ec8aee5p-24L
+#define C30  (long double) -0x2.d564e445a53c9c8p-24L
+#define C31  (long double)  0xa.054b3ccb4a63326p-28L
+#define C32  (long double) -0x1.6bbcae821a3de9fp-28L
+#define C33  (long double)  0x1.8f939ed1883f595p-32L
+
+long double
+__kernel_tanl(long double x)
+{
+	long double t, xs;
+  
+	xs = x * x;
+
+	t = C0 + xs * (C1 + xs * (C2 + xs * (C3 + xs * (C4 + xs *
+	    (C5  + xs * (C6  + xs * (C7  + xs * (C8  + xs * (C9 + xs *
+	    (C10 + xs * (C11 + xs * (C12 + xs * (C13 + xs * (C14 + xs *
+	    (C15 + xs * (C16 + xs * (C17 + xs * (C18 + xs * (C19 + xs *
+	    (C20 + xs * (C21 + xs * (C22 + xs * (C23 + xs * (C24 + xs *
+	    (C25 + xs * (C26 + xs * (C27 + xs * (C28 + xs * (C29 + xs *
+	    (C30 + xs * (C31 + xs *
+	    (C32 + xs * C33))))))))))))))))))))))))))))))));
+
+	return (x * t);
+}
Index: msun/src/math.h
===================================================================
RCS file: /home/ncvs/src/lib/msun/src/math.h,v
retrieving revision 1.62
diff -u -p -r1.62 math.h
--- msun/src/math.h	7 Jan 2007 07:54:21 -0000	1.62
+++ msun/src/math.h	9 Oct 2007 21:52:32 -0000
@@ -397,13 +397,15 @@ long double	asinl(long double);
 long double	atan2l(long double, long double);
 long double	atanhl(long double);
 long double	atanl(long double);
-long double	cbrtl(long double);
 #endif
+long double	cbrtl(long double);
 long double	ceill(long double);
 long double	copysignl(long double, long double) __pure2;
 #if 0
 long double	coshl(long double);
+#endif
 long double	cosl(long double);
+#if 0
 long double	erfcl(long double);
 long double	erfl(long double);
 long double	exp2l(long double);
@@ -459,10 +461,14 @@ long double	scalblnl(long double, long);
 long double	scalbnl(long double, int);
 #if 0
 long double	sinhl(long double);
+#endif
 long double	sinl(long double);
 long double	sqrtl(long double);
+#if 0
 long double	tanhl(long double);
+#endif
 long double	tanl(long double);
+#if 0
 long double	tgammal(long double);
 #endif
 long double	truncl(long double);
Index: msun/src/math_private.h
===================================================================
RCS file: /home/ncvs/src/lib/msun/src/math_private.h,v
retrieving revision 1.20
diff -u -p -r1.20 math_private.h
--- msun/src/math_private.h	28 Nov 2005 04:58:57 -0000	1.20
+++ msun/src/math_private.h	9 Oct 2007 21:52:32 -0000
@@ -269,4 +269,9 @@ float	__kernel_cosdf(double);
 float	__kernel_tandf(double,int);
 int	__kernel_rem_pio2f(float*,float*,int,int,int,const int*);
 
+/* long double versions of fdlibm kernel functions */
+long double	__kernel_cosl(long double);
+long double	__kernel_sinl(long double);
+long double	__kernel_tanl(long double);
+
 #endif /* !_MATH_PRIVATE_H_ */
Index: msun/src/s_cosl.c
===================================================================
RCS file: msun/src/s_cosl.c
diff -N msun/src/s_cosl.c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ msun/src/s_cosl.c	9 Oct 2007 21:52:32 -0000
@@ -0,0 +1,118 @@
+/*-
+ * Copyright (c) 2007 Steven G. Kargl
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice unmodified, this list of conditions, and the following
+ *    disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * Compute cos(x) for x where x is reduced to y = x - k * pi / 2.
+ * Limited testing on pseudorandom numbers drawn within [0:4e8] shows
+ * an accuracy of <= 1.0 ULP where 39 values of x out of 40 million
+ * possibles resulted in cos(x) that exceeded 0.5 ULP (ie., 0.0000975%).
+ */
+
+#include "math.h"
+#include "math_private.h"
+#include "fpmath.h"
+
+/*
+ * Table of constants for 2/pi, 396 Hex digits (476 decimal) of 2/pi.
+ * The table was taken from e_rem_pio2.c.
+ */
+static const int32_t two_over_pi[] = {
+0xA2F983, 0x6E4E44, 0x1529FC, 0x2757D1, 0xF534DD, 0xC0DB62, 
+0x95993C, 0x439041, 0xFE5163, 0xABDEBB, 0xC561B7, 0x246E3A, 
+0x424DD2, 0xE00649, 0x2EEA09, 0xD1921C, 0xFE1DEB, 0x1CB129, 
+0xA73EE8, 0x8235F5, 0x2EBB44, 0x84E99C, 0x7026B4, 0x5F7E41, 
+0x3991D6, 0x398353, 0x39F49C, 0x845F8B, 0xBDF928, 0x3B1FF8, 
+0x97FFDE, 0x05980F, 0xEF2F11, 0x8B5A0A, 0x6D1F6D, 0x367ECF, 
+0x27CB09, 0xB74F46, 0x3F669E, 0x5FEA2D, 0x7527BA, 0xC7EBE5, 
+0xF17B3D, 0x0739F7, 0x8A5292, 0xEA6BFB, 0x5FB11F, 0x8D5D08, 
+0x560330, 0x46FC7B, 0x6BABF0, 0xCFBC20, 0x9AF436, 0x1DA9E3, 
+0x91615E, 0xE61B08, 0x659985, 0x5F14A0, 0x68408D, 0xFFD880, 
+0x4D7327, 0x310606, 0x1556CA, 0x73A8C9, 0x60E27B, 0xC08C6B, 
+};
+
+#define PIO4	(long double) 0xc.90fdaa22168c235p-4L     /* pi/4 */
+#define TWO24	(long double) 1.67772160000000000000e+07L /* 2**24 */
+
+long double
+cosl(long double x)
+{
+	union IEEEl2bits z;
+	int i, e0;
+	int nx = 3, prec = 2;
+	double xd[3], yd[2];
+	long double arg;
+
+	z.e = x;
+	z.bits.sign = 0;
+
+	/* If x = +-0, then cos(x) = 1. */
+	if ((z.bits.manh | z.bits.manl) == 0)
+		return (x);
+
+	/* If x = NaN or Inf, then cos(x) = NaN. */
+	if (z.bits.exp == 32767)
+		return ((x - x) / (x - x));
+
+	/* If x is a subnormal number, then cos(x) = 1 */
+	if (z.bits.exp == 0)
+		return (1);
+
+	/* Optimize the case where x is already within range. */
+	if (z.e < PIO4)
+		return  __kernel_cosl(x);
+
+	/* Split z.e into a 24-bit representation. */
+	e0 = ilogbl(z.e) - 23;
+	z.e = scalbnl(z.e, -e0);
+	for (i = 0; i < nx; i++) {
+		xd[i] = (double) ((int32_t) z.e);
+		z.e = (z.e - xd[i]) * TWO24;
+	}
+	xd[i] = (double) ((int32_t) z.e);
+	
+	/* yd contains the pieces of xd rem pi/2 such that |yd| < pi/4. */
+	e0 = __kernel_rem_pio2(xd, yd, e0, nx, prec, two_over_pi);
+
+	for (z.e = 0, i = nx - 1; i >= 0; i--)
+		z.e += (long double) yd[i];
+
+	switch (e0 & 3) {
+	case 0:
+	    z.e = __kernel_cosl(z.e);
+	    break;
+	case 1:
+	    z.e = - __kernel_sinl(z.e);
+	    break;
+	case 2:
+	    z.e = - __kernel_cosl(z.e);
+	    break;
+	case 3:
+	    z.e = __kernel_sinl(z.e);
+	    break;
+	}
+	
+	return (z.e);
+}
Index: msun/src/s_sinl.c
===================================================================
RCS file: msun/src/s_sinl.c
diff -N msun/src/s_sinl.c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ msun/src/s_sinl.c	9 Oct 2007 21:52:32 -0000
@@ -0,0 +1,118 @@
+/*-
+ * Copyright (c) 2007 Steven G. Kargl
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice unmodified, this list of conditions, and the following
+ *    disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * Compute sin(x) for x where x is reduced to y = x - k * pi / 2.
+ * Limited testing on pseudorandom numbers drawn within [0:4e8] shows
+ * an accuracy of <= 1.0 ULP where 49 values of x out of 40 million
+ * possibles resulted in cos(x) that exceeded 0.5 ULP (ie., 0.000123%).
+ */
+#include "math.h"
+#include "math_private.h"
+#include "fpmath.h"
+
+/*
+ * Table of constants for 2/pi, 396 Hex digits (476 decimal) of 2/pi.
+ * The table was taken from e_rem_pio2.c.
+ */
+static const int32_t two_over_pi[] = {
+0xA2F983, 0x6E4E44, 0x1529FC, 0x2757D1, 0xF534DD, 0xC0DB62, 
+0x95993C, 0x439041, 0xFE5163, 0xABDEBB, 0xC561B7, 0x246E3A, 
+0x424DD2, 0xE00649, 0x2EEA09, 0xD1921C, 0xFE1DEB, 0x1CB129, 
+0xA73EE8, 0x8235F5, 0x2EBB44, 0x84E99C, 0x7026B4, 0x5F7E41, 
+0x3991D6, 0x398353, 0x39F49C, 0x845F8B, 0xBDF928, 0x3B1FF8, 
+0x97FFDE, 0x05980F, 0xEF2F11, 0x8B5A0A, 0x6D1F6D, 0x367ECF, 
+0x27CB09, 0xB74F46, 0x3F669E, 0x5FEA2D, 0x7527BA, 0xC7EBE5, 
+0xF17B3D, 0x0739F7, 0x8A5292, 0xEA6BFB, 0x5FB11F, 0x8D5D08, 
+0x560330, 0x46FC7B, 0x6BABF0, 0xCFBC20, 0x9AF436, 0x1DA9E3, 
+0x91615E, 0xE61B08, 0x659985, 0x5F14A0, 0x68408D, 0xFFD880, 
+0x4D7327, 0x310606, 0x1556CA, 0x73A8C9, 0x60E27B, 0xC08C6B, 
+};
+
+#define PIO4	(long double) 0xc.90fdaa22168c235p-4L     /* pi/4 */
+#define TWO24	(long double) 1.67772160000000000000e+07L /* 2**24 */
+
+long double
+sinl(long double x)
+{
+	union IEEEl2bits z;
+	int i, e0, s;
+	int nx = 3, prec = 2;
+	double xd[3], yd[2];
+	long double arg;
+
+	z.e = x;
+	s = z.bits.sign;
+	z.bits.sign = 0;
+
+	/* If x = +-0, then sin(x) = +-0. */
+	if ((z.bits.manh | z.bits.manl) == 0)
+		return (x);
+
+	/* If x = NaN or Inf, then sin(x) = NaN. */
+	if (z.bits.exp == 32767)
+		return ((x - x) / (x - x));
+
+	/* If x is a subnormal number, then sin(x) = x */
+	if (z.bits.exp == 0)
+		return (x);
+
+	/* Optimize the case where x is already within range. */
+	if (z.e < PIO4)
+		return  __kernel_sinl(x);
+
+	/* Split z.e into a 24-bit representation. */
+	e0 = ilogbl(z.e) - 23;
+	z.e = scalbnl(z.e, -e0);
+	for (i = 0; i < nx; i++) {
+		xd[i] = (double) ((int32_t) z.e);
+		z.e = (z.e - xd[i]) * TWO24;
+	}
+	xd[i] = (double) ((int32_t) z.e);
+	
+	/* yd contains the pieces of xd rem pi/2 such that |yd| < pi/4. */
+	e0 = __kernel_rem_pio2(xd, yd, e0, nx, prec, two_over_pi);
+
+	for (z.e = 0, i = nx - 1; i >= 0; i--)
+		z.e += (long double) yd[i];
+
+	switch (e0 & 3) {
+	case 0:
+	    z.e = __kernel_sinl(z.e);
+	    break;
+	case 1:
+	    z.e = __kernel_cosl(z.e);
+	    break;
+	case 2:
+	    z.e = - __kernel_sinl(z.e);
+	    break;
+	case 3:
+	    z.e = - __kernel_cosl(z.e);
+	    break;
+	}
+	
+	return (s ? - z.e : z.e);
+}
Index: msun/src/s_tanl.c
===================================================================
RCS file: msun/src/s_tanl.c
diff -N msun/src/s_tanl.c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ msun/src/s_tanl.c	9 Oct 2007 21:52:32 -0000
@@ -0,0 +1,115 @@
+/*-
+ * Copyright (c) 2007 Steven G. Kargl
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice unmodified, this list of conditions, and the following
+ *    disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * Compute tan(x) for x where x is reduced to y = x - k * pi / 2.
+ * Limited testing on pseudorandom numbers drawn within [0:4e8] shows
+ * an accuracy of <= 1.5 ULP where 247024 values of x out of 40 million
+ * possibles resulted in tan(x) that exceeded 0.5 ULP (ie., 0.6%).
+ */
+
+#include "math.h"
+#include "math_private.h"
+#include "fpmath.h"
+
+/*
+ * Table of constants for 2/pi, 396 Hex digits (476 decimal) of 2/pi.
+ * The table was taken from e_rem_pio2.c.
+ */
+static const int32_t two_over_pi[] = {
+0xA2F983, 0x6E4E44, 0x1529FC, 0x2757D1, 0xF534DD, 0xC0DB62, 
+0x95993C, 0x439041, 0xFE5163, 0xABDEBB, 0xC561B7, 0x246E3A, 
+0x424DD2, 0xE00649, 0x2EEA09, 0xD1921C, 0xFE1DEB, 0x1CB129, 
+0xA73EE8, 0x8235F5, 0x2EBB44, 0x84E99C, 0x7026B4, 0x5F7E41, 
+0x3991D6, 0x398353, 0x39F49C, 0x845F8B, 0xBDF928, 0x3B1FF8, 
+0x97FFDE, 0x05980F, 0xEF2F11, 0x8B5A0A, 0x6D1F6D, 0x367ECF, 
+0x27CB09, 0xB74F46, 0x3F669E, 0x5FEA2D, 0x7527BA, 0xC7EBE5, 
+0xF17B3D, 0x0739F7, 0x8A5292, 0xEA6BFB, 0x5FB11F, 0x8D5D08, 
+0x560330, 0x46FC7B, 0x6BABF0, 0xCFBC20, 0x9AF436, 0x1DA9E3, 
+0x91615E, 0xE61B08, 0x659985, 0x5F14A0, 0x68408D, 0xFFD880, 
+0x4D7327, 0x310606, 0x1556CA, 0x73A8C9, 0x60E27B, 0xC08C6B, 
+};
+
+#define PIO4	(long double) 0xc.90fdaa22168c235p-4L     /* pi/4 */
+#define TWO24	(long double) 1.67772160000000000000e+07L /* 2**24 */
+
+long double
+tanl(long double x)
+{
+	union IEEEl2bits z;
+	int i, e0, s;
+	int nx = 3, prec = 2;
+	double xd[3], yd[2];
+	long double arg;
+
+	z.e = x;
+	s = z.bits.sign;
+	z.bits.sign = 0;
+
+	/* If x = +-0, then tan(x) = +-0. */
+	if ((z.bits.manh | z.bits.manl) == 0)
+		return (x);
+
+	/* If x = NaN or Inf, then tan(x) = NaN. */
+	if (z.bits.exp == 32767)
+		return ((x - x) / (x - x));
+
+	/* If x is a subnormal number, then tan(x) = x */
+	if (z.bits.exp == 0)
+		return (x);
+
+	/* Optimize the case where x is already within range. */
+	if (z.e < PIO4)
+		return  __kernel_tanl(x);
+
+	/* Split z.e into a 24-bit representation. */
+	e0 = ilogbl(z.e) - 23;
+	z.e = scalbnl(z.e, -e0);
+	for (i = 0; i < nx; i++) {
+		xd[i] = (double) ((int32_t) z.e);
+		z.e = (z.e - xd[i]) * TWO24;
+	}
+	xd[i] = (double) ((int32_t) z.e);
+	
+	/* yd contains the pieces of xd rem pi/2 such that |yd| < pi/4. */
+	e0 = __kernel_rem_pio2(xd, yd, e0, nx, prec, two_over_pi);
+
+	for (z.e = 0, i = nx - 1; i >= 0; i--)
+		z.e += (long double) yd[i];
+
+	switch (e0 & 3) {
+	case 0:
+	case 2:
+	    z.e = __kernel_tanl(z.e);
+	    break;
+	case 1:
+	case 3:
+	    z.e = - 1.L / __kernel_tanl(z.e);
+	    break;
+	}
+	
+	return (s ? - z.e : z.e);
+}
_______________________________________________
freebsd-standards_(_at_)_freebsd_(_dot_)_org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-standards
To unsubscribe, send any mail to "freebsd-standards-unsubscribe_(_at_)_freebsd_(_dot_)_org"