Last update: Thu Dec 5 17:59:33 2002
IEEE 754 arithmetic is often implemented by a combination of hardware and software. Operands that are exceptional values (subnormal, Infinity, or NaN), or that produce exceptional results, can therefore be much more expensive at run time than normal operands, because they must be handled in software.
To investigate this further, the benchmark program timops.c was used to measure the performance hit from exceptional values on a wide range of architectures. The shell script timops.sh runs it for each supported precision, and the associated files are Makefile, ieeeftn.h, second.c, and store.c.
The benchmark program contains a loop whose trip count for normal operands is adjusted to be at least one second, and then the same trip count is used to run that loop again with up to six types of operands whose
The particular operand values depend on the floating-point precision and on an IEEE 754 floating-point system, but are otherwise independent of the host CPU architecture.
On most current RISC systems, exceptional values are handled by software, but the trap to that software is transparent to the user, apart from taking longer than a hardware implementation would require.
One notable exception to user transparency is the Compaq (formerly DEC) Alpha architecture. Its designers chose to implement a heavily pipelined CPU that (except for the most recent Alpha 21264 and 21364 CPUs) cannot handle exceptional values. The default for C, C++, and Fortran compilers under both Compaq/DEC OSF/1 and GNU/Linux operating systems is to flush underflows abruptly to zero, and to immediately terminate execution on encountering an operand that is subnormal, Infinity, or NaN, or for which the instruction would generate a NaN or Infinity.
In order to produce IEEE 754 nonstop behavior on Compaq/DEC Alpha systems, special compilation options are required:
These options make the compilers generate different floating-point instructions that trap to software for exceptional operands or results and, in addition, insert trap barrier instructions after floating-point operations. The trap barriers flush the instruction pipeline, allowing precise determination of the interrupt location, so that the software handler can find the instruction and its operands, and complete the job.
Because instruction pipelining is critical for modern high-performance CPUs, the performance hit from IEEE 754 nonstop behavior on Alpha processors can be expected to be severe, and the tables below confirm that expectation.
The complete output data from which the tables below are derived are recorded in timops.raw, which should be consulted for details of operating systems and for absolute times. The timops.awk program filters that file to produce the table entries. Numerical entries in the last five columns are run times relative to the loop with normal values, so an entry greater than 1 is a slowdown.
There are several observations to make about the data in the tables below:
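    #if defined(__sgi)
    #include <sys/fpu.h>    /* header name lost from the original text; <sys/fpu.h> is assumed */

    static void
    flush_to_zero(int on_off)
    /* see "man sigfpe" on SGI IRIX 6.x for documentation */
    {
        union fpc_csr n;

        n.fc_word = get_fpc_csr();              /* read the FP control/status register */
        n.fc_struct.flush = (on_off ? 1 : 0);   /* 1: flush underflows to zero */
        set_fpc_csr(n.fc_word);                 /* write it back */
    }
    #endif
    ...
    #if defined(__sgi)
        flush_to_zero(0);   /* to get support for subnormals! */
    #endif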
-----------------------------------------------------------------------------------------------
CPU MHz Compiler fp_size ufl->subnorm ufl->zero ofl->Inf NaN Inf
-----------------------------------------------------------------------------------------------
AMD Athlon 1400 gcc 4 4.974 3.376 1.000 1.000 0.991
AMD Athlon 1400 gcc 8 4.802 3.198 1.009 1.000 1.009
AMD Athlon 1400 gcc 12 1.007 1.000 1.000 1.007 1.013
DEC Alpha 21064 EV4 100 gcc 4 1.001 1.007 -n/a- -n/a- -n/a-
DEC Alpha 21064 EV4 100 gcc 4 8.252 8.139 8.261 7.532 7.517
DEC Alpha 21064 EV4 100 gcc 8 1.006 0.999 -n/a- -n/a- -n/a-
DEC Alpha 21064 EV4 100 gcc 8 8.835 8.697 9.572 7.886 7.815
DEC Alpha 21164 EV5 466 c89 4 1.000 1.000 -n/a- -n/a- -n/a-
DEC Alpha 21164 EV5 466 c89 4 43.636 34.727 21.879 21.121 21.439
DEC Alpha 21164 EV5 466 c89 8 1.000 1.000 -n/a- -n/a- -n/a-
DEC Alpha 21164 EV5 466 c89 8 66.043 49.217 27.913 27.130 27.333
DEC Alpha 21264 667 c89 4 1.000 0.989 -n/a- -n/a- -n/a-
DEC Alpha 21264 667 c89 4 53.359 42.239 0.989 1.000 0.989
DEC Alpha 21264 667 c89 8 1.000 1.010 -n/a- -n/a- -n/a-
DEC Alpha 21264 667 c89 8 78.552 57.885 1.000 1.021 1.000
DEC Alpha 21264 667 c89 16 0.986 1.014 -n/a- -n/a- -n/a-
DEC Alpha 21264 667 c89 16 1.057 1.000 0.986 0.957 0.986
HP PA-RISC 1.1 7100LC 80 cc 4 12.058 12.178 1.000 92.251 1.000
HP PA-RISC 1.1 7100LC 80 cc 8 16.955 16.841 1.000 11.278 1.000
IBM PowerPC 133 cc 4 0.981 1.000 0.981 0.991 0.981
IBM PowerPC 133 cc 8 1.007 1.007 1.014 1.000 1.007
IBM PowerPC 133 cc 8 1.014 1.014 1.014 1.007 1.000
IBM PowerPC 166 cc 4 0.991 1.000 0.991 0.991 0.991
IBM PowerPC 166 cc 8 1.014 1.020 1.020 1.014 1.000
IBM PowerPC 166 cc 16 1.009 1.009 0.991 0.991 0.991
IBM PowerPC 233 gcc 4 1.006 1.013 1.026 1.000 1.000
IBM PowerPC 233 gcc 8 1.006 1.013 1.019 1.000 1.000
IBM PowerPC 533 cc 4 1.009 1.009 1.018 1.000 1.009
IBM PowerPC 533 cc 8 0.991 0.991 0.991 0.983 0.991
IBM PowerPC 533 cc 8 0.991 1.000 1.000 0.991 1.000
Intel IA-64 (emulated on IA-32) 600 gcc 4 1.012 1.018 1.009 0.941 0.953
Intel IA-64 (emulated on IA-32) 600 gcc 8 1.015 1.009 0.994 0.915 0.921
Intel IA-64 (emulated on IA-32) 600 gcc 8 1.015 1.011 0.998 0.917 0.923
Intel Pentium II 450 cc 4 5.967 3.383 3.367 3.333 3.083
Intel Pentium II 450 cc 8 3.655 2.236 2.227 2.291 2.145
Intel Pentium II 450 cc 12 0.984 1.000 1.000 2.129 2.000
Intel Pentium II (Klamath) 300 cc 4 5.982 3.390 3.373 3.302 3.035
Intel Pentium II (Klamath) 300 cc 8 5.824 3.249 3.213 3.301 3.036
Intel Pentium II (Klamath) 300 cc 12 1.000 0.999 2.468 2.659 2.467
Intel Pentium III 1266 gcc 4 6.014 3.408 3.394 3.317 3.056
Intel Pentium III 1266 gcc 8 5.852 3.268 3.232 3.317 3.056
Intel Pentium III 1266 gcc 12 1.010 1.010 1.000 2.588 2.402
Intel Pentium III (Katmai) 600 gcc 4 6.266 3.538 3.545 3.490 3.224
Intel Pentium III (Katmai) 600 gcc 8 6.176 3.437 3.423 3.514 3.246
Intel Pentium III (Katmai) 600 gcc 12 1.036 1.018 2.518 2.491 2.321
MIPS R10000 180 c89 4 0.991 1.000 1.000 27.596 0.991
MIPS R10000 180 c89 8 0.991 0.991 1.000 27.254 1.000
MIPS R10000 180 c89 16 1.134 1.134 1.127 0.606 0.606
MIPS R10000 195 c89 4 1.010 1.010 1.000 26.346 1.000
MIPS R10000 195 c89 8 1.010 1.010 1.000 26.798 1.000
MIPS R10000 195 c89 16 1.113 1.113 1.120 0.624 0.632
MIPS R4400 150 c89 4 25.635 25.912 1.081 22.858 0.993
MIPS R4400 150 c89 8 26.074 26.007 1.074 22.107 1.013
MIPS R4400 150 c89 16 31.128 10.701 1.137 8.493 0.531
MIPS R4400 175 c89 4 27.354 27.562 1.054 24.492 0.977
MIPS R4400 175 c89 8 27.902 27.826 1.045 23.977 0.962
MIPS R4400 175 c89 16 33.945 11.522 1.132 9.495 0.533
MIPS R5000 180 c89 4 1.062 1.076 1.055 26.090 1.055
MIPS R5000 180 c89 4 1.076 1.076 1.069 31.472 1.083
MIPS R5000 180 c89 8 1.047 1.068 1.054 24.223 1.061
MIPS R5000 180 c89 8 1.054 1.068 1.047 24.439 1.054
MIPS R5000 180 c89 16 1.206 1.198 1.222 0.532 0.540
MIPS R5000 180 c89 16 1.222 1.198 1.230 0.532 0.540
Sun UltraSPARC 400 c89 4 16.675 1.031 0.995 1.015 1.015
Sun UltraSPARC 400 c89 8 14.527 1.015 1.053 1.008 1.015
Sun UltraSPARC 400 c89 16 1.015 1.026 1.031 0.701 0.716
Sun UltraSPARC II 167 c89 4 16.761 1.017 1.009 1.009 1.017
Sun UltraSPARC II 167 c89 8 12.586 1.006 0.994 1.000 1.006
Sun UltraSPARC II 167 c89 16 1.002 1.000 1.002 0.705 0.701
Sun UltraSPARC II 270 c89 4 18.618 0.993 0.993 1.000 0.986
Sun UltraSPARC II 270 c89 8 14.640 0.995 1.000 0.995 1.000
Sun UltraSPARC II 270 c89 16 1.000 1.000 1.000 0.697 0.701
Sun UltraSPARC II 300 c89 4 16.961 1.008 1.008 1.008 1.008
Sun UltraSPARC II 300 c89 8 13.068 1.000 1.000 1.011 1.000
Sun UltraSPARC II 300 c89 16 1.000 1.004 0.989 0.706 0.709
Sun UltraSPARC II 400 c89 4 16.777 1.005 1.000 1.000 1.000
Sun UltraSPARC II 400 c89 8 12.818 1.008 1.000 1.000 1.000
Sun UltraSPARC II 400 c89 16 1.010 1.010 1.010 0.694 0.694
Sun UltraSPARC II 440 c89 4 16.824 0.995 0.995 1.000 1.000
Sun UltraSPARC II 440 c89 8 13.198 1.008 1.016 1.016 1.000
Sun UltraSPARC II 440 c89 16 1.021 1.000 1.021 0.688 0.704
Sun UltraSPARC IIe 500 c89 4 16.981 1.000 1.013 1.000 1.006
Sun UltraSPARC IIe 500 c89 8 13.179 1.000 1.000 1.009 1.000
Sun UltraSPARC IIe 500 c89 16 0.994 1.000 0.994 0.697 0.690
Sun UltraSPARC III 750 c89 4 13.417 0.942 0.897 0.942 0.910
Sun UltraSPARC III 750 c89 8 11.223 1.000 0.995 1.000 1.000
Sun UltraSPARC III 750 c89 16 0.992 1.000 0.992 0.659 0.675
TI SuperSPARC Viking 40 gcc 4 1.000 1.009 0.991 0.991 0.991
TI SuperSPARC Viking 40 gcc 8 0.996 1.000 0.984 0.988 0.984
TI SuperSPARC Viking 40 gcc 8 1.016 1.012 1.000 1.000 1.000
TI SuperSPARC Viking/MXCC 50 gcc 4 1.005 1.000 0.995 0.995 0.989
TI SuperSPARC Viking/MXCC 50 gcc 8 1.005 1.000 0.990 0.990 0.995
TI SuperSPARC Viking/MXCC 50 gcc 8 1.010 1.010 1.000 1.000 0.995
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
CPU MHz Compiler fp_size ufl->subnorm ufl->zero ofl->Inf NaN Inf
-----------------------------------------------------------------------------------------------
AMD Athlon 1400 gcc 4 4.974 3.376 1.000 1.000 0.991
DEC Alpha 21064 EV4 100 gcc 4 1.001 1.007 -n/a- -n/a- -n/a-
DEC Alpha 21064 EV4 100 gcc 4 8.252 8.139 8.261 7.532 7.517
DEC Alpha 21164 EV5 466 c89 4 1.000 1.000 -n/a- -n/a- -n/a-
DEC Alpha 21164 EV5 466 c89 4 43.636 34.727 21.879 21.121 21.439
DEC Alpha 21264 667 c89 4 1.000 0.989 -n/a- -n/a- -n/a-
DEC Alpha 21264 667 c89 4 53.359 42.239 0.989 1.000 0.989
HP PA-RISC 1.1 7100LC 80 cc 4 12.058 12.178 1.000 92.251 1.000
IBM PowerPC 133 cc 4 0.981 1.000 0.981 0.991 0.981
IBM PowerPC 166 cc 4 0.991 1.000 0.991 0.991 0.991
IBM PowerPC 233 gcc 4 1.006 1.013 1.026 1.000 1.000
IBM PowerPC 533 cc 4 1.009 1.009 1.018 1.000 1.009
Intel IA-64 (emulated on IA-32) 600 gcc 4 1.012 1.018 1.009 0.941 0.953
Intel Pentium II 450 cc 4 5.967 3.383 3.367 3.333 3.083
Intel Pentium II (Klamath) 300 cc 4 5.982 3.390 3.373 3.302 3.035
Intel Pentium III 1266 gcc 4 6.014 3.408 3.394 3.317 3.056
Intel Pentium III (Katmai) 600 gcc 4 6.266 3.538 3.545 3.490 3.224
MIPS R10000 180 c89 4 0.991 1.000 1.000 27.596 0.991
MIPS R10000 195 c89 4 1.010 1.010 1.000 26.346 1.000
MIPS R4400 150 c89 4 25.635 25.912 1.081 22.858 0.993
MIPS R4400 175 c89 4 27.354 27.562 1.054 24.492 0.977
MIPS R5000 180 c89 4 1.062 1.076 1.055 26.090 1.055
MIPS R5000 180 c89 4 1.076 1.076 1.069 31.472 1.083
Sun UltraSPARC 400 c89 4 16.675 1.031 0.995 1.015 1.015
Sun UltraSPARC II 167 c89 4 16.761 1.017 1.009 1.009 1.017
Sun UltraSPARC II 270 c89 4 18.618 0.993 0.993 1.000 0.986
Sun UltraSPARC II 300 c89 4 16.961 1.008 1.008 1.008 1.008
Sun UltraSPARC II 400 c89 4 16.777 1.005 1.000 1.000 1.000
Sun UltraSPARC II 440 c89 4 16.824 0.995 0.995 1.000 1.000
Sun UltraSPARC IIe 500 c89 4 16.981 1.000 1.013 1.000 1.006
Sun UltraSPARC III 750 c89 4 13.417 0.942 0.897 0.942 0.910
TI SuperSPARC Viking 40 gcc 4 1.000 1.009 0.991 0.991 0.991
TI SuperSPARC Viking/MXCC 50 gcc 4 1.005 1.000 0.995 0.995 0.989
AMD Athlon 1400 gcc 8 4.802 3.198 1.009 1.000 1.009
DEC Alpha 21064 EV4 100 gcc 8 1.006 0.999 -n/a- -n/a- -n/a-
DEC Alpha 21064 EV4 100 gcc 8 8.835 8.697 9.572 7.886 7.815
DEC Alpha 21164 EV5 466 c89 8 1.000 1.000 -n/a- -n/a- -n/a-
DEC Alpha 21164 EV5 466 c89 8 66.043 49.217 27.913 27.130 27.333
DEC Alpha 21264 667 c89 8 1.000 1.010 -n/a- -n/a- -n/a-
DEC Alpha 21264 667 c89 8 78.552 57.885 1.000 1.021 1.000
HP PA-RISC 1.1 7100LC 80 cc 8 16.955 16.841 1.000 11.278 1.000
IBM PowerPC 133 cc 8 1.007 1.007 1.014 1.000 1.007
IBM PowerPC 133 cc 8 1.014 1.014 1.014 1.007 1.000
IBM PowerPC 166 cc 8 1.014 1.020 1.020 1.014 1.000
IBM PowerPC 233 gcc 8 1.006 1.013 1.019 1.000 1.000
IBM PowerPC 533 cc 8 0.991 0.991 0.991 0.983 0.991
IBM PowerPC 533 cc 8 0.991 1.000 1.000 0.991 1.000
Intel IA-64 (emulated on IA-32) 600 gcc 8 1.015 1.009 0.994 0.915 0.921
Intel IA-64 (emulated on IA-32) 600 gcc 8 1.015 1.011 0.998 0.917 0.923
Intel Pentium II 450 cc 8 3.655 2.236 2.227 2.291 2.145
Intel Pentium II (Klamath) 300 cc 8 5.824 3.249 3.213 3.301 3.036
Intel Pentium III 1266 gcc 8 5.852 3.268 3.232 3.317 3.056
Intel Pentium III (Katmai) 600 gcc 8 6.176 3.437 3.423 3.514 3.246
MIPS R10000 180 c89 8 0.991 0.991 1.000 27.254 1.000
MIPS R10000 195 c89 8 1.010 1.010 1.000 26.798 1.000
MIPS R4400 150 c89 8 26.074 26.007 1.074 22.107 1.013
MIPS R4400 175 c89 8 27.902 27.826 1.045 23.977 0.962
MIPS R5000 180 c89 8 1.047 1.068 1.054 24.223 1.061
MIPS R5000 180 c89 8 1.054 1.068 1.047 24.439 1.054
Sun UltraSPARC 400 c89 8 14.527 1.015 1.053 1.008 1.015
Sun UltraSPARC II 167 c89 8 12.586 1.006 0.994 1.000 1.006
Sun UltraSPARC II 270 c89 8 14.640 0.995 1.000 0.995 1.000
Sun UltraSPARC II 300 c89 8 13.068 1.000 1.000 1.011 1.000
Sun UltraSPARC II 400 c89 8 12.818 1.008 1.000 1.000 1.000
Sun UltraSPARC II 440 c89 8 13.198 1.008 1.016 1.016 1.000
Sun UltraSPARC IIe 500 c89 8 13.179 1.000 1.000 1.009 1.000
Sun UltraSPARC III 750 c89 8 11.223 1.000 0.995 1.000 1.000
TI SuperSPARC Viking 40 gcc 8 0.996 1.000 0.984 0.988 0.984
TI SuperSPARC Viking 40 gcc 8 1.016 1.012 1.000 1.000 1.000
TI SuperSPARC Viking/MXCC 50 gcc 8 1.005 1.000 0.990 0.990 0.995
TI SuperSPARC Viking/MXCC 50 gcc 8 1.010 1.010 1.000 1.000 0.995
AMD Athlon 1400 gcc 12 1.007 1.000 1.000 1.007 1.013
Intel Pentium II 450 cc 12 0.984 1.000 1.000 2.129 2.000
Intel Pentium II (Klamath) 300 cc 12 1.000 0.999 2.468 2.659 2.467
Intel Pentium III 1266 gcc 12 1.010 1.010 1.000 2.588 2.402
Intel Pentium III (Katmai) 600 gcc 12 1.036 1.018 2.518 2.491 2.321
DEC Alpha 21264 667 c89 16 0.986 1.014 -n/a- -n/a- -n/a-
DEC Alpha 21264 667 c89 16 1.057 1.000 0.986 0.957 0.986
IBM PowerPC 166 cc 16 1.009 1.009 0.991 0.991 0.991
MIPS R10000 180 c89 16 1.134 1.134 1.127 0.606 0.606
MIPS R10000 195 c89 16 1.113 1.113 1.120 0.624 0.632
MIPS R4400 150 c89 16 31.128 10.701 1.137 8.493 0.531
MIPS R4400 175 c89 16 33.945 11.522 1.132 9.495 0.533
MIPS R5000 180 c89 16 1.206 1.198 1.222 0.532 0.540
MIPS R5000 180 c89 16 1.222 1.198 1.230 0.532 0.540
Sun UltraSPARC 400 c89 16 1.015 1.026 1.031 0.701 0.716
Sun UltraSPARC II 167 c89 16 1.002 1.000 1.002 0.705 0.701
Sun UltraSPARC II 270 c89 16 1.000 1.000 1.000 0.697 0.701
Sun UltraSPARC II 300 c89 16 1.000 1.004 0.989 0.706 0.709
Sun UltraSPARC II 400 c89 16 1.010 1.010 1.010 0.694 0.694
Sun UltraSPARC II 440 c89 16 1.021 1.000 1.021 0.688 0.704
Sun UltraSPARC IIe 500 c89 16 0.994 1.000 0.994 0.697 0.690
Sun UltraSPARC III 750 c89 16 0.992 1.000 0.992 0.659 0.675
-----------------------------------------------------------------------------------------------