AA-sort is an integer sorting algorithm that exploits both SIMD instructions and multiple cores.
It was proposed by H. Inoue, T. Moriyama, H. Komatsu, and T. Nakatani in 2007 (see "A high-performance sorting algorithm for multicore single-instruction multiple-data processors").
I implemented it on x86/x64 with SSE4 (single core only) and verified that it is 2.8 to 4 times faster than std::sort (STL) on random data.
The source code is at https://github.com/herumi/opti/blob/master/intsort.hpp, and the implementation details are described in AA-sort with SSE4.1.
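The core of AA-sort's in-core step is a comb sort whose compare-and-swap operates on whole SIMD vectors. Below is a minimal scalar sketch of that idea, assuming 32-bit unsigned keys and 4-lane vectors (one 128-bit SSE register). Each per-lane loop stands in for what SSE4.1 does with a single `_mm_min_epu32`/`_mm_max_epu32` pair. The names `Vec4`, `cmpswap`, `cmpswap_skew`, and `aa_incore_sort` are hypothetical, and this is not the code in intsort.hpp. It skips the paper's initial in-vector sort and the multi-block merge phase, and only shows how sorting in "transposed" order (down lane 0 of every vector, then lane 1, and so on) turns almost every comparison into a lane-wise vector operation.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical stand-in for one 128-bit SSE register holding four 32-bit keys.
using Vec4 = std::array<std::uint32_t, 4>;

// Lane-wise compare-and-swap: afterwards a[k] <= b[k] for every lane k.
// With SSE4.1 this would be one _mm_min_epu32 and one _mm_max_epu32.
static bool cmpswap(Vec4& a, Vec4& b) {
    bool swapped = false;
    for (int k = 0; k < 4; ++k) {
        if (a[k] > b[k]) { std::swap(a[k], b[k]); swapped = true; }
    }
    return swapped;
}

// Skewed compare-and-swap: orders a[k] before b[k + 1] for lanes 0..2.
// Needed when a comparison at distance `gap` wraps from lane k into lane k + 1.
static bool cmpswap_skew(Vec4& a, Vec4& b) {
    bool swapped = false;
    for (int k = 0; k < 3; ++k) {
        if (a[k] > b[k + 1]) { std::swap(a[k], b[k + 1]); swapped = true; }
    }
    return swapped;
}

// Sorts `in` with a vectorized comb sort in transposed order, then reads the
// result back out lane by lane.
std::vector<std::uint32_t> aa_incore_sort(const std::vector<std::uint32_t>& in) {
    const std::size_t n = in.size();
    const std::size_t m = (n + 3) / 4;  // number of 4-lane vectors
    if (m == 0) return {};

    // Load keys into vectors, padding the tail with the largest key so the
    // padding sorts to the very end.
    std::vector<Vec4> v(m);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t k = 0; k < 4; ++k) {
            const std::size_t p = 4 * i + k;
            v[i][k] = p < n ? in[p] : std::numeric_limits<std::uint32_t>::max();
        }

    // Comb sort on the logical sequence v[0][0], v[1][0], ..., v[m-1][0], v[0][1], ...
    // The element `gap` positions after (lane k, vector i) is in the same lane
    // (vector i + gap) or, past the last vector, in lane k + 1 of vector i + gap - m.
    std::size_t gap = m;
    bool swapped = true;
    while (gap > 1 || swapped) {
        gap = std::max<std::size_t>(1, gap * 10 / 13);  // shrink factor 1.3
        swapped = false;
        for (std::size_t i = 0; i + gap < m; ++i)
            swapped |= cmpswap(v[i], v[i + gap]);
        for (std::size_t i = m - gap; i < m; ++i)
            swapped |= cmpswap_skew(v[i], v[i + gap - m]);
    }

    // Undo the transposition: lane k of vector i is output position k * m + i.
    std::vector<std::uint32_t> out(4 * m);
    for (std::size_t k = 0; k < 4; ++k)
        for (std::size_t i = 0; i < m; ++i) out[k * m + i] = v[i][k];
    assert(std::is_sorted(out.begin(), out.end()));  // debug check of the read-out
    out.resize(n);  // drop the padding
    return out;
}
```

Because the final gap-1 pass covers every adjacent pair in transposed order, including the wrap from lane k to lane k + 1 handled by `cmpswap_skew`, the loop only stops once the whole sequence is sorted.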
2012-06-20
2011-08-27
fast double precision exponential function with SSE
I made a fast double-precision exponential function using SSE2.
fmath.hpp (https://github.com/herumi/fmath, fast approximate float functions)
Benchmark results (lower is faster):
CPU | OS | compiler | std::exp | fmath::expd | fmath::expd_v (array version), per element |
---|---|---|---|---|---|
Xeon X5650 2.67GHz | 64-bit Linux | gcc 4.6.0 | 128.89 | 27.38 | 17.84 |
i7-2600 3.4GHz | 64-bit Linux | gcc 4.4.5 | 69.11 | 12.10 | 8.25 |
i7-2600 3.4GHz | 64-bit Windows 7 | VC10 | 36.33 | 14.37 | 7.08 |
The function double fmath::expd(double) defined in fmath.hpp is about five times faster than std::exp of gcc 4.6 on 64-bit Linux and about 2.5 times faster than that of Visual Studio 2010 on 64-bit Windows.
The RMS (root mean square) error over 1,000,000 points drawn from the standard normal distribution is about 1.117645e-16.
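The post does not describe how expd works internally. As an illustration only, here is a scalar sketch of the textbook technique for a fast exp: write x = n·ln 2 + r with integer n and |r| ≤ ln 2 / 2, approximate e^r with a short polynomial, and scale by 2^n. The split ln 2 constants are the ones used by fdlibm. The names `exp_sketch` and `rms_relative_error` are hypothetical, and this is not fmath.hpp's implementation, which is written for SSE2. The second function applies the RMS measurement described above against `std::exp` over standard-normal samples; the post does not say whether that figure is absolute or relative error, so using relative error here is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>

// exp(x) by range reduction: x = n*ln2 + r, |r| <= ln2/2, exp(x) = 2^n * exp(r).
// Illustrative scalar sketch; not the SSE2 code in fmath.hpp.
double exp_sketch(double x) {
    if (std::isnan(x)) return x;
    // Clamp so the conversion to int below is always defined; std::ldexp then
    // overflows to +inf or underflows toward 0 for out-of-range inputs.
    x = std::min(std::max(x, -746.0), 710.0);

    const double LOG2E  = 1.4426950408889634;          // 1 / ln 2
    const double LN2_HI = 6.93147180369123816490e-01;  // high part of ln 2 (fdlibm)
    const double LN2_LO = 1.90821492927058770002e-10;  // low part of ln 2 (fdlibm)

    const int n = static_cast<int>(std::nearbyint(x * LOG2E));
    // Cody-Waite reduction: n * LN2_HI is exact, so r keeps its low-order bits.
    const double r = (x - n * LN2_HI) - n * LN2_LO;

    // Taylor series of exp(r) up to r^13 / 13!, evaluated in Horner form.
    double p = 1.0;
    for (int k = 13; k >= 1; --k) p = 1.0 + r * p / k;

    return std::ldexp(p, n);
}

// Root-mean-square relative error of f against std::exp over `count` points
// drawn from the standard normal distribution.
template <class F>
double rms_relative_error(F f, std::size_t count, unsigned seed = 1) {
    assert(count > 0);
    std::mt19937_64 gen(seed);
    std::normal_distribution<double> dist(0.0, 1.0);
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        const double x = dist(gen);
        const double ref = std::exp(x);
        const double rel = (f(x) - ref) / ref;
        sum_sq += rel * rel;
    }
    return std::sqrt(sum_sq / static_cast<double>(count));
}
```

For example, `rms_relative_error(exp_sketch, 1000000)` applies the same 1,000,000-point measurement to the sketch.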
The benchmark source is fastexp.cpp, which requires Xbyak.
Results for various environments are listed in the comment at the top of fastexp.cpp.
fmath.hpp also provides fmath::exp(float) and fmath::log(float).
These functions are also 2 to 5 times faster than their standard-library counterparts.
Try it if you need speed.