TY - GEN
T1 - High performance non-uniform FFT on modern X86-based multi-core systems
AU - Kalamkar, Dhiraj D.
AU - Trzaskoz, Joshua D.
AU - Sridharan, Srinivas
AU - Smelyanskiy, Mikhail
AU - Kim, Daehyun
AU - Manduca, Armando
AU - Shu, Yunhong
AU - Bernstein, Matt A.
AU - Kaul, Bharat
AU - Dubey, Pradeep
PY - 2012/10/4
Y1 - 2012/10/4
N2 - The Non-Uniform Fast Fourier Transform (NUFFT) is a generalization of FFT to non-equidistant samples. It has many applications which vary from medical imaging to radio astronomy to the numerical solution of partial differential equations. Despite recent advances in speeding up NUFFT on various platforms, its practical applications are still limited, due to its high computational cost, which is significantly dominated by the convolution of a signal between a non-uniform and uniform grids. The computational cost of the NUFFT is particularly detrimental in cases which require fast reconstruction times, such as iterative 3D non-Cartesian MRI reconstruction. We propose novel and highly scalable parallel algorithm for performing NUFFT on x86-based multi-core CPUs. The high performance of our algorithm relies on good SIMD utilization and high parallel efficiency. On convolution, we demonstrate on average 90% SIMD efficiency using SSE, as well up to linear scalability using a quad-socket 40-core Intel® Xeon® E7-4870 Processors based system. As a result, on dual socket Intel® Xeon® X5670 based server, our NUFFT implementation is more than 4x faster compared to the best available NUFFT3D implementation, when run on the same hardware. On Intel® Xeon® E5-2670 processor based server, our NUFFT implementation is 1.5X faster than any published NUFFT implementation today. Such speed improvement opens new usages for NUFFT. For example, iterative multi channel reconstruction of a 240x240x240 image could execute in just over 3 minutes, which is on the same order as contemporary non-iterative (and thus less-accurate) 3D NUFFT-based MRI reconstructions.
AB - The Non-Uniform Fast Fourier Transform (NUFFT) is a generalization of FFT to non-equidistant samples. It has many applications which vary from medical imaging to radio astronomy to the numerical solution of partial differential equations. Despite recent advances in speeding up NUFFT on various platforms, its practical applications are still limited, due to its high computational cost, which is significantly dominated by the convolution of a signal between a non-uniform and uniform grids. The computational cost of the NUFFT is particularly detrimental in cases which require fast reconstruction times, such as iterative 3D non-Cartesian MRI reconstruction. We propose novel and highly scalable parallel algorithm for performing NUFFT on x86-based multi-core CPUs. The high performance of our algorithm relies on good SIMD utilization and high parallel efficiency. On convolution, we demonstrate on average 90% SIMD efficiency using SSE, as well up to linear scalability using a quad-socket 40-core Intel® Xeon® E7-4870 Processors based system. As a result, on dual socket Intel® Xeon® X5670 based server, our NUFFT implementation is more than 4x faster compared to the best available NUFFT3D implementation, when run on the same hardware. On Intel® Xeon® E5-2670 processor based server, our NUFFT implementation is 1.5X faster than any published NUFFT implementation today. Such speed improvement opens new usages for NUFFT. For example, iterative multi channel reconstruction of a 240x240x240 image could execute in just over 3 minutes, which is on the same order as contemporary non-iterative (and thus less-accurate) 3D NUFFT-based MRI reconstructions.
KW - Non-uniform FFT
KW - Parallelization
KW - Scalability
KW - Vectorization
UR - http://www.scopus.com/inward/record.url?scp=84866847512&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84866847512&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2012.49
DO - 10.1109/IPDPS.2012.49
M3 - Conference contribution
AN - SCOPUS:84866847512
SN - 9780769546759
T3 - Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
SP - 449
EP - 460
BT - Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
T2 - 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
Y2 - 21 May 2012 through 25 May 2012
ER -