For historical benchmarks, see the separate historical benchmarks page. The benchmarks below are described in detail in the following reference:
Performance Results with Respect to Machine Size
FFT kernels (codelets) on a 1.1 GHz ARM Cortex-A57
Large FFT on Intel; SPIRAL implements a scratchpad-style double-buffering scheme for better memory performance
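The double-buffering idea can be sketched in a few lines. The Python sketch below is illustrative only and is not SPIRAL's generated code. The names `load_block` (a stand-in for a transfer into fast local memory) and `process_block` (a stand-in for the compute kernel, here a sum of squares) are hypothetical. While the kernel works on one buffer, a single background thread fills the other, so at most two blocks are resident at any time.

```python
from concurrent.futures import ThreadPoolExecutor


def load_block(data, start, size):
    """Hypothetical stand-in for a slow transfer (e.g. main memory to scratchpad)."""
    return list(data[start:start + size])


def process_block(block):
    """Hypothetical stand-in for the compute kernel run on one resident block."""
    return sum(x * x for x in block)


def double_buffered_sum_of_squares(data, block_size):
    """Overlap loading block i+1 with processing block i (ping-pong buffers)."""
    starts = list(range(0, len(data), block_size))
    if not starts:
        return 0
    total = 0
    with ThreadPoolExecutor(max_workers=1) as loader:
        # Fill the first buffer before entering the steady-state loop.
        next_load = loader.submit(load_block, data, starts[0], block_size)
        for i in range(len(starts)):
            current = next_load.result()  # wait until buffer i is resident
            if i + 1 < len(starts):
                # Start filling the other buffer while we compute on this one.
                next_load = loader.submit(load_block, data, starts[i + 1], block_size)
            total += process_block(current)
    return total
```

For example, `double_buffered_sum_of_squares(list(range(10)), 3)` processes four blocks and returns 285, the sum of the squares of 0 through 9. In Python the overlap is only structural because of the interpreter lock. On hardware with a scratchpad, the load would be an asynchronous transfer that genuinely runs alongside the computation.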
FFT results on the Blue Gene/P supercomputer "Intrepid" at Argonne National Laboratory; 32 racks (32R) comprise 128k cores
Performance Results Across Machine Types
Batch 1D FFT on an NVIDIA GeForce GTX 480 (Fermi) GPU: SPIRAL vs. CUDA 4.0 CUFFT
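A batch 1D FFT applies the same transform independently to many signals, so the individual transforms can run in parallel. The Python sketch below is a minimal illustration, not the algorithm used by SPIRAL or CUFFT. It uses a textbook recursive radix-2 Cooley-Tukey FFT, assumes power-of-two lengths, and includes a direct O(n^2) DFT for cross-checking.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])  # transform of even-indexed samples
    odd = fft(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def batch_fft(batch):
    """Apply the 1D FFT independently to every signal in the batch."""
    return [fft(signal) for signal in batch]


def naive_dft(x):
    """Direct O(n^2) DFT, used only as a reference for checking."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]
```

Each list comprehension iteration in `batch_fft` is independent of the others, which is the property a GPU implementation exploits by mapping transforms (or parts of them) to separate thread blocks.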
Streaming 1D DFT256 on a Xilinx Virtex-6 XC6VLX760 FPGA: SPIRAL vs. Xilinx LogiCORE IP library 12.0
ONETEP 2x2x2 upsampling kernel with small odd-sized 3D batch FFTs on a 3.5 GHz Intel Core i7-4770K (Haswell): SPIRAL vs. FFTW and Intel MKL
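Assuming the 2x2x2 upsampling is Fourier (trigonometric) interpolation, that is, zero-padding the spectrum and transforming back, the 1D core of the idea looks like the Python sketch below. It uses a direct DFT for clarity and handles only odd lengths, which avoids having to split a Nyquist coefficient. It is an illustration under that assumption, not ONETEP's or SPIRAL's implementation. Because the DFT is separable, a 3D version would apply the same padding along each of the three axes.

```python
import cmath


def dft(x, sign=-1):
    """Unnormalized DFT; sign=-1 is the forward transform, sign=+1 the inverse."""
    n = len(x)
    return [sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def fourier_upsample_2x(x):
    """Interpolate an odd-length periodic real signal onto a 2x finer grid."""
    n = len(x)
    if n % 2 == 0:
        raise ValueError("this sketch assumes an odd length (no Nyquist bin to split)")
    spectrum = dft(x)
    half = n // 2  # number of strictly positive (and negative) frequencies
    padded = [0j] * (2 * n)
    padded[: half + 1] = spectrum[: half + 1]      # DC and positive frequencies
    padded[2 * n - half:] = spectrum[n - half:]    # negative frequencies at the top end
    fine = dft(padded, sign=+1)
    # Normalize by the ORIGINAL length n so the interpolant passes through x.
    return [v.real / n for v in fine]
```

Every even-indexed output sample reproduces an input sample (up to rounding), and the odd-indexed samples are the interpolated values halfway between them.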
Performance Based on Kernel and Application Type
Performance of SPIRAL-generated polar formatting SAR image formation on a 3.0 GHz Intel Xeon 5160, a 3.0 GHz Intel Core 2 Extreme X9650, and a 2.66 GHz Intel Core i7 920 for 16- and 100-megapixel images
Software Viterbi decoder on a 3 GHz Intel Core 2 Extreme X9650 for a range of codes: SPIRAL vs. Karn's library
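Production decoders such as those compared above are heavily optimized, but they implement the same add-compare-select recursion shown in this minimal Python sketch. For illustration only, it assumes the classic rate-1/2, constraint-length-3 convolutional code with generators (7, 5) in octal and hard-decision Hamming-distance branch metrics. The benchmark itself covers a range of other codes.

```python
G = (0b111, 0b101)        # generator polynomials (7, 5) in octal
K = 3                     # constraint length
N_STATES = 1 << (K - 1)   # encoder state = the previous K-1 input bits


def parity(v):
    return bin(v).count("1") & 1


def encode(bits):
    """Rate-1/2 encoder; appends K-1 zero tail bits to return to state 0."""
    state, out = 0, []
    for b in bits + [0] * (K - 1):
        reg = (b << (K - 1)) | state  # newest input bit in the most significant position
        out.extend(parity(reg & g) for g in G)
        state = reg >> 1
    return out


def viterbi_decode(received, n_bits):
    """Hard-decision Viterbi decoding of a zero-terminated codeword."""
    inf = float("inf")
    metric = [0] + [inf] * (N_STATES - 1)  # the encoder starts in state 0
    paths = [[] for _ in range(N_STATES)]
    for t in range(0, len(received), 2):
        r = received[t:t + 2]
        new_metric = [inf] * N_STATES
        new_paths = [None] * N_STATES
        for state in range(N_STATES):
            if metric[state] == inf:
                continue
            for b in (0, 1):
                reg = (b << (K - 1)) | state
                expected = [parity(reg & g) for g in G]
                # Add: branch metric is the Hamming distance to the received pair.
                m = metric[state] + sum(e != x for e, x in zip(expected, r))
                nxt = reg >> 1
                # Compare-select: keep the better survivor entering state nxt.
                if m < new_metric[nxt]:
                    new_metric[nxt] = m
                    new_paths[nxt] = paths[state] + [b]
        metric, paths = new_metric, new_paths
    # The tail bits drove the encoder back to state 0; drop them.
    return paths[0][:n_bits]
```

Storing whole survivor paths keeps the sketch short; practical decoders instead record one decision bit per state and step and recover the message by traceback.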
End-to-end stencil performance on a 3.4 GHz Intel Core i7-2600K for a range of stencil kernels: PLuTo/PTile [100] together with the SPIRAL backend vs. PTile plus the Intel C/C++ compiler
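The individual stencil kernels are not listed here. As an assumed, representative example, the Python sketch below performs Jacobi sweeps of a 2D 5-point stencil: each interior point is replaced by the average of its four neighbors from the previous sweep, while boundary values stay fixed. Tools such as PLuTo restructure (tile) loop nests of this kind; the sketch shows only the plain, untiled reference loop.

```python
def jacobi_2d(grid, sweeps):
    """Apply `sweeps` Jacobi iterations of the 5-point stencil; the boundary stays fixed."""
    rows, cols = len(grid), len(grid[0])
    cur = [row[:] for row in grid]  # work on a copy; the input is left unchanged
    for _ in range(sweeps):
        nxt = [row[:] for row in cur]  # copying also carries the boundary over
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                # Read only from the previous sweep (cur), write to the new one (nxt).
                nxt[i][j] = 0.25 * (cur[i - 1][j] + cur[i + 1][j]
                                    + cur[i][j - 1] + cur[i][j + 1])
        cur = nxt
    return cur
```

A linear field such as f(i, j) = i + 2j is a fixed point of this stencil, which makes a convenient correctness check.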
Performance of (a) a rank-4 update generated with LGen vs. MKL, Intel C compiler (icc)-compiled code, and LGen with structure support disabled; (b) a single iteration of the Kalman filter generated with SLinGen vs. MKL, Eigen, and icc-compiled code. Code tested on an Intel Core i7-2600K (Sandy Bridge microarchitecture).
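A rank-k update computes C := C + A·B^T with A and B of size n×k; k = 4 gives a rank-4 update. The Python sketch below is a plain reference loop, not LGen output. The second function illustrates what structure support can save, under the assumption that the benchmarked case is the symmetric form S := S + A·A^T: only the lower triangle is computed and then mirrored, roughly halving the work.

```python
def rank_k_update(C, A, B):
    """C := C + A * B^T, where A is n x k and B is m x k (k = 4 for a rank-4 update)."""
    k = len(A[0])
    for i in range(len(A)):
        for j in range(len(B)):
            C[i][j] += sum(A[i][p] * B[j][p] for p in range(k))
    return C


def symmetric_rank_k_update(S, A):
    """S := S + A * A^T for symmetric S: compute the lower triangle, then mirror it."""
    k = len(A[0])
    for i in range(len(A)):
        for j in range(i + 1):  # j <= i: lower triangle including the diagonal
            S[i][j] += sum(A[i][p] * A[j][p] for p in range(k))
            S[j][i] = S[i][j]   # valid because S was symmetric before the update
    return S
```

Both functions update their first argument in place and also return it, so results can be compared directly.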