Benchmarks: SPIRAL Generated Programs

Historical benchmarks are available on a separate page. The benchmarks below are described in detail in the following reference:

Performance Results with Respect to Machine Size

FFT kernels (codelets) on 1.1 GHz ARM A57:
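As a rough illustration of what a codelet is (a small, fixed-size FFT kernel written as straight-line code), the sketch below hand-unrolls a size-4 DFT in Python. It is not SPIRAL output: the names dft4_codelet and naive_dft are hypothetical, and the generated kernels measured here target the actual machine and are not reproduced.

```python
import cmath

def dft4_codelet(x0, x1, x2, x3):
    """Size-4 DFT as straight-line code (no loops, constants folded in)."""
    # Stage 1: size-2 butterflies on the even and odd inputs
    a = x0 + x2
    b = x0 - x2
    c = x1 + x3
    d = (x1 - x3) * -1j          # twiddle factor w_4^1 = -i
    # Stage 2: combine the two halves
    return (a + c, b + d, a - c, b - d)

def naive_dft(x):
    """O(n^2) reference DFT, used only to check the codelet."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]
```

Comparing dft4_codelet(*x) against naive_dft(x) for a length-4 input x is a quick way to check the unrolled version.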


Large FFT on Intel. SPIRAL implements a scratchpad-style double-buffering scheme for better memory performance:
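The double-buffering idea named above can be sketched generically: while one buffer is being processed, the next block is loaded into a second buffer, so data movement overlaps with computation. The Python sketch below is an assumption-laden illustration, not SPIRAL's generated code; process_double_buffered, block, and compute are hypothetical names, and a plain list stands in for main memory.

```python
import threading

def process_double_buffered(data, block, compute):
    """Apply `compute` block by block, loading block b+1 while block b is processed."""
    n_blocks = (len(data) + block - 1) // block
    buffers = [None, None]               # two scratchpad-sized buffers
    out = []

    def load(slot, b):
        # Copy block b from "main memory" into the given buffer slot
        buffers[slot] = list(data[b * block:(b + 1) * block])

    if n_blocks == 0:
        return out
    load(0, 0)                           # prime the pipeline with the first block
    for b in range(n_blocks):
        cur, nxt = b % 2, (b + 1) % 2
        loader = None
        if b + 1 < n_blocks:
            loader = threading.Thread(target=load, args=(nxt, b + 1))
            loader.start()               # prefetch runs alongside the compute step
        out.extend(compute(buffers[cur]))
        if loader is not None:
            loader.join()                # next buffer must be ready before the swap
    return out
```

For example, process_double_buffered(list(range(10)), 4, lambda blk: [2 * v for v in blk]) doubles every element while touching only two block-sized buffers at a time.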



FFT results on the BlueGene/P supercomputer "Intrepid" at Argonne National Laboratory. 32 racks (32R) correspond to 128k cores:


 

Performance Results Across Machine Types

Batch 1D FFT on NVIDIA GTX 480 Fermi GPU: SPIRAL vs. CUDA 4.0 CUFFT:



Streaming 1D DFT256 on Xilinx Virtex-6 XC6VLX760 FPGA: SPIRAL vs. Xilinx LogiCORE IP library 12.0:



ONETEP 2x2x2 upsampling kernel with small odd-sized 3D batch FFTs on 3.5 GHz Intel Haswell 4770K: SPIRAL vs. FFTW and Intel MKL:


 

Performance Based on Kernel and Application Type

Performance of SPIRAL-generated polar formatting SAR image formation on the 3.0 GHz Intel 5160, the 3.0 GHz Intel X9650, and the 2.66 GHz Intel Core i7 920 for 16- and 100-megapixel images:



Software Viterbi decoder on a 3 GHz Intel Core 2 Extreme X9650 for a range of codes: SPIRAL vs. Karn's library:
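For readers unfamiliar with the kernel being benchmarked, the following is a minimal hard-decision Viterbi decoder in Python. It assumes a small rate-1/2, constraint-length-3 convolutional code with generators 7 and 5 (octal), chosen only for illustration; the codes actually measured are not specified here. The helper names (encode, decode, parity) are hypothetical, and the code mirrors neither SPIRAL's output nor Karn's library.

```python
K = 3                       # constraint length (assumed for this example)
POLYS = (0b111, 0b101)      # generator polynomials 7 and 5 (octal)
N_STATES = 1 << (K - 1)

def parity(x):
    return bin(x).count("1") & 1

def encode(bits):
    """Rate-1/2 convolutional encoder; the newest bit sits in the register MSB."""
    state, out = 0, []
    for b in bits:
        reg = (b << (K - 1)) | state
        out.extend(parity(reg & p) for p in POLYS)
        state = reg >> 1
    return out

def decode(received):
    """Hard-decision Viterbi decoding with Hamming-distance branch metrics."""
    inf = float("inf")
    metric = [0] + [inf] * (N_STATES - 1)    # encoder starts in state 0
    history = []
    for i in range(0, len(received), len(POLYS)):
        sym = received[i:i + len(POLYS)]
        new_metric = [inf] * N_STATES
        prev = [None] * N_STATES
        for s in range(N_STATES):
            if metric[s] == inf:
                continue
            for b in (0, 1):
                reg = (b << (K - 1)) | s
                expected = [parity(reg & p) for p in POLYS]
                cost = metric[s] + sum(e != r for e, r in zip(expected, sym))
                ns = reg >> 1
                if cost < new_metric[ns]:    # add-compare-select step
                    new_metric[ns] = cost
                    prev[ns] = (s, b)
        metric = new_metric
        history.append(prev)
    # Trace back from the best final state to recover the input bits
    state = min(range(N_STATES), key=lambda s: metric[s])
    bits = []
    for prev in reversed(history):
        state, b = prev[state]
        bits.append(b)
    return bits[::-1]
```

The inner add-compare-select loop is the part that performance-oriented decoders typically vectorize; this sketch keeps it scalar for readability.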



End-to-end stencil performance on a 3.4 GHz Intel Core i7-2600K for a range of stencil kernels: PLuTo/PTile [100] together with the SPIRAL backend vs. PTile plus Intel C/C++ compiler:

References

  • T. Henretty, R. Veras, F. Franchetti, L.-N. Pouchet, J. Ramanujam, and P. Sadayappan.
    A stencil compiler for short-vector SIMD architectures.
    In Proceedings of the 27th International ACM Conference on International Conference on Supercomputing, ser. ICS '13. New York, NY, USA: ACM, 2013, pp. 13-24.

Performance of (a) a rank-4 update generated with LGen vs. MKL, code compiled with the Intel C compiler (icc), and LGen with structure support disabled; (b) a single iteration of the Kalman filter generated with SLinGen vs. MKL, Eigen, and code compiled with icc. Code tested on an Intel Core i7-2600K (Sandy Bridge microarchitecture):
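To make "structure support" in a rank-4 update concrete, the sketch below assumes the update has the form S = S + X X^T with S symmetric and X of size n x 4, and touches only the upper triangle of S. That form is an assumption for illustration and is not stated in the caption; rank4_update_upper is a hypothetical name and the code is not LGen output.

```python
def rank4_update_upper(S, X):
    """In-place S += X @ X^T on the upper triangle only (S assumed symmetric).

    S is an n x n list of lists, X is n x 4. Entries below the diagonal are
    left untouched, which skips the redundant half of the arithmetic.
    """
    n = len(S)
    for i in range(n):
        for j in range(i, n):            # j >= i: upper triangle only
            acc = 0.0
            for k in range(4):           # rank 4: four columns of X
                acc += X[i][k] * X[j][k]
            S[i][j] += acc
    return S
```

A version without structure support would loop j over the full range 0..n-1 and compute every entry twice over the symmetric pair.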



