SPIRAL Software Research

On this page we overview software that we developed in this project. We have three types of projects: (a) program generators for particular problem domains, (b) hand-optimized software for certain problems, and (c) tools related to performance

Open Source SPIRAL System

Open Source SPIRAL is available here under non-viral license (BSD-style license). This is an initial version, and there will be an ongoing effort to open source whole system. Please let us know which parts of SPIRAL you are most interested in. Commercial support is available via SpiralGen, Inc.

SPIRAL was developed over 20 years by the SPIRAL team under funding from DARPA (OPAL, DESA, HACMS, PERFECT, BRASS), NSF, ONR, DoD HPC, JPL, DoE, CMU SEI, Intel, Nvidia, and Merury. The open sourcing of SPIRAL is an ongoing effort. The initial open source version of SPIRAL was supported by DARPA PERFECT.

Please subscibe to spiral-info@lists.andrew.cmu.edu to stay up-to-date regarding SPIRAL updates and new releases.

Spiral

Energy Efficient High Performance through Application-Specific Processor/Program Co-Synthesis

In DARPA PERFECT we pursue the goal of developing technology to achieve power efficiency of 75 GFLOPS/W at 7nm for DoD-related applications by deploying algorithm/architecture co-synthesis that leverages 3D chip stacking technology.
Go to the PERFECT project page.

FFTE and SPIRAL-Generated Kernels

FFTE is a Fortran subroutine library for computing the FFT in one or more dimensions. It includes real, complex, mixed-radix, and parallel transforms. We proposed an implementation of the fast Fourier transform (FFT) targeting the ARM Scalable Vector Extension (SVE), and found that with the ARM compiler, SPIRAL-generated FFT kernels written in SVE intrinsic are up to 3.16 times faster than FFT kernels of FFTE written in Fortran and up to 5.62 times faster than SPIRAL-generated FFT kernels written in C. [From Takahashi and Franchetti, "FFTE on SVE: SPIRAL-Generated Kernels," Proceedings of HPC Asia 2020, Jan. 2020, pp. 114-122]
Go to the FFTE project page.

FFTX and SPIRAL

FFTX is the exascale follow-on to the FFTW open source discrete FFT package for executing the Fast Fourier Transform as well as higher-level operations composed of linear operations combined with DFT transforms. At the heart of FFTX is a build-time code generator, SPIRAL, that produces very high performance kernels targeted to their specific uses and platform environments.
Go to the FFTX project page.

NTTX and SPIRAL

NTTX is an FFTX-like package that leverages SPIRAL's capability of autonomous code generation and platform-based autotuning to generate high-performance code of Number Theoretic Transforms (NTTs), which are widely used to accelerate Fully Homomorphic Encryption (FHE), a post-quantum encryption scheme. Mirroring the structure of FFTW and FFTX, the NTTX package offers FFTW-style C/C++ API in line with FFTX-style code generation.
Go to the NTTX project page.

Semantics Lifting via SPIRAL

We investigate reversing the well-established code generation process in SPIRAL to stepwisely lift the semantics of source code, with a particular focus on scientific kernels. To achieve this, we develop a novel extension to the SPIRAL system leveraging symbolic execution and theorem proving.
Go to the Semantics Lifting project page.

HELIX: Formally Verified SPIRAL

HELIX is a formally verified language and rewriting engine for generating high-performance implementations for a variety of numerical algorithms. Based on the existing SPIRAL system, HELIX adds the rigor of formal verification of its correctness using the Coq proof assistant.
Go to the HELIX project page.

Quantum Circuit Optimization with SPIRAL

In the SPIRAL quantum package, we define the primitives and directives necessary to leverage SPIRAL’s code generation framework for quantum circuit optimization. The input to our system is a circuit description written in terms of qubit transform objects. SPIRAL outputs both a simplified tensor expression representing the optimized circuit and QASM code that can be executed on a quantum computer.
Go to the Quantum Circuit Optimization page.

The SnowWhite System

Since the inception of compiler research the Holy Grail has been to devise a system that provides high level abstraction (programmers express their intent as concisely as in an algorithms textbook), and an automatic system that translates these programs or specifications into executables targeting an ever-evolving landscape of platforms, extracting close-to-optimal performance on all these platforms. The SnowWhite effort works towards addressing this problem by adding a novel AI approach to compiler analysis: It introduces high level reasoning to orchestrate the complex components and enables the systems to “understand” the computation much like human experts would do.
Go to the SnowWhite System page.

Generating Hyper-Portable Future-Proof Computational Kernels with SPIRAL

In the DARPA BRASS project we propose a framework for the development of long-lived, DARPA-relevant, complex software systems that is robust to changes in the physical and logical resources provided by their ecosystem.
Go to the BRASS project page.

High Assurance SPIRAL: Scalable and Performance Portable Domain-Specific Control System Synthesis

High-Assurance SPIRAL is a system for the synthesis of high-assurance implementations of controllers for vehicular systems that are executed in today's and future high-performance embedded system processors.
Go to the High Assurance SPIRAL (HACMS project) page.

SPIRAL in Scala

We are doing research on language environments that support the constructions of program generators for performance. As an example, we built a Spiral prototype n Scala using LMS (staging). Doing so, various transformations and features can be implemented elegantly.
Go to the Spiral-Scala page.

SPIRAL Program Generator for Transforms

The SPIRAL Code Generator is our flagship. Completely autonomous code generation, optimization, and platform adaptation for discrete signal transforms.
Go to the code generator site.

SPIRAL Program Generator for Linear Algebra

We developed a program generator, called LGen, for basic linear algebra computations. The design is very similar to the design of Spiral for transforms (see above).
Go to LGen site.

SPIRAL Program Generator for Viterbi Decoding

We provide an online generator for Viterbi decoders (in software) for convolutional codes.
Go to the Viterbi decoder site
.

Applying the Roofline Model

We have implemented a measurement framework to create roofline plots with measured performance data.
Go to the roofline site.

Also note our generalization of the rofline model.

Adaptive Sorting Library

We have developed a sorting library that adapts itself to platform characteristics and to the structure or statistics to the input to achieve better performance. Go to the sorting site.

Program Generation for the All-Pairs Shortest Path Problem

We have developed a program generator for the all-pairs shortest path problem. It generates high performance scalar and vector code tuned to a given platform.
Go to the shortest path website.

Multiplierless Constant Multiplication

We have developed algorithms and tools that generated multiplier blocks that multiply by one or several constants "multiplierless," that is, using only additions, subtractions, and shifts.
Go to the multiplierless website.

WHT (Walsh-Hadamard Transform) Package

We developed a self-adaptable, easy-to-install package for efficently computing the Walsh-Hadamard transform on single processor and SMP platforms. The design of the package is based on an FFT package by Sebastian Egner. We use this package to experiment with new optimization techniques to be incorporated into the SPIRAL system (above).
Go to the WHT site
.

Fast Backprojection Package

We have developed a flexible C implementation of the fast hierarchical backprojection algorithm to investigate performance trade-offs with respect to runtime and accuracy.
Go to the backprojection site.

Performance Counter Super-Resolution

We have developed a tool to obtain high resolution performance profiles for a given program. For example, the number of cache misses during the execution recorded in very small time steps.
Go to the website
.

Software Generation for the Cell Broadband Engine (Cell BE)

We have generated software (for the DFT and other kernels) for the Cell BE chip multiprocessor.
Go to the Cell BE site.

Optimized Sparse FFT

We have optimized the recent new sparse FFT algorithm from Hassanieh et al. at MIT.
Go to the sparse FFT site.

Generalized Roofline Model: Bottleneck Analysis

We have developed a generalization of the roofline model to identify performance bottlenecks.
Go to the generalized roofline page.