Coconut: COde CONstructing User Tool
Coconut is a developing system for high-assurance, high-performance software. It was used to develop a library of special functions for the Cell BE processor, which is distributed in the Cell BE SDK 3.0 as MASS. Average performance is 4X better than the alternative hand-tuned C library, SimdMath.
Coconut has been successful where patterns of efficient hardware-specific computation can be captured as higher-order functions and encoded in a Domain Specific Language embedded in Haskell. Patterns include efficient control structures not expressible in C, e.g., the MultiLoop, and efficient uses of SIMD instructions which require significant compile-time computation for pattern specialization. Some patterns interact with a novel instruction scheduler called Explicitly Staged Software Pipelining, based on a min-cut approach, which outperforms SWING modulo scheduling in our tests.
A less developed aspect of Coconut is the parallel production of proofs of correctness along with executables. Current work aims to prove only limited properties about programs---the ones most likely to be broken---creative use of SIMD instructions, and parallelization. Coconut intermediate code is represented as nested code (hyper)graphs. At the lowest level, we transform acyclic loop bodies to remove the effect of SIMDization, and produce machine and/or human readable specifications. This has been used to verify opaque patterns of optimizing linear algebra for SIMD processors.
Such code graphs are embedded in higher levels containing control flow, first single-threaded control flow optimized for ILP, and then parallel control-flow, optimized to hide communication latency. At this level control flow is restricted to allow peak utilization of multi-core hardware, but enable efficient compile-time verification of soundness. Soundness, in this context, means that the parallelized program can be transformed into a code graph without synchronizing control flow, because every execution can be shown to produce the same result. Think of it as reducing the parallel debugging effort to the single-threaded debugging problem by eliminating the non-determinism inherent in parallel code. I will give a formal language description of the language, and the O(n) algorithm which proves soundness and produces the equivalent ``single threaded'' code graph.
Speaker: Christopher Anand
Christopher Anand is a professor of Computing and Software at McMaster University. His main research areas are software correctness, high performance computation, and automatic code generation.
He has also founded the company Optimal Computational Algorithms to provide hardware-specific libraries for scientific applications on novel architectures.
Google Tech Talks
March, 7 2008