# Multi-result supercompilation as a tool for program analysis

## Ilya Klyuchnikov and Sergei A. Romanenko

(Keldysh Institute of Applied Mathematics, Moscow)

### A talk given at Luleå University of Technology, Sweden

April 28, 2011

## Abstract

From the very beginning supercompilation was considered by V.F. Turchin as a tool for both program optimization and program analysis. However, most research work in the field of supercompilation has been devoted to the use of supercompilation for program optimization.

It is evident that a tool should be adapted to its intended use. Hence, a supercompiler meant for program analysis may differ from one meant for optimization.

For example, when developing HOSC (an analyzing supercompiler for a higher-order, call-by-name functional language), some details of the supercompiler had to be finely tuned (in particular, the definition of the homeomorphic embedding relation).
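To make the tuning point concrete, here is an illustrative sketch of the textbook homeomorphic embedding relation on first-order terms, which supercompilers use as a "whistle" to decide when to stop driving. This is only the standard first-order definition; HOSC's actual relation is finely tuned for a higher-order language and differs from it.

```python
# A term is either a variable ("v", name) or an application of a function
# symbol to subterms ("f", name, [args]).

def embedded(s, t):
    """True if s is homeomorphically embedded in t."""
    return dive(s, t) or couple(s, t)

def dive(s, t):
    # Diving: s is embedded in some immediate subterm of t.
    return t[0] == "f" and any(embedded(s, arg) for arg in t[2])

def couple(s, t):
    # Coupling: the roots match and the arguments embed pointwise.
    if s[0] == "v" and t[0] == "v":
        return True  # any variable embeds in any variable
    if s[0] == "f" and t[0] == "f" and s[1] == t[1] and len(s[2]) == len(t[2]):
        return all(embedded(a, b) for a, b in zip(s[2], t[2]))
    return False
```

For example, `f(x)` is embedded in `f(g(x))` (couple at `f`, then dive through `g`), but `f(g(x))` is not embedded in `f(x)`.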

It was also found that the power of “pure” supercompilation, when used as a program analysis tool, is limited. Thus, an attempt was made to combine supercompilation with other techniques. In other words, the idea was to consider supercompilation as a “primitive operation” and to use supercompilers as “building blocks” for constructing more complex systems, which may be considered an instance of “metasystem transition” (in V.F. Turchin’s terms). Examples:

- proving the equivalence of expressions by means of supercompilation;
- proving improvement lemmas by means of supercompilation;
- two-level supercompilation.
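The first item in the list above can be sketched in a few lines, treating the supercompiler as a black-box primitive operation. All names here are hypothetical: `supercompile` and `alpha_equal` stand in for whatever the underlying supercompiler (e.g. HOSC) provides.

```python
def prove_equivalent(e1, e2, supercompile, alpha_equal):
    """Try to prove two expressions equivalent via supercompilation.

    Supercompile both expressions and compare the residual programs up to
    renaming. Syntactic coincidence of residual programs implies semantic
    equivalence of the source expressions; the converse need not hold, so
    the method is sound but incomplete (a False answer proves nothing).
    """
    r1 = supercompile(e1)
    r2 = supercompile(e2)
    return alpha_equal(r1, r2)
```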

What is the main problem related to tuning supercompilation for program analysis? In the case of optimizing supercompilation the goal is relatively easy to formalize: residual programs should be as “efficient” as possible (in terms of their size and execution speed). In other words, the criteria are quantitative. But, in the case of program analysis, the goal is to transform the source program into a residual program that, in a sense, is “easier to understand”. This goal is qualitative in nature and difficult to formulate (and, therefore, extremely difficult to formalize).

If we consider the structure of a traditional supercompiler, we can identify a number of “points of choice”, in which the supercompiler has a number of possibilities (to perform a driving step, to fold, to generalize, to split) and has to make a decision, different decisions leading to different (and yet correct) residual programs. Which decisions are “good” and which are “bad”? In the case of analyzing supercompilation, it is difficult to predict the consequences of a decision. Hence, we are unable to formulate good formal strategies for controlling the behavior of the supercompiler.

**A solution:** let us abandon the classic “deterministic” supercompilation in favor of “multi-result” supercompilation by letting a supercompiler produce a set of residual programs.
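The multi-result idea can be sketched as a search over decision sequences (a minimal sketch with hypothetical names, not HOSC's implementation): at each point of choice the supercompiler does not commit to a single decision but pursues every admissible alternative, so the output is a set of residual programs rather than one.

```python
def multi_result(state, steps):
    """All residual programs reachable by sequences of decisions.

    `steps(state)` returns the successor states for every admissible
    decision at this point of choice (drive / fold / generalize / split),
    or [] when `state` is a finished residual program.
    """
    successors = steps(state)
    if not successors:
        return [state]  # no more decisions to make: a residual program
    results = []
    for s in successors:
        results.extend(multi_result(s, steps))  # explore every branch
    return results
```

In a deterministic supercompiler, `steps` would pick one successor by a fixed strategy; dropping that commitment is exactly what makes the transformer multi-result.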

**A new problem:** the set of residual programs produced by a multi-result supercompiler may be large (e.g. several thousands of programs). Hence, we need automatic tools for analyzing and filtering residual programs, in order to identify and extract the “interesting” ones.
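One such filtering strategy might simply keep the residual programs of minimal size, on the assumption that smaller programs are easier to understand. This is a hypothetical sketch: `size` stands in for any program-size measure, e.g. a count of AST nodes.

```python
def smallest_programs(programs, size):
    """Keep only the residual programs of minimal size.

    `programs` is the (possibly large) set produced by a multi-result
    supercompiler; `size` maps a program to a numeric measure.
    """
    best = min(size(p) for p in programs)
    return [p for p in programs if size(p) == best]
```

Size is only one possible criterion; a practical tool would combine several filters (absence of certain constructs, a syntactic pattern of interest, etc.) to single out the “interesting” residual programs.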