

HOME  COURSES  TALKS  ARTICLES  GENERICS  LAMBDAS  IOSTREAMS  ABOUT  NEWSLETTER  CONTACT     
Expression Templates


Today templates are used in ways that the inventors of C++ templates certainly hadn't anticipated. Template programming these day includes techniques such as generic programming, compiletime computations, expression template libraries, template metaprogramming, and generative programming, just to name a few. In this article we aim to explain some of the principles of expression templates and more specifically the programming techniques that are used to build expression template libraries.
To say it right away: expression template libraries are complex.
For this reason, pretty much every explanation of expression templates
that we've read so far got rather challenging and advanced pretty rapidly.
The ambitious goal of this article is making complex matters easy so that
they can be understood without getting lost in all the details that you
would find otherwise, if for instance you studied the source code of a
template library. We will try to distill some of the principles of
expression templates; yet full coverage of all aspects of expressions templates
is beyond the scope of the article.
Table of Contents
Where it all started ...I vividly remember the day when a colleague of mine, Erwin Unruh, stepped into one of these C++ standards committee meetings proudly presenting a program that did not compile, yet it calculated the prime numbers (see / UNR /). It emitted error messages when it was compiled and with each of the error messages the next prime number was printed. Of course, it is nonsensical to write programs that do not compile, but this program was deliberately set up so that it would not compile in order to point out that this was a computation that was done at compiletime. There was no executable, nothing happening at run time. The prime number computation was a side effect of the compilation.
This tiny silly program treaded off an avalanche of research and experimentation
in the following years which lead to what is known as template metaprogramming
these days. In this article we will discuss some of the resulting programming
techniques and principles.
Basically, the prime number calculation and many of the techniques that
we will explain in this article build on the fact that instantiation of
templates is a recursive process. When the compiler instantiates
a template it might find that for the instantiation of one template it
might need an instantiation of another template. It then instantiates
that other template, which might require the instantiation of yet another
template, and so on and so forth. Many of the template metaprogramming
techniques take advantage of the recursive nature of the template instantiation
process and use it to perform recursive calculations.
A First Example of a CompileTime Computation  FactorialAs a first example let us calculate the factorial of N at compile time. The factorial of n (the mathematical notation of which is n!) is the product of all number from 1 to n., the factorial of 0 being 1. Normally, without use of templates, you would calculate the factorial by means of recursive function calls. Here is a conceivable runtime implementation:
int factorial (int n)
This function calls itself recursively until n drops down to 0. It would be used like this: cout << factorial(4) << endl; Recursive function calls are expensive, especially since the compiler will probably refuse to inline the recursive calls and we will end up with quite a number of function calls. We can do better than that  using compiletime computations instead. It goes like this: replace the recursive function calls by recursive template instantiations. Instead of providing a function named factorial we will implement a class named factorial. To be more precise it will be a class template named factorial. Here it is:
template <int n>
This class template has neither data nor function members, it just defines an unnamed enum type with exactly one enum value. (As it will turn out later, this enum value factorial::ret will serve as the return value of our compile time calculation.) In order to compute the value of this enum the compiler must instantiate another version of the factorial template, namely the version for n1, and this is what starts the recursive template instantiation. Note, that the template argument of the factorial template is kind of unusual: it is a nontype template argument, namely a constant value of type int. Usually, templates have type arguments like in template <class T> class X { ...}; where T is a placeholder for a concrete type that will later replace the placeholder T when a class like X<int> will be generated from the class template. In our example we have a nontype template argument of type int, which means that this template will be instantiated providing a constant value of type int. Using this factorial class template a user would calculate the factorial of n as follows: cout << factorial<4>::ret << endl; The compiler would recursively instantiate factorial<4>, and factorial<3>, and so on and so forth, until ... actually, until when? How does this recursion come to a halt? Every recursion needs an end. Using the template technique the end of the recursion is provided by means of a template specialization. In our example the specialization is for the case that n equals 0:
template <>
As you can tell from the code snippet above the recursion will stop here because the value of the enum value ret does not depend on any further instantiations of the factorial template anymore. If you are not familiar with template specialization, don't worry. Just make a mental note that a special version of a class template can be provided for special template arguments. In our example we provide a special implementation of the factorial template for the case that the template argument n is 0. And this specialization will end the recursion.
Now, what have we achieved using the template version of the computation
of a factorial? Well, the expression factorial<4>::ret will boil
down to a mere 24 (which is the value of 4!) at run time, that is, in the
executable you will not find any computation at all. Instead there
will simply be the constant value 24 in the executable in all places where
factorial<4>::ret appears in the source code.
Another Example of a CompileTime Computation  Square RootLet's try it again and study another compiletime computation of a value. This time we aim to approximate the square root of N, or more precisely we want to find the integral value greater and closest to the square root of N. Example: the square root of 10 is 3.1622776601; that is, we would like to find the integral value 4, which is the next integer greater than 3.1622776601. Using runtime calculations we can calculate the desired value using C library functions as ceil(sqrt(N)). However, we want to use the approximated square root as the size of an array that we need to declare, but a declaration such asint array[ceil(sqrt(N))]; is illegal because the array size must be a compiletime constant value. For this reason we need to approximate the square root at compiletime.
Remember , what we did in our first example of a compiletime computation:
we took advantage of recursive template instantiation. Again, we
will trigger a recursive template instantiation to approximate the desired
value. Again, we will define a class template that takes a nontype template
argument, namely N, and returns the result in form of a nested value.
Let's call the class template for our square root approximation Root. We
could then declare an array of the size of square root of 10 as
Here is the class template Root:
template <size_t N, size_t Low=1, size_t Upp=N>
static const size_t mean = (Low+Upp)/2;
We don't want to delve into the details here. Just a couple of comments: The class template takes 3 nontype template arguments, 2 of which have defaults. The 3 arguments are
Where does the recursion end? Again, the end of the recursion is specified by means of a template specialization that does not require any further template instantiations. Here is the specialization of the Root class template:
template <size_t N, size_t Mid>
Note that the specialization has only 2 template arguments. This is because the recursion ends when the interval in which the result lies collapses to a single value, namely the desired result.
In our example the recursion will instantiate the template for the following
arguments:
The point to take home from these two examples is that compiletime
computations are often recursive processes that take advantage of the recursive
nature of the template instantiation process. What would be a function
in a runtime computation is typically a class template in a compiletime
computation. What would be the argument of a function in a runtime
computation is typically a nontype template argument in a compiletime
computation. What would be the return value of a function in a runtime
computation is typically the a nested constant in the class template in
a compiletime computation. And the condition that ends the recursion in
a runtime computation is typically a template specialization in a compiletime
computation. Equipped with this knowledge, it is no long stretch
anymore to imagine how Erwin Unruh's prime number calculation was built
on the exact same principles that we've been using in our two calculations.
Moving on to Expression TemplatesSo far we have been calculating values at compiletime, which is nice, but not overly exciting. Let's get more ambitious: let's evaluate more complex expressions at compiletime. In a first step, we will implement a compiletime version of a vector dot product. The dot product of two vectors is the sum of the products of its corresponding elements. Example: The dot product of the two 3dimensional vectors (1,2,3) and (4,5,6) would be 1*4 + 2*5 + 3*6, which is 32. The goal is to set up expression templates for the calculation of the dot product of vectors of arbitrary dimensions, like in the following example:
int a[4] = {1,100,0,1};
The dot product is a first primitive example of an expression template, but the techniques that we will apply for the vector dot product have been generalized for arithmetic operations on multidimensional matrices. Naturally, the gain in runtime performance that the compiletime computation will provide, is much more dramatic for a matrix operation than it is for a vector dot product. The techniques however are based on the same principles.
Another powerful generalization is the use of expression templates in
a more dynamic fashion: we eventually want to evaluate an expression repeatedly,
passing in different values for each expression evaluation. By means
of expression templates these evaluations would be performed at compiletime
incurring zero overhead at runtime. Expression evaluations play a
role in the calculation of integrals, for instance. An example:
the integral
template <class ExprT>
ExprT in this example would somehow represent an expression such as
x/(1+x). We will see how exactly it works later in the article.
A First Expression Template  Dot ProductIn order to distill the essence of expression templates let us explain the vector dot product and the arithmetic expressions, which we will explore later, in terms of commonly known patterns as described in the classic patterns book by Gamma et.al. (/ GOF /). A vector dot product can be seen as a special case of a composite.The Composite pattern provides a way to represent a partwhole relationship where the client can ignore the difference between individual objects and compositions of objects. The key elements of the Composite pattern are the leaf and the composite:
The GOF book suggests an objectoriented implementation of the Composite
pattern with an abstract base class, which defines the operation that leaves
and composites have in common and two derived classes, which represent
the leaf and the composite respectively.
The vector dot product can be seen as a special case of the Composite
pattern. A dot product can be split into a leaf (namely the dot product
of two vectors of dimension 1) and a composite (namely the dot product
of the two vectors of dimension N1).
Obviously, this is a degenerated form of the prototypical composite
because each composite consists of exactly one leaf and one composite.
Using the suggested objectoriented implementation technique we would represent
the dot product by a base class and two derived classes as shown in the
class diagram below:
The implementation is straightforward and shown in Listing 1 thru 3.
Listing 4 shows a helper function that eases use of these classes and Listing
5 eventually demonstrates how we can calculate the dot product of
two vectors.
Naturally, this is not the most efficient way of calculating the dot product of two vectors. We can substantially simplify the implementation by eliminating the representation of the composite and the leaf object as data members in the derived classes. Instead of passing the vectors to the leaf and the composite at construction time and memorizing them for subsequent evaluation we could pass the information directly to the evaluation function. Constructor and evaluation function
SimpleDotProduct<T>::SimpleDotProduct (T* a, T* b) :v1(a), v2(b)
{}
would be replaced by an evaluation function that takes arguments: T SimpleDotProduct::eval(T* a, T* b, size_t dim) { return (*a)*(*b); }
The simplified implementation of the derived classes is shown in Listing
6; the base class would remain unchanged as shown in Listing 1; the helper
function must be adjusted.
Figure 5 shows the corresponding class diagram for the simplified solution.
A CompileTime Version of the Dot ProductLet us now take the objectoriented approach and translate it to a compiletime solution. The two classes representing the leaf and the composite have a common base class that defines the operation that they have in common. This is a wellknown technique in objectorientation: commonalities are expressed by means of a common base class. In template programming, commonalities are expressed in terms of naming conventions and name commonalities. What is a virtual function in the objectoriented approach will become a plain nonvirtual function that has a certain name. The two derived classes will no longer be derived from a common base class. Instead they will be standalone classes that happen to have a member function with the same name and a compatible signature. That is, we will eliminate the base class.Next, we will implement the composite as a class template using structural information as template arguments. This structural information is the dimension of the vectors. Remember what we did for the calculation of the factorial and the square root: the former function argument in the runtime implementation became a template argument in the compiletime implementation. We will be doing a similar thing here: the vectors' dimension is passed as an argument to the evaluation function in the objectoriented approach; we will pass it as a template argument in the compiletime implementation. Hence the dimension of the vectors will be become a nontype template argument of the composite class. The leaf will be implemented as a specialization of the composite class template for dimension N = 1. As before we will replace the runtime recursion by a compiletime recursion: we will replace the recursive invocation of the virtual evaluation function by recursive template instantiation of a static evaluation function. Here is the class diagram of our compiletime implementation of the vector dot product:
The implementation is shown in Listing 7. Usage of the implementation
is shown in Listing 8.
Note the difference between the expressions dot(a,b,4) in the runtime implementation and dot<4>(a,b) in the compiletime implementation:
dot(a,b,4)
evaluates to
CompositeDotProduct<size_t>().eval(a,b,4)
which recursively triggers the following function calls as runtime:
dot<4>(a,b)
on the other hand evaluates to
DotProduct<4,size_t>::eval(a,b)
which
recursively triggers instantiation of further class templates and gradually
unfolds as follows:
Evidently the template approach to the problem is significantly more efficient in terms of runtime performance  at the expense of increased compilation time. The recursive template instantiation takes time and not surprisingly the compile time consumption goes up by orders of magnitude for the template solution compared to the objectoriented solution, which compiles fast, but runs slowly. Another limitation of the template solution is that the dimension of the vectors must be known at compiletime because the dimension is a template argument of the DotProduct template. In practice this is not much of a restriction because the vector dimension is often known in advance; in many applications we simply know that we will be performing arithmetic operation on 3dimensional vectors representing points in space, for instance.
The implementation of the vector dot product might no be overly impressive,
after all we could have achieved the same high performance by unfolding
the dot product expression manually. But the techniques demonstrated here
for implementation of the dot product can be generalized to arithmetic
operations on multidimensional matrices. The real benefit of the
solution lies in its expressiveness. Imagine an expression such as
a*b+c, where a, b and c are 10x20 matrices. You would not want to
unfold the resulting expression manually, if the compiler can do it automatically
and reliably.
Another Expression Template  Arithmetic ExpressionsLet us know take the techniques one step further and consider a real composite structure: arithmetic expressions consisting of unary and binary arithmetic operators and variable and constant operands. The GOF book has a pattern for exactly this case  the Interpreter pattern.The Interpreter pattern provides a way to represent a language in form of an abstract syntax tree and an interpreter that uses the syntax tree to interpret language constructs. It is a special case of the Composite pattern. The partwhole relationship of the Composite pattern corresponds to the relationship of an expression and its subexpressions in the Interpreter pattern.
Let us take the concrete example of an expression, say (x+2)*3. The
composite structure, that is, the syntax tree for this expression would
look like this:
The classic objectoriented technique for implementing the Interpreter pattern, as it is suggested in the GOG book, would involve the following classes:
Samples of the corresponding source code of an implementation are shown
in Listing 9. The base class for UnaryExpr is implemented in analogy
to class BinaryExpr and all the concrete unary and binary expressions follow
the example of class Sum.
Listing 10 shows how the interpreter would be used for the evaluation
of the expression (x+2)*3.
First an expression object expr is created that represents the expression (x+2)*3, and then the expression object is told to evaluate itself. Naturally, this is an utterly inefficient way to calculate the result of a primitive expression such as (x+2)*3. But hold on; we will now turn the objectoriented approach into a templatebased solution. As before in the case of the dot product, we will eliminate all abstract base classes, because template solutions are based on name commonality rather than inheritance. For this reason, we need no base classes. Instead all terminal and nonterminal expressions will be represented by standalone classes that happen to have an evaluation function named eval(). Next we will express all the nonterminal expressions, such as Sum or Product, as classes generated from class templates UnaryExpr and BinaryExpr, both of which are parameterized on structural information. These class template will take the types of their subexpression(s) as type template arguments. In addition, we will parameterize the expression class templates on the type of the operation that they represent, that is, the actual operation (+,,*,/,++,,abs,exp,log) will be provided as a function object and its type is one of the template arguments of the expression class template. The terminal expressions will be implemented as regular (nontemplate) classes, pretty much like they are in the objectoriented approach. Instead of run time recursion we will again use compile time recursion: we will replace the recursive invocation of the virtual evaluation function by recursive template instantiation of the expression class templates. Figure 9 below shows the classes in the templatebased solution: Figure 9: Class Diagram of a TemplateBased Implementation of an Interpreter for Arithmetic Expressions
The source code of the implementation is shown in Listing 11.
The class template for UnaryExpr is implemented in analogy to class
BinaryExpr. As operations we can use the predefined STL function
object types plus, minus, multiplies, divides, etc. or we define our own
function object types as needed. A binary expression representing
a sum for instance would be of type BinaryExpr< ExprT1, ExprT2, plus<double>
>. Since this type name is rather unwieldy we add a creator function
for more convenient use of our solution.
We will use the creator function technique to ease creation of our expression
objects. Listing 12 below shows two examples of these creator functions:
Listing 13 shows how the templatebased implementation of an interpreter
would be used for the evaluation of the expression (x+2)*3.
First, an expression object expr is created that represents the expression (x+2)*3, and then the expression object is told to evaluate itself. Note, that in this solution the type of the expression object already reflects the structure of the syntax tree. Often the rather long name of the type of the expression object need not even be specified. In the example above we do not need the variable expr and could directly use the result of the creator function makeProd() for evaluation of the expression, as shown below:
cout
EvaluationWhat have we gained by reimplementing the Interpreter pattern using templates instead of inheritance? Well, under the assumption that the compiler inlines all the creator functions and constructors and eval() functions (which is likely since all of them are trivial) the expressionmakeProd(makeSum(Variable(x),Literal(2)),Literal(3)).eval() evaluates to nothing more than (x+2)*3.
Compare this to the effect of
In essence, the templatebased solution is much faster at runtime than
the objectoriented implementation.
Further Elaboration of the Expression Template SolutionLet us now tweak the expression templates a bit and we will see that we turn them into something really useful. The first intended improvement addresses readability of the expression templates. We want to make an expression such asmakeProd(makeSum(Variable(x),Literal(2)),Literal(3)).eval() more readable in the sense that it looks more or less like the expression that is represents, namely (x+2)*3 in this case. This can be achieved by means of operator overloading. We will make it look like eval((v+2)*3.0) with just a few minor modifications.
The first change is to rename the creator functions so that they are
overloaded operator; that is, we rename makeSum() into operator+(), makeProd()
into operator*(), and so on and so forth. The effect is that the
term
That's good, but not good enough. We would like to write it as ((x+2)*3). Hence, our goal is to eliminate the creation of a Variable and a Literal, which still clutter the expression.
In order to find out how we can improve the solution let us examine
what an expression such as x+2 means, now that we renamed the creator function
makeSum() into operator+(). The implementation of operator+() is
shown in Listing 14 below.
We would like for x+2 to be the same as operator+(x,2), which formerly was makeSum(x,2). For this reason x+2 is the result of creating a binary expression object that represents a sum and which was created with the double variable x and the int literal 2 as constructor arguments. More precisely, it is an unnamed object created as BinaryExpr<double,int,plus<double>>(x,2). Note that the type of the object is not quite what we want. We need an object of type BinaryExpr<Variable,Literal,plus<double>>, but the automatic template argument deduction does not know that x is a Variable and 2 is a Literal in our context. The compiler deduces type double from argument x and type int from argument 2, because the compiler examines the types of the arguments passed to the function.
It turns out that we must nudge the compiler a little bit so that it
deduces what we need. If we passed an object of type Variable instead
of the original variable x then the automatic argument deduction would
at least yield type BinaryExpr<Variable,int,plus<double>>, which
is a little closer to the goal. (We will address the remaining inttoLiteral
conversion in a minute). For this reason, a minimal degree of cooperation
from the users' side is inevitable: they must wrap their variables into
objects of type Variable to make it work, as shown in Listing 15 below:
With the use of a Variable object v instead of a plain numeric variable we achieved that an expression such as v+2 evaluates to an unnamed object equivalent to BinaryExpr<Variable,int,plus<double>>(v,2). Such a BinaryExpr object has two data members, which are of type Variable and int respectively. The evaluation function BinaryExpr<Variable,int,plus<double>>::eval() would return the sum of the result of the evaluation of its two data members. The crux is that the int data member does not know how to evaluate itself; we need to turn the literal 2 into an object of type Literal, which does know how to evaluate itself. How can we automatically convert constants of any numerical type to objects of type Literal? In order to solve the problem we'll define expression traits.
We will apply the traits technique to solve our problem with the conversion
of numeric literals to objects of type Literal. We will define expression
traits that provide for every expression type the information about how
they should be stored inside the expression objects that they are operands
of. All entities of numeric types shall be stored as objects of type
Literal; all objects of type Variable shall be stored as they are, namely
as Variables; and all nonterminal expression objects shall also be stored
as they are. Listing 16 shows the definition of the expression traits:
The expression traits class defines a nested type expr_type, which represents the expression type of an expression object. There is a general expression traits template that defines the expression type for all expressions that are class types, such as BinaryExpr, UnaryExpr or Variable. In addition there are specializations of the general class template for all those type that are builtin numeric types such as short, int, long, float, double, etc. For all nonclass expressions the expression type is defined as type Literal.
Inside the definition of the BinaryExpr and UnaryExpr classes we will
use the expression traits to determine the type of the data member that
will hold the subexpression.
Thanks to the use of the expression traits an expression object of type BinaryExpr<Variable,int,plus<double>> will contain its two operands as objects of type Variable and Literal, as desired.
Now we have achieved that an expression such as ((v + 2) * 3).eval(),
where v is a Variable that wraps a double variable x, will evaluate to
(x+2)*3. Let us do some minor tweaking to make it even more readable.
Most people find it weird to invoke a member function of something that
looks like an expression. If we define a helper function we can turn
the expression ((v + 2) * 3).eval() into something like eval((v + 2) *
3), which looks more natural to most people, but is otherwise equivalent.
Listing 18 shows the helper function:
Figures 10 illustrates how the expression ((v + 2) * 3).eval(), where v is a Variable that wraps a double variable x, gradually unfolds at compiletime to the expression (x+2)*3. Figure 10: CompileTime Evaluation of the Expression Object (v+2)*3 Repeated Evaluation of an Expression ObjectYou might still wonder where the benefit of expression objects is. Each expression object represents the syntactical decomposition of an arithmetic expression; it's a syntax tree that knows how to interpret itself to produce a numeric value. Basically, we've set up a machinery for evaluation of expressions  something that is built into the language anyway. So, what's the point? Well, a minimal further adjustment of the solution will get us to the point.So far, the interpretation of the syntax tree is rather static. Each syntax tree is created and interpreted only once. A more dynamic usage model is possible, where a given syntax tree can be evaluated repeatedly for different input values. Remember, that we ultimately want to calculate integrals such as using an integration function such as the one below, which approximates the integral by evaluating an expression for a specifed number of equidistant points in an interval:
template <class ExprT>
This function could be used as shown below to approximate the integral of x/(1+x) from the example above:
Identity<double> x;
Here we need an expression object that can be interpreted repeatedly,
something that our expression templates do not allow yet. It just
takes a minor modification to turn our static syntax tree interpretation
into a repeatable interpretation. We just have to change all evaluation
functions of all our expression class templates so that they accept an
input argument, namely the value for which they evaluate themselves.
The nonterminal expressions will pass on the argument on to their subexpressions.
The Literal will accept the argument and ignore it; it will continue returning
the constant value that it represents. The Variable will no longer
return the value of a variable that it refers to, but will return the value
of its argument. For this reason we rename it to Identity.
Listing 19 below shows the modified classes:
If we add nonterminal expressions for incorporation of numeric functions
such as sqrt(), sqr(), exp(), log(), etc. we can even calculate the Gauss
distribution:
For providing the expressions that represent numeric functions we can
use the functions that come with the standard C library. All we need
to do is adding creator functions for unary or binary expressions whose
operation is one of the predefined C functions. Listing 21 below
shows a couple of examples:
With these additions in place we now have a powerful and highperformance solution for evaluation of arithmetic expressions. It is not much of a stretch to imagine that application of the techniques demonstrated in this article allows setting up expression templates for logical expressions. By renaming the evaluation function from eval() to operator()(), which is the function call operator, we can easily turn the expression objects into function objects, which can then be used in conjunction with STL algorithms. Below is an example of a logical expression that is used as a predicate for counting elements in a list:
list<int> l;
The example nicely illustrates the gain in readability that expression
templates enable, at zero cost in terms of runtime performance. Provided
that the expression templates are already in place they are easy and convenient
to use. Building an expression template library is a different story;
it involves many more template programming techniques than we've discussed
in this article. However, all template libraries are built on principles
like the ones discussed in this article.
ReferencesQuite a number of expression template libraries are available for free download, some of which are mentioned in the list of reference below. However, the list by no means claims to be comprehensive or representative. If you are interested in further information we recommend commencing your search at one of the directories (see / JÜL / and / OON /).
AcknowledgementsOur thanks to Gary Powell, who read our article in CUJ and sent us email. He brought FACT!, FC++, the Boost Lambda Library, and Phoenix to our attention.


© Copyright 19952013 by Angelika Langer. All Rights Reserved. URL: < http://www.AngelikaLanger.com/Articles/Cuj/ExpressionTemplates/ExpressionTemplates.htm> last update: 23 Oct 2013 