Archives for posts with tag: C++

The basic code organization in C++ is fairly straight forward: declarations and type definitions go into the header, definition of objects and functions go into the source. This way a library’s declarations are decoupled nicely from their implementation and source dependencies are kept low. Of course, nothing is as simple as this in C++. That is, some classes are used only locally and thus defined only in in the source file rather than in the header. In the other direction, some “header-only” libraries are quite popular which implement all of the functions inline and thus don’t require any library to be compiled and linked to the program separately. What goes where can become a bit of an art and sometimes people employ techniques specifically to reduce the number of dependencies.

When it comes to templates things are clear-cut on the other hand: there is no option than putting the template into the header as otherwise neither the compiler nor the linker will see the definition of a function template when it gets instantiated. Or is there? In C++2003 templates could be exported from a source and thus made visible somehow to the C++ system. Only the EDG C++ front-end implemented this and some vendors using this C++ front-end either disabled this feature or, at least, didn’t support it in any way. Exported templates were removed, not just deprecated, from the C++2011 standard although allowing vendors to be conforming if they implement exported templates (essentially export is still a keyword although one which isn’t necessarily used). What else can be done?

As it turns out, some templates can only be instantiated with a limited set of template arguments or are normally only used with a known and reasonably small set of template arguments. For example, the I/O stream classes are typically only instantiated for the character types char and wchar_t although even C++2003 supports additional character types (i.e. signed char and unsigned char which are both distinct types from char although the values representable using char are identical to one of the other two types; whether char is signed or unsigned is up to the implementation and there implementations for both; C++2011 adds the character types char16_t and char32_t). In fact, it is relatively hard to use the I/O streams with any other type because it requires setting up certain facets in the used std::locale objects and the implementation of a number of functions which need to be present. The standard class template std::basic_string is similarly generally only instantiated for the character types char and wchar_t. Instantiating it for other character types requires implementation of a suitable character traits class or a specialization of std::char_traits<cT> for the corresponding character type cT. For std::complex<T> the standard explicitly states that its use with other types than float, double, and long double is unspecified.

Generally I found that there are quite a number of class and function templates which only work with a limited set of types. Implementing them as templates is still reasonable because it allows treating the different instantiations the same in code using them. For example, the numerical algorithms used with complex numbers don’t really differ with the size of the type used to represent the values. They may even stay the same when using SSE vectors to represent the values, applying the same algorithm to multiple values at once. Thus, it is a bit sad that there is no explicit support for these types. In any case, if the template is only instantiated with a limited number of template arguments it makes sense to separate the interface from the implementation as is conventionally done for non-templatized code. For this purpose I found it helpful to implement templates using a header file with the declarations, a header file with the template definitions, and multiple files containing instantiations of the templates with the supported types. For example, for std::complex<T> this could look something like this (omitting the usual include guards):

// header <complex>
namespace std {
    template <typename T> class complex {
    public:
        complex();
    ...
    };
    template <typename T>
    complex<T> operator+ (complex<T> const&, complex<T> const&);
    ...
}

This header would only contain the definition of the std::complex<T> class template but no definitions (except, possibly, inline definitions of function which can’t afford the function call overhead). If the implementation indeed only supports a limited set of template arguments it is probably nice to create a compile-time error using e.g. static_assert() for unsupported types (actually, the standard doesn’t really define the primary template for std::complex<T> but only the specializations for float, double, and long double, i.e. it mandates an error for unsupported instantiations). The next file would be another header file including the definitions of the [member] function templates:

// header <complex.tpp>
#include <complex>

template <typename T>
std::complex<T>::complex(): real_(), imaginary_() {}
...
template <typename T>
std::complex<T> std::operator+ (std::complex<T> const& c0, std::complex<T> const& c1) {
    return std::complex<T>(c0.real_ + c1.real_, c0.imaginary_ + c1.imaginary);
}
...

This second header defines all the [member] function templates after including the corresponding header. Users of std::complex<T> don’t include this header. It is only included by a suite of source files, each explicitly instantiating the templates with one supported argument type. For example, the file instantiating the functions for the type float could look something like this:

// source file complex-float.cpp
#include <complex.tpp>

template class std::complex<float>;

template std::complex<float> std::operator+ (std::complex<float> const&, std::complex<float> const&);
...

Explicitly instantiating a class template instantiates all member functions whose definition is currently known. To explicitly instantiate non-member function templates it is necessary to list each one individually. To avoid having to list all of the functions for each type it is possible to create an auxiliary function or class template which explicitly references each of them: explicitly instantiating this template implicitly instantiates all of the non-member functions. Although this is conceptually using a different mechanism the net effect is the same, i.e. there will be an object file which defines all of the instantiations of the function templates.

For a templates only supporting a known set of arguments this works quite nicely. However, things get a bit more interesting when conceptually an infinite set of template arguments can be used. It may still be very desirable to instantiate the template with the commonly used template arguments. For example, the effort for instantiating the I/O stream library in every translation unit which uses it is quite substantial. Here is a subset of the classes getting instantiated when you just use an std::istringstream to format an int if nothing is preinstantiated:

  • std::basic_istringstream<char>
  • std::basic_istream<char>
  • std::basic_ios<char>
  • std::ostreambuf_iterator<char>
  • std::basic_stringbuf<char>
  • std::basic_streambuf<char>
  • std::num_get<char>
  • std::numpunct<char>

For the last four class template most of the functionality is in their respective virtual functions and all of these virtual functions will get instantiated, even if they are not used because they are needed to initialize the “virtual function table” (or whatever technique is actually used to implement virtual functions). In case you ever wondered why C++ code compiles fairly slow, this is definitely part of the reason and these are just the class template which I could immediately think of.

One way to deal with this is to still do a similar separation as was done for std::complex<T> and just make life a bit harder for those few people who already took up the burden of setting up streams for another character type. That is, if the definitions are moved into a separate header the documentation would somewhere state something along the lines of “oh, BTW, if you wonder why the I/O streams don’t work with your character type – you need to also do this:”, obviously followed by some explanation nobody is bound to ever read explaining how to include the proper headers and instantiate all those classes.

An alternative approach is to use C++2011’s extern templates which allow a hybrid approach. That is, you can provide a definition of the function templates and then declare that they shouldn’t be implicitly instantiated because they are explicitly instantiated somewhere else. Unfortunately, this prevents the nice trick of implicitly instantiating the various non-member functions by being referenced from an explicitly instantiated function or class template. Unless, of course, the translation unit causing the implicit instantiation of the function templates doesn’t see the extern template declarations. Here is how this could look like (using std::complex<T> to stick with the example):

// header <complex>
namespace std {
    template <typename T> class complex {
    public:
        complex();
    ...
    };
    template <typename T>
    complex<T> operator+ (complex<T> const&, complex<T> const&);
    ...
}
#include <complex.tpp>

The main change is that the <complex> header now also includes the header for the definitions. Obviously, there isn’t strictly a need to separate the two parts into two files when using extern template declarations. For the time being I find it useful to still use this as not all compilers support extern templates, yet. The definition would conditionally contain the various extern template declarations (also showing a draft version of the auxiliary instantiation function):

// header <complex.tpp>
#include <complex>

template <typename T>
std::complex<T>::complex(): real_(), imaginary_() {}
...
template <typename T>
std::complex<T> std::operator+ (std::complex<T> const& c0, std::complex<T> const& c1) {
    return std::complex<T>(c0.real_ + c1.real_, c0.imaginary_ + c1.imaginary);
}
...
#if defined(INSTANTIATING_COMPLEX)
template <typename T>
void use_all_complex_functions() {
    std::complex<T>() + std::complex<T>();
    ...
}
#else
extern template class complex<float>;
extern template class complex<double>;
extern template class complex<long double>;

extern template
std::complex<float> 
    std::operator+<float> (std::complex<float> const&,
                           std::complex<float> const&);
extern template
std::complex<double>
    std::operator+<double> (std::complex<double> const&,
                            std::complex<double> const&);
extern template
std::complex<long double>
    std::operator+<long double> (std::complex<long double> const&,
                                 std::complex<long double> const&);
#endif

The file explicitly instantiating the various class and function templates looks essentially the same as before except that it would define the macro INSTANTIATING_COMPLEX and then include the header <complex>. Overall this is quite a bit of effort. Of course, all the effort is done by the one guy who is implementing the library and everybody who is using the library is benefiting from his efforts!

When using IOStreams it is common to use manipulators to somehow affect the stream, often just setting up formatting flags
or skipping over characters. For some uses it would be nice to have additional manipulators for specific use cases. Something I frequently
find useful is to skip over an expected character, e.g. when reading a table with a fixed number of comma separated columns. For this it
would be great to just use the familiar idiom for reading values without having to bother much about the separator being correct while
still getting it checked. Here is how I want to read a table with three columns from a stream in:

for (int x, y, z; in >> x >> comma >> y >> comma >> z; ) {
    std::cout << "x=" << x << " y=" << y << " z=" << z << "\n";
}

The approach to deal with something like this is to write a manipulator which is just a function taking a stream as argument and returning
it. The input and output streams have special formatting functions implemented which take a pointer to a function with this signature
and call the corresponding function. Thus, the comma manipulator mentioned above can be implemented like this:

template <typename cT, typename Traits>
std::basic_istream<cT, Traits>& comma(std::basic_istream<cT, Traits>& in)
{
    if ((in >> std::ws).peek() != in.widen(',')) {
        in.setstate(std::ios_base::failbit);
    }
    else {
        in.ignore();
    }
    return in;
}

When called with an input stream this function starts with skipping possibly leading whitespace using the standard maninpulator
std::ws. Normally this manipulator isn’t needed because the formatted input functions automatically skip leading whitespace.
However, in the code above the next character is inspected using peek() which just looks at the current character without
removing it and it doesn’t skip leading whitespace. If the character happens to be a comma it is extracted by ignoring it. Otherwise the
std::ios_base::failbit is set for the stream indicating that something went wrong.

The code doesn’t directly compare the comma character to the result of peek() but instead obtains a value from widen().
The use of the widen() function is necessary to cope with the potential use of wide characters: this function uses the
stream’s std::locale to convert the character into a wide character version. Note that this has nothing to do with any
sort of encoding: the characters internal to a C++ program are already considered to be converted from any encoding into the
execution character set.

Of course, having a comma manipulator immediately raises the question for a semicolon manipulator. This
should be easy to write and would have roughly identical code: there is exactly one character difference. That is, a quick copy-and-paste
job and we are down the road of [manual] code duplication. This is a path I prefer to avoid as much as possible. We could
try to turn the character to be skipped into a template argument but this fails: it would be necessary to name the comma,
semicolon, etc. manipulators somehow and this isn’t really possible without fixing all template arguments. To have the
manipulators be usable with all instantiations of input stream it is necessary to use a separate type for the manipulator and implement
an actual input operator, e.g.:

template <char S> struct separator {};
template <typename cT, typename Traits, char S>
std::basic_istream<cT, Traits>&
operator>> (std::basic_istream<cT, Traits>& in, separator<S>)
{
    return (in >> std::ws).peek() != in.widen(S)
        ? in.setstate(std::ios_base::failbit), in: in.ignore();
}

extern separator<','> const comma;
extern separator<';'> const semicolon;
// ...

The implementation of the operator>>() is just a more compact version of the code used earlier. This manipulator
can be used nearly identical as the original manipulator with one minor change: implementing the manipulator as a function allows the
use with a temporary stream:

if (std::istringstream(",1") >> comma >> x) { ... }

This works because it is allowed to call non-const member functions on temporary objects. Using the explicitly written
input operator doesn’t work because it would require binding the temporary stream to a non-const reference which isn’t allowed.
If this is really needed the obvious work-around is to use some “true” manipulator, e.g. std::ws, first: the result of the
operator applying a manipulator is always a non-const reference, i.e. once one of the member input functions was called
there is a suitable reference available for other input operators to bind the stream to.

Many C++ text books advertise the use of std::endl to terminate lines when writing to an std::ostream. Unfortunately, it seems that few people understand what std::endl does or if they know they underestimate the impact. What most people don’t notice is that std::endl does two things: it inserts a newline character i.e. '\n' and then it flushes the std::ostream. It is this extra flush which makes the operation quite expensive. Here is a very simple example program demonstrating the difference:

#include <fstream>
#include <string>

template <typename cT, typename Traits>
std::basic_ostream<cT, Traits>&
nl(std::basic_ostream<cT, Traits>& out)
{
    return out << out.widen('\n');
}

int main(int ac, char*[])
{
    std::string   text("0123456789012345678901234567890123456789");
    std::ostream& (*end)(std::ostream&) = std::endl;
    if (ac == 1)
        end = ::nl;

    std::ofstream out("file.txt", std::ios_base::binary);
    for (size_t i(0); i != 1000000; ++i)
    {
        out << text << text << end;
    }
}

All this program does is to write a million lines of 80 characters plus a newline. Depending on whether the program received a command line parameter it either uses std::endl or the custom manipulator ::nl to terminate lines. To make a fair comparison ::nl is implement like std::endl i.e. as a function template. Since the character type isn’t known, it needs to widen the newline character '\n'. Running this program on my machine shows a factor of about 6 between the time the programs take! The version using endl take a bit more than 3 second while the version using ::nl only take about 0.5 seconds.

This extra flushing the stream causes the file to write each block multiple times instead of just once when the buffer has become full. This easily slows down I/O intensive operations and my cause a bottleneck. Thus, don’t use std::endl unless the flush is actually intentional. Also note that std::cerr is already setup to flush the stream after every individual output operation. This is done using the flag std::ios_bse::unitbuf which is initially set for std::cerr. To turn this off you can use the formatter std::nounitbuf and to turn it on you would use std::unitbuf.