When using IOStreams it is common to use manipulators to affect the stream, often just to set formatting flags
or to skip over characters. For some uses it would be nice to have additional manipulators. Something I frequently
find useful is to skip over an expected character, e.g. when reading a table with a fixed number of comma-separated columns. For this it
would be great to just use the familiar idiom for reading values without having to bother much about the separator while
still getting it checked. Here is how I want to read a table with three columns from a stream in:

for (int x, y, z; in >> x >> comma >> y >> comma >> z; ) {
    std::cout << "x=" << x << " y=" << y << " z=" << z << "\n";
}

The way to deal with something like this is to write a manipulator, which is just a function taking a stream as argument and returning
it. The input and output streams have special overloads of their shift operators which take a pointer to a function with this signature
and call that function with the stream. Thus, the comma manipulator mentioned above can be implemented like this:

template <typename cT, typename Traits>
std::basic_istream<cT, Traits>& comma(std::basic_istream<cT, Traits>& in)
{
    if ((in >> std::ws).peek() != in.widen(',')) {
        in.setstate(std::ios_base::failbit);
    }
    else {
        in.ignore();
    }
    return in;
}

When called with an input stream this function starts by skipping any leading whitespace using the standard manipulator
std::ws. Normally this manipulator isn’t needed because the formatted input functions automatically skip leading whitespace.
However, in the code above the next character is inspected using peek(), which just looks at the next character without
removing it and doesn’t skip leading whitespace. If the character happens to be a comma it is extracted by ignoring it. Otherwise
std::ios_base::failbit is set for the stream, indicating that something went wrong.
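
To try the manipulator out, here is a minimal sketch which reads rows from a string instead of a file. The row struct and the read_rows() helper are names made up for this illustration; only the comma manipulator itself is taken from the text above.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// The manipulator from the text: skip whitespace, then require a comma.
template <typename cT, typename Traits>
std::basic_istream<cT, Traits>& comma(std::basic_istream<cT, Traits>& in)
{
    if ((in >> std::ws).peek() != in.widen(',')) {
        in.setstate(std::ios_base::failbit);  // separator missing or wrong
    }
    else {
        in.ignore();                          // consume the comma
    }
    return in;
}

// Hypothetical record type for one table row.
struct row { int x, y, z; };

// Hypothetical helper: collect rows until the first malformed row or the end.
std::vector<row> read_rows(std::string const& text)
{
    std::istringstream in(text);
    std::vector<row> rows;
    for (int x, y, z; in >> x >> comma >> y >> comma >> z; ) {
        rows.push_back(row{x, y, z});
    }
    return rows;
}
```

Whitespace around the commas is accepted, while a row using a different separator sets failbit and thereby ends the loop.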

The code doesn’t directly compare the comma character to the result of peek() but instead obtains a value from widen().
The use of the widen() function is necessary to cope with the potential use of wide characters: this function uses the
stream’s std::locale to convert the char into the stream’s character type. Note that this has nothing to do with any
sort of encoding: the characters internal to a C++ program are already considered to be converted from any encoding into the
execution character set.
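
The effect of widen() is easiest to see with a helper which works for both char and wchar_t streams. The following is only a sketch: starts_with_comma() is a made-up name, and it compares via the character traits, which is a slightly stricter variant of the plain != used above.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: does the text start with a comma, whatever its character type?
template <typename cT>
bool starts_with_comma(std::basic_string<cT> const& text)
{
    typedef std::char_traits<cT> traits;
    std::basic_istringstream<cT> in(text);
    // widen() maps the narrow ',' to the stream's character type ...
    cT const separator = in.widen(',');
    // ... and the traits compare it with what peek() reports (possibly end of file).
    return traits::eq_int_type(in.peek(), traits::to_int_type(separator));
}
```

The same function template then accepts std::string(",1") as well as std::wstring(L",1") without spelling out a wide character literal anywhere.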

Of course, having a comma manipulator immediately raises the question of a semicolon manipulator. This
should be easy to write and would have roughly identical code: there is exactly one character of difference. That is, a quick copy-and-paste
job and we are down the road of manual code duplication. This is a path I prefer to avoid as much as possible. We could
try to turn the character to be skipped into a template argument but this fails: it would be necessary to name the comma,
semicolon, etc. manipulators somehow and this isn’t really possible without fixing all template arguments. For the
manipulators to be usable with all instantiations of input streams it is necessary to use a separate type for the manipulator and implement
an actual input operator, e.g.:

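// Tag type: the separator character is part of the type.
template <char S> struct separator {};

// Skips leading whitespace, then consumes S or sets failbit.
template <typename cT, typename Traits, char S>
std::basic_istream<cT, Traits>&
operator>> (std::basic_istream<cT, Traits>& in, separator<S>)
{
    return (in >> std::ws).peek() != in.widen(S)
        ? (in.setstate(std::ios_base::failbit), in) : in.ignore();
}

// Declarations only: each object needs a definition in one translation unit.
extern separator<','> const comma;
extern separator<';'> const semicolon;
// ...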

The implementation of operator>>() is just a more compact version of the code used earlier. This manipulator
can be used nearly identically to the original manipulator, with one minor difference: implementing the manipulator as a function allows its
use with a temporary stream:

if (std::istringstream(",1") >> comma >> x) { ... }

This works because it is allowed to call non-const member functions on temporary objects, and the operator applying a function
manipulator is a member of the stream. Using the explicitly written input operator doesn’t work because it would require binding the
temporary stream to a non-const reference, which isn’t allowed. (C++11 added an input operator for rvalue streams to the standard
library, which should lift this restriction.) If this is really needed, the obvious work-around is to use some “true” manipulator,
e.g. std::ws, first: the result of the operator applying a manipulator is always a non-const reference, i.e. once one of the member
input functions was called there is a suitable reference available for other input operators to bind the stream to.
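
Put together, the separator approach and the std::ws work-around could look like the sketch below. The objects are defined directly here rather than declared extern to keep the example in one file, and read_pair() and read_after_comma() are helper names invented for this illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>

template <char S> struct separator {};

// Skips leading whitespace, then consumes S or sets failbit.
template <typename cT, typename Traits, char S>
std::basic_istream<cT, Traits>&
operator>> (std::basic_istream<cT, Traits>& in, separator<S>)
{
    return (in >> std::ws).peek() != in.widen(S)
        ? (in.setstate(std::ios_base::failbit), in) : in.ignore();
}

// Definitions for this single-file sketch (the text only declares them).
separator<','> const comma = separator<','>();
separator<';'> const semicolon = separator<';'>();

// Hypothetical helper: read "x <separator> y" from text and report success.
template <char S>
bool read_pair(std::string const& text, separator<S> sep, int& x, int& y)
{
    std::istringstream in(text);
    return static_cast<bool>(in >> x >> sep >> y);
}

// Hypothetical helper showing the work-around for a temporary stream:
// std::ws is applied by a member operator and yields an lvalue reference
// which the separator operator can then bind to.
bool read_after_comma(std::string const& text, int& x)
{
    return static_cast<bool>(std::istringstream(text) >> std::ws >> comma >> x);
}
```

A call such as read_pair("3;4", semicolon, x, y) succeeds, while passing comma for the same text fails because the expected separator is not found.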

Many C++ textbooks advertise the use of std::endl to terminate lines when writing to an std::ostream. Unfortunately, it seems that few people understand what std::endl does or, if they know, they underestimate the impact. What most people don’t notice is that std::endl does two things: it inserts a newline character, i.e. '\n', and then it flushes the std::ostream. It is this extra flush which makes the operation quite expensive. Here is a very simple example program demonstrating the difference:

#include <cstddef>
#include <fstream>
#include <ostream>
#include <string>

// Like std::endl but without the flush.
template <typename cT, typename Traits>
std::basic_ostream<cT, Traits>&
nl(std::basic_ostream<cT, Traits>& out)
{
    return out << out.widen('\n');
}

int main(int ac, char*[])
{
    std::string   text("0123456789012345678901234567890123456789");
    // Without arguments use ::nl, with any argument use std::endl.
    std::ostream& (*end)(std::ostream&) = std::endl;
    if (ac == 1)
        end = ::nl;

    std::ofstream out("file.txt", std::ios_base::binary);
    for (std::size_t i(0); i != 1000000; ++i)
    {
        out << text << text << end;
    }
}

All this program does is write a million lines of 80 characters plus a newline. Depending on whether the program received a command line parameter it either uses std::endl or the custom manipulator ::nl to terminate lines. To make a fair comparison ::nl is implemented like std::endl, i.e. as a function template. Since the character type isn’t known, it needs to widen the newline character '\n'. Running this program on my machine shows a factor of about 6 between the times the two variants take! The version using std::endl takes a bit more than 3 seconds while the version using ::nl only takes about 0.5 seconds.
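
Timings depend on the machine, but the number of flush requests can be counted directly. The sketch below assumes a made-up stream buffer, counting_buf, which stores output like a std::stringbuf and counts calls to sync(); count_flushes() is likewise just an illustrative helper.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>

// Hypothetical stream buffer: a std::stringbuf which counts flush requests.
class counting_buf : public std::stringbuf
{
public:
    counting_buf(): flushes_(0) {}
    std::size_t flushes() const { return flushes_; }
protected:
    // Called whenever the stream is flushed, e.g. by std::endl.
    int sync() { ++flushes_; return 0; }
private:
    std::size_t flushes_;
};

// The newline-only manipulator from the example program.
template <typename cT, typename Traits>
std::basic_ostream<cT, Traits>& nl(std::basic_ostream<cT, Traits>& out)
{
    return out << out.widen('\n');
}

// Write the given number of lines using the line terminator passed in
// and report how often the buffer was asked to flush.
std::size_t count_flushes(std::ostream& (*end)(std::ostream&), std::size_t lines)
{
    counting_buf buf;
    std::ostream out(&buf);
    for (std::size_t i = 0; i != lines; ++i) {
        out << "some text" << end;
    }
    return buf.flushes();
}
```

With std::endl the buffer sees one flush request per line, with nl none at all; for a file stream each of these requests turns into an actual write.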

This extra flushing of the stream causes each block of the file to be written multiple times instead of just once when the buffer has become full. This easily slows down I/O intensive operations and may cause a bottleneck. Thus, don’t use std::endl unless the flush is actually intentional. Also note that std::cerr is already set up to flush the stream after every individual output operation. This is done using the flag std::ios_base::unitbuf which is initially set for std::cerr. To turn this off you can use the manipulator std::nounitbuf and to turn it on you would use std::unitbuf.
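
The state of this flag can be inspected via flags(). Here is a small sketch of that; flushes_every_write() and make_cerr_buffered() are hypothetical helper names, not standard functions.

```cpp
#include <ios>
#include <iostream>
#include <ostream>

// Hypothetical helper: true if the stream flushes after every output operation.
bool flushes_every_write(std::ostream const& out)
{
    return (out.flags() & std::ios_base::unitbuf) == std::ios_base::unitbuf;
}

// Hypothetical helper: turn off the automatic flushing of std::cerr and
// report whether it was turned on before.
bool make_cerr_buffered()
{
    bool const was_unitbuf = flushes_every_write(std::cerr);
    std::cerr << std::nounitbuf;
    return was_unitbuf;
}
```

Writing std::cerr << std::unitbuf afterwards restores the original behaviour. Whether turning the flag off is a good idea depends on how important it is that error messages appear immediately.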