Archives for category: Uncategorized

compare and describe the costs of calling functions and provide guidelines when to use what:

  1. inline call
  2. non-inline/virtual function
  3. call across shared library boundary
  4. execute via an executor in same thread
  5. execute via an executor in another thread
  6. execute on same machine in another process
  7. execute on another machine

This article is in response to a discussion at http://stackoverflow.com/questions/18292619/c-string-array-sorting.

The comparison done by Mattingly does two things:

1. It compares apples to oranges: if the object is to create an array of sorted strings, this isn’t equivalent to an array of sorted pointers to C-strings.
2. It measures the time needed to construct `std::string` objects from string literals when using `std::sort()` but directly uses string literals when using `qsort()`. Although there is certainly a cost involved in create `std::string` objects, a roughly similar amount of time is spent in a real example to hold the corresponding C-strings.

Below I posted some code which uses the same tiny data set to compare `std::sort()` and `qsort()` removing the second of these issues in all cases (i.e., the construction of the data set is always removed from the measurements taken). In addition, I have added a way to deal with a bigger data set by reading in the words from a file `sort.in` (I used source of today’s featured Wikipedia entry (http://en.wikipedia.org/wiki/Banksia_violacea) but I’d think any reasonably natural text exhibits similar behavior).

The code also adds a third version of the comparison which uses `std::sort()` but operates on an array of C-strings partly to create something which doesn’t compare apples to oranges but also because it came as a surprise that sorting a large array of `std::string` objects is slower than I expected. I haven’t quite dug to the bottom of the issue but I suspect that the issue the relatively large footprint of `std::string` objects introduced by the now common small string optimization. My measurements indicate that the version shuffling pointers to strings using `std::sort()` is fastest, followed by `qsort()`, with the version directly sorting a larger array of `std::string` objects is slowest, although definitely not by a factor of four. The time measurements I observed using a recent snapshort of gcc are along the lines of those (those for clang seem to do better when using pointers with `std::sort()`):

    sorting sort content of sort.in
    c-sort:   396765
    cpp-ptrs: 375525
    c++-sort: 558646

Especially the amount time taken by `std::sort()` on `std::string` objects is, indeed a surprise. Similar tests in the past indicated that sorting `std::string` using `std::sort()` was faster than sorting C-strings using `qsort()`. I’d blame that on the small string optimization. Still when comparing the same things, `std::sort()` is faster than `qsort()`.

The code I used is below:

    #include <iostream>
    #include <fstream>
    #include <iterator>
    #include <string>
    #include <algorithm>
    #include <mach/mach_time.h>
    #include <string.h>
    #include <vector>
    
    void cpp_sort(std::vector<std::string> values)
    {
        uint64_t ts = mach_absolute_time();
        std::sort(values.begin(), values.end());
        std::cout << “c++-sort: ” << ( mach_absolute_time() – ts ) << ‘\n’;
    }
    
    struct cmp
    {
        bool operator()(char const* a, char const* b) const
        {
            return strcmp(a, b) < 0;
        }
    };
    
    void cpp_ptrs(std::vector<std::string> values)
    {
        std::vector<char const*> temp(values.size());
        std::transform(values.begin(), values.end(), temp.begin(),
                       std::mem_fn(&std::string::c_str));
    
        if (!temp.empty()) {
            uint64_t ts = mach_absolute_time();
            std::sort(temp.begin(), temp.end(), cmp());
            std::cout << “cpp-ptrs: ” << (mach_absolute_time() – ts) << ‘\n’;
        }
    }
    
    static int _sort( const void* a, const void *b )
    {
        return strcmp( *( const char** )a, *( const char** )b );
    }
    
    void c_sort(std::vector<std::string> values)
    {
        std::vector<char const*> temp(values.size());
        std::transform(values.begin(), values.end(), temp.begin(),
                       std::mem_fn(&std::string::c_str));
    
        if (!temp.empty()) {
            uint64_t ts = mach_absolute_time();
            qsort(&temp[0], values.size(), sizeof( const char* ), _sort );
            std::cout << “c-sort:   ” << (mach_absolute_time() – ts) << ‘\n’;
        }
    }
    
    void compare1(std::vector<std::string> const& array)
    {
        c_sort(array);
        cpp_ptrs(array);
        cpp_sort(array);
    }
    
    void compare2(std::vector<std::string> const& array)
    {
        cpp_sort(array);
        cpp_ptrs(array);
        c_sort(array);
    }
    
    
    template <typename T, std::size_t Size>
    std::size_t size(T(&)[Size])
    {
        return Size;
    }
    
    int main(int ac, char *av[])
    {
        switch (ac == 1? 0: atoi(av[1]))
        {
        case 0:
        case 1:
            {
                std::cout << “simple test of fixed array\n”;
                std::string name[] = {
                    “john”, “bobby”, “dear”, “test1”, “catherine”, “nomi”,
                    “shinta”, “martin”, “abe”, “may”, “zeno”, “zack”, “angeal”,
                    “gabby”
                };
                compare1(std::vector<std::string>(name, name + size(name)));
            }
            break;
        case 2:
            {
                std::cout << “simple test of fixed array (just reversed calls)\n”;
                std::string name[] = {
                    “john”, “bobby”, “dear”, “test1”, “catherine”, “nomi”,
                    “shinta”, “martin”, “abe”, “may”, “zeno”, “zack”, “angeal”,
                    “gabby”
                };
                compare2(std::vector<std::string>(name, name + size(name)));
            }
            break;
        case 3:
            {
                std::cout << “sorting sort content of sort.in\n”;
                std::ifstream in(“sort.in”);
                std::istream_iterator<std::string> begin(in), end;
                std::vector<std::string> values(begin, end);
                compare1(values);
            }
            break;
        case 4:
            {
                std::cout << “sorting sort content of sort.in (just reversed calls\n”;
                std::ifstream in(“sort.in”);
                std::istream_iterator<std::string> begin(in), end;
                std::vector<std::string> values(begin, end);
                compare2(values);
            }
            break;
        }
    }

Being a German living in the United Kingdom yields the constant threat of being run over because the cars tend to come from the unexpected direction. Trying to talk sense with the British is entirely futile: “the cars drive on the right side of the street which is the left side!” is a typical response to pointing out that we should globally standardize on just one side. It can reasonably be argued that the majority of the world’s population is actually used to traffic driving on the left side of the street (although I know from experience that this doesn’t necessarily mean a lot in places like India). However, I think it would be reasonable to think about “cars driving on the right side of the street which is the right side of the street”.

What does this have to do with C++? Well, for some unknown reason the predominant placement of placing the const keyword is UK-style: the right placement for the const is assumed to be on the left. Note, that if T is a type the declarations const T (let’s call this UK-style) and T const (let’s call this continental-style) are identical. However, UK-style const placement yields strange inconsistencies. For example, to indicate that a member function isn’t allowed to change the object pointed to by this you have to place the const on the right. Also, to deduce the type of a declaration you are best off to read it Arabic-style, i.e. right to left. This makes it easy to determine what is qualified by the const qualifier: the type to the left of the const, assuming, of course, that const is used continental-style. That is, continental-style makes the code not only more consistent but also easier to understand.

If you think that things are easy to understand as they are let’s consider a very simple declaration somewhere in a project:

const int* iterator;

Now, someone get’s the bright idea that it may be reasonable to actually give the type of iterators a name indicating what they are and replacing the above line with the following two lines:

typedef int* int_iterator;
const int_iterator iterator;

There was a int* for which we have a typedef now so the code got cleaned-up. Makes sense, doesn’t it? Well, maybe not so much: a constant int_iterator isn’t going to iterate a lot being const: the semantics of the code changed! That is, the new declaration of iterator without using a typedef would look like this:

int* const iterator;

Note that the const in the above declaration has to be continental-style unless an alias (e.g. a typedef) for int* is used. Now, if the original declaration had been continental-style in the first place hopefully nobody would have had the idea of trying to use the typedef:

int const* iterator;

Somehow, the const between the int and the * should have been a clear sign that the type doesn’t want to be replaced – both to the human reader and the global find-and-replace using something akin to the vi-command %s/int *\*/int_iterator/g. Well, I know that the vi-command would get this one right but I’m admittedly a bit worried about some human readers, unfortunately especially about those who wouldn’t even dream of using an editor command like the one above.

After having made a hopefully strong case for continental const placement there is one caveat I shall not leave unmentioned: C has made an attempt to take on some features introduced into C++. One of the features taken up by C as well is const. From what I can tell, it is an absolute failure. I will back the claim that const doesn’t quite fit into C up with just one example signature from the standard C library:

char *strchr(const char *s, int c);

Clearly, this signature has got const-correctness wrong: searching in a string of char const for a character shouldn’t yield a pointer to a modifiable char. Please note that I didn’t pick the one error which is considered to be editorial and which should be corrected to become:

const char *strchr(const char *s, int c); // NOT in C

Since functions can’t be overloaded in C and you would want be able to both pass in a constant string but also get out a pointer to a modifiable char if you passed in a non-const string in the first place, the signature from the standard C library is intentional and as good as it gets (when include <cstring> you’ll get properly overloaded, const-correct version of the function). This is, unfortunately, not the only place where C didn’t get it quite right: aside from the fact you can only declare one level of constness in C (i.e. you can’t have constant pointers, at least not without a typedef) there is no option for proper const placement in C at all: the const qualifier has to be applied UK-style. Ideally, this shouldn’t matter to C++ but in the odd case where you actually have a header file shared between C and C++ this means that you can’t use the correct continental-style const placement. Fortunately, these headers tend to be written by C programmers and C++ programmers can happily put the const on the correct side.