This article is in response to a discussion at http://stackoverflow.com/questions/18292619/c-string-array-sorting.
The comparison done by Mattingly does two things:
1. It compares apples to oranges: if the object is to create an array of sorted strings, this isn’t equivalent to an array of sorted pointers to C-strings.
2. It measures the time needed to construct `std::string` objects from string literals when using `std::sort()` but directly uses string literals when using `qsort()`. Although there is certainly a cost involved in create `std::string` objects, a roughly similar amount of time is spent in a real example to hold the corresponding C-strings.
Below I posted some code which uses the same tiny data set to compare `std::sort()` and `qsort()` removing the second of these issues in all cases (i.e., the construction of the data set is always removed from the measurements taken). In addition, I have added a way to deal with a bigger data set by reading in the words from a file `sort.in` (I used source of today’s featured Wikipedia entry (http://en.wikipedia.org/wiki/Banksia_violacea) but I’d think any reasonably natural text exhibits similar behavior).
The code also adds a third version of the comparison which uses `std::sort()` but operates on an array of C-strings partly to create something which doesn’t compare apples to oranges but also because it came as a surprise that sorting a large array of `std::string` objects is slower than I expected. I haven’t quite dug to the bottom of the issue but I suspect that the issue the relatively large footprint of `std::string` objects introduced by the now common small string optimization. My measurements indicate that the version shuffling pointers to strings using `std::sort()` is fastest, followed by `qsort()`, with the version directly sorting a larger array of `std::string` objects is slowest, although definitely not by a factor of four. The time measurements I observed using a recent snapshort of gcc are along the lines of those (those for clang seem to do better when using pointers with `std::sort()`):
sorting sort content of sort.in
c-sort: 396765
cpp-ptrs: 375525
c++-sort: 558646
Especially the amount time taken by `std::sort()` on `std::string` objects is, indeed a surprise. Similar tests in the past indicated that sorting `std::string` using `std::sort()` was faster than sorting C-strings using `qsort()`. I’d blame that on the small string optimization. Still when comparing the same things, `std::sort()` is faster than `qsort()`.
The code I used is below:
#include <iostream>
#include <fstream>
#include <iterator>
#include <string>
#include <algorithm>
#include <mach/mach_time.h>
#include <string.h>
#include <vector>
void cpp_sort(std::vector<std::string> values)
{
uint64_t ts = mach_absolute_time();
std::sort(values.begin(), values.end());
std::cout << “c++-sort: ” << ( mach_absolute_time() – ts ) << ‘\n’;
}
struct cmp
{
bool operator()(char const* a, char const* b) const
{
return strcmp(a, b) < 0;
}
};
void cpp_ptrs(std::vector<std::string> values)
{
std::vector<char const*> temp(values.size());
std::transform(values.begin(), values.end(), temp.begin(),
std::mem_fn(&std::string::c_str));
if (!temp.empty()) {
uint64_t ts = mach_absolute_time();
std::sort(temp.begin(), temp.end(), cmp());
std::cout << “cpp-ptrs: ” << (mach_absolute_time() – ts) << ‘\n’;
}
}
static int _sort( const void* a, const void *b )
{
return strcmp( *( const char** )a, *( const char** )b );
}
void c_sort(std::vector<std::string> values)
{
std::vector<char const*> temp(values.size());
std::transform(values.begin(), values.end(), temp.begin(),
std::mem_fn(&std::string::c_str));
if (!temp.empty()) {
uint64_t ts = mach_absolute_time();
qsort(&temp[0], values.size(), sizeof( const char* ), _sort );
std::cout << “c-sort: ” << (mach_absolute_time() – ts) << ‘\n’;
}
}
void compare1(std::vector<std::string> const& array)
{
c_sort(array);
cpp_ptrs(array);
cpp_sort(array);
}
void compare2(std::vector<std::string> const& array)
{
cpp_sort(array);
cpp_ptrs(array);
c_sort(array);
}
template <typename T, std::size_t Size>
std::size_t size(T(&)[Size])
{
return Size;
}
int main(int ac, char *av[])
{
switch (ac == 1? 0: atoi(av[1]))
{
case 0:
case 1:
{
std::cout << “simple test of fixed array\n”;
std::string name[] = {
“john”, “bobby”, “dear”, “test1”, “catherine”, “nomi”,
“shinta”, “martin”, “abe”, “may”, “zeno”, “zack”, “angeal”,
“gabby”
};
compare1(std::vector<std::string>(name, name + size(name)));
}
break;
case 2:
{
std::cout << “simple test of fixed array (just reversed calls)\n”;
std::string name[] = {
“john”, “bobby”, “dear”, “test1”, “catherine”, “nomi”,
“shinta”, “martin”, “abe”, “may”, “zeno”, “zack”, “angeal”,
“gabby”
};
compare2(std::vector<std::string>(name, name + size(name)));
}
break;
case 3:
{
std::cout << “sorting sort content of sort.in\n”;
std::ifstream in(“sort.in”);
std::istream_iterator<std::string> begin(in), end;
std::vector<std::string> values(begin, end);
compare1(values);
}
break;
case 4:
{
std::cout << “sorting sort content of sort.in (just reversed calls\n”;
std::ifstream in(“sort.in”);
std::istream_iterator<std::string> begin(in), end;
std::vector<std::string> values(begin, end);
compare2(values);
}
break;
}
}