STL is Slow Template Library


The C++ Standard Template Library (STL) can be pretty intimidating. I used to think that there’s a lot of magic under the covers to make things go really fast. It turns out that while the STL is convenient for prototyping, it’s not really built for performance.

A few days back, we needed to do some //compare-by-hash// operations on two files. In all, we were doing around 32 million hash table lookups (plus, of course, the overhead of computing the hash values themselves). On a reasonably fast machine, one would expect that 32 million operations shouldn’t take very long. However, this particular program ran for one **whole day**.

Then Amin suggested that we rip out the STL stuff and just work with a statically allocated hash table, since we weren’t really concerned about memory management at this stage. And guess what: the runtime fell to around **10 minutes**. That’s //two orders of magnitude// of performance improvement! I had never imagined that the STL could be //so// slow.
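The post doesn’t show the replacement code, but for the curious, a statically allocated open-addressing table along the following lines captures the idea. This is only a sketch: the table size, the multiplicative hash, and the fixed key/value types are all assumptions, not details from the actual program.

```cpp
#include <stdint.h>

// Sketch of a statically allocated hash table with linear probing.
// TABLE_SIZE, the hash function, and the key/value types are illustrative.
const uint32_t TABLE_SIZE = 1 << 20;  // power of two, so we can mask instead of mod

struct Entry {
    uint64_t key;
    uint32_t value;
    bool     used;
};

static Entry g_table[TABLE_SIZE];  // zero-initialized; no heap allocation, ever

static uint32_t slot_of(uint64_t key) {
    // Knuth-style multiplicative hash, then mask down to a slot index
    return (uint32_t)(key * 2654435761u) & (TABLE_SIZE - 1);
}

// Insert or overwrite. Assumes the table never fills up completely.
void put(uint64_t key, uint32_t value) {
    uint32_t i = slot_of(key);
    while (g_table[i].used && g_table[i].key != key)
        i = (i + 1) & (TABLE_SIZE - 1);  // linear probe to the next slot
    g_table[i].key = key;
    g_table[i].value = value;
    g_table[i].used = true;
}

// Returns true and fills *value if the key is present.
bool get(uint64_t key, uint32_t* value) {
    uint32_t i = slot_of(key);
    while (g_table[i].used) {
        if (g_table[i].key == key) {
            *value = g_table[i].value;
            return true;
        }
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    return false;
}
```

The win over a node-based hash_map is that probes walk one contiguous array and inserts never touch the allocator; whether that accounts for the full two orders of magnitude here, only the original code could tell.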

11 comments

  1. Priyendra Deshwal

    I have serious doubts about this claim! Are you sure there wasn’t other stuff, like copy constructors, getting in your way? The STL, as you are probably aware, can be pretty deceptive when it hides big, expensive copy constructors.

  2. Shashikant

    Doesn’t look right. The STL can’t be slower than Java. In Java, I have done hundreds of millions of hash look-ups with hash sizes close to a million. I never came across a situation where hash lookup was the bottleneck, in spite of the popular perception that Java is slower.

    The only case where I think hash look-ups could take that long is when the load factor is close to 1.0, where a look-up can degenerate into O(n).
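The load-factor degeneration described above is easy to guard against by pre-sizing the table. A sketch using std::unordered_map, the standardized successor of the era’s hash_map (with __gnu_cxx::hash_map the pre-sizing call would be resize() instead of reserve(); the function and its parameters are illustrative):

```cpp
#include <unordered_map>

// Build a map of n entries while keeping the load factor at or below 0.5,
// so look-ups stay O(1) instead of degenerating toward O(n).
std::unordered_map<int, int> build_map(int n) {
    std::unordered_map<int, int> m;
    m.max_load_factor(0.5f);  // rehash before buckets get half full
    m.reserve(n);             // one up-front allocation; no rehashing mid-insert
    for (int i = 0; i < n; ++i)
        m[i] = i;
    return m;
}
```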

  3. Diwaker Gupta

    Clearly some clarifications are due! :-) I don’t mean to dismiss the STL on the basis of just one experiment, but regardless, I was surprised at how slow it was for this particular one. One important detail that I didn’t mention was that before each hash table lookup, we were doing some file I/O — so it’s not like the entire data was already in memory. Though the files we were reading were not big at all, and would probably have been pulled into RAM in the first few reads.

    *@priyendra*: I’m not sure I follow. In our code we just have one or two instances of the hash table object. So why would the copy constructor be a problem? We’re not creating millions of instances of new objects. Am I missing something?

    *@shashikant*: I’m not in a position to argue the statement “STL can’t be slower than Java” either way but I’ll take your word for it :-) I’m just as baffled by the results — maybe I’ll post the code shortly so that you guys can check it out yourself.

  4. Priyendra Deshwal

    So I assume you used the STL hash_map, right? Did you have something like:

    hash_map map;

    // insertion
    loop over millions of (k, v) pairs:
        map.insert(k, v);  // this will compute the hash value for k
                           // and *also* make a copy of v to put into
                           // the map

    // lookups
    loop over millions of key values k:
        ValType v = map[k];  // this copies the value v over
                             // from the container to the local variable v

    So if the object being put into the table is largish (say, C++ strings or any aggregate object), then it would be significantly faster to store pointers to the objects in the hash table rather than the objects themselves. Alternatively, you could change the lookup to something like:

    const ValType& v = map[k];

    which will also get rid of the copies. Of course, if you are only storing integers or things of that scale, then this comment doesn’t really apply.
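A minimal sketch of the copy-versus-reference distinction described above, with std::unordered_map standing in for hash_map and std::string as the “largish” value type (the helper names are illustrative):

```cpp
#include <string>
#include <unordered_map>

typedef std::unordered_map<int, std::string> Map;  // stand-in for hash_map

// Returns by value: the whole string is copied out on every call.
std::string lookup_copy(Map& m, int k) {
    return m[k];
}

// Returns a reference: the caller reads the stored string in place, no copy.
// (Note that operator[] still inserts an empty string for a missing key.)
const std::string& lookup_ref(Map& m, int k) {
    return m[k];
}
```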

  5. David

    In addition to Priyendra’s comments, using the operator[] syntax on hash_map results in two copy-constructions of your object because of the way it is defined. For efficiency, insert() and find() should always be used.
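A sketch of the find()-based lookup being recommended, again with std::unordered_map standing in for hash_map (the helper function is illustrative):

```cpp
#include <unordered_map>

typedef std::unordered_map<int, int> Map;  // stand-in for hash_map

// find() neither copies the value nor inserts on a miss, unlike
// operator[], which default-constructs and inserts a value for any
// key that isn't already present.
bool lookup(const Map& m, int k, int* out) {
    Map::const_iterator it = m.find(k);
    if (it == m.end())
        return false;
    *out = it->second;
    return true;
}
```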

  6. Thomas

    I can believe this. About a year ago, I was doing a programming assignment for my university course (a Runge–Kutta algorithm, if I remember correctly), and for fun I decided to implement it in C, then C++ with the STL, then Python. I was surprised to find that the C++ version took much, much longer to run than the Python version.

  7. Gareth

    I was curious about this, so I wrote the program below. I added a million items to a hash_map and looked items up until I had done 32,000,000 finds, and my results are very fast. I tried adding more items to the map, but I ran out of memory when I tried 32,000,000.

    time = 1253085626.187
    time = 1253085626.661
    time = 1253085629.317

    This means finding 32,000,000 items took about 4 seconds. I would suggest you check your code again. Here is the source code; you may need to change the includes depending on your version of C++:
    #include <cstdio>
    #include <cstdlib>
    #include <sys/timeb.h>
    #include <ext/hash_map>

    typedef __gnu_cxx::hash_map<int, int> map_type;

    int main(int argc, char** argv) {
        typedef std::pair<int, int> Int_Pair;

        map_type map;

        struct timeb tp;
        ftime(&tp);
        printf("time = %ld.%d\n", tp.time, tp.millitm);

        int i = 0;
        for (i = 0; i < 1000000; i++) {
            map.insert(Int_Pair(i, i));
        }

        ftime(&tp);
        printf("time = %ld.%d\n", tp.time, tp.millitm);

        if (i != 1000000) {
            printf("Error!\n");
            exit(1);
        }

        map_type::iterator j;
        for (i = 0; i < 32000000; i++) {
            j = map.find(i % 1000000);
        }

        ftime(&tp);
        printf("time = %ld.%d\n", tp.time, tp.millitm);

        if (i != 32000000) {
            printf("Error!\n");
            exit(1);
        }

        return EXIT_SUCCESS;
    }

    • Diwaker Gupta

      @theoracle: I think it would be a lot more valuable if your comment explained why the post is irrelevant and misleading, rather than simply deleting the post. That way, others who come across this (search engines never forget) can get the context and walk away with new insights.
