Informal Latency Benchmark for Redis GET/SET using cpp_redis
(From the half-baked-benchmarking department, in conjunction with the I-should-stash-the-results-somewhere department…)
Benchmarking is serious business on the internet: you don't want to misrepresent anyone's hard work on a given product. That said, sometimes I just want ballpark values, a "lazy" estimate of the kind of numbers I'd get from a crude attempt at using a particular product.
My use case was simple: I have a complex calculation that takes tens of milliseconds. Computing it on the fly when needed is too slow given the scale of work involved, so I wanted to precompute the values once and store them somewhere. I was curious how low-effort I could get by stashing the computed results in Redis and fetching them on demand with a GET, using cpp_redis (available at https://github.com/Cylix/cpp_redis). I used some crude code (very crude; I didn't dig deep and just copied some sample code from cpp_redis) that looks something like this:
auto keyspace = CreateSampleSet( client );  // populate test keys, return their names
std::vector<int64_t> measurements;
for( auto& key : keyspace ) {
    struct timeval tv1, tv2;
    gettimeofday( &tv1, NULL );
    auto getreply = client.get( key );  // queue the GET
    client.commit();                    // flush it to the server
    getreply.wait();                    // block until the reply arrives
    gettimeofday( &tv2, NULL );
    int64_t sample1 = tv1.tv_sec * 1000000 + tv1.tv_usec;
    int64_t sample2 = tv2.tv_sec * 1000000 + tv2.tv_usec;
    measurements.push_back( sample2 - sample1 );  // round-trip latency in µs
}
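As an aside: if I were tightening this up, I'd swap gettimeofday for std::chrono::steady_clock, which is monotonic and won't skew the samples if the wall clock gets adjusted mid-run. A minimal sketch of that timing step (the time_us helper name is mine, not anything from cpp_redis):

```cpp
#include <chrono>
#include <cstdint>

// Time a single operation in microseconds using a monotonic clock.
// steady_clock never jumps backwards, unlike the wall clock that
// gettimeofday reads.
template <typename F>
int64_t time_us( F&& op ) {
    auto t1 = std::chrono::steady_clock::now();
    op();
    auto t2 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();
}
```

The loop body would then just be measurements.push_back( time_us( [&]{ client.get( key ); client.commit(); /* … wait on the reply … */ } ) ).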
I'll let the numbers speak for themselves, but this is the histogram I got out of the experiment. Hardware was an i7-3930K on a moderately loaded system with 64 GB of RAM (mostly free). The configuration was all local: same machine, default cpp_redis configuration using tacopie (the I/O library that accompanies cpp_redis), connecting to localhost on port 6379.
I tossed out the outliers. There were a few, but they were a tiny fraction of the total number of samples. The distribution above seemed like typical performance on my system, and it didn't change much from run to run.
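Rather than eyeballing outliers, a simple way to summarize the measurements vector is to sort it and read percentiles directly. A sketch of that post-processing (nearest-rank method; the percentile_us name is mine, not from the original experiment):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Nearest-rank percentile over a copy of the latency samples.
// p is in [0, 1]; samples must be non-empty.
int64_t percentile_us( std::vector<int64_t> samples, double p ) {
    std::sort( samples.begin(), samples.end() );
    size_t idx = static_cast<size_t>( p * ( samples.size() - 1 ) );
    return samples[idx];
}
```

Reporting the median alongside p99 makes "I tossed the outliers" concrete: the p99 tells you how bad the stragglers were without hand-picking which samples to drop.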
In my use case, I'll probably stick with stashing values in Postgres and loading everything up front at once, rather than GET/SETting individual values. However, it's nice to know the cost of being lazy, should the need arise. 👌