Using StatsD with Librato Metrics

StatsD is a simple network daemon that continuously receives metrics pushed over UDP and periodically sends aggregate metrics to upstream services such as Graphite and Librato Metrics. Because it uses UDP, clients (for example, web applications) can ship metrics to it quickly and with very little overhead, since they never wait for a reply. This means that a user can capture multiple metrics for every request to a web application, even at a rate of thousands of requests per second. Request-level metrics are aggregated over a flush interval (default 10 seconds) and then pushed to an upstream metrics service.
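To make the UDP hand-off concrete, here is a minimal sketch of a client sending one metric. It assumes a StatsD daemon listening on 127.0.0.1 at port 8125 (the usual default) and uses the plain-text `bucket:value|type` line format; the function names and the bucket name are illustrative, not part of any particular client library.

```python
import socket


def format_metric(bucket, value, metric_type):
    """Build one StatsD payload such as 'web.response_time:320|ms'."""
    return f"{bucket}:{value}|{metric_type}"


def send_metric(payload, host="127.0.0.1", port=8125):
    """Send the payload as a single UDP datagram; no reply is awaited."""
    # host and port are assumptions: adjust them to match your daemon.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("ascii"), (host, port))
```

Calling `send_metric(format_metric("web.response_time", 320, "ms"))` returns as soon as the datagram is handed to the operating system, which is why instrumenting every request is cheap. The trade-off is that a datagram lost in transit is not retried.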

Librato maintains a StatsD backend module that pushes aggregate StatsD metrics to the Librato Metrics service. This document explains the concepts behind StatsD and how they map to Librato Metrics. Afterwards, the reader should understand how to use StatsD to instrument their code and how those metrics are presented in Librato Metrics.

StatsD Concepts

Buckets

In StatsD, every metric, whether it is a counter, a timing, or a gauge, is placed in a bucket. A bucket is simply a name for a metric that acts as the key under which it is stored. When using the Librato StatsD backend, each bucket name becomes the metric name when that bucket is submitted to Librato Metrics. Therefore, bucket names should follow the naming conventions for Librato Metrics metric names, and the same name should not be reused for different metric types (for example, for both a counter and a gauge).

Counting

One of the more basic use cases for StatsD is counting the number of times an event occurs. You might count every time a user signs in, every time an application controller is invoked, or the number of elements received in POST requests. Using a statsd client, you simply invoke the increment operation each time you want to count something. You can specify an integer increment value, or let it default to one, and the client will send that increment to the statsd daemon over UDP. Multiple entities (processes or servers) can increment a single counter, and the statsd daemon will update the total count of the counter bucket. On each flush interval, statsd will push the current value of each counter to the upstream metrics service.
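As an illustration of what an increment looks like on the wire, the sketch below builds the text a client would send for a counter update. The helper name and the bucket names are hypothetical; real client libraries wrap the same idea behind their own increment call.

```python
def counter_payload(bucket, delta=1):
    """Return the datagram text for a counter update; delta defaults to one."""
    # The '|c' suffix marks the value as a counter increment.
    return f"{bucket}:{int(delta)}|c"
```

For example, `counter_payload("user.signins")` yields `user.signins:1|c`, while `counter_payload("posts.elements", 25)` yields `posts.elements:25|c` to count 25 elements in one operation.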

When used with Librato's backend, StatsD will push counter metrics to Librato as gauges. The gauge value will represent the aggregate of all increment and decrement operations since the last flush interval. For example, if you perform ten increment(1) operations within a flush interval, the gauge value will be '10' for that interval of time. When the measurements are rolled up to the higher level resolutions, the rollups will still be an aggregate of all operations during that period of time.
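The per-interval aggregation described above can be modelled with a small in-memory sketch. This is a simplified illustration rather than the daemon's actual code: the class name is hypothetical, and it simply drops buckets after each flush.

```python
from collections import defaultdict


class CounterAggregator:
    """Toy model of how counter operations add up within one flush interval."""

    def __init__(self):
        self._totals = defaultdict(int)

    def increment(self, bucket, delta=1):
        self._totals[bucket] += delta

    def decrement(self, bucket, delta=1):
        self._totals[bucket] -= delta

    def flush(self):
        """Return the totals for this interval and start a new one."""
        snapshot = dict(self._totals)
        self._totals.clear()
        return snapshot
```

Ten calls to `increment("user.signins")` followed by `flush()` give a value of 10 for that bucket, matching the example in the text; the next interval starts again from zero.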

Timing

Oftentimes you'll want to sample a particular value many times and track the average of those samples over time. For example, you might track the average response time of your web application by sampling how long each request takes. StatsD supports this type of sampling with the timing interface. While it is commonly used for tracking values that are actual times (e.g., nginx response time), it can also be used to sample any type of value, e.g., the byte size of every nginx response. StatsD clients send each sampled value to the statsd daemon.

Internally, statsd keeps each timing sample point until the next flush interval. At the flush interval, statsd will record the min, max, sum, and count for each timing bucket. Using the Librato backend, these values are composed into the complex gauge type and sent to Librato Metrics as a gauge.
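A minimal sketch of that flush-time summary follows. The function name and the dictionary keys are illustrative and simply mirror the four values named above; they are not the backend's actual payload format.

```python
def summarize_timings(samples):
    """Reduce one interval's timing samples to min, max, sum, and count."""
    if not samples:
        return None  # nothing was recorded in this interval
    return {
        "count": len(samples),
        "sum": sum(samples),
        "min": min(samples),
        "max": max(samples),
    }
```

Given the samples 120, 80, 200, and 100 (milliseconds), the summary has a count of 4 and a sum of 500, so the average for the interval can be recovered as sum divided by count, here 125.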

StatsD can also calculate an arbitrary number of percentiles for timing buckets. By default, the 90th percentile is calculated automatically for each timing bucket, but the user can override that with one or more percentile thresholds. The Librato backend will publish a complex gauge measurement for each defined percentile threshold, suffixing the percentile onto the metric name. For example, for the metric name api-response-times, the 90th percentile will be sent as the additional metric api-response-times.90. The percentile gauge measurement will include the sum of all values below the percentile, the max value at the percentile, and the count of samples below the percentile.
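The percentile measurement can be sketched as follows. This is an approximation under stated assumptions: the samples are sorted, the top (100 - pct) percent are dropped with round-half-up rounding, and the remainder is summarized. The exact rounding used by the daemon may differ, and the function names are hypothetical.

```python
def percentile_summary(samples, pct=90):
    """Summarize the samples at or below the given percentile threshold."""
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    if not samples:
        return None
    ordered = sorted(samples)
    # Number of highest samples to discard, rounded half up (an assumption).
    drop = int((100 - pct) / 100 * len(ordered) + 0.5)
    kept = ordered[: len(ordered) - drop]
    if not kept:
        return None
    return {"count": len(kept), "sum": sum(kept), "max": kept[-1]}


def percentile_metric_name(bucket, pct):
    """Suffix the percentile onto the bucket name, e.g. 'name.90'."""
    return f"{bucket}.{pct}"
```

With the ten samples 1 through 10 and the default threshold of 90, the highest sample is dropped, leaving a count of 9, a sum of 45, and a max of 9, published under a name such as api-response-times.90.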

Gauges

StatsD also supports the concept of gauges. In StatsD, gauges are single metric readings that are recorded and published to the upstream service. They differ from timings in that only the last gauge reading in a single flush interval is sent to the upstream service. This can be useful if you want to track the current temperature or the current price of a stock, where only the most recent value is required in a given flush interval. However, for anything that is request driven (that is, anything that requires a sample on each request), the timing interface is preferable.
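The "last reading wins" behaviour can be illustrated with a short sketch. The function and bucket names are hypothetical, and the input is assumed to be the readings received during one flush interval, in arrival order.

```python
def last_gauge_readings(readings):
    """Keep only the most recent value seen for each gauge bucket."""
    latest = {}
    for bucket, value in readings:
        latest[bucket] = value  # later readings overwrite earlier ones
    return latest
```

If room.temperature is reported as 21.5, then 22.0, then 21.8 within one interval, only 21.8 is published. The earlier readings are discarded, which is why timings are the better fit when every sample matters.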

The Librato backend will submit statsd gauge readings as single-value Librato Metrics gauges. At higher rollup levels, those single-value readings will be combined into complex gauges that can be used to track averages across the longer rollup periods.

Sets

Sets are a relatively new concept in recent versions of StatsD. Sets track the number of unique elements belonging to a group. This could be used to track the number of unique visitors to your site at any point in time. At each flush interval, the Librato statsd backend will push the number of unique elements in the set as a single gauge value. Be aware that if you are tracking the same set across multiple statsd servers, there is no guarantee of uniqueness across the individual statsd servers.
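The uniqueness caveat is easiest to see with a small example. The sketch below uses made-up visitor names and a hypothetical helper to compare what two independent statsd servers would each report against the true number of unique elements.

```python
def unique_count(elements):
    """Number of distinct elements seen in one flush interval."""
    return len(set(elements))


# Visitors seen by two separate statsd servers in the same interval.
server_a = ["alice", "bob", "alice"]
server_b = ["bob", "carol"]

# Each server deduplicates only its own traffic.
per_server_total = unique_count(server_a) + unique_count(server_b)

# The true figure requires deduplicating across both servers.
true_unique = unique_count(server_a + server_b)
```

Here each server reports 2 unique visitors, so adding the two values gives 4, while only 3 distinct visitors exist, because bob was seen by both servers.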

Installing StatsD + Librato Metrics

To get started with StatsD and Librato Metrics, follow the installation instructions located in the statsd-librato-backend README.
