OpenTSDB – the perfect database for your Internet of Things projects

I got a call the other day with a question: how can we store huge amount of sensor data. They are measuring air temperature in different rooms every 5 seconds. That means 17280 data points per data, 6307200 data points per year and for 15 rooms 94608000 data points per year.

Because I never had a situation where I needed to store a huge amount of sensor data, I didn’t know the answer. But I started digging. There are many questions online regarding what database to use to store this kind of data. Some recommend old-school databases like MySQL or Oracle. Some recommend Redis, Riak or MongoDB. But one recommendation beat them all: OpenTSDB.

OpenTSDB – The Scalable Time Series Database
Store and server massive amounts of time series data without losing granularity.

Currently in version 2.0, OpenTSDB is a tested solution build on top of HBase. It was designed especially for time series data and can handle

– up to 100+ billion data points and
– 2000 new data points per second (tested by OpenTSDB developers on a old dual-core Intel Xeon CPU from 2006; I tested on a newer machine and could easily insert 20000 points in few seconds).

Long story short. It’s perfect database for huge amount of sensor data. It has great options to query data (I will explain it below), has additional features to annotate data and it’s under active development.

Installation and running it for the first time

To run OpenTSDB, you need to first install HBase. The procedure is pretty straightforward. First you need to install HBase. Download HBase, unpack, define configuration and run it with

If everything was defined correctly, you should get a message

Next step is installing OpenTSDB. There is a great tutorial how to install OpenTSDB. In short, download it and unpack or clone git repository, and run build.

It should take few minutes to compile everything. Next step is to create tables with command

You can see created tables with few opensource HBase viewers like hrider. Currently the compression is set to none. It’s highly recommend to use compression LZO, because there is no performance impact but it can highly reduce the size of your data.

Because we will store temperatures in metric temperatures, we need to create it first. OpenTSDB has a configuration to enable auto creation of metrics, but it’s not recommended, so we will do it manually.

The last step is to run everything.

If everything went well, you should see OpenTSDB page at localhost:4242. It’s that simple.

How data is stored

How OpenTSDB is storing the data is in my opinion the biggest difference compared to other databases. It does support tables, but they are actually called metrics. In each metric we can store data points. Each data points is structured as

Timestamp (unix time or ISO 9601 format) is the time of the data point. Value is a number (integer or float). Then we have tags. With tags we separate data points. In our example, we are storing value for bedroom on our first floor. This structure enables us to separate data and later make advanced queries; for example average temperature on first floor or sum of all rooms.

Storing data

With version 2.0, OpenTSDB has 2 ways to store and access data (plus one additional to store by importing data). They are Telnet API, HTTP API and batch import from a file. Make sure you have OpenTSDB running before you try the examples below.

Storing with Telnet API [Java]

We need to execute command PUT with metric and data. še dopiši

Storing with HTTP API
When working with HTTP API, we have to make a PUT request to the URL localhost:4242/api/put with JSON data.

There is also a possibility to make a batch insert. Just wrap all metrics in an array.

Personally I had few problems inserting a large amount of data with the API. I ended up using Telnet API and it seems to work really well.

Querying the data

The whole beauty of OpenTSDB is it’s ability to not only to store huge amount of data, but to also query it fast. I will be showing how to query data with HTTP API, but the same query parameters can be used with Telnet API.

For the examples, we will first insert some data. Of course we can insert a much large dataset, but for this tutorial lets keep it simple.

Getting all temperatures

Let’s break down the request:
1. We can make GET or POST requests
2. The HTTP API URL is http://localhost:4242/api/query
3. We must define start, but end is optional. It can be unix timestamp or you can define nx-ago where n is unit and x is metric. For example, 1day-ago or 1h-ago. OpenTSDB will automatically convert it to timestamp based on your time.
4. m is the metric, where we are using aggregation = sum and metric = temperatures.
5. The last is grouping_operator (inside {}), which is used to group the data. If we define it with *, then it will not group the data. We can also use it to filter. For example room=bedroom will only fetch data from bedroom.

You can read more about different parameters and what they do at http://opentsdb.net/docs/build/html/api_http/query/index.html.

Our above request returns JSON.

Getting temperatures in the bedroom

As mentioned above, we can query by tags. In our case by room=bedroom.

returns

Getting average temperature on first floor

To calculate the average of the temperatures on first floor, we have to group by tags. Be careful to define correct aggregation function (in our case avg). See all aggregators at http://opentsdb.net/docs/build/html/user_guide/query/aggregators.html#available-aggregators.

produces

We can see tag room in aggregateTags. It means it used this tag to aggregate (or if you are familiar with other databases, GROUP BY) data.

Getting average temperatures per day

Let’s imagine a situation where we want to create reports of the temperatures on a daily basis. We could load all the data and then manually calculate the averages. For larger datasets it could take some time. OpenTSDB has an answer. We can also define downsampling. Downsampling will automatically calculate the values based on our downsampling aggregation function and timeframe.

Notice different parameter m? We added 1d-avg (be careful to separate everything correctly with “:”), which will downsample by 1 day and calculate average. Compared to manual way, it’s much faster and it just gives us results, which we can use in graphs.

Other awesome features

OpenTSDB has few additional features to cover real-life situations. Of course we can easily add more with plugins. But 2 of them worth mentioning are Annotations and CLI Tools.

Annotations
Annotations enable us to add additional meta data to data points. For example, we could store information when we opened and closed window in each room or when we changed the heating level.

Read more at http://opentsdb.net/docs/build/html/api_http/annotation.html.

CLI Tools

CLI Tools are just simple tools to perform additional task like fixing the data storage (in case if something breaks down), querying and deleting data and creating metrics. One of the most common tools I use it scan, because it has the feature to delete data. t’s useful when we are doing different tests.

To delete all temperatures for basement, we execute command

Again, we can filter what to delete with start and end parameters, metric and tags.

Wrap up

OpenTSDB has been proved to be an excellent solution. It’s scalable, fast and has really neat features. Most importantly, it’s under active development and has many people contributing. With the era of IoT and Big Data upon us, it has a bright future ahead.

If you are ready, start with http://opentsdb.net/docs/build/html/index.html.