By Stephen Etheridge, EMEA Solutions Architect

In today’s increasingly automated world, we rely on vast networks of computers to help us make difficult decisions. We have long since passed the point where humans can manually analyse the volumes of data that are factored into each choice. Now even many purpose-built databases are failing to keep up with the scale of data generated by the growing number of IoT sensors being installed.

The following is a guide to what you need to think about in selecting the NoSQL database that is right for you, with some examples of how to make it work.

The challenge of the IoT

The challenge IoT brings is not a simple one to solve, yet in essence it boils down to dealing with the sheer scale of data. Companies that manage their data efficiently can use the power of IoT to unlock incredible opportunities, from setting up their own automated farm to creating a home security and surveillance system that would have Kevin McCallister from Home Alone salivating.

But to activate this potential, the necessity of a powerful database quickly becomes clear. IoT, and particularly industrial IoT, can create a situation where potentially hundreds of thousands of sensors from disparate and distant sources are taking readings many times a second. Any database that seeks to harvest, store and analyse this quantity of data needs to be able to cope with a high volume, speed and variety of queries. To keep pace with this technological evolution, time series (TS) databases were designed to handle heavier demands than their non-TS counterparts. While modern NoSQL databases inherently solve many of the data volume and diversity issues that come with IoT, these too are having to evolve to provide optimum storage and retrieval for TS data. Solving the unique challenges that come with TS data requires a database solution that is specifically optimised for the task.

Which database?

This has not stopped some programmers from attempting to modify existing database structures to cope with the strain of IoT, with mixed results. While some databases were made to be flexible, this same flexibility can complicate the process of extracting value from data. Column databases like Cassandra are an example of this: their rows can have arbitrary length and content, and while they can be made to work like a TS database, they treat the time value as just another variable. Because the database has no built-in understanding of time, querying and analysis can later be slowed down. In situations where the competitive advantage relies on split-second decision making, these delays can have very tangible costs for a business.

It is understandable, therefore, that demand exists for enterprise-grade solutions in this space. A business that wishes to operationalize its data to stimulate growth needs the following:

1. Ease-of-use

While IoT-optimized databases are starting to become more popular, it is understandable if some CTOs are wary of changing their technical operations to the point where they have to hire or train a whole new team to install, develop and maintain them. A good starting point is to opt for a database that already has optimized client libraries for the programming languages your developers use, allowing them to hit the ground running. Doing a proof of concept, for example, would traditionally require an acquaintance period for the developer to learn everything about the database first, and then hopefully muddle through, sometimes along a very steep learning curve. Simplicity is therefore the key.

For example, learning a new query language can slow developers down. Luckily, some NoSQL databases allow for standard SQL querying. In the case of Riak TS, creating or altering a table is the same as in SQL, with only one or two lines extra.

riak-shell>CREATE TABLE GeoCheckin (region VARCHAR NOT NULL, state VARCHAR NOT NULL, time TIMESTAMP NOT NULL, weather VARCHAR NOT NULL, temperature DOUBLE, PRIMARY KEY ((region, state, QUANTUM(time, 15, 'm')), region, state, time));

The same select statements that are used in SQL are also used for queries in Riak TS and the process of adding or subtracting nodes is even easier.
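As a rough illustration of what such a select statement can look like, the Python sketch below assembles a range query against the GeoCheckin table created above. It only builds the query string; sending it to the database (through riak-shell or a client library) is left out. The helper names to_epoch_ms and range_query are hypothetical, and the sketch assumes that TIMESTAMP values are written as milliseconds since the Unix epoch in UTC.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole milliseconds since the Unix epoch.

    Assumption: the database stores timestamps as epoch milliseconds in UTC.
    """
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid ambiguity")
    return (moment - EPOCH) // timedelta(milliseconds=1)


def quote_literal(value: str) -> str:
    """Wrap a string in single quotes; values containing quotes are rejected
    rather than escaped, because escaping rules are not assumed here."""
    if "'" in value:
        raise ValueError("single quotes are not supported in this sketch")
    return "'" + value + "'"


def range_query(region: str, state: str, start: datetime, end: datetime) -> str:
    """Build a SELECT over GeoCheckin for one region/state and a time range.

    The range is half-open: start is included, end is excluded.
    """
    return (
        "SELECT weather, temperature FROM GeoCheckin "
        f"WHERE region = {quote_literal(region)} AND state = {quote_literal(state)} "
        f"AND time >= {to_epoch_ms(start)} AND time < {to_epoch_ms(end)}"
    )


if __name__ == "__main__":
    start = datetime(2015, 1, 1, 12, 0, tzinfo=timezone.utc)
    end = start + timedelta(minutes=15)
    print(range_query("South Atlantic", "South Carolina", start, end))
```

The query names every column of the partition key (region, state and a bounded time range), which is the shape a time series table keyed like GeoCheckin is designed to answer.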

2. Horizontal scalability

One of the main drawbacks of legacy technologies is their cost: they are expensive to maintain, repair and replace. This is especially true if scalability is a concern. Selecting a scalable system is crucial, as it allows the enterprise to adapt as its needs change. With the right system and database in place, a business can increase the number of nodes in its network with just a few commands, allowing developers to forgo the usual stress of revamping or re-configuring their data infrastructure.

Use case: video games

Video games highlight this need, as modern games increasingly focus on massively multiplayer experiences that rely on TS data. To support these games, massively distributed architectures that can easily be scaled and can replicate data globally are needed to cope with the sheer number of players, as well as to provide low-latency access to all players and ensure a seamless experience. Some of the industry’s biggest brands have failed on launch day, frustrating millions of players and losing money by the minute: think of Pokemon Go’s numerous hiccups at launch. NoSQL provides scalability, consistently high performance and a flexible data model, which help to make sure this doesn’t happen.

Use case: transport

This is important for more than just gaming though. Let’s say, for example, that you are a bit of a claustrophobe, and as a result you wish to avoid the packed rush-hour public transport. You have no aversion to public transport itself, you’d just prefer not to be the proverbial sardine.

You realise that if you could gather real-time data on Oyster card and ticket use, and combine it with either temperature readings from tube platforms or automated video analysis from the public transport security cameras, you could make educated predictions on the relative levels of crowding and plan your commute accordingly. However, the scale of such a project would automatically preclude the use of less powerful NoSQL databases. The virtually constant stream of data, which would stop only during the early weekday hours, would promptly overload the capacity of non-optimized databases, making the data slow to access and almost impossible to query.

But with a properly set up time series database, these large quantities of data are manageable and, what’s more, will continue to be manageable even if the amount of data increases. This means that if you wanted to add another country to the mix, your only technical constraint would be how many nodes you’d be able to field.

Scaling with Riak TS

With some databases, this expansion is done easily enough. Taking Riak TS as an example again, you can attach to any node in your cluster from Riak shell. To do so, you must first locate your riak_shell.config file. On most systems, it will be in the /etc/riak directory with the other Riak TS configuration files. On Mac OS X, the configuration files are in the ~/riak-ts-1.4.0/etc directory. Open riak_shell.config, and add the node names and IP addresses you wish to connect to under nodes:

   {riak_shell, [
      {logging, off},
      {cookie, riak},
      {show_connection_status, false},
      {nodes, [

The next step is to then open riak shell (if you have updated riak_shell.config, you will need to navigate back to your Riak TS directory):


You can verify your connection by running show_connection:


You will find that once Riak TS is installed on a new node and started, which amounts to a 10-minute operation, you only need to type:

bin/riak-admin cluster join <node_in_cluster>

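where ‘node_in_cluster’ identifies an existing node in the cluster, usually by its node name (of the form riak@<IP address>). The join is staged, so you then run riak-admin cluster plan to review the change and riak-admin cluster commit to apply it. The redistribution of data is automatic after that!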

3. Optimization for querying and writing

Another important feature for enterprises is the ability to store and access data quickly and easily. This is paramount for many industries, from financial management to game development. Loss of data can corrupt the whole analysis process and erode trust in data, while a slow querying process can be not just irritating but costly as well.

In Riak TS, queries are written against the columns of your TS table. There are three categories of column, each with a different set of rules for valid queries: columns in the partition key, columns in the local key, and all remaining columns.

   PRIMARY KEY (
      (a, QUANTUM(b, 1, 's')),    <- Partition Key
      a, b, c                     <- Local Key
   )

Before you begin querying, however, there are a couple of guidelines you should keep in mind.
– Columns may not be compared against other columns in the query.
– When using OR, you must surround the expression with parentheses or your query will return an error.

Basic queries return the full range of values between two given times for an instance within a class or type of data. Remember that you can also extend the query beyond the primary key and use secondary columns to filter results.
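To make the two guidelines concrete, here is a small Python sketch that adds a secondary-column filter to an existing range query. The helpers or_group and add_filter are hypothetical names, not part of any Riak client library. The sketch always compares a column against literal values, never against another column, and it wraps any OR expression in parentheses before attaching it to the query.

```python
from typing import Sequence


def quote_literal(value: str) -> str:
    """Wrap a string in single quotes; values containing quotes are rejected
    rather than escaped, because escaping rules are not assumed here."""
    if "'" in value:
        raise ValueError("single quotes are not supported in this sketch")
    return "'" + value + "'"


def or_group(column: str, values: Sequence[str]) -> str:
    """Return a filter matching any of the given literal values.

    Two or more alternatives are joined with OR and wrapped in parentheses,
    following the guideline that OR expressions must be parenthesised.
    """
    if not values:
        raise ValueError("at least one value is required")
    terms = [f"{column} = {quote_literal(v)}" for v in values]
    if len(terms) == 1:
        return terms[0]
    return "(" + " OR ".join(terms) + ")"


def add_filter(base_query: str, filter_expr: str) -> str:
    """Append a filter to a query that already has a WHERE clause."""
    return f"{base_query} AND {filter_expr}"


if __name__ == "__main__":
    base = (
        "SELECT * FROM GeoCheckin "
        "WHERE region = 'South Atlantic' AND state = 'South Carolina' "
        "AND time >= 1420113600000 AND time < 1420114500000"
    )
    print(add_filter(base, or_group("weather", ["hot", "cloudy"])))
```

Keeping the parentheses inside or_group means callers cannot forget them, which is the kind of small guard that saves a round trip to the database just to discover a syntax error.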

Of course, even powerful TS databases are not omnipotent. It’s good practice to understand how you intend to use the data that you extract. TS data is usually read in ranges, so it makes sense to optimize the database for this kind of task. Will all the sensor readings for a day be pulled at once? Or for the week? Or for the month? Part of the index is drawn from the quantum, so for the best performance you need to choose a quantum that matches these access patterns.
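One way to reason about the choice of quantum is to count how many quanta a typical range query would touch. The Python sketch below does that arithmetic. It is a simplified model rather than a description of the database’s internals: it assumes quanta are fixed-width buckets aligned to the Unix epoch and that timestamps are in milliseconds, and the function names are hypothetical.

```python
# Milliseconds per quantum unit, using the same unit letters as QUANTUM().
UNIT_MS = {"s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def quantum_width_ms(size: int, unit: str) -> int:
    """Width of one quantum in milliseconds, e.g. (15, 'm') -> 900000."""
    if size <= 0:
        raise ValueError("quantum size must be positive")
    return size * UNIT_MS[unit]


def quantum_start(ts_ms: int, size: int, unit: str) -> int:
    """Start of the quantum containing ts_ms (assumed aligned to the epoch)."""
    width = quantum_width_ms(size, unit)
    return ts_ms - ts_ms % width


def quanta_spanned(start_ms: int, end_ms: int, size: int, unit: str) -> int:
    """Number of quanta touched by the inclusive range [start_ms, end_ms]."""
    if end_ms < start_ms:
        raise ValueError("end must not precede start")
    width = quantum_width_ms(size, unit)
    first = quantum_start(start_ms, size, unit)
    last = quantum_start(end_ms, size, unit)
    return (last - first) // width + 1


if __name__ == "__main__":
    day_start = 1_420_070_400_000           # 2015-01-01 00:00:00 UTC
    day_end = day_start + UNIT_MS["d"] - 1  # last millisecond of that day
    for size, unit in [(15, "m"), (1, "h"), (1, "d")]:
        print(size, unit, quanta_spanned(day_start, day_end, size, unit))
```

Under these assumptions, pulling a full day of readings touches 96 quanta with a 15-minute quantum but only one with a one-day quantum, so a table that is mostly read a day at a time may be better served by a larger quantum than one that is read in short windows.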

So which database should I use?

All in all, there are three main questions you should ask when deciding which database to use:
1. How easy is it to use?
2. How easy is it to scale?
3. How easy is it to query and write your data?

Once these are answered, all you have to do is set up your database, and then you can start databasing around. Tanoshimu!

