Erol.si

The complete list of all time series databases for your IoT project

January 8, 2015 - Last update: June 7, 2017

While searching the perfect database for my project, I spent hours and hours searching the internet and making a list of all candidates. Quickly I realized that the list is pretty long and the projects differ in many ways, but all of the have the same goal: store your time series data.

How the data looks like

The structure of time series data always consists of at least 2 parts (we call it a datapoint): time and value. At certain time we have a certain value. Based on architecture of different time series databases, we also annotate the datapoint with additional information. The goal of the information is to better differentiate the data and filter it easier. One of the examples is adding source=device.1 tag to the datapoint. Later we can easily fetch all the data that belongs to device.1.

So in some context, time series databases are similar to key-value databases where key is combination of time and tags. Only difference is, that we have a better ability to filter the data and all the nice aggregations functions (min, max, avg, dev, ..) are already built in (well in most cases). As with key-value databases everything starts with a key, in time series databases everything starts with time.

Below is the list of all the time series databases I found that follow the previous mentioned principle. If you find a new unlisted database or you create a new one and want to share, send me an email at erol@(enter my domain name).

The list

1. OpenTSDB
Website: http://www.opentsdb.net/
Pricing: Free
Technologies: Java, HBase
Presentation: http://youtu.be/WlsyqhrhRZA

Store and serve massive amounts of time series data without losing granularity.

2. KairosDB
Website: http://www.kairosdb.org
Pricing: Free
Technologies: Java, Cassandra
Clients: Java, Python
Presentation: http://youtu.be/Ykf_C9RZEQI?t=31m15s

KairosDB is a fast distributed scalable time series database written on top of Cassandra.

3. InfluxDB
Website: http://influxdb.com/
Pricing: Free
Technologies: Go, BoltDB
Clients: JavaScript, Ruby, Python, Node.js, PHP, Java, Clojure, Common Lisp, Go, Scala, R, Erlang, Perl, Haskell, .NET
Presentation: http://youtu.be/sRi64imN7xg

InfluxDB is a time series, metrics, and analytics database. It’s written in Go and has no external dependencies. That means once you install it there’s nothing else to manage (like Redis, ZooKeeper, HBase, or whatever).

4. TempoIQ
Website: https://www.tempoiq.com/
Pricing: Subscription
Clients: .NET, Java, Node.js, Python, Ruby
Presentation: http://youtu.be/TRv0tfFAdbY

Fast, scalable monitoring & analysis of sensor data in your application.

5. Graphite
Website: https://github.com/graphite-project

Graphite is a highly scalable real-time graphing system. As a user, you write an application that collects numeric time-series data that you are interested in graphing, and send it to Graphite’s processing backend, carbon, which stores the data in Graphite’s specialized database. The data can then be visualized through graphite’s web interfaces.

6. Druid
Website: http://druid.io/
Pricing: Free
Technologies: Java
Clients: Ruby, Python, R, Node.js
Presentation: http://youtu.be/Dlqj34l2upk

An open-source, real-time data store designed to power interactive applications at scale.

7. kdb+
Website: http://kx.com/
Technologies: K
Clients: Java, .NET, Python, Excel
Presentation: http://youtu.be/AGGGU7tVdEk

The high-performance database that sets the standard for time-series analytics.

8. RRDtool
Website: http://oss.oetiker.ch/rrdtool/

RRDtool is the OpenSource industry standard, high performance data logging and graphing system for time series data. RRDtool can be easily integrated in shell scripts, perl, python, ruby, lua or tcl applications.

9. seriesly
Website: https://github.com/dustin/seriesly
Technologies: Go

seriesly is a database for storing and querying time series data. Unlike databases like RRDtool, it’s schemaless so you can just lob data into it and start hacking. However, it also doesn’t use a finite amount of space.

10. Cube
Website: http://square.github.io/cube/ (development seems stopped, most active fork is https://github.com/red-gate/cube)
Pricing: Free
Technologies: Node.js, MongoDB

Cube is a system for collecting timestamped events and deriving metrics. By collecting events rather than metrics, Cube lets you compute aggregate statistics post hoc. It also enables richer analysis, such as quantiles and histograms of arbitrary event sets. Cube is built on MongoDB.

11. IBM Informix
Website: http://www-01.ibm.com/software/data/informix/

Informix, with its TimeSeries feature, helps organizations solve the Big Data challenge of sensor data by providing unprecedented performance and scalability to applications that leverage time series data.

12. Akumuli
Website: http://www.akumuli.org/
Pricing: Free
Technologies: C++

Distributed time-series database

13. BlueFlood
Website: http://blueflood.io/
Technologies: Java
Presentation: http://vimeo.com/87210602

Blueflood is a multi-tenant distributed metric processing system created by engineers at Rackspace. It is used in production by the Cloud Monitoring team to process metrics generated by their monitoring systems. Blueflood is capable of ingesting, rolling up and serving metrics at a massive scale.

14. DalmatinerDB
Website: https://dalmatiner.io/
Technologies: Erlang, ZFS, Riak Core

DalmatinerDB is a no fluff purpose built metric database. Not a layer put on top of a general purpose database or datastores.

15. Rhombus
Website: https://github.com/Pardot/Rhombus
Pricing: Free
Technologies: Java, Cassandra

A time-series object store for Cassandra that handles all the complexity of building wide row indexes.

16. Prometheus
Website: http://prometheus.io/
Pricing: Free
Technologies: Go
Clients: Go, Java, Ruby

An open-source service monitoring system and time series database.

17. Axibase Time-Series Database
Website: http://axibase.com/products/axibase-time-series-database/
Pricing: Free & License version
Technologies: Java, HBase, Hadoop
Clients: Java, R Language, PHP, Python, Ruby, JavaScript

Axibase Time-Series Database (ATSD) is a next-generation statistics database. ATSD is for companies that need to extract value from large amounts of time-series data which exists in their IT and operational infrastructure.

18. Newts
Website: http://opennms.github.io/newts/
Pricing: Free
Technologies: Java, Cassandra

A time-series data store based on Apache Cassandra.

19. InfiniFlux
Website: http://www.infiniflux.com/
Pricing: Free & License version
Technologies: C
Clients: Java, Python, JavaScript, R, PHP

INFINIFLUX is the World’s Fastest Time Series DBMS for IoT and BigData.

20. Heroic
Website: https://spotify.github.io/heroic/
Pricing: Free
Technologies: Java, Cassandra, Elasticsearch

The Heroic Time Series Database

21. Riak TS
Website: http://basho.com/products/riak-ts/
Pricing: Free & License version
Technologies: Erlang
Clients: Java, Ruby, Python, Erlang, Node.js
Presentation: https://www.youtube.com/watch?v=l-U-oSnpdLQ

Riak TS is the only enterprise-grade NoSQL database optimized for IoT and Time Series data

22. The Warp 10 Platform
Website: http://www.warp10.io/
Pricing: Free
Technologies: Java, WarpScript
Presentation: http://www.slideshare.net/Mathias-Herberts/warp-10-platform-presentation-criteo-beer-tech-20160203

Warp 10 is an Open Source Geo Time Series® Platform designed to handle data coming from sensors, monitoring systems and the Internet of Things.

23. KsanaDB
Website: https://github.com/zzzmanzzz/KsanaDB
Pricing: Free
Technologies: Go, Redis

KsanaDB is a time series database, base on redis and go.

24. eXtremeDB DBMS
Website: http://financial.mcobject.com/
Pricing: License version
Presentation: https://youtu.be/lG7Fw1sHFKQ

The eXtremeDB DBMS product family delivers high levels of scalability, reliability and processing speed for storing and manipulating complex data, and is used successfully in environments ranging from Big Data analytics in the data center to supporting increasingly “smart” features in the resource-constrained devices comprising the Internet of Things.

25. SiriDB
Website: http://siridb.net/
Pricing: Free
Technologies: C

SiriDB is a highly-scalable, robust and fast time series database. Build from the ground up SiriDB uses a unique mechanism to operate without a global index and allows server resources to be added on the fly. SiriDB’s unique query language includes dynamic grouping of time series for easy analysis over large amounts of time series.

26. DB4IoT
Website: http://db4iot.com/
Pricing: License version
Presentation: https://youtu.be/X-4OB_8SzTI

Visualize and analyze time-series IoT data with blazing-fast interactive maps for the
“Internet of Moving Things.”

27. quasardb
Website: http://www.quasardb.net/
Pricing: Client API Open Source, Community Edition and Enterprise Edition
Technologies: C++14
Clients: Java, .NET, Python, Excel, C, C#, C++, R

Quasardb is a high-performance, distributed, column-oriented database with native time series support.

I will frequently update the list and add new time series databases as they come along.

pg_dump: permission denied for relation mytable – LOCK TABLE public.mytable IN ACCESS SHARE MODE

January 3, 2015 - Last update: December 26, 2014

One of the good practices is to create backups of your database at regular intervals. If you are using PostgreSQL database, you can use built-in tool called pg_dump. With pg_dump we can export the database structure and data. In case we want to dump all databases, then we can use pg_dumpall.

When I was creating a simple bash script, I was getting a very strange error: pg_dump: permission denied for relation mytable – LOCK TABLE public.mytable IN ACCESS SHARE MODE. Googling around I got few tips how to solve the problem, but no actual solution.

Script to dump

To make our life easier, we use a script to make the whole process easier. It’s also convenient to have a script which we can later call from other processes, from build tools (backup because upgrading) or with cron.

#!/bin/bash  

SCRIPT=$(readlink -m $( type -p $0 ))      
BASE_DIR=$(dirname $SCRIPT)             
BASE_DIR=${BASE_DIR/tools/}
NAME=$(basename $SCRIPT)  

now=$(date +"%Y_%m_%d")
pg_dump -i -U databaseuser -h localhost -F t databasename > /backups/database/database_backup_$now.tar

TODAY=$(date +"%Y-%m-%d")
echo "$TODAY: Successful backup of database" >> $BASE_DIR'logs/dbbackup.log'

#!/bin/bash

SCRIPT=$(readlink -m $( type -p $0 ))

BASE_DIR=$(dirname $SCRIPT)

BASE_DIR=${BASE_DIR/tools/}

NAME=$(basename $SCRIPT)

now=$(date +"%Y_%m_%d")

pg_dump -i -U databaseuser -h localhost -F t databasename > /backups/database/database_backup_$now.tar

TODAY=$(date +"%Y-%m-%d")

echo "$TODAY: Successful backup of database" >> $BASE_DIR'logs/dbbackup.log'

When we run this, we get previous error. Big problem.

Locked table problem

Problem is with permissions. There are multiple permission layers. First is if we actually have an access to database. Second layer is if we actually have an access to table; in our case table mytable. To check it, we need to see the structure and permissions of the table.

$ su - postgres
$ psql -d databasename
databasename=# \d+

$ su - postgres

$ psql -d databasename

databasename=# \d+

Above commands with output all the tables in the database. If we check the columns, we will notice that there is a owner column. In our case it’s important that table owner and export user is same, otherwise we get the permission problem.

To change the permission of the table, we need to run the command

ALTER TABLE mytable OWNER TO actual_databaseuser;

1	ALTER TABLE mytable OWNER TO actual_databaseuser;

The command will alter the table ownership to our export user. Be sure you change owner for every table in the selected database.

Extra tip – cron

Of course we don’t have time and we especially don’t want to waste time for tasks that can be automated. One of them is actually running our dbbackup.sh every week, month or at some interval you desire. To perform backups every week, we can use cron.

To add a cron job, just run

$ crontab -e

1	$ crontab -e

It will show simple editor where you write your tasks/jobs. In our case, we will run backup every Sunday at the morning (00:00).

0 0 * * 0 /home/user/dbbackup.sh >> /var/log/dbbackupcron.log 2>&1

1	0 0 * * 0 /home/user/dbbackup.sh >> /var/log/dbbackupcron.log 2>&1

To make our script work with cron, we need to add an extra thing. Because if we run the code, we are asked for the password. Cron cannot enter the password, so it will fail. Based on suggestions, we should create ~/.pgpass file and add a line in it.

localhost:5432:databasename:databaseuser:password

1	localhost:5432:databasename:databaseuser:password

Now when the cron will run the script, everything will work.

Scripts to start and stop Play Framework application

December 23, 2014 - Last update: January 25, 2015

Play Framework (I’m talking about 2.X version) has multiple deployment ways. If you just check the docs, you will notice it has few pages of instructions just how to deploy your Play Framework application. For me, using stage seems to be the best and most stable way.

Stage

When deploying an application, I always run clean and stage commands. The first one will remove compiled and cached files. Second one will compile the application and create an executable. Everything will be located in ${application-home}/target/universal/stage/. There you have a bin folder and inside simple script to run the app. We will create start/stop scripts to make our work a little bit easier.

Scripts start/stop

To run the application, we use nohup. nohup enables us that the application will run even when we close the terminal or logout from our development machine. As everyone says, it’s not perfect solution but it works. We run the command in stage folder and add additional parameters.

nohup target/universal/stage/bin/myapp -J-server -J-Xms256M -J-Xmx256M -Dconfig.resource=application-prod.conf -Dhttp.port=9000 > /dev/null 2>&1 & echo $! > RUNNING_PID

1	nohup target/universal/stage/bin/myapp -J-server -J-Xms256M -J-Xmx256M -Dconfig.resource=application-prod.conf -Dhttp.port=9000 > /dev/null 2>&1 & echo $! > RUNNING_PID

-J-Server enables us to pass additional JVM related settings. In our case, we define Xms and Xmx for better memory management.
We have application.conf for development and application-prod.conf for the production with different settings (database, secret key, API logins, etc).
With -Dhttp.port we define the port of our application. We use apache to map the 80 port to 9000. It’s much safer and easier like this, because later we can have load balancer in the middle to divide the load to multiple application instances.

When running nohup, it will create a nohup.out file in which it logs everything (basically what application returns). Don’t confuse it with application logs. Application will still log everything based on logger.xml configuration independently of nohup. To prevent nohup.out file, we have to redirect everything to /dev/null and basically just ignore it.

On the end, we output the pid into RUNNING_PID file. Be careful. Play Framework automatically creates an additional file RUNNING_PID in stage folder. We add this as extra information and the file is removed after stopping our application.

test -f target/universal/stage/RUNNING_PID && kill `cat target/universal/stage/RUNNING_PID` && sleep 5;
rm RUNNING_PID;

1 2	test -f target/universal/stage/RUNNING_PID && kill `cat target/universal/stage/RUNNING_PID` && sleep 5; rm RUNNING_PID;

When we want to stop our application, we need to get the pid. We read it from stage folder RUNNING_PID file and pass it to kill command. For safety reasons we wait 5 seconds just to be sure that application is stopped. We could have a running job which needs few more seconds to complete or save the state.

Extra tip

We can also pass additional parameters to our stage command. One of them is javaagent. If we are using some remote monitoring solution like newrelic, we can include jar to send application data.

To do so, we need to pass -J-javaagent:/path/to/newrelic.jar with all the other parameters. Be sure that you include the correct path, because otherwise it will fail to start the application.

Best Internet of Things books

December 15, 2014 - Last update: February 1, 2015

First we need to learn few things about IoT. Although the concept wasn’t named until 1999, the Internet of Things has been in development for decades. The first Internet appliance, for example, was a Coke machine at Carnegie Melon University in the early 1980s. The programmers could connect to the machine over the Internet, check the status of the machine and determine whether or not there would be a cold drink awaiting them, should they decide to make the trip down to the machine.

Kevin Ashton, cofounder and executive director of the Auto-ID Center at MIT, first mentioned the Internet of Things in a presentation he made to Procter & Gamble. From that day, IoT started to slowly grow and today it’s one of the fastest growing trends.

10 best Internet of Things books

One of the best ways to learn about IoT is using books. I have completed the list of some of the best and a must reads.

Internet of Things (A Hands-on-Approach) (2014)
Internet of Things (IoT) refers to physical and virtual objects that have unique identities and are connected to the internet to facilitate intelligent applications that make energy, logistics, industrial control, retail, agriculture and many other domains “smarter”.
Designing the Internet of Things (2013)
Whether it’s called physical computing, ubiquitous computing, or the Internet of Things, it’s a hot topic in technology: how to channel your inner Steve Jobs and successfully combine hardware, embedded software, web services, electronics, and cool design to create cutting-edge devices that are fun, interactive, and practical. If you’d like to create the next must-have product, this unique book is the perfect place to start.
The Internet of Things (2014)
You might only have heard this expression recently (indeed you might never had heard of it) but, apparently this is a concept that has been around for some time.The term was coined around the turn of the millennium and refers to the potential interconnectivity of basically all electronic devices and capacity to record, monitor and transmit information between them to achieve all manner of wonderful (and maybe not-so-wonderful) outcomes.
The Silent Intelligence: The Internet of Things (2013)
The Silent Intelligence is a book about the Internet of Things. We talk about the history, trends, technology ecosystem and future of Connected Cities, Connected Homes, Connected Health and Connected Cars. We also discuss the most exciting growth areas for entrepreneurs and venture capital investors.
Rethinking the Internet of Things: A Scalable Approach to Connecting Everything (2013)
Over the next decade, most devices connected to the Internet will not be used by people in the familiar way that personal computers, tablets and smart phones are. Billions of interconnected devices will be monitoring the environment, transportation systems, factories, farms, forests, utilities, soil and weather conditions, oceans and resources.
The Epic Struggle of the Internet of Things (2014)
If the hype is to be believed then the next big thing is the Internet of Things. But is it what you think it is? Because the Internet of Things is not about things on the internet. A world in which all our household gadgets can communicate with each other may sound vaguely useful, but it’s not really for us consumers.
Designing for Emerging Technologies: UX for Genomics, Robotics, and the Internet of Things (2014)
The recent digital and mobile revolutions are a minor blip compared to the next wave of technological change, as everything from robot swarms to skin-top embeddable computers and bio printable organs start appearing in coming years. In this collection of inspiring essays, designers, engineers, and researchers discuss their approaches to experience design for groundbreaking technologies.
From Machine-to-Machine to the Internet of Things: Introduction to a New Age of Intelligence (2014)
This book outlines the background and overall vision for the Internet of Things (IoT) and Machine-to-Machine (M2M) communications and services, including major standards. Key technologies are described, and include everything from physical instrumentation of devices to the cloud infrastructures used to collect data.
Enchanted Objects: Design, Human Desire, and the Internet of Things (2014)
In the tradition of Who Owns the Future? and The Second Machine Age, an MIT Media Lab scientist imagines how everyday objects can intuit our needs and improve our lives. We are now standing at the precipice of the next transformative development: the Internet of Things.
Invest, Make Money & Retire Early From The Internet Of Things Revolution (2014)
The internet of things will equate to one of the single most lucrative investment opportunities in the history of modern business and will be bigger than the economies of China, Norway, and Canada, combined. Simply put the internet of things is connecting any device with an on and off switch to the Internet (and/or to each other).

Handle file uploads in Play Framework 2.x [Java]

October 25, 2014 - Last update: October 25, 2014

Most applications have the ability to upload something. Handling uploaded files should not be hard. We need to check if user uploaded the file, if it’s the right type and store it. Mather of fact, this is really easy with Play Framework.

An example

We have a form to upload a file.

<form action="/upload" method="POST" enctype="multipart/form-data">
    <input type="file" name="file" />
</form>

</form>

This form will take one file and post it to /upload path. To be able to upload a file, we need to define enctype to multipart/form-data. This just means how the POST will be constructed and how the file will be send.

Next this is to create a controller and a method. We will only enable uploading of PDF files.

package controllers;

import play.mvc.Result;
import static play.mvc.Results.badRequest;
import static play.mvc.Results.ok;
import play.mvc.Controller;
import play.mvc.Http;

public class Uploads extends Controller
{
    public static Result upload()
    {
        Http.MultipartFormData body = request().body().asMultipartFormData();
        if(body == null)
        {
            return badRequest("Invalid request, required is POST with enctype=multipart/form-data.");
        }

        Http.MultipartFormData.FilePart filePart = body.getFile("file");
        if(filePart == null)
        {
             return badRequest("Invalid request, no file has been sent.");
        }

        // getContentType can return null, so we check the other way around to prevent null exception
        if(!"application/pdf".equalsIgnoreCase(filePart.getContentType())
        {
             return badRequest("Invalid request, only PDFs are allowed.");
        }
        
        File file = filePart.getFile();

        // handle file

        return ok();    
    }
}

package controllers;

import play.mvc.Result;

import static play.mvc.Results.badRequest;

import static play.mvc.Results.ok;

import play.mvc.Controller;

import play.mvc.Http;

public class Uploads extends Controller

{

public static Result upload()

{

Http.MultipartFormData body = request().body().asMultipartFormData();

if(body == null)

{

return badRequest("Invalid request, required is POST with enctype=multipart/form-data.");

}

Http.MultipartFormData.FilePart filePart = body.getFile("file");

if(filePart == null)

{

return badRequest("Invalid request, no file has been sent.");

}

// getContentType can return null, so we check the other way around to prevent null exception

if(!"application/pdf".equalsIgnoreCase(filePart.getContentType())

{

return badRequest("Invalid request, only PDFs are allowed.");

}

File file = filePart.getFile();

// handle file

return ok();

}

Very simple, right? I highly recommend you move this code to somewhere else (for example to some service). Good practice is to keep controllers slim.

First we need to check type of the request and check if it’s multipart/form-data. If body is null, then something is wrong. Same thing is for a file. If there is no file present, we need to report an error. Beware, it’s easy to modify the content type. Checking if the file is really PDF can sometimes be more difficult. Best ways are to use some additional libraries – one of the is Apache Tika.

Handling multiple files

We can also handle multiple file at once. All we have to do is loop through all posted files.

package controllers;

import play.mvc.Result;
import static play.mvc.Results.badRequest;
import static play.mvc.Results.ok;
import play.mvc.Controller;
import play.mvc.Http;
import java.util.List;

public class Uploads extends Controller
{
    public static Result upload()
    {
        Http.MultipartFormData body = request().body().asMultipartFormData();
        if(body == null)
        {
            return badRequest("Invalid request, required is POST with enctype=multipart/form-data.");
        }

        List<Http.MultipartFormData.FilePart> fileParts = body.getFiles();
        if(fileParts.isEmpty())
        {
            return badRequest("Invalid request, no files have been included in the request.");
        }

        for(Http.MultipartFormData.FilePart filePart: fileParts)
        {
            if(!"application/pdf".equalsIgnoreCase(filePart.getContentType())
            {
                 return badRequest("Invalid request, only PDFs are allowed.");
            }
        
            File file = filePart.getFile();
            // handle file
        }

        return ok();    
    }
}

package controllers;

import play.mvc.Result;

import static play.mvc.Results.badRequest;

import static play.mvc.Results.ok;

import play.mvc.Controller;

import play.mvc.Http;

import java.util.List;

public class Uploads extends Controller

{

public static Result upload()

{

Http.MultipartFormData body = request().body().asMultipartFormData();

if(body == null)

{

return badRequest("Invalid request, required is POST with enctype=multipart/form-data.");

}

List<Http.MultipartFormData.FilePart> fileParts = body.getFiles();

if(fileParts.isEmpty())

{

return badRequest("Invalid request, no files have been included in the request.");

}

for(Http.MultipartFormData.FilePart filePart: fileParts)

{

if(!"application/pdf".equalsIgnoreCase(filePart.getContentType())

{

return badRequest("Invalid request, only PDFs are allowed.");

}

File file = filePart.getFile();

// handle file

}

return ok();

}

Extra tip

When we upload the file, it will be stored into /tmp folder (of course if we use Linux server). Then we just need to move or copy the file to the right folder.

Recommended way is to use highly tested library Apache Commons IO and methods FileUtils.copyFile(source, destination) or FileUtils.moveFile(source, destination).

try
{
    File file = filePart.getFile();
    File destination = new File("/home/app/uploads/", file.getName());
    FileUtils.moveFile(file, destination);
}
catch(IOException ex)
{
    // something went wrong, handle it
}

try

{

File file = filePart.getFile();

File destination = new File("/home/app/uploads/", file.getName());

FileUtils.moveFile(file, destination);

}

catch(IOException ex)

{

// something went wrong, handle it

}

Run a program as certain user from service on Windows XP

September 3, 2014 - Last update: September 3, 2014

At work we are collecting data about different production lines. We use Windows XP (yes, I know) machines. On them we have a server which connects to sensors. We access the server with our Python script and use data for further processing.

Pretty simple, right? Well it gets little more complicated. We have a service (Quartz.net) to call our Python script every 5 seconds. Everything works great, but there is one little problem.

The problem

The server is very unstable and crashes a lot. There is no log to check what is the problem. We have no idea how to make it more robust. But good thing is that when server crashes, all you have to do is login to the machine and restart it (doubleclick the icon to start the program again).

So the idea was just to automatically start the server program when we detect crash. But it’s not so simple.

Solution #1

First we need to know one thing. The Quartz.net service is running as SYSTEM. It calls the Python script as SYSTEM user. That means that everything we call from script, it will run under SYSTEM user.

Our first solution was to simply call the server program from our Python script.

import subprocess
subprocess.call(['C:\server.exe'])

1 2	import subprocess subprocess.call(['C:\server.exe'])

As mentioned before, this will run server.exe as SYSTEM user. Not a good thing, because quickly we got problems with too much instances and memory leaks. If user manually started the server, it ran 2 instances under different users.

Solution #2

We cannot run program as SYSTEM user, but as logged in user – Mike. Idea was to use some Python pywin32 extensions to control the system. There is a nice code where you pass user credentials with program path and it works. Then you use the code to run the process as another user.

The code does not work for me. I tried different variations, but not success. Someone also mentioned that I need to change some permissions. Even though I changed them, it still did not work. Even if it did, it would be really tricky to make this kind of changes if they require admin permissions.

Solution #3

Let’s forget running a program from Python script and maybe use some other way. Of course, here comes the Windows Batch. There is a really nice command runas. The command takes the login credentials and path to executable program.

This works only if your user doesn’t have password. But out Mike user has password, so we couldn’t use it.

The problem is that when you start the batch script which calls runas, it will prompt you for a password. There is a parameter /savecred. Basically you enter only password the first time and it memorizes it. But in our case, the service calls the Python script and Python script calls the batch file which calls runas. But when runas prompts for a password, the service cannot enter it. So nothing happens and this also does not work.

Solution #4

Reading online, these is a really nice program called PsExec. This allows you to login to the computer (mostly used for accessing remote computer, but also works for local) as certain user and execute program.

psexec \\computername -u Mike -p pass1234 -l -d "C:\server.exe"

1	psexec \\computername -u Mike -p pass1234 -l -d "C:\server.exe"

First I tried to run it without -l parameter and nothing happened. But when I added, it still ran as SYSTEM user instead of user Mike. -l parameter actually means Run process as limited user, which explains the problem. Again, did not work for us.

Solution #5 – The working one

The working solution is really interesting and uses built in features of Windows. It’s called Task Scheduler.

:: Script to restart Server by creating a scheduled task
:: All other solution did not enable us to run server under logged in user
:: Beware /sc is language dependent

@echo off

schtasks /create /tn "Restart Server" /tr "C:\server.exe" /sc once /st 23:59:59 /ru Mike /rp pass1234
schtasks /run /tn "Restart Server"

ping 127.0.0.1 -n 3 > nul

schtasks /delete /tn "Restart Server" /f

:: Script to restart Server by creating a scheduled task

:: All other solution did not enable us to run server under logged in user

:: Beware /sc is language dependent

@echo off

schtasks /create /tn "Restart Server" /tr "C:\server.exe" /sc once /st 23:59:59 /ru Mike /rp pass1234

schtasks /run /tn "Restart Server"

ping 127.0.0.1 -n 3 > nul

schtasks /delete /tn "Restart Server" /f

Windows has builtin command called schtasks. With this command we can schedule a new task, define the frequency, manually start it and on the end even delete it.

Creating a scheduled task has few problems. First is that scheduled tasks are executed every minute (even though we can define start time /st with seconds). In our case, we need to define that task runs few seconds after we created it. But if we define it to run at 16:00:05, it will actually never run. Because all times from 16:00:00 to 16:00:59 will actually run at 16:00:00. We could add a minute to the time, or example 16:01:00, but in worse case scenario we would need to wait almost a minute for scheduled task to run. At the same time, adding minutes to time in bat script is not really easy.

Our solution actually says to run at end of the day (time actually doesn’t even matter), but then we manually execute it. There is a ping method, which is actually similar to sleep. We wait for 3 seconds for task to finish and then delete it.

In short, our code created a task to run under user Mike, manually runs the task, waits for it to finish and then deletes the task. The end result in that our program finally starts under user Mike.

But beware of the on really crazy thing. The /sc parameter defines the frequency of the task. We can define or to run task every day or week and so on. But the parameter is language dependent. So in English is ONCE, in Slovenian is ENKRAT, in German EINMAL and so on. Strange, right?

Why I think Spring Data repositories are awesome – Part 2

August 17, 2014 - Last update: August 31, 2014

In the first part, we covered some very basic things we can do with Spring Data repositories. In this part, we will learn how to make more complex queries. By that I mean how to find data by entity field or make a count. You will be amazed how easy it is with Spring Data.

Entities

We will use Post entity from our previous part, update it and add an entity called User.

package models;
 
import java.util.Date;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import javax.persistence.Table;

@Entity
@Table(name = "posts")
public class Post
{
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    public Long id;
    
    @Column(nullable = false)
    public String title;    

    @Column(nullable = false, unique = true)
    public String url;  
 
    @Column(nullable = false, name = "created_at")
    public Date created_at;
 
    @Column(name = "is_active")
    public boolean isActive;

    @ManyToOne(optional = false, fetch = FetchType.LAZY)
    public User user;
}

package models;

import java.util.Date;

import javax.persistence.Column;

import javax.persistence.Entity;

import javax.persistence.GeneratedValue;

import javax.persistence.GenerationType;

import javax.persistence.Id;

import javax.persistence.ManyToOne;

import javax.persistence.Table;

@Entity

@Table(name = "posts")

public class Post

{

@Id

@GeneratedValue(strategy = GenerationType.AUTO)

public Long id;

@Column(nullable = false)

public String title;

@Column(nullable = false, unique = true)

public String url;

@Column(nullable = false, name = "created_at")

public Date created_at;

@Column(name = "is_active")

public boolean isActive;

@ManyToOne(optional = false, fetch = FetchType.LAZY)

public User user;

}

package models;
 
import java.util.Date;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.OneToMany;
import javax.persistence.Table;

@Entity
@Table(name = "users")
public class User
{
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    public Long id;
    
    @Column(nullable = false)
    public String username;

    @OneToMany(mappedBy = "user", fetch = FetchType.LAZY)
    public Set<Post> posts;
}

package models;

import java.util.Date;

import javax.persistence.Column;

import javax.persistence.Entity;

import javax.persistence.GeneratedValue;

import javax.persistence.GenerationType;

import javax.persistence.Id;

import javax.persistence.OneToMany;

import javax.persistence.Table;

@Entity

@Table(name = "users")

public class User

{

@Id

@GeneratedValue(strategy = GenerationType.AUTO)

public Long id;

@Column(nullable = false)

public String username;

@OneToMany(mappedBy = "user", fetch = FetchType.LAZY)

public Set<Post> posts;

}

As you can see, we have defined relation between our entities. Every user can have multiple posts and each post has only 1 user.

Tasks

Let’s imagine we have a situation where we want to create a simple blogging system. We have to be able to:

1. find all active posts
2. find a post by an url
3. find all posts by a user
4. count all active posts

1. Find all active posts

We will use the PostRepository we have defined in our first part. By simply adding a method to the repository, it will generate the right code and map everything to a SQL. Because CrudRepository already has few basic methods prebuilt, we don’t need to add a method to find all posts. Instead, we can use findAll method.

But to find all active posts we have to define our custom method. Actually, it’s very simple.

package repositories;
 
import java.util.Set;
import org.springframework.data.repository.CrudRepository;
import org.springframework.stereotype.Repository;
import models.Post;
 
@Repository
public interface PostRepository extends CrudRepository<Post, Long>
{
    public Set<Post> findByIsActiveTrue();
}

package repositories;

import java.util.Set;

import org.springframework.data.repository.CrudRepository;

import org.springframework.stereotype.Repository;

import models.Post;

@Repository

public interface PostRepository extends CrudRepository<Post, Long>

{

public Set<Post> findByIsActiveTrue();

}

This is it. This is the whole magic. One short line of code. But how it actually works? Spring Data will build the query based on method name and method return type. It will split the method name. In our case into find, by, isactive, true. First part defines to make a select query, the second indicates we want to filter, the third field name and fourth the value of the field. But be careful, defining field value after specifying the field name it will only work for booleans. For other field types, you need to pass the value as method argument. One great this is also that we can add multiple fields.

2. Find post by an url

Continuing the though from the previous section, we can have method build from different fields. For example, let’s load a post by it’s url. We have to update our repository.

@Repository
public interface PostRepository extends CrudRepository<Post, Long>
{
    public Set<Post> findByIsActiveTrue();
    public Post findByUrl(String url);
}

@Repository

public interface PostRepository extends CrudRepository<Post, Long>

{

public Set<Post> findByIsActiveTrue();

public Post findByUrl(String url);

}

Again, if we look at the method name, we will see that we are finding a post by it’s url. Because the url is a String, we have to pass the value as a method argument. Because return type is a Post, it will return one post. In case if the query returns multiple rows, then the exception will be thrown. Be careful that when you expect only 1 record, that you query by some unique field.

But actually, our blog system has to return a post by url and be active. We could have a code where we load the post by url and then check if isActive or not. Instead, we can do this in one query.

@Repository
public interface PostRepository extends CrudRepository<Post, Long>
{
    public Set<Post> findByIsActiveTrue();
    public Post findByUrlAndIsActiveTrue(String url);
}

@Repository

public interface PostRepository extends CrudRepository<Post, Long>

{

public Set<Post> findByIsActiveTrue();

public Post findByUrlAndIsActiveTrue(String url);

}

We are now querying the database by 2 field: url and isActive. When we use different fields in a method name, all of them are joined by AND. We cannot use OR. For that, we have to use some other approach (we will explain it in another tutorial).

3. Find all posts by a user

Every user has a username. Our task is to find all posts by a user or more specific, find all posts by a username. Writing a method name is actually the same, we just need to include the relation name. Again, we update our PostRepository.

@Repository
public interface PostRepository extends CrudRepository<Post, Long>
{
    public Set<Post> findByIsActiveTrue();
    public Post findByUrlAndIsActiveTrue(String url);
    public Set<Post> findByUserUsername(String username);
}

@Repository

public interface PostRepository extends CrudRepository<Post, Long>

{

public Set<Post> findByIsActiveTrue();

public Post findByUrlAndIsActiveTrue(String url);

public Set<Post> findByUserUsername(String username);

}

Method name has to include the relation name. When we defined it in the entity, we have to use the same name in the method. If we change it in the entity, we also have to change the method name. It may look complicated and no really robust to changes, but there is no other way for Spring Data to know how to correctly build the query. Of course, as we mentioned it few times already, in the next part we will learn how we can actually use custom queries to help out Spring Data with building the native SQL query.

Once we define the relation in the method name, everything else is the same. We again filter by field name, we can or use multiple fields. But remember, for each relation field, we have to prepend the name with the relation name.

4. Count all active posts

For the last task, we have to count all active posts. CrudRepository already has a method called count(), but it will count all posts. We could use findByIsActiveTrue() method to find all posts and get a populated Set. All we have to do then is to call .size() and there, we have the count of all active posts.

Don’t do that. Sure, it works and it might even work in the production for a small number of posts, but in case of a larger dataset, it’s not a good practice. We have to fetch all the records, populate Set and then call .size() to just get one number. It’s a too big overhead.

Instead, we will use count which maps to SQL count. It’s much much faster and consumes much less resources. Before, we were finding records, so we prepended every method name with find. If we want to count, we have to do what? You are right, prepend every method name with count. Let’s for the last time update our PostRepository.

@Repository
public interface PostRepository extends CrudRepository<Post, Long>
{
    public Set<Post> findByIsActiveTrue();
    public Post findByUrlAndIsActiveTrue(String url);
    public Set<Post> findByUserUsername(String username);
    public Long countByIsActiveTrue();
}

@Repository

public interface PostRepository extends CrudRepository<Post, Long>

{

public Set<Post> findByIsActiveTrue();

public Post findByUrlAndIsActiveTrue(String url);

public Set<Post> findByUserUsername(String username);

public Long countByIsActiveTrue();

}

There are few differences compared to other method names. First is return type. It has to be a Long, so it will bind a row count to it (Integer can be too small). As we mentioned before, we start method with count and then define the field filters. It’s that simple.

Part 3 – What more will we learn?

In the next part, we will how we can make even more complex queries by using @Query annotation. @Query annotation enables us to write HQL, which is very similar to SQL but has a compile time checking. Another thing we will learn is how to extend Repository and use PersistanceManager to build super complex queries. We will create custom methods and insert them into repositories. It’s a really cool and advanced feature, so stay tuned.

ng-repeat with draggable or how to correctly use AngularJS with jQuery UI

July 27, 2014 - Last update: August 31, 2014

AngularJS is an amazing framework. Together with jQuery and jQuery UI is a killer combo. But sometimes it’s really difficult to make them work together.

Task

Imagine we have a box (div) and inside some elements that we can drag around.

<div id="items" >
    <span>drag me 1</span>
    <span>drag me 2</span>
    <span>drag me 3</span>
    <span>drag me ...</span>
</div>

</div>

Solution #1

We will attach the jQuery UI draggable inside of the directive that we added to html (items-drag)

<div id="items" items-drag>
    <span>drag me 1</span>
    <span>drag me 2</span>
    <span>drag me 3</span>
    <span>drag me ...</span>
</div>

</div>

and create an app with a directive.

var app = angular.module('app', []);

app.directive('itemsDrag', function()
{
    return {
        link: function(scope, element, attrs)
        {
             // element == $('#items')
             element.find('span').draggable();

             scope.$on('$destroy', function()
             {
                 element.find('span').off('**');
             });
        }
    };
});

var app = angular.module('app', []);

app.directive('itemsDrag', function()

{

return {

link: function(scope, element, attrs)

{

// element == $('#items')

element.find('span').draggable();

scope.$on('$destroy', function()

{

element.find('span').off('**');

});

}

};

});

This works, but it’s totally unrealistic. In real life we probably load items from somewhere and populate the div. So let’s try that.

Solution #2

We add the controller ItemsController to the HTML with ng-repeat

<div ng-controller="ItemsController">
    <div id="items" items-drag>
        <span ng-repeat="item in items">drag me {{ item }}</span>
    </div>
</div>

</div>

and add controller to our app.

var app = angular.module('app', []);

app.controller('ItemsController', function($scope, $timeout)
{
    $scope.items = [];

    // we will simulate loading items from the server with timeout
    $timeout(function()
    {
        $scope.items = [1, 2, 3, 4, 5, 6];
    }, 4000); 
});

app.directive('itemsDrag', function()
{
    return {
        link: function(scope, element, attrs)
        {
             element.find('span').draggable();

             scope.$on('$destroy', function()
             {
                 element.find('span').off('**');
             });
        }
    };
});

var app = angular.module('app', []);

app.controller('ItemsController', function($scope, $timeout)

{

$scope.items = [];

// we will simulate loading items from the server with timeout

$timeout(function()

{

$scope.items = [1, 2, 3, 4, 5, 6];

}, 4000);

});

app.directive('itemsDrag', function()

{

return {

link: function(scope, element, attrs)

{

element.find('span').draggable();

scope.$on('$destroy', function()

{

element.find('span').off('**');

});

}

};

});

This will NOT work. Because when directive is loaded, it will find all spans and attach draggable to them. But because items are empty, it won’t find any spans. When they are loaded from the server ($timeout executes), ng-repeat will repeat and show items, but draggable will not be attached.

We can solve this by adding $watch and watching when items update and attach draggable. Let’s just update our directive.

app.directive('itemsDrag', function()
{
    return {
        link: function(scope, element, attrs)
        {
             scope.$watch('items', function(items)
             {
                 // items changed, let's reattach again
                 element.find('span').off('**').draggable();

                 scope.$on('$destroy', function()
                 {
                     element.find('span').off('**');
                 });
             });             
        }
    };
});

app.directive('itemsDrag', function()

{

return {

link: function(scope, element, attrs)

{

scope.$watch('items', function(items)

{

// items changed, let's reattach again

element.find('span').off('**').draggable();

scope.$on('$destroy', function()

{

element.find('span').off('**');

});

}

};

});

This works. Great. But actually there is a big problem. When ng-repeat is adding the elements into the DOM, $watch method is fired and draggable is attached to items. Problem is that this happens during ng-repeat so draggable is not attached to all elements. What now?

Solution #3

We need to somehow wait for ng-repeat to finish and that all elements/items are loaded into DOM. Based on my research, there is no bulletproof way. Some suggest to use timeout.

app.directive('itemsDrag', function()
{
    return {
        link: function(scope, element, attrs)
        {
             scope.$watch('items', function(items)
             {
                 // attach draggable when all elements are loaded into DOM
                 // (hopefully)
                 $timeout(function()
                 {
                     element.find('span').off('**').draggable();
                 }, 500);

                 scope.$on('$destroy', function()
                 {
                     element.find('span').off('**');
                 });
             });
        }
    };
});

app.directive('itemsDrag', function()

{

return {

link: function(scope, element, attrs)

{

scope.$watch('items', function(items)

{

// attach draggable when all elements are loaded into DOM

// (hopefully)

$timeout(function()

{

element.find('span').off('**').draggable();

}, 500);

scope.$on('$destroy', function()

{

element.find('span').off('**');

});

}

};

});

This solution has one big problem. We cannot never set the right timeout time. If we set too small, it won’t work if we have a long list of items. If we set too large, then we can impact the user experience.

Solution #4 – The working one

The working solution is actually really simple and works for small or large lists of items without impacting the user experience.

<div ng-controller="ItemsController">
    <div id="items">
        <span ng-repeat="item in items" items-drag>drag me {{ item }}</span>
    </div>
</div>

</div>

var app = angular.module('app', []);

app.controller('ItemsController', function($scope, $timeout)
{
    $scope.items = [];

    // we will simulate loading items from the server with timeout
    $timeout(function()
    {
        $scope.items = [1, 2, 3, 4, 5, 6]
    }, 4000); 
});

app.directive('itemsDrag', function()
{
    return {
        link: function(scope, element, attrs)
        {
             element.draggable();

             scope.$on('$destroy', function()
             {
                 element.off('**');
             });
        }
    };
});

var app = angular.module('app', []);

app.controller('ItemsController', function($scope, $timeout)

{

$scope.items = [];

// we will simulate loading items from the server with timeout

$timeout(function()

{

$scope.items = [1, 2, 3, 4, 5, 6]

}, 4000);

});

app.directive('itemsDrag', function()

{

return {

link: function(scope, element, attrs)

{

element.draggable();

scope.$on('$destroy', function()

{

element.off('**');

});

}

};

});

We updated the directive’s element. We don’t attach directive to div#items anymore, but to each span. When each element/item is added to DOM, directive is fired and attaches draggable. So there is no timeouts or watching if $scope.items changed.

For me, this is the best way to combine AngularJS with jQuery UI – Draggable. Of course it also works for any plugin.

Verify a method was called with certain argument using Mockito

July 8, 2014 - Last update: May 8, 2018

Unit test are important part of every application. Even though we sometimes hate to write time and think they are just time consuming, they can make our app must more stable and bulletproof. One often scenario that we want to test is to check if some method in a class was called with certain parameters. We have a class Importer to read and import a record from XML.

public class Importer
{
    public void readXML(File file)
    {
         // read record from XML
         // when everything successfully read, call process
         ...
         this.process(record);
    }

    public void process(Record record)
    {
        // save record in database
    }
}

public class Importer

{

public void readXML(File file)

{

// read record from XML

// when everything successfully read, call process

...

this.process(record);

}

public void process(Record record)

{

// save record in database

}

We want to test if XML was correctly read and if our process(record) method was called with record object.

Solution

@RunWith(MockitoJUnitRunner.class)
public class ImporterTest
{
    public void shouldSucceed_When_ImportValidXMLRecord()
    {
        // prepare
        File file = new File("record.xml");
        Importer importer = Mockito.spy(new Importer()); // spy enables us to mock actual classes
        
        // run
        importer.readXML(file);

        // results
        ArgumentCaptor<Record> captur = ArgumentCaptor.for(Record.class);
        verify(importer).process(captur.capture());
        assertThat(captur.getValue().firstName).isEqualTo("Bob"); // XML contains attribute with firstName = Bob
    }
}

@RunWith(MockitoJUnitRunner.class)

public class ImporterTest

{

public void shouldSucceed_When_ImportValidXMLRecord()

{

// prepare

File file = new File("record.xml");

Importer importer = Mockito.spy(new Importer()); // spy enables us to mock actual classes

// run

importer.readXML(file);

// results

ArgumentCaptor<Record> captur = ArgumentCaptor.for(Record.class);

verify(importer).process(captur.capture());

assertThat(captur.getValue().firstName).isEqualTo("Bob"); // XML contains attribute with firstName = Bob

}

ArgumentCaptur enables us to capture actual object that is passed to our process(record) method. We don’t only test if method was called, but also if it was called with specific values.

Extra tip for collections

Again, let’s image we modify our method to be able to read multiple records.

public class ImporterBatch
{
    public void readXML(File file)
    {
         // read all records from XML
         // when everything successfully read, call process
         ...
         this.process(records);
    }

    public void process(Set<Record> records)
    {
        // save all records in database
    }
}

public class ImporterBatch

{

public void readXML(File file)

{

// read all records from XML

// when everything successfully read, call process

...

this.process(records);

}

public void process(Set<Record> records)

{

// save all records in database

}

Solution is similar. To test collection (List, Set, Map…) arguments, we have to use some additional Mockito features.

@RunWith(MockitoJUnitRunner.class)
public class ImporterBatchTest
{
    // it will automatically create a new instance 
    @Captor
    private ArgumentCaptor<Set<Record>> recordsCaptor;

    public void shouldSucceed_When_ImportValidBatchXMLRecords()
    {
        // prepare
        File file = new File("records.xml");
        ImporterBatch importer = Mockito.spy(new ImporterBatch()); 

        // run
        importer.readXML(file);

        // results
        verify(importer).process(recordsCaptor.capture()); 
        assertThat(recordsCaptor.getValue().size).isEqualTo(2"); // our XML contains 2 records
    }
}

@RunWith(MockitoJUnitRunner.class)

public class ImporterBatchTest

{

// it will automatically create a new instance

@Captor

private ArgumentCaptor<Set<Record>> recordsCaptor;

public void shouldSucceed_When_ImportValidBatchXMLRecords()

{

// prepare

File file = new File("records.xml");

ImporterBatch importer = Mockito.spy(new ImporterBatch());

// run

importer.readXML(file);

// results

verify(importer).process(recordsCaptor.capture());

assertThat(recordsCaptor.getValue().size).isEqualTo(2"); // our XML contains 2 records

}

In our test, recordsCaptor.getValue() returns Set<Record>. We can check size or each record. So with Mockito you can test a lot of scenarios and it’s a really nice tool for unit testing.

OpenVPN and TAP-Windows adapter ClientVPN not found fix

July 6, 2014 - Last update: August 28, 2014

The other day I wanted to access company network, so I had to install OpenVPN. The network administrator sent me all the files (certificates and some other configuration files). I installed the certificates and ran the config file. It was actually some executable file and it opened the command prompt. It was doing something, but on the end it yield error.

...
TAP-Windows adapter 'ClientVPN' not found
Exiting due to fatal error
...

...

TAP-Windows adapter 'ClientVPN' not found

Exiting due to fatal error

...

Because I’m not a network or VPN expert, I had not idea what to do?

Solution

I Googled around and didn’t find any useful. I contacted the network administrator with this problem. He replied with a very simple solution. OpenVPN actually adds a new network connection. For some reason, it wasn’t named correctly. All I had to do was to rename that connection to ClientVPN and bang, it started to work.

How the data looks like

The list

Script to dump

Locked table problem

Extra tip – cron

Stage

Scripts start/stop

Extra tip

10 best Internet of Things books

An example

Handling multiple files

Extra tip

The problem

Solution #1

Solution #2

Solution #3

Solution #4

Solution #5 – The working one

Entities

Tasks

Further reading

Part 3 – What more will we learn?

Task

Solution #1

Solution #2

Solution #3

Solution #4 – The working one

Solution

Extra tip for collections

Solution