Python and multiple constructors

One thing I missed when switching from Java to Python was multiple constructors. Python does not support them (directly), but there are many other approaches that work very similarly (maybe even better).


Let’s say we are building a client to query a remote service (some aggregation service). We want to pass an aggregator.

To make the code more fluent and more robust when integrating it into other solutions, we offer multiple ways to create an aggregator.

The query.aggregator will create a new instance of Aggregator and pass it to the request.

(Possible) solution

Python has a great feature of passing args and kwargs. We can create a constructor

then in the constructor we check and parse args and kwargs. This solution works, but it has many problems:

  1. No indication of what is required and what is not
    This matters most for autocompletion. When I want to create a new instance of the class Aggregator, I want to know what is required. With the current constructor, this is really hard.
  2. Complexity and combinations
    There are many ways to initialize a new instance by passing different combinations of arguments.

    This is absolutely weird and hard to read.
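A minimal sketch of what such a constructor tends to look like. The Aggregator fields (name, interval) and the accepted argument shapes are assumptions made up for illustration; they are not taken from the original client.

```python
class Aggregator:
    """Hypothetical aggregator configured only through *args/**kwargs."""

    def __init__(self, *args, **kwargs):
        # The signature tells the caller (and the IDE) nothing,
        # so every accepted shape has to be detected by hand.
        if len(args) == 2:
            self.name, self.interval = args
        elif len(args) == 1 and isinstance(args[0], dict):
            self.name = args[0]["name"]
            self.interval = args[0]["interval"]
        elif "name" in kwargs and "interval" in kwargs:
            self.name = kwargs["name"]
            self.interval = kwargs["interval"]
        else:
            raise TypeError("unsupported combination of arguments")


# Three call styles that all end up in the same branching code.
a = Aggregator("sum", 60)
b = Aggregator({"name": "sum", "interval": 60})
c = Aggregator(name="sum", interval=60)
```

Every new input format adds another branch, and none of them is visible from the signature.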

Better solution

Python has an option to decorate a method with @classmethod. We can define custom methods that work as multiple constructors. For example, we can create a method from_arguments.

We use it as Aggregator.from_arguments(args). The validation of the parameters (for example, whether a value is an int) is done in the constructor.

The from_arguments method just parses the arguments and creates a new instance of the Aggregator. We could add validation there (whether the list has at least 2 items, whether a str is in the correct format, whether a dict has all the required elements, …).
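The idea can be sketched as follows. The field names (name, interval) and the extra from_string constructor are assumptions for illustration only; the original post only names from_arguments.

```python
class Aggregator:
    def __init__(self, name: str, interval: int):
        # Type validation lives in one place: the real constructor.
        if not isinstance(interval, int):
            raise TypeError("interval must be an int")
        self.name = name
        self.interval = interval

    @classmethod
    def from_arguments(cls, args: list) -> "Aggregator":
        # Shape validation belongs to the alternative constructor.
        if len(args) < 2:
            raise ValueError("expected at least [name, interval]")
        return cls(args[0], args[1])

    @classmethod
    def from_string(cls, text: str) -> "Aggregator":
        # Hypothetical format "<name>:<interval>", e.g. "avg:30".
        name, sep, interval = text.partition(":")
        if not sep:
            raise ValueError("expected '<name>:<interval>'")
        return cls(name, int(interval))


agg = Aggregator.from_arguments(["sum", 60])
other = Aggregator.from_string("avg:30")
```

Each class method has an explicit signature, so autocompletion can show what is required, and all of them funnel into the same validated constructor.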

Django Rest Framework, NestedSerializer with relation and CRUD

I started a Django project that enables other services to interact with it over an API. Of course, one of the best solutions for building an API in Python is Django Rest Framework. It is a great project with a large community, and it was supported on Kickstarter. How cool is that?


My project/service offers, among other things, access to and creation of companies and subscriptions. Each company can have multiple subscriptions – we have a one-to-many relation. I quickly created the models Company and Subscription.

One thing to notice here is that I use UUIDs. The reason lies in the fact that some other services also contain company data. Those services will create companies, as they have all the required data (id, name). With this I’m able to resolve sync problems.

For the Subscription model, I generate the UUID randomly.
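The difference between the two id strategies can be sketched in plain Python (these are dataclasses, not the Django models from the post; the field names are assumptions). In a Django model the same effect is usually achieved with a UUIDField whose default is uuid.uuid4.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Company:
    name: str
    id: uuid.UUID  # supplied by the service that creates the company


@dataclass
class Subscription:
    company: Company
    # Generated locally: uuid4 produces a random (version 4) UUID.
    id: uuid.UUID = field(default_factory=uuid.uuid4)


company = Company(name="Acme", id=uuid.UUID("12345678-1234-5678-1234-567812345678"))
first = Subscription(company=company)
second = Subscription(company=company)
```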

Django Rest Framework

Django Rest Framework has great docs. If you follow the quickstart, you can set up a working API in a few minutes.

In my case, I created serializers

I had to define an additional id field for the company serializer. By default ids are read only (normally ids are generated at the database level), but in my case I pass the id when creating the company.

I also created

For the last step you have to add the viewsets to the API router.

Now when you access /api/companies/ or /api/subscriptions you should get a response (for now probably only an empty array).

This part is very simple and there are tons of examples of how to do it.


To create a company, I execute a POST JSON request (I’m using Postman) to /api/companies/ with the following payload.

and I get returned

Now I have a company in the database. Let’s create a subscription. Again, I execute a POST JSON request to /api/subscriptions with the payload

and I get an error that company name is required. What?

Same request and response

Before I explain what the previous error means and how I solved it, I first have to explain what I want.

Other services that talk to my service use different HTTP clients. One of them is Netflix Feign. With it you can simply create HTTP clients that map the request or response to DTOs. For example, they have a SubscriptionDTO defined as

and CompanyDTO

So the same DTO is used for request and response. I want to pass the same DTO with all the required data when creating the subscription. When the response is returned, it populates the existing SubscriptionDTO. This is important, because I want to eliminate the confusion of using multiple DTOs for the same entity (Subscription).

Process of identifying the problem

Back to the previous error. When I retrieve the subscription, I also want to include company information in the subscription list.

I accomplished this by defining

in my SubscriptionSerializer. If I didn’t use this, then the response would be in the format

But I don’t want this, I want the full output. When I defined the company field, I didn’t pass any arguments. By default this means that when I execute the POST, it will create the subscription and all its relations (company). That is why I got an error that the company name is required: it wanted to create a new company (but the name was missing). But I don’t want this.

I checked online and asked a few people. Most of them suggested that I pass the read_only=True argument when I define the company field: company = CompanySerializer(read_only=True). Now when I executed the POST, I got an error that the subscription.company_id database field should not be null. Once you define read_only for a field, its data is not passed to the method that creates the model (subscription). Why?

There are many discussions around how to solve this.


Some suggest different serializers, others suggest using 2 fields (one for read and the other for create/update). But all of them seem hackish and require a lot of extra code. The author of DRF, Tom Christie, suggested that I define the CompanySerializer fields (except id) as read only. This kind of solved the problem. If the company has additional fields, then I need to override them as well, which means extra code. At the same time, I want to preserve the /api/companies/ endpoint for creating/updating companies. If I set the fields as read only, then I wouldn’t be able to create companies without an additional CompanySerializer.

I tried to override the subscription create methods, but without success. If I defined read_only=True when creating the company field, then no company information was passed to validated_data (the data that is later used to create a subscription). If I defined read_only=False, then I always got the “name is required” error.

I wanted a simple and working solution.


I started to look for a solution that was simple and enabled me to make the requests that I want. Digging through the code I noticed many methods for field creation that I could override. In the end, I had to modify the validation method.

I overrode validate_empty_values, where I check the relation. The idea is that I check the posted data. If the id (or primary key) of the related model is present, I validate that a record exists for that id and return it. If it doesn’t exist or the data is invalid, I raise an error.
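The lookup logic described above can be sketched in plain Python. This is not DRF code: the ValidationError class, the resolve_relation helper and the in-memory COMPANIES store are all stand-ins invented for illustration. In the real serializer the same check would sit inside the overridden validate_empty_values and query the related model instead of a dict.

```python
class ValidationError(Exception):
    """Stand-in for the framework's validation error."""


# Hypothetical in-memory "table" of existing companies, keyed by id.
COMPANIES = {
    "c-1": {"id": "c-1", "name": "Acme"},
}


def resolve_relation(data, records, pk_field="id"):
    """Return the existing related record referenced by the posted data."""
    if not isinstance(data, dict) or pk_field not in data:
        raise ValidationError(f"'{pk_field}' of the related object is required")
    pk = data[pk_field]
    try:
        # Reuse the existing record instead of trying to create a new one.
        return records[pk]
    except KeyError:
        raise ValidationError(f"related object '{pk}' does not exist") from None


company = resolve_relation({"id": "c-1"}, COMPANIES)
```

Only the primary key is read from the posted data, so the other required fields of the related object (such as name) no longer trigger a validation error.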

There is also an is_relation argument that you have to pass when creating the serializer. It is only used when the serializer is created as a nested serializer. The updated code is

What this does is that now I can execute POST JSON requests with payload

and get a response

Same DTO for request and response. At the same time, I didn’t modify the /api/companies/ endpoint. Companies get created/updated normally with all the required validation working as it should.

Why you shouldn’t buy an Asus tablet?

Having a tablet these days is a pretty common thing. I have been a (probably pretty rare) owner of a Samsung ATIV tablet for many years. Because it’s too large for normal e-book reading, I decided to purchase a smaller 7 or 8 inch tablet.

After checking multiple reviews and comparing my top candidates, I decided to purchase the ASUS MeMO Pad 7 (ME176C). Based on reviews it was the best offer for its price (I paid around 120 EUR). The tablet looks great, it has a pretty good CPU, enough RAM and space, the camera is OK, and it has a solid build.

After a few days of usage I started noticing a pretty annoying problem. It’s updating all the time, adding new apps (that I don’t really need) and just taking up space. After I was able to uninstall a few of them (TripAdvisor, Omlet chat, and others) I was still left with 20 applications that Asus installed on my tablet. 20! The biggest problem is that I will never use them, as they are totally useless.

If I go to a store and buy something, don’t I own the thing I just bought? Why then, when I paid for the tablet, am I not able to do what I want with it?

Having had an Asus computer for over 5 years, I started to like the Asus brand and their products. But what they did with my tablet is just awful. Now when I want to install new apps, I’m getting low space errors. The whole system is slow, it crashes and it has become totally useless. There are many forum topics about the same problem and Asus is not doing anything about it. Instead of limiting the garbage they force on tablets, they add extra support apps. But why?

The hardware part is pretty great. Why ruin it with software? Maybe it’s Android’s fault. I have to admit that I’m not a big fan of Android. I see it as a big mess waiting to collapse.

Right now I’m in the process of buying another, better tablet (hopefully Win10). But again, the problem can repeat even with other manufacturers. I miss the ability to get an empty tablet and install whatever system I like on it, similar to PCs. I know that won’t be possible anytime soon. I also understand that manufacturers need to create communities, but what Asus is doing right now is causing the opposite.

Would I buy another Asus tablet? No. Would I recommend an Asus tablet to someone else? No.

You want to use Akka? Better learn Scala

I have been using Play Framework since version 1.2. Lately, I do most of my work with version 2.2/2.3. It supports both Scala and Java (you can literally mix the code files). Because I know Java much better than Scala (well, I don’t know Scala at all), I do all my coding in Java.

Play Framework comes with Akka, which supports actors for processing data. I have about 20 different actors that handle different scenarios. Actors talk to each other, so there are a lot of different messages. Each message represents a specific action.

One thing that really frustrates me is that Akka and Java don’t play nicely. I mean, everything works, but the authors of Akka don’t put a lot of effort into Java. I know, Scala is the great new language and once you know it, why the hell would you still use Java. The problem is that examples and tutorials are mostly written in Scala, coding actors in Scala is much easier, and testing is just damn short and sweet.

Examples and Tutorials

When I face a problem and want to see how others solved it, or want to learn a new thing, I notice that most (about 95%) of Akka examples are in Scala. That means I need to somehow decode the examples and convert them to Java. This is not always possible. Sometimes certain implementations in Scala cannot be directly converted to Java and a different approach has to be used.

The official docs have Scala and Java versions, but the Scala version has much more content. There are many very useful blogs, but they are all in Scala. Most of the books on Akka are also in Scala. The same goes for open-source projects.

For somebody coming from Java world, this can be frustrating.


Java is not a language for writing short programs. So part of the fault lies with Java, but again there could be better ways to write actors. For example, take an actor that has to process 3 different messages.

The code quickly becomes long (and this is a very simple example), because it’s hard to filter messages and forward them to the appropriate methods. In Scala, things are much shorter.

For me it’s important that the whole logic fits on the screen, so I can see the whole code without scrolling. It’s easier to fit the logic into my brain. This is not possible with Java code (of course, I could split it and add extra files, but then instead of scrolling I would be clicking).

Sending a message

Scheduling a message

There are tons of other examples. What I’m trying to say is that it’s much easier to write actor logic in Scala than in Java. Much easier.


When writing unit tests, you should always test small parts of the code. That means that tests should also be small and short. Because Scala is a very descriptive language, you can easily define a test. For example, let’s test our ABC actor and see if it replies.

Again, much shorter, much easier to understand and, above all, less room for mistakes. We can also test a specific method of the actor.

Or we can mock certain actor methods. First we need to create a trait (similar to Java interfaces), so we can mock methods.

Now when we create a test, we can inject different ABCActorBase.

Scala has ScalaTest, which in my opinion is one of the best testing libraries. It offers very descriptive test results and error reporting. It supports multiple styles of testing, some of them look really awesome.


Even though I didn’t like Scala, I had to realize that it’s actually a great language. Not my favorite, but it’s OK. Since I’m developing an application that runs on the JVM, it would be a shame not to use it where it helps to solve certain tasks. It has many more great features that I didn’t showcase, so feel free to check them out.

The final result is that I rewrote all my actors in Scala. They are much shorter and it’s easier to understand what they are doing. At the same time, a great part of the tests are now in Scala and I have a feeling that the code is much more stable and less error-prone. It’s been a few weeks since I put the new actors in production and there have been zero problems. Happy coding.

How to install KairosDB time series database?

In my previous post, I described why I switched from OpenTSDB (another time series database) to KairosDB. In this post, I will show how to install and run KairosDB.


To run KairosDB we actually just need KairosDB (if we ignore Ubuntu/Debian/something similar and Java). How is that possible? Well, KairosDB supports two datastores: H2 and Cassandra. H2 here is an in-memory database. It’s easy to set up and clean up, and it’s mostly used for development. Don’t use it in production; it will work, but it will be very, very slow.

For our tutorial we will use Cassandra as the datastore. To install Cassandra, you can follow the official tutorial. We will install it via apt-get.

You will want to replace 21x by the series you want to use: 20x for the 2.0.x series, 12x for the 1.2.x series, etc… You will not automatically get major version updates unless you change the series, but that is a feature.

We also need to add the public keys to be able to access the Debian packages.

Now we are ready to install it.

This will install the Cassandra database. A few things you must know: the configuration files are located in /etc/cassandra, and the start-up options (heap size, etc.) can be configured in /etc/default/cassandra. Now that Cassandra is installed, run it.

Another requirement is that you have the Oracle Java JDK instead of OpenJDK. You must install version 7 or 8 (8 is recommended, I’m using 7). Again, we will install it with apt-get.


KairosDB uses Thrift for communicating with Cassandra. When I installed Cassandra, Thrift wasn’t enabled by default, so I had to enable it first. There are many ways to do this, and if you hate fiddling with config files, you can install OpsCenter. It’s a really great tool for monitoring your cluster. It has a simple interface where you can access your nodes and change their configuration to enable Thrift. To change it in the config file, set start_rpc to true in /etc/cassandra/cassandra.yaml.

Installing KairosDB

Again, we can install KairosDB in a few ways.

a) Building from the source

  1. Clone the git repository
  2. Make sure that JAVA_HOME is set to your Java install.
  3. Compile the code

b) Installing via .deb package (recommended)

The current stable version is 1.1.1 (it was 0.9.4 when this was first written). Make sure you download the latest version.

Setting Cassandra as a datastore

As mentioned before, KairosDB by default uses the H2 database as its datastore. We need to change it to Cassandra.

a) If you are running from source, copy the configuration file from the src/main/resources/ folder to the KairosDB root folder and change it there.
b) If you installed the package, change the file in /opt/kairosdb/conf/

In the file, comment out the line where H2 is set as the datastore and uncomment the Cassandra module. The file should then look like this.
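A minimal sketch of that edit as a text transformation. The two property lines below are assumptions based on how the KairosDB properties file of that era looked; check them against your own file before relying on them. The helper name use_cassandra is made up.

```python
# Assumed property lines; verify against your own configuration file.
H2 = "kairosdb.service.datastore=org.kairosdb.datastore.h2.H2Module"
CASSANDRA = (
    "kairosdb.service.datastore="
    "org.kairosdb.datastore.cassandra.CassandraModule"
)


def use_cassandra(text: str) -> str:
    """Comment out the H2 datastore line and uncomment the Cassandra one."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == H2:
            out.append("#" + stripped)
        elif stripped.lstrip("#").strip() == CASSANDRA:
            out.append(CASSANDRA)
        else:
            out.append(line)  # every other setting is left untouched
    return "\n".join(out) + "\n"


before = H2 + "\n#" + CASSANDRA + "\n"
after = use_cassandra(before)
```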

You can also change some other settings to tune it, but for now just save the file and you are ready to go.

Test if everything works

Make sure your Cassandra service is running. Now let’s run KairosDB.

a) Running from source

b) Or if installed

Go to http://localhost:8080 to check if everything works OK. If you can see the KairosDB dashboard, then congratulations, you can now use KairosDB.

What’s next

In the next tutorial we will see how to save, query and delete datapoints with the web interface, the HTTP API and the Telnet API.

Passing collections between Akka actors

Akka actors are great when we are looking for scalable real-time transaction processing (yes, this is the actual definition using some big words). Actually, they are really great for background processing, because you can create many instances without worrying about concurrency and parallelism.

The code

We have a simple application for processing an uploaded file. We accept the file, parse it (a simple txt file), calculate the values and save them in some database. We could have everything in one actor, but it’s much better to split it into multiple actors and create a pipeline. Each actor does exactly one thing. The code is much cleaner and, at the same time, much easier to test.

We have (for this demonstration) 2 actors. One reads the file into a List and sends it to another actor.

The second actor gets the List of numbers and calculates their sum.

If we used this code, we would quickly discover problems. When I tested it with VisualVM for memory leaks, I quickly discovered a memory leak involving the List of numbers. How to solve it?

Immutable collections

When passing objects between actors we need to follow a few guidelines. If we break them, we can face memory leaks and consequently app crashes. One of the guidelines is to use immutable collections: any collection we pass between actors has to be immutable. What are the advantages of immutable objects?

  1. Thread-safe – so they can be used by many threads with no risk of race conditions.
  2. Doesn’t need to support mutation, and can make time and space savings with that assumption.
  3. All immutable collection implementations are more memory-efficient than their mutable siblings (analysis)
  4. Can be used as a constant, with the expectation that it will remain fixed.
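The first and last points can be illustrated outside Akka. The sketch below is a Python analogy, not Akka or Guava code: a queue plays the role of the mailbox, a thread plays the role of the second actor, and a tuple is the immutable message. The function names are made up.

```python
import queue
import threading


def reader(lines, outbox):
    """First 'actor': parse the lines and send an immutable snapshot."""
    numbers = [int(line) for line in lines]
    outbox.put(tuple(numbers))  # a tuple cannot be changed after sending
    numbers.clear()  # mutating the sender's list does not affect the message


def summer(inbox, results):
    """Second 'actor': receive the message and sum it."""
    message = inbox.get()
    results.append(sum(message))


mailbox = queue.Queue()
results = []
worker = threading.Thread(target=summer, args=(mailbox, results))
worker.start()
reader(["1", "2", "3"], mailbox)
worker.join()
```

Because the receiver owns a snapshot that nobody can modify, the sender is free to reuse or discard its own list without any coordination.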

There are many implementations of immutable collections and one of the best is in Guava.

Improved code

We have to use ImmutableList to create the list of numbers passed between actors.

Rerunning VisualVM confirmed that the memory leak was resolved. Great.

Why I switched from OpenTSDB to KairosDB?

In my previous post, I described how to correctly install and use OpenTSDB. After some time, I decided to move on to another solution.

The story

Before everything, we need to know one thing. Because of IoT, the demand for storing sensor data has increased dramatically. Many new projects have emerged; some are good, some are bad. They differ in the technologies used, how fast they are and what kind of features they support.

You can read the full list of all IoT time series databases that can be used for storing the data of your Internet of Things project or startup.

Problems of OpenTSDB

OpenTSDB is great, don’t get me wrong. But when you try to use it with more complex projects and customer demands, you can quickly hit a wall. That’s mostly because it involves a lot of moving parts (Hadoop, HBase, ZooKeeper). If one of the parts fails, the whole thing fails. Sure, you can replicate each part and make it more robust, but you will also spend more money. When you are starting out, that’s over-optimization and a waste of money (that you don’t have).

Aggregation of the data is another problem. It does support basic functions like min, max, avg etc. I spent days investigating why avg aggregation was not working correctly when I filtered by multiple tags. It just didn’t want to work and I couldn’t find anything in the docs. I asked on the Google group and after some time I got a reply that I must use another aggregation function, and that even that doesn’t work 100% as I want. Another problem is when I want to get just one value – for example, the avg of all values from X to now. Not possible!

The lack of clients for talking to OpenTSDB is another problem for me. Sure, storing the data with the socket API is super simple and can be easily integrated in every language. The HTTP API is another story. Sure, again it shouldn’t be a problem to implement my own client, but why waste time on this?

Development of OpenTSDB is slow and it takes ages for new features to be integrated. One of them (one of the most important for me) is support for time zones. It’s needed when downsampling data to one day (or more) so that data is grouped correctly. There was some work on it, but to this day it still hasn’t been implemented. Too bad.
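Why the time zone matters for daily downsampling can be shown with a small sketch. It is plain Python, not OpenTSDB or KairosDB code; the daily_sums helper and the sample values are made up, and a fixed +02:00 offset stands in for a real time zone.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def daily_sums(points, tz):
    """Group (unix_timestamp, value) pairs by calendar day in the given zone."""
    buckets = defaultdict(float)
    for ts, value in points:
        day = datetime.fromtimestamp(ts, tz).date()
        buckets[day] += value
    return dict(buckets)


UTC = timezone.utc
LOCAL = timezone(timedelta(hours=2))  # fixed offset used as an example zone

points = [
    (datetime(2015, 6, 1, 21, 0, tzinfo=UTC).timestamp(), 1.0),
    (datetime(2015, 6, 1, 23, 0, tzinfo=UTC).timestamp(), 2.0),
]

utc_days = daily_sums(points, UTC)      # both points fall on June 1
local_days = daily_sums(points, LOCAL)  # the second point is already June 2
```

The same two datapoints end up in one daily bucket in UTC and in two buckets in the local zone, so a database that only downsamples in UTC reports different daily values than the user expects.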

On the bright side, OpenTSDB is super fast. I was able to store and load data at a super fast rate – loading 3 million records in a few seconds is super fast for me. Try it with a relational database and you will quickly be disappointed.

KairosDB to the rescue

I remember that when I was doing my research, I noticed KairosDB but didn’t spend much time testing it. It just wasn’t appealing and I didn’t know how it actually works. Big mistake.

KairosDB uses Cassandra to store data (compared to HBase used with OpenTSDB) and it’s actually a rewritten and upgraded version of OpenTSDB. It has evolved into a great project. It has many more features: many more (and fully working) aggregation methods, an option to easily delete a metric or datapoint, easy extensibility with plugins etc. It has great clients and a much more active community. I remember asking a question on the OpenTSDB Google group and waiting weeks for an answer (I’m not forcing anyone to provide support, because after all, it’s an open-source project), while on the KairosDB Google group I got one within a day.

Why is this important, you might ask? Well, when you are chasing deadlines and something goes wrong, a responsive community is very important. Sometimes this kind of thing can be the difference between success and failure.

What now?

I wrote a tutorial on how to start with KairosDB. You can also check out the documentation. Feel free to play with it, test it and hopefully also use it in production.

The complete list of all time series databases for your IoT project

While searching for the perfect database for my project, I spent hours and hours searching the internet and making a list of all candidates. I quickly realized that the list is pretty long and the projects differ in many ways, but all of them have the same goal: store your time series data.

What the data looks like

The structure of time series data always consists of at least 2 parts (together we call them a datapoint): time and value. At a certain time we have a certain value. Depending on the architecture of the time series database, we can also annotate the datapoint with additional information. The goal of this information is to better differentiate the data and make it easier to filter. One example is adding a source=device.1 tag to the datapoint. Later we can easily fetch all the data that belongs to device.1.

So in some sense, time series databases are similar to key-value databases where the key is a combination of time and tags. The only difference is that we have a better ability to filter the data, and all the nice aggregation functions (min, max, avg, dev, ..) are already built in (well, in most cases). Just as in key-value databases everything starts with a key, in time series databases everything starts with time.
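The datapoint structure described above can be sketched in a few lines of plain Python. This models the concept only, not any particular database; the DataPoint class, the select helper and the sample readings are made up for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class DataPoint:
    time: int                 # unix timestamp
    value: float
    tags: dict = field(default_factory=dict)


def select(points, **tags):
    """Return the datapoints whose tags match all the given key/value pairs."""
    return [
        p for p in points
        if all(p.tags.get(key) == wanted for key, wanted in tags.items())
    ]


points = [
    DataPoint(1000, 20.0, {"source": "device.1"}),
    DataPoint(1000, 30.0, {"source": "device.2"}),
    DataPoint(2000, 22.0, {"source": "device.1"}),
]

device_1 = select(points, source="device.1")
average = mean(p.value for p in device_1)
```

Filtering by the source tag and then aggregating is exactly the kind of query a time series database offers out of the box.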

Below is the list of all the time series databases I found that follow the previously mentioned principle. If you find a new unlisted database, or you create a new one and want to share it, send me an email at erol@(enter my domain name).

The list

1. OpenTSDB
Pricing: Free
Technologies: Java, HBase

Store and serve massive amounts of time series data without losing granularity.

2. KairosDB
Pricing: Free
Technologies: Java, Cassandra
Clients: Java, Python

KairosDB is a fast distributed scalable time series database written on top of Cassandra.

3. InfluxDB
Pricing: Free
Technologies: Go, BoltDB
Clients: JavaScript, Ruby, Python, Node.js, PHP, Java, Clojure, Common Lisp, Go, Scala, R, Erlang, Perl, Haskell, .NET

InfluxDB is a time series, metrics, and analytics database. It’s written in Go and has no external dependencies. That means once you install it there’s nothing else to manage (like Redis, ZooKeeper, HBase, or whatever).

4. TempoIQ
Pricing: Subscription
Clients: .NET, Java, Node.js, Python, Ruby

Fast, scalable monitoring & analysis of sensor data in your application.

5. Graphite

Graphite is a highly scalable real-time graphing system. As a user, you write an application that collects numeric time-series data that you are interested in graphing, and send it to Graphite’s processing backend, carbon, which stores the data in Graphite’s specialized database. The data can then be visualized through graphite’s web interfaces.

6. Druid
Pricing: Free
Technologies: Java
Clients: Ruby, Python, R, Node.js

An open-source, real-time data store designed to power interactive applications at scale.

7. kdb+
Technologies: K
Clients: Java, .NET, Python, Excel

The high-performance database that sets the standard for time-series analytics.

8. RRDtool

RRDtool is the OpenSource industry standard, high performance data logging and graphing system for time series data. RRDtool can be easily integrated in shell scripts, perl, python, ruby, lua or tcl applications.

9. seriesly
Technologies: Go

seriesly is a database for storing and querying time series data. Unlike databases like RRDtool, it’s schemaless so you can just lob data into it and start hacking. However, it also doesn’t use a finite amount of space.

10. Cube
Website: (development seems stopped, most active fork is
Pricing: Free
Technologies: Node.js, MongoDB

Cube is a system for collecting timestamped events and deriving metrics. By collecting events rather than metrics, Cube lets you compute aggregate statistics post hoc. It also enables richer analysis, such as quantiles and histograms of arbitrary event sets. Cube is built on MongoDB.

11. IBM Informix

Informix, with its TimeSeries feature, helps organizations solve the Big Data challenge of sensor data by providing unprecedented performance and scalability to applications that leverage time series data.

12. Akumuli
Pricing: Free
Technologies: C++

Distributed time-series database

13. BlueFlood
Technologies: Java

Blueflood is a multi-tenant distributed metric processing system created by engineers at Rackspace. It is used in production by the Cloud Monitoring team to process metrics generated by their monitoring systems. Blueflood is capable of ingesting, rolling up and serving metrics at a massive scale.

14. DalmatinerDB
Technologies: Erlang, ZFS, Riak Core

DalmatinerDB is a no fluff purpose built metric database. Not a layer put on top of a general purpose database or datastores.

15. Rhombus
Pricing: Free
Technologies: Java, Cassandra

A time-series object store for Cassandra that handles all the complexity of building wide row indexes.

16. Prometheus
Pricing: Free
Technologies: Go
Clients: Go, Java, Ruby

An open-source service monitoring system and time series database.

17. Axibase Time-Series Database
Pricing: Free & License version
Technologies: Java, HBase, Hadoop
Clients: Java, R Language, PHP, Python, Ruby, JavaScript

Axibase Time-Series Database (ATSD) is a next-generation statistics database. ATSD is for companies that need to extract value from large amounts of time-series data which exists in their IT and operational infrastructure.

18. Newts
Pricing: Free
Technologies: Java, Cassandra

A time-series data store based on Apache Cassandra.

19. InfiniFlux
Pricing: Free & License version
Technologies: C
Clients: Java, Python, JavaScript, R, PHP

INFINIFLUX is the World’s Fastest Time Series DBMS for IoT and BigData.

20. Heroic
Pricing: Free
Technologies: Java, Cassandra, Elasticsearch

The Heroic Time Series Database

21. Riak TS
Pricing: Free & License version
Technologies: Erlang
Clients: Java, Ruby, Python, Erlang, Node.js

Riak TS is the only enterprise-grade NoSQL database optimized for IoT and Time Series data

22. The Warp 10 Platform
Pricing: Free
Technologies: Java, WarpScript

Warp 10 is an Open Source Geo Time Series® Platform designed to handle data coming from sensors, monitoring systems and the Internet of Things.

23. KsanaDB
Pricing: Free
Technologies: Go, Redis

KsanaDB is a timeseries database, base on redis and go.

I will frequently update the list and add new time series databases as they come along.

pg_dump: permission denied for relation mytable – LOCK TABLE public.mytable IN ACCESS SHARE MODE

One good practice is to create backups of your database at regular intervals. If you are using a PostgreSQL database, you can use the built-in tool called pg_dump. With pg_dump we can export the database structure and data. If we want to dump all databases, we can use pg_dumpall.

When I was creating a simple bash script, I was getting a very strange error: pg_dump: permission denied for relation mytable – LOCK TABLE public.mytable IN ACCESS SHARE MODE. Googling around I got a few tips on how to solve the problem, but no actual solution.

Script to dump

To make our life easier, we wrap the whole process in a script. It’s also convenient to have a script which we can later call from other processes, from build tools (backup before upgrading) or with cron.
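A minimal sketch of such a wrapper, written in Python rather than bash. The database name, user and output folder are placeholders; the pg_dump options used (-h, -U, -f) are the standard host, user and output file options.

```python
import subprocess
from datetime import date


def build_dump_command(database, user, out_dir, host="localhost", today=None):
    """Build the pg_dump invocation that writes a dated SQL file."""
    today = today or date.today()
    target = f"{out_dir}/{database}-{today.isoformat()}.sql"
    return ["pg_dump", "-h", host, "-U", user, "-f", target, database]


def run_dump(database, user, out_dir, host="localhost"):
    """Run pg_dump and raise CalledProcessError if it exits with an error."""
    command = build_dump_command(database, user, out_dir, host)
    subprocess.run(command, check=True)


# Placeholder names; nothing is executed until run_dump is called.
example = build_dump_command("mydb", "export_user", "/backups", today=date(2015, 1, 4))
```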

When we run this, we get the previous error. Big problem.

Locked table problem

The problem is with permissions. There are multiple permission layers. The first is whether we have access to the database at all. The second is whether we have access to the table; in our case, the table mytable. To check it, we need to see the structure and permissions of the table.

The above commands will output all the tables in the database. If we check the columns, we will notice that there is an owner column. In our case it’s important that the table owner and the export user are the same, otherwise we get the permission problem.

To change the owner of the table, we need to run the command

The command will change the table ownership to our export user. Be sure you change the owner of every table in the selected database.
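Changing the owner of every table by hand gets tedious, so the statements can be generated. A small sketch, assuming the table names are already known and trusted (no identifier quoting is done); the table and user names are placeholders.

```python
def owner_statements(tables, new_owner, schema="public"):
    """Build one ALTER TABLE ... OWNER TO statement per table."""
    return [
        f"ALTER TABLE {schema}.{table} OWNER TO {new_owner};"
        for table in tables
    ]


statements = owner_statements(["mytable", "othertable"], "export_user")
for statement in statements:
    print(statement)
```

The printed statements can then be pasted into psql or saved to a file and executed in one go.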

Extra tip – cron

Of course we don’t have time, and we especially don’t want to waste it on tasks that can be automated. One of them is running our backup every week, every month, or at whatever interval you desire. To perform backups every week, we can use cron.

To add a cron job, just run

It will show a simple editor where you write your tasks/jobs. In our case, we will run the backup every Sunday at midnight (00:00).
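In cron syntax, every Sunday at 00:00 is written as 0 0 * * 0 (minute, hour, day of month, month, day of week, with Sunday as 0). The sketch below checks a moment against such an expression. It is a deliberately simplified matcher that only understands plain numbers and *, not ranges, lists or steps, and the matches helper is made up for illustration.

```python
from datetime import datetime


def matches(expression, moment):
    """Check a datetime against a 5-field cron expression (numbers and * only)."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day month weekday")
    actual = [
        moment.minute,
        moment.hour,
        moment.day,
        moment.month,
        (moment.weekday() + 1) % 7,  # Python: Monday=0, cron: Sunday=0
    ]
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))


WEEKLY_BACKUP = "0 0 * * 0"
sunday_midnight = datetime(2015, 1, 4, 0, 0)   # 4 January 2015 was a Sunday
monday_midnight = datetime(2015, 1, 5, 0, 0)
```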

To make our script work with cron, we need one extra thing. If we run the script by hand, we are asked for the password. Cron cannot enter the password, so the job will fail. Based on suggestions, we should create a ~/.pgpass file and add a line to it.
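Each line of the password file has the form hostname:port:database:username:password, and PostgreSQL ignores the file unless its permissions are restricted to the owner (0600). A small sketch that builds such a line and writes it; the connection values are placeholders and the helper names are made up.

```python
import os
from pathlib import Path


def pgpass_line(host, port, database, user, password):
    """Build one .pgpass entry, escaping ':' and '\\' inside the fields."""
    def escape(value):
        return str(value).replace("\\", "\\\\").replace(":", "\\:")

    return ":".join(escape(v) for v in (host, port, database, user, password))


def write_pgpass(path, line):
    """Write the entry and restrict the file to its owner (mode 0600)."""
    path = Path(path)
    path.write_text(line + "\n")
    os.chmod(path, 0o600)


# Placeholder credentials, for illustration only.
entry = pgpass_line("localhost", 5432, "mydb", "export_user", "s3cret")
```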

Now when cron runs the script, everything will work.

Scripts to start and stop Play Framework application

Play Framework (I’m talking about the 2.X version) can be deployed in multiple ways. If you check the docs, you will notice there are a few pages of instructions just on how to deploy your Play Framework application. For me, using stage seems to be the best and most stable way.


When deploying an application, I always run the clean and stage commands. The first one removes compiled and cached files. The second one compiles the application and creates an executable. Everything will be located in ${application-home}/target/universal/stage/. There you have a bin folder and inside it a simple script to run the app. We will create start/stop scripts to make our work a little bit easier.

Scripts start/stop

To run the application, we use nohup. nohup keeps the application running even after we close the terminal or log out from our development machine. As everyone says, it’s not a perfect solution, but it works. We run the command in the stage folder and add additional parameters.

  1. -J-server (and the -J prefix in general) enables us to pass additional JVM-related settings. In our case, we define Xms and Xmx for better memory management.
  2. We have application.conf for development and application-prod.conf for production, with different settings (database, secret key, API logins, etc).
  3. With -Dhttp.port we define the port of our application. We use Apache to map port 80 to 9000. It’s much safer and easier like this, because later we can put a load balancer in the middle to divide the load across multiple application instances.

When running nohup, it will create a nohup.out file in which it logs everything (basically whatever the application prints). Don’t confuse it with the application logs. The application will still log everything based on the logger.xml configuration, independently of nohup. To avoid creating the nohup.out file, we have to redirect everything to /dev/null and basically just ignore it.
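A sketch of the start step in Python rather than shell. The application name, memory sizes and config path are placeholders, and the exact flags (-J-server, -J-Xms, -J-Xmx, -Dconfig.file, -Dhttp.port) are assumptions about what the original script passed, based on the parameters described above.

```python
import subprocess


def build_start_command(app_name, config_file, port=9000, xms="128M", xmx="512M"):
    """Build the nohup command that launches the staged application."""
    return [
        "nohup",
        f"bin/{app_name}",
        "-J-server",
        f"-J-Xms{xms}",
        f"-J-Xmx{xmx}",
        f"-Dconfig.file={config_file}",
        f"-Dhttp.port={port}",
    ]


def start(stage_dir, command):
    """Launch from the stage folder and discard output (no nohup.out)."""
    return subprocess.Popen(
        command,
        cwd=stage_dir,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


# Placeholder values; nothing is launched until start() is called.
command = build_start_command("myapp", "conf/application-prod.conf")
```

Sending both output streams to the null device has the same effect as the /dev/null redirect in a shell script.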

In the end, we write the pid into a RUNNING_PID file. Be careful: Play Framework also creates its own RUNNING_PID file in the stage folder automatically. We add ours as extra information, and the file is removed after stopping our application.

When we want to stop our application, we need to get the pid. We read it from the RUNNING_PID file in the stage folder and pass it to the kill command. For safety reasons we wait 5 seconds just to be sure that the application has stopped. We could have a running job which needs a few more seconds to complete or save its state.
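The stop step can be sketched like this. It is a Python stand-in for the shell script, assuming the default kill signal (SIGTERM); the kill and sleep parameters exist only so the function can be tried out without touching a real process.

```python
import os
import signal
import time
from pathlib import Path


def stop(pid_file, wait_seconds=5, kill=os.kill, sleep=time.sleep):
    """Read the pid from the file, ask the process to stop, then wait."""
    pid = int(Path(pid_file).read_text().strip())
    kill(pid, signal.SIGTERM)  # polite termination, like plain `kill <pid>`
    sleep(wait_seconds)        # give running jobs time to finish or save state
    return pid
```

A call such as stop("target/universal/stage/RUNNING_PID") would then match the manual steps described above.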

Extra tip

We can also pass additional parameters to our staged application. One of them is javaagent. If we are using a remote monitoring solution like New Relic, we can include its jar to send application data.

To do so, we need to pass -J-javaagent:/path/to/newrelic.jar along with all the other parameters. Be sure that you include the correct path, because otherwise the application will fail to start.