You want to use Akka? Better learn Scala

I have been using Play Framework since version 1.2. Lately, I do most of my work with the 2.2/2.3 versions. It supports both Scala and Java (you can literally mix the code files). Because I know Java much better than Scala (well, I don’t know Scala at all), I do all my coding in Java.

Play Framework comes with Akka, which provides actors for processing data. I have about 20 different actors that handle different scenarios. The actors talk to each other, so there are a lot of different messages, and each message represents a specific action.

One thing that really frustrates me is that Akka and Java don’t play nicely together. I mean, everything works, but the authors of Akka don’t put a lot of effort into Java. I know, Scala is the great new language, and once you know it, why the hell would you still use Java. The problem is that examples and tutorials are mostly written in Scala, coding actors in Scala is much easier, and testing is just damn short and sweet.

Examples and Tutorials

When I face a problem and want to see how others solved it, or when I want to learn something new, I notice that most (about 95%) of Akka examples are in Scala. That means I need to somehow decode the examples and convert them to Java. This is not always possible. Sometimes a Scala implementation cannot be directly converted to Java and a different approach has to be used.

The official docs have a Scala and a Java version, but the Scala version has much more content. There are many very useful blogs, but it’s all in Scala. Most of the books on Akka are also in Scala, and the same goes for open-source projects.

For somebody coming from the Java world, this can be frustrating.


Java is not a language for writing short programs. So part of the fault is on Java, but there could still be better ways to write actors. For example, say I create an actor that has to process 3 different messages.
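A rough sketch of what this looks like in Java with Akka 2.2/2.3 (the message types and method names are hypothetical, just to illustrate the boilerplate):

```java
import akka.actor.UntypedActor;

// Hypothetical actor handling three hypothetical messages
public class ABCActor extends UntypedActor {

    @Override
    public void onReceive(Object message) throws Exception {
        // filter each message by type and forward it to the right method
        if (message instanceof ProcessMsg) {
            process((ProcessMsg) message);
        } else if (message instanceof ValidateMsg) {
            validate((ValidateMsg) message);
        } else if (message instanceof CleanupMsg) {
            cleanup((CleanupMsg) message);
        } else {
            unhandled(message);
        }
    }

    private void process(ProcessMsg msg) { /* ... */ }
    private void validate(ValidateMsg msg) { /* ... */ }
    private void cleanup(CleanupMsg msg) { /* ... */ }
}
```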

The code quickly becomes long (and this is a very simple example), because it’s hard to filter messages and forward them to the appropriate methods. In Scala, things are much shorter.
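For comparison, a sketch of such an actor in Scala (the message names are again hypothetical). Pattern matching in receive does the filtering and dispatching for us:

```scala
import akka.actor.Actor

// hypothetical messages
case class ProcessMsg(data: String)
case class ValidateMsg(data: String)
case class CleanupMsg(id: Long)

class ABCActor extends Actor {
  def receive = {
    case ProcessMsg(data)  => process(data)
    case ValidateMsg(data) => validate(data)
    case CleanupMsg(id)    => cleanup(id)
  }

  def process(data: String) = { /* ... */ }
  def validate(data: String) = { /* ... */ }
  def cleanup(id: Long) = { /* ... */ }
}
```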

For me it’s important to fit the whole logic onto one screen, so I can see all the code without scrolling. It’s easier to load the logic into my brain. This is not possible with Java code (of course, I could split it up into extra files, but then instead of scrolling I would be clicking).

Sending a message
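In Scala, sending a message is a one-liner via the ! (tell) operator. A sketch (the actor reference and message are hypothetical):

```scala
// fire-and-forget send; the implicit sender is the current actor
abcActor ! ProcessMsg("data")
```

The Java equivalent is abcActor.tell(new ProcessMsg("data"), getSelf()); — not terrible on its own, but the noise adds up.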

Scheduling a message
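Scheduling is similarly compact in Scala. A sketch (the Tick message and the delays are made up):

```scala
import scala.concurrent.duration._

// inside an actor: import an ExecutionContext for the scheduler
import context.dispatcher

// deliver Tick to ourselves after 5 seconds, then every 30 seconds
context.system.scheduler.schedule(5.seconds, 30.seconds, self, Tick)
```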

There are tons of other examples. What I’m trying to say is that it’s much easier to write actor logic in Scala compared to Java. Much easier.


When writing unit tests, you should always test small parts of the code. That means that the tests should also be small and short. Because Scala is a very descriptive language, you can easily define a test. For example, let’s test our ABC actor and see if it replies.
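A sketch of such a test using akka-testkit and ScalaTest (the actor, its props factory and the messages are hypothetical):

```scala
import akka.actor.ActorSystem
import akka.testkit.{ ImplicitSender, TestKit }
import org.scalatest.{ Matchers, WordSpecLike }

class ABCActorSpec extends TestKit(ActorSystem("test"))
    with ImplicitSender with WordSpecLike with Matchers {

  "An ABC actor" should {
    "reply when it receives a ProcessMsg" in {
      val actor = system.actorOf(ABCActor.props)
      actor ! ProcessMsg("data")
      expectMsg(Processed("data")) // hypothetical reply message
    }
  }
}
```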

Again, much shorter, much easier to understand, and mostly less room for mistakes. We can also test a specific method of an actor.

Or we can mock certain actor methods. First we need to create a trait (similar to a Java interface), so we can mock its methods.

Now when we create a test, we can inject a different ABCActorBase.
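A minimal sketch of the idea (names are made up): the method to be mocked lives in the trait, the real actor mixes it in, and the test overrides it.

```scala
trait ABCActorBase {
  def process(data: String): String
}

class ABCActor extends Actor with ABCActorBase {
  def receive = { case ProcessMsg(data) => sender ! process(data) }
  def process(data: String) = data.toUpperCase // real implementation
}

// in the test: inject a stubbed implementation
val actor = TestActorRef(new ABCActor {
  override def process(data: String) = "stubbed"
})
```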

Scala has ScalaTest, which in my opinion is one of the best testing libraries. It offers very descriptive test results and error reporting. It supports multiple styles of testing, and some of them look really awesome.


Even though I didn’t like Scala at first, I had to admit that it’s actually a great language. Not my favorite, but it’s OK. Since I’m developing an application that runs on the JVM, it would be a shame not to use it where it can solve certain tasks. It has many, many more great features that I didn’t showcase, so feel free to check them out.

The final result is that I rewrote all my actors in Scala. They are much shorter and it’s easier to understand what they are doing. At the same time, a great part of the tests are now in Scala, and I have a feeling that the code is much more stable and less error-prone. The new actors have been in production for a few weeks now and there were zero problems. Happy coding.

How to install KairosDB time series database?

In my previous post, I described why I switched from OpenTSDB (another time series database) to KairosDB. In this post, I will show how to install and run KairosDB.


To run KairosDB we actually just need KairosDB itself (if we ignore Ubuntu/Debian/something similar and Java). How is that possible? Well, KairosDB supports two datastores: H2 and Cassandra. H2 is an in-memory database. It’s easy to set up and clean up, and it’s mostly used for development. Don’t use it in production; it will work, but it will be very, very slow.

For this tutorial we will use Cassandra as the datastore. To install Cassandra, you can follow the official tutorial; here we will install it via apt-get.

You will want to replace 21x with the series you want to use: 20x for the 2.0.x series, 12x for the 1.2.x series, etc. You will not automatically get major version updates unless you change the series, but that is a feature.
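The source line from the official Cassandra Debian install instructions of that era looked like this (21x picks the 2.1.x series):

```shell
echo "deb http://www.apache.org/dist/cassandra/debian 21x main" | \
    sudo tee /etc/apt/sources.list.d/cassandra.sources.list
```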

We also need to add the public keys to be able to access the Debian packages.
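Something along these lines (the actual key IDs are listed in the official Cassandra install docs; KEY_ID below is a placeholder):

```shell
gpg --keyserver pgp.mit.edu --recv-keys <KEY_ID>
gpg --export --armor <KEY_ID> | sudo apt-key add -
```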

Now we are ready to install it.
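With the repository and keys in place:

```shell
sudo apt-get update
sudo apt-get install cassandra
```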

This will install the Cassandra database. A few things you should know: the configuration files are located in /etc/cassandra, and the start-up options (heap size, etc.) can be configured in /etc/default/cassandra. Now that Cassandra is installed, run it.
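On Ubuntu/Debian the package installs an init script, so starting it looks like this:

```shell
sudo service cassandra start

# optional: check that the node came up
nodetool status
```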

Another requirement is the Oracle Java JDK instead of OpenJDK. You must install version 7 or 8 (8 is recommended, I’m using 7). Again, we will install it with apt-get.
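One common way to get the Oracle JDK via apt-get on Ubuntu at the time was the WebUpd8 PPA (this is an assumption about the exact method; any Oracle JDK 7/8 install works):

```shell
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer   # or oracle-java8-installer
```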


KairosDB uses Thrift to communicate with Cassandra. When I installed Cassandra, it wasn’t enabled by default, so I had to enable it first. There are many ways to do this, and if you hate fiddling with config files, you can install OpsCenter. It’s a really great tool for monitoring your cluster, with a simple interface where you can access your nodes and change their configuration to enable Thrift. To change it in the config file, set start_rpc to true in /etc/cassandra/cassandra.yaml.
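A quick way to flip the setting from the shell (assuming the config still has the default start_rpc: false line):

```shell
sudo sed -i 's/^start_rpc: false/start_rpc: true/' /etc/cassandra/cassandra.yaml
sudo service cassandra restart
```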

Installing KairosDB

We can install KairosDB in a few ways.

a) Building from the source

1. Clone the git repository
2. Make sure that JAVA_HOME is set to your Java install.
3. Compile the code
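The steps above roughly translate to the following (the exact build command depends on the KairosDB version, so check the project README):

```shell
git clone https://github.com/kairosdb/kairosdb.git
cd kairosdb
export JAVA_HOME=/usr/lib/jvm/java-7-oracle   # adjust to your JDK path
# compile per the project README (KairosDB ships its own build tooling)
```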

b) Installing via .deb package (recommended)

The current stable version is 1.1.1 (it was 0.9.4 when this post was first written). Make sure you download the latest version.
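Installing the package itself is a single dpkg call (the filename below is illustrative; use whatever version you downloaded):

```shell
sudo dpkg -i kairosdb_1.1.1-1_all.deb
```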

Setting Cassandra as a datastore

As mentioned before, KairosDB uses the H2 database as its datastore by default. We need to change it to Cassandra.

a) If you are running from source, copy the properties file from the src/main/resources/ folder to the KairosDB root folder and change it there.
b) If you installed the package, change the file in /opt/kairosdb/conf/

In the file, comment out the line where H2 is set as the datastore and uncomment the Cassandra module, so the file looks like this.
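The relevant lines should end up looking roughly like this (the module class names are taken from a stock kairosdb.properties and may differ slightly between versions):

```properties
# H2 module disabled...
#kairosdb.service.datastore=org.kairosdb.datastore.h2.H2Module
# ...Cassandra module enabled
kairosdb.service.datastore=org.kairosdb.datastore.cassandra.CassandraModule
```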

You can also change some other settings to tune it, but for now just save the file and you are ready to go.

Test if everything works

Make sure your Cassandra service is running. Now let’s run KairosDB.

a) Running from source

b) Or if installed
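The two variants look roughly like this (paths may differ depending on where you built or installed it):

```shell
# a) from source, using the bundled startup script
./bin/kairosdb.sh run

# b) if installed from the .deb package
sudo service kairosdb start
```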

Go to http://localhost:8080 to check that everything works. If you can see the KairosDB dashboard, then congratulations, you can now use KairosDB.

What’s next

In the next tutorial we will see how to save, query and delete datapoints with the web interface, the HTTP API and the Telnet API.

Passing collections between Akka actors

Akka actors are great when you are looking for scalable, real-time transaction processing (yes, this is the actual definition, using some big words). They’re also really great for background processing, because you can create many instances without actually worrying about concurrency and parallelism.

The code

We have a simple application for processing an uploaded file. We accept the file, parse it (a simple txt file), calculate the values and save them in a database. We could have everything in one actor, but it’s much better to split it into multiple actors and create a pipeline. Each actor does exactly one thing. We get much cleaner code, and at the same time testing is much easier.

We have (for this demonstration) 2 actors. One reads the file into a List and sends it to the other actor.

The second actor gets the List of numbers and calculates their sum.
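A sketch of the two actors in Java with Akka 2.2/2.3 (class names and message shapes are illustrative; file parsing and persistence are stubbed out):

```java
// FileReaderActor.java
import java.util.ArrayList;
import java.util.List;
import akka.actor.ActorRef;
import akka.actor.UntypedActor;

public class FileReaderActor extends UntypedActor {
    private final ActorRef calculator;

    public FileReaderActor(ActorRef calculator) {
        this.calculator = calculator;
    }

    @Override
    public void onReceive(Object message) {
        if (message instanceof String) {              // path of the uploaded file
            List<Long> numbers = new ArrayList<>();
            // ... read the txt file and fill the list ...
            calculator.tell(numbers, getSelf());      // passes a mutable List!
        } else {
            unhandled(message);
        }
    }
}

// CalculatorActor.java
public class CalculatorActor extends UntypedActor {
    @Override
    public void onReceive(Object message) {
        if (message instanceof List) {
            long sum = 0;
            for (Object n : (List<?>) message) {
                sum += (Long) n;
            }
            // ... save the sum to the database ...
        } else {
            unhandled(message);
        }
    }
}
```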

If we used this code, we would quickly discover problems. When I tested it with VisualVM, I quickly discovered a memory leak around the List of numbers. How do we solve it?

Immutable collections

When passing objects between actors we need to follow a few guidelines. If we break them, we can face memory leaks and, consequently, app crashes. One of the guidelines is to use immutable collections: whatever we pass between actors has to be immutable. What are the advantages of immutable objects?

  1. They are thread-safe, so they can be used by many threads with no risk of race conditions.
  2. They don’t need to support mutation, and can make time and space savings with that assumption.
  3. All immutable collection implementations are more memory-efficient than their mutable siblings.
  4. They can be used as constants, with the expectation that they will remain fixed.

There are many implementations of immutable collections, and one of the best is in Guava.

Improved code

We use Guava’s ImmutableList to create the list of numbers that is passed between actors.
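A sketch of the fix in the sending actor (assuming Guava is on the classpath; parseFile and the variable names are illustrative):

```java
import com.google.common.collect.ImmutableList;

// build the list as before, then pass an immutable copy to the next actor
List<Long> numbers = parseFile(file);                 // parseFile is hypothetical
calculator.tell(ImmutableList.copyOf(numbers), getSelf());
```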

Rerunning VisualVM confirmed that the memory leak was resolved. Great.

Scripts to start and stop Play Framework application

Play Framework (I’m talking about the 2.x versions) supports multiple ways of deployment. If you check the docs, you will notice there are a few pages of instructions just on how to deploy your Play Framework application. For me, using stage seems to be the best and most stable way.


When deploying an application, I always run the clean and stage commands. The first one removes compiled and cached files. The second one compiles the application and creates an executable. Everything will be located in ${application-home}/target/universal/stage/. There you have a bin folder with a simple script to run the app. We will create start/stop scripts to make our work a little bit easier.

Scripts start/stop

To run the application, we use nohup. nohup keeps the application running even when we close the terminal or log out from our development machine. As everyone says, it’s not a perfect solution, but it works. We run the command in the stage folder and add some additional parameters.

  1. -J-server enables us to pass additional JVM-related settings. In our case, we define Xms and Xmx for better memory management.
  2. We have application.conf for development and application-prod.conf for the production with different settings (database, secret key, API logins, etc).
  3. With -Dhttp.port we define the port of our application. We use Apache to map port 80 to port 9000. It’s much safer and easier like this, because later we can put a load balancer in the middle to divide the load between multiple application instances.

When running under nohup, a nohup.out file is created in which everything the application prints is logged. Don’t confuse it with the application logs; the application will still log everything based on the logger.xml configuration, independently of nohup. To prevent the nohup.out file, we redirect everything to /dev/null and basically just ignore it.

At the end, we output the pid into a RUNNING_PID file. Be careful: Play Framework automatically creates its own RUNNING_PID file in the stage folder. We add ours as extra information, and the file is removed after stopping the application.

When we want to stop the application, we need its pid. We read it from the RUNNING_PID file in the stage folder and pass it to the kill command. For safety reasons we wait 5 seconds just to be sure that the application has stopped; we could have a running job which needs a few more seconds to complete or save its state.
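Putting it all together, the two scripts might look like this (the application name, paths and memory settings are examples; adjust them to your app):

```shell
#!/bin/bash
# start.sh -- run the staged app detached, silence nohup.out, remember the pid
cd /path/to/app/target/universal/stage
nohup bin/myapp -J-server -J-Xms512M -J-Xmx1024M \
    -Dconfig.resource=application-prod.conf \
    -Dhttp.port=9000 > /dev/null 2>&1 &
echo $! > RUNNING_PID

# ---------------------------------------------------------------

#!/bin/bash
# stop.sh -- kill the app by pid and give it a few seconds to shut down
cd /path/to/app/target/universal/stage
kill $(cat RUNNING_PID)
sleep 5
rm -f RUNNING_PID
```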

Extra tip

We can also pass additional parameters to our stage command. One of them is javaagent. If we are using a remote monitoring solution like New Relic, we can include its jar to send application data.

To do so, we pass -J-javaagent:/path/to/newrelic.jar along with all the other parameters. Be sure to include the correct path, because otherwise the application will fail to start.
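With the agent included, the start command grows to something like this (paths are examples):

```shell
nohup bin/myapp -J-server -J-javaagent:/path/to/newrelic.jar \
    -Dconfig.resource=application-prod.conf \
    -Dhttp.port=9000 > /dev/null 2>&1 &
```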

Handle file uploads in Play Framework 2.x [Java]

Most applications have the ability to upload something. Handling uploaded files should not be hard: we need to check if the user uploaded a file, check that it’s the right type, and store it. As a matter of fact, this is really easy with Play Framework.

An example

We have a form to upload a file.

This form will take one file and post it to the /upload path. To be able to upload a file, we need to set the enctype to multipart/form-data. This defines how the POST request will be constructed and how the file will be sent.
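A minimal version of such a form (the field name document is an arbitrary choice):

```html
<form action="/upload" method="POST" enctype="multipart/form-data">
    <input type="file" name="document">
    <input type="submit" value="Upload">
</form>
```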

The next thing is to create a controller and a method. We will only allow uploading of PDF files.

Very simple, right? I highly recommend you move this code to somewhere else (for example to some service). Good practice is to keep controllers slim.

First we check the type of the request to see if it’s multipart/form-data; if the body is null, then something is wrong. The same goes for the file: if no file is present, we need to report an error. Beware, it’s easy to fake the content type. Checking whether the file really is a PDF can sometimes be more difficult; the best way is to use an additional library – one of them is Apache Tika.
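A sketch of such a controller in Play 2.2/2.3 Java style (the controller name and the form field name are assumptions):

```java
import play.mvc.Controller;
import play.mvc.Http.MultipartFormData;
import play.mvc.Http.MultipartFormData.FilePart;
import play.mvc.Result;

public class UploadController extends Controller {

    public static Result upload() {
        // null if the request is not multipart/form-data
        MultipartFormData body = request().body().asMultipartFormData();
        if (body == null) {
            return badRequest("Expecting multipart/form-data");
        }
        FilePart document = body.getFile("document"); // field name from the form
        if (document == null) {
            return badRequest("Missing file");
        }
        // easy to fake -- see the note about Apache Tika
        if (!"application/pdf".equals(document.getContentType())) {
            return badRequest("Only PDF files are allowed");
        }
        // document.getFile() is the temporary java.io.File -- store it somewhere
        return ok("File uploaded: " + document.getFilename());
    }
}
```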

Handling multiple files

We can also handle multiple files at once. All we have to do is loop through all the posted files.
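A sketch, reusing the same multipart body (the per-file checks are abbreviated):

```java
MultipartFormData body = request().body().asMultipartFormData();
for (FilePart part : body.getFiles()) {
    // same checks as for a single file
    if ("application/pdf".equals(part.getContentType())) {
        // store part.getFile() somewhere
    }
}
```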

Extra tip

When we upload a file, it will first be stored in the /tmp folder (if we are on a Linux server). Then we just need to move or copy the file to the right folder.

The recommended way is to use the highly tested Apache Commons IO library and its methods FileUtils.copyFile(source, destination) or FileUtils.moveFile(source, destination).

Why I think Spring Data repositories are awesome – Part 2

In the first part, we covered some very basic things we can do with Spring Data repositories. In this part, we will learn how to make more complex queries – finding data by an entity field or making a count. You will be amazed how easy it is with Spring Data.


We will use the Post entity from the previous part, update it, and add an entity called User.

As you can see, we have defined a relation between our entities. Every user can have multiple posts and each post has exactly one user.
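A sketch of the two entities (the fields beyond the relation are assumptions based on what this part uses):

```java
import java.util.List;
import javax.persistence.*;

@Entity
public class Post {
    @Id @GeneratedValue
    private Long id;
    private String url;
    private boolean isActive;

    @ManyToOne
    private User user;         // each post belongs to one user

    // getters and setters omitted
}

@Entity
public class User {
    @Id @GeneratedValue
    private Long id;
    private String username;

    @OneToMany(mappedBy = "user")
    private List<Post> posts;  // every user can have multiple posts

    // getters and setters omitted
}
```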


Let’s imagine we have a situation where we want to create a simple blogging system. We have to be able to:

1. find all active posts
2. find a post by an url
3. find all posts by a user
4. count all active posts

1. Find all active posts

We will use the PostRepository we defined in the first part. By simply adding a method to the repository, Spring Data will generate the right code and map everything to SQL. Because CrudRepository already has a few basic methods prebuilt, we don’t need to add a method to find all posts; we can use the findAll method.

But to find all active posts we have to define our own method. Actually, it’s very simple.
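The added method presumably looked like this – a single line in the repository:

```java
public interface PostRepository extends CrudRepository<Post, Long> {

    // SELECT ... FROM post WHERE is_active = true
    List<Post> findByIsActiveTrue();
}
```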

This is it – the whole magic in one short line of code. But how does it actually work? Spring Data builds the query based on the method name and return type. It splits the method name, in our case into find, By, IsActive, True. The first part says to make a select query, the second that we want to filter, the third gives the field name and the fourth the value of the field. But be careful: putting the field value directly in the method name only works for booleans. For other field types, you need to pass the value as a method argument. One great thing is that we can also combine multiple fields.

2. Find post by an url

Continuing the thought from the previous section, we can have methods built from different fields. For example, let’s load a post by its url. We have to update our repository.
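The added method would look something like this:

```java
// added to PostRepository; url is assumed to be a unique field
Post findByUrl(String url);
```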

Again, if we look at the method name, we see that we are finding a post by its url. Because the url is a String, we have to pass the value as a method argument. Because the return type is Post, it will return one post; if the query returns multiple rows, an exception will be thrown. When you expect only one record, be careful to query by a unique field.

But actually, our blog system has to return a post that matches the url and is active. We could load the post by url and then check whether it is active or not. Instead, we can do this in one query.
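A sketch of the combined method:

```java
// added to PostRepository: WHERE url = ? AND is_active = true
Post findByUrlAndIsActiveTrue(String url);
```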

We are now querying the database by 2 fields: url and isActive. When we use multiple fields in a method name, all of them are joined by AND. We cannot use OR this way; for that, we have to use a different approach (we will explain it in another tutorial).

3. Find all posts by a user

Every user has a username. Our task is to find all posts by a user or, more specifically, find all posts by a username. Writing the method name works the same way, we just need to include the relation name. Again, we update our PostRepository.
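A sketch of the method, where User is the relation field on Post and Username the field on the User entity:

```java
// added to PostRepository
List<Post> findByUserUsername(String username);
```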

The method name has to include the relation name. Whatever we named the field in the entity, we have to use the same name in the method; if we rename it in the entity, we also have to change the method name. It may look complicated and not really robust to changes, but there is no other way for Spring Data to know how to correctly build the query. Of course, as we mentioned a few times already, in the next part we will learn how to use custom queries to help Spring Data build the native SQL query.

Once we include the relation in the method name, everything else is the same. We again filter by field name, and we can also use multiple fields. But remember, for each relation field we have to prepend the field name with the relation name.

4. Count all active posts

For the last task, we have to count all active posts. CrudRepository already has a method called count(), but it counts all posts. We could use the findByIsActiveTrue() method to fetch all active posts into a populated collection; all we would have to do then is call .size() and there, we have the count of all active posts.

Don’t do that. Sure, it works, and it might even survive in production with a small number of posts, but with a larger dataset it’s not good practice. We would fetch all the records and populate a collection just to call .size() and get one number. That’s too big an overhead.

Instead, we will use count, which maps to SQL COUNT. It’s much, much faster and consumes far fewer resources. Before, we were finding records, so we started every method name with find. If we want to count, we have to do what? You are right: start the method name with count. Let’s update our PostRepository one last time.
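The counting method would look like this:

```java
// added to PostRepository: SELECT COUNT(*) ... WHERE is_active = true
Long countByIsActiveTrue();
```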

There are a few differences compared to the other method names. The first is the return type: it has to be a Long, so a row count can be bound to it (an Integer can be too small). As mentioned, we start the method with count and then define the field filters. It’s that simple.

Further reading

You can read all about how to correctly build method names in the official docs. Everything is explained really nicely, and some additional keywords are demonstrated as well. I strongly recommend it.

Part 3 – What more will we learn?

In the next part, we will see how to make even more complex queries using the @Query annotation. The @Query annotation lets us write HQL, which is very similar to SQL but with compile-time checking. Another thing we will learn is how to extend a repository and use the PersistenceManager to build super complex queries. We will create custom methods and insert them into repositories. It’s a really cool and advanced feature, so stay tuned.

Why I think Spring Data repositories are awesome – Part 1

I have been using Play Framework since version 1.2. Currently I’m using 2.2, which comes with the simple Ebean ORM, but after reading a few comments I realized it won’t be good enough for more complex projects. There is nothing worse than realizing in the middle of a project that some bug is preventing your app from working.

I looked around and noticed Spring Data. At first I didn’t put much effort into it, but after checking the docs I realized it’s an awesome solution. The only problem is that it doesn’t work out of the box with Play Framework. There are a few example projects on GitHub, but for me only one of them worked.

I strongly advise you to check out its code, because it shows how to combine Play Framework with Spring Data. At the same time it shows how dependency injection works, how you define repositories and how to use them in controllers.

How it works

The basic logic is that Spring Data offers base repositories for basic operations like saving, finding, deleting, etc. Let’s imagine we have an entity Post.
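A minimal sketch of such an entity (the fields are illustrative):

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

@Entity
public class Post {
    @Id @GeneratedValue
    private Long id;
    private String title;
    private String content;

    // getters and setters omitted
}
```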

I’m not going into details on how to create an entity; there are many great tutorials around. For the entity, we create a repository.
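The repository is just an annotated interface:

```java
import org.springframework.data.repository.CrudRepository;
import org.springframework.stereotype.Repository;

@Repository
public interface PostRepository extends CrudRepository<Post, Long> {
}
```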

1. We extended CrudRepository. CRUD stands for Create, Read, Update and Delete, which means that CrudRepository probably has some methods for creating, reading, updating and deleting posts. If we check the source of CrudRepository, we see it enables us to do exactly that.

2. We added the @Repository annotation so Spring knows how to find it and inject it correctly.

3. We extended CrudRepository<Post, Long>, where Post is our entity and Long is the type of the primary key. Based on these 2 type parameters, Spring knows how to correctly build the queries.

Let’s take it for a spin

We put the example in one method for the sake of simplicity. The point is that we can simply retrieve, insert, update and delete the Post entity. Spring handles building the queries, converting rows to objects and everything else. Again, check the example on GitHub I mentioned before to see how to correctly define the controller, what @Autowired does and why the index() method is not static any more.
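A sketch of such a method (repository injection and controller wiring as in the linked example project; findOne is the CrudRepository API of that Spring Data era):

```java
// inside a controller with an @Autowired PostRepository postRepository
public Result index() {
    Post post = new Post();
    post.setTitle("Hello");
    postRepository.save(post);                           // insert

    Post loaded = postRepository.findOne(post.getId());  // read
    loaded.setTitle("Hello again");
    postRepository.save(loaded);                         // update

    postRepository.delete(loaded);                       // delete
    return ok("done");
}
```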

I have been using Spring Data for some time and there has not been a single situation I couldn’t solve. I think the guys at Spring did a really good job. The simplicity on one side and the ability to solve even the most complex scenarios on the other make it really awesome.

More to come

In the next part we are going to check how to make more complex queries just by defining method names. We will also check how to include paging and sorting, and how to run really, really, really complex queries.