Running Angular 6 and up tests (unit and e2e) on CI
Angular is constantly evolving and bringing new features. Angular CLI has also been updated (currently at version 6.x), and the latest version changed a few of the parameters used when running tests.
Vue.js is a great framework: it has a very small learning curve, it's easy to set up, and it offers a really nice transition from AngularJS. With its component approach you can split the application into small parts and reuse certain components around the application.
The problem comes when your application grows and you need to structure it better. This is the structure I'm using for my large-scale Vue.js applications.
Flask is an amazing framework for building web applications in Python. Its main strengths are that it's easy to use, has a very small learning curve, includes everything you need (or can be easily extended), and simply stays out of your way during development.
It offers a few simple ways to return data in the format we want: HTML, JSON, XML and others.
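For example, a minimal sketch of a view returning JSON (the /users route and the data are made up for illustration):

from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/users')
def users():
    # jsonify serializes the dict and sets the JSON Content-Type header
    return jsonify({'users': ['alice', 'bob']})

if __name__ == '__main__':
    app.run()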
One thing I missed when switching from Java to Python was multiple constructors. Python does not support them (directly), but there are many other approaches that work very similarly (maybe even better).
Let's say we are building a client to query a remote service (some aggregation service). We want to pass it an aggregator.
class Aggregator(object):
    def __init__(self, value, unit, time_zone=None):
        self.value = value
        self.unit = unit
        self.time_zone = time_zone
To make the code more fluent and more robust for integration into other solutions, we want multiple ways to create an aggregator.
query.aggregator(Aggregator(5, 'min', time_zone='Europe/Ljubljana'))
query.aggregator({'value': 5, 'unit': 'min', 'time_zone': 'Europe/Ljubljana'})
query.aggregator((5, 'min'))
query.aggregator('5min')
The query.aggregator method will create a new instance of Aggregator and pass it along with the request.
Python has the great feature of *args and **kwargs. We can create a constructor
def __init__(self, value=None, unit=None, time_zone=None, *args, **kwargs):
    ...
and then check and parse args and kwargs inside it. This solution works, but it has many problems:
Aggregator(5, args=('min', ), kwargs={'time_zone': 'Europe/Ljubljana'})
This is absolutely weird and hard to read.
Python has an option to decorate a method with @classmethod. We can define custom methods that work as multiple constructors. For example, we can create a method from_arguments.
class Aggregator(object):
    def __init__(self, value, unit, time_zone=None):
        # TODO validate
        self.value = value
        self.unit = unit
        self.time_zone = time_zone

    @classmethod
    def from_arguments(cls, args):
        if isinstance(args, str):
            # 5min
            value, unit = parse_time(args)
            return cls(value, unit)
        elif isinstance(args, (list, tuple)):
            # (5, 'min')
            value, unit = args
            return cls(value, unit)
        elif isinstance(args, dict):
            # {'value': 5, 'unit': 'min'}
            return cls(**args)
        elif isinstance(args, Aggregator):
            return args
        else:
            raise ValueError("Invalid args, should be str, list, tuple, dict or instance of Aggregator.")
We use it as Aggregator.from_arguments(args). The validation of the parameters (e.g. whether value is an int) is done in the constructor.
The from_arguments method just parses the arguments and creates a new instance of the Aggregator. We could also add validation there (whether the list has at least 2 items, whether the str is in the correct format, whether the dict has all the required elements, …).
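To tie it together, here is a minimal sketch of how a query class could delegate to from_arguments (the Query class and its aggregators attribute are hypothetical, just for illustration):

class Query(object):
    def __init__(self):
        self.aggregators = []

    def aggregator(self, args):
        # accept str, tuple/list, dict or Aggregator and normalize it
        self.aggregators.append(Aggregator.from_arguments(args))
        return self  # allow chaining

This keeps all the parsing in one place, so the rest of the client never has to care which form the caller used.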
I started a Django project that enables other services to interact with it over an API. Of course, one of the best solutions for building an API in Python is Django REST Framework. It's a great project with a large community that got funded on Kickstarter. How cool is that?
My project/service offers, among other things, access to and creation of companies and subscriptions. Each company can have multiple subscriptions: a one-to-many relation. I quickly created the models Company and Subscription.
import uuid

from django.db import models


class Company(models.Model):
    id = models.UUIDField(primary_key=True, editable=True)
    name = models.CharField(max_length=255)
    created_at = models.DateTimeField(auto_now_add=True)

    def __unicode__(self):
        return self.name


class Subscription(models.Model):
    id = models.UUIDField(primary_key=True, editable=False, default=uuid.uuid4)
    company = models.ForeignKey(Company)
    price = models.DecimalField(decimal_places=2, max_digits=10)
One thing to notice here is that I use UUIDs. The reason lies in the fact that some other services also contain company data. Those services will create companies, as they have all the required data (id, name). With this I'm able to avoid sync problems.
For the Subscription model, the UUID is generated randomly (uuid.uuid4).
Django REST Framework has great docs. If you follow the quickstart, you can set up a working API in a few minutes.
In my case, I first created the serializers.
from rest_framework import serializers

from models import Company, Subscription


class CompanySerializer(serializers.ModelSerializer):
    id = serializers.UUIDField(required=True, read_only=False)

    class Meta:
        model = Company
        fields = ('id', 'name', 'created_at')


class SubscriptionSerializer(serializers.ModelSerializer):
    company = CompanySerializer()

    class Meta:
        model = Subscription
        fields = ('id', 'company', 'price')
I had to explicitly define the id field on the company serializer. By default IDs are read-only (normally they are generated at the database level), but in my case I pass the id when creating the company.
I also created viewsets.py.
from rest_framework import viewsets

from models import Company, Subscription
from serializers import CompanySerializer, SubscriptionSerializer


class SubscriptionViewSet(viewsets.ModelViewSet):
    serializer_class = SubscriptionSerializer
    queryset = Subscription.objects.all()


class CompanyViewSet(viewsets.ModelViewSet):
    serializer_class = CompanySerializer
    queryset = Company.objects.all()
As the last step, you have to register the viewsets with the API router.
from django.conf.urls import include, url
from django.contrib import admin

from rest_framework import routers

from viewsets import CompanyViewSet, SubscriptionViewSet

router = routers.DefaultRouter()
router.register(r'companies', CompanyViewSet)
router.register(r'subscriptions', SubscriptionViewSet)

urlpatterns = [
    url(r'^admin/', include(admin.site.urls)),
    url(r'^api/', include(router.urls)),
]
Now when you access /api/companies/ or /api/subscriptions/ you should get a response (for now probably only an empty array).
This part is very simple, and there are tons of examples of how to do it.
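For a quick smoke test you can use the requests library (assuming the development server runs on localhost:8000):

import requests

r = requests.get('http://localhost:8000/api/companies/')
print(r.status_code, r.json())  # expect 200 and [] for now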
To create a company, I execute a POST JSON request (I’m using Postman) to /api/companies/ with the following payload.
{
    "id": "6230fbeb-bffd-4e37-b0e8-c545f4a83a61",
    "name": "My Test Company"
}
and in return I get
{
    "id": "6230fbeb-bffd-4e37-b0e8-c545f4a83a61",
    "name": "My Test Company",
    "created_at": "2015-09-22T10:56:58.876908Z"
}
Now I have a company in the database. Let's create a subscription. Again, I execute a POST JSON request to /api/subscriptions/ with the payload
{
    "price": 80.0,
    "company": {
        "id": "6230fbeb-bffd-4e37-b0e8-c545f4a83a61"
    }
}
and I get an error that company name is required. What?
Before I explain what the previous error means and how I solved it, I first have to explain what I want.
Other services that talk with my service use different HTTP clients. One of them is Netflix Feign. With it you can simply create HTTP clients that map the request or response to DTOs. For example, they have a SubscriptionDTO defined as
import java.util.Date;
import java.util.UUID;

public class SubscriptionDTO {
    public CompanyDTO company;
    public Double price;
}
and CompanyDTO
import java.util.UUID;

public class CompanyDTO {
    public UUID id;
    public String name;
}
So the same DTO is used for the request and the response. I want to pass the same DTO with all the required data when creating the subscription; when the response is returned, it populates the existing SubscriptionDTO. This is important because I want to eliminate the confusion of using multiple DTOs for the same entity (Subscription).
Back to the previous error. When I retrieve subscriptions, I also want company information included in the subscription list.
{
    "id": "1ffc2a43-6ca6-4ba7-a551-adfccee86427",
    "price": 0.0,
    "company": {
        "id": "6230fbeb-bffd-4e37-b0e8-c545f4a83a61",
        "name": "My Test Company"
    }
}
I accomplished this by defining
company = CompanySerializer()
in my SubscriptionSerializer. If I didn't use this, then the response would be in the format
{
    "id": "1ffc2a43-6ca6-4ba7-a551-adfccee86427",
    "price": 0.0,
    "company": "6230fbeb-bffd-4e37-b0e8-c545f4a83a61"
}
But I don't want that; I want the full output. When I defined the company field, I didn't pass any arguments. By default this means that when I execute the POST, it will create the subscription and all its relations (the company). That is why I got the error that the company name is required: it wanted to create a new company, but the name was missing. That is not what I want.
I checked online and asked a few people. Most of them suggested passing the read_only=True argument when defining the company field: company = CompanySerializer(read_only=True). Now when I executed the POST, I got an error that the subscription.company_id database field should not be null. Once you define read_only on a field, its data is not passed to the method that creates the model (the subscription). Why?
There are many discussions around how to solve this.
a) https://groups.google.com/forum/#!topic/django-rest-framework/5twgbh427uQ
b) http://stackoverflow.com/questions/29950956/drf-simple-foreign-key-assignment-with-nested-serializers
c) http://stackoverflow.com/questions/22616973/django-rest-framework-use-different-serializers-in-the-same-modelviewset
Some suggest different serializers, others suggest using two fields (one for reading and another for create/update). But all of these seem hackish and impose a lot of extra code. The author of DRF, Tom Christie, suggested that I define the CompanySerializer fields (except id) as read-only. This kinda solved the problem, but if the company gets additional fields, I need to override those as well, which means extra code. At the same time, I want to preserve the /api/companies/ endpoint for creating/updating companies; if I set the fields as read-only, I wouldn't be able to create companies without an additional CompanySerializer.
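For reference, a minimal sketch of that suggestion using DRF's read_only_fields Meta option might look like this (the NestedCompanySerializer name is mine, and note that this serializer could no longer create companies):

class NestedCompanySerializer(serializers.ModelSerializer):
    id = serializers.UUIDField(required=True, read_only=False)

    class Meta:
        model = Company
        fields = ('id', 'name', 'created_at')
        # everything except id is read-only, so a nested POST
        # only needs (and only accepts) the id
        read_only_fields = ('name', 'created_at')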
I tried to override the subscription create methods, but without success. If I defined read_only=True on the company field, then no company information was passed to validated_data (the data that is later used to create the subscription). If I defined read_only=False, I was always getting the "name is required" error.
I wanted a simple and working solution.
I started to look for a solution that was simple and enabled me to make the requests I want. Digging through the code, I noticed many field-creation methods that I could override. In the end, I had to modify the validation method.
from rest_framework import serializers
from rest_framework.fields import empty


class RelationModelSerializer(serializers.ModelSerializer):

    def __init__(self, instance=None, data=empty, **kwargs):
        self.is_relation = kwargs.pop('is_relation', False)
        super(RelationModelSerializer, self).__init__(instance, data, **kwargs)

    def validate_empty_values(self, data):
        if self.is_relation:
            model = getattr(self.Meta, 'model')
            model_pk = model._meta.pk.name

            if not isinstance(data, dict):
                error_message = self.default_error_messages['invalid'].format(
                    datatype=type(data).__name__)
                raise serializers.ValidationError(error_message)

            if model_pk not in data:
                raise serializers.ValidationError({model_pk: model_pk + ' is required'})

            try:
                instance = model.objects.get(pk=data[model_pk])
                return True, instance
            except (model.DoesNotExist, ValueError):
                # unknown or malformed primary key
                raise serializers.ValidationError({model_pk: model_pk + ' is not valid'})

        return super(RelationModelSerializer, self).validate_empty_values(data)
I overrode validate_empty_values, where I check the relation. The idea is that I inspect the posted data: if the id (or primary key) of the related model is present, I validate that a record exists for that id and return it. If it doesn't exist or the data is invalid, I raise an error.
There is also an is_relation argument that you have to pass when creating the serializer. It is only used when the serializer is created as a nested serializer. The updated code is
from rest_framework import serializers

from models import Company, Subscription
# RelationModelSerializer as defined above


class CompanySerializer(RelationModelSerializer):
    id = serializers.UUIDField(required=True, read_only=False)

    class Meta:
        model = Company
        fields = ('id', 'name', 'created_at')


class SubscriptionSerializer(serializers.ModelSerializer):
    company = CompanySerializer(read_only=False, is_relation=True)

    class Meta:
        model = Subscription
        fields = ('id', 'company', 'price')
With this in place, I can now execute a POST JSON request with the payload
{
    "price": 80.0,
    "company": {
        "id": "6230fbeb-bffd-4e37-b0e8-c545f4a83a61"
    }
}
and get a response
{
    "id": "1ffc2a43-6ca6-4ba7-a551-adfccee86427",
    "price": 80.0,
    "company": {
        "id": "6230fbeb-bffd-4e37-b0e8-c545f4a83a61",
        "name": "My Test Company"
    }
}
The same DTO for request and response. At the same time, I didn't modify the /api/companies/ endpoint: companies get created/updated normally, with all the required validation working as it should.
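A quick end-to-end sketch of the whole flow with the requests library (the host and payloads are just the examples from above):

import requests

BASE = 'http://localhost:8000/api'

company = {'id': '6230fbeb-bffd-4e37-b0e8-c545f4a83a61', 'name': 'My Test Company'}
requests.post(BASE + '/companies/', json=company)

# the subscription references the company by id only
subscription = {'price': 80.0, 'company': {'id': company['id']}}
r = requests.post(BASE + '/subscriptions/', json=subscription)
print(r.json())  # the nested company comes back fully populated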
Having a tablet these days is a pretty common thing. I have been a (probably pretty rare) owner of a Samsung ATIV tablet for many years. Because it's too large for normal e-book reading, I decided to purchase a smaller 7 or 8 inch tablet.
After checking multiple reviews and comparing my top candidates, I decided to purchase the ASUS MeMO Pad 7 (ME176C). Based on reviews it was the best offer for its price (I paid around 120 EUR). The tablet looks great: it has a pretty good CPU, enough RAM and storage, an OK camera and a solid build.
After a few days of usage I started noticing a pretty annoying problem: it's updating all the time, adding new apps (that I don't actually need) and just taking up space. After uninstalling a few of them (TripAdvisor, Omlet Chat, and others) I was still left with 20 applications that Asus had installed on my tablet. 20! The biggest problem is that I will never use them, as they are totally useless.
If I go into a store and buy something, don't I own the thing I just bought? Why then, when I paid for the tablet, am I not able to do what I want with it?
Having had an Asus computer for over 5 years, I came to like the Asus brand and their products. But what they did with my tablet is just awful. Now when I want to install new apps, I get low-space errors. The whole system runs slowly, it crashes, and it has become totally useless. There are many forum topics about the same problem, and Asus is not doing anything about it. Instead of limiting the garbage they force onto tablets, they add extra support apps. But why?
The hardware is pretty great; why ruin it with software? Maybe it's Android's fault. I have to admit that I'm not a big fan of Android; I see it as a big mess waiting to collapse.
Right now I'm in the process of buying another, better tablet (hopefully Win10). But the problem can repeat itself with other manufacturers too. I miss the ability to get an empty tablet and install whatever system I like on it, just like with PCs. I know that won't be possible anytime soon. I also understand that manufacturers need to build communities, but what Asus is doing right now is achieving the opposite.
Would I buy another Asus tablet? NO. Would I recommend Asus tablet to someone else? No.
I have been using Play Framework since version 1.2. Lately, I do most of my work with versions 2.2/2.3. It supports both Scala and Java (you can literally mix the code files). Because I know Java much better than Scala (well, I don't know Scala at all), I do all my coding in Java.
Play Framework comes with Akka, which provides actors for processing data. I have about 20 different actors handling different scenarios. The actors talk with each other, so there are a lot of different messages; each message represents a specific action.
One thing that really frustrates me is that Akka and Java don't play together nicely. I mean, everything works, but the authors of Akka don't put a lot of effort into Java. I know, Scala is the great new language, and once you know it, why the hell would you still use Java? The problem is that examples and tutorials are mostly written in Scala, coding actors in Scala is much easier, and testing is just damn short and sweet.
When I face a problem and want to see how others solved it, or want to learn a new thing, I notice that most (about 95%) of Akka examples are in Scala. That means I need to somehow decode the examples and convert them to Java. This is not always possible: some Scala implementations cannot be directly converted to Java, and a different approach has to be used.
The official docs have Scala and Java versions, but the Scala version has much more content. There are many very useful blogs, like http://letitcrash.com/, but it's all in Scala. Most of the Akka books are also in Scala, and the same goes for open-source projects on github.com.
For somebody coming from Java world, this can be frustrating.
Java is not a language for writing short programs, so part of the fault lies with Java, but there could still be better ways to write actors. For example, here is an actor that processes 3 different messages.
import java.io.Serializable;

import akka.actor.UntypedActor;
import akka.event.Logging;
import akka.event.LoggingAdapter;

public class ABCActor extends UntypedActor {

    private final LoggingAdapter log = Logging.getLogger(getContext().system(), this);

    @Override
    public void onReceive(Object message) throws Exception {
        if (message instanceof MessageA) {
            MessageA messageA = (MessageA) message;
            this.process(messageA.foo);
        } else if (message instanceof MessageB) {
            MessageB messageB = (MessageB) message;
            this.process(messageB.foo);
        } else if (message instanceof MessageC) {
            MessageC messageC = (MessageC) message;
            this.process(messageC.foo);
        } else {
            unhandled(message);
        }
    }

    public void process(String foo) {
        String reply = generateReply();
        sender().tell(reply, self());
    }

    public String generateReply() {
        return "bar";
    }

    public static class MessageA implements Serializable {
        public final String foo;

        public MessageA(String foo) {
            this.foo = foo;
        }
    }

    // similar for MessageB and MessageC
}
The code quickly becomes long (and this is a very simple example), because it's awkward to filter messages and forward them to the appropriate methods. In Scala, things are much shorter.
import akka.actor.{Actor, ActorLogging}

object ABCActor {
  case class MessageA(foo: String)
  case class MessageB(foo: String)
  case class MessageC(foo: String)
}

class ABCActor extends Actor with ActorLogging {
  import ABCActor._

  def receive = {
    case MessageA(foo) => process(foo)
    case MessageB(foo) => process(foo)
    case MessageC(foo) => process(foo)
    case msg           => unhandled(msg)
  }

  def generateReply(): String = "bar"

  def process(foo: String) = {
    val reply = generateReply()
    sender ! reply
  }
}
For me it's important that the whole logic fits on one screen, so I can see all the code without scrolling. It's easier to load the logic into my brain. This is not possible with the Java code (of course, I could split it into extra files, but then instead of scrolling I would be clicking).
Sending a message
// Java
actor.tell(message, self());

// Scala
actor ! message
Scheduling a message
// Java
system.scheduler().scheduleOnce(
    Duration.create(5, TimeUnit.SECONDS),
    actor, message, system.dispatcher(), null);

// Scala (inside an actor)
import context.dispatcher
context.system.scheduler.scheduleOnce(5 seconds, actor, message)
There are tons of other examples. What I'm trying to say is that it's much easier to write actor logic in Scala than in Java. Much easier.
When writing unit tests, you should always test small parts of the code. That means the tests should also be small and short. Because Scala is a very expressive language, you can define a test easily. For example, let's test our ABC actor and see if it replies.
class ABCActorTest(_system: ActorSystem) extends TestKit(_system)
    with FlatSpecLike with MockitoSugar with ImplicitSender
    with Matchers with BeforeAndAfterAll {

  trait Fixture {
    val message = new MessageA("foo")
  }

  def this() = this(ActorSystem("ABCActorTest"))

  override def afterAll() {
    TestKit.shutdownActorSystem(system)
  }

  "Actor" should "send a message and get a reply" in new Fixture {
    val actor = TestActorRef(Props(new ABCActor()))
    actor ! message
    expectMsg("bar")
  }
}
Again, much shorter, much easier to understand and with less room for mistakes. We can also test a specific method of the actor.
"Actor" should "generate reply" in {
  val actorRef = TestActorRef[ABCActor](Props(new ABCActor()))
  val actor = actorRef.underlyingActor

  actor.generateReply() should be ("bar")
}
Or we can mock certain actor methods. First we need to create a trait (similar to a Java interface), so we can override methods.
trait ABCActorBase {
  def generateReply(): String
}

class ABCActor extends Actor with ActorLogging with ABCActorBase {
  // same code
}
Now when we create a test, we can mix in a different ABCActorBase.
trait ABCActorTestBase extends ABCActorBase {
  override def generateReply(): String = "mocked bar"
}

"Actor" should "mock method" in new Fixture {
  val actor = TestActorRef(Props(new ABCActor with ABCActorTestBase))
  actor ! message
  expectMsg("mocked bar")
}
Scala has ScalaTest, which in my opinion is one of the best testing libraries. It offers very descriptive test results and error reporting. It supports multiple testing styles, and some of them look really awesome.
Even though I didn't like Scala at first, I had to admit that it's actually a great language. Not my favorite, but it's OK. Since I'm developing an application that runs on the JVM, it would be a shame not to use it where it can solve certain tasks. It has many more great features that I didn't showcase, so feel free to check them out.
The final result is that I rewrote all my actors in Scala. They are much shorter, and it's easier to understand what they are doing. A great part of the tests is now in Scala too, and I have a feeling the code is much more stable and less error-prone. The new actors have been in production for a few weeks now with zero problems. Happy coding.
In my previous post, I described why I switched from OpenTSDB (another time series database) to KairosDB. In this post, I will show how to install and run KairosDB.
To run KairosDB we actually just need KairosDB (if we ignore Ubuntu/Debian/something similar and Java). How is that possible? Well, KairosDB supports two datastores: H2 and Cassandra. H2 is an in-memory database that's easy to set up and clean up, and it's mostly used for development. Don't use it in production; it will work, but it will be very, very slow.
For this tutorial we will use Cassandra as the datastore. To install Cassandra, you can follow the official tutorial at http://wiki.apache.org/cassandra/GettingStarted. We will install it via apt-get, by adding the following lines to our APT sources (e.g. /etc/apt/sources.list).
deb http://www.apache.org/dist/cassandra/debian 21x main
deb-src http://www.apache.org/dist/cassandra/debian 21x main
You will want to replace 21x by the series you want to use: 20x for the 2.0.x series, 12x for the 1.2.x series, etc… You will not automatically get major version updates unless you change the series, but that is a feature.
We also need to add the public keys to be able to access the Debian packages.
gpg --keyserver pgp.mit.edu --recv-keys F758CE318D77295D
gpg --export --armor F758CE318D77295D | sudo apt-key add -

gpg --keyserver pgp.mit.edu --recv-keys 2B5C1B00
gpg --export --armor 2B5C1B00 | sudo apt-key add -

gpg --keyserver pgp.mit.edu --recv-keys 0353B12C
gpg --export --armor 0353B12C | sudo apt-key add -
Now we are ready to install it.
sudo apt-get update
sudo apt-get install cassandra
This installs the Cassandra database. A few things you should know: the configuration files are located in /etc/cassandra, and the start-up options (heap size, etc.) can be configured in /etc/default/cassandra. Now that Cassandra is installed, run it.
sudo service cassandra start
Another requirement is the Oracle Java JDK instead of OpenJDK. You must install version 7 or 8 (8 is recommended; I'm using 7). Again, we will install it with apt-get.
sudo apt-get install python-software-properties
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer
sudo apt-get install oracle-java7-set-default
Source: http://stackoverflow.com/a/16263651/73010
KairosDB uses Thrift to communicate with Cassandra. When I installed Cassandra, Thrift wasn't enabled by default, so I had to enable it first. There are many ways to do this; if you hate fiddling with config files, you can install OpsCenter. It's a really great tool for monitoring your cluster, with a simple interface where you can access your nodes and change their configuration to enable Thrift. To change it in the config file, set start_rpc to true in /etc/cassandra/cassandra.yaml.
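A quick way to verify that Thrift is actually accepting connections (assuming the default rpc_port of 9160) is a small Python check:

import socket

# try to open a TCP connection to Cassandra's Thrift port
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(2)
try:
    sock.connect(('localhost', 9160))
    print('Thrift is up')
except (socket.timeout, socket.error):
    print('Thrift is not reachable; check start_rpc in cassandra.yaml')
finally:
    sock.close()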
KairosDB, too, can be installed in a few ways.
a) Building from the source
1) Clone the git repository https://github.com/kairosdb/kairosdb.git
2) Make sure that JAVA_HOME is set to your Java install.
3) Compile the code
export CLASSPATH=tools/tablesaw-1.2.2.jar
java make
b) Installing via .deb package (recommended)
The current stable version is 1.1.1 (it used to be 0.9.4). Make sure you download the latest version at https://github.com/kairosdb/kairosdb/releases.
wget https://github.com/kairosdb/kairosdb/releases/download/v1.1.1/kairosdb_1.1.1-1_all.deb
sudo dpkg -i kairosdb_1.1.1-1_all.deb
As mentioned before, KairosDB uses the H2 database as the default datastore. We need to change it to Cassandra.
a) If you are running from source, copy kairosdb.properties from the src/main/resources/ folder to the KairosDB root folder and change it there.
b) If you installed it, then change the file /opt/kairosdb/conf/kairosdb.properties.
In the file, comment out the line where H2 is set as the datastore and uncomment the Cassandra module, so the file looks like this.
#kairosdb.service.datastore=org.kairosdb.datastore.h2.H2Module
kairosdb.service.datastore=org.kairosdb.datastore.cassandra.CassandraModule
You can also change some other settings to tune it, but for now just save the file and you are ready to go.
Make sure your Cassandra service is running. Now let's run KairosDB.
a) Running from source
java make run
b) Or if installed
sudo service kairosdb start
Go to http://localhost:8080 to check that everything works OK. If you can see the KairosDB dashboard, then congratulations: you can now use KairosDB.
In the next tutorial we will see how to save, query and delete datapoints with the web interface, the HTTP API and the Telnet API.
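As a small teaser, here is a sketch of pushing a datapoint through the HTTP API with Python's requests library (the metric name and tags are made up; KairosDB expects timestamps in milliseconds):

import time
import requests

datapoints = [{
    'name': 'office.temperature',  # hypothetical metric
    'datapoints': [[int(time.time() * 1000), 21.5]],
    'tags': {'room': 'main'},
}]

r = requests.post('http://localhost:8080/api/v1/datapoints', json=datapoints)
print(r.status_code)  # 204 means the datapoints were stored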
Akka actors are great when you are looking for scalable real-time transaction processing (yes, that is the actual definition, using some big words). They are really great for background processing, because you can create many instances without worrying about concurrency and parallelism.
We have a simple application for processing uploaded files. We accept a file, parse it (a simple txt file), calculate the values and save them in some database. We could have everything in one actor, but it's much better to split the work into multiple actors and create a pipeline. Each actor does exactly one thing: the code is much cleaner, and at the same time testing it is much easier.
We have (for this demonstration) 2 actors. The first reads the file into a List and sends it to the second.
import java.util.List;

import akka.actor.ActorSelection;
import akka.actor.UntypedActor;

public class ReadFileActor extends UntypedActor {

    @Override
    public void onReceive(Object m) throws Exception {
        if (m instanceof ReadFileMessage) {
            ReadFileMessage message = (ReadFileMessage) m;
            // some simple method to read the file
            List<Double> numbers = this.readNumbersFromFile(message.filename);

            // NumbersMessage is defined on the CalculatorActor below
            final ActorSelection actor = context().system().actorSelection("user/calculator-actor");
            actor.tell(new NumbersMessage(numbers), self());
        } else {
            unhandled(m);
        }
    }

    public static class ReadFileMessage {
        public final String filename;

        public ReadFileMessage(String filename) {
            this.filename = filename;
        }
    }
}
The second actor gets the List of numbers and calculates their sum.
import java.util.List;

import akka.actor.UntypedActor;

public class CalculatorActor extends UntypedActor {

    @Override
    public void onReceive(Object m) throws Exception {
        if (m instanceof NumbersMessage) {
            NumbersMessage message = (NumbersMessage) m;
            Double sum = this.sumNumbers(message.numbers);
            // send to another actor...
        } else {
            unhandled(m);
        }
    }

    public static class NumbersMessage {
        public final List<Double> numbers;

        public NumbersMessage(List<Double> numbers) {
            this.numbers = numbers;
        }
    }
}
If we used this code, we would quickly discover problems. When I tested it with VisualVM for memory leaks, I quickly discovered a memory leak involving the List passed between actors.
When passing objects between actors we need to follow a few guidelines. If we break them, we can face memory leaks and consequently app crashes. One of the guidelines is to use immutable collections: if we pass collections between actors, they have to be immutable. What are the advantages of immutable objects? They are thread-safe, they can be shared between actors without defensive copying, and nobody can modify them after construction.
There are many implementations of immutable collections, and one of the best is in Guava.
We have to use ImmutableList to create a list of numbers for passing between actors.
...
ReadFileMessage message = (ReadFileMessage) m;
ImmutableList<Double> numbers = this.readNumbersFromFile(message.filename);
...
...
NumbersMessage message = (NumbersMessage) m;
// numbers is now an ImmutableList<Double>
Double sum = this.sumNumbers(message.numbers);
...

public static class NumbersMessage {
    public final ImmutableList<Double> numbers;

    public NumbersMessage(ImmutableList<Double> numbers) {
        this.numbers = numbers;
    }
}
Rerunning VisualVM confirmed that the memory leak was resolved. Great.
In my previous post, I described how to correctly install and use OpenTSDB. After some time, I decided to move on to another solution.
First, we need to know one thing: because of IoT, the demand for storing sensor data has increased dramatically. Many new projects have emerged; some are good, some are bad. They differ in the technologies used, how fast they are and what kind of features they support.
You can read the full list of IoT time series databases that can be used for storing the data of your Internet of Things project or startup.
OpenTSDB is great, don't get me wrong. But when you try to use it in more complex projects with real customer demands, you can quickly hit a wall. It's mostly because it involves a lot of moving parts (Hadoop, HBase, ZooKeeper). If one of the parts fails, the whole thing fails. Sure, you can replicate each part and make it more robust, but you will also spend more money. When you are starting out, that's over-optimization and a waste of money (that you don't have).
Aggregation of the data is another problem. It does support basic functions like min, max, avg etc. I spent days investigating why the avg aggregation was not working correctly when I filtered by multiple tags. It just didn't want to work, and I couldn't find anything in the docs. I asked on the Google group, and after some time I got a reply that I must use another aggregation function, and that even that doesn't work 100% the way I want. Another problem is when I want to get just one value, for example the avg of all values from X to now. Not possible!
The lack of clients for talking to OpenTSDB is another problem for me. Sure, storing data via the socket API is super simple and can easily be integrated in any language. The HTTP API is another story. Again, it shouldn't be a problem to implement my own client, but why waste time on this?
Development of OpenTSDB is slow, and it takes ages for new features to be integrated. One of them (one of the most important for me) is support for time zones. It's needed when downsampling data to one day (or more) so the data is correctly grouped. There was some work on it, but to this day it still isn't implemented. Too bad.
On the bright side, OpenTSDB is super fast. I was able to store and load data at a very high rate; loading 3 million records in a few seconds is, for me, super fast. Try that with a relational database and you will quickly be disappointed.
I remember that when I was doing my research, I noticed KairosDB but didn't spend much time testing it. It just wasn't appealing, and I didn't know how it actually works. Big mistake.
KairosDB uses Cassandra to store data (compared to the HBase used by OpenTSDB), and it's actually a rewritten and upgraded version of OpenTSDB. It has evolved into a great project with many more features: many more (and fully working) aggregation methods, the option to easily delete a metric or datapoint, easy extensibility with plugins, etc. It has great clients and a much more active community. I remember asking a question on the OpenTSDB Google group and waiting weeks for an answer (I'm not forcing anyone to provide support; after all, it's an open-source project), while on the KairosDB Google group I got one within a day.
Why is this important, you might ask? Well, when you are chasing deadlines and something goes wrong, a responsive community is very important. Sometimes this kind of thing can be the difference between success and failure.
I wrote a tutorial on how to start with KairosDB. You can also visit kairosdb.org and check out the documentation. Feel free to play with it, test it and hopefully also use it in production.