Spray – Building a website

Spray what?

For my current project we’ve been using the excellent Play! Framework; although it is primarily designed for websites, we used it for an API web service. So I figured I’d do the opposite with Spray. Spray is an HTTP framework for building web services on top of Akka actors. The next few posts will probably cover some of the patterns I used in getting an MVP login demo working that accepts username/password or OAuth[2] providers.

The following is the specific commit at which I believe I’ve achieved enough of an MVP, if you’d like to browse later how it’s done.

Composing Routes

All the examples on the Spray website show the DSL for the routes in a single class; however, I found this confusing, and the class quickly grows too large. There are two options people have discussed for composing routes:

  1. Create multiple HttpService actors. Have the main actor forward the RequestContext to other actors based on some route prefix
  2. Use a trait to easily compose the routes that will be given to the runRoute method of the single HttpActor

Although option (2) sounds less ideal from an “everything on Akka” perspective, it was the option I chose, as I found it very easy to bootstrap and the results were fantastic.
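A minimal sketch of the trait approach (the trait, route, and actor names here are illustrative, not from the actual project): each trait owns one slice of the route tree, and the single HttpService actor concatenates them.

```scala
import akka.actor.Actor
import spray.routing.HttpService

// Each concern gets its own trait exposing a spray Route value.
trait UserRoutes extends HttpService {
  val userRoutes = pathPrefix("users") {
    get { complete("user listing") }
  }
}

trait LoginRoutes extends HttpService {
  val loginRoutes = path("login") {
    post { complete("logged in") }
  }
}

// The single actor mixes the traits in and joins the routes with ~
class ApiActor extends Actor with UserRoutes with LoginRoutes {
  def actorRefFactory = context
  def receive = runRoute(userRoutes ~ loginRoutes)
}
```

Adding a new area of the API is then just a matter of writing another trait and appending its route with `~`.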

Analyzing CloudTrail Logs Using Hive/Hadoop


This is simply a blog post record for myself, as I had great difficulty finding information on the subject. It’s not meant to be a very informative guide on either CloudTrail or Hive/Hadoop.


Recently at work we’ve had an issue where some security group ingress rules were being modified (either automatically or manually), and it has been affecting our test runs that rely on those rules. In order to try and track down the source of the modification, we have enabled CloudTrail. CloudTrail is part of the AWS family of web services; it records the AWS API calls you’ve made and places those logs in an S3 bucket that you can access.

The recorded information includes the identity of the API caller, the time of the API call, the source IP address of the API caller, the request parameters, and the response elements returned by the AWS service.


My experience with Hive has been very limited (simple exposure from running tutorials); however, I was aware that it is a SQL-ish execution engine that transforms queries into MapReduce jobs for Hadoop to execute. Since it is built on Hadoop, it has native support for using S3 as an HDFS.

With the little knowledge of Hive I had, I thought there should exist a very prominent white paper describing how to consume CloudTrail logs using Hive (using some custom SerDe). A co-worker was simply consuming the JSON log files via Python; however, I was on a mission to see if I could solve the problem (querying relevant data from the logs) with an easy Hive setup! The benefit of setting up the Hadoop/Hive cluster for this is that it could easily be used to query additional information, and it is persistent.


After contacting some people from the EMR team (I was unable to find anything myself on the internet), I was finally pointed to some relevant information! I’ve included the reference link and the original example code in case the link ever breaks.
reference: http://www.emrsandbox.com/beeswax/execute/design/4#query

The key thing to note from the example is that it uses a custom SerDe that is included with the Hadoop clusters created by AWS ElasticMapReduce. The SerDe includes the input format and deserializer which properly consume the nested JSON records. With this you can now easily query CloudTrail logs!
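As a sketch of what the table definition looks like (the table name, column subset, and bucket path are mine; the SerDe and input format class names are the ones bundled with EMR clusters, so verify them against your cluster’s documentation):

```sql
-- External table over the CloudTrail logs in S3, using the EMR-bundled SerDe.
CREATE EXTERNAL TABLE cloudtrail_logs (
  eventVersion     STRING,
  userIdentity     STRUCT<type: STRING, principalId: STRING, arn: STRING, accountId: STRING>,
  eventTime        STRING,
  eventSource      STRING,
  eventName        STRING,
  sourceIpAddress  STRING,
  requestParameters STRING,
  responseElements  STRING
)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS
  INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://your-cloudtrail-bucket/AWSLogs/';
```

From there, finding who touched a security group is an ordinary `SELECT … WHERE eventName LIKE '%SecurityGroup%'` away.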

SQS , Apache Camel & Akka

Akka Apache-Camel Via SQS

This is an example project showing how to set up a sample project using Akka, Apache Camel and SQS together. Never heard of them, or curious how they interact with each other?


Apache Camel is a rule-based routing and mediation engine.

What that basically means is that Apache Camel provides a common API for exchanging messages across a variety of platforms/protocols such as HTTP, SQS and AMQP.


Akka is a toolkit and runtime for building highly concurrent, distributed, and fault tolerant event-driven applications on the JVM.

What that basically means is that Akka is a framework for writing code (known as actors) that lends itself to being easily distributed.


Amazon Simple Queue Service (SQS) is a fast, reliable, scalable & affordable message queuing service.

Why should you care!?

I’ve recently had the pleasure of releasing some code on Heroku using the Play! Framework. Although deployment and initial setup were a breeze, I was bummed out to find that using Akka’s remoting protocol was not doable, as only the standard ports (80, 443) are allowed on Heroku. This means Akka actors can’t run in a proper distributed model (i.e. they can’t talk to each other!)

Heroku has made a post outlining how to use RabbitMQ instead of the default Akka protocol; however, I did not find it simple or ideal.

This brings us to this sample project! Leveraging Apache Camel & SQS, it was very straightforward to send messages to distributed actors.
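To give a flavour of the wiring (the queue name and credential values are placeholders, and the aws-sqs endpoint requires the camel-aws component on the classpath), an akka-camel consumer/producer pair looks roughly like this:

```scala
import akka.actor.{ActorSystem, Props}
import akka.camel.{CamelMessage, Consumer, Producer}

// Consumer: Camel polls the queue and delivers each message as a CamelMessage.
class SqsConsumer extends Consumer {
  def endpointUri = "aws-sqs://my-queue?accessKey=AKIA...&secretKey=..."
  def receive = {
    case msg: CamelMessage => println(s"received: ${msg.bodyAs[String]}")
  }
}

// Producer: anything sent to this actor is forwarded onto the queue.
class SqsProducer extends Producer {
  def endpointUri = "aws-sqs://my-queue?accessKey=AKIA...&secretKey=..."
}

object Main extends App {
  val system = ActorSystem("camel-sqs")
  system.actorOf(Props[SqsConsumer], "sqs-consumer")
  val producer = system.actorOf(Props[SqsProducer], "sqs-producer")
  producer ! "hello over SQS"
}
```

Because the actors only ever talk through the queue, the consumer and producer can live in completely separate dynos with no open ports between them.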

Please check out the project on my Github page: https://github.com/fzakaria/Akka-Camel-SQS

Try #2: Fixing download name


Let’s get straight to the point: I believe I have finally fixed the download name issue. You can download the latest fix in the chrome-bug-373182 branch; simply download the zip chrome-bug-373182.zip


For those curious who have been tracking the Chrome bug: I thought the issue was unsolvable, as the latest Chrome fix caused cross-origin requests to no longer respect the download attribute (except for data URIs). I nevertheless believed I had fixed it earlier because, when testing some downloads (from the popular page), I only attempted the first page of songs loaded. The glaring problem, however, was that the songs loaded upon scrolling down were not receiving the jQuery on event (even with the delegation syntax!). My solution at the moment is simply to unbind all event handlers for the array key and re-bind them upon any ajax completion. This isn’t the greatest solution, but it works!
Perhaps someone can help me fix it so that it uses jQuery’s delegation event handling properly…
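For reference, delegation normally handles late-added elements when the handler is bound to a stable ancestor that exists at bind time. A minimal sketch (the selector and helper function are placeholders, not the extension’s actual code):

```javascript
// One handler on document covers buttons added later (e.g. on infinite scroll),
// because the click bubbles up to document where the selector is matched.
$(document).on('click', '.download-button', function (event) {
  event.preventDefault();
  startDownload(this); // placeholder for the extension's download logic
});
```

Bound this way, there is no need to unbind/re-bind on ajax completion, since no handler is ever attached to the transient elements themselves.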

Unit test in Play! Framework with Slick


The Play! framework includes some pretty good integration/documentation on how to perform testing; however, most of it relies on a Play! application being created (i.e. FakeApplication) and on the tests sharing the same database. The second point was the more troublesome one, since ScalaTest runs tests in parallel. The following is what I have done to get around these problems.

The above code re-uses the Evolution plugin provided by Play! It creates Evolution objects (which contain up and down SQL statements) that we can apply in before / after methods around each test.
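Since the original snippet isn’t shown here, a rough reconstruction of the idea, using the database/evolutions test helpers that later Play releases expose (names per the Play documentation; treat this as a sketch of the approach, not the post’s exact code):

```scala
import org.scalatest.{BeforeAndAfter, Suite}
import play.api.db.{Database, Databases}
import play.api.db.evolutions.{Evolution, Evolutions, SimpleEvolutionsReader}

trait WithTestDatabase extends BeforeAndAfter { this: Suite =>
  // Each suite gets its own uniquely named in-memory database,
  // so ScalaTest's parallel execution doesn't cause tests to collide.
  val db: Database = Databases.inMemory(name = s"test-${java.util.UUID.randomUUID}")

  // An Evolution pairs the up SQL with the down SQL that reverts it.
  val testSchema = Evolution(
    1,
    "CREATE TABLE users (id BIGINT PRIMARY KEY, name VARCHAR(255));", // up
    "DROP TABLE users;"                                               // down
  )

  before { Evolutions.applyEvolutions(db, SimpleEvolutionsReader.forDefault(testSchema)) }
  after  { Evolutions.cleanupEvolutions(db) }
}
```

A test suite then just mixes in `WithTestDatabase` and uses `db` directly, with no shared state between suites.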

Scala Closure Example

Somewhat more real example

Closures aren’t new to me. I’ve used them a bunch when writing JavaScript, and I felt like the concept of a closure was really well defined/understood in JavaScript. Although I was aware you could likely (wasn’t sure until I googled) do the same in Scala, every example online infuriated me!
Every example was pretty trivial (i.e. a summing function) and used a global variable.
I wanted to give a better example!
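The kind of example I mean is along these lines (a sketch of my own, not the post’s original code): a factory whose returned function captures its own mutable state, rather than leaning on a global variable.

```scala
// makeAccumulator returns a function that closes over its own `total`;
// each call to makeAccumulator creates fresh, independent captured state.
def makeAccumulator(): Double => Double = {
  var total = 0.0                 // captured by the returned closure
  (amount: Double) => {
    total += amount
    total
  }
}

val deposit = makeAccumulator()
deposit(100.0)                    // 100.0
deposit(50.5)                     // 150.5

val other = makeAccumulator()     // independent state
other(1.0)                        // 1.0
```

The point the trivial summing examples miss is the last line: two accumulators coexist without interfering, because each closure owns its captured variable.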


Fix for latest Extension Hypemachine bug

Wah Happened?!

During the last week I got a lot of e-mails from people telling me the extension was no longer working for them. The complaint was that the name of the downloaded song seemed ‘garbled’, or the download no longer worked. I’m sorry it took me a while (~1 week) to dig into it; however, I have now published the fix in a new branch on Github.

For those curious, the regression (bug) looks to be caused by Chrome/Chromium and is currently being tracked here, where they don’t seem to be respecting the download attribute for anchor tags.

Enjoy the music,

Play! Framework with Akka actors on Heroku

Play! Framework

Recently, for a project, I’ve gotten to know the Play! framework and really the Scala language. The introduction to Scala was a very interesting curve, as I feel like I’m constantly flipping back and forth between having a solid understanding and complete wtf confusion as I learn deeper pragmatic/idiomatic Scala code. I have grown to love functional programming and can’t wait to flex some map / Option / fold / reduce in the new Java 8 language. Working with the Play! framework was a great first choice, as there seems to be a good community and documentation; however, as with any project, the more complex it becomes, the farther you get from being able to find good examples, and into ‘icky’ situations.


I don’t think you can get far into learning Scala without at least hearing about Akka and how cool it is. You read the introduction to Akka in plenty of Play!/Scala books and start to imagine how your architecture will look when everything is a beautiful encapsulated Actor. I’ve found, however, some nifty roadblocks setting it up, and this blog post will pretty much be a brain dump of some of the issues I’ve resolved; namely ones caused by non-interoperability with Heroku.

So you want to use Akka with Play! ?

You want to use Akka with Play! ? Great! Play! actually bundles an already-defined actor system which is managed by the lifetime of the web server (whatever that really means) and comes preconfigured to do some things which aren’t very clear. The first issue I ran into, however, is that I wanted Akka actors to perform general book-keeping, which meant they would have to be managed so that only a single instance runs every interval (i.e. a singleton). This sounds like it should be doable with Akka thanks to their ClusterSingletonManager; however, you quickly learn that remote actors are not possible on Heroku. Bummer!

There is a blog post from Heroku on how to set up an Akka cluster using RabbitMQ as the event bus, but it just seemed like way too much overhead for my simple use case. Here are some general steps to get singleton actors working on Heroku. The overall plan is to nest a sub-project within your Play! application which contains its own Main class that will launch all singleton actors. This sub-application is then launched only on a single dyno.

Setup a sub-module

Understanding the need for a sub-module is important, or at least why I chose to do it this way. Heroku launches a Play! application via a start script created by the SBT native packager when the stage command is executed. You can view the file for your application by opening up target/universal/stage/bin/, and you can see that it launches the NettyServer class.

We could have easily included additional main methods within our Play! project and included a worker in the Procfile which executes a java target (like what happens within the run script); however, I wanted to leverage the generated script, which meant a new module/project had to be created (a limitation of the SBT packager is that it only supports one main method per project). Most of the steps follow identically from http://www.playframework.com/documentation/2.2.x/SBTSubProjects

Here are some interesting notes:

  • Don’t forget to set the sub-project as an aggregate so that commands waterfall down to the inner project
  • If you want to re-use the configuration file from the parent project, make sure you reference its path accordingly
  • Procfile entry: actors/target/universal/stage/bin/actors -Dconfig.file=../conf/application.conf -Dprocess.type=actors
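As a sketch of what that sub-project’s entry point might look like (the object, actor, and system names are mine, not from the post):

```scala
import akka.actor.{Actor, ActorSystem, Props}

// A "singleton" book-keeping actor: safe because only one dyno runs this Main.
class BookKeeper extends Actor {
  def receive = {
    case msg => // periodic clean-up, metrics, etc.
  }
}

object ActorsMain extends App {
  val system = ActorSystem("actors")
  system.actorOf(Props[BookKeeper], "book-keeper")
  // Keep the process alive; Heroku restarts the dyno if the JVM exits.
  system.awaitTermination()
}
```

Because this Main lives in its own module, the SBT native packager generates a second start script for it, which is exactly what the Procfile entry above invokes.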

How to install Elasticsearch on EC2 with Amazon Linux

This is largely a brain dump, since I spent some time trying to get Elasticsearch working on EC2.


Creating the Instance

  1. Create an IAM role with the credentials needed by the Elasticsearch AWS plugin. For now, I didn’t mess with setting up a fine-grained policy and just opted for ‘power-user’. A good follow-up would be to write a default policy that we can send as a pull request to the plugin’s Github.
  2. Create a security group which has at least ports 22, 80, 2003 and 2004 open. I’m not sure of the complete difference between 2003 and 2004.
  3. Launch the instance!

Installing Elasticsearch

  1. I opted to install Elasticsearch via the RPM because it includes a chkconfig script to start Elasticsearch as a service. You can download the rpm here. Do not turn it on yet!
  2. In the installed directory (/usr/share/elasticsearch for RPM) run the following command: bin/plugin -install elasticsearch/elasticsearch-cloud-aws/2.1.0

Configuring Elasticsearch

Within the configuration file (/etc/elasticsearch/elasticsearch.yml) perform the following modifications:

  1. cluster.name: pick-a-cluster-name
  2. discovery.type: ec2
  3. discovery.ec2.host_type: public_ip
  4. discovery.ec2.groups:
  5. discovery.ec2.ping_timeout: 5m
  6. cloud.aws.region:

The security group option is important: without it, the plugin will cause Elasticsearch to try to discover every host in the region and fail on those that are not running Elasticsearch. Limiting discovery to instances with the specified security group fixes that!
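Put together, the relevant section of elasticsearch.yml would look like this (the bracketed values are placeholders for your own settings):

```yaml
cluster.name: pick-a-cluster-name
discovery.type: ec2
discovery.ec2.host_type: public_ip
discovery.ec2.groups: <your-security-group>
discovery.ec2.ping_timeout: 5m
cloud.aws.region: <your-region>
```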


Finally, start the service:

sudo service elasticsearch start

New Changes to HypeMachine Extension

Minor improvements

It’s been a while since I’ve written a blog post, and this one will be pretty short. I’ve recently pushed a small commit to my HypeMachine-Extension. The commit includes:

  • General code cleanliness (so people can learn/understand better what it’s doing)
  • Added Google Analytics to see how many songs are being downloaded in general.
  • Added download attribute so that the song automatically downloads when you click the button!

I think the best reason to grab the latest changes is the download-on-click fix, which also sets the name of the downloaded file to “title – artist.mp3”.

Open Questions

Ideas for Metrics?

I’d love to hear ideas on some neat Google Analytics people would like to see tracked. I’ll be publishing the metrics I’m tracking now once I get enough data!

Google Chrome Store ?

Does anyone know enough of copyright law/trademark infringement to help me understand whether I can safely re-publish the extension to the Google Chrome Store?