Large Scale Guice Projects – Module Deduplication

If you come from a large Java shop, you’ve likely heard of or encountered Guice, Google’s lightweight dependency injection framework, analogous to Spring.

First-time users of Guice are usually starry-eyed at the ability to get type-safe dependency injection and the seemingly modular way to bundle dependencies into Modules.

Most of the tutorials, guides and best practices found online, though, are targeted at smaller codebases. Anyone in a large Java shop has likely hit the spaghetti of configuration and AbstractModule dependencies needed to get the code to boot, seemingly ruining the modularity that a dependency injection framework promises.
This post covers some best practices I’ve found for keeping AbstractModules composable and easy to understand.

If you don’t want to read to the end, just check out my project for solving Guice deduplication: guice-dedupe.


The biggest hurdle large projects face in Guice is keeping Modules reusable and, more importantly, self-contained.

Consider the following example where I have a JSON class that is bound in a module, and two other modules want to make use of it.

We’d like each module to install its own dependencies, so that a consumer can use either ModuleA or ModuleB and the necessary bindings are self-contained. The problem arises if both ModuleA and ModuleB are used: you’ll be treated to a multiple-binding exception.

Many codebases simply remove the installation of module dependencies and push the job of figuring out the right set of final modules to the moment you create the injector. What a mess!

The way to solve this is to use Guice’s built-in de-duplication code. For the most part it works out of the box, unless you are using @Provides in your modules.
Simply change all your existing AbstractModules to SingletonModule from the guice-dedupe library and you’ll get modules that are fully self-contained, even with providers.
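To make the idea concrete without pulling in Guice, here is a minimal toy model in plain Java. Everything in it (`Module`, `Binder`, `ClassEqualityModule`, the `"Json"` key) is a hypothetical stand-in, not Guice’s or guice-dedupe’s real API. It assumes the deduplication trick boils down to giving a module class-based `equals`/`hashCode`, so that a second installation of an equal module is skipped instead of registering its bindings again.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

/** Toy model of module installation. NOT Guice's real API. */
final class ModuleDedupeSketch {

    /** Hypothetical stand-in for a module. */
    interface Module {
        void configure(Binder binder);
    }

    /** Hypothetical binder: skips equal modules, rejects repeated bindings. */
    static class Binder {
        private final Set<Module> installed = new HashSet<>();
        private final Map<String, Module> bindings = new HashMap<>();

        void install(Module module) {
            // A module equal to one already installed is not configured again.
            if (installed.add(module)) {
                module.configure(this);
            }
        }

        void bind(String key, Module source) {
            if (bindings.putIfAbsent(key, source) != null) {
                throw new IllegalStateException("A binding to " + key + " was already configured");
            }
        }
    }

    /** Equality is based on the module's class rather than on object identity. */
    abstract static class ClassEqualityModule implements Module {
        @Override public boolean equals(Object other) {
            return other != null && getClass() == other.getClass();
        }
        @Override public int hashCode() {
            return getClass().hashCode();
        }
    }

    /** Uses default identity equality, so every instance counts as a new module. */
    static class PlainJsonModule implements Module {
        @Override public void configure(Binder binder) { binder.bind("Json", this); }
    }

    /** Same binding, but all instances are equal to each other. */
    static class DedupedJsonModule extends ClassEqualityModule {
        @Override public void configure(Binder binder) { binder.bind("Json", this); }
    }

    /** ModuleA and ModuleB each install their own copy of the JSON module. */
    static boolean boots(Supplier<Module> jsonModule) {
        Module moduleA = binder -> binder.install(jsonModule.get());
        Module moduleB = binder -> binder.install(jsonModule.get());
        Binder binder = new Binder();
        try {
            binder.install(moduleA);
            binder.install(moduleB);
            return true;
        } catch (IllegalStateException duplicateBinding) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(boots(PlainJsonModule::new));   // prints false
        System.out.println(boots(DedupedJsonModule::new)); // prints true
    }
}
```

In this toy every repeated binding is a conflict, which is the situation the text describes for modules with @Provides methods; real Guice is more forgiving for some identical bindings.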

WaterFlow – SWF Framework

SWF Framework

I’ve had pretty good exposure to SWF through my last stint at Amazon/AWS and I grew to love the service. Once you get past some of the confusing aspects of programming in a very stateless / distributed manner, you begin to appreciate the true power that is available to you.

At my previous team, even though it was within AWS, the team had created their own SWF Framework – mostly because they pre-dated the AWS Flow Framework. I was exposed to some interesting concepts that were necessary in the custom framework and that were lacking in Flow.

Personally, although Flow is a great framework, I never loved its use of annotation processing through AspectJ. It makes the code hard to debug in your IDE, hard to reason about mentally, and difficult to set up on anything other than Eclipse.

Recently I came across Swift, which is a very minimal SWF framework that targets Java 1.6. It gave me a good idea of how you can achieve something pretty robust with SWF with minimal code. However, I found it to be too far toward the other extreme. Whereas Flow was overly complicated and too magical, I found Swift to be lacking when writing the workflow/decider.


I decided to take the best parts of Flow and the best parts of Swift and make WaterFlow. It’s a relatively small SWF framework in the same vein as Swift, but brought into the world of JDK8 and with a strong asynchronous programming story (for orchestrating the decider). I’d love to help get someone bootstrapped on it and help them with onboarding! Please contact me.

Learning Netty – HTTP Echo

Learning Netty – Part 1

I recently picked up a copy of Netty in Action, which has been a great way to learn more about Netty. I’ve become more fascinated with Netty as I’ve delved further into reactive and asynchronous programming. Netty is a very powerful framework that simplifies a lot of the challenges of NIO programming; however, it is still difficult to find resources and examples for everyday use cases.


Many examples online start with building an Echo Server & Client. However, they use a simple TCP echo server, which, although probably a more reasonable starting point for a server, doesn’t show a barebones HTTP setup.

The following is a Gist of a barebones HTTP server and client. The example shows a few simple defaults such as the use of HttpObjectAggregator, HttpServerCodec and SimpleChannelInboundHandler.

PhotoReflect Unwatermarked image

Reverse Engineering

I recently had a special event where a photographer took some photos of the occasion and used PhotoReflect to host them and sell them to me.

The price is $29 per photo or $699 for the whole set (90 photos). I consider that an egregious amount to charge, considering that her services were supposedly already included with the event.

The following are some instructions you can follow for downloading the non-watermarked medium-quality images. This assumes that you are running on a Mac; however, the commands can easily be modified for any setup.

The real magic here is in the s=-3 portion of the request, which I’ve found returns the image without the watermark.

I still haven’t figured out how to download the high-quality images, but if you find out please share!

EDIT: I have also found a helpful Gist that can help inject a download button onto the store.

HypeMachine Chrome Extension – Still going

Short and Sweet

This will be a short and sweet post. It’s been a while since I’ve used my HypeMachine Chrome Extension and just as long since I’ve used HypeMachine (my music taste has shifted). However, I’m always surprised by the community of people still using the extension.

I’ve recently merged in some new code, graciously written by Scott Clayton, that allows the user to download multiple songs at once. The code is now in master, so feel free to re-install the extension and give the new functionality a whirl!

Pagination in Clojure

Luminus – Pagination

I’ve recently been working on a fun side project using the Luminus web framework as my first foray into Clojure (which I’m absolutely falling in love with).

One thing I find missing from the documentation, and online in general, is an idiomatic way to paginate in Clojure. I’m sure there is some sexy pagination strategy that uses lazy-seqs, macros, protocols and records; however, I was not able to come up with anything (myself or via Google).

I’m dumping the small helper functions I ended up writing in the hope that someone finds a use for them:

Ultimately one would use the create function to include a structured Pagination map in their context/response.
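As a rough illustration of what such a structured pagination map might contain, here is a minimal sketch. It is written in Java rather than Clojure, and apart from the name `create` every key and parameter (`page`, `per-page`, `total-pages`, `offset`, and so on) is an assumption of mine rather than something taken from the original helpers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical pagination helper; the key names are assumptions. */
final class PaginationSketch {

    /**
     * Builds a pagination map for a result set of totalItems, clamping the
     * requested page into the valid range [1, totalPages].
     */
    static Map<String, Object> create(int totalItems, int requestedPage, int perPage) {
        if (totalItems < 0 || perPage <= 0) {
            throw new IllegalArgumentException("totalItems must be >= 0 and perPage > 0");
        }
        // Ceiling division; an empty result set still has one (empty) page.
        int fullPages = totalItems / perPage;
        int totalPages = Math.max(1, fullPages + (totalItems % perPage == 0 ? 0 : 1));
        int page = Math.min(Math.max(requestedPage, 1), totalPages);

        Map<String, Object> pagination = new LinkedHashMap<>();
        pagination.put("page", page);
        pagination.put("per-page", perPage);
        pagination.put("total-items", totalItems);
        pagination.put("total-pages", totalPages);
        pagination.put("offset", (page - 1) * perPage); // rows to skip in the query
        pagination.put("has-previous", page > 1);
        pagination.put("has-next", page < totalPages);
        return pagination;
    }

    public static void main(String[] args) {
        // 95 items, 10 per page, third page requested.
        System.out.println(create(95, 3, 10));
    }
}
```

The `offset` and `per-page` values would feed the query’s OFFSET/LIMIT, while the remaining keys are what a template needs to render previous/next links.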

If you have anything better please share!

AWS Lambda – Sending CloudTrail notifications to CloudSearch


Amazon has just recently announced AWS Lambda, which is a pretty cool new service that runs your code in response to events. The service manages all the compute resources for you and is a nice hands-off approach to running things in the cloud (how much easier can it get?). At the moment only a few event sources are supported by AWS Lambda; however, one of them is S3 Put notifications (creation/update of keys/objects).

CloudTrail & Inspiration

Recently at work I wanted more insight into some of the API calls that were made on our AWS accounts (occasionally mysterious actions have occurred, and finding them in CloudTrail could prove fruitful). I’ve recently written about setting up an EMR cluster connected to your CloudTrail S3 bucket to perform easy queries against your dataset; however, I find that to be too much power in most cases and thought there should be a simpler way.

I had come across this blog post, which outlines sending CloudTrail events to CloudSearch with the help of SQS and SNS. Now that AWS Lambda exists, can it be simpler?
You bet!

I’ve created the following gist, which you can upload to AWS Lambda to start sending your S3 CloudTrail notifications to CloudSearch.

In order to utilize the script, make sure you’ve created a CloudSearch domain and added the index fields in the MAPPINGS variable (you can use the helpful script in the linked blog post here).
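To give a feel for the role such a mapping plays, here is a small sketch in Java. It is not the gist itself: the `MAPPINGS` contents and the index field names (`event_name`, `aws_region`, ...) are hypothetical, chosen only because CloudSearch index field names are restricted to lowercase letters, digits and underscores, whereas CloudTrail record keys are camelCase. The sketch turns one already-parsed CloudTrail record into the `add` operation shape used by CloudSearch document batches; JSON parsing and the actual upload are left out.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical mapping of a CloudTrail record to a CloudSearch "add" document. */
final class CloudTrailDocumentSketch {

    /** CloudTrail record key -> CloudSearch index field (field names are assumptions). */
    static final Map<String, String> MAPPINGS = Map.of(
            "eventName", "event_name",
            "eventSource", "event_source",
            "eventTime", "event_time",
            "awsRegion", "aws_region",
            "sourceIPAddress", "source_ip_address");

    /** Builds {"type": "add", "id": ..., "fields": {...}} from a parsed record. */
    static Map<String, Object> toAddDocument(Map<String, Object> record) {
        Object id = record.get("eventID");
        if (id == null) {
            throw new IllegalArgumentException("record has no eventID to use as document id");
        }
        Map<String, Object> fields = new HashMap<>();
        for (Map.Entry<String, String> mapping : MAPPINGS.entrySet()) {
            Object value = record.get(mapping.getKey());
            if (value != null) { // keys missing from the record are simply skipped
                fields.put(mapping.getValue(), String.valueOf(value));
            }
        }
        Map<String, Object> document = new LinkedHashMap<>();
        document.put("type", "add");
        document.put("id", id.toString());
        document.put("fields", fields);
        return document;
    }

    public static void main(String[] args) {
        Map<String, Object> record = new HashMap<>();
        record.put("eventID", "example-event-id"); // made-up sample values
        record.put("eventName", "AuthorizeSecurityGroupIngress");
        record.put("awsRegion", "us-east-1");
        System.out.println(toAddDocument(record));
    }
}
```

Every index field named on the right-hand side of the mapping has to exist in the CloudSearch domain before documents are uploaded, which is why the domain setup step comes first.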

Scalatron Build.sbt file to the rescue


I’ve recently been playing around with writing a bot for Scalatron; however, I didn’t find any great explanation of how to set up a nice development process with SBT. The closest I could find was this blog post, but it left a lot to the imagination. I hope you find my annotated Build.sbt below better and clearer!

If you launch sbt and run play, you should see the Scalatron server start up and pick up your Bot!

Spray – Building a website

Spray what?

For my current ongoing project we’ve been using the excellent Play! Framework; although it is primarily designed for websites, we used it for an API web service. So I figured I’d do the opposite with Spray. Spray is an HTTP framework for building web services on top of Akka actors. The next few posts will probably cover some patterns I used in getting to an MVP login demo that accepts UserPassword or OAuth[2] providers.

The following is the specific commit at which I believe I’ve achieved enough of an MVP, if you’d like to browse later how it’s done.

Composing Routes

All the examples on the Spray website show the DSL for the routes in a single class; however, I found this confusing, and the class quickly grew too large. There are two options people have discussed for composing routes:

  1. Create multiple HttpService actors. Have the main actor forward the RequestContext to other actors based on some route prefix
  2. Use a trait to easily compose the routes that will be given to the runRoute method of the single HttpActor

Although option (2) sounds less ideal from an everything-on-Akka standpoint, it was the option I chose, as I found it very easy to bootstrap and the results were fantastic.

Analyzing CloudTrail Logs Using Hive/Hadoop


This is simply a blog post record for myself, as I had great difficulty finding information on the subject. It’s not meant to be a very informative guide on either CloudTrail or Hive/Hadoop.


Recently at work we’ve had an issue where some security group ingress rules were being modified (either automatically or manually), which has been affecting our test runs that rely on those rules. To track down the source of the modification, we have enabled CloudTrail. CloudTrail is part of the AWS family of web services; it records the AWS API calls you’ve made and places those logs in an S3 bucket that you can access.

The recorded information includes the identity of the API caller, the time of the API call, the source IP address of the API caller, the request parameters, and the response elements returned by the AWS service.


My experience with Hive has been very limited (simple exposure from running tutorials); however, I was aware that it is a SQL-ish execution engine that transforms queries into MapReduce jobs to execute on Hadoop. Because it is built on Hadoop, it can read data from S3 much as it would from HDFS.

With the little knowledge of Hive I had, I thought there should exist a very prominent white paper that describes how to consume CloudTrail logs using Hive (using some custom SerDe). A co-worker was simply consuming the JSON log files via Python; however, I was on a mission to see if I could solve the problem (querying relevant data from the logs) using an easy setup with Hive! The benefit of setting up a Hadoop/Hive cluster for this would be that it could easily be used to query additional information and be persistent.


After contacting some people from the EMR team (I was unable to find anything myself on the internet), I was finally pointed to some relevant information! I’ve included the reference link and the original example code in case the link ever breaks.

The key thing to note from the example is that it uses a custom SerDe that is included with the Hadoop clusters created with AWS ElasticMapReduce. The SerDe includes the input format and deserializer, which properly consume the nested JSON records. With this you can now easily query CloudTrail logs!