System tracing with AWS X-Ray

Knowing whether your application is running fast or slow is one thing; understanding what makes up that performance is another. This is not a new problem: APM (Application Performance Monitoring) tools like New Relic and AppDynamics have happily taken your money to make it easier for years. However, it gets even more complicated in a serverless environment, where the integration points are not always fully under your control. This is a frustration of mine, as AWS Lambda can often be an opaque beast.

AWS X-Ray is a lightweight entrant into this field: for Lambda, the extent of the configuration is a tick box in the ‘advanced settings’ section of the ‘config’ tab. The feature switch will even add the required permissions for you.

With this enabled, each request will now produce a trace.

Now for the interesting bits: we can see that this Lambda was cold and took a while to start executing. We can also see where Amazon have called out some of their ‘initialization’ time.

Slow Lambda execution

My Lambda includes a call to a RESTful API, but the default X-Ray setup won’t show this. I especially want to call this out in my monitoring because this external dependency can affect my service. To do this I need to create a sub-segment, and Amazon, as ever, have thought of everything: they provide a decorated version of HttpClient with their X-Ray integration built in. It’s literally a drop-in replacement with no further configuration. With this addition, I now get a much clearer view of my service’s profile. The remote API call makes up a large percentage of the total time taken.
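As a rough sketch of what that swap looks like with the X-Ray SDK for Java (the URL is a placeholder, not my real dependency):

import com.amazonaws.xray.proxies.apache.http.HttpClientBuilder; // X-Ray's drop-in builder
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.util.EntityUtils;

public class RemoteApiClient {

    public String fetch() throws Exception {
        // Built this way, each outbound HTTP call is recorded as its own sub-segment.
        try (CloseableHttpClient client = HttpClientBuilder.create().build()) {
            return EntityUtils.toString(
                    client.execute(new HttpGet("https://api.example.com/resource")).getEntity());
        }
    }
}

The only change from a plain Apache client is the import; everything else stays the same.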

Lambda execution with HttpClient augmented

Amazon also supports creating your own custom sub-segments, which is something I want to try out soon and maybe write about later.

The other interesting feature I’ve not yet tried, but which is also on the backlog for investigation, is attaching metadata to segments and sub-segments. I can imagine this being really useful: adding pertinent data to traces could then help when investigating themes of slowness, for example.
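I haven’t run either of these yet, but based on the X-Ray SDK for Java documentation, a custom sub-segment with some extra data attached would look roughly like this (the names and values are made up):

import com.amazonaws.xray.AWSXRay;
import com.amazonaws.xray.entities.Subsegment;

public class PricingService {

    public String fetchPrices() {
        // Custom sub-segment: appears as its own block on the trace timeline.
        Subsegment subsegment = AWSXRay.beginSubsegment("fetch-prices");
        try {
            subsegment.putAnnotation("market", "football");   // annotations are indexed for filtering
            subsegment.putMetadata("retryCount", 0);          // metadata is free-form extra detail
            return callRemoteApi();
        } catch (RuntimeException e) {
            subsegment.addException(e);
            throw e;
        } finally {
            AWSXRay.endSubsegment();
        }
    }

    private String callRemoteApi() {
        return "...";
    }
}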

Finally, another freebie from Amazon: out of the box, a service map is created linking together any associated resources it can find from your traces. The green ring around each service is a chart of the percentage of successful requests; it turns yellow if there are problems.

I could imagine a slightly more sophisticated version of this being a fantastic DevOps dashboard.

linked services

AWS CodeStar

I try to read about new AWS services as I hear about them in Amazon’s newsletters. The first two pages of the CodeStar tutorial genuinely delighted me!

This is what I want as a developer of small projects. Very excited to try it out!

I say small projects because I would probably want more flexibility over where my source code is stored if I were thinking about enterprise-sized projects.

Reporting Squad Confidence

As a software team lead, the questions I’m most commonly asked are “how is everything going?” and “will project x be ready by date y?”, and they are something I’ve always struggled with. Everyone who has worked in software delivery will understand that this type of work is difficult to estimate. That is especially true of teams working in a new area, where there is no prior experience of the task, which is often where my current team sits.

Over my career, this question has manifested in a number of different ways. In the time before Agile was widely adopted, it was the project manager who thought that asking more often would increase the chance of getting a different answer. Currently, at Sky Betting & Gaming, the question is turned around: instead of being pestered for an answer, I report a status of Red, Amber or Green for the work the team is doing and include a short summary of completed work and next steps. This is absolutely an improvement, but it still felt wrong to me. At the start of a piece of work, it’s easy to go along with the new energy and report Green, when it might be better to report Amber or Red because there are potentially a lot of unknown risks.

Recently, with the help of Sean from Optimise Agility, we’ve been working through some of the pain points at SB&G. I spoke to him about the interface between the leadership team, who need assurance that the work is in hand, and the delivery team, who need to do the work.

The traffic-light strategy didn’t convey the information we wanted to give. I wanted to show that our iterative delivery approach didn’t match up with Gantt-chart-style product roadmaps.

We decided to try an experiment with the delivery team, asking them a simple question: “Based on a release at the end of May, what confidence do you have that the following will be in production, ready to be used?” The answers were given on a scale from “0 – No Chance” to “10 – Dead Cert!”.

 

Bar graph of confidences
Blurred because the specifics might not make much sense.

This gave us a quick, data-driven and cohesive expression of the remaining work. It felt right, but having the data wasn’t enough; we needed to show other people that we were thinking about this. That became another quick experiment, with the data displayed on some spare wall at the edge of our team space.

So far the feedback has all been very positive, the experiment will continue and we’ll fine-tune this new approach over time.

Using AWS Certificate Manager and ELB with WordPress

Over the last couple of weeks, I’ve been using this website as a way to learn more about Amazon Web Services. I’ve always found topics easier to learn when I have something practical to apply them to.

During some time away from work, I started looking into making this site HTTPS. Security is something which interests me in and out of work, so it seemed like a good idea.

The plan was to add HTTPS to my WordPress EC2 instance, but I didn’t know much more than that.

I knew that Amazon had Certificate Manager, so I started there.

Here is the first piece of AWS awesomeness, albeit with a cost: they will give you a free certificate, but currently you can only use it with their Elastic Load Balancing. At first I thought this was going to be a hassle and increase the work required, but on reflection I think it is a good approach.

I can add an ELB in front of my EC2 instance and use it to terminate my SSL connections. My application doesn’t really have to change; it can stay oblivious to the security layer I want to add.

Two different ELB styles

To reduce complexity I went with the Classic Load Balancer instead of the Application Load Balancer. I didn’t feel I needed any of the application level features.

One thing to watch out for is that the default health check uses “/index.html”. The WordPress install I’m using returned an error for that URL, so the ELB took my instance out of its pool, which effectively took my site down. Changing the health check target to “/” was easy and brought me back online.

Out of the box, ELB comes with some really useful monitoring baked into its console UI. This was great for identifying and resolving the problem, which seems to be a common theme on this platform.

ELB Metrics

I chose to maintain both HTTP and HTTPS for my website, mapping both listeners to plain HTTP on my instance. Again, my instance doesn’t know anything about the ELB.
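Assuming the standard ports, the listener mapping looks like this:

Load balancer HTTPS 443 → Instance HTTP 80
Load balancer HTTP 80 → Instance HTTP 80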

mapping http and https

Java builds on Travis CI (.org)

I’d seen the name Travis a number of times in relation to Continuous Integration, but having no non-work repositories I’d never paid attention to anything other than Jenkins.

But since starting on more private projects, I came to miss having a check beyond ‘it works on my machine’. I didn’t really have an appetite to set up Jenkins locally; although it might not be much hassle with Docker and Compose, it just wasn’t something I wanted to spend time on.

This is where travis-ci.org comes into the picture. Firstly, it is completely hosted; secondly, it’s completely free (enterprise packages are available on travis-ci.com).

The .org site is specifically set up to ease integration with your own GitHub repositories. Once you’ve signed in via the SSO/SAML magic, you’re given a list of your repos and simple tick boxes to enable the ones you want.

From there, there’s a tiny domain-specific configuration file you need to add to the root of your repo. I’ve included the simplest Java one I could get away with as an example below.

file: .travis.yml

language: java
jdk:
- oraclejdk8
sudo: false
script: mvn clean verify

The build will then trigger whenever you push to the repo. Each build produces a summary plus the raw build process output.

Adding Graphite support to Spring Boot Actuator

I made a very simple Graphite implementation of the MetricWriter interface; you can see the code in my GitHub repo spring-boot-actuator-graphite.

If you want to use it in a production environment, please be aware that it opens and closes a new Socket connection for each metric measurement.
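The real code lives in the repo, but as a rough sketch of the idea (assuming Spring Boot 1.x Actuator and Graphite’s plaintext protocol), it boils down to something like this:

import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;

import org.springframework.boot.actuate.metrics.Metric;
import org.springframework.boot.actuate.metrics.writer.Delta;
import org.springframework.boot.actuate.metrics.writer.MetricWriter;

public class GraphiteMetricWriter implements MetricWriter {

    private final String prefix;
    private final String host;
    private final int port;

    public GraphiteMetricWriter(String prefix, String host, int port) {
        this.prefix = prefix;
        this.host = host;
        this.port = port;
    }

    @Override
    public void set(Metric<?> value) {
        send(value.getName(), value.getValue());
    }

    @Override
    public void increment(Delta<?> delta) {
        send(delta.getName(), delta.getValue());
    }

    @Override
    public void reset(String metricName) {
        send(metricName, 0);
    }

    // Graphite's plaintext protocol: "path value timestamp\n" over TCP.
    // Note the production caveat above: a new socket is opened and closed per measurement.
    private void send(String name, Number value) {
        long timestamp = System.currentTimeMillis() / 1000;
        try (Socket socket = new Socket(host, port);
             Writer writer = new OutputStreamWriter(socket.getOutputStream())) {
            writer.write(String.format("%s.%s %s %d%n", prefix, name, value, timestamp));
        } catch (IOException e) {
            // Kept silent to keep the sketch short; real code should log the failure.
        }
    }
}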

Assuming you have a Spring Boot Actuator project, below is an example @Configuration class to enable writing your metrics to Graphite.

 
// Imports assume Spring Boot 1.x Actuator; GraphiteMetricWriter is the class from the repo above.
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.actuate.autoconfigure.ExportMetricWriter;
import org.springframework.boot.actuate.endpoint.MetricsEndpoint;
import org.springframework.boot.actuate.metrics.reader.MetricsEndpointMetricReader;
import org.springframework.boot.actuate.metrics.writer.MetricWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MonitoringConfiguration {

    @Value("${spring.metrics.export.graphite.prefix}")
    private String prefix;

    @Value("${spring.metrics.export.graphite.host}")
    private String host;

    @Value("${spring.metrics.export.graphite.port}")
    private int port;

    // Exposes the metrics published by the /metrics endpoint as a readable source.
    @Bean
    public MetricsEndpointMetricReader metricsEndpointMetricReader(MetricsEndpoint metricsEndpoint) {
        return new MetricsEndpointMetricReader(metricsEndpoint);
    }

    // @ExportMetricWriter tells Actuator to periodically copy those metrics into this writer.
    @Bean
    @ExportMetricWriter
    public MetricWriter metricWriter() {
        return new GraphiteMetricWriter(this.prefix, this.host, this.port);
    }
}
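The three placeholders need matching entries in application.properties. The values below are just examples (2003 is Graphite’s default plaintext port):

spring.metrics.export.graphite.prefix=myapp
spring.metrics.export.graphite.host=graphite.example.com
spring.metrics.export.graphite.port=2003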

Alexa Skills

Two things happened in order for me to get into writing Alexa Skills. The first was that I’d bought an Echo Dot as an interesting piece of tech I wanted to try out. The second was that I’d recently started using AWS Lambda in my current role at Sky Betting & Gaming.

During that work with AWS Lambda, I’d seen that it was possible to add an Alexa trigger.