Observability: my first steps towards reducing uncertainty

[Gopher artwork] If you’ve worked in infrastructure, you know the feeling. All the credit for this art goes to ashleymcnamara

Alright, let’s talk observability, or o11y if you prefer. There’s been a lot of noise around the topic recently. As systems become increasingly distributed, the biggest challenge in troubleshooting becomes cutting through all the components of the system without losing key information about the application along the way. From the moment a request comes in to the time the response reaches the end user, being able to trace entire requests is critical to identifying and understanding problems. Observability is an important part of the strategy needed to keep things running. I’ve been aching to dig further into the topic for some time now, and decided I would use this as an opportunity to learn about a bunch of different things along the way.

Much has been written about serverless code. Writing applications without having to manage the underlying host, or even a container, seems like a developer’s dream. It lets developers focus on the business logic, and not on the nuts and bolts of publishing that code into production…right? When I first read about serverless on martinfowler.com, I couldn’t quite wrap my head around the concept; clearly the hard work of running the code still exists somewhere, and there are other trade-offs that make serverless complex in different ways. At the same time, the distributed nature of functions further exacerbates the problem of tracing distributed systems. Gone are the days of having a monolith and peppering printf("XXXXX made it here too") statements everywhere in the hope of troubleshooting a problem with your code. Although many serverless tutorials have tried to convince me otherwise, writing code in a web browser just isn’t for me, so I downloaded and installed the serverless framework and created my project.

brew install serverless
serverless create -t aws-go-dep -p helloworld
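
For reference, the aws-go-dep template generates a small main.go with a handler and a custom Response type. Here is a rough, self-contained sketch of what mine looked like; the Response shape and message match what the function returns later in this post, and the lambda.Start call (from github.com/aws/aws-lambda-go) is shown as a comment so the snippet stands on its own:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Response mimics the shape the template returns to API Gateway.
type Response struct {
	StatusCode int               `json:"statusCode"`
	Headers    map[string]string `json:"headers"`
	Body       string            `json:"body"`
}

// Handler builds the hello-world reply seen later via `serverless invoke`.
func Handler() (Response, error) {
	body, err := json.Marshal(map[string]string{
		"message": "Go Serverless v1.0! Your function executed successfully!",
	})
	if err != nil {
		return Response{StatusCode: 500}, err
	}
	return Response{
		StatusCode: 200,
		Headers: map[string]string{
			"Content-Type":           "application/json",
			"X-MyCompany-Func-Reply": "hello-handler",
		},
		Body: string(body),
	}, nil
}

func main() {
	// In the deployed function this line is: lambda.Start(Handler)
	resp, _ := Handler()
	fmt.Println(resp.StatusCode)
}
```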

Next step: find a cloud provider to deploy my serverless code to. I know Google Cloud Platform and Azure are cooler to use than AWS these days, but AWS has recently rolled out support for golang binaries in its Lambda platform, so I thought it would be fun to give that a test drive. I’ve used AWS for about five years, and figured I was learning enough other things without having to figure out the terminology of GCP or Azure. I wanted to start fresh here, to be able to capture the pain of rolling out my application to a brand new account without any pre-existing configuration, no dish in the oven. Once I signed up and logged into the AWS Console, I created a developer user via the IAM interface.

[Screenshot: IAM user creation] Don’t forget to check the “Programmatic access” checkbox to generate an access key and secret

Next I installed the awscli and configured my credentials locally to make using serverless a breeze.

brew install awscli
aws configure
AWS Access Key ID [None]: XXXXXXXXXX
AWS Secret Access Key [None]: XXXXXXXXXX
Default region name [None]: us-east-1
Default output format [None]:

With my credentials set up, it was finally time to deploy my application, or so I thought.

serverless deploy
Serverless: Packaging service...
Serverless: Excluding development dependencies...
Serverless Error ---------------------------------------

ServerlessError: User: arn:aws:iam::xxxxxxxxx:user/develope is not authorized to perform: cloudformation:DescribeStacks on resource: arn:aws:cloudformation:us-east-1:xxxxxxxxx:stack/helloworld-dev/*...

Here the real fun began. I spent way too much time trying to sort out the different IAM permissions I needed. I started off using the AWS Console and its Visual Editor, and eventually gave up on it. Manually configuring IAM policy via the AWS Console is just a pain. So I installed terraform.

brew install terraform

On a side note, if you haven’t heard of homebrew and are wondering “What’s this brew thing he keeps typing?”, go install it right now. Seriously, the world is a better place with a tool like brew in it.

Ok, so I’ve got terraform installed, and I remembered that it has an import feature. I thought: “Great, that’ll save me some time, I’ll just import what I already have”. Well, it turns out there’s a caveat with import: it only loads your resources into your terraform state file, it does not import those resources into your configuration file. I first tried to copy and paste the JSON policy out of the AWS Console into the policy field of an aws_iam_policy resource, but ran into issues when I tried to do some variable expansion. I ended up creating an aws_iam_policy_document instead and using that resource in my aws_iam_policy definition. I then, through trial and error, worked my way through all the various permissions needed for my simple helloworld program. In the end, the aws_iam_policy_document looked something like this.
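
As a trimmed-down sketch only (the account ID variable, names, and wildcards below are placeholders; the actual actions and ARNs in my config are more specific), the shape of it was:

```hcl
# Sketch: serverless needs CloudFormation, S3 (deployment bucket),
# Lambda, API Gateway, CloudWatch Logs and IAM permissions to deploy.
data "aws_iam_policy_document" "serverless_deploy" {
  statement {
    actions   = ["cloudformation:*"]
    resources = ["arn:aws:cloudformation:us-east-1:${var.account_id}:stack/helloworld-*/*"]
  }

  statement {
    actions   = ["s3:*"]
    resources = ["arn:aws:s3:::helloworld-*"]
  }

  statement {
    actions   = ["lambda:*", "apigateway:*", "logs:*", "iam:PassRole"]
    resources = ["*"]
  }
}

resource "aws_iam_policy" "serverless_deploy" {
  name   = "serverless-deploy"
  policy = "${data.aws_iam_policy_document.serverless_deploy.json}"
}
```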

You can find the properly formatted terraform config I ended up with in the repo here. Finally, at long last, I was able to deploy and invoke my hello world.

serverless invoke -f hello
{
    "statusCode": 200,
    "headers": {
        "Content-Type": "application/json",
        "X-MyCompany-Func-Reply": "hello-handler"
    },
    "body": "{\"message\":\"Go Serverless v1.0! Your function executed successfully!\"}"
}

This might be one of the most complicated “hello world” setups I’ve ever put together. Now that the function was up and running, I tried to remember what I was actually trying to accomplish here. Oh right, observability. I’ve been hearing and reading lots about honeycomb.io, so I set up an account through their simple 30-day free trial signup process. I followed the getting started with Go guide, got the basic code working locally, and got some events into my dataset.
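
Going from memory, that getting-started code is essentially libhoney-go’s basic send loop; something like the following (the write key, dataset name, and fields here are placeholders):

```go
package main

import (
	libhoney "github.com/honeycombio/libhoney-go"
)

func main() {
	// Placeholders: use your own write key and dataset name.
	libhoney.Init(libhoney.Config{
		WriteKey: "YOUR-WRITE-KEY",
		Dataset:  "helloworld",
	})
	defer libhoney.Close() // flush any pending events on exit

	// Each event becomes one row in the dataset.
	ev := libhoney.NewEvent()
	ev.AddField("app", "helloworld")
	ev.AddField("duration_ms", 153)
	ev.Send()
}
```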


With all the building blocks in place, it was time to dive a little deeper. I modified my function to make it a bit more meaningful to troubleshoot by adding a dependency on an external API, then used honeycomb’s beeline for Go to create a wrapper for my lambda handler.
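
The wrapper itself was only a few lines. Here is a sketch of the idea, assuming beeline-go’s Init/StartSpan/AddField/Flush API; the Response type, field names, and Handler body are stand-ins, not my actual code:

```go
package main

import (
	"context"

	"github.com/aws/aws-lambda-go/lambda"
	beeline "github.com/honeycombio/beeline-go"
)

type Response struct {
	StatusCode int    `json:"statusCode"`
	Body       string `json:"body"`
}

type handlerFunc func(ctx context.Context) (Response, error)

// withBeeline starts a span per invocation, records the outcome, and
// flushes before the lambda container can be frozen.
func withBeeline(name string, h handlerFunc) handlerFunc {
	return func(ctx context.Context) (Response, error) {
		ctx, span := beeline.StartSpan(ctx, name)
		defer beeline.Flush(ctx) // lambda may freeze right after returning
		defer span.Send()

		resp, err := h(ctx)
		beeline.AddField(ctx, "response.status_code", resp.StatusCode)
		if err != nil {
			beeline.AddField(ctx, "error", err.Error())
		}
		return resp, err
	}
}

func Handler(ctx context.Context) (Response, error) {
	// ...call the external API here, adding fields along the way,
	// e.g. beeline.AddField(ctx, "weather.city", city)
	return Response{StatusCode: 200, Body: "hello"}, nil
}

func main() {
	beeline.Init(beeline.Config{WriteKey: "YOUR-WRITE-KEY", Dataset: "helloworld"})
	lambda.Start(withBeeline("hello-handler", Handler))
}
```

Flushing inside the wrapper matters because Lambda can freeze the process as soon as the response is returned, so events still queued in the background might otherwise never reach Honeycomb.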

I think this is the point where my mind started exploding. Looking through the honeycomb interface, I was easily able to pinpoint where my code was going, see whether the paths I thought would be executed actually were, and inspect all the metadata about my code along the way.

[Trace screenshot] Happy path trace

[Trace screenshot] Error trace. Silly me, Winipeg is not a real place

So, beyond the power of tracing, being able to take the metadata accumulated along the path of my request and combine it in various ways allowed me to identify patterns within my application that I hadn’t really thought about early on. Clearly this was just a simple app that doesn’t do a heck of a lot, but in a more complex system this ability will be tremendously helpful. I can’t even remember the number of times in my career when I’ve wished I had this type of power at my fingertips. It can help identify deficiencies, uncover unknown unknowns, and trace back events during a postmortem. Now I’m thinking about all the code running in prod that I want to re-write…

All the code and terraform configuration can be found in my GitHub repo
