FaaS-tastic, implementing the OpenFaaS provider in 24 hours
The Cloud Security group at Cisco hosted our tenth hackathon last week. As I’m somewhat obsessed with functions and serverless these days, I thought it would be fun to spend some time trying to implement the openfaas-provider on top of our existing container platform.
Quadra
In August 2013, a group of individuals got together during a hackathon at OpenDNS and put together a really simple proof of concept: an API in front of an EC2 instance running Docker. It was so well received that a few months later, a team was formed to start building what would become the platform for engineering to deploy their services into production. Project Quadra was born. Five years later, the platform has grown to over four hundred users, supports over five thousand applications within Cisco and just recently reached the ten thousand container mark.
OpenFaaS provider
There are a few different projects out there trying to support deploying functions on top of existing orchestration solutions, and OpenFaaS offers an orchestration-agnostic one. There’s an easy interface to implement support for new providers and a few different clients that can be used to interact with the APIs.
It seemed like the best solution to try for our use-case. The goals of the hack were the following:
- give our engineers a way to deploy functions in our edge datacenters
- support established client tooling (faas-cli, serverless, terraform)
- autoscale the functions based on metrics
- do all this with minimum impact to our existing platform
The hack
Starting at 8:30am on the morning of the hackathon, I kicked things off with a simple picture. At the time, I didn’t have anyone else working on the hack with me so I imagined I would go as far as possible and enjoy the learning opportunity. Thankfully, the folks at HashiCorp who put together the Nomad provider gave me a great starting point and inspiration with their diagram.
This laid out a pretty good plan of all the components that I would need to configure in order for the hack to be successful. At this point, knowing that I would need Prometheus in place to achieve some of the goals, I decided to name my project in order to recruit some folks to join my cause, because really, the name is eighty percent of what makes people want to join your team. Faastastic was born. I knew that some engineers on our Monitoring and Logging team had been working with Prometheus recently, so I reached out to one of them, Brian, to convince him that he should join my team.
We split the work, got a Trello board together and started working on deploying the different components. He started off on Prometheus and the Alert Manager, and I started working on getting the Gateway and an empty provider up and running. Within about an hour of reading docs, running Makefiles and writing some TOML configuration files, the skeleton of the hackathon was up and running in Quadra. It was now time to fill in the meat of the project. This is when I realized that all the glue needed to interact with the Quadra API would be a significant effort. Time to convince someone else to join the team.
With a diagram and a cool team name, it was easy to convince Dani. He and I started tackling the different handlers one at a time, working backwards from the errors we were getting back from the gateway API to only go as far as we needed to get something working. We started off with:
2018/10/03 19:03:38 Forwarded [GET] to /system/functions - [502] - 0.001234 seconds
2018/10/03 19:03:38 Get http://provider.faastastic.qq.s1.usw1.opendns.com:8080/system/functions: EOF
2018/10/03 19:03:41 error with upstream request to: /system/functions, Get http://provider.faastastic.qq.s1.usw1.opendns.com:8080/system/functions: EOF
2018/10/03 19:03:41 Forwarded [GET] to /system/functions - [502] - 0.001733 seconds
2018/10/03 19:03:41 error with upstream request to: /system/functions, Get http://provider.faastastic.qq.s1.usw1.opendns.com:8080/system/functions: EOF
It took most of the afternoon and early evening, but by 11:00PM, we had enough working to launch a function through the OpenFaaS user interface. We mocked a few of the responses, but at least we had something working end-to-end. Brian and Dani signed off for the night, and I cranked up some Mac Quayle tunes; the Mr Robot soundtracks are fantastic albums to hack on code to. I was determined to replace all the mocked code and implement the handlers we hadn’t done yet before signing off. By about 4:30AM, I finished filling in the last provider handler and still had no idea what the demo would look like. Then, a DNS resolution problem in my provider container crept up. Every time I delete a function and launch it again with the same name, the Docker resolver points to the old function’s IP address. It’s way too late to troubleshoot this, mostly because I’m blankly staring at the problem and because I’m starting to think that I should be deploying this into production datacenters and figuring out how to use Terraform to deploy my resources. I must be getting delirious, time to take a little nap.
In the morning, with some help from folks on a different hackathon team, we solve the DNS problem by overriding the Docker-provided resolv.conf, and things are starting to look up. By 10:00am, with only two hours left until the demo, the idea that we should run this in production environments in our London and Sydney datacenters continues to seem like a good plan. I mean, all I need to do is update my TOML configuration to the following:
datacenters = ["stage", "lon", "syd"]
Of course, after deploying it, I realize that none of the handlers we wrote are datacenter-aware. Thankfully, the platform injects that information into the containers at runtime, which means that all I need to do is modify most of the handler code to support this. Easy, and a whole hour left to re-deploy everything. This will be fine… While I’m hacking on the handler code, Brian manages to configure the Alert Manager to send events to the gateway, but the code we have implemented so far does not support handling the event, so we add one more handler for the /system/alert endpoint.
Demo time
A few minutes before the demo, Dani finished putting together a handful of slides and I did a dry-run with the three of us. There were a lot of moving pieces and if there’s one thing I’ve learned, it’s that the more complicated a demo is, the more likely things will blow up in my face. I started out by showing the following diagram to explain all the components we deployed.
The next step was to deploy the first function, a simple hello world, to the stage environment using faas-cli and show that it was actually running in Quadra. All went without a hitch. I kicked off a revolutionary tool to generate load and walked the crowd through the OpenFaaS user interface, as well as our Grafana dashboard. Within a matter of seconds, the Alert Manager picked up the increase in traffic and BOOM, my functions scaled up to four instances from the initial one, and the Grafana dashboard showed a clear picture of the trigger for the event. This was awesome, and we were halfway through the demo.
The next step was to demo our code working in production across three datacenters. Of course, no one wants to deploy things to multiple DCs manually, so Dani brought in Terraform. Using Ed Wilde’s OpenFaaS Terraform provider and a sample config file Dani put together, I deployed functions to three datacenters simultaneously and then invoked each one to prove it all worked.
The outcome
In the end, we managed to deliver on each one of the goals we set out to accomplish. I hacked on a bunch of Go code and learned a ton along the way, which was super fun. One of the most exciting outcomes is that we managed to go all the way to production without making any changes to our existing platform. In addition, leveraging Terraform lets our engineers deploy using tools they’re already familiar with. All in all, a really fun hack! The only bit that’s really left to sort out is what the authentication looks like in our environment, and how we can handle events other than HTTP requests* within our applications. One step at a time!
*Update: to clarify, we didn’t experiment with any triggers other than HTTP requests. OpenFaaS supports other events, thanks to Alex Ellis for sharing the following link on handling different triggers with OpenFaaS.