Achieving Continuous Deployment at Moonpig
In this post I’ll describe how we achieved continuous deployment on the Moonpig website. This enabled us to go from a single release every 3 weeks to an average of 3–4 releases per day.
This post was originally written for the Moonpig Engineering blog.
Why did we want continuous deployment?
When I joined Moonpig in 2014, releasing was a fraught process. At the time our architecture consisted mainly of two components: a web solution and a database. The release process was entirely manual, long-winded and regularly required rollbacks. A release to Production happened about once every 3 weeks and would typically take a whole working day to complete. It could take up to a week before we had a stable release candidate. As a company keen to adopt lean practices, we clearly needed the ability to release much more easily and much more often.
Getting a plan in place
There had been a lot of talk about continuous deployment, but no concrete ideas of how to do it. Our first task was to establish a coherent plan and vision: what we wanted to achieve, and how we were going to do it.
The CTO at the time recognised the need for dedicated resource if we were to make any progress. As luck would have it, we had with us at the time an exceptionally talented contractor, Manish Kulkarni. Together we started to work out what we needed to do to make the pipe dream a reality. In formulating a plan we took a lot of inspiration from Etsy. Like Moonpig, Etsy had a monolithic solution that was very difficult to deploy. Having successfully achieved continuous deployment, they documented their journey very clearly, which gave us a path to follow.
Once we had a plan in place, we shared it with the wider engineering and Ops teams to get their feedback. We then shared it with the Product team to secure their support. Up until that point there had been a lot of talk about continuous deployment, but not much action. People had become cynical. With a coherent plan in place, people started to have faith in the project once again.
In reality, once we got going the plan changed and we shuffled the priorities. This didn’t matter. The outcome remained the same; we simply found better ways to get there faster.
How we got to continuous deployment
The first step in our plan was to replace our deployment scripts with Octopus Deploy. The existing scripts were long, complex and unreliable. Simultaneously we introduced efficiencies around updating individual servers. In the past we’d take servers out of load, and deploy new code to them one at a time. When we introduced Octopus Deploy, we changed the deployment to update multiple servers simultaneously. This delivered instant improvements as the deployment time shrank from a whole day to a couple of hours. Alongside this, Octopus provided the stability we had lacked in the past.
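To illustrate the difference between updating servers one at a time and updating several at once, here is a minimal Python sketch. It is not Moonpig's deployment code and does not use the Octopus Deploy API; `deploy_to`, `rolling_deploy` and the server names are hypothetical stand-ins chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(servers, size):
    """Split the server list into consecutive groups of at most `size`."""
    return [servers[i:i + size] for i in range(0, len(servers), size)]


def deploy_to(server, version):
    # Hypothetical stand-in for the real deployment step on one server.
    return f"{server} -> {version}"


def rolling_deploy(servers, version, batch_size):
    """Deploy batch by batch; servers within a batch are updated in parallel."""
    results = []
    for batch in batches(servers, batch_size):
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            # map() preserves input order, so results line up with the servers.
            results.extend(pool.map(lambda s: deploy_to(s, version), batch))
    return results


# Example: six servers updated two at a time instead of one at a time.
outcome = rolling_deploy(["web1", "web2", "web3", "web4", "web5", "web6"], "1.2.0", 2)
```

With a batch size of 1 this reduces to the old one-at-a-time approach; a larger batch shortens the deployment at the cost of having more servers out of load at once.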
One of the key lessons we’d learned from Etsy was the value of feature flags. They would pave the way for continuous integration, and would offer us the opportunity to recover quickly from any problems introduced to the live environment.
Etsy wrote their own feature flagging framework, which they open sourced. Their framework is written in PHP and, copying their model, we wrote our own version in C#. With a framework in place, the next step was to start using it. Initially we introduced feature flags while we were still using feature branches. This allowed us to get used to the process of using and removing toggles. Continuous integration and deployment rely fundamentally on discipline from the team. Using toggles within feature branches gave teams the time to build up this discipline and to develop confidence in the toggling framework.
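For readers who have not used feature flags, the sketch below shows the general idea in Python. It is an illustration only: it is neither Etsy's framework nor the C# version described above, and the `FeatureFlags` class, the flag names and the configuration format are all assumptions made for the example.

```python
import hashlib


class FeatureFlags:
    """Hypothetical flag store: each flag is "on", "off", or a rollout percentage."""

    def __init__(self, config):
        # config maps a flag name to "on", "off", or an integer from 0 to 100.
        self._config = config

    def is_enabled(self, name, user_id=""):
        setting = self._config.get(name, "off")  # unknown flags default to off
        if setting == "on":
            return True
        if setting == "off":
            return False
        # Percentage rollout: hash flag name and user id into a stable bucket
        # from 0 to 99, so a given user always gets the same answer.
        digest = hashlib.sha256(f"{name}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100 < setting


flags = FeatureFlags({"new_basket": "on", "new_checkout": 10})

if flags.is_enabled("new_checkout", user_id="customer-42"):
    page = "new checkout"   # code path still hidden from most users
else:
    page = "old checkout"   # existing behaviour remains the default
```

Because unfinished code sits behind a flag that defaults to off, it can be merged and deployed without being exposed, and a problem in Production can be handled by switching the flag off rather than rolling back a release.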
Much later, once we had established the value of feature toggles, we recognised the need for a more sophisticated framework. We subsequently integrated LaunchDarkly to handle toggling.
Continuous integration with trunk based development
With the team confidently using feature toggles, the next step was to move away from feature branches and start trunk based development. We approached this cautiously. Many team members were skeptical about trunk based development, and we all knew companies that had tried it and failed. We took a phased approach with just one team making the change initially. Once this had been achieved successfully, the rest of the teams followed suit. You can read a more detailed account of the move to trunk based development in this post.
Automating load balancing
Up to this point, deployments to Production had been handled by the Ops team. This was because taking servers in and out of load had to be done manually. A dependency on the Ops team to handle deployments made it hard to release as often as we liked, and would certainly not be practical for continuous deployment. Automating the load balancing element of the deployments enabled the engineering teams to take control of the deployments.
In parallel with the continuous deployment project, we had started breaking down our monolith and building out services. As a first step in automating the load balancing we started with one of our services. We treated this as a proof of concept, and once we were confident that the approach worked, we rolled it out to all the services and our web solution. With this work completed, the engineering teams now had control of all web and service deployments, which instantly increased the rate at which we could deploy.
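The step being automated here can be sketched roughly as follows. This Python example is an assumption-laden illustration rather than our actual tooling: `LoadBalancer` is an in-memory stand-in for whatever API a real load balancer exposes, and `deploy` and `health_check` are hypothetical callbacks supplied by the caller.

```python
class LoadBalancer:
    """In-memory stand-in for a real load balancer API (hypothetical)."""

    def __init__(self, servers):
        self.in_rotation = set(servers)

    def remove(self, server):
        self.in_rotation.discard(server)

    def add(self, server):
        self.in_rotation.add(server)


def deploy_with_rotation(lb, server, deploy, health_check):
    """Take a server out of load, deploy to it, and restore it only if healthy."""
    lb.remove(server)
    deploy(server)
    if not health_check(server):
        # Leave the server out of load so it never serves live traffic.
        return False
    lb.add(server)
    return True


# Example run with trivial callbacks standing in for real deploy and check steps.
lb = LoadBalancer(["web1", "web2"])
deployed = []
ok = deploy_with_rotation(lb, "web1", deployed.append, lambda s: True)
```

Once the sequence of removing, deploying, checking and restoring runs without a person in the loop, a deployment no longer has to wait for someone to move servers in and out of load by hand.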
I have written previously about our technical backlog and how we manage and prioritise it. As the continuous deployment project was progressing, the engineering teams started using the time allotted for reducing technical debt to build up a suite of automated tests.
Much of the code base consisted of legacy code with limited test coverage. That code was written in a way that makes it very difficult to unit test, so instead we had to build up a suite of UI tests. This is a far from ideal solution, as UI tests tend to be unreliable and costly to maintain. However, we would not be able to continuously integrate and deploy without some level of automated test coverage. A lot of effort went into getting the coverage in place, and there is an ongoing maintenance cost. As we gradually reduce the amount of legacy code we are able to reduce the dependency on UI tests, but in the short term they do help give us confidence.
Automated deployment and restore of the databases
The final hurdle was to automate the deployment and restore of the databases across our test and Production environments. As it turned out, the greatest challenge here was human rather than technical. The DBAs were understandably nervous of the change. Having always had control of database deployments, there was a legitimate fear that without their oversight something could go wrong in Production, leaving a great mess for them to clean up.
Essentially we resolved this through discussion. When you choose to make changes, there will always be some risk. Together we identified those risks and worked out ways to mitigate them. We also agreed to carry out the changes needed in collaboration with the DBAs. This gave them complete oversight of the changes, and we were able to test them together. This gave them the confidence they needed.
As part of our workflow we had long since established a review process, wherein all proposed changes to the database were reviewed by the DBAs. A big part of their concern also stemmed from the fear that DB changes would be made on Production without them having the opportunity to review them first. Once we explained that changes of any kind, whether to code or the database, had to be reviewed, they began to have faith in the system. This was really about trust. Sharing our workflow and explaining our process started to build up this trust. As I said earlier, practising continuous deployment successfully requires discipline. It also requires trusting one another to be disciplined. And that brings me to the final point.
Changing the culture
From start to finish it took us about 18 months to make the changes I’ve described above. Arguably it could have been done quicker, but that pace of change gave us time to make the “cultural change” that is so vital to the success of continuous integration and deployment.
As the technology changed, the teams adapted their workflow to accommodate it. They had time to understand how to successfully practice trunk based development. They had time to develop the discipline to feature toggle everything, and to use feature branches when a toggle couldn’t be applied. It’s very difficult to change culture quickly; healthy practices and processes come from teams recognising the need for them, and that recognition is often learned from making mistakes first. Teams become disciplined because they recognise the need for discipline. They develop standards and expectations and hold one another accountable for meeting those standards.
Building this discipline not only enabled us to practice continuous deployment successfully, it also helped the teams mature. This in turn allowed us to transition from Scrum to Kanban, which really enabled us to exploit the potential of continuous deployment.
Reaping the rewards
To practice lean development, you need the ability to get feedback from customers fast and regularly. This is very difficult to achieve without the ability to deploy changes quickly and easily. Continuous deployment has helped the teams achieve an average cycle time of 5 days. This helps us to test changes with customers quickly and regularly. It allows us to learn fast and to iterate and adapt to meet our customers’ needs. This in turn has supported very healthy growth for the company.
Some of the key lessons I learned from this journey:
Have a plan. You will find it much easier to win support and investment if you can tell people how you are going to achieve your goals. As with any plan in an agile environment, it will change and improve, but at the start of the project you need to win the confidence of those who can back it.
Conquer fear. Fear will always be a factor — there is no change without risk. Make sure everyone involved has regular input and can collaborate with you. Once the project becomes a shared undertaking there is greater commitment and willingness to succeed.
Take baby steps. Part of conquering fear involves building confidence. When you make changes, try to find a way to start small. Find a low risk case in which to prove the concept and iron out problems. Once you’re confident in the approach, roll it out further.
Dedicated resource may not be vital, but you will make progress much faster with it. Continuous deployment offers enormous business benefits, so it’s well worth making the investment to achieve it.
Originally published at engineering.moonpig.com on November 21, 2017.