March 01, 2020
We have been working on removing bottlenecks so our teams can move fast, which basically means reducing the time it takes to get a feature from idea to production. I want to review the journey from one large codebase released 5-10 times a week to over a dozen smaller codebases releasing ~250 times a week. This post looks at the practical side of how this was achieved, but also at the cultural change it brought to the company.
When we started this process we had a large (several million lines) monolithic codebase. There was one code repository, deployed as two applications powering the mobile and desktop websites. The websites were released, usually daily, after a short manual regression test. The regression testing focused on core areas and on changes highlighted by an automated change-log. After the release, the site was monitored for any increase in errors.
A natural next step seemed to be a move toward a micro-frontend architecture. This would allow squads to work faster and more independently on smaller, more specific codebases.
To enable this move we had to add a few pieces to our architecture.
It was also important to centralise some parts so they could be improved and evolve over time.
Small codebases, each specific to an area of the website. They are based on Create React App and all use the same build/deployment workflow. An edge proxy routes traffic to the right application.
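To illustrate the routing idea, here is a minimal sketch of how an edge proxy could pick an application from the request path. The route table, the application names and the `resolveApp` function are all hypothetical; the real proxy configuration is not shown in this post, and this only demonstrates longest-prefix matching as one plausible approach.

```typescript
// Hypothetical route table: path prefix -> micro-frontend application name.
interface Route {
  prefix: string;
  app: string;
}

// Assumed example entries; the real areas of the website are not listed here.
const routes: Route[] = [
  { prefix: "/checkout", app: "checkout-app" },
  { prefix: "/search", app: "search-app" },
  { prefix: "/", app: "home-app" }, // catch-all fallback
];

// Return the application whose prefix is the longest match for the path.
// A prefix only matches on a whole path segment, so "/search" does not
// match "/searching".
function resolveApp(path: string, table: Route[]): string | undefined {
  let best: Route | undefined;
  for (const route of table) {
    const matches =
      route.prefix === "/" ||
      path === route.prefix ||
      path.startsWith(route.prefix + "/");
    if (matches && (best === undefined || route.prefix.length > best.prefix.length)) {
      best = route;
    }
  }
  return best?.app;
}
```

With this table, `resolveApp("/checkout/basket", routes)` returns `"checkout-app"`, while an unknown path such as `/about` falls through to `"home-app"`.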
A new centralised testing framework supports various testing tools: Cypress, BackstopJS, Jest integration tests, Pact, Lighthouse and autocannon. It is easy to add to a codebase by convention. The tests run during build pipelines or after deployment to environments. There is no manual regression testing.
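As a rough sketch of what "by convention" could mean, the function below decides which tools to run for a pipeline stage based on which conventional test directories exist in a codebase. The directory names and the split of tools between the build and post-deployment stages are assumptions made for illustration, not the actual layout of our framework.

```typescript
type Stage = "build" | "post-deploy";

interface ToolConvention {
  tool: string;  // name of the testing tool
  dir: string;   // assumed conventional directory for its tests
  stage: Stage;  // assumed stage at which the tool runs
}

// Hypothetical conventions; the real directories and stages may differ.
const conventions: ToolConvention[] = [
  { tool: "jest", dir: "tests/integration", stage: "build" },
  { tool: "pact", dir: "tests/pact", stage: "build" },
  { tool: "cypress", dir: "tests/cypress", stage: "post-deploy" },
  { tool: "backstop", dir: "tests/backstop", stage: "post-deploy" },
  { tool: "lighthouse", dir: "tests/lighthouse", stage: "post-deploy" },
  { tool: "autocannon", dir: "tests/load", stage: "post-deploy" },
];

// Given the file paths in a repository, list the tools to run at a stage.
// A tool is selected only if its conventional directory contains something.
function toolsToRun(repoPaths: string[], stage: Stage): string[] {
  return conventions
    .filter((c) => c.stage === stage)
    .filter((c) => repoPaths.some((p) => p === c.dir || p.startsWith(c.dir + "/")))
    .map((c) => c.tool);
}
```

The appeal of this style is that a squad opts in to a tool simply by adding files in the expected place, with no per-codebase pipeline configuration.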
More monitoring and better visibility of application health were required, since it is much harder to monitor 20 applications than two. This was achieved by creating overview dashboards in Kibana that show the overall health of each application at a high level, followed by more detailed, templated views for each application. We also created tools to show the versioning of modules.
To reduce the time it takes a code change to go live, we introduced continuous deployment for the new codebases. This was made possible by a good automated regression test suite. Centralised libraries were also upgraded automatically during the build process, so we were always deploying the latest versions.
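The automatic upgrade step can be pictured with a small sketch like the one below. It takes an application's dependency map and a map of the latest centralised library versions, and returns the dependencies with those libraries bumped. The function name, the package names and the version numbers are hypothetical, and how the latest versions are looked up during the build is not covered here.

```typescript
// Package name -> version string, as in the dependencies of a package.json.
type Deps = Record<string, string>;

// Return a copy of `deps` where every centralised library the application
// already depends on is set to its latest version. Libraries the application
// does not use are not added, and other dependencies are left untouched.
function upgradeCentralLibs(deps: Deps, latest: Deps): Deps {
  const result: Deps = { ...deps };
  for (const [name, version] of Object.entries(latest)) {
    if (name in result) {
      result[name] = version;
    }
  }
  return result;
}

// Example with made-up package names and versions.
const upgraded = upgradeCentralLibs(
  { react: "16.12.0", "@example/ui-components": "3.1.0" },
  { "@example/ui-components": "3.4.2", "@example/test-framework": "1.9.0" },
);
```

In this example `upgraded` keeps `react` at `16.12.0`, moves `@example/ui-components` to `3.4.2`, and does not add `@example/test-framework`. The automated regression suite is what makes this safe: a bad library upgrade should fail the build rather than reach production.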
The main change was that code could be released faster and with less risk. This changed people’s approach to committing and releasing code. It is now normal to commit and release small changes directly to live several times a day (usually 5-6 changes per application per day). These smaller changes have reduced the number of major incidents. In general it feels like we have more incidents, but with significantly less impact overall than before. These incidents are a lot faster to resolve, and most issues are found in non-live environments and never get to production.
Follow me on Twitter @andyianriley or see andyianriley on LinkedIn.