At Shapemaker, we’re using Google Cloud Platform (GCP) as our go-to cloud provider for all things infrastructure. Our software relies on an array of GCP services, including storage buckets for files and reports, Cloud SQL for databases, and Cloud Run services to keep our containers running smoothly. As our system grew, we faced a challenge that's all too familiar to many organisations: manually managing a rapidly expanding collection of cloud resources and environments.
We decided it was time to make our infrastructure more scalable. For us, that meant more consistency, more control, and better security. Our solution was to adopt "Infrastructure as Code" with Terraform. Why did we choose this? Let's break it down:
- Reproducible: Terraform lets us set up our infrastructure consistently across various environments. No more sweating over whether our dev and production setups match.
- Self-documenting: Everything we need to know about our configuration is right there in our codebase. No more digging through a maze of UI screens to find that elusive setting.
- Collaborative: Changes to our infrastructure are proposed as pull requests, so the whole team can review and discuss them before they are applied.
- Version controlled: Every change we make is a part of the Git history, giving us the power to roll back to a previous setup if needed.
- Automated: Terraform automates the configuration of infrastructure, and it plays nicely with continuous integration tools like GitHub Actions, reducing the need for manual work.
How we set up our infrastructure
Initialising a project
To get started with Terraform we need a GCP project that contains a storage bucket to house our Terraform state.
The Terraform state records what Terraform currently knows about our infrastructure. Terraform uses it to plan changes based on the differences between the state and our local infrastructure code. Read more about how Terraform works in their guide.
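As a minimal sketch of how a configuration can point at such a state bucket, the block below declares a `gcs` backend. The bucket name, prefix, and version constraints are placeholders chosen for illustration, not the values we actually use.

```hcl
terraform {
  required_version = ">= 1.5.0" # assumed; any recent Terraform release works

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0" # assumed provider version
    }
  }

  # Remote state is stored in a GCS bucket that must already exist
  # before `terraform init` is run.
  backend "gcs" {
    bucket = "example-terraform-state" # hypothetical bucket name
    prefix = "terraform/state"         # hypothetical path inside the bucket
  }
}
```

Because the backend needs the bucket before Terraform can run at all, the bucket itself has to be created outside of this configuration.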
To simplify this process we put together a Bash script. This script is designed to take the hassle out of project creation, ensuring that the GCP project comes pre-equipped with a storage bucket ready to house our Terraform state.
Further, we have created our own Terraform modules, which are reusable building blocks allowing us to declare resources consistently across different environments. The modules are organised so that they mostly correspond to services available in GCP, for example iam, cloud_run_services, and github_actions_service_account.
Notably, the iam module gives us an overview of how all the major permissions in our infrastructure are set. Handling this manually through a cloud user interface can be hard to maintain in a secure manner. With all permissions gathered in one place, we are in better shape to manage access control as our system grows and becomes more complex.
Inside the cloud_run_services module, we set environment variables dynamically based on other resources or secrets declared in Terraform. This practical feature eliminates the need for manual tinkering with environment variables across environments, reducing the chance of outdated or conflicting configurations.
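To illustrate the idea of gathering permissions in one place, here is a minimal sketch of what such a module could contain. The variable name, roles, and service account addresses are hypothetical; the point is only that every role assignment is listed in a single map and expanded into one `google_project_iam_member` resource per pair.

```hcl
variable "project_id" {
  type        = string
  description = "GCP project the permissions apply to"
}

locals {
  # Hypothetical overview: role => members that should hold it.
  role_members = {
    "roles/run.invoker" = [
      "serviceAccount:frontend@example-project.iam.gserviceaccount.com",
    ]
    "roles/cloudsql.client" = [
      "serviceAccount:backend@example-project.iam.gserviceaccount.com",
    ]
  }

  # Flatten the map into a list of { role, member } pairs.
  bindings = flatten([
    for role, members in local.role_members : [
      for member in members : {
        role   = role
        member = member
      }
    ]
  ])
}

# One non-authoritative IAM member per pair, keyed by "role/member".
resource "google_project_iam_member" "this" {
  for_each = { for b in local.bindings : "${b.role}/${b.member}" => b }

  project = var.project_id
  role    = each.value.role
  member  = each.value.member
}
```

With this layout, granting or revoking access is a one-line change to the map, which shows up clearly in a pull request diff.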
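A minimal sketch of this pattern, assuming the Google provider 5.x and the `google_cloud_run_v2_service` resource, could look as follows. The resource names, variables, and environment variable names are made up for the example.

```hcl
variable "project_id" {
  type = string
}

variable "region" {
  type    = string
  default = "europe-west1" # assumed region
}

variable "image" {
  type        = string
  description = "Container image to deploy"
}

resource "google_storage_bucket" "reports" {
  name     = "${var.project_id}-reports" # hypothetical bucket
  location = var.region
}

resource "google_secret_manager_secret" "db_password" {
  secret_id = "db-password" # hypothetical secret

  replication {
    auto {}
  }
}

resource "google_cloud_run_v2_service" "api" {
  name     = "api"
  location = var.region

  template {
    containers {
      image = var.image

      # Plain value derived from another Terraform resource.
      env {
        name  = "REPORTS_BUCKET"
        value = google_storage_bucket.reports.name
      }

      # Value read from Secret Manager when the container starts.
      # The service's identity also needs roles/secretmanager.secretAccessor.
      env {
        name = "DB_PASSWORD"
        value_source {
          secret_key_ref {
            secret  = google_secret_manager_secret.db_password.secret_id
            version = "latest"
          }
        }
      }
    }
  }
}
```

Since the values are references rather than copied strings, renaming the bucket or rotating the secret updates the service configuration on the next apply.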
Consistent and modifiable environments
We currently manage two distinct environments: one for testing and another for production. Both environments are constructed using the same foundational modules. We tailor some of the configuration of each environment by providing input variables. These variables dictate aspects such as the scaling of our Cloud Run instances and the tier of our database instances. This approach allows us to maintain consistency in our infrastructure while accommodating the unique requirements of each environment.
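As a rough sketch of the idea, each environment can have its own root configuration that calls the same modules with different inputs. The directory layout, the `cloud_sql` module name, and the input variable names below are assumptions made for illustration.

```hcl
# environments/test/main.tf (hypothetical path)
module "cloud_run_services" {
  source = "../../modules/cloud_run_services"

  project_id    = "example-test-project" # hypothetical project ID
  min_instances = 0                      # scale to zero when idle
  max_instances = 2
}

module "cloud_sql" {
  source = "../../modules/cloud_sql"

  project_id    = "example-test-project"
  database_tier = "db-f1-micro" # small shared-core tier
}
```

```hcl
# environments/production/main.tf (hypothetical path)
module "cloud_run_services" {
  source = "../../modules/cloud_run_services"

  project_id    = "example-production-project" # hypothetical project ID
  min_instances = 1                            # keep one instance warm
  max_instances = 10
}

module "cloud_sql" {
  source = "../../modules/cloud_sql"

  project_id    = "example-production-project"
  database_tier = "db-custom-2-7680" # 2 vCPUs, 7.5 GB RAM
}
```

The two files differ only in their input values, so any structural change made in a module reaches both environments.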
Automation with GitHub Actions
The most commonly used commands from Terraform are terraform plan and terraform apply. Plan compares your local changes to the Terraform state and gives a summary of changes, while apply does the actual provisioning and configuration of infrastructure to your cloud provider.
We have automated the execution of these commands using GitHub Actions. We have a workflow for running terraform plan on every commit to a pull request. In this workflow we use the GitHub Action GetTerminus/terraform-pr-commenter to post the proposed plan as a comment on the pull request. This comment uses distinct colours to highlight additions, modifications, and deletions, offering reviewers a clear visual of proposed changes.
In the workflows, we authenticate to GCP with a dedicated GitHub Actions service account (defined in the module github_actions_service_account). This service account is configured in Terraform to let our GitHub Actions workflows authenticate to GCP through Workload Identity Federation, so no long-lived service account keys need to be stored in GitHub.
Finally, when a pull request is merged, we run another workflow that applies the new Terraform configuration to the different environments. This ensures that the main branch is always the source of truth for our infrastructure configuration.
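For readers unfamiliar with this setup, the following is a minimal sketch of how a service account can be made available to GitHub Actions through a workload identity pool. It is not our exact module: the IDs and the `example-org/example-repo` repository are placeholders.

```hcl
resource "google_service_account" "github_actions" {
  account_id   = "github-actions" # hypothetical account ID
  display_name = "GitHub Actions"
}

resource "google_iam_workload_identity_pool" "github" {
  workload_identity_pool_id = "github-pool" # hypothetical pool ID
}

resource "google_iam_workload_identity_pool_provider" "github" {
  workload_identity_pool_id          = google_iam_workload_identity_pool.github.workload_identity_pool_id
  workload_identity_pool_provider_id = "github-provider" # hypothetical provider ID

  # Map claims from GitHub's OIDC token to Google attributes.
  attribute_mapping = {
    "google.subject"       = "assertion.sub"
    "attribute.repository" = "assertion.repository"
  }

  # Only accept tokens issued for one specific repository.
  attribute_condition = "assertion.repository == \"example-org/example-repo\""

  oidc {
    issuer_uri = "https://token.actions.githubusercontent.com"
  }
}

# Allow workflows from that repository to impersonate the service account.
resource "google_service_account_iam_member" "github_impersonation" {
  service_account_id = google_service_account.github_actions.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "principalSet://iam.googleapis.com/${google_iam_workload_identity_pool.github.name}/attribute.repository/example-org/example-repo"
}
```

The workflow then exchanges its short-lived GitHub OIDC token for Google credentials at run time, which is what removes the need for exported keys.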
The road ahead
Looking ahead, we might enhance our Terraform setup with a couple of useful tools. TFLint can help us tidy up our code and follow best practices. Additionally, we are looking into static code analysis with Trivy to automatically spot any misconfigurations or security issues. And speaking of changes, HashiCorp recently switched Terraform's license from the open-source Mozilla Public License to the Business Source License. It’s not affecting us directly, but it’s worth keeping an eye on.
Now, a few months into our change from “ClickOps” to Infrastructure as Code with Terraform, we are ready to meet increasing complexity and demand, without worrying about sacrificing speed for security. And what's equally noteworthy is the team's satisfaction. We have found Terraform to be a tool that is enjoyable to work with, and that is a significant win for us 🚀