Infrastructure continuous deployment with terraform and atlantis
cicd
terraform
aws
kubernetes
ecs
Atlantis
is a self-hosted golang application that listens for Terraform
pull request events via webhooks. I’ve
incorporated it in my recent engagement in CriticalStart but also I use it in my private infrastructure.
I think the idea is great for making terraform
workflow more easy for infrastructure teams.
With atlantis
every terraform
change need to go through review process. When PR is created it automatically run plan
displaying its output as a comment. Applying is also done by adding a comment. It’s highly configurable.
Using atlantis
allows to closing whole terraform
workflow on PR page!
I always had the feeling that checking out branch and running terraform
locally is a waste of time.
Now you can just look at plan
in PR, do the review and continue with other work.
Sounds good? Let’s dig it!
How atlantis works?
In a nutshell when repository webhook is triggered by according action (like create PR or comment), atlantis
receives
it and starts workflow. When workflow is finished then atlantis
comment on PR with result.
Note: This implies that your repo provider need to have access to atlantis
endpoint and vice versa.
You can use atlantis
with git providers like:
- GitHub
- GitLab
- Bitbucket
- Stash
- Azure DevOps
Note: atlantis
must work with remote terraform
state.
With custom workflows it’s possible to define exactly how
terraform
will be executed - flags, additional commands or even running terragrunt
.
I will write here just about two topics which I think are crucial to understand atlantis
- apply requirements and
checkout strategy.
Apply Requirements
Atlantis allows you to require certain conditions be satisfied before an atlantis apply command can be run:
- Approved – requires pull requests to be approved by at least one user
- Mergeable – requires pull requests to be able to be merged
If you decide to use apply requirements then be sure to understand what they mean for your git provider.
atlantis
communicate with git providers via API calls and every git provider has it’s own unique API.
Apply requirements are used to limit possibility of a failure due to human error. Doing apply where there are git conflicts is silly and if your team consist of more than one person then it would be best practice to also require PR being approved before applying.
Difference between approved condition for different providers as an example:
- GitHub – Any user with read permissions to the repo can approve a pull request
- GitLab – You can set who is allowed to approve
- Bitbucket – A user can approve their own pull request but Atlantis does not count that as an approval and requires an approval from at least one user that is not the author of the pull request
- Azure DevOps – All builtin groups include the “Contribute to pull requests” permission and can approve a pull request
Also each VCS provider has a different concept of “mergeability”, be sure to check it out in docs as well.
Checkout strategy
There are two strategies available:
- branch [default]
- merge
Both of them have some valid usage.
Branch
strategy will checkout code on latest branch commit and run terraform
. It
assumes that there was no terraform
changes on master
in the meantime. If there were changes you will get
unexpexted diff in your plan. To mitigate it you should assure that your branch is on top of them master by either
rebase
or master merge
to your branch.
Merge
strategy will create merge commit with master and run plan there.
I use branch
strategy because my repo force to be on top of the master. It saves time on failed plans.
Deployment
Webhook
For atlantis
to be functional a webhook is needed. Webhook and the git provider API are main communication channels.
In my case I did github
webhook with
CloudPosse module but for gitlab
I had to create
it manually. With atlantis
documentation it was piece of cake:
If you’re using GitLab, navigate to your project’s home page in GitLab
- Click Settings > Integrations in the sidebar
- set URL to http://$URL/events (or https://$URL/events if you’re using SSL) where $URL is where Atlantis is hosted. Be sure to add /events
- double-check you added /events to the end of your URL.
- set Secret Token to the Webhook Secret you generated previously
– NOTE If you’re adding a webhook to multiple repositories, each repository will need to use the same secret.- check the boxes
– Push events
– Comments
– Merge Request events
– leave Enable SSL verification checked- click Add webhook
Deployment on ECS
Setup in CriticalStart was based on atlantis provider which consist of such resources:
- Virtual Private Cloud (VPC)
- SSL certificate using Amazon Certificate Manager (ACM)
- Application Load Balancer (ALB)
- Domain name using AWS Route53 which points to ALB
- AWS Elastic Cloud Service (ECS) and AWS Fargate running Atlantis Docker image
- AWS Parameter Store to keep secrets and access them in ECS task natively
Main resource here would be ECS Fargate task running atlantis
which is exposed by ALB.
There is a bit overhead to create additional VPC and ALB and so on but it gives better isolation, which at the end of the day gives us more secure environment.
Enabling it with terraform
:
Additionally we used CloudPosse module to create github webhook for atlantis
Deployment on Kubernetes
In my private k8s
cluster I use helm
to deploy atlantis
. It has it’s own namespace where only atlantis
is deployed.
Here is helm
configuration:
To prevent commiting credentials to repository where helm configuration is we need to create two secrets:
vcsSecretName
containsgitlab_secret
andgitlab_token
awsSecretName
contains AWS CLI credentials file.
To keep such secrets in repository I use sealed secrets, which basically create encrypted Secret in SealedSecret resource. Such resources can only be encrypted with access to k8s so it’s safe to have them even in public repository although I would advice against that.
Deploying process:
Security
Securing atlantis
is complicated topic as there are multiple ways to exploit it.
- Communication
atlantis
<->git provider
should use secure channel - preciselyatlantis
shouldn’t be exposed as HTTP, webhooks should hit HTTPS endpoints. - Webhooks should have webhook secret so it’s possible to validate it and reject if it’s not legitimate.
- Restrict access to
atlantis
application - people should not have permission to exec a shell and run commands. - Provide
atlantis
credentials (API keys, ssh credentials, etc) to appliacation in secure way. - Restrict access to repository which triggers
atlantis
just to people that should do infrastructure changes, anyone with access to repository can possibly exploitatlantis
. - Use
--repo-whitelist
flag to define which repositories can triggeratlantis
. - Watch out for PRs created by
forks
. It’s possible to triggeratlantis
with them. - Run
atlantis
in isolated environment - separate k8s namespace, different ECS cluster, dedicated EC2 instance, etc - Allow communication to
atlantis
only from specificgit provider
IPs.
Conclusions
I love it!
Working with terraform
has never been easier for me.
It needs a lot of love in the beginning to get everything right but after that it’s smooth ride.
Main issue for me is security. A lot of privileges for atlantis
means blast radius is gigantic.
Anyone that can access atlantis
app, send webhooks
, comment in PR, etc can break whole infrastructure to which
atlantis
have access.
The risk is not just internal, atlantis
is exposed so additionally it’s another large surface attack vector.
At the end of the day if you can “afford” atlantis
it’s definitely worth the effort!