I’ve spent a long time around various build and deploy systems.
One thing they all have in common is they tend to collect permissions to do things. This can be a sizable problem, especially when these systems are shared between multiple areas of an organisation.
This is more often the case than not, since deployment systems are complicated to manage well, and usually a shared-services team somewhere ends up running them - probably multiple such teams in a particularly large organisation.
I’m sure anyone reading this that has managed Jenkins and used it to manage AWS infrastructure or deployments has at some point thought to themselves - “Hmm, this Jenkins agent has quite the impressive IAM profile - it could destroy a lot of stuff if a bad actor got control of it…”.
In other words, blast radius becomes a problem.
So how to fix this?
As with anything in computers there are a million different ways, I am sure - but here is my approach using GitLab, HashiCorp Vault and Keycloak together. This approach results in the CI workers themselves having zero permissions to AWS or any other infrastructure by default.
Vault has various secrets backends that can be driven by this pattern, so this approach is not limited to AWS.
I myself also make use of the Nomad backend to drive zero-trust deployments from GitLab into a container orchestrator.
Vault has secrets backends for other potential deployment targets such as Kubernetes, various databases, Azure, Google Cloud and more.
How does it work?
This is achieved by using the GitLab JWT issuer feature, combined with the HashiCorp Vault JWT authentication backend. Vault and GitLab authentication are integrated with Keycloak using OpenID Connect so that identity is consistent across the platform.
Configuration in GitLab itself is simple. It looks like so:
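A sketch of the `gitlab.rb` OpenID Connect provider configuration - the Keycloak URL, realm, client identifier and secret here are placeholders for your own environment:

```ruby
gitlab_rails['omniauth_providers'] = [
  {
    name: "openid_connect",
    label: "Keycloak",
    args: {
      name: "openid_connect",
      scope: ["openid", "profile", "email"],
      response_type: "code",
      # Placeholder issuer URL - substitute your Keycloak host and realm
      issuer: "https://keycloak.example.com/realms/gitlab",
      discovery: true,
      client_auth_method: "query",
      uid_field: "preferred_username",
      client_options: {
        identifier: "gitlab",      # Keycloak client ID
        secret: "CHANGE_ME",       # Keycloak client secret
        redirect_uri: "https://gitlab.example.com/users/auth/openid_connect/callback"
      }
    }
  }
]
```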
Additionally there are some general GitLab settings outside of the authentication provider setup. This delegates all authentication in GitLab to a Keycloak instance, along with telling GitLab to sync user profile information from Keycloak:
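A sketch of those general settings, again in `gitlab.rb`:

```ruby
# Delegate all authentication to the openid_connect provider
gitlab_rails['omniauth_enabled'] = true
gitlab_rails['omniauth_allow_single_sign_on'] = ['openid_connect']
gitlab_rails['omniauth_block_auto_created_users'] = false
gitlab_rails['omniauth_auto_link_user'] = ['openid_connect']
gitlab_rails['omniauth_auto_sign_in_with_provider'] = 'openid_connect'

# Keep user profile details in sync with Keycloak
gitlab_rails['omniauth_sync_profile_from_provider'] = ['openid_connect']
gitlab_rails['omniauth_sync_profile_attributes'] = ['name', 'email']
```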
Once those settings are added restart GitLab and it is good to go.
GitLab, by default, runs an integrated JWT issuer. This is extremely useful as far as authentication for CI jobs goes, as the JWTs issued by GitLab can be used to authenticate with Vault.
GitLab will issue a JWT for every CI pipeline/job that runs, using the identity of the user that triggered the job - via merge request, branch push, or whatever trigger is configured. These JWTs are valid only for the lifetime of the job, so they are short-lived.
Those JWTs get encoded with information about the CI job, which can be used in Vault policy to decide if access should be allowed. The GitLab JWTs look like this:
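An example claim set, based on the documented GitLab CI JWT claims (the values here are illustrative):

```json
{
  "jti": "c82eeb0c-5c6f-4a33-abf5-4c474b92b558",
  "iss": "gitlab.example.com",
  "iat": 1585710286,
  "nbf": 1585710281,
  "exp": 1585713886,
  "sub": "job_1212",
  "namespace_id": "1",
  "namespace_path": "your/group",
  "project_id": "22",
  "project_path": "your/group/project",
  "user_id": "42",
  "user_login": "myuser",
  "user_email": "myuser@example.com",
  "pipeline_id": "574",
  "job_id": "1212",
  "ref": "feature-branch-1",
  "ref_type": "branch",
  "ref_protected": "false",
  "environment": "production"
}
```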
This provides a lot of useful input for making decisions within Vault policy: the GitLab project group and path, the environment and, crucially, the GitLab user's identity from the OIDC provider.
The Vault side of things is also simple to configure.
We need an authentication backend and, for the purposes of proving the principle, a couple of Vault roles to request from within a CI/CD pipeline in GitLab.
I already have the AWS backend configured, which gives Vault the ability to issue STS tokens for configured IAM roles. I also have the Nomad backend configured, which gives Vault the ability to issue Nomad tokens tied to specific Nomad ACL policies.
Configuration of Vault is simplest with Terraform. First add the authentication backend itself:
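A sketch of the backend resource, pointed at GitLab's JWKS endpoint (`gitlab.example.com` is a placeholder for your instance):

```hcl
# JWT auth backend that validates tokens against GitLab's signing keys
resource "vault_jwt_auth_backend" "gitlab" {
  description  = "GitLab CI JWT authentication"
  path         = "jwt"
  jwks_url     = "https://gitlab.example.com/-/jwks"
  bound_issuer = "gitlab.example.com"
}
```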
Add a Vault role for GitLab to use to authenticate itself.
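A sketch of the role - the role and policy names are assumptions for this example:

```hcl
# Role binding GitLab CI JWTs to a Vault policy
resource "vault_jwt_auth_backend_role" "gitlab_deploy" {
  backend           = vault_jwt_auth_backend.gitlab.path
  role_name         = "gitlab-deploy"
  role_type         = "jwt"
  user_claim        = "user_email"
  bound_claims_type = "glob"
  bound_claims = {
    project_path = "your/group/*"
  }
  token_policies = ["gitlab-deploy"]

  # Keep issued tokens short-lived; the CI job only needs them briefly
  token_explicit_max_ttl = 600
}
```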
This role will allow any GitLab project within the namespace your/group/* to authenticate against the role using the generated JWT for the job. As you can see, we could check various GitLab JWT bound_claims here.
And the policy:
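A sketch of the Terraform resource, loading the policy from a file (the path is an assumption):

```hcl
resource "vault_policy" "gitlab_deploy" {
  name   = "gitlab-deploy"
  policy = file("policies/gitlab-deploy.hcl")
}
```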
The contents of the policy file itself. This example is somewhat contrived, since in a real environment you'd have different backend roles:
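A sketch of the policy file, granting access to both backends mentioned earlier (the backend role names are assumptions):

```hcl
# Generate temporary AWS credentials via STS for the deploy IAM role
path "aws/sts/gitlab-deploy" {
  capabilities = ["read", "update"]
}

# Generate Nomad tokens tied to the deploy ACL policy
path "nomad/creds/gitlab-deploy" {
  capabilities = ["read"]
}
```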
That's it. GitLab now has permission to request secrets from Vault, in specific jobs, for specific purposes!
That covers the basic flow, but a couple of Vault Enterprise features can extend it. Firstly, control groups could be used to pause a CD pipeline deployment to production until a human confirms access is allowed, via the use of a control group policy.
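Control groups are a Vault Enterprise feature; a sketch of such a policy might look like this (the path and identity group name are assumptions):

```hcl
# Production credentials require sign-off from an approver before release
path "aws/sts/gitlab-deploy-production" {
  capabilities = ["read", "update"]
  control_group {
    factor "approvers" {
      identity {
        group_names = ["deployment-approvers"]
        approvals   = 1
      }
    }
  }
}
```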
Secondly, Sentinel can be used to drive fine-grained decisions on policy using the GitLab JWT claims. For example, a conditional rule that requires the control group flow described above in production, but not in development environments.
So, putting all this together results in a GitLab CI Pipeline. Here is an example I am using to manage deployments into AWS:
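A sketch of the `.gitlab-ci.yml` - the Vault role and backend paths follow the examples above, and the merge-request comment step is omitted for brevity:

```yaml
stages:
  - validate
  - plan
  - apply

before_script:
  # Authenticate to Vault with the job's JWT, then fetch temporary AWS credentials
  - export VAULT_TOKEN="$(vault write -field=token auth/jwt/login role=gitlab-deploy jwt=$CI_JOB_JWT)"
  - export AWS_CREDS="$(vault write -format=json aws/sts/gitlab-deploy ttl=15m)"
  - export AWS_ACCESS_KEY_ID="$(echo $AWS_CREDS | jq -r .data.access_key)"
  - export AWS_SECRET_ACCESS_KEY="$(echo $AWS_CREDS | jq -r .data.secret_key)"
  - export AWS_SESSION_TOKEN="$(echo $AWS_CREDS | jq -r .data.security_token)"
  - terraform init

validate:
  stage: validate
  script:
    - terraform validate
  rules:
    - if: $CI_PIPELINE_SOURCE == "push"

plan:
  stage: plan
  script:
    - terraform plan
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"

apply:
  stage: apply
  script:
    - terraform apply -auto-approve
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
```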
This is a fairly standard CI/CD pipeline for AWS.
On a push to a branch, the pipeline will run terraform validate.
For every merge request raised, it will run terraform plan and attach the results of that plan to the merge request as a comment.
I have the GitLab repository itself configured to only allow a merge after all comments are resolved and the CI pipeline has been successful.
Once the MR is merged, terraform apply is run to apply the plan. This is all fine and usual, but the interesting part is in these two lines:
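Reconstructed from the pipeline, the two lines in question look roughly like this (role and backend names as configured in the Vault examples above):

```shell
export VAULT_TOKEN="$(vault write -field=token auth/jwt/login role=gitlab-deploy jwt=$CI_JOB_JWT)"
export AWS_CREDS="$(vault write -format=json aws/sts/gitlab-deploy ttl=15m)"
```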
First, this authenticates the CI job with Vault, and stores the returned Vault token in an environment variable. This token is tied to specific Vault policies by the role configured in Vault.
Next, the pipeline requests temporary AWS credentials from Vault to allow running Terraform against the AWS infrastructure. These AWS credentials are requested using the Vault token retrieved in the previous step.
The pipeline can now complete using the generated temporary credentials, and the GitLab runner itself has no inherent permission to AWS.
As mentioned above, this can be used for many deployment targets from within a GitLab pipeline. For example, the same for Nomad:
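A sketch of a Nomad deploy job - the Vault role, the Nomad backend role, and the job file name are assumptions:

```yaml
deploy:
  stage: deploy
  script:
    # Authenticate with the job JWT, then obtain a short-lived Nomad token
    - export VAULT_TOKEN="$(vault write -field=token auth/jwt/login role=gitlab-deploy jwt=$CI_JOB_JWT)"
    - export NOMAD_TOKEN="$(vault read -field=secret_id nomad/creds/gitlab-deploy)"
    - nomad job run deployment.nomad
```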