DIY StackSets with Step Functions
Why Recreate StackSets
Currently, using StackSets in any way requires fairly substantial permissions in the target account. Per the documentation, the minimum permissions to operate StackSets are:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudformation:*",
                "s3:*",
                "sns:*"
            ],
            "Resource": "*"
        }
    ]
}
Granting full S3, CloudFormation and SNS access cross-account into production is something we want to avoid. But StackSets give us a highly efficient and useful way to deploy stacks across large swathes of the estate. So how much effort is it to build something that gives us some of the same functionality with much more restricted permissions?
Step Functions to the Rescue?
With the new Step Functions releases:
- Step Functions can now be invoked directly from CodePipeline
- Step Functions have been added to AWS SAM
These seem like a good reason to use Step Functions to build StackSets-lite, so here we go…
The End Goal
By the end of the post we should have:
- A DIY StackSet Proof of Concept consisting of:
  - A pipeline that deploys and executes a step function
  - A step function that deploys a template out to all accounts in an organization
  - Cross-account roles that stay true to the principle of least privilege
Prerequisites
- An AWS organization set up
- The AWS CLI installed
- Git installed
- A GitHub account
Pipeline Set Up
The pipeline we have here is currently predicated on the source code being in GitHub, so let’s set that up:
- Import the example repo in GitHub
  - Go to https://github.com/new/import
  - Enter the URL https://github.com/JoshArmi/sam-pipeline
  - Name the repository
  - Begin the import
- Create an OAuth token for the pipeline
  - Go to https://github.com/settings/tokens
  - Click Personal access tokens
  - Click Generate new token
  - Give it full repo and full admin:repo_hook permissions
  - Generate the token
  - Note the token somewhere
- Once the repo successfully imports, make sure there are 7 remote branches
- Clone the imported repository locally and check out branch main
- Run git merge origin/part1
- Update the values in pipeline-parameters.json
- Update the values in seed-parameters.json
- Add, commit and push the changes in pipeline-parameters.json and seed-parameters.json
- Assume a role in the pipeline account
- Push your GitHub token into the AWS account
  - Run:
    aws cloudformation create-stack --stack-name GitHubOAuthSecret --template-body file://secret.yaml
  - Run:
    aws secretsmanager put-secret-value --secret-id GitHubOAuthToken --secret-string "{\"token\":\"YOUR_GITHUB_TOKEN\"}"
- Run:
  aws cloudformation create-stack --stack-name Pipeline --template-body file://pipeline.yaml --parameters file://seed-parameters.json --capabilities CAPABILITY_NAMED_IAM
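If you would rather not hand-escape the JSON in the put-secret-value command, the same step can be sketched in Python. This is a minimal sketch and not part of the example repo: it assumes boto3 is installed and that the caller passes in a Secrets Manager client created with the pipeline-account credentials. The function names are made up for illustration; the secret ID and the token key are taken from the command above.

```python
import json


def build_secret_string(token: str) -> str:
    """Serialise the token in the {"token": "..."} shape used by the CLI command."""
    # json.dumps handles the quoting, so no manual backslash escaping is needed.
    return json.dumps({"token": token})


def store_github_token(secrets_client, token: str,
                       secret_id: str = "GitHubOAuthToken") -> None:
    """Store the token using Secrets Manager's PutSecretValue.

    secrets_client is assumed to be boto3.client("secretsmanager"),
    created by the caller after assuming a role in the pipeline account.
    """
    secrets_client.put_secret_value(
        SecretId=secret_id,
        SecretString=build_secret_string(token),
    )
```

Passing the client in, rather than creating it inside the function, keeps the credentials decision with the caller and makes the function easy to exercise with a stub.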
Pit Stop
OK, let’s review where we are:
- If you look in the console we should have a pipeline that looks like this:
Because this pipeline redeploys itself, we can work through the rest of the post simply by pushing to our repository.
Adding The Step Function
First up we’re going to deploy the world’s simplest step function, then we’ll look at how to automatically execute the step function after deploying it.
- Run git merge origin/part2
- Run git push
Let’s look at what changes we’ve made:
- The pipeline now has a deployment stage executing the buildspec.yaml file
- The buildspec deploys the SAM app contained in template.yaml
- The SAM app creates a step function per the ASL file with a single task
- The single task is a new Lambda function which just exits
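To make the last two points concrete, here is a rough sketch of what a single-task state machine and a do-nothing Lambda handler can look like. It is an illustration rather than the contents of the repo: the state name, the function names and the placeholder ARN are all assumptions, and in the real SAM app the definition lives in an ASL file with the function ARN substituted at deploy time.

```python
import json

# Hypothetical placeholder; a real deployment substitutes the function's ARN.
PLACEHOLDER_LAMBDA_ARN = "arn:aws:lambda:REGION:ACCOUNT_ID:function:placeholder"


def build_definition(lambda_arn: str) -> dict:
    """Build an Amazon States Language definition with one Task state."""
    return {
        "Comment": "Single-task placeholder state machine",
        "StartAt": "Placeholder",
        "States": {
            "Placeholder": {
                "Type": "Task",         # a Task state invokes the Lambda function
                "Resource": lambda_arn,
                "End": True,            # no further states, so the execution ends here
            }
        },
    }


def handler(event, context):
    """Lambda entry point that does no work and returns immediately."""
    return {}


if __name__ == "__main__":
    # Print the definition as the JSON that would sit in the ASL file.
    print(json.dumps(build_definition(PLACEHOLDER_LAMBDA_ARN), indent=2))
```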
Once the pipeline finishes executing it should look something like:
With the pipeline successfully deploying the step function, we can now add an execution stage:
- Run git merge origin/part3
- Run git push
Now when we look at the pipeline we should see four stages, with the last being a successful execution of the step function.
Recap of where we are
So we have a pipeline that deploys and executes a placeholder step function, which looks something like:
We need to make the step function do something useful.
Stepping Into Something Useful
Now we’ll bring in a step function that actually purports to do something, so:
- Run git merge origin/part4
- Update the parameter defaults of template.yaml
- Add, commit and push the changes
- Accept the email confirming the SNS subscription
Looking at the Lambda function code, we can see that currently the functions do not have sufficient permissions to undertake their tasks.
So let’s look at fixing that:
- Run git merge origin/part5
- Assume a role in your billing account
  - Run:
    aws cloudformation deploy --stack-name AccountLister --template-file master-role.yaml --capabilities CAPABILITY_NAMED_IAM
  - Run:
    aws cloudformation deploy --stack-name CrossAccountDeploy --template-file client-role.yaml --capabilities CAPABILITY_NAMED_IAM
- For each other account in your organisation, including the pipeline account:
  - Run:
    aws cloudformation deploy --stack-name CrossAccountDeploy --template-file client-role.yaml --capabilities CAPABILITY_NAMED_IAM
- Run git push
We’ve now deployed with the last push:
- A role in the billing account allowing us to look up all accounts under the organisation
- A role in every account that allows us to deploy tagged CloudFormation stacks and S3 buckets
- Another pipeline extension that pushes templates/template.yaml into an S3 bucket to deploy into the accounts
- A bucket in every account in the organization, as per ./templates/template.yaml
- Email notifications to a preset address whenever a deployment fails
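The two roles above map onto two calls inside the Lambda functions: listing the organisation's accounts, then assuming the deployment role in each one. The sketch below shows roughly how that could look with boto3. It is not the repo's code. The clients are passed in by the caller (the Organizations client is assumed to have been created with credentials for the billing-account role), the session name is invented, and the default role name CrossDeployRole is taken from this post rather than verified against the templates.

```python
def list_active_account_ids(org_client) -> list:
    """Return the IDs of all ACTIVE accounts via the ListAccounts paginator.

    org_client is assumed to be boto3.client("organizations") using
    credentials that are allowed to call organizations:ListAccounts.
    """
    account_ids = []
    paginator = org_client.get_paginator("list_accounts")
    for page in paginator.paginate():
        for account in page["Accounts"]:
            # Skip suspended or pending-closure accounts; we cannot deploy there.
            if account["Status"] == "ACTIVE":
                account_ids.append(account["Id"])
    return account_ids


def assume_deploy_role(sts_client, account_id: str,
                       role_name: str = "CrossDeployRole") -> dict:
    """Assume the cross-account role and return its temporary credentials.

    sts_client is assumed to be boto3.client("sts"). The returned dict holds
    AccessKeyId, SecretAccessKey and SessionToken, which can be used to build
    a CloudFormation client for the target account.
    """
    response = sts_client.assume_role(
        RoleArn=f"arn:aws:iam::{account_id}:role/{role_name}",
        RoleSessionName="diy-stackset-deploy",  # hypothetical session name
    )
    return response["Credentials"]
```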
Success?
If you go and look in all accounts where you deployed the CrossDeployRole, you should now have a CloudFormation stack called Bucket.
With the goal being to replace StackSets, at this point we have a Proof of Concept showing that we can build a system for pushing CloudFormation out without needing the same level of permissions as StackSets.
However, the code as it stands only covers the happy path when handling CloudFormation stacks. We can extend the code, but the amount of complexity we have to handle starts to increase: currently we’re only accounting for the 3 most common out of about 25 possible CloudFormation states.
Step functions do provide a visual way to understand the complexity, and it does seem achievable to build a deployment mechanism that is both robust and maintainable.
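To give a feel for the extra complexity, here is one way the decision "what do we do with a stack in this state?" could be sketched. This is an assumption-laden illustration and not what the repo does: the Action names are invented, the mapping is deliberately partial, and anything unrecognised is routed to a human instead of being guessed at.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    CREATE = "create"
    UPDATE = "update"
    WAIT = "wait"
    RECREATE = "delete, then create"
    ALERT = "alert a human"


# Stable states from which an update can be started.
UPDATABLE = {"CREATE_COMPLETE", "UPDATE_COMPLETE", "UPDATE_ROLLBACK_COMPLETE"}


def next_action(stack_status: Optional[str]) -> Action:
    """Map a CloudFormation stack status to the next deployment step.

    None means no stack with that name was found.
    """
    if stack_status is None or stack_status == "DELETE_COMPLETE":
        return Action.CREATE
    if stack_status in UPDATABLE:
        return Action.UPDATE
    if stack_status == "ROLLBACK_COMPLETE":
        # A stack whose creation rolled back must be deleted before a retry.
        return Action.RECREATE
    if stack_status == "REVIEW_IN_PROGRESS":
        # Waiting on an unexecuted change set; polling will not resolve it.
        return Action.ALERT
    if stack_status.endswith("_IN_PROGRESS"):
        return Action.WAIT
    # Failed states and anything not handled above.
    return Action.ALERT
```

In a step function, each Action would typically become a branch of a Choice state, with WAIT looping back through a Wait state to check the status again.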
Cleanup
To destroy all resources created:
- In the pipeline account
  - Delete the Pipeline stack
  - Delete the sam-app stack
  - Delete the CrossAccountDeploy stack
  - Delete the bucket created by the Bucket stack
  - Delete the Bucket stack
- In the billing account
  - Delete the AccountLister stack
  - Delete the CrossAccountDeploy stack
  - Delete the bucket created by the Bucket stack
  - Delete the Bucket stack
- In all other accounts
  - Delete the CrossAccountDeploy stack
  - Delete the bucket created by the Bucket stack
  - Delete the Bucket stack
Next Steps
There are a variety of options for the next system extension, including:
- Allowing for deploying to stages of accounts to minimise blast radius
- Handling more CloudFormation stack states
- Collapsing errors down to one summary email
- Emailing a summary of all successful deployments
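As a taste of the first option, staged deployment can be sketched as splitting the account list into waves and stopping as soon as a wave contains a failure. This is only a sketch of the idea: the function names, the wave sizes and the deploy callback (which stands in for whatever deploys to a single account and reports success) are all hypothetical.

```python
from typing import Callable, List


def plan_waves(account_ids: List[str], wave_sizes: List[int]) -> List[List[str]]:
    """Split accounts into consecutive waves of the given (positive) sizes.

    Any accounts left over after the listed sizes form one final wave.
    """
    waves = []
    start = 0
    for size in wave_sizes:
        if start >= len(account_ids):
            break
        waves.append(account_ids[start:start + size])
        start += size
    if start < len(account_ids):
        waves.append(account_ids[start:])
    return waves


def deploy_in_waves(waves: List[List[str]],
                    deploy: Callable[[str], bool]) -> List[List[str]]:
    """Deploy wave by wave and return the waves that were attempted.

    Every account in a wave is attempted; later waves are skipped as soon
    as any account in the current wave reports a failure.
    """
    attempted = []
    for wave in waves:
        results = [deploy(account_id) for account_id in wave]
        attempted.append(wave)
        if not all(results):
            break
    return attempted
```

For example, plan_waves(["1", "2", "3", "4", "5"], [1, 2]) gives a one-account canary wave, a two-account wave, and a final wave with the remaining two accounts.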
Thoughts?
Reach out to me on Twitter or LinkedIn. I’d love to hear what people’s opinions of StackSets are, and whether they’ve journeyed down a similar path.