DIY StackSets with Step Functions

Why Recreate StackSets

Currently, using StackSets in any way requires fairly substantial permissions in the target account. Per the documentation, the minimum permissions to operate StackSets are:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": 
               [
                 "cloudformation:*",
                 "s3:*",
                 "sns:*"
               ],
            "Resource": "*"
        }
      ]
}

Having full S3, CloudFormation and SNS access cross-account into production is something we want to avoid. But StackSets give us a highly efficient and useful way to deploy stacks across large swathes of the estate. So how much effort is it to build something that gives us some of the same functionality with much more restricted permissions?

Step Functions to the Rescue?

With the new Step Functions releases:

  1. Step Functions can now be invoked directly from CodePipeline
  2. Step Functions have been added to AWS SAM

These seem like a good reason to use Step Functions to build StackSets-lite, so here we go…

The End Goal

By the end of the post we should have:

  • A DIY StackSet Proof of Concept consisting of:
    1. A pipeline that deploys and executes a step function
    2. A step function that deploys a template out to all accounts in an organization
    3. Cross-account roles that stay true to the principle of least privilege

Prerequisites

  1. An AWS organization set up
  2. The AWS CLI installed
  3. Git installed
  4. A GitHub account

Pipeline Set Up

The pipeline here currently assumes the source code is in GitHub, so let’s set that up:

  1. Import the example repo in GitHub
    1. Go to https://github.com/new/import
    2. Enter the URL https://github.com/JoshArmi/sam-pipeline
    3. Name the repository
    4. Begin the import
  2. Create an OAuth token for the pipeline
    1. Go to https://github.com/settings/tokens
    2. Click Personal access tokens
    3. Click Generate new token
    4. Give it full repo and full admin:repo_hook permissions
    5. Generate the token
    6. Note the token somewhere
  3. Once the repo successfully imports, make sure there are 7 remote branches
  4. Clone the imported repository locally and check out the main branch
  5. Run git merge origin/part1
  6. Update the values in pipeline-parameters.json
  7. Update the values in seed-parameters.json
  8. Add, Commit and Push the changes in pipeline-parameters.json and seed-parameters.json
  9. Assume a role in the pipeline account
  10. Push your GitHub token into the AWS account
    1. Run aws cloudformation create-stack --stack-name GitHubOAuthSecret --template-body file://secret.yaml
    2. Run aws secretsmanager put-secret-value --secret-id GitHubOAuthToken --secret-string "{\"token\":\"YOUR_GITHUB_TOKEN\"}"
  11. Run aws cloudformation create-stack --stack-name Pipeline --template-body file://pipeline.yaml --parameters file://seed-parameters.json --capabilities CAPABILITY_NAMED_IAM
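The hand-escaped JSON in step 10.2 is easy to get wrong, so here is a minimal sketch that builds the same secret string programmatically. The `token` key and the `GitHubOAuthToken` secret ID come from the command above; the helper names are hypothetical.

```python
import json
import shlex


def secret_string(token: str) -> str:
    """Serialise the token into the JSON shape used in step 10.2."""
    return json.dumps({"token": token})


def put_secret_command(token: str) -> str:
    """Build the put-secret-value command with shell-safe quoting."""
    return " ".join([
        "aws", "secretsmanager", "put-secret-value",
        "--secret-id", "GitHubOAuthToken",
        # shlex.quote wraps the JSON so the shell passes it through untouched
        "--secret-string", shlex.quote(secret_string(token)),
    ])
```

Printing `put_secret_command("YOUR_GITHUB_TOKEN")` gives a command that can be pasted into a POSIX shell without manual backslash escaping.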

Pit Stop

OK, let’s review where we are. If you look in the console, we should have a pipeline that looks like this:

Stage 1 Pipeline

As this pipeline redeploys itself, we can work through the rest of the post simply by pushing to our repository.

Adding The Step Function

First up, we’re going to deploy the world’s simplest step function. Then we’ll look at how to execute it automatically after deploying it.

  1. Run git merge origin/part2
  2. Run git push

Let’s look at what changes we’ve made:

  • The pipeline now has a deployment stage executing the buildspec.yaml file
  • The buildspec deploys the SAM app contained in template.yaml
  • The SAM app creates a step function per the ASL file with a single task
  • The single task is a new Lambda function which just exits
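To make the shape of this concrete, here is a minimal sketch of a one-task state machine and a Lambda handler that just exits. It is an illustration under assumptions, not the repository's actual ASL file or function code; the state name `Placeholder` and the example ARN are hypothetical.

```python
import json
from typing import Any, Dict


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Placeholder Lambda task: do no work and return the input unchanged."""
    return event


def single_task_definition(function_arn: str) -> str:
    """Return an Amazon States Language document with one Task state."""
    definition = {
        "Comment": "Single-task placeholder state machine",
        "StartAt": "Placeholder",
        "States": {
            "Placeholder": {
                "Type": "Task",
                "Resource": function_arn,  # ARN of the Lambda function to invoke
                "End": True,               # the only state, so execution ends here
            }
        },
    }
    return json.dumps(definition, indent=2)
```

Calling `single_task_definition("arn:aws:lambda:eu-west-1:111111111111:function:placeholder")` produces JSON of the kind a SAM state machine resource can point at.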

Once the pipeline finishes executing it should look something like:

Stage 2 Pipeline

The pipeline should now deploy the step function successfully, so we can add an execution stage:

  1. Run git merge origin/part3
  2. Run git push

Now when we look at the pipeline we should see four stages, with the last being a successful execution of the step function.

Recap of where we are

So we have a pipeline that deploys and executes a placeholder step function. It looks something like:

Stage 3 Pipeline

We need to make the step function do something useful.

Stepping Into Something Useful

Now we’ll bring in a step function that actually purports to do something, so:

  1. Run git merge origin/part4
  2. Update the parameter defaults of template.yaml
  3. Add, Commit and Push the changes
  4. Accept the email confirming the SNS subscription

Looking at the Lambda function code, we can see that currently the functions do not have sufficient permissions to undertake their tasks.

So let’s look at fixing that:

  1. Run git merge origin/part5
  2. Assume a role in your billing account
    1. Run aws cloudformation deploy --stack-name AccountLister --template-file master-role.yaml --capabilities CAPABILITY_NAMED_IAM
    2. Run aws cloudformation deploy --stack-name CrossAccountDeploy --template-file client-role.yaml --capabilities CAPABILITY_NAMED_IAM
  3. For each other account in your organisation, including the pipeline account:
    1. Run aws cloudformation deploy --stack-name CrossAccountDeploy --template-file client-role.yaml --capabilities CAPABILITY_NAMED_IAM
  4. Run git push

Between the role stacks and the last push, we now have:

  • A role in the billing account allowing us to look up all accounts under the organisation
  • A role in every account that allows us to deploy tagged CloudFormation stacks and S3 buckets
  • Another pipeline extension that pushes templates/template.yaml into an S3 bucket to deploy into the accounts
  • A bucket deployed into every account in the organization as per ./templates/template.yaml
  • An email sent to a preset email address whenever a deployment fails
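As a rough sketch of how Lambda functions could use these roles (an illustration under assumptions, not the repository's actual code): list the active accounts through Organizations, then create the tagged Bucket stack in each account through an assumed role. The helper names, the session name and any role ARN or tag key you pass in are hypothetical; the real role names live in master-role.yaml and client-role.yaml.

```python
from typing import Any, Dict, List


def assume_role_session(role_arn: str, session_name: str = "diy-stackset") -> Any:
    """Return a boto3 Session holding temporary credentials for role_arn."""
    import boto3  # imported lazily so the helpers below work without boto3

    credentials = boto3.client("sts").assume_role(
        RoleArn=role_arn, RoleSessionName=session_name
    )["Credentials"]
    return boto3.Session(
        aws_access_key_id=credentials["AccessKeyId"],
        aws_secret_access_key=credentials["SecretAccessKey"],
        aws_session_token=credentials["SessionToken"],
    )


def active_account_ids(organizations_client: Any) -> List[str]:
    """List the IDs of every ACTIVE account in the organization."""
    paginator = organizations_client.get_paginator("list_accounts")
    return [
        account["Id"]
        for page in paginator.paginate()
        for account in page["Accounts"]
        if account["Status"] == "ACTIVE"  # skip suspended accounts
    ]


def create_bucket_stack(cloudformation_client: Any, template_url: str,
                        tags: Dict[str, str]) -> str:
    """Start creating the Bucket stack and return its stack ID."""
    response = cloudformation_client.create_stack(
        StackName="Bucket",
        TemplateURL=template_url,
        # Stack-level tags, assuming the cross-account role only permits tagged stacks
        Tags=[{"Key": key, "Value": value} for key, value in tags.items()],
    )
    return response["StackId"]
```

With real credentials, the pieces would be joined by calling `assume_role_session(...)` once for the billing account to get an `organizations` client, and once per target account to get a `cloudformation` client for `create_bucket_stack`.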

Success?

If you go and look in all accounts where you deployed the CrossAccountDeploy stack, you should now have a CloudFormation stack called Bucket.

With the goal being to replace StackSets, at this point we have a Proof of Concept showing that we can build a system for pushing CloudFormation out without needing the same level of permissions as StackSets.

However, the code as it stands only covers the happy path when it comes to handling CloudFormation stacks. We can extend the code, but the amount of complexity we have to handle starts to increase. Currently we’re only accounting for the 3 most common out of about 25 possible CloudFormation states.
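One possible way to widen coverage without writing a branch per status is to collapse every stack status into three outcomes. This is a sketch under assumptions, not what the repository does: the outcome labels and the function name are hypothetical, and the status strings are the standard CloudFormation stack status codes.

```python
# Statuses that mean the requested create, update or import finished cleanly.
SUCCESS_STATUSES = {"CREATE_COMPLETE", "UPDATE_COMPLETE", "IMPORT_COMPLETE"}


def classify_stack_status(status: str) -> str:
    """Map a CloudFormation stack status to succeeded, in_progress or failed."""
    if status in SUCCESS_STATUSES:
        return "succeeded"
    if status == "REVIEW_IN_PROGRESS":
        # The stack is waiting on a change set to be executed,
        # so polling alone would never see it finish.
        return "failed"
    if status.endswith("_IN_PROGRESS"):
        # Covers creates, updates, rollbacks and cleanup phases still running.
        return "in_progress"
    # Everything else: *_FAILED, rollback completions, DELETE_COMPLETE.
    return "failed"
```

A polling state in the step function could loop while the result is `in_progress` and route `failed` to the notification path, which keeps the state machine small while still treating rollbacks as failures.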

Step functions do provide a visual way to understand the complexity, and it does seem achievable to build a deployment mechanism that is both robust and maintainable.

Cleanup

To destroy all resources created:

  1. In the pipeline account
    1. Delete the Pipeline stack
    2. Delete the sam-app stack
    3. Delete the CrossAccountDeploy stack
    4. Delete the bucket created by the Bucket stack
    5. Delete the Bucket stack
  2. In the billing account
    1. Delete the AccountLister stack
    2. Delete the CrossAccountDeploy stack
    3. Delete the bucket created by the Bucket stack
    4. Delete the Bucket stack
  3. In all other accounts
    1. Delete the CrossAccountDeploy stack
    2. Delete the bucket created by the Bucket stack
    3. Delete the Bucket stack

Next Steps

There are a variety of options for the next system extension, including:

  • Allowing for deploying to stages of accounts to minimise blast radius
  • Handling more CloudFormation stack states
  • Collapsing errors down to one summary email
  • Emailing a summary of all successful deployments
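Two of these extensions can be sketched in a few lines: splitting accounts into waves so that a bad template only reaches a few accounts first, and collapsing per-account outcomes into a single summary message. Both helpers are hypothetical illustrations; the `"succeeded"` label and the message layout are assumptions, not part of the existing code.

```python
from typing import Dict, Iterator, List


def waves(account_ids: List[str], wave_size: int) -> Iterator[List[str]]:
    """Yield successive groups of at most wave_size account IDs."""
    if wave_size < 1:
        raise ValueError("wave_size must be at least 1")
    for start in range(0, len(account_ids), wave_size):
        yield account_ids[start:start + wave_size]


def summarise(results: Dict[str, str]) -> str:
    """Build one message body from a mapping of account ID to outcome."""
    succeeded = sorted(a for a, outcome in results.items() if outcome == "succeeded")
    failed = sorted(a for a, outcome in results.items() if outcome != "succeeded")
    lines = [f"{len(succeeded)} succeeded, {len(failed)} failed"]
    # Failures go first so they are visible without scrolling.
    lines += [f"FAILED {account}: {results[account]}" for account in failed]
    lines += [f"OK {account}" for account in succeeded]
    return "\n".join(lines)
```

A deployment loop could process one wave at a time, stop when any account in the wave fails, and publish the output of `summarise` as a single SNS message instead of one email per failure.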

Thoughts?

Reach out to me on Twitter or LinkedIn. I’d love to hear what people’s opinions of StackSets are, and whether they’ve journeyed down a similar path.