Intro to AWS Well-Architected Framework
Building on the cloud can be complicated, confusing, and just downright difficult if you are coming at it from the on-premises world or if you don’t have any prior experience with building systems that deliver and scale software in general. It’s one thing to be able to spin up a VM and host your static website, but building a highly available and scalable platform that can serve a fluctuating number of customers around the globe while guaranteeing a certain level of performance is another kettle of fish. It doesn’t end with getting something to work, either: you also have to ensure its security and cost-effectiveness. A tall order, to say the least.
Difficult as it may be, even the most experienced engineers and architects base their designs and recommendations on frameworks or sets of best practices that have a proven track record of not only working but also bringing a high degree of security, scalability, reliability, and efficiency to a cloud environment. In the case of AWS, that framework is the AWS Well-Architected Framework.
If you are building an environment of any size, the Well-Architected Framework should be your guiding light. Decisions around architecture planning are crucial to get right at the beginning.
Imagine trying to change the tires on a moving car. That’s what making architectural changes once the project is underway would feel like.
The framework is based on six pillars:
Operational Excellence
Security (The focus of this series)
Reliability
Performance Efficiency
Cost Optimization
Sustainability
It’s incredibly thorough and comprehensive, which can be both a blessing and a curse. The framework aims to guide and inform every environment and organization regardless of size and use case, which makes it quite a thick volume of docs to get through if you were to read it from top to bottom.
Initially, I wanted to write about IAM roles and how to optimize them so that they are as compliant as possible for what you are doing in your organization. Then I realized that it might be useful to take a step back and explore the context in which IAM roles exist. By exploring the Security pillar of the AWS Well-Architected Framework, we can shed some light on what we can do as developers to ensure the security of our workloads and cloud accounts without having to become solutions architects.
There are some instruction manuals such as the ones you get when you buy a bed from IKEA which are incredibly simple and straightforward. If only all instruction manuals and building frameworks for the cloud were the same. The difference is that the IKEA instructions have one use case and the individual jigsaw pieces of Scandinavian furniture only lock into place one way and generally serve just a single purpose. Therefore the steps are generally spread across 2 to 3 pages max, with huge cartoon graphics. This makes sense because there is only one way to assemble the bed correctly.
Building on the cloud is different though since there are many variations, architecture designs, patterns, and diagrams you can choose to best support what you are trying to build. So we are never going to have a 3-page leaflet that is going to give us all the answers.
It would be safe to say that the AWS Well-Architected Framework is the closest thing we have to an instruction manual: a vast compilation of the best practices for building modern systems in the cloud. What we are going to try to do in this series of articles is distill this mammoth framework into the most digestible and useful sets of information, specifically for startups and for the 99% of developers who might not want to use multiple accounts or who don’t have the resources to access the guidance of seasoned AWS experts.
Security in the cloud is composed of six main areas:
Foundations (The focus of this post)
Identity and access management
Detection
Infrastructure protection
Data protection
Incident response
We will be exploring the "foundations" area of the security pillar. Once a solid foundation is set we can then branch out into more in-depth topics.
Foundations
Shared responsibility
Building in the cloud has a wide array of benefits. You have many options to choose from and a seemingly endless number of paths to explore. Arguably though, one of the main benefits of building in the cloud over on-premises is that things that might have been your problem in the past are now on somebody else's plate. Many responsibilities don’t fall on your shoulders anymore. The weight on Ops and Cloud engineers has been alleviated so much that traditional sysadmin engineers who work on-premises are on average 2 inches shorter than cloud engineers, a study found. Jokes aside, by leveraging AWS you offload many of the tasks involved with securing, scaling, and updating infrastructure.
AWS General shared responsibility model
This shared model provides a lot of flexibility to the user around how to manage and deploy workloads and removes a lot of "toil".
The shared responsibility isn’t static though. By choosing tech stacks that leverage AWS-managed services to a higher degree, such as the ones that fall into the Serverless realm, we can offload even more of the annoying and cumbersome tasks to AWS.
Let’s take a look at the AWS Lambda shared responsibility model.
When using AWS Lambda, the only things we as users have to worry about are our code and identity and access management.
When working with a serverless stack built on AWS-managed services there is a lot to be gained by reducing your domain of action. By not having to think about server software and networking considerations we have more bandwidth to dedicate to writing great code.
Governance
Governance is the set of guardrails we put in place to support our business objectives. Through policies and control objectives, we can manage risk.
Depending on the size of your environment you might want to have an automated account vending process that will create new accounts inside an AWS Organization with a series of service control policies (SCPs) that establish guardrails of what can and can’t be done in the account. If you are at this stage a great way of automating the account vending process using IaC principles is by leveraging AWS Control Tower Account Factory for Terraform. This might be overkill for the 99% of developers or for startups.
That’s why we would cut right to the chase: apply a risk assessment matrix to our resources and let the results of that exploration inform how we govern permissions in our accounts (IAM roles and policies). By applying IAM roles and policies to the layers of user groups and resources, we embody the same governance principle as above without having a multi-account environment. We will look at an example of this a bit further down. Coupling the risk assessment with the principle of least privilege should be what informs our governance posture.
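As a minimal sketch of what least privilege can look like in practice, the Python snippet below builds an IAM policy document that grants read-only access to a single S3 bucket and nothing else. The helper name `read_only_bucket_policy` and the bucket name `example-reports-bucket` are hypothetical, chosen only for illustration. The `Version`, `Statement`, `Effect`, `Action`, and `Resource` keys follow the standard IAM JSON policy grammar.

```python
import json


def read_only_bucket_policy(bucket_name: str) -> dict:
    """Build an IAM policy document allowing read-only access to one bucket.

    Hypothetical helper for illustration; it only constructs the JSON
    document and does not call AWS.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                # Reading applies to the objects inside the bucket.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "example-reports-bucket" is a made-up bucket name.
    print(json.dumps(read_only_bucket_policy("example-reports-bucket"), indent=2))
```

The point of the sketch is the shape of the policy: no wildcard actions, no wildcard resources, and no write permissions. A policy like this would typically be attached to a user group or a role rather than to an individual user.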
AWS Account management
The framework recommends the separation of accounts as a best practice. By setting up an AWS Organization you can set common guardrails and physically isolate workloads, permissions, and environments.
In reality though, if you are working on a small team these levels of physical account separation and management might prove to be a bit cumbersome due to the expertise needed.
That’s why I think it’s prudent if you are in a smaller environment to bring the AWS Organization mindset to the level of a single account. There are many ways you can logically and physically separate permissions, workloads, and processes.
The most important thing is to have a way to centrally manage and enforce these rules. This is possible to do even within a single account.
You might want to create narrow and precise IAM user groups and attach to them IAM policies that follow the least privilege principle. Consider having multiple VPCs for different environments. If you are working with Kubernetes, you can use namespaces to ensure there is no resource overlap.
Additionally, you might even want to configure a cluster add-on like OPA Gatekeeper, which can allowlist certain images in particular namespaces, ensuring that nothing is deployed where it is not supposed to be.
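To make the idea of a per-namespace image allowlist concrete, here is a plain-Python sketch of the check such a policy performs. This is not Gatekeeper's own policy language or configuration; the mapping `ALLOWED_IMAGE_PREFIXES`, the function `is_image_allowed`, and the registry names are all hypothetical and exist only to illustrate the logic.

```python
# Hypothetical mapping of namespace -> allowed image prefixes (illustration only).
ALLOWED_IMAGE_PREFIXES = {
    "payments": ("registry.example.com/payments/",),
    "frontend": (
        "registry.example.com/frontend/",
        "registry.example.com/shared/",
    ),
}


def is_image_allowed(namespace: str, image: str) -> bool:
    """Return True only if the image comes from a prefix allowed for the namespace."""
    prefixes = ALLOWED_IMAGE_PREFIXES.get(namespace, ())
    # str.startswith accepts a tuple of prefixes. An empty tuple matches
    # nothing, so namespaces without an entry are denied by default.
    return image.startswith(prefixes)


if __name__ == "__main__":
    for ns, img in [
        ("payments", "registry.example.com/payments/api:1.2"),
        ("payments", "docker.io/library/nginx:latest"),
        ("staging", "registry.example.com/payments/api:1.2"),
    ]:
        print(ns, img, "allowed" if is_image_allowed(ns, img) else "denied")
```

The design choice worth noting is deny by default: a namespace that has no allowlist entry accepts no images at all, which mirrors the guardrail mindset described above.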
Even if you don’t have a sophisticated account vending process that allows you to easily spin AWS accounts up and down, you can ensure a high level of safety by doing things like:
Having just one or two IAM admins
Not using the root user for day-to-day work
Not attaching policies directly to users but adding users to user groups
Enabling MFA for all users
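The checklist above can be turned into a small automated review. The sketch below is a simplified illustration, assuming you have already exported your users into a plain data structure. The `IamUser` dataclass, the `audit_users` function, and the group name `admins` are hypothetical and are not part of any AWS SDK. Root user usage is not covered, since it cannot be derived from a list of IAM users.

```python
from dataclasses import dataclass, field


@dataclass
class IamUser:
    """Simplified, hypothetical view of an IAM user (not an AWS SDK type)."""
    name: str
    mfa_enabled: bool
    groups: list[str] = field(default_factory=list)
    directly_attached_policies: list[str] = field(default_factory=list)


def audit_users(
    users: list[IamUser],
    admin_group: str = "admins",
    max_admins: int = 2,
) -> list[str]:
    """Return human-readable findings for the hygiene rules in the checklist."""
    findings = []
    for user in users:
        if not user.mfa_enabled:
            findings.append(f"{user.name}: MFA is not enabled")
        if user.directly_attached_policies:
            findings.append(
                f"{user.name}: policies attached directly instead of via a group"
            )
    # Count members of the (hypothetical) admin group.
    admins = [u.name for u in users if admin_group in u.groups]
    if len(admins) > max_admins:
        findings.append(f"too many admins ({len(admins)}): {', '.join(admins)}")
    return findings


if __name__ == "__main__":
    sample = [
        IamUser("alice", True, ["admins"]),
        IamUser("bob", False, ["developers"], ["AmazonS3FullAccess"]),
    ]
    for finding in audit_users(sample):
        print(finding)
```

An empty result means none of the three rules were violated for the users you passed in. The limit of two admins is the figure from the checklist and can be adjusted through `max_admins`.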
Operating workloads securely
When it comes to operating our workloads securely we want to:
Identify and prioritize risk using a customized threat model.
Use automation as much as possible.
Stay up to date with the latest security risks and recommendations.
Derive control objectives from a custom threat model
Risk assessment and compliance are a huge undertaking, and the large ecosystem of auditing tools and experts out there can vouch for that. If your aim isn’t to achieve a compliance certification, you can cut out the majority of the fat and focus on simplified models that give clear insight and help you derive actionable control objectives and mitigation steps. It need not be a complex process.
We can use simple risk assessment tables to be aware of the level of risk associated with different resources, assets, components, use cases, actors, and entry points, among others. Let’s take a look at the example below, which assesses the access risk of a subset of business-critical resources.
In the sheet, we input the main resources in our environment and estimate the impact and likelihood of unauthorized READ and WRITE access to each of them. The level of risk is calculated as impact × likelihood. You can then choose the severity threshold at which a certain action is required. The great thing is that if a risk turns out to be under the actionable threshold, we can put it out of our minds.
In the example above, with the maximum risk score being 25, any resource scoring over 15 is considered high risk, and admin access to it should be restricted to just the SRE team. Setting this threshold at 15 is subjective, and you can determine the best threshold for your environment.
The example above is for calculating the access risk to the resources in the environment.
Swap out the resources and perform the same scrutiny on components, use cases, actors, and entry points to be fully covered. For each of the threats associated with each area, identify mitigations and review the processes on a routine basis that makes sense for your environment.
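The impact × likelihood scoring described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: a 1 to 5 scale for both impact and likelihood is inferred from the maximum score of 25, and the resource names and ratings are made up. The threshold of 15 is the subjective cut-off mentioned earlier.

```python
from dataclasses import dataclass

# Subjective cut-off used in the text; tune it for your environment.
HIGH_RISK_THRESHOLD = 15


@dataclass(frozen=True)
class AccessRisk:
    """One row of the risk sheet (hypothetical structure for illustration)."""
    resource: str
    access: str      # "READ" or "WRITE"
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)

    @property
    def score(self) -> int:
        # Risk level = impact x likelihood, so the maximum is 25.
        return self.impact * self.likelihood

    @property
    def is_high(self) -> bool:
        # Strictly over the threshold, matching "anything over 15".
        return self.score > HIGH_RISK_THRESHOLD


def high_risks(risks: list[AccessRisk]) -> list[AccessRisk]:
    """Return only the high-risk rows, highest score first."""
    return sorted((r for r in risks if r.is_high), key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    # Made-up resources and ratings.
    sheet = [
        AccessRisk("customer-db", "WRITE", impact=5, likelihood=4),
        AccessRisk("customer-db", "READ", impact=5, likelihood=3),
        AccessRisk("static-assets-bucket", "READ", impact=1, likelihood=3),
    ]
    for risk in high_risks(sheet):
        print(f"{risk.resource} {risk.access}: {risk.score} -> restrict admin access")
```

Rows at or below the threshold are simply dropped from the output, which reflects the idea that risks under the actionable threshold need no further attention. The same structure works for components, use cases, actors, and entry points by swapping out what each row describes.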
Automate as much as possible
AWS has a fair amount of managed services that are going to make your security automation tasks a lot easier. Let’s take a look at some of them:
Amazon GuardDuty: it analyzes VPC Flow Logs, AWS CloudTrail event logs, and DNS logs to detect threats and anomalous activity. It has a 30-day free trial.
AWS Config: a service that records your resource configurations, evaluates them against rules, and can alert you when a rule is broken.
AWS CloudFormation: can help with deploying compliant stacks. You can check that a template is compliant by using AWS CloudFormation Guard.
AWS Systems Manager: you can centralize access to VMs through the service and also use it to automate security patching.
You can actually use a preconfigured CloudFormation stack to roll out a series of security-centric detective controls built on AWS managed services, which could be a great starting point. The cost associated with the deployed services will be less than $2 per month.
Stay up to date on the latest security risks and recommendations
If you are not aware of the ever-changing security landscape, you might rapidly and unknowingly become vulnerable to attack. Actively staying up to date with the latest security practices and recommendations should be one of your highest priorities. Follow the AWS Security Bulletins and the What’s New with AWS feed.
Conclusion
Getting security right, like other undertakings in the cloud, is difficult because there is no one-size-fits-all solution. Depending on the nature of your environment and the workloads you build in your account, you will have to follow a different series of steps, and you will be exposed to certain types of risks from an ever-changing landscape of possible entry points. The fact that your architecture is so uniquely composable demands careful and custom security considerations. With a clear understanding of the common security foundations, you will be well positioned to detect and respond to security risks as they appear.
By building a threat model that is custom to your environment, you will be able to focus directly on the actions that are effective at protecting it. Use AWS-managed services to shift more of the burden of responsibility for certain security aspects over to AWS, and benefit from the easy automation that tools like Systems Manager, AWS Config, and Amazon GuardDuty can offer. By fitting security measures into your CI/CD pipelines and constantly keeping up to date with the latest security risks and recommendations, you can solidify your security stance without having to have a fleet of security professionals on the payroll.
Just as you want to lay the foundation of a house only once, you want to get the foundations of your security posture right at the very beginning. Having to make changes down the line because of a breach or a mistake can be very painful.
In the next article in the series, we will be looking at Identity and Access management. We will also center part 2 around more practical examples. See you then.