Intro and the attack vector
There’s been a ton of coverage of the recently discovered Capital One breach.
I’m generally very skeptical when AWS security makes the news; so far, most “breaches” have been a result of the customer implementing AWS services in an insecure manner, usually by allowing unrestricted internet access and often overriding defaults to remove safeguards (I’m looking at you, NICE and Accenture and Dow Jones!). Occasionally, a discovered “AWS vulnerability” impacts a large number of applications in AWS – and it also impacts any similarly-configured applications that are not in AWS (see, for example, this PR piece…um, I mean “article” from SiliconAngle). Again, this is a lack of basic security hygiene – anyone who’s worked in IT in the last 20 years knows that you need to patch any internet-facing software before an attacker finds it (and, incidentally, the time you have until a vulnerability gets found and exploited is continuously getting smaller, so you better find a way to automate that – but that’s another discussion for another post).
When I looked at the Capital One breach, I immediately assumed it would fit into one of those categories, but instead it looks like we finally have an honest-to-goodness AWS-specific hack. Furthermore, from what I can tell, it was the result of a customer trying to follow best practices.
Although I didn’t have a chance to look at the exploit before it was taken down, we can get some idea of how it worked from the text of the complaint (primarily by reading between the lines of the agent’s description of the attacker’s deployment). I’ll go into tech detail in another post, but the short version is that the attacker found a way – almost certainly through some misconfigured 3rd-party software – to get temporary AWS credentials from an EC2 instance’s metadata. The temporary credentials gave the attacker access to an S3 bucket that contained sensitive data, which she then posted online.
Notice that I didn’t write “a misconfigured EC2 instance” above; the EC2 configuration in question (an associated IAM role) is a recommended practice when developing applications for AWS. This is, unfortunately, an increasingly common issue with security-oriented tools and best practices: having them in place but not using them correctly (or, as in this attack, using them almost correctly) can sometimes be even worse than not using them at all. This is particularly heartbreaking to see as a security professional – they tried to do this correctly, but it completely backfired and opened a backdoor.
I will leave it to the reader to decide if Capital One should be forgiven for this; my personal opinion is that they have enough money and resources for a detailed security review, particularly for applications that will be collecting sensitive information from people. A cursory security review would probably have passed, but a deep dive probably would have revealed the underlying vulnerabilities (or at least reduced or eliminated the impact).
How the attack worked
A bit of background: as I stated before, the first step in the attack was to obtain the credentials for an IAM role. I won’t do a deep dive into IAM roles here – the short version is that EC2 can associate a server (“instance” in AWS parlance) with a set of permissions. In order to actually use those permissions, a user or application requests a special URL via HTTP that can only be reached from the instance itself – the HTTP response includes temporary AWS credentials that grant those permissions (see here for AWS’s documentation on the process).
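To make this concrete, here’s a minimal sketch of the two HTTP calls involved, using Python’s requests library and the older IMDSv1-style endpoint (in normal operation the AWS SDK does this for you behind the scenes, and the endpoint is only reachable from the instance itself):

```python
import requests

# The instance metadata service lives at a fixed link-local address.
METADATA_URL = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"

# First call: returns the name of the IAM role associated with this instance.
role_name = requests.get(METADATA_URL, timeout=2).text.strip()

# Second call: returns a JSON document containing temporary credentials.
creds = requests.get(METADATA_URL + role_name, timeout=2).json()

# The interesting fields - together these grant whatever permissions the role has.
print(creds["AccessKeyId"], creds["SecretAccessKey"][:4] + "...", creds["Token"][:12] + "...")
```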
Overall, this is a pretty good setup and far superior to embedding static credentials directly in your code base – I have personally seen two very high-impact incidents in which someone created an IAM user and accidentally committed the credentials to a public GitHub repo, which is exactly what this mechanism is meant to protect against. However, it does create an interesting threat vector that otherwise wouldn’t exist: an EC2-specific variant of a Server-Side Request Forgery (SSRF) attack.
Let me give a simplified example of an SSRF attack: let’s imagine you cobbled together a widget for your site that’s meant to slurp in external content and make it fit in with the theme of your site. An attacker, inspecting your site, spots a call to “/widgets/format-external-content.php?external-site=[some_https_string]”. Just for grins, the attacker pastes that URL into a browser, but replaces [some_https_string] with the IAM metadata URL from the AWS documentation. Your widget, not knowing what to do with the JSON response, just spits it out more or less unchanged – and just like that, your credentials have been captured.
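Here’s a deliberately vulnerable sketch of what such a widget might look like – the endpoint and parameter names are hypothetical, and I’m using Python/Flask rather than PHP, but the bug is the same:

```python
from flask import Flask, request
import requests

app = Flask(__name__)

@app.route("/widgets/format-external-content")
def format_external_content():
    # The bug: fetch whatever URL the client supplies - no allow-list,
    # no validation - and return the body verbatim.
    external_site = request.args.get("external-site", "")
    body = requests.get(external_site, timeout=5).text
    return "<div class='external-content'>" + body + "</div>"

# An attacker simply sets external-site to the instance metadata URL
# (http://169.254.169.254/latest/meta-data/iam/security-credentials/...)
# and the page happily echoes the temporary credentials back to them.
```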
On page 6 of the complaint, the agent writes that the gist contained the IP address of a server and three commands, the first of which returned the credentials for “an account known as *****-WAF-Role” – a pretty strong indicator that the attack retrieved EC2 IAM credentials. Since we don’t have the original exploit, we don’t know exactly how it worked, but my money’s on some variation of the SSRF method described above.
Once the attacker had the credentials, it was pretty much game over; the credentials could be used to list the account’s buckets and their objects, and then to pick and choose which of those to download.
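For illustration, here’s roughly what that looks like with boto3 from anywhere on the Internet, assuming the role’s policy allows it (the credential values are placeholders standing in for what came out of the metadata service):

```python
import boto3

# Stolen temporary credentials - note the session token, which is required
# for credentials issued via an IAM role.
s3 = boto3.client(
    "s3",
    aws_access_key_id="ASIA...",      # AccessKeyId from the metadata response
    aws_secret_access_key="...",      # SecretAccessKey
    aws_session_token="...",          # Token
)

# Enumerate every bucket and object the role can see, then pick and choose.
for bucket in s3.list_buckets()["Buckets"]:
    for obj in s3.list_objects_v2(Bucket=bucket["Name"]).get("Contents", []):
        print(f"{bucket['Name']}/{obj['Key']}")
```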
This particular flavor of attack has been on my mind for years now and I’m surprised that it’s taken so long to surface. I’d also like to point out a couple of ironies here:
- This attack was made possible by engineers following an established best practice (using an associated IAM role for AWS credentials). Had they been using a non-recommended method like a config file, the application could not have been compromised this way.
- The name of the role ends in “WAF-Role”; combined with the fact that the attacker referenced an IP address, this indicates that the compromised server was a Web Application Firewall (WAF) – or possibly that the creators of the application just made a single role called “WAF-Role” and applied it to everything. Given that a WAF’s job is to filter out malicious requests, it’s especially ironic that it was the attack vector.
[Note 1: if anyone happens to have any of the contents of the original gist then I’d love to get a look at it to confirm these guesses – until then I’m going to draw my conclusions from the text of the complaint]
[Note 2: after writing this I saw a Krebs post in which the author claims to have some insider information that backs up my guesses and confirms that the WAF was itself the attack vector – always nice when an educated guess turns out to be correct]
Prevention
Finally, I’ll describe some measures that Capital One could have taken to prevent this kind of attack. However, before I do that, I do want to point out something in defense of Capital One; on the surface of it, this application probably looked secure. I don’t have any way to test most of these, but I’m going to guess they did the following:
- Only allowed required ports for their application, both internally and externally
- Enforced HTTPS on connections from the Internet
- Enabled automatic encryption of objects in the S3 buckets and EBS Volumes
- Used associated IAM roles rather than static credentials (*)
- Enabled CloudTrail (*)
- Implemented a Web Application Firewall (WAF) (*)
(*) – The complaint states or strongly implies that this was implemented
Honestly, this puts Capital One ahead of many other implementations I’ve seen. If Capital One followed a security review checklist (and I’m guessing they did), this application ticked all the boxes.
With that qualifier out of the way, here are some relatively easy additional steps Capital One could have taken to avoid this issue:
Easy step 1: practice least privilege in IAM Roles
Simply put, don’t give an application any more permissions than it needs. If this server was only functioning as a WAF (and not, for example, also as an application server), then it probably didn’t need any S3 access except perhaps to back up and restore its configuration. It definitely didn’t need the ability to list the S3 buckets owned by the account, and it probably didn’t need the ability to list anything at all. Had Capital One simply denied all “s3:List*” API access in the policy, the attacker could have had full read and write privileges yet still been effectively blind. Better still would be to allow only the specific S3 API calls the application requires, scoped to the specific resources it needs. As it is, the level of access implies that the Role simply had list and read privileges for all S3 buckets and objects in the account.
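As a sketch of what that might look like, here’s an inline policy for a WAF-only role that can back up and restore its own configuration and do nothing else (the role, bucket, and prefix names are hypothetical):

```python
import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "WafConfigBackupOnly",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::example-waf-config-bucket/backups/*",
        }
        # Note what is *not* here: no s3:ListAllMyBuckets, no s3:ListBucket,
        # and no access to any other bucket.
    ],
}

iam = boto3.client("iam")
iam.put_role_policy(
    RoleName="example-WAF-Role",
    PolicyName="waf-config-backup-only",
    PolicyDocument=json.dumps(policy),
)
```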
Easy step 2: limit S3 access to sensitive data to the local VPC
S3 bucket policies provide the ability to restrict access to just the local network in AWS – this means that requests from the Internet will be denied, so even if the attacker had the credentials she wouldn’t be able to do anything with them.
I’m a little hesitant to put this here; if the attacker was already able to get the IAM credentials, then theoretically she should have been able to craft HTTP requests to do her misdeeds through the EC2 instance, so adding this step would have slowed her down but might not have stopped her. In general, though, it’s another form of least privilege that absolutely should be exercised.
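As a sketch, a bucket policy along these lines denies any request that doesn’t arrive through a specific VPC endpoint (the bucket name and endpoint ID are hypothetical):

```python
import json
import boto3

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAccessFromOutsideTheVpc",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::example-sensitive-data-bucket",
                "arn:aws:s3:::example-sensitive-data-bucket/*",
            ],
            # Deny everything that doesn't come through our VPC endpoint.
            "Condition": {
                "StringNotEquals": {"aws:SourceVpce": "vpce-0123456789abcdef0"}
            },
        }
    ],
}

boto3.client("s3").put_bucket_policy(
    Bucket="example-sensitive-data-bucket",
    Policy=json.dumps(bucket_policy),
)
```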
Easy step 3: use separate KMS keys in S3 for different projects
AWS generally offers two choices for encrypting resources: AES-256 or KMS. These names are a bit misleading – the real choice is between a master key managed by AWS and shared across the service, or a master key managed by the individual AWS account. The AWS-managed key adds effectively no access control of its own, so even though the data is encrypted at rest (thereby checking the relevant compliance boxes), anyone with permission to read the object gets the plaintext. A customer-managed key, on the other hand, has a default “deny” policy and, much like S3 itself, requires both the key policy and the requester’s IAM policy to allow access. The result of using KMS encryption in S3 is that even if credentials with an overly generous S3 policy are leaked, any data encrypted under a customer-managed key is still safe unless that policy also allows decrypting with that key.
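A sketch of what this looks like per project: turn on default encryption for the project’s bucket with its own customer-managed KMS key (the bucket name and key ARN are hypothetical). Reading an object then requires kms:Decrypt on that specific key in addition to s3:GetObject:

```python
import boto3

boto3.client("s3").put_bucket_encryption(
    Bucket="example-project-a-bucket",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    # Customer-managed key dedicated to this one project.
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "arn:aws:kms:us-east-1:111122223333:key/example-key-id",
                }
            }
        ]
    },
)
```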
The three suggestions above are relatively easy to implement and can easily be added to security checklists for projects with sensitive data. Although none of them would have stopped the attack, they would have greatly reduced the impact (referred to as the “blast radius” in security parlance).
There are also some general steps that can be taken to cover multiple projects, which should have been standard practice for a bank the size of Capital One:
Shared step 1: monitor API requests
This really should have already been implemented: any AWS API access from a known anonymizer VPN or TOR exit IP should raise some alarms. CloudTrail provides full logging of pretty much all S3 API calls (which is how Capital One was able to give the FBI such detailed forensic data later on), and there are plenty of tools that can scour through the logs and search for any successful API requests from any suspicious IP. Honestly, it’s a little disconcerting that Capital One didn’t catch this attack from CloudTrail logs.
Checking for suspicious IPs in CloudTrail is the tip of the iceberg and pretty easy to implement – an advanced DevSecOps team should also be looking for irregularities: why is this IAM role that typically hits the API once every few days suddenly mass downloading? Why are we seeing tons of new requests from this new IP that doesn’t belong to us or to AWS? These take time, money, and engineering brainpower, but Capital One should have plenty of all three.
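As a starting point, here’s a sketch that scans recent CloudTrail events for calls made from suspicious source IPs – the IP list is a stand-in for a real TOR/VPN exit-node feed, and in practice you’d run this kind of check over the full CloudTrail log files in S3 (or via Athena) rather than the LookupEvents API, which only covers management events:

```python
import json
import boto3

SUSPICIOUS_IPS = {"198.51.100.7", "203.0.113.42"}  # placeholder threat feed

cloudtrail = boto3.client("cloudtrail")
paginator = cloudtrail.get_paginator("lookup_events")

for page in paginator.paginate():
    for event in page["Events"]:
        detail = json.loads(event["CloudTrailEvent"])
        if detail.get("sourceIPAddress") in SUSPICIOUS_IPS:
            print("ALERT:", detail.get("eventName"),
                  detail.get("sourceIPAddress"),
                  detail.get("userIdentity", {}).get("arn"))
```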
Shared step 2: filter out the metadata IP address with a WAF
All EC2 metadata (including IAM role credentials) is accessed via an HTTP call to the IP address “169.254.169.254”. I can’t think of any conceivable reason for that IP address to appear in a request’s query string or body; therefore, any request that includes it should probably get dropped. You can use AWS WAF to create a rule like this or add it to your own WAF (although if the WAF itself was the attack vector, that might not have helped here).
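As a sketch, here’s what such a rule might look like in the shape the AWS WAFv2 API expects – one copy inspecting the query string and one inspecting the body, with names and priorities as placeholders:

```python
METADATA_IP = "169.254.169.254"

def block_metadata_ip_rule(field_to_match: dict, priority: int) -> dict:
    """Build a WAFv2 rule that blocks requests containing the metadata IP."""
    return {
        "Name": f"block-metadata-ip-{priority}",
        "Priority": priority,
        "Action": {"Block": {}},
        "Statement": {
            "ByteMatchStatement": {
                "SearchString": METADATA_IP.encode(),
                "FieldToMatch": field_to_match,
                "TextTransformations": [{"Priority": 0, "Type": "URL_DECODE"}],
                "PositionalConstraint": "CONTAINS",
            }
        },
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": f"block-metadata-ip-{priority}",
        },
    }

# These would be appended to a web ACL's Rules list via create_web_acl/update_web_acl.
rules = [
    block_metadata_ip_rule({"QueryString": {}}, priority=1),
    block_metadata_ip_rule({"Body": {}}, priority=2),
]
```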
The above suggestions are far from a full list of the precautions Capital One could have (and should have) taken, but implementing any one of them would have either prevented the attack or at least alerted Capital One while it was in progress.
Besides taking the above recommended steps as general practice, any project that’s going to collect sensitive data and be exposed to the Internet should get a detailed security review. A security architect would have asked questions like “How are we implementing least privilege?” and “For each of these components, how are we limiting the fallout if it’s compromised?” Security architects aren’t inexpensive, but they’re cheaper than a lawsuit.
For the reader: if you got some value out of this, please let me know about it in the comments. If you’d like to further discuss your security posture in AWS, feel free to reach out to me or contact us through our website. For those in South Florida, I’m going to be presenting a talk based on this article on August 22, 2019 at Venture Cafe Miami – hope to see you there!