Businesses that do not regularly monitor and audit their cloud configuration are destined to make their public cloud infrastructure too public.
The breaches we know about back that up.
Uber was hacked and over 57 million customer records were stolen because hackers obtained developer credentials from a GitHub source code repository, then used them to remotely access Uber’s cloud infrastructure and data as if they were authorized users.
A cloud misconfiguration caused about four million Time Warner Cable customers to have their personal information exposed to the Internet. An outside contractor accidentally misconfigured cloud storage and made it publicly accessible, exposing Time Warner’s customer SQL database files to the world.
Even companies focusing on managing the cloud are not immune. Accenture, a corporate consulting and management firm, accidentally misconfigured their cloud storage, exposing hundreds of gigabytes of data, including thousands of passwords, many of which were stored in plain text. The buckets also contained Accenture’s private signing keys, and data necessary for visibility into and maintenance of Accenture’s cloud stores.
The list goes on. Cloud SSO vendor OneLogin was breached when attackers got access to the API keys of their cloud infrastructure, ultimately allowing them to gain access to their databases and encryption keys. Reports show that the threat actors had the means to decrypt the sensitive data in OneLogin’s database; one wonders if that includes passwords to all the services they provide single sign-on to.
Hey, even managed service providers can make mistakes. DXC Technologies accidentally committed the private API keys for their cloud service to GitHub, allowing attackers to connect to their public cloud infrastructure and spin up hundreds of virtual machines for nefarious use, ultimately incurring cloud usage costs of over $64,000 in just a few days. Think about the additional cost of the manpower needed to rectify the breach, clean up all the unauthorized ephemeral resources left running, revoke the private keys and ultimately audit access to the rest of the cloud infrastructure.
The news regularly reports on how cloud misconfiguration is putting our deployments and data at risk. Cloud Configuration Management is vital to help battle against these risks. And the data from our own AuditWolf platform shows that more than 70% of the cloud-enabled customers we evaluated had at least one misconfiguration that potentially impacts their business.
In an attempt to demystify cloud configuration management and make monitoring and auditing actionable to your business, we wrote a comprehensive guide that ranges from cloud configuration management definitions all the way to tools and tricks to make sure you’re gathering data correctly and actually using it to better protect your cloud environments.
Here is what we’ll cover:
Without further ado, let’s get into the guide. Feel free to read it from start to finish or skip around to sections that are most applicable to you.
When it comes to public cloud computing, we live with a shared responsibility model. We cannot abdicate responsibility for management and security to cloud vendors like Microsoft, Amazon or Google. Ultimately, we as the customer are the custodians of the data and applications, and how they are accessed. We must take responsibility for how we want our cloud environments to work.
A simple definition...
Cloud Configuration Management is the process for establishing and maintaining consistency of a cloud resource’s settings, performance, and functional characteristics throughout its lifecycle.
In practice, this means we must understand how cloud resources are set up and accessed: who the administrative custodians are that manage and maintain them, and which users and applications consume those deployments and their data. And we need to know when something changes. That is what we call monitoring for configuration drift.
Configuration drift can be a real problem in a cloud environment. If you aren’t careful, it may unwittingly expose your infrastructure to unnecessary risk. With the ease with which settings and configurations can be changed in the cloud, it becomes difficult to know when a change happens, and who made it. Changes can be made manually through the cloud admin portal, through DevOps automation, or programmatically through API calls and/or scripts. Which is why you need to regularly audit your cloud configurations for change.
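To make the idea concrete, here is a minimal drift-check sketch in Python. The setting names are invented for illustration, not real cloud properties; in practice the "live" side would come from your cloud provider's API.

```python
# Minimal drift-detection sketch: compare an approved baseline
# against the live settings pulled from the cloud API.
# Setting names here are illustrative, not real Azure properties.

def detect_drift(baseline: dict, live: dict) -> dict:
    """Return settings that were added, removed, or changed."""
    added   = {k: live[k] for k in live.keys() - baseline.keys()}
    removed = {k: baseline[k] for k in baseline.keys() - live.keys()}
    changed = {k: (baseline[k], live[k])
               for k in baseline.keys() & live.keys()
               if baseline[k] != live[k]}
    return {"added": added, "removed": removed, "changed": changed}

baseline = {"public_access": False, "tls_min_version": "1.2", "geo_redundant": True}
live     = {"public_access": True,  "tls_min_version": "1.2"}  # someone toggled access, dropped redundancy

drift = detect_drift(baseline, live)
# drift["changed"] -> {"public_access": (False, True)}
# drift["removed"] -> {"geo_redundant": True}
```

A scheduled job running a comparison like this against every resource is the essence of drift monitoring: the hard part is capturing the live state and a trustworthy baseline, not the diff itself.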
So why audit it? In this day and age of modern application development and deployment, we surely can automate this to a given standard of expectation. Ahhh, the vision of the perfectly deployed environment. How’s that going for ya? Not that easy, is it? Let’s explore why.
These days, unless you are hiding under a rock or forced to administer that ancient VAX system in the basement, you have heard of Infrastructure as Code (IaC). It’s the concept that you can deploy infrastructure… well… ‘as code’. In other words, you can deploy your systems in a consistent and reproducible manner through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools. Many tools exist for this like Chef, Puppet, Azure Resource Manager (ARM) templates, AWS CloudFormation templates and even systems like Terraform. They all do a good job to help you write, plan and create infrastructure, as code.
And then, while in production, someone on the CloudOps team is asked to improve that live deployment. So they make a change to the live systems… and now it’s NOT the same environment you deployed. Its configuration has drifted away from the original approved configuration set. It all begins benignly enough. You may even have the discipline to track these changes through a documented change management process. In our experience though, cloud environments usually drift far from the original baseline and are NOT properly tracked. Why? Because it’s far too easy to make a quick change to see if that helps matters. When tickets are piling up and customers and managers are screaming, the speed with which changes can be made ultimately becomes the Achilles’ heel. There are so many PowerShell scripts, one-off CLI commands and simple toggles in the properties of cloud resources that it’s difficult to maintain a full audit history of all changes. Which is WHY having ways to audit for this is important. It doesn’t end there though. Let’s explore another reason why auditing is important… the dreaded DevOps dude.
DevOps is an interesting counterpart to CloudOps. As a software engineering culture and practice that aims at unifying software development (Dev) and software operation (Ops), DevOps may be an extremely valuable ideal for your business. But with that power comes great responsibility… one that can negatively impact your cloud configurations. It has become far too easy for developers to push application and configuration changes into production without the gated checks to ensure they do not impact production. Proof of this came when a simple typo by a software engineer fixing a billing system at AWS crippled a huge number of critical servers, shutting down entire server farms while he was debugging slow performance in production. Or the time Azure had a worldwide outage when a developer change referenced an expired SSL certificate, which virtually killed access to all cloud storage. Hey, it happens to the best of us. Which is why auditing is so important. If we assume human errors can and will occur, watching for them can help us quickly identify and remediate issues.
People are our biggest asset, and liability...
The weakest link is always the human factor when thinking about cloud deployments and data. Accidental (or malicious) misconfiguration can greatly impact your business, and shouldn't be taken lightly. Auditing is a natural mechanism to defend against this.
Of course, the ease of change to existing cloud resources isn’t the only reason you should audit your cloud configurations. The ease with which new cloud resources can be spun up should also give Ops people pause. The ability to spin up short-lived ephemeral resources is one of the powers of cloud computing. You can spin up a virtual machine or container almost instantly, do some work, and shut it down before most CloudOps teams even know it occurred. That is how a large managed service provider (MSP) like DXC Technologies got hammered with a bill of over $64,000 in just a few days, when threat actors gained access to their cloud account and spun up over 244 VMs in their environment. Bitcoin farming, here we come! In all seriousness though, it is because of this dynamic nature of elastic cloud computing that auditing your cloud environment is important. So, let’s explore ways to do that.
Let’s begin by establishing a baseline. The Center for Internet Security (CIS) has established objective, consensus-driven security guidance for Cloud Providers for both Azure and AWS. Called the CIS Benchmark, it offers detailed guidance and a step-by-step checklist that you can use to manually evaluate many of the configuration controls for your cloud environments. For your convenience, we’ve provided you with links to download the whole Benchmark Framework here:
Of course, it doesn’t end with a checklist. (It never does). Both Microsoft and Amazon offer you the ability to spin up a securely configured, or hardened, virtual instance of many popular operating systems to perform technical tasks without investing in additional hardware and related expenses. These images meet the CIS Benchmark baseline and improve your cybersecurity defenses. Some of the common threats that can be mitigated by using a CIS Hardened Image include:
Once you have established your baseline you need to find a way to regularly audit your cloud configurations. It’s important to also establish an audit trail through logs and monitoring so you can piece together who made what changes when. If you are using the Microsoft Cloud, one of the key things you can do is turn on Azure Monitoring. You can then combine that data and audit your logs using Azure Log Analytics.
Of course, if that seems like too much work, you can always subscribe to services like AuditWolf that can maintain your audit history timeline for you. Then you simply need to click on the resource to see the audit trail, including information on who made the changes, and what delta changes were made.
Now that we’ve discussed the importance of cloud configuration management, as well as some methods for auditing it, the obvious question is, “how do we improve our cloud configuration management?”
The obvious answer is that there is no tried and true silver bullet strategy here. (Sorry.)
However, there are some guiding principles and evidence-based tactics that can get you some quick wins. Hopefully, they’ll get you started on the path to an improved cloud configuration management posture.
It all begins with least privilege to your cloud environment. Use the role-based access control (RBAC) system that your cloud vendor provides and ensure that your Identity & Access Management (IAM) efforts are regularly audited.
When possible, try not to create custom roles to meet a specific user’s needs. It becomes difficult and ends up being far too complex to audit which users have access to which resources. Instead, leverage the pre-built roles provided by your cloud vendor. As an example, Microsoft provides an exhaustive list of built-in roles for Azure. Instead of creating a custom role that allows Alice in accounting access to your subscription billing, grant her membership to the “Billing Reader” role. Trust us, it will make your life much easier down the line. Plus, you get the benefit of Microsoft’s experience in locking down and hardening access to different resources by role in their cloud platform.
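A periodic audit that enforces the "built-in roles only" guidance can be very simple. Here is a hedged Python sketch: the role names and assignments below are made up for illustration (Azure's actual built-in role list is much longer), and in practice the assignments would come from your cloud provider's IAM API.

```python
# Sketch: flag role assignments that use custom roles instead of the
# vendor's built-in roles. Role names and assignments are illustrative;
# a real audit would pull assignments from the cloud IAM API.

BUILT_IN_ROLES = {"Owner", "Contributor", "Reader", "Billing Reader"}

assignments = [
    {"user": "alice@example.com", "role": "Billing Reader"},      # built-in: easy to audit
    {"user": "bob@example.com",   "role": "CustomBillingHack"},   # custom role: harder to reason about
]

def flag_custom_roles(assignments, built_in=BUILT_IN_ROLES):
    """Return every assignment whose role is not a known built-in role."""
    return [a for a in assignments if a["role"] not in built_in]

suspect = flag_custom_roles(assignments)
# suspect -> [{"user": "bob@example.com", "role": "CustomBillingHack"}]
```

Running a check like this on a schedule turns the "avoid custom roles" principle into something you can actually alert on, rather than a policy document nobody reads.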
Make sure administrators are forced to use two-factor authentication (2FA) to log into cloud admin accounts, and prevent regular users from even being able to access the cloud admin portals. Had Gentoo Linux followed this guidance, they would not have had their GitHub account compromised and their intellectual property breached when attackers guessed an administrator’s password and took over the account. These days 2FA, sometimes called MFA, is available for little to no charge for administrators.
There is no excuse for not making it mandatory for administrative access anymore. Most providers offer 2FA for admins for free, or for as little as $1/month.
With the reduction in friction for using strong authentication through single sign-on, device trust and risk-based authentication, we recommend you consider 2FA for your regular users too.
The best way to gain visibility into configuration changes is to make sure you have the right guards in place to prevent change, and then force change management through checkpoints you can control and audit. In the Microsoft Cloud, you can use Resource Manager locks to lock a subscription, resource group or resource to prevent administrators in your organization from accidentally deleting or modifying critical resources.
Resource Manager locks apply only to operations that happen in the management plane, which consists of operations sent to https://management.azure.com. The locks do not restrict how resources perform their own functions. This means resource changes are restricted, but resource operations are not. For example, a ReadOnly lock on a SQL Database prevents you from deleting or modifying the database, but it does not prevent you from creating, updating or deleting data in the database itself. That is because those operations/transactions are not sent to the management plane, but to the application.
When a lock is on, it has to be removed before a configuration change can occur. When that action is taken, an audit record is produced and stored in the Azure Activity Log, which you can query, allowing you to pinpoint who took such action, and when. You can further isolate activity by correlating the timestamps of that activity against other change events triggered by that user. A perfect mechanism to audit for change.
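To sketch what that query logic looks like, here is a small Python example over simplified Activity Log records. The field names (`caller`, `operationName`, `eventTimestamp`) mirror the Activity Log schema, and `Microsoft.Authorization/locks/delete` is the operation we assume identifies a lock removal; the records themselves are made up for illustration.

```python
# Sketch: scan simplified Azure Activity Log entries for lock removals.
# Field names mirror the Activity Log schema; the records themselves
# are invented for illustration.

events = [
    {"caller": "carol@example.com",
     "operationName": "Microsoft.Authorization/locks/delete",
     "eventTimestamp": "2018-06-01T10:02:00Z"},
    {"caller": "carol@example.com",
     "operationName": "Microsoft.Sql/servers/databases/delete",
     "eventTimestamp": "2018-06-01T10:05:00Z"},
]

def lock_removals(events):
    """Return only the events where a Resource Manager lock was deleted."""
    return [e for e in events
            if e["operationName"] == "Microsoft.Authorization/locks/delete"]

for e in lock_removals(events):
    print(f'{e["eventTimestamp"]}: lock removed by {e["caller"]}')
```

Notice how the two events above tell a story on their own: a lock removal followed minutes later by a database deletion from the same caller is exactly the kind of correlated activity worth alerting on.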
By putting locks on your resources you gain an extra line of defense against accidental or malicious changes and/or deletion of your most important resources. It forces administrators to make a conscious decision if they wish to make configuration changes, and allows the organization to maintain strict controls to watch for such activity.
It is common when deploying cloud-based workloads to store sensitive information that needs to be protected, such as:
Database connection strings
As a security best practice, you should never store these secrets in source control or directly in your deployment scripts. Of course, we see time and time again how major breaches occur because such keys are checked in with the source code. GitHub is a treasure trove of keys to unlock production cloud environments. Uber was breached by it. As was DXC. Hey, developers’ carelessness in not protecting secrets in source code has even been attributed to breaching some of the most personal of sites.
So separate secrets from configuration. Use services like Microsoft’s Azure Key Vault or Amazon’s Key Management Service (KMS) or AWS Secrets Manager to protect secrets needed to access your applications, services and resources.
There are plenty of good articles on how to safely configure your cloud resources to use things like Azure Key Vault. One thing to watch for is that while Azure Key Vault provides a way to securely store credentials and other keys and secrets, your code still needs to authenticate to Key Vault to retrieve them. Managed Service Identity (MSI) makes solving this problem simpler by giving Azure services an automatically managed identity in Azure Active Directory (Azure AD). You can use this identity to authenticate to any service that supports Azure AD authentication, including Key Vault, without having any credentials in your code. It is literally a single toggle in the properties of the web application inside the Azure portal. All great tools for DevOps.
Don’t think we have forgotten about our folks in CloudOps. Your infrastructure-as-code (IaC) driven through Azure Resource Manager (ARM) templates can also leverage Azure Key Vault. You can use Key Vault to pass secure parameter values during deployment. You can reference this right in the template, which means you not only can separate secrets from configuration, but you can apply RBAC so that only authorized production release management processes can access the real secrets at time of deployment, and not people.
Tip when using Azure Key Vault...
Make sure you set up your vault with key rotation and auditing. Now that your applications no longer need to persist your keys or secrets, you can take advantage of key management automation to reduce the likelihood that a known credential or secret key can be used over an extended period of time.
More importantly, with keys always rotating it becomes easier to detect misuse, as threat actors’ failed use of expired secrets creates audit events that you can alert on. Tuning your auditing to watch for such suspicious behavior will allow you to detect and remediate faster.
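A detection rule for that behavior can be as simple as counting failures per caller. This Python sketch uses made-up event records and an arbitrary threshold of three failures; real rules would read vault audit logs and tune the threshold and time window to your environment.

```python
# Sketch: alert when a caller repeatedly fails to use an expired or
# revoked secret. Event records and the threshold are illustrative;
# a real rule would consume vault audit logs over a time window.
from collections import Counter

FAILURE_THRESHOLD = 3  # arbitrary for illustration

events = [
    {"caller": "10.0.0.7", "result": "Forbidden", "secret": "old-api-key"},
    {"caller": "10.0.0.7", "result": "Forbidden", "secret": "old-api-key"},
    {"caller": "10.0.0.7", "result": "Forbidden", "secret": "old-api-key"},
    {"caller": "app-prod", "result": "OK",        "secret": "api-key"},
]

def suspicious_callers(events, threshold=FAILURE_THRESHOLD):
    """Return callers whose failed secret accesses meet the threshold."""
    failures = Counter(e["caller"] for e in events if e["result"] != "OK")
    return {caller for caller, n in failures.items() if n >= threshold}

# suspicious_callers(events) -> {"10.0.0.7"}
```

The legitimate application shows up with successful accesses to the *current* secret; a stale, stolen credential shows up as a streak of failures, which is precisely the signal rotation creates for you.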
Friends don’t let friends right-click publish. There is even a sticker for that. All kidding aside, don’t let developers deploy to production, because it bypasses any guards against configuration change that you try to maintain. It can accidentally break production, and you won’t know why. Besides that, why are you letting developers touch production systems? Oh wait… he’s that DevOps dude, right? Doesn’t matter. Stop letting people touch production.
Instead, use a CI/CD pipeline like Visual Studio Team Services (VSTS). This allows you to use automation to provide a predictable, consistent deployment mechanism, and control the configuration in the cloud. Deployed by the cloud. Combined with principles discussed in the last section for using Azure Key Vault to store configuration parameters separate from the IaC templates, you can begin small and scale up in a way so that you can deploy and remediate infrastructure and code to production lightning fast.
Even better, don’t do that in the context of a user. Instead, take advantage of granular VSTS deployment privileges using Service Principals. This way, you can delegate just enough control to VSTS to deploy exactly the way you want, with the exact configuration you want, to exactly the resources you want.
There is another reason you want to use Service Principals. Techy turnover. People change. They get promoted. Leave the organization. Go on holiday. If you rely on people to push to production, things can break. But if you use Service Principals, you can build deployment processes that you can standardize and control.
Some people get confused about what Service Principals are. The best way we can frame it is to think about the traditional service account in a Windows desktop or server. While you can apply a user identity to the service, normally you would apply a “service account”. A Service Principal is basically that, but for use in the Microsoft Cloud. It is a security context for cloud resources, in this case the account to deploy applications, services, configurations and data into Azure. Hope that helps.
So stop letting people touch production. Use service principals and automate the deployment with VSTS or Azure Automation runbooks.
So it’s going to happen. Even with strict change management processes, clear continuous deployment tooling and detailed ARM templates for IaC, chances are what you run in production will not match the desired state you originally configured. Detecting drift between your templates and original configuration and what is now deployed in production is vitally important. And that can only happen if you test your templates regularly.
One way to do this is to compare your templates and configuration parameters with what is actually deployed in the resource group in Azure. To do this, follow these steps:
Any resources that have been added or deleted need to be accounted for. As do any property changes. You need to track down WHY these changes were made after the template was deployed, and then decide if the changed resources and/or config properties should be put into the deployment template.
The template and production should match. When it doesn’t, you have drift that has to be accounted for. Regularly audit this so that if you ever have to redeploy to production, you have the last known good configuration.
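The comparison described above can be sketched in a few lines of Python. The resource names, types and properties below are invented; in practice the "deployed" side would be enumerated from the resource group via your cloud provider's API, and the "template" side parsed from your IaC files.

```python
# Sketch: diff the resources declared in a template against what is
# actually deployed in the resource group. Names, types and properties
# are invented for illustration.

template_resources = {
    "web-vm": {"type": "Microsoft.Compute/virtualMachines", "size": "Standard_B2s"},
    "app-db": {"type": "Microsoft.Sql/servers/databases",   "tier": "Standard"},
}
deployed_resources = {
    "web-vm":  {"type": "Microsoft.Compute/virtualMachines", "size": "Standard_D4s"},  # resized by hand
    "app-db":  {"type": "Microsoft.Sql/servers/databases",   "tier": "Standard"},
    "temp-vm": {"type": "Microsoft.Compute/virtualMachines", "size": "Standard_B1s"},  # never in the template
}

def diff_deployment(template, deployed):
    """Report resources added, deleted, or changed relative to the template."""
    return {
        "added":   sorted(deployed.keys() - template.keys()),
        "deleted": sorted(template.keys() - deployed.keys()),
        "changed": sorted(name for name in template.keys() & deployed.keys()
                          if template[name] != deployed[name]),
    }

report = diff_deployment(template_resources, deployed_resources)
# report -> {"added": ["temp-vm"], "deleted": [], "changed": ["web-vm"]}
```

Each entry in the report is a question to answer: was `temp-vm` a legitimate addition that belongs in the template, or something to tear down? Was the `web-vm` resize approved? Either way, the template and production are reconciled, so a redeploy restores the last known good configuration.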
Now that we’ve covered why you should care about cloud configuration management and how to audit it (and also some ways to improve it), let’s cover some actual tools and software you can use to help.
First off, you should always take advantage of the best-practice guidance your cloud vendor provides. For the Microsoft Cloud, that would be the Azure Advisor. It gives you free, personalized recommendations to meet Azure best practices in a consistent, comprehensive view.
Once you have viewed the recommendations (and have taken action), you can start to manage and monitor your public cloud with tools like Azure Monitoring. While you can get some really detailed information on monitoring in Azure, a key aspect when considering cloud configuration management is making sure you have Azure Log Analytics enabled and all activity logs being processed through Azure Monitor. This way metrics, activity logs and diagnostic logs are all being collected so you can detect when new resources are created or modified.
If you are taking advantage of Azure Automation, then you owe it to yourself to invest some time to learn about Azure Automation DSC. This is a service that allows you to write, manage and compile PowerShell Desired State Configuration (DSC) configurations and assign configurations to target nodes, all in the cloud.
Of course, here at AuditWolf we recognize that few people want to take the time to learn all this, set up all these tools and track all this configuration drift. Which is why the AuditWolf platform exists… to give clarity to CloudOps, all while delivering visibility and operational insights into your cloud security.
Cloud Configuration Management is incredibly important to the health of your cloud deployments and data. If your cloud environments are changing without your oversight, it could expose your business to unnecessary risk that can affect your infrastructure, ruin your reputation or erode your users’ trust.
This guide has defined cloud configuration management as well as given you ways to audit it. It has also given you ideas to improve your cloud configuration management processes. But it’s just a start. Now it’s on you to implement these strategies and cloud configuration management tools. It’s up to you to iterate and even innovate in the area of cloud configuration management excellence.
Your business depends on it. Do you really want your public cloud to become too public? Follow this guidance so you don’t become another statistic on poorly configured cloud environments.