Posted by Dana Epp
Azure Resource Locks: Preventing Unwanted Changes and Deletion of Your Cloud Resources
You lock up things that are important to you, like your house, car, and bicycle, right? Of course you do. You really should be doing that with your cloud resources too. In the latest episode of #KnowOps, we cover why it's important to lock your resources using Azure Resource Locks.
You can check it out below.
Ever look back on an outage, and wonder if it could have been avoided? I've got an interesting story to share. I'm not really sure I wanna talk about this, but maybe learning from my mistakes can help you avoid the consequences of inaction.
Dana Epp here, welcome to the channel that helps aspiring Azure administrators like you and me to know ops and well, master the Microsoft Cloud. I'm glad you're back, unless of course this is your first time. In which case, welcome. I hope you'll smash the subscribe button, and join us as I publish more videos each and every week.
So sometimes learning from the experience of others can be a real asset. And man, can you learn from my war wounds. I'm probably being a little harder on myself than I really should be. But you know what, the buck stops with me. Blaming other people's actions doesn't help things. Let me explain.
A few months back I got a call from someone I worked with on an Azure deployment from several years ago. They had an outage, and the system didn't fail over the way it was designed to. When they looked into the incident, they realized that the secondary region failover infrastructure wasn't there. Like gone, nowhere to be found. So they were checking with me to see if I knew whether it had actually been deployed at some point.
I knew it was, and I had information to back it up. But it got me thinking, "How the hell could an entire region be gone?" Then it hit me. After I'd left, someone who didn't understand what it was for deleted the backup resource group, destroying all the failover infrastructure. This was because we didn't have the discipline back then to put in safeguards to prevent it. I really should have known better. It'd be easy to say it's not my problem. This all happened after I'd left, but that's really a cop-out.
As administrators, we have an obligation to apply available safeguards to prevent accidental deletion if we can. It was my responsibility to make sure that the guardrails were applied to protect the infrastructure and data from other people's mistakes. Hell, the last few episodes I've been showing you how to use the power of the Azure CLI, and I haven't said much about how easy it is to destroy stuff with just a few keystrokes, or how to protect it. Seems like now's as good a time as any to go over this.
Azure resource locks are designed for this very thing. They lock down resources to prevent changes and accidental deletion. Had one been applied to that deployment, whoever deleted the backup infrastructure would have been stopped with an error until the lock was removed, which is both a warning and a chance to reconsider.
There are two levels of resource locks. The first are delete locks, which prevent the destruction of resources but still allow you to modify them. The second are ReadOnly locks, which won't even allow you to change a resource's configuration, never mind delete it. You can apply a lock at the subscription, resource group, or resource level, but you have to be careful because ReadOnly locks applied incorrectly can prevent you from working with Azure resources properly. As an example, if you apply a ReadOnly lock on a VM, no one can actually start or stop that VM until the lock is removed. Maybe that's your intention, but I know of several occasions where people were going bonkers when they couldn't figure out why they weren't able to manage their servers in Azure. All because of a lock that was on that they didn't understand. Which gets me to my next point.
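If it helps to see the two levels side by side, here's a rough sketch of the behavior described above, written as a tiny Python model. Nothing here calls Azure. The operation names ("modify", "start", and so on) are labels I made up for illustration, and the blocked sets only cover the cases mentioned in this episode.

```python
from enum import Enum


class LockLevel(Enum):
    CAN_NOT_DELETE = "CanNotDelete"  # the delete lock
    READ_ONLY = "ReadOnly"


# Illustrative operation labels, not Azure API names (assumption).
BLOCKED = {
    LockLevel.CAN_NOT_DELETE: {"delete"},
    # ReadOnly also stops configuration changes and VM start/stop.
    LockLevel.READ_ONLY: {"delete", "modify", "start", "stop"},
}


def is_allowed(operation, lock=None):
    """Return True if the operation would go through under the given lock."""
    if lock is None:
        return True
    return operation not in BLOCKED[lock]


print(is_allowed("modify", LockLevel.CAN_NOT_DELETE))
print(is_allowed("start", LockLevel.READ_ONLY))
```

The second call is the "why can't I start my VM?" situation: the ReadOnly lock blocks it even though nobody is trying to delete anything.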
Even though resource locks have been in Azure for a long time, people rarely use them. And when they do, their first inclination is to just apply a ReadOnly lock at the subscription level and be done with it.
Don't do this.
Except on rare occasions, that will probably just give you a headache. What I recommend is that you apply delete locks at the resource group level, and then apply ReadOnly locks on the individual resources as needed. Here's why.
Firstly, if you apply a lock at the subscription level, whenever you remove that lock to do some work you remove the guardrails across everything in that subscription, which is kinda counter to the whole purpose of locks. During the management window, all resources could then be touched, which isn't a good thing. Bringing the lock down to the resource group level limits that exposure to the smallest set of resources.
Secondly, since removing locks generates a log item in the activity log, you get better auditing fidelity, grouping management activities where you need it most, and filtering out the noise. And this can become even more important if you use RBAC to give more people the authority to manage locks.
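To make that exposure argument concrete, here's a small sketch that works out which resources lose their guardrail when a lock at a given scope is removed. It just compares resource ID prefixes. The subscription ID is a placeholder, and the resource group and resource names are hypothetical.

```python
def covered_by(lock_scope, resource_ids):
    """Return the resource IDs that sit at or below the lock's scope."""
    prefix = lock_scope.rstrip("/").lower()
    return [
        rid for rid in resource_ids
        if rid.lower() == prefix or rid.lower().startswith(prefix + "/")
    ]


# Placeholder subscription and hypothetical resources (assumptions).
SUB = "/subscriptions/00000000-0000-0000-0000-000000000000"
RESOURCES = [
    f"{SUB}/resourceGroups/prod-primary/providers/Microsoft.Compute/virtualMachines/web01",
    f"{SUB}/resourceGroups/prod-primary/providers/Microsoft.Storage/storageAccounts/proddata",
    f"{SUB}/resourceGroups/prod-failover/providers/Microsoft.Compute/virtualMachines/web02",
]

# Removing a subscription-level lock exposes everything under it.
exposed_by_subscription_lock = covered_by(SUB, RESOURCES)

# Removing a resource-group-level lock exposes only that group.
exposed_by_group_lock = covered_by(f"{SUB}/resourceGroups/prod-failover", RESOURCES)

print(len(exposed_by_subscription_lock), len(exposed_by_group_lock))
```

With these three sample resources, pulling the subscription lock exposes all of them, while pulling the lock on the failover group exposes just the one VM in it.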
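Here's a minimal sketch of what watching for those log items could look like once you've exported activity log entries as JSON. I'm assuming each entry carries an `operationName.value`, a `caller`, an `eventTimestamp`, and a `resourceId`, and that lock removals show up under the `Microsoft.Authorization/locks/delete` operation. Check the field names against your own export before relying on this. The sample events are made up.

```python
LOCK_DELETE = "Microsoft.Authorization/locks/delete"


def lock_removals(events):
    """Return (timestamp, caller, resource_id) for each lock deletion event."""
    found = []
    for event in events:
        # Assumed entry shape: {"operationName": {"value": ...}, ...}
        operation = event.get("operationName", {}).get("value", "")
        if operation.lower() == LOCK_DELETE.lower():
            found.append(
                (event.get("eventTimestamp"), event.get("caller"), event.get("resourceId"))
            )
    return found


# Hypothetical sample entries for illustration only.
sample_events = [
    {
        "operationName": {"value": "Microsoft.Compute/virtualMachines/start/action"},
        "caller": "dev@example.com",
        "eventTimestamp": "2020-01-01T10:00:00Z",
        "resourceId": "/subscriptions/0000/resourceGroups/qa-app",
    },
    {
        "operationName": {"value": "Microsoft.Authorization/locks/delete"},
        "caller": "ops@example.com",
        "eventTimestamp": "2020-01-01T10:05:00Z",
        "resourceId": "/subscriptions/0000/resourceGroups/prod-failover",
    },
]

removals = lock_removals(sample_events)
print(removals)
```

Because the lock lives at the resource group, the `resourceId` on each hit tells you exactly which group had its guardrail taken off and who did it.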
Which reminds me, we really need to talk about access control for this.
By default, only administrators with the Owner or User Access Administrator roles can work with locks. This can be a real nuisance if you apply ReadOnly locks to things like storage or VMs, and you need to give people management access to those resources. With locks on, they just can't. And giving them Owner access isn't a good idea either. You should always be thinking about least privilege when considering granting access.
Behind the scenes, resource locks are controlled by the Microsoft.Authorization/locks/* action, so you can easily create custom roles and scope the access to the resource group. That's what I do. I create custom lock administrator roles and apply them to the appropriate resource groups depending on whether it's a QA or production environment. It allows me to give devs access to the resources they need in QA, but prevents them from touching locks in production, which are reserved for the Cloud ops team.
Oh, and one interesting side note. Resources inherit higher level locks. So if you have a lock at the subscription level and at a resource, the resource will have multiple locks. And when several locks end up on a resource, it's possible you won't have authority to remove a higher level lock even if you have a role with permission to remove locks at the resource. And the most restrictive lock always takes precedence, even if a higher level administrator applied a less-restrictive lock. In other words, it doesn't matter how powerful your account is, the restrictiveness of the resource lock always wins.
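Back to the custom role for a moment. Here's a rough sketch of how a lock administrator role definition could be put together, scoped to a single resource group. The role name, description, resource group name, and subscription ID are all placeholders I picked for illustration. The JSON keys follow the custom role definition layout as I understand it, so treat this as a starting point to verify rather than a finished role.

```python
import json


def lock_admin_role(subscription_id, resource_group, environment):
    """Build a custom role definition that only grants lock management."""
    return {
        "Name": f"Lock Administrator ({environment})",  # hypothetical name
        "Description": f"Can manage resource locks in {resource_group}.",
        "Actions": ["Microsoft.Authorization/locks/*"],
        "NotActions": [],
        # Scoped to one resource group, not the whole subscription.
        "AssignableScopes": [
            f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        ],
    }


# Placeholder subscription ID and a hypothetical QA resource group.
role = lock_admin_role("00000000-0000-0000-0000-000000000000", "qa-app", "QA")
print(json.dumps(role, indent=2))
```

You could save that output to a file and hand it to `az role definition create --role-definition @lock-admin-qa.json`. In practice people holding the role will probably also need read access to the resource group, which I've left out to keep the sketch focused on the locks action.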
So how do we apply resource locks? In the portal there's a blade dedicated to it for every resource. It's literally called Locks. You can add or delete a lock in just a few clicks. If you use the Azure CLI, you can manage this using the az lock command. One little tip when you're using the CLI. The lock type isn't called delete like it is in the portal, but instead called CanNotDelete. I don't know why they decided to be different, but just remember that.
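As a sketch of what that looks like from the CLI side, here's a small Python helper that assembles `az lock create` and `az lock delete` commands for a resource group. It only builds the argument list and prints it; it doesn't run anything. The lock name, resource group, and note text are hypothetical, and the helper functions are mine, not part of any Azure SDK.

```python
import shlex

# The CLI uses CanNotDelete where the portal says Delete.
VALID_LOCK_TYPES = ("CanNotDelete", "ReadOnly")


def az_lock_create(name, resource_group, lock_type="CanNotDelete", notes=None):
    """Build the argv for creating a lock on a resource group."""
    if lock_type not in VALID_LOCK_TYPES:
        raise ValueError(f"lock_type must be one of {VALID_LOCK_TYPES}")
    argv = [
        "az", "lock", "create",
        "--name", name,
        "--resource-group", resource_group,
        "--lock-type", lock_type,
    ]
    if notes:
        argv += ["--notes", notes]
    return argv


def az_lock_delete(name, resource_group):
    """Build the argv for removing a lock from a resource group."""
    return ["az", "lock", "delete", "--name", name, "--resource-group", resource_group]


# Hypothetical lock and resource group names.
create_cmd = az_lock_create(
    "no-delete-failover", "prod-failover", notes="Failover region; do not delete"
)
delete_cmd = az_lock_delete("no-delete-failover", "prod-failover")

print(shlex.join(create_cmd))
print(shlex.join(delete_cmd))
```

Passing `"Delete"` as the lock type raises a `ValueError` here, which is the same trap the tip above is warning you about.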
Azure resource locks are an important tool in your administrative belt. Use them, and use them wisely. Learn from my mistakes. Always apply resource locks to your production infrastructure. Consider applying delete locks at the resource group level so people can't accidentally destroy things. Apply ReadOnly locks to sensitive resources to prevent configuration drift. Make sure staff are granted custom roles with authority to remove those locks when needed. And audit everything by watching for the lock events in the activity log.
Doing so may just save your butt one day.
So are you gonna use resource locks? I hope so. Let me know in the comments. Would love to know if this resonates with you, and if you're finding it useful. Hit the like button to show me. Smash the subscribe button if you haven't yet, and don't forget to share this with your peers. Until next time, thanks for watching. We'll see you in the next episode.