Auditing Azure RBAC Assignments

Microsoft Azure, PowerShell

I recently needed a script that generates a report on Azure RBAC role assignments. Given the domain for your Azure AD tenant, the script does a number of things:

  • Reports which users or AD groups hold which role;
  • Reports the scope that each role applies to (e.g. subscription, resource group, resource);
  • Where a role is assigned to an AD group, recursively obtains the group members using the function from this blog post: http://spr.com/azure-arm-group-membership-recursively-part-1/
  • Reports whether a user is Co-Administrator, Service Administrator or Account Administrator;
  • Reports whether a user is sourced from the Azure AD tenant or an external directory, or appears to be an external account.
The user running the script must have permission to read role assignments, e.g. ‘Microsoft.Authorization/*/read’.
The script can output the results either as an array of custom objects or in CSV format, which can then be redirected to a file and manipulated in Excel.
The script could be run as a scheduled task or via Azure Automation if you want to run it periodically. It can also be extended to alert on certain cases, such as when users from outside your Azure AD tenant have access to a subscription, resource group or individual resource. Alerting is not a default feature of the script because, depending on your organisation, you may legitimately have external accounts (e.g. if you’re using third parties to help you deploy, build or manage Azure).
The script has been published to my GitHub repo. Hopefully it will be of use to others.

HDInsight and WebSSH Security Issue

HDInsight, Microsoft Azure

Background

This post relates to an unpublished ‘feature’ of Microsoft Azure HDInsight Linux clusters that is misconfigured such that it allows users to obtain root access to clusters without having knowledge of the ‘admin’ account name or password via a web console.

I originally raised this with Microsoft Support around the end of October / beginning of November 2016. Initially, support informed me that they had discussed it with the product team and that the security issue that I was reporting was not a security issue because:

  • The security boundary of HDInsight is the Virtual Network (VNET) and
  • The clusters are only intended for single user tenancy (ironically a MSFT Cloud Data Solution Architect recently said to me that HDInsight fully supports multiple users – which I guess is sort of true now with secure clusters being in preview).

Eventually they agreed that it was indeed an issue and disabled the feature on all new clusters as an interim measure.

 

What is the issue?

An Azure HDInsight Linux cluster consists of head, worker and zookeeper nodes. These nodes are Azure VMs; although they are not visible in the Azure Portal and cannot be managed individually there, you can SSH to the cluster nodes.

When you provision a cluster you are prompted to set two credentials:

  • One for the Ambari web interface, which you log in to over HTTPS at <cluster name>.azurehdinsight.net.
  • One for a local account that is created on ALL nodes in the cluster, which you can then use to SSH to the cluster: ssh <user>@<cluster name>-ssh.azurehdinsight.net

The SSH account by default has passwordless sudo – that is you can run sudo su and become root without being prompted for your password.

One of the packages installed when you provision an HDInsight cluster is hdinsight-webssh. Running apt-cache show hdinsight-webssh shows that it is a Microsoft package (there are other Microsoft HDInsight packages, all prefixed with hdinsight-):

Running netstat you can see that there is a Node.js based web terminal listening on TCPv6 port 3000:

If you run

you will see the process (which incidentally also runs as root!).

The configuration for the service/application is here:

/etc/websshd/conf.json

It looks like a number of Python scripts are run when you provision a cluster to start Ambari, configure Hive etc., one of which starts this websshd service: /opt/startup_scripts/startup_webssh.py
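To check whether a node still has something listening on that port, a small filter over the socket listing is enough. The sketch below is an illustration rather than part of the original investigation: has_listener is a hypothetical helper name, and it assumes the usual `ss -ltn` or `netstat -tln` column layout, where the local address ends in :<port>.

```shell
#!/bin/sh
# has_listener <port>
# Reads `ss -ltn` or `netstat -tln` style output on stdin and succeeds if any
# LISTEN line contains an address field whose last ':'-separated part is <port>.
# (Hypothetical helper, not taken from the original post.)
has_listener() {
    port=$1
    awk -v port="$port" '
        /LISTEN/ {
            for (i = 1; i <= NF; i++) {
                # Split on ":" so that both 0.0.0.0:3000 and :::3000 end in the port
                n = split($i, parts, ":")
                if (n > 1 && parts[n] == port) { found = 1 }
            }
        }
        END { exit (found ? 0 : 1) }
    '
}
```

On a cluster node this could be used as `ss -ltn | has_listener 3000 && echo "web terminal port is open"`.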

Impact of the issue

The issue cannot easily be exploited by an external attacker, i.e. one that does not already have access to infrastructure in the Azure Virtual Network (VNET) that the HDInsight cluster resides in. Such an attacker would first need to gain access to an account (it doesn’t need to be privileged) on any other system hosted in the same VNET. From that point they can easily gain root access on the HDInsight cluster by simply browsing to http://<clusternodeipaddress>:3000, which automatically gives them a web based shell as the user that has passwordless sudo, without entering any username or password.

Because the default NSG rules allow connectivity within a VNET (as opposed to a default deny that requires all traffic to be explicitly allowed), it is easier for an attacker with such a foothold to extend their reach.

Alternatively, an external attacker would need to find a vulnerability in the proxy servers and/or the various web interfaces that are accessible via the proxies.

In the case of a malicious user who has authorised access to say an application or web server, they would be able to take advantage of the misconfiguration to obtain root access to the HDInsight cluster as described above.

In either case an external attacker or malicious user can then use the root access to exfiltrate data, plant malicious software etc.

Summary

Microsoft have since disabled the service. The last time I checked, back in December 2016, the package was still installed but the service was not running, nor was there a systemd unit file installed.

Microsoft didn’t explain why the package is installed in the first place but I can only assume it was added as a convenience when the product team were developing or testing.

Browser based terminals are problematic when it comes to security, but it’s worse when the endpoint:

  1. Is unencrypted
  2. Performs no authentication
  3. Drops you in as a user that has passwordless sudo

As an added measure you can disable passwordless sudo for the admin account – which probably shouldn’t be enabled anyway.

KVM Automation

KVM

Introduction

This blog post describes one option for automating the build of a KVM guest.

There are alternative ways to automate the build, but the method described here uses the ability to pass a kickstart file to the virt-install command when creating a new VM. A kickstart file contains the answers to all the normal questions that an interactive installer would ask during installation. The kickstart script installs software packages, configures SELinux, auditd, rsyslog etc.

Using virt-install with a kickstart file

The virt-install command is used to create new virtual machines / guests; it supports a

parameter that allows you to specify the path to an Anaconda Kickstart file.

An example virt-install command is provided below:

The key parameters for automating the install are:

  • The

    parameter specifies the path to the kickstart file on the host machine
  • The

    parameter then specifies where the kickstart file is on the VM.
  • The

    parameter tells virt-install not to attach to the guest console, which is the default behaviour. We disable this because attaching to a console requires manual intervention to exit after the kickstart installation completes, before the rest of the script for building an encrypted VM can continue.
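As an illustration of how these three parameters fit together, here is a minimal sketch. It assumes they correspond to virt-install's --initrd-inject, --extra-args and --noautoconsole options, which match the descriptions above; the create_guest function, the VIRT_INSTALL override and every size, name and path are hypothetical placeholders rather than the values used in the original build. Newer Anaconda releases expect inst.ks= instead of ks= in the kernel arguments.

```shell
#!/bin/sh
# create_guest <guest-name> <kickstart-file-on-host> <install-tree-location>
# Hypothetical wrapper around virt-install. Set VIRT_INSTALL=echo to print the
# command instead of running it.
create_guest() {
    name=$1
    ks_host=$2
    location=$3

    # --initrd-inject places the file in the root of the installer initrd,
    # so the guest sees it as /<basename of the host path>
    ks_name=$(basename "$ks_host")

    "${VIRT_INSTALL:-virt-install}" \
        --name "$name" \
        --memory 2048 \
        --vcpus 2 \
        --disk size=20 \
        --location "$location" \
        --initrd-inject "$ks_host" \
        --extra-args "ks=file:/$ks_name console=ttyS0" \
        --graphics none \
        --noautoconsole
}
```

Because --noautoconsole returns control immediately, the calling script has to detect for itself when the installation has finished.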

 

The Kickstart file

The format of the Kickstart file will not be covered in detail here; however, the key configuration lines for automating the KVM VM build are highlighted below:

  • specifies that a text based installation should be performed

  • specifies that the VM should be shutdown after the kickstart installation completes – this is important as we use this to detect when the installation and configuration is complete before we move on to encrypting the VM operating system disk.

  • specifies the URL for the package repository and that it can be reached via the proxy 192.168.0.20 (if you have direct internet access then this line is not required, also if you are using an internal repo then the URL should be modified accordingly)

  • specifies the location of an additional repo, in this case, the EPEL repo and that it can be accessed via the proxy
  • The line below sets the password for the root user
  • It is stored in hashed form; you can generate the hash by running the command below:

There was also a requirement to configure auditing and logging. Some of these files were quite long, so it was too unwieldy to hardcode their entire contents into the kickstart file and write them out on the guest using heredocs. Instead I compressed each file with gzip and base64 encoded the result.

The Kickstart file includes blocks of code such as the example below:

This command decodes a base64 encoded string, decompresses it and writes it to a file; the string contains the code for a shell script. This is a convenient way to include scripts without embedding their entire code using heredocs.

To create the base64 encoded and compressed script, save the script as is into a file, then run the command:
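A minimal sketch of both directions, assuming gzip and a base64 that accepts -d (as GNU coreutils does) are available. The function names are hypothetical, and this is not necessarily the exact command used in the build.

```shell
#!/bin/sh
# encode_script <file>
# Compress a script with gzip and base64-encode it onto a single line,
# suitable for pasting into a kickstart %post section.
encode_script() {
    gzip -c "$1" | base64 | tr -d '\n'
}

# decode_script <output-file>
# Reverse of encode_script: read the base64 string on stdin, decode it,
# decompress it and write the original script to <output-file>.
decode_script() {
    base64 -d | gunzip > "$1"
}
```

For example, `encode_script audit-setup.sh` prints the string to embed, and inside the kickstart file `echo '<string>' | decode_script /root/audit-setup.sh` restores it (both file names are placeholders).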

 

Detecting completion of the kickstart script

After the virt-install command is run, the virtual machine build script virt-create-guest.sh waits for the VM to enter the shut off state (recall that the kickstart file specified that the machine should be shut down after installation). It does this by running virsh domstate <guest vm name> and checking whether it returns “shut off”.
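The polling loop can be sketched roughly as follows. This is an illustration rather than the contents of virt-create-guest.sh: the function name, the polling interval and the attempt limit are all assumptions.

```shell
#!/bin/sh
# wait_for_shutoff <guest> [interval-seconds] [max-attempts]
# Polls `virsh domstate` until the guest reports "shut off".
# Returns 0 once the guest is shut off, 1 if the attempts run out first.
wait_for_shutoff() {
    guest=$1
    interval=${2:-30}
    max_attempts=${3:-120}
    attempt=0

    while [ "$attempt" -lt "$max_attempts" ]; do
        # Command substitution strips the trailing blank line that virsh prints
        state=$(virsh domstate "$guest" 2>/dev/null)
        if [ "$state" = "shut off" ]; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$interval"
    done
    return 1
}
```

A build script could then call `wait_for_shutoff myguest 30 120 || exit 1` before moving on to the disk encryption step. The attempt limit keeps a failed installation from hanging the script forever.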

 

Azure AD Authentication (Connect-AzureAD) in Azure Automation

Microsoft Azure

It is now (and has been for a while) possible to modify Azure AD via Azure Automation. The example below uses the Automation Run As account to first connect to Azure AD and then run the appropriate commands. You can also create a dedicated Run As account if you want, or use a username and password (less secure).

Before you write your code make sure that you:

  • Add the “AzureAD” module to the Automation Account
  • Give the Azure Automation Run As account the appropriate permissions as shown at the end of this article

Automation Code example (list all the groups in AD):

Give the Azure Automation Run As account the appropriate permissions:

  • Go to Azure Active Directory -> App registrations -> the Run As account.
  • Then go to API access as shown:

  • Give the appropriate access, example below:

Don’t forget to click grant permissions!

Azure ASR Error- 78052 Master target contains different types of scsi controllers.

Microsoft Azure

This one is fairly self-explanatory, but I thought I would mention it anyway. When you build an ASR Master Target server with more than one SCSI controller, make sure the controllers are all of the same type. It doesn’t matter which type they are (LSI Logic SAS, VMware Paravirtual, etc.), but they need to match or you will get the following error in the Azure portal when you attempt to fail back the machine to on-premises.

 

Azure ASR Error- 90068 disks specified not present

Microsoft Azure

Quick FYI for anyone using Azure ASR: if you are protecting a virtual machine located in Azure, make sure you unselect the temp drive (disk D) when adding the machine to ASR protection. If you try to reprotect that disk to on-premises you will get the error message below, and you will then need to delete the protection and reprotect without drive D. The error only occurs when you reprotect to on-premises; it seems to work fine if you reprotect to another Azure location.

 

Traffic Manager Endpoint monitor and ADFS /adfs/probe

Microsoft Azure, Windows

Microsoft has a very nice post on how to set up Traffic Manager in front of an ADFS farm for high availability, where both sites are in Azure but in different geographic locations, or one is in Azure and one is on-premises. The article is located here: https://docs.microsoft.com/en-us/azure/active-directory/active-directory-adfs-in-azure-with-azure-traffic-manager. What the article lacks is how to set up proper ADFS monitoring that covers both the WAP and the ADFS service; at the moment it only details how to monitor the WAP service.

So this post will go over how to configure your environment so that the endpoint monitor reports the status of both WAP and ADFS.

Some info before we begin:

  • The solution is achieved by monitoring /adfs/probe/ on the ADFS server via the WAP proxy
  • The solution will report failure if the WAP proxy is not forwarding or the ADFS service is down, so we are monitoring the whole solution.
  • It will work if you have an external load balancer in front of the WAP servers and an internal one in front of the ADFS servers. For simplicity I will outline how it’s done on the non-load-balanced solution, but the procedure is the same for both.
  • You can’t monitor /adfs/probe on the WAP server as that will only give you the status of the WAP server
  • You can create a rule on the WAP server to redirect /adfs/probe to the ADFS server, but it will get ignored and show you the status of the WAP server.
  • I tested this on Server 2016 but it will work for 2012 R2 as well
  • If you are using 2012 R2 make sure you update your WAP to the latest version so you can forward HTTP traffic
  • We use HTTP as this prevents certificate problems and because Traffic manager does not support SNI.
  • You can’t monitor “/federationmetadata/2007-06/federationmetadata.xml” because the way you set this up for Traffic Manager means you are monitoring ADFS under a different DNS name, so the request will not be forwarded.

Essentially this is what we are doing

adfs_probe_check

Once you have set up the environment as per Microsoft’s article above, we need to do the following:

The variables for my test environment:

  • ADFS URL and Federation Service Name – test123.blah.local
  • Traffic Manager DNS – adfstest.trafficmanager.net
  • WAP server public IP dns (this can be replaced by a load balancer) – http://mytestadfsa.westeurope.cloudapp.azure.com
  • Custom monitor path (you can choose anything but the default which is /adfs/) –  /adfsprobe/

The Steps:

  • Change the Traffic Manager Configuration to point to our custom monitor path for the endpoint monitoring

configuration-microsoft-azure

  • Create an HTTP rule on the WAP server in the Remote Access Management Console to forward (via pass-through) the WAP DNS name plus our custom monitor path to the ADFS server. I assume that your WAP server’s hosts file has been modified to point the ADFS URL to the ADFS internal IP or load balancer IP

wap-rule

iis-url-rewrite

  • The rule to be created is Reverse Proxy with the following settings:

arp-rule

  • And finally, change your public DNS record and create a CNAME for your ADFS URL (test123.blah.local) that points to the Traffic Manager DNS name (adfstest.trafficmanager.net)

And you are done.
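Before relying on Traffic Manager’s view of the endpoint, the custom monitor path can be checked by hand. The sketch below assumes, as Traffic Manager does by default, that an HTTP 200 response means healthy. probe_ok and the CURL override are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# probe_ok <url>
# Succeeds when the URL answers with HTTP 200, which is what the Traffic
# Manager endpoint monitor treats as healthy by default.
# Set CURL to another command to substitute for curl (e.g. in tests).
probe_ok() {
    # -s: silent, -o /dev/null: discard the body, -w: print only the status code
    code=$("${CURL:-curl}" -s -o /dev/null -w '%{http_code}' --max-time 10 "$1")
    [ "$code" = "200" ]
}
```

With the test environment above this would be `probe_ok http://mytestadfsa.westeurope.cloudapp.azure.com/adfsprobe/ && echo healthy`. Stopping the ADFS service or removing the WAP rule should make the check fail.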

Powershell Add-Computer error when executed remotely.

Windows

When you execute the PowerShell command “Add-Computer -DomainName "contoso.com" -Credential $domainjoinuser -Restart” remotely or in a non-interactive environment, you may get the following error:

The root of the problem (given that your password is correct) is that when running interactively the domain is prepended to the user name, so you only need to provide the user. In a non-interactive environment the domain is not known. The fix is very simple: include either the short domain name, like “contoso\DMAdmin”, or the fully qualified user@domain form (“[email protected]”). The error occurred for me when running an Azure custom script which called a PowerShell script non-interactively.

The ACL RemoveAccessRule Not Working

Windows

If you try to modify an ACL via PowerShell and RemoveAccessRule is not working, by which I mean it runs without errors but the rules are not being removed, the problem is that inheritance is turned on and you are trying to remove a rule that is inherited. To fix this you first need to disable inheritance, save the ACL and then get the ACL again. After that the remove will work. Code can be found below:

 

New Virtual Disk Error “UseMaximumSize” (Storage Pool)

Windows

This is a heads up post for anyone who is creating a new virtual disk from a storage pool in Storage Spaces. If you are creating a disk and want to use the “Maximum size” parameter, you have to make sure that the provisioning type is set to Fixed and not Thin. This is normal, expected behaviour: a thin disk expands automatically and takes space as it needs it, so you can’t pre-allocate 100% of the size and need to specify an initial disk size instead. For fixed provisioning (“-ProvisioningType Fixed”) you can use the “-UseMaximumSize” parameter.

The unfriendly PowerShell error:

The GUI Error:

virtualdisk-error