Tuesday, 23 September 2014

Deobfuscation tips: Nuclear EK landing page

DISCLAIMER: There isn't a single way to deal with obfuscated data/code. There are many automated and semi-automated tools available to help you with that. In this post though I'll be using none. The aim here is to walk through some code deobfuscation manually. This is not a comprehensive Nuclear EK landing page analysis. Only bits related to data/code obfuscation are covered.

NOTE: Exploit Kit sample used in this post was captured in September 2014. Taking the ever changing nature of EKs, the described below might not be applicable to the newer variants.

'Nuclear launch detected'

I'll be using Nuclear EK landing page sample here. Note a huge blob of numbers stored in 'G4Ah' variable and a string stored in 'qjv' variable. The string serves as a lookup key and the numbers blob is actually a sequence of 2 digit numbers that are used to find a character in 'lookup key' at the position = 2 digits value. The JavaScript on the landing page does quite a simple job - it splits the blob into 2 digits chunks, loops through each chunk value to find the corresponding character in the 'lookup key' and adds the found character to a string. This might sound a bit confusing, so let's translate it into a Python script to better understand it.

lookupKey = "LOOKUP_KEY_GOES_HERE"
encodedString = "NUMBERS_BLOB_GOES_HERE"
listOfValues = map(''.join, zip(*[iter(encodedString)]*2))
decodedString = ""

for index in range(len(listOfValues)):
    if int(listOfValues[index]) < 10:
        element = int(listOfValues[index])
    else:
        element = int(listOfValues[index]) - 2
    decodedElement = lookupKey[element]
    decodedString += decodedElement

print(decodedString)

You'll notice an 'if' condition in the 'lookup' loop - for any value greater than 10 subtract 2 from it and then perform the lookup. This is done to compensate for the escape '\' characters in the lookup key. I'm not entirely sure why '10', but assume the code logic that generates the key will not include characters that require escaping into the first 10 character positions of the key.

Before we can run the script we need to put the values into 'lookupKey' and 'encodedString'. Where the value for 'encodedString' is hard to miss in the landing page code, the value for 'lookupKey' might be challenging. From my personal experience with Nuclear EK landings, I found that the characters positions in the key are random, but its size is always 95 characters. The simplest, but not always reliable way to find the lookup key is to search for a variable assigned a long string value. If this method fails you'll have to follow the JavaScript code to find it.

Now, if we use the corresponding values from our landing page sample and run the script, we get the following output.

Another KISS approach to data obfuscation. Happy deobfuscation!


Thursday, 18 September 2014

Deobfuscation tips: SweetOrange EK landing page

DISCLAIMER: There isn't a single way to deal with obfuscated data/code. There are many automated and semi-automated tools available to help you with that. In this post though I'll be using none. The aim here is to walk through some code deobfuscation manually.This is not a comprehensive SweetOrange EK landing page analysis. Only bits related to data/code obfuscation are covered.

NOTE: Exploit Kit sample used in this post was captured in September 2014. Taking the ever changing nature of EKs, the described below might not be applicable to the newer variants.

Sweet and Sour

I'll be using SweetOrange EK landing page sample here. Note a huge blob of text stored in one of the list items '<li>' followed by some JavaScript code. We'll skip the general code analysis/overview and focus on the tag. It has 'id' attribute with a rather unique value - 'UHOhpWHd'. Searching for this value in the webpage code leads to the following JS function:

function uQqRecfwwf(DDDDDDDDD) {
    TLKQDEnOiE = 0;
    var ARNTO = uQqRecfwwf;
    var DnB = 0;
    UhOnd = String(Math.asin);
    if (isNaN(UhOnd.match(/aSI/ig))) {
        var k = (UhOnd.match(/In/i));
        if (k != null) {
            iuhquweh = trenoSZ(uyHMzLQ);
            XdwrQ["spl" + "ice"](Math.acos(1), 2, "UHOhpWHd");
            XdwrQ["spl" + "ice"](1, Math.acos(1), iuhquweh);
            ardddds = DDDDDDDDD;
            DnB = cXEcJlM()[qqTbFg()];
        };
    };
    return DnB;
}

At this stage it's unclear what exactly happens to the blob's data rather then being stored in 'XdwrQ' array. Before we start chasing the rabbit down the rabbit's hole and submerge into other functions, let's find out what calls 'uQqRecfwwf' function.

aCaBOUakjB = uQqRecfwwf(BFG423SDFFSDF);

Ok, it's being called during a variable initialization. Are there any operations on this variable after the function call returns?

JvUhBCJkFV = aCaBOUakjB.substring(Math.acos(1) + 70).replace(/AdgSf344_42/, "");

The value returned is passed through '.replace' where any occurrence of 'AdgSf344_42' string is removed and then '.substring' cuts off first 70 characters and what's left is stored in another variable called - 'JvUhBCJkFV'. Note the string that's being removed - 'AdgSf344_42'. This string is repeatedly present all over the data blob, so it's safe enough to assume that these 'replace' and 'substring' operations are performed on the data stored in the blob. Now let's trace 'JvUhBCJkFV' variable.

JvUhBCJkFV = JvUhBCJkFV["CmYhLBWtlcUAuPllrQzwcR".charAt((Math.acos(1) + 1) * 21).toString().toLowerCase() + "QeTyvpnauVDwgPrSyUmddePl".substr((Math.acos(1) + 1) * 21, 3).toLowerCase() + "hSReqJbsEYDVGYtSdMaKvAce".toLowerCase().substr((Math.acos(1) + 1) * 21, 3)](/__hhg7_/, "<");
JvUhBCJkFV = JvUhBCJkFV["DFhBbAbrvTfutCnsmdcFnR".charAt((Math.acos(1) + 1) * 21).toString().toLowerCase() + "pxUBkDdBwEcWypAIICsrJePl".substr((Math.acos(1) + 1) * 21, 3).toLowerCase() + "mVrJDwMWDZuUhgpMZSmVEAce".toLowerCase().substr((Math.acos(1) + 1) * 21, 3)](/__Db8__/, ">");
JvUhBCJkFV = JvUhBCJkFV["QOmUjeNOFcQYONcLQZAtOR".charAt((Math.acos(1) + 1) * 21).toString().toLowerCase() + "bWXExXVfqvcqeIuyNfJaRePl".substr((Math.acos(1) + 1) * 21, 3).toLowerCase() + "iebFbNjCVybBwfEvorJqqAce".toLowerCase().substr((Math.acos(1) + 1) * 21, 3)](/_uio0__/, "&");
JvUhBCJkFV = JvUhBCJkFV["oZBZyFhKBOkJHNmWnmNCKR".charAt((Math.acos(1) + 1) * 21).toString().toLowerCase() + "PIPoCStPfHZIWGgmYzekFePl".substr((Math.acos(1) + 1) * 21, 3).toLowerCase() + "gfAOlzFwZvghrQYqhRPhnAce".toLowerCase().substr((Math.acos(1) + 1) * 21, 3)](/__cc0__/, "%");

This is a little bit hard to read, isn't it? This is another type of code obfuscation used in this sample. Thought it's not something I want to cover in this post, let's deobfuscate it for the sake of easiness. Simply taking the code enclosed into square brackets and evaluating it using a JS sandbox yields the code below.

JvUhBCJkFV = JvUhBCJkFV.replace(/__hhg7_/, "<");
JvUhBCJkFV = JvUhBCJkFV.replace(/__Db8__/, ">");
JvUhBCJkFV = JvUhBCJkFV.replace(/_uio0__/, "&");
JvUhBCJkFV = JvUhBCJkFV.replace(/__cc0__/, "%");

More replacements!! Ok, let's do all of the above replacements and see if we get some readable code. The following simple Python script will help us do it:
 
def readFile(filename):
    fo = open(filename, "rb")
    textFile = fo.read()
    fo.close()
    return textFile

encodedString = readFile('SW_landing.txt')
encodedString = encodedString[70:]
encodedString = encodedString.replace('AdgSf344_42', '')
encodedString = encodedString.replace('__hhg7_', '<')
encodedString = encodedString.replace('__Db8__', '>')
encodedString = encodedString.replace('_uio0__', '&')
encodedString = encodedString.replace('__cc0__', '%')

print(encodedString)

Before running it, we'll need to save the data blob into 'SW_landing.txt' file and place it in the same folder with the Python script. If you ever need to deobfuscate another SweetOrange EK landing page using this script, keep in mind that the replacement strings can change, though so far I've only seen '.substring' value and the first replacement string changing.

After running the script we get the output. With the exception to some base64 encoded strings, the code is pretty readable and can be analyzed further if required. Now let's sum things up a little bit.

So, from the data/code obfuscation perspective SweetOrange EK landing page contains a relatively big blob of obfuscated data and a JavaScript to deobfuscate it into yet another JavaScript. The deobfuscation is implemented through a series of simple replacement operations. KISS.


Happy deobfuscation!


Sunday, 8 June 2014

CottonCastle EK: "I hate to break this to you, but this isn't gonna be an open casket."

NOTE: The information is based on a sample captured on 2014-06-06

Thanks to @Set_Abominae for sharing this sample.

Update 2014-06-10@kafeine shared his experience with this exploit kit. Covers the history of the name, how it was first detected and what other exploits it has in its arsenal.

Whenever there is any doubt, there is no doubt

It's a rather interesting name for an exploit kit. Trying to find any references to 'Cotton Castle' you end up with links pointing at an amazing looking location in Turkey - Pamukkale. I can't be sure if the exploit kit name tips you off on the country of the origin, but let's find out what it's made of to be sure we better understand this threat.

In this particular sample a compromise attempt started with visiting a webpage with a link to a JavaScript that starts the redirect chain. The following HTTP traffic was observed.

'CottonCastle EK' HTTP traffic

The JavaScript starting the redirect chain is 'jquery.place.min.js'. According to almighty Google Search, the name belongs to a legit JavaScript application developed by 'Designcise' and meant to assist web developers with organizing a webpage content. Nevertheless, in this particular case that's not what this script is doing at all. The script appears to be checking the following before initiating a redirect:
  • presence of the word 'government' in the current session 'cookies'
  • types of 'frames', 'XMLHttpRequest', 'mozSetImageElement'
  • monitoring users interaction with the webpage - mouseovers, clicks and movements
Part of JavaScript checking browser 'cookies' and some element types

Part of JavaScript hooking mouse activity

Part of JavaScript showing 'EventListener' and 'attachEvent'

Once the required for redirect conditions are met, the browser is redirected to an HTML page containing another JavaScript.

Part of JavaScript showing the function initiating the redirect

This type of behaviour could be found with some of the news/ads reach websites where additional content is pulled based on user's webpage interaction and in this case might be exactly that, but we'll carry on assuming this activity is a malicious.

The webpage the browser is taken to is rather simple in terms of the content - 1 line of text that looks like a news headline.

Part of the page showing the line of text and an <iframe>

Let's take a look at some WHOIS data for the domain hosting this page.

Extract from WHOIS results for 'leveloped.in'

Doesn't appear to be well 'hidden' if it was registered with a malicious intention, once again, making me think that this could be just a case of a compromised news/ad feed. Anyway, let's focus on the '<iframe>' link that leads to another webpage.

Part of the second HTML page in the redirect chain

Note yet another JavaScript URI that looks like a request for an ad banner - 'static.js?ads-banner=1620850'.

Part of JavaScript redirecting to EK landing page

'Surprisingly', there is no ad banner in the response. Instead it contains a lightly obfuscated JavaScript that compiles the next URL in the redirect chain. It takes a predefined URL, adds the referrer to it and requests the resulting URL.

This was the last redirect in the chain, so the browser finally arrives on the 'CottonCastle EK' landing page. Before we proceed with the analysis of the EK parts, let's quickly sum up what we've found so far and speculate a little bit about it. It's hard to tell for sure where the actual badness starts. Technically, it would be fair to say it starts from the very first page we landed on - 'ru.hellomagazine.com/tags/40-viktoriya_bonya.html', but we should also consider a scenario where the first redirect to 'leveloped.in/government/70d83bde3d5f7e09/' through 'ru.hellomagazine.com/js/jquery.place.min.js?i=1400557776' JavaScript could be a normal operational mode for 'ru.hellomagazine.com' to pull news/advertisement content for its web pages. So, in this case the news/ad feed is compromised and more likely begins at 'expokot.com/hilosifipu/static.js?ads-banner=1620850'. On the other hand, all domain names involved in the delivery of news/ad feeds could be a part of a well organised malvertising network and the redirect initiating link has been maliciously injected into 'ru.hellomagazine.com' website pages. Alright, that would be enough for speculations.

The statement below is true. 
The statement above is false.

The server replies with 203 HTTP status code when the landing page is requested. More likely, a rather unusual status code is being used by the server side of 'CottonCaste EK' for some internal processing or it could be specific to this particular sample only. String variables on the landing page are lightly obfuscated.

'CottonCastle EK' string obfuscation example

On top of that, the code is padded/fragmented by comment blocks.

'CottonCastle EK' comment blocks fragmentation example

As a first step, the code will launch an '<applet>' containing a Java application.

'<applet>' launching Java application

Next, the code will attempt to create a 'ShockwaveFlash' ActiveXObject and if successful will identify its version.

Part of JavaScript code that creates a 'ShockwaveFlash' ActiveXObject

Version value returned will be broken down into individual number values and stored as an array. The array will be converted into a string  by XORing each array element by a static key '343' and adding them together. The resulting string will be passed to a function that generates a GET request using a pre-defined URI and the passed ShockwaveFlash version value, plus, a bunch of other pre-defined parameters stored as HEX strings.

Part of JavaScript generating GET request for a ShockwaveFlash component

Just for a little bit of extra fun, we can find what version of Shockwave Flash Player was installed on the machine this particular sample of 'CottonCastle EK' was captured from. Here is the part of GET request corresponding to this sample - '/forum/tracker/3/ON/0dc93648889f84dcc7f0f70c25fbe9c6/349.341.456.342/'. If we take each individual numeric value from '/349.341.456.342/' and XOR it by '343' we get the version of Shockwave Flash Player - '10.2.159.1'.

This particular GET request received HTTP 409 response. I assume that the server side of 'CottonCastle EK' responds with HTTP 409 Status code when it decides not to serve the exploit component or something went wrong sending it.

Lastly, another JavaScript on the landing page will try to identify 'ScriptEngineMajorVersion', 'ScriptEngineMinorVersion' and 'ScriptEngineBuildVersion' values. Like in the case with 'ShockwaveFlash' version values, each of the found values will be XORed by '343' and the resulting string passed to a function that generates a GET request using a pre-defined URI and the value passed. The request initiation is set on a 2.5 seconds timer.

Part of JavaScript generating GET request using detected values

So, the Java application will be requested and executed first. The request is implemented using JNLP file.

'CottonCastle EK' Java application JNLP file

Surprisingly, JNLP file has no 'Security Warning Window' bypass attributes and interestingly enough other attributes are properly named. According to the naming, this Java application is called 'jBitTorrent Client' that provides 'Java implementation of the bittorrent protocol' and will run on '<j2se version="1.6+" />'. The execution starts with 'com.s' class file. Let's take a look at the JAR file content.

'CottonCastle EK' JAR file structure

The two class files 's.class' and 't.class' contain a 'wrapper' code. In a nutshell, the code decrypts and loads some of the JAR file components. The '.dat' files included in the JAR file serve different purpose:
  • 'd.dat' - Windows PE executable
  • 'j.dat' - operating system reconnaissance, payload download and execution
  • 'p.dat' - JVM parameters collector, execution path selector
  • 'u.dat' - exploit code for CVE-2013-0422 (JmxMBeanServer)
Every '.dat' file is RC4 encrypted. The decryption key is stored in 'adv' parameter passed to JVM. In this particular sample it was - 'OrbitWhite'.

The Java code is fairly obfuscated. String values are also encrypted with RC4(using the same decryption key) and stored in HEX representation. 'java.lang.reflect.Method' features are used a lot.

Example of obfuscated string values

The execution flow is the following:
  1. 'wrapper' initialization
  2. 'wrapper' decrypts 'u.dat' file and passes it to 'javax.script.ScriptEngineManager'
  3. decrypted from 'u.dat' JavaScript exploits 'CVE-2013-0422'
  4. 'wrapper' decrypts 'p.dat' and loads it as a class file
  5. 'p.dat' class file gathers passed to JVM parameters, decrypts 'j.dat' file and passes it to 'javax.script.ScriptEngineManager'
  6. 'j.dat' JavaScript collects values of some OS parameters, decrypts and drops the payload stored in 'd.dat' file
  7. 'j.dat' JavaScript downloads a VB script and executes it
  8. VB script checks for presence of AV, downloads an executable file, decrypts, stores and executes it
  9. 'j.dat' JavaScript deletes all the cache and temp files created during the execution
Let's take it from step 6, 'j.dat' JavaScript decrypts 'd.dat' file and attempts to save it into the folder specified under 'allusersprofile' environment variable. The new filename for 'd.dat' file will be - 'java.dll'. If the script fails to save the file under 'allusersprofile' it saves it into the folder specified under 'temp' environment variable.

Part of 'j.dat' JavaScript that handles 'd.dat' file

After that, the script decrypts the value stored in 'session' variable that was pre-loaded by 'p.dat' file. The decrypted value is the URL for an HTML page containing encoded VB script. The page is downloaded and processed by 'mshta' tool.

Part of encoded VB script 

VB script is decoded and executed by the following JavaScript:

JavaScript that decodes and executed embedded VB script

List of operations performed by VB script:
  • query list of all running processes
  • check results of the query against a pre-defined list of processes
  • callback to a pre-defined URL in the event of blacklisted processes detected
  • build filename and filepath for malware payload
  • download, decode and execute the malware payload
  • callback to a pre-defined URL reporting a success deployment
List of running processes is pulled from 'WMI service'

Part of VB script creating the list of running processes

The generated list is checked against the list of process names belonging to some security products - 38 in total. Each entry on the list is formatted - 'code_letter:process_name,'. NOTE: the list below contains only the product or service names. Filenames corresponding to these services/products were purposely left out.
  • AVG Scanning Core Module - Server Part
  • AVG Watchdog Service
  • Ad-Aware Antivirus Service
  • ArcaVir
  • Avast! Service
  • Avira Scheduler
  • BitDefender Agent
  • BullGuard Behavioural Detection
  • CA eTrust Antivirus
  • Comodo Agent Service
  • Dr. Web
  • ESET Service
  • F-Secure Host Process
  • G DATA Personal Firewall
  • Ikarus Security Software
  • Jetico Personal Firewall
  • K7TotalSecurity Service Manager
  • Kaspersky Lab
  • McAfee Service Host
  • Microsoft Security Client User Interface
  • Norman Privacy Tools
  • Norton 360
  • Norton AV
  • Norton Internet Security
  • Omniquad firewall or Dynamic Security Agent or AGuardDogSuite
  • Outpost Firewall
  • PC Tools Security Service
  • PC Tools ThreatFire Service
  • Panda Software Controler
  • Rising Antivirus
  • Solo Antivirus
  • Solo Scheduler
  • Sophos Administrator Service
  • Sophos Anti-Virus
  • Trend Micro Anti-Malware Solution Platform
  • TrustPort Antivirus Management Agent
  • ZoneAlarm ForceField 
Interestingly enough, only two of the products are actually blacklisted - 'Norton 360' and 'Norton Internet Security'. If a blacklisted process is found the script will call to 'http://bretan.key-updates.pw:4433/forum/posting/' + '1000/' + 'code_letter' corresponding to a detected process. Otherwise, it'll proceed with creating the filename for the malware payload.

Part of VB script creating the filename for the malware paylaod

Another rather strange 'twist' in here. If you take a look at the screenshot above you'll notice that the name is created based on a condition. 'ap' variable is evaluated and depending on the result the prefix of the filename is selected. The value 'a' corresponds to 'avast! Service'. I'm not quite sure why this product receives a 'special treatment', but the possible filenames for the malware payload are - 'Windows-Patch-KB874923-x86.exe' or 'Windows-KB874923-x86.exe'. The file will be saved to the folder specified under '%TMP%' environment variable. Prior being saved to the disk, the file is run through a XOR routine that decrypts it. The decryption key is stored in plain text in the VB script. In this particular sample it is - 'e2400a24ac76b37cb0adff1dfd022e08'. Once decrypted and saved, it's executed using 'cmd.exe'. Spawned 'black screen' will try to calm a worried user down with 'echo Install Windows Updates'

Part of VB script that performs decryption and execution

The last thing the script does is generating the callback URL. The format is the same as per case with blacklisted process except to the middle part - instead of '1000/' it uses '111/'.

At this stage the control is passed back to 'j.dat' JavaScript and the very last operation it performs - 'clean up'.

Part of JavaScript showing the 'clean up' function

The script takes a good care cleaning up the 'leftovers'. In another rather interesting 'twist', 'Kaspersky Labs' are getting 'special treatment' during this process. On top of deleting temporary files created during the overall execution, the script deletes the content of 'Kaspersky Lab' folder located in '%programdata%'(Win Vista+) or '%allusersprofile%'(Win XP). 'Kaspersky Labs' products keep application tracing '.LOG' files in this location, so more likely the script is cleaning it out to remove its traces there too.

This would conclude the Java analysis part, so let's sum things up a little bit. I have to admit it was a rather interesting 'descent' for me. Do not want to sound as if I'm admiring someone with malicious intentions, but as a techie I have to say it's a nicely written piece of code. Use of Minification and 3 different programming languages is actually quite cool. JavaScript in 'j.dat' was a particularly interesting piece. At some point I even started thinking could this EK be a 'state sponsored' work, but that would be just silly, right. One thing I was not able to figure out is the purpose of the Windows PE file. It's only 2.5KB in size with 3 really short functions inside. Anyway, the Java part of this EK is fairly complex - detection evasion through encryption, system reconnaissance, support for blacklisting certain processes, clean up procedure. Overall observation is the Java part of 'CottonCastle EK' focuses more on being undetected rather being successful.

If you build it, he will come.

Since this sample doesn't have the 'Shockwave Flash Player' exploit part(@kafeine has covered this in one of his blog posts), the last component of 'CottonCastle EK' we have in this sample is the 'IE' exploit part. As mentioned in the beginning of the post, after collecting some of the values related to the browser environment a GET request is sent to the server. If the server is satisfied with the request it returns an HTML page with a JavaScript that decodes a blob of data stored in one of the variables.

Example of the data blob

JavaScript that decodes the data blob

The result of the decoding is another JavaScript.

Part of the decoded JavaScript showing indicators of 'CVE-2013-2551' exploit code

Thanks to @regenpijp for help identifying the vulnerability this exploit code is targeting - CVE-2013-2551.

Additional information on this EK can be found here - http://malware.dontneedcoffee.com/2014/06/cottoncastle.html

This concludes 'CottonCastle EK' analysis. For the information on the delivered payload please see the summary table below.


Summary Information
Name: CottonCastle Exploit Kit
Date captured: 2014-06-06
Date analysed: 2013-06-07
Source/Credits: Data source - @Set_Abominae.
Intel source - @kafeine
Infection vectors detected: Java / Shockwave Flash Player / Internet Explorer
Vulnerabilities targeted: CVE-2013-0422
CVE-2013-2551
CVE-2014-0515
Landing page
Obfuscation: Yes
TDS: Multi redirect chain
JNLP: Yes
JVM parameters: Yes
Java infection vector
Captured with: Java 1.7.05
Obfuscation: RC4 encrypted string values, Java Reflections, Minification
JAR extra content: Number of RC4 encrypted files with '.dat' file extension
Initial Payload delivery method: URL
Initial Payload encryption/encoding: XOR. key - 'e2400a24ac76b37cb0adff1dfd022e08'
Initial Payload store location: System Temp folder
Initial Payload filename: Static. 'Windows-Patch-KB874923-x86.exe' or 'Windows-KB874923-x86.exe'
Browser infection vector
Analysed with: Internet Explorer 7
Initial Payload delivery method: URL
Initial Payload store location: NA
Initial Payload filename: NA
Automated analysis
Exploit components:
JAR - https://www.virustotal.com/ (no detection)
Delivered malware:
EXE(MD5 b619fce7efde0453c06f68565a8bdbb6)
https://malwr.com/
https://www.virustotal.com
Additional Information

Java malicious component is implemented with the use of different evasion techniques. Execution is controlled and depends on the values of different system/OS parameters.


Friday, 30 May 2014

Dissecting Tips: OLE and Office Open XML



DISCLAIMER: The choice of tools is based on a personal preference. The same results can be achieved using similar set of tools. This is not a step-by-step guide - these are just some tips.

If you're to try the described below you'll need to have the following skills and tools:

Skills
  • Fair understanding of Object Linking and Embedding (OLE) file structure
  • Fair understanding of Office Open XML file structure
Tools

NOTE: The file samples used in this blog post were sourced from phishing emails roaming around at the end of March 2014.

The fastest way to check if an OLE file has any malicious content embedded is to run it through 'OfficeMalScanner' tool. There is a couple of option keys to help you do that - 'scan' and 'info'. There is also a couple of switches available - 'brute' and 'debug' - that can further increase the chances of finding malicious content.

OfficeMalScanner usage

The screenshots below shows the tool output for a DOC file that was attached to a phishing email. Taking that we do not know what hides inside, it makes sense to analyse the file using both options - 'scan' option first.

OfficeMalScanner 'scan' option output

No suspicious content has been found, but note the comment at the bottom of the output - the tool is recommending to analyse the file using 'info' option key.

OfficeMalScanner 'info' option output

Now the tool detected an embedded VB script and dumped it into a folder. Quick glance at the script shows that it will download and execute a file.

part of extracted VB script

'OfficeMalScanner' tool can detect and extract embedded EXE files. The screenshot below shows an example of the tool output when it detects an EXE file embedded into a DOC file.

OfficeMalScanner output - detected and dumped embedded EXE

'OfficeMalScanner' tool can also handle Office Open XML files. Below is an example of the tool output when used with 'inflate' option.

OfficeMalScanner 'inflate' option output

NOTE: Simply changing an Office Open XML file extension to 'zip' and opening the file with an archiving tool of your choice will allow you to extract its file structure as well.

The decompressed files will be stored in 'DecompressedMsOfficeDocument' folder in user's '%TEMP%' location. In this particular example, the tool highlighted one file - 'word/vbaProject.bin' to be suspicious and suggested to run the tool against it using 'scan' or 'info' options.

OfficeMalScanner 'info' option output for embedded into DOCX file VB script

The tool has found and extracted an embedded VB script. This script doesn't seem to be reaching out to any external sources, like, we've seen in a previous example. Instead, it extracts 'text'(<w:t>) from each 'paragraph'(<w:p>) in the document, saves extracted data to a file and executes it.

Extract from malicious VB script (no execution part included)

At this point we know there is an executable file hidden in this document, but since it's represented as text, 'OfficeMalScanner' tool will not detect it. The screenshot below shows an example of the paragraphs and the text stored in them that reassemble the executable file.

'word/document.xml' file view in 'XML Explorer' tool

The following simple Python script can help to reconstruct the file from the text strings.

import zipfile, re

def saveFile(filename, content):
    fo = open(filename, "wb")
    fo.write(content)
    fo.close()
    return

def main(inputFile, outputFile):
    docxFile = zipfile.ZipFile(inputFile)
    textContent = docxFile.read('word/document.xml')
    textContentInOneString = re.sub('<(.|\n)*?>','',textContent)
    bytesOnlyRegexGroup = re.search(re.escape("&amp;H") + ".*[a-zA-Z0-9]{2}", textContentInOneString)
    bytesOnly = bytesOnlyRegexGroup.group(0).replace("&amp;H","").decode('hex')
    saveFile(outputFile, bytesOnly)

readFrom = "C:\\infected\\27.05.2014\\Law Society message.docx"
saveTo = "C:\\infected\\27.05.2014\\extracted.bin"

main(readFrom, saveTo)

Checking the extracted file.

extracted file header

Target confirmed. Further info on the file is available on VT.

Other files contained in Office Open XML file structure that might be useful during an analysis

'\[Content_Types].xml' file view in XML Explorer tool

'[Content_Types].xml' file holds the list of all the content types used in the document.

'\word\_rels\document.xml.rels' file view in XML Explorer tool

'\word\_rels\document.xml.rels' file contains details about any embedded elements. In the example above it shows 4 embedded OLE objects. These are not necessarily malicious objects. Anything embedded into a DOCX file is stored as an OLE object. These objects can be found in '\word\embeddings' folder and can be analysed with 'OfficeMalScanner' tool. If the tool finds nothing suspicious 'SSViewer(Structure Storage Viewer)' utility can be used to extract the content of an OLE object for further analysis. The screenshot below shows an OLE file opened in SSViewer tool. OLE file components can be extracted and saved as a data stream file.

extracting content of an OLE file using SSViewer tool

The content will be saved into a file with '.stream' extension. Further file header analysis is required to determine the file type. In this particular example, the extracted content turned out to be WMF(Windows Metafile) file.

example of a file extracted from an OLE object

Saving a stream to a file will not always reconstruct the original file. The snapshot below shows a stream extracted from an OLE object that was embedded into DOCX file.

example of a stream file extracted from an OLE object

Apart from showing the location where the embedded file is stored on the originating machine, note the 'MZ' and 'This program must' strings. It's safe to assume that the embedded file is an EXE file, but in the current format it has some extra bits. To be able to restore the original file from the saved stream, we need to remove the data preceding the EXE header and the extra data at the end. Where the preceding data is not a problem and simply removing everything up to 'MZ' will give us the beginning of the original file, dealing with the extra bit at the end might be tricky. One of the ways to deal with it is to use 'PEStudio' tool.

'overlay' detected in PEStudio

PEStudio has detected some extra bytes(overlay) starting at offset 0x00322E00. Now we need to find the offset address at the end of the stream file and remove the overlay.

the end of the extracted stream containing the overlay

Once the extra data is removed, the original EXE file is fully restored and can be analysed further. If for whatever reason we want a copy of the overlay data PEStudio can be used to save it into a file.


saving 'overlay' to a file

extracted 'overlay' file


Hope these tips are helpful.