DISCLAIMER: The choice of tools is based on personal preference. The same results can be achieved with a similar set of tools. This is not a step-by-step guide; these are just some tips.
To try the techniques described below, you will need the following skills and tools:
Skills
- Fair understanding of Object Linking and Embedding (OLE) file structure
- Fair understanding of Office Open XML file structure
Tools
- OfficeMalScanner - MS Office forensic tool
- XML Explorer - XML file viewer
- SSViewer - Structured Storage Viewer
- PEStudio - Windows executable file scoring tool (optional)
- Text/Hex Editor of your choice
NOTE: The file samples used in this blog post were sourced from phishing emails roaming around at the end of March 2014.
The fastest way to check whether an OLE file has any malicious content embedded is to run it through the 'OfficeMalScanner' tool. Two of its options help with that: 'scan' and 'info'. Two switches, 'brute' and 'debug', can further increase the chances of finding malicious content.
OfficeMalScanner usage
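When several samples need checking, the two runs can be scripted. The sketch below only builds and launches the command line. The option and switch names come from the text above; the executable name, the argument order (file, then option, then switches) and the helper names `build_command` and `run_all` are assumptions made for illustration, so check them against the tool's own usage output.

```python
import subprocess

# Option and switch names as they appear in this post.
OPTIONS = {"scan", "info", "inflate"}
SWITCHES = {"brute", "debug"}


def build_command(sample, option, switches=(), exe="OfficeMalScanner.exe"):
    """Return the argument list for one run.

    Assumption: the tool expects the file first, then the option,
    then any switches. The executable name is also an assumption.
    """
    if option not in OPTIONS:
        raise ValueError("unknown option: %s" % option)
    unknown = set(switches) - SWITCHES
    if unknown:
        raise ValueError("unknown switches: %s" % sorted(unknown))
    return [exe, sample, option] + list(switches)


def run_all(sample):
    """Run 'scan' with both switches, then 'info'; return the text output."""
    outputs = {}
    for option, switches in (("scan", ("brute", "debug")), ("info", ())):
        cmd = build_command(sample, option, switches)
        result = subprocess.run(cmd, capture_output=True, text=True)
        outputs[option] = result.stdout
    return outputs
```

Calling `run_all("sample.doc")` would return a dictionary with the console output of each run, which can then be searched or archived alongside the sample.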
The screenshots below show the tool output for a DOC file that was attached to a phishing email. Since we do not know what is hidden inside, it makes sense to analyse the file using both options, starting with 'scan'.
OfficeMalScanner 'scan' option output
No suspicious content has been found, but note the comment at the bottom of the output: the tool recommends analysing the file with the 'info' option.
OfficeMalScanner 'info' option output
This time the tool detected an embedded VB script and dumped it into a folder. A quick glance at the script shows that it downloads and executes a file.
part of extracted VB script
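A quick glance can be backed up with a simple text search over the dumped script. The sketch below pulls out URLs and flags a few keywords; the keyword list is illustrative and far from exhaustive, and the function name `triage_macro` is made up for this example.

```python
import re

# Illustrative, non-exhaustive keywords often seen in downloader macros.
SUSPICIOUS_KEYWORDS = (
    "URLDownloadToFile", "XMLHTTP", "ADODB.Stream",
    "Shell", "CreateObject", "AutoOpen", "Document_Open",
)
URL_PATTERN = re.compile(r"https?://[^\s\"']+", re.IGNORECASE)


def triage_macro(source):
    """Return (urls, keywords) found in the macro source text."""
    urls = URL_PATTERN.findall(source)
    lowered = source.lower()
    # Case-insensitive substring match, because VB is not case sensitive.
    keywords = [k for k in SUSPICIOUS_KEYWORDS if k.lower() in lowered]
    return urls, keywords
```

A hit is only a hint about where to look first; the script still needs to be read by hand, since strings are often split or obfuscated.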
The 'OfficeMalScanner' tool can also detect and extract embedded EXE files. The screenshot below shows the tool output when it detects an EXE file embedded in a DOC file.
OfficeMalScanner output - detected and dumped embedded EXE
The 'OfficeMalScanner' tool can also handle Office Open XML files. Below is an example of the tool output when it is used with the 'inflate' option.
OfficeMalScanner 'inflate' option output
NOTE: Simply changing an Office Open XML file extension to 'zip' and opening the file with an archiving tool of your choice will allow you to extract its file structure as well.
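Because an Office Open XML file is a ZIP archive, the same listing and extraction can be done without renaming anything. This is a minimal sketch using Python's standard `zipfile` module; the function names are chosen for this example only.

```python
import zipfile


def list_ooxml_members(path):
    """Return the member names of an Office Open XML (ZIP) container."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


def find_vba_projects(names):
    """Pick out VBA project parts such as 'word/vbaProject.bin'."""
    return [n for n in names if n.lower().endswith("vbaproject.bin")]


def extract_all(path, target_dir):
    """Unpack every member into target_dir for offline inspection."""
    with zipfile.ZipFile(path) as archive:
        archive.extractall(target_dir)
```

For example, `find_vba_projects(list_ooxml_members("sample.docx"))` returns the macro containers, if any, that are worth passing to a scanner. The file name here is a placeholder.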
OfficeMalScanner stores the decompressed files in the 'DecompressedMsOfficeDocument' folder under the user's '%TEMP%' location. In this particular example, the tool flagged one file, 'word/vbaProject.bin', as suspicious and suggested running the tool against it with the 'scan' or 'info' option.
OfficeMalScanner 'info' option output for the VB script embedded in the DOCX file
The tool has found and extracted an embedded VB script. Unlike the previous example, this script does not appear to reach out to any external source. Instead, it extracts the 'text' (<w:t>) from each 'paragraph' (<w:p>) in the document, saves the extracted data to a file and executes it.
Extract from malicious VB script (no execution part included)
At this point we know there is an executable file hidden in this document, but since it is represented as text, the 'OfficeMalScanner' tool will not detect it. The screenshot below shows the paragraphs and the text stored in them, from which the executable file is reassembled.
'word/document.xml' file view in 'XML Explorer' tool
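The paragraph text visible in the XML view can also be listed programmatically, which makes it easier to see how the payload is split across paragraphs. The sketch below walks the <w:p> and <w:t> elements with the standard `xml.etree.ElementTree` module; the function name `paragraph_texts` is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def paragraph_texts(document_xml):
    """Return the concatenated <w:t> text of every <w:p> paragraph."""
    root = ET.fromstring(document_xml)
    texts = []
    for paragraph in root.iter(W_NS + "p"):
        # A paragraph may hold several runs, each with its own <w:t>.
        runs = [t.text or "" for t in paragraph.iter(W_NS + "t")]
        texts.append("".join(runs))
    return texts
```

The input is the content of 'word/document.xml', for instance as returned by `zipfile.ZipFile(path).read('word/document.xml')`.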
The following simple Python script can help to reconstruct the file from the text strings.
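import re
import zipfile

def saveFile(filename, content):
    # Write the reconstructed bytes to disk in binary mode.
    with open(filename, "wb") as fo:
        fo.write(content)

def main(inputFile, outputFile):
    # A DOCX file is a ZIP archive; the document body is in word/document.xml.
    with zipfile.ZipFile(inputFile) as docxFile:
        textContent = docxFile.read('word/document.xml').decode('utf-8')
    # Strip every XML tag so that only the paragraph text remains.
    textContentInOneString = re.sub(r'<[^>]*>', '', textContent)
    # Take everything from the first "&H" marker up to the last pair of
    # alphanumeric characters on that line.
    bytesOnlyRegexGroup = re.search(re.escape("&H") + ".*[a-zA-Z0-9]{2}", textContentInOneString)
    # Drop the "&H" markers and decode the remaining hex digits into raw bytes.
    bytesOnly = bytes.fromhex(bytesOnlyRegexGroup.group(0).replace("&H", ""))
    saveFile(outputFile, bytesOnly)

readFrom = "C:\\infected\\27.05.2014\\Law Society message.docx"
saveTo = "C:\\infected\\27.05.2014\\extracted.bin"
main(readFrom, saveTo)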
Checking the extracted file.
extracted file header
Target confirmed. Further information on the file is available on VirusTotal (VT).
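The header check and a lookup on a service such as VirusTotal can both be prepared with a few lines of code. This sketch computes the SHA-256 hash of the extracted data and verifies the 'MZ' and 'PE' signatures; the function names are assumptions for this example, and a passing check only means the header looks like a PE file, not that the file is complete.

```python
import hashlib
import struct


def sha256_of(data):
    """Return the SHA-256 hex digest, suitable for a hash search."""
    return hashlib.sha256(data).hexdigest()


def looks_like_pe(data):
    """True if data starts with 'MZ' and has 'PE\\0\\0' at e_lfanew."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # The DOS header stores the PE header offset as a 32-bit value at 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return data[e_lfanew:e_lfanew + 4] == b"PE\0\0"
```

Reading 'extracted.bin' in binary mode and passing its content to both functions gives a quick confirmation plus a hash to search for.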
Other files contained in Office Open XML file structure that might be useful during an analysis
'\[Content_Types].xml' file view in XML Explorer tool
'[Content_Types].xml' file holds the list of all the content types used in the document.
'\word\_rels\document.xml.rels' file view in XML Explorer tool
The '\word\_rels\document.xml.rels' file contains details about any embedded elements. In the example above it shows four embedded OLE objects. These are not necessarily malicious: objects embedded into a DOCX file are stored as OLE objects. They can be found in the '\word\embeddings' folder and can be analysed with the 'OfficeMalScanner' tool. If the tool finds nothing suspicious, the 'SSViewer' (Structured Storage Viewer) utility can be used to extract the content of an OLE object for further analysis. The screenshot below shows an OLE file opened in SSViewer. OLE file components can be extracted and saved as a data stream file.
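The relationship entries that point into the embeddings folder can be listed without opening a viewer. The sketch below parses the relationships part with `xml.etree.ElementTree` and keeps the entries whose target starts with 'embeddings/'. The function name `embedded_targets` is an assumption, and the prefix test assumes targets are written relative to the 'word' folder.

```python
import xml.etree.ElementTree as ET

# Namespace of the relationships part (*.rels) in the package.
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def embedded_targets(rels_xml):
    """Return (Id, Target) pairs for relationships under 'embeddings/'."""
    root = ET.fromstring(rels_xml)
    found = []
    for rel in root.iter(REL_NS + "Relationship"):
        target = rel.get("Target", "")
        if target.startswith("embeddings/"):
            found.append((rel.get("Id"), target))
    return found
```

The input is the content of 'word/_rels/document.xml.rels' read from the archive; each returned target can then be extracted and scanned on its own.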
extracting content of an OLE file using SSViewer tool
The content will be saved into a file with the '.stream' extension. Further analysis of the file header is required to determine the file type. In this particular example, the extracted content turned out to be a WMF (Windows Metafile) file.
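The header check can be automated for the most common cases. The sketch below compares the first bytes of a file against a short table of well-known signatures. The table is deliberately small and the names `SIGNATURES`, `identify` and `identify_file` are made up for this example; in particular, only the placeable variant of WMF is covered, so a miss does not rule out a metafile.

```python
# (magic bytes, description) pairs for a few well-known formats.
SIGNATURES = (
    (b"MZ", "Windows executable (MZ)"),
    (b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", "OLE compound file"),
    (b"PK\x03\x04", "ZIP container (e.g. Office Open XML)"),
    (b"\xD7\xCD\xC6\x9A", "Placeable Windows Metafile (WMF)"),
    (b"%PDF", "PDF document"),
    (b"{\\rtf", "Rich Text Format"),
)


def identify(header):
    """Return a description for the first matching signature."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def identify_file(path):
    """Read the first bytes of a file and identify them."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))
```

Anything reported as "unknown" still needs a manual look in a hex editor.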
example of a file extracted from an OLE object
Saving a stream to a file will not always reconstruct the original file. The snapshot below shows a stream extracted from an OLE object that was embedded in a DOCX file.
example of a stream file extracted from an OLE object
'overlay' detected in PEStudio
PEStudio has detected some extra bytes (an overlay) starting at offset 0x00322E00. Now we need to locate that offset near the end of the stream file and remove the overlay.
the end of the extracted stream containing the overlay
Once the extra data is removed, the original EXE file is fully restored and can be analysed further. If for whatever reason we want a copy of the overlay data, PEStudio can be used to save it into a file.
saving 'overlay' to a file
extracted 'overlay' file
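The overlay removal described above can also be scripted once the offset is known. This is a minimal sketch that cuts the stream at a given offset and writes both halves out; the function names and the file names in the usage note are placeholders, and the offset must come from a tool such as PEStudio.

```python
def split_overlay(data, overlay_offset):
    """Split data into (image, overlay) at overlay_offset."""
    if not 0 <= overlay_offset <= len(data):
        raise ValueError("offset outside the file")
    return data[:overlay_offset], data[overlay_offset:]


def strip_overlay(stream_path, exe_path, overlay_path, overlay_offset):
    """Write the trimmed executable and the overlay to separate files."""
    with open(stream_path, "rb") as handle:
        data = handle.read()
    image, overlay = split_overlay(data, overlay_offset)
    with open(exe_path, "wb") as handle:
        handle.write(image)
    with open(overlay_path, "wb") as handle:
        handle.write(overlay)
    return len(image), len(overlay)
```

For the sample above the call would look like `strip_overlay("object.stream", "restored.exe", "overlay.bin", 0x00322E00)`, which keeps the original stream file untouched.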
Hope these tips are helpful.