Reversing – Pending Investigations

AUCTF Reversing Writeups

I thought it was time for a reversing writeup involving a little Python and Cutter (radare2 GUI) legwork, so I picked 2 binaries I solved during AUCTF 2 weeks ago. I think you can still get the binaries from the site.

1. Sora

A nice little Kingdom Hearts reference to start us off. This binary was pretty simple; it asks for a key as input, mashes the key up in an encrypt function, and compares it to a ‘secret.’

It’s good practice for making keygens, or really easy if you have a template and decompiler. Let’s start by analyzing the main function:

Decompiler View (love that Cutter uses the Ghidra decompiler).

So as you can see, we want to get to the print_flag function. Thus we want a return value from the encrypt function that isn’t zero.

Let’s take a closer look at encrypt in the decompiler:

Looks kind of nasty. We have our input string arg1, this thing obj.secret, a lot of arithmetic transformation, and 2 possible return values. There's also the variable var_18h, which is the loop iterator. var_18h keeps incrementing, and as we can see, if it gets all the way to uVar1 (the length of obj.secret) without breaking out, we get a return value of 1 (which we want).

Let’s examine the break condition:

We don’t want to break out of the loop because that causes a return 0. It’s a little hard to read, so let’s break it into pieces (or if you follow the complicated arithmetic, feel free to skip to The Secret):

*(char *)arg1 – this is the first character of our input string

*(char *)(arg1 + (int32_t)var_18h) – this is the character in our string at index var_18h; if our string is “ABCD” and var_18h is 2, this expression evaluates to “C”.

((int32_t)*(char *)(arg1 + (int32_t)var_18h) * 8 + 0x13) % 0x3d + 0x41 – the character from our input string is multiplied by 8, 0x13 is added to that product, the sum is taken modulo 0x3d, and 0x41 is added to the result.

((int32_t)*(char *)(arg1 + (int32_t)var_18h) * 8 + 0x13) % 0x3d + 0x41 != (int32_t)*(char *)((int32_t)var_18h + _obj.secret)

The statement above is the full expression. The character in our input string is transformed by those operations and compared to the character at the same position in the secret. If they differ at any position, we break out of the loop and fail.
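To make the condition concrete, here's a minimal Python sketch of the check as I read it from the decompiler. The names mangle and encrypt_check are my own, and treating a key shorter than the secret as a failure is an assumption (the decompiled loop only bounds var_18h by the secret's length):

```python
SECRET = "aQLpavpKQcCVpfcg"  # the value of _obj.secret, found in the next section


def mangle(ch: str) -> str:
    """sora's per-character transform: (c * 8 + 0x13) % 0x3d + 0x41."""
    return chr((ord(ch) * 8 + 0x13) % 0x3d + 0x41)


def encrypt_check(key: str, secret: str = SECRET) -> bool:
    """Mirror of encrypt(): True (return 1) only if every position matches."""
    for i in range(len(secret)):  # var_18h runs up to uVar1 = len(secret)
        # Assumption: a key shorter than the secret simply fails here.
        if i >= len(key) or mangle(key[i]) != secret[i]:
            return False  # the break path, which makes encrypt return 0
    return True


print(mangle("A"), encrypt_check("test"))  # prints: t False
```

Note that "test" gets past the first position ('t' mangles to 'a') before failing on 'e'.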

Sorry if all those steps made the problem seem more convoluted, but I think it's good to write for beginners.

The Secret

This secret's pretty easy to find: switch to the disassembly and double-click on the use of _obj.secret:

So we have the secret; it’s “aQLpavpKQcCVpfcg”. We need a string, that when mangled in the way described, matches this secret. So let’s make a keygen.

The Keygen

I don’t know how other people make keygens, but I usually use a while loop and make an alphabet, input the secret, and have an empty string that becomes the key. We’ll iterate through the alphabet, mangle each character according to the algorithm, and check to see if it matches the current character in the secret. If it does, we add it as an element in the key, and keep going until our key is the same length as the secret.

Since sora is an interactive binary, I’m gonna assume that only printable characters can be inputted. So I used the string.printable constant from the string module.

Okay, enough teasing; here’s the code:

#!/usr/bin/python
import string

alphabet = string.printable
ciphertext = "aQLpavpKQcCVpfcg"
decrypted = ""
i=0
while True:
        if (len(ciphertext)<1):
                break
        x = ord(alphabet[i]) #ord turns the char into a number, then we mangle it
        x*=8
        x+=0x13
        x%=0x3d
        x+=0x41
        if (chr(x)==ciphertext[0]): #don't forget to turn the number back into a char
                decrypted+=alphabet[i] #add the matched char to the key
                ciphertext = ciphertext[1:] #I remove the front char from the ciphertext to increment 
        i+=1
        if (i>=len(alphabet)):
                i=0
print(decrypted)

So there it is. I prefer to remove the first character of the ciphertext with string slicing (string[1:]) each time a match is found, so I don’t have to iterate both the alphabet and the ciphertext.
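Brute force is plenty for a 16-character key, but the transform can also be inverted directly. Because 0x3d (61) is prime, 8 has a multiplicative inverse modulo 61, so for each secret character we can solve c * 8 + 0x13 ≡ s - 0x41 (mod 61) for c. This is an alternative sketch, not the script I ran above. It assumes Python 3.8+ (for pow with a negative exponent) and restricts the key to printable characters other than whitespace. That restriction matters here: the last secret character, 'g', can be produced by either '\n' or 'G'. It may also pick different, equally valid characters than the while loop:

```python
import string

SECRET = "aQLpavpKQcCVpfcg"
MOD = 0x3d                # 61 is prime, so 8 is invertible mod 61
INV8 = pow(8, -1, MOD)    # modular inverse of 8 (Python 3.8+): 8 * 23 = 184 = 3*61 + 1
# Assumption: only printable, non-whitespace characters are safe to type.
ALLOWED = string.digits + string.ascii_letters + string.punctuation


def solve_char(target: str, allowed: str = ALLOWED) -> str:
    """Find a character c with (c * 8 + 0x13) % 61 + 0x41 == ord(target)."""
    r = ord(target) - 0x41                 # undo the final + 0x41
    base = ((r - 0x13) * INV8) % MOD       # c ≡ (r - 0x13) * 8^-1 (mod 61)
    for c in range(base, 128, MOD):        # every c ≡ base (mod 61) works
        if chr(c) in allowed:
            return chr(c)
    raise ValueError(f"no allowed character maps to {target!r}")


def keygen(secret: str = SECRET) -> str:
    return "".join(solve_char(ch) for ch in secret)


print(keygen())
```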

When we run our keygen sorakey.py, it spits out a key pretty much immediately, and we can test our key against the sora binary:

The text we get back from sora means the key was accepted! And it works on the server; I’ve tested. So that’s one challenge down 🙂

2. Don’t Break Me

The next challenge is similar but a bit more involved.

It also looks for a key to validate. I know what you’re thinking: Are those hex bytes the key? Sadly, no. But they do make a cool message:

So if we examine the main function for dont_break_me, we see that there’s more going on than last time:

So in brief, our input is scanned into acStack8224, stripped of its newline, encrypted and then compared to the result of get_string. This function takes a pointer to arg_8h and fills its buffer with the secret. But if we look at get_string, the secret string is built at runtime and we can’t see it in a disassembler:

There’s a debugger check too, so if we debug it we’ll have to patch some jumps. Right? Well, fortunately there’s a way around it, and that way is called ltrace. ltrace runs binaries and intercepts calls to imported libraries; in this case, the output of strcmp is especially useful to us:

It might be hard to read, but I input “test”, it’s mangled into “VAEV” and compared with the string “SASRRWSXBIEBCMPX”. That’s our ciphertext. So ltrace saved us a lot of time!

2 Roads: Encrypt or Decrypt?

Finding the ciphertext was the easy part. Now we have to examine the encrypt function to see how input is mangled. But before we do that, a little Easter egg from the challenge creators. They included a decrypt function! It’s never referenced/used by the code, so it really is just extra. What we were going to do was make a keygen to find the winning combo, but we could just take the ciphertext and rewrite decrypt in Python. We’ll do that at the end.

You’ll see that the while loop in encrypt looks pretty similar to sora. The iterator var_ch increments up till the length of our input string. Characters in our input string are transformed. This time, instead of checking against the character in another string, each mangled character is just appended to an output string (iVar3). But how is it mangled?

Keygen Against the Ciphertext

One complicating factor is that encrypt uses arguments passed in from main (see the highlighted uses of arg_10h and arg_ch):

We need to go back to main to find out what values are passed:

So, arg_ch is 0x11 and arg_10h is 0xC. Now we can substitute these values into the keygen.
Let's redo our keygen from sora and change the arithmetic transformations:

#!/usr/bin/python
import string

alphabet = string.printable
ciphertext = "SASRRWSXBIEBCMPX"
decrypted = ""
i=0
while True:
        if (len(ciphertext)<1):
                break
        x = ord(alphabet[i]) # changes start here
        x-=0x41
        x*=0x11 # this was arg_ch
        x+=0xc # this was arg_10h
        x=x+int(x/0x1a)*(-0x1a)+0x41 # C-style (truncating) x % 0x1a, then + 0x41; changes end here
        if (chr(x)==ciphertext[0]):
                decrypted+=alphabet[i]
                ciphertext = ciphertext[1:]
        i+=1
        if (i>=len(alphabet)):
                i=0
print(decrypted)

And see if we get our key!

Well, that's not the prettiest key, but it works. The thing about a keygen is that multiple keys may be accepted. You can constrain the key to just letters or numbers or any smaller set by changing the alphabet you use, but keep in mind there may not be a key within those constraints. Anyway, let's try the other route: re-implementing the decrypt function in Python.

Decrypt the Ciphertext

When we re-examine decrypt, one additional call that was not in encrypt stands out (see the highlighting):

We see that arg_ch is passed into this new function called inverse, and the result (iVar3) is used in the arithmetic transformation. So in order to re-implement decrypt, we’ll have to re-implement inverse(arg_ch).

I honestly have no idea why it’s called an inverse function and didn’t want to spend a ton of time on math. But regardless, this function processes arg_ch, which is the value 0x11. Once all the pieces are put together, it looks like this:

The Decryptor

#!/usr/bin/python

secret = "SASRRWSXBIEBCMPX"
decrypted = ""
def invert(x):
        i=0
        j=0
        while (j<0x1a):
                if ((x*j)%0x1a==1):
                        i = j
                j+=1
        return i

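for i in secret:
        y = ord(i)
        y+=0x41 # decrypt adds 0x41 here; 2*0x41 = 130 = 5*26, so this equals subtracting it mod 26
        y-=0xc # undo the +arg_10h shift
        y*=invert(0x11) # multiply by the inverse of arg_ch modulo 26 (which is 23)
        intermediate = y+int(y/0x1a)*(-0x1a)+0x41 # y % 26, mapped back into 'A'..'Z'
        decrypted+=chr(intermediate)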
print (decrypted)

It’s a shotgun script for sure, but simple enough. Let’s see what happens when we decrypt the secret with our script decrypt.py:

Ooh, that's a much more satisfying key: IKILLWITHMYHEART. And, as you probably guessed, if we constrain our keygen to the alphabet string.ascii_uppercase, we'll get this same key 🙂
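For the curious, the math behind inverse() is small. Restricted to the uppercase letters, encrypt behaves like an affine cipher, E(x) = (17x + 12) mod 26 with x = c - 'A'. inverse(0x11) returns the multiplicative inverse of 17 modulo 26, which is 23 (17 * 23 = 391 = 15 * 26 + 1), so decryption is D(y) = 23(y - 12) mod 26. Here's a compact sketch of both directions. The function names are mine, and Python's pow(17, -1, 26) (3.8+) stands in for the loop in invert():

```python
A = ord("A")
KEY_A, KEY_B, M = 0x11, 0xC, 26   # arg_ch, arg_10h, size of the A-Z alphabet
A_INV = pow(KEY_A, -1, M)         # what inverse(0x11) works out to: 23


def encrypt(text: str) -> str:
    """E(x) = (17x + 12) mod 26, with x = ord(c) - ord('A')."""
    # Python's % matches the binary's C-style remainder only while
    # ord(c) - ord('A') >= 0, i.e. for characters at or above 'A'.
    return "".join(chr(((ord(c) - A) * KEY_A + KEY_B) % M + A) for c in text)


def decrypt(text: str) -> str:
    """D(y) = 17^-1 * (y - 12) mod 26."""
    return "".join(chr(((ord(c) - A - KEY_B) * A_INV) % M + A) for c in text)


print(encrypt("test"))              # VAEV, the same string ltrace showed
print(decrypt("SASRRWSXBIEBCMPX"))  # IKILLWITHMYHEART
```

Since the map is a bijection on A-Z, each ciphertext letter has exactly one uppercase preimage, which is why the uppercase-only keygen lands on the same key.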

Well that’s it! A bit of a long blog post for 2 fairly simple rev challenges, but I’m just happy to be posting again. I’ve been doing a lot of forensics lately, so I’ll likely be posting rev and malware for the next couple of weeks. Thanks for reading!

Malware Analysis from Virustotal: DeepLinks PDF Exploit

Last week, I went to a local security meetup for the first time. That, coupled with some recent networking and connection-building on Twitter, has been super motivating for me. I now have a lot more things to analyze from different repositories, and seeing pros and veteran security people post regularly on Twitter motivates me to get something out. So this next sample comes from VirusTotal (they were kind enough to give me an academic account):

Malicious PDFs in General

PDFs are organized in a way that makes cross references quite visible. Streams and different types of objects are easily parsed from text and are generally quickly recognizable when you know what you’re looking for.

Good objects to look out for in malicious PDFs are OpenActions, JavaScript, Automatic Actions, Embedded Files and Embedded Flash. You can open PDFs in a text editor to see objects, but I'm a fan of Didier Stevens' PDF Tools (which, fortunately, come preinstalled on the FLARE VM I use).
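Didier Stevens' tools do this properly, but the core idea is simple enough to sketch: scan the raw bytes for PDF names such as /JavaScript or /OpenAction and count them. The following is a rough stand-in, not his tool. The keyword list is a subset I picked, and the sketch won't see names that are hex-escaped (like /J#61vaScript) or hidden inside compressed streams:

```python
import re
import sys
from collections import Counter

# A handful of names worth flagging (my own subset, not pdfid's full list).
SUSPICIOUS = [b"/JS", b"/JavaScript", b"/OpenAction", b"/AA", b"/Launch",
              b"/EmbeddedFile", b"/RichMedia"]

# A PDF name ends at whitespace, a delimiter character, or end of input.
NAME_END = rb"(?![^\s()<>\[\]{}/%])"


def count_names(data: bytes) -> Counter:
    """Count literal occurrences of each suspicious name in raw PDF bytes."""
    return Counter({
        name.decode(): len(re.findall(re.escape(name) + NAME_END, data))
        for name in SUSPICIOUS
    })


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for name, n in count_names(f.read()).items():
            print(f"{name:<15}{n}")
```

The lookahead keeps /JS from also counting longer names that merely start with those letters.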

Diving In

The first tool I ran was pdfid, which parses the names of known PDF objects to give an overview of a PDF’s contents:

As we can see, this file includes several JavaScript objects, an embedded file, and an OpenAction, which definitely warrant further investigation. To look at individual streams of interest, I used pdfstreamdumper, a tool from Sandsprite.

Only the bottom window is really relevant here; the top window is just gibberish that gets displayed when the PDF loads.

The object in the main window may be nonsensical, but I used a cool feature of the tool to search for all of the JavaScript objects and see them at a glance (visible in the bottom window of the tool). There aren't too many objects to look through in this case, but it's good to think about scenarios with tons of objects and how one would search through them efficiently.

The object I’m most interested in at this point is the one with the OpenAction which also seems to contain a function, although the second object with the embedded file definitely seems relevant. So, let’s take a look:

The OpenAction object and its encapsulated function.

This OpenAction may look a little weird, but it's barely enough obfuscation to fool even an automated system. The things to take notice of are the keys, like ['cName'] and ['nLaunch'], which are standard parameters you can look up. In this case, the big picture is that the variable hadapet is used to open a file called 'downl.SettingContent-ms' with the 'exportDataObject' function. cName refers to the filename, and nLaunch refers to the way the file is exported/opened.

Now, where can we find the opened file, downl.SettingContent-ms? To do that, we merely need to go up to the 2nd object.

Object 2 doesn't seem to contain much, but it points us in the right direction to find the file that gets launched. Object 2 is a file specification that points to Object 1, which you can see from the line “/F 1 0 R/UF 1 0 R” (“1 0 R” is an indirect reference to object number 1, generation 0). We can see that Object 1 is described as being the file we are looking for, downl.SettingContent-ms. So let's focus on that object, the embedded file which is the meat of the exploit:
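Following indirect references like 1 0 R by hand gets tedious in bigger files. Here's a rough Python sketch (my own toy, not part of pdfstreamdumper) that pulls "N G obj ... endobj" bodies out of an uncompressed PDF with regular expressions and lists what each object references. It ignores object streams, incremental updates, and "endobj" appearing inside stream data. The sample objects are made up for illustration:

```python
import re

OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)
REF_RE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")


def index_objects(data: bytes) -> dict:
    """Map (object number, generation) -> raw object body."""
    return {(int(n), int(g)): body for n, g, body in OBJ_RE.findall(data)}


def references(body: bytes) -> list:
    """List the (number, generation) pairs an object body points to."""
    return [(int(n), int(g)) for n, g in REF_RE.findall(body)]


# Made-up sample: object 2 is a file specification pointing at embedded file 1.
sample = (b"1 0 obj << /Type /EmbeddedFile /Length 4 >> stream\nXXXX\nendstream endobj "
          b"2 0 obj << /Type /Filespec /F (downl.SettingContent-ms) "
          b"/EF << /F 1 0 R /UF 1 0 R >> >> endobj")
objs = index_objects(sample)
print(references(objs[(2, 0)]))  # [(1, 0), (1, 0)]
```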

Here we have what appears to be an XML-formatted file which holds the downloader function of the malware. Within the DeepLink tag is the main exploit, which uses Windows PowerShell to download an executable from a remote server, then creates a process using that executable. Clearly, remote code execution is enabled by this DeepLink tag, because otherwise you usually wouldn't be able to call PowerShell from inside an XML file. You can read more about the exploit method here.
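To pull the command out of an embedded file like this programmatically, a small ElementTree sketch works. It matches the DeepLink element by its local name, so it doesn't depend on the file's XML namespace. The sample document is a trimmed, made-up stand-in, loosely modeled on the SettingContent-ms layout, with a placeholder namespace and a harmless command. It is not the actual payload:

```python
import xml.etree.ElementTree as ET


def deeplinks(xml_text: str) -> list:
    """Return the text of every <DeepLink> element, ignoring namespaces."""
    root = ET.fromstring(xml_text)
    found = []
    for elem in root.iter():
        # Namespaced tags look like "{uri}DeepLink"; keep only the local name.
        if elem.tag.rsplit("}", 1)[-1] == "DeepLink" and elem.text:
            found.append(elem.text.strip())
    return found


sample = """<PCSettings>
  <SearchableContent xmlns="urn:example:placeholder">
    <ApplicationInformation>
      <DeepLink>%windir%\\system32\\cmd.exe /c echo hello</DeepLink>
    </ApplicationInformation>
  </SearchableContent>
</PCSettings>"""

print(deeplinks(sample))
```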

Detection Rates:

Fortunately, this PDF is now well detected by antivirus engines on VirusTotal and has an incredibly low community score. However, on reverse.it, the detection rate appeared to be only about 5%, with 3/57 antivirus engines flagging the file. I wanted to see what reverse.it's behavioral analysis was flagging: the embedded file, the plaintext IP, and a WMIC reference were listed as indicators, but I didn't see much on the DeepLink tag or the use of PowerShell.

IOCs:

Command & Control/URL: hxxp(:)//169.239.128.164

MD5: 6354A39C95A58B85505E6C8152443100

Strings: DeepLink, Powershell, .exe
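To check whether a sample you've pulled matches the MD5 IOC above, hashing it locally is quick. Here's a minimal sketch using Python's hashlib. The helper names are my own:

```python
import hashlib

IOC_MD5 = "6354A39C95A58B85505E6C8152443100"  # from the IOC list above


def md5_hex(chunks) -> str:
    """MD5 over an iterable of byte chunks, so large files needn't fit in memory."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def file_matches_ioc(path: str, chunk_size: int = 1 << 20) -> bool:
    with open(path, "rb") as f:
        digest = md5_hex(iter(lambda: f.read(chunk_size), b""))
    # hexdigest() is lowercase while the IOC is written in uppercase.
    return digest.lower() == IOC_MD5.lower()
```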

Next Time

I’ve also been working on some Windows PE malware and will make another post for that soon. I’ll be putting a lot of time into Practical Malware Analysis, now that I’m done with technical interviews for the time being. Stay tuned and thanks for reading.

Dionaea (Honeypot) Update

After spending many hours on my old and slow iPod trying to install the nepenthes honeypot program through a terminal emulator, I realized that it was a terrible idea and moved on. I ended up installing Dionaea on my Raspberry Pi instead, using a client-server deployment system called Modern Honey Network (MHN). If you plan to follow the Raspberry Pi deployment guide, I have tips at the end.

With this method, a sensor like my Raspberry Pi reports attacks and submits payloads to a central server. I decided to just keep a VM running on my desktop to be the server. I had to troubleshoot network problems and debug conflicts between services already running on my VM and the MHN server program, but in the end it was worth it:

Here we have the first 2 attacks on my honeypot (1/min so far).

MHN’s guide is extremely helpful and seems very straightforward, but pay close attention to the deployment script for Dionaea on the Raspberry Pi. I searched for hours to figure out why my install wasn’t completing; it turns out one of the main problems is that the RPi deployment script downloads an old version of openssl that doesn’t exist in the repositories anymore. I had to go 4 updates up to find a version of the library that worked. I might need to contact the developers about that… (Update: there was another bug with one of the files being out of date so I had to reinstall the honeymap module. Details at https://github.com/threatstream/mhn/issues/619.)

In other news, I'm going through some interesting technical interviews that I'll be taking a pit stop to prepare for. I'll be going through Microcorruption because I think I'll have to be able to hack an embedded device. If I do write-ups for Microcorruption, I'll definitely include a spoiler alert.

0x00637961

Short Post: Weeks in Review

Hey guys,

Just wanted to do a short post to update you all on things. The past 2 weeks have been really eventful. On January 15th, I went to the MIT AI (Artificial Intelligence) Policy Conference to learn about how AI and Machine Learning are currently being used in research applications. I also got a chance to see how policymakers and the media perceive AI and its potential. The conference covered applications in everything from healthcare and privacy to transportation and national security. I'd like to say I was surprised by the lines of discussion, but it's clear that technology drastically outpaces our ability to legislate it and to understand its legal implications. As one gentleman said, “if the research community doesn't define AI [and its practical consequences], lawyers will.”

This past week I also got the opportunity to attend the Cybersecurity Insight event, hosted by MIT Sloan in collaboration with Kaspersky Labs. I was really excited to see the presentations, as they were talking about Critical Infrastructure Security, which is a big interest of mine. Unfortunately I had to work, so I missed the information-based presentation, but I got off just in time to attend their CTF! The challenges were really fun: I learned more about exiftool and image metadata, and I got to show off my knowledge of memory forensics. It just so happened that their challenge was kind of similar to the one from the last blog post :). I received an archive with a strange file (my Mac identified it as a MacOS binary, which was dubious). Since the file was a gigabyte I decided against reversing it and put it right into Volatility with the imageinfo plugin. When it turned out to be a Windows 7 memory image, I was off to the races.

The challenge was to first find the suspicious process that had been injected and look through its file handles (with the handles plugin) to find the file it had written to the Desktop, which contained the first half of the flag. This half-flag was encoded in Base64, which would be key to recognizing the second half of the flag.

The second challenge was to dump that malicious process from the image and do a little reversing. The executable was compiled in .NET, so decompilers were readily available, if not easy to install on my Mac. With the code decompiled, one could see that the malware iterated through the registry to find a given key. By using the hivelist plugin from Volatility, you could find several suspicious subkeys (Flag and Notflag, for example), but only one subkey appeared to be in Base64-encoded format. After combining the two halves of the flag in a Base64 decoder, the flag was revealed! That was just one of the challenges available, but definitely my favorite. It was a really fun event overall and I'm glad I went.
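The last step is easy to trip over. If the two halves are pieces of a single Base64 string (my assumption about how a split flag like this usually works), they have to be joined before decoding, because a fragment whose length isn't a multiple of four won't decode on its own. Here's a minimal sketch with a made-up flag:

```python
import base64
import binascii


def combine_halves(first: str, second: str) -> bytes:
    """Join two Base64 fragments and decode them as one string."""
    return base64.b64decode(first + second)


flag = b"flag{made_up_example}"
encoded = base64.b64encode(flag).decode()
first, second = encoded[:10], encoded[10:]   # split at an arbitrary point

print(combine_halves(first, second))         # b'flag{made_up_example}'

try:
    base64.b64decode(first)                  # 10 characters: not valid on its own
except binascii.Error as err:
    print("first half alone fails:", err)
```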

RingZer0 CTF Malware Analysis: Capture 2

Welcome back. I recently found the RingZer0 CTF website while looking for some malware analysis/RE challenges. CTF-style malware analysis challenges can be harder to find online; I’d definitely like to see a Vulnhub for compromised machines, where the challenge is to recreate the infection timeline, but for now I’ll settle.

Capture 2 seems like an interesting challenge because the given file is a memory image. I'll move it over to my RemSift machine (Remnux and SIFT installed on Ubuntu) and hopefully expand my memory forensics knowledge.

I ran volatility’s `imageinfo` plugin on the image to identify the OS and version with a search of the KDBG structures.

It appears to match the profile for Windows XP Service Pack 2. This is good: if I have to pull malware from this image and analyze it (which is likely), I'm much more comfortable with Windows libraries.

Question 1: What is the CVE of the exploited vulnerability?

Well, that's a tough question to begin with. CVEs are very specific identifiers for exploitable vulnerabilities, and there are thousands of them. If I'm lucky, I can look in the command history for the memory image using Volatility's cmdscan plugin, and maybe the attacker will have used a Metasploit module with a CVE I can look up.

Except I thought cmdscan only worked for Windows 7 and above (Volatility's documentation actually says it also covers XP by scanning csrss.exe). Let's try the consoles plugin instead.

Well unfortunately, the plugin didn’t give me a command history as it might on a Windows 7 machine. We have a process ID and we could probably get some strings out of memory, but I feel like this might be a dead end. Maybe we can work backwards from more evidence to get the CVE, so I’ll move on.

Question 2: Process Name and PID of the Exploited Process

Okay, I might know how to do this. A process that has been exploited by malware should show signs of compromise, including the loading of malicious libraries or remapping of memory addresses. One of the easiest ways to use a process to call malicious code is writing malware to a place where it can be executed in memory. The first way to look for evidence of this in memory is looking at the process maps for containers that have the permissions Read, Write, and Execute set. This information can be found in a process’s VAD, or Virtual Address Descriptor.

Many processes may have memory containers with RWX set, so searching through all the VADs in memory could be tedious. Fortunately, there is a Volatility plugin that searches for VADs with RWX set on memory containers; it’s called malfind.

A process with the VAD protection RWX; in this case, malicious.

As you can see, malfind is extremely helpful. It displays the process name, process ID number and the address in question, in case we want to dump the memory at this location for further analysis.

Fortunately, it also displays the beginning of the data at that location in hexadecimal and ASCII, and you can see the ‘MZ’ corresponding to the hexadecimal bytes 4d 5a. MZ is the magic number at the start of a Windows executable, which means the data at this location could be a malicious executable injected into svchost.exe. Plus, svchost is a common target for hijacking and injection because there are usually several legitimate instances running on a given machine. I submitted this as the answer, and jackpot! Let's move on:
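An MZ at the start of a region is a strong hint, but it's cheap to check a bit further. A PE file also stores, at offset 0x3C (e_lfanew), the offset of a "PE\0\0" signature. Here's a quick Python sketch of that check, which you could run over a region dumped from the address malfind reports. The dumping step isn't shown, and the sample bytes are synthetic. Injected code often has its header wiped, so a failed check doesn't clear a region:

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """True if data starts with MZ and e_lfanew points at a PE signature."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)  # little-endian DWORD
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"


# Synthetic example: a DOS header whose e_lfanew points at a PE signature at 0x80.
header = bytearray(0x100)
header[:2] = b"MZ"
struct.pack_into("<I", header, 0x3C, 0x80)
header[0x80:0x84] = b"PE\x00\x00"

print(looks_like_pe(bytes(header)))            # True
print(looks_like_pe(b"MZ" + b"\x00" * 0x3E))   # False: e_lfanew is 0, no PE signature
```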

Question 3: Connect back IP and port?

This question is likely asking about the network activity of the compromised machine connecting back to the attacker. More than likely, a backdoor of some kind was used; perhaps a reverse shell. Let’s run the memory image through the gauntlet of volatility’s network plugins, starting with connections and connscan:

`connections` didn’t display any results, but `connscan` pulled through. This is likely because the connections listed by `connscan` were terminated by the time the memory image was acquired.

As you can see, there were several remote connections on different ports. However, only one of them matches the Process ID of the compromised svchost.exe process, which is 1092. The infected machine is connecting back to 10.0.75.16 (which looks like a computer on the same internal subnet) on port 21. Port 21 is normally used by FTP, the File Transfer Protocol, so an svchost.exe process talking on it is another sign this may be our reverse shell to the attacker.

And we were correct. Moving on…

Question 4: What is the Victim’s User Password?

The first thing that comes to mind when thinking about extracting passwords from memory is a post-exploitation tool called Mimikatz. Used offensively, it exploits the lsass.exe process with malicious code and reads passwords from memory structures. It was recently adapted into a Volatility plugin for use on offline memory dumps, which will be helpful to us here. Let’s run it.

Running the mimikatz plugin on the memory image.

No dice on that. Well, maybe it's an issue with my RemSift installation. I pointed the mimikatz plugin at the whole memory image, but there's another approach where you dump the memory of the lsass process and point mimikatz at it instead. Unfortunately, I tried this and found that volatility doesn't have support for minidumps (the format of the process memory dump). This means I'll need to take it to a native installation of Mimikatz, on Windows.

Using Mimikatz against the minidump on my FLARE (Windows 7) VM.

No luck there either. Maybe we can try a different approach to recovering the passwords, although I can’t imagine why Mimikatz would be failing. Let’s look for password hashes in the registry hives.

I'll use the Volatility plugin hivelist to find the memory addresses of the SYSTEM and SAM hives: the SAM hive holds the password hashes we want, and the SYSTEM hive holds the boot key needed to decrypt them. After that, a plugin called hashdump uses those two addresses to extract the hashes. Let's try that strategy:

Using Volatility’s `hashdump`, passing in the offsets of the SYSTEM and SAM hives.

Okay, good signs: we see the user and hash for ‘victim’, the account we need. Let’s see if the hash is crackable.
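hashdump prints pwdump-style lines of the form user:RID:LM hash:NT hash:::, and the NT hash (the MD4 of the UTF-16LE password) is the part worth cracking. Here's a tiny parser sketch. The example line uses the well-known hashes of an empty password rather than the real victim hash, and the account is illustrative:

```python
from typing import NamedTuple


class HashEntry(NamedTuple):
    user: str
    rid: int
    lm: str
    nt: str


EMPTY_LM = "aad3b435b51404eeaad3b435b51404ee"  # LM hash of an empty password
EMPTY_NT = "31d6cfe0d16ae931b73c59d7e0c089c0"  # NT hash of an empty password


def parse_hashdump_line(line: str) -> HashEntry:
    """Split a user:RID:LM:NT::: line; hashes are normalized to lowercase."""
    user, rid, lm, nt = line.strip().split(":")[:4]
    return HashEntry(user, int(rid), lm.lower(), nt.lower())


entry = parse_hashdump_line(f"Guest:501:{EMPTY_LM}:{EMPTY_NT}:::")
print(entry.user, entry.nt == EMPTY_NT)   # Guest True
```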

I used hashkiller.com, an automated hash-cracking website. Some of you may use CrackStation, but I'm glad I tried another site, as CrackStation didn't work for me.

And it worked! The cracked password was correct.

Well, I think that’s enough for now; I’ll be attempting more challenges like these in the next CTF-related blog post. Thanks for reading!

Practical Malware Analysis Chapter 3

This week I'm getting back to Practical Malware Analysis after looking into some honeypot options. I'll come back to those later; for now I need to get back on the grind.

Chapter 3 of PMA (as I’ll refer to it) is a dynamic analysis refresher, helping aspiring analysts develop a workflow for finding those host-based and network indicators. I won’t repeat all of their write-ups, which are quite detailed, but I will outline my dynamic analysis process and explain why I picked that order. But first:

My Lab

I'm using Oracle's VirtualBox (yeah, I know) with a host-only adapter for my analysis. Currently I'm working with a Windows XP machine as my analysis machine, and a Remnux machine for network forensics.

In order to simulate network traffic for malware I set up the Remnux box as the DNS server for the Windows box, and of course they are on the same subnet so they can communicate. PMA recommends using ApateDNS, but I prefer just going through Control Panel and making it a lasting change. Besides, it’ll just be one less program to open later on a crowded Dynamic Analysis screen.

Changing the DNS server through the Control Panel.

The final important thing to do before analyzing any sample is to take a snapshot, saving the state of the virtual machine (VM). But now to the meat of the matter:

Dynamic Analysis Workflow

1. Start Process Explorer and Process Hacker
2. Start Netcat listeners (ports 80, 443)
3. Start Process Monitor (Procmon)
4. Take the 1st registry snapshot (Regshot)
5. Start Inetsim and Wireshark
6. Run the malware
7. Analyze Process Explorer and Process Hacker
8. Wait 5 minutes if that much time has not already elapsed
9. Take the 2nd Regshot
10. End Procmon
11. Analyze Wireshark, Netcat, Inetsim, Procmon, and Regshot output
12. Revert to the snapshot

Explanation

Process Hacker and Process Explorer are very useful for runtime analysis. They don't generate tons of logs like Procmon, so it's fine to run them first. I start Procmon after that because its filtering capabilities can eliminate the noise from programs started later. Regshot, however, has fewer ways to deal with noise, so I prefer to do as few operations between Regshots as possible.

I start Inetsim and Wireshark right before executing the malware to avoid noise from the Windows box looking for network shares, requesting updates, or using NetBIOS.

I prefer not to end Procmon or Wireshark captures until sufficient time has passed. For example, the Lab 3-2 malware waited a minute before executing.

Things I Learned

One tip from PMA that was especially helpful concerned the capabilities of Process Explorer. During Lab 3-2, you use rundll32.exe to execute the malware, and eventually an svchost.exe is spawned that uses that DLL. But as many ~~geeks~~ people know, there are often many svchost processes running simultaneously. Of course, there are many ways to narrow down which process used the DLL (my first instinct was to check the properties of each and search through the handles), but few are as quick as:
- Process Explorer: Find > Find Handle or DLL

Well, that’s it for the first post! Feel free to leave me some feedback and I’ll post an update when I finish Chapter 4 (or I’ll get sidetracked with some CTF problem).