Shattered Tablet and The Basics of Ghidra Scripting
Why I started this.
Recently I was doing one of the reverse engineering challenges on HackTheBox (not a paid promotion; HTB, if you're reading this, we can fix that). I could have solved the challenge by doing some work on a piece of paper, but instead decided to turn it into an over-engineered learning opportunity on scripting with Ghidra.
The challenge
What started all of this is Shattered Tablet, a challenge whose description states: "Deep in an ancient tomb, you've discovered a stone tablet with secret information on the locations of other relics. However, while dodging a poison dart, it slipped from your hands and shattered into hundreds of pieces. Can you reassemble it and read the clues?"
After analyzing the binary, the decompiled main function looks like this:
undefined8
As we can see, an array with a length of 64 (0x40) bytes is created. fgets is then called with a size of 0x40 to read from stdin into the local_48 array. Then, a complicated if statement checks the input stored in local_48 byte by byte, with the indices in no particular order.
One could solve this by hand by finding the check for the first index within the if, local_48[0] == 'H', and jotting that down on paper, then going to the next index, local_48[1] == 'B', and repeating this process until we have the flag.
But that's boring and doesn't make you better. So let's do something unnecessary and infinitely harder.
Java vs. Python
Ghidra has two language options for interacting with the API: Java, which is the language Ghidra was written in, and Python (kinda). Selecting Java as my scripting language for this project really isn't about "why Java," but "why not Python." Python holds a special place in the hearts of many programmers, hackers, and reverse engineers. It's approachable, it's often the first language people learn, and there are a gorillian libraries out there that usually just take care of the thing you want to do. That being said, readers should know that I'm the biggest hater of Python. If Python has no haters, then I am dead. There are many reasons for this, but I'll keep them in the scope of Ghidra to limit the amount of ire invoked from Python fanatics.
- At the time of writing, the official API doesn't support Python 3. The bundled interpreter is Jython, which implements Python 2.7, and Python 2 support ended in 2020.
- It's slower.
- The API doesn't make sense in a pythonic way. Consider the code:
=
=
=
Instantiating an object and using methods to accomplish all the work is 90% of Java.
Essentially you're taking a performance hit and risking a failure somewhere in the Python 2 -> Jython -> Java pipeline that would be impossible to fix, all just to write almost-Java anyway.
Setup
First, we'll set up the Ghidra/Eclipse integration so we can use the LSP features. Be warned: if you're using Arch Linux or another distro that uses the AUR, you'll be missing some necessary files and will need to install Ghidra from the [official repository](https://github.com/NationalSecurityAgency/ghidra).
Start by ensuring that you're using the appropriate Java version for GhidraDev. You can find that in <path_to_ghidra>/Extensions/Eclipse/GhidraDev/GhidraDev_README.html. As of 4.0.0, GhidraDev requires JDK 21. If you skip this and run something like JDK 17, there won't be any obvious errors, just things silently breaking on the back end.
Next, in Eclipse, install the GhidraDev plugin.
In the "Help" menu, click "Install New Software".
Then click "Add..." and "Archive...".
Finally, browse to the GhidraDev-*.zip file.
After a few checks on trust and a prompted restart of Eclipse, we should be ready to get to work.
In Ghidra, click "Window" and open "Script Manager".
As you can see here, Ghidra comes with a few premade scripts, but none that fit our current predicament. In the top right of the window, click "Create Script" and select "Java".
Next, it'll ask where you want to save the script and what filename to use. I'll name it SortVar48.
Finally, to see if the integration has worked, we right-click our new script and select "Edit with Eclipse."
If everything works, we should see that the LSP can pull Ghidra-specific information to give us nice IDE features. As a side note, I'm using Hyprland, which both Ghidra and Eclipse seem to hate, so my colors or fonts might look very different from yours.
Do it.
With everything set up, we can get to work. There are a few metadata tags that improve script management and quality of life.
@author
Pretty self-explanatory. This sets the name of the author for the Script Manager to display.
@category
Determines where the script appears within the category tree.
@keybinding
Sets a keyboard shortcut for the CodeBrowser window to run the script. Something like //@keybinding K
You must be in the Listing window for this to work (as far as I can tell).
@toolbar
Places a button on the toolbar that executes the script. Can be used with a custom image too, like //@toolbar /path/to/image
First, we'll try to decompile the "main" function and print the result into the scripting console.
//@author numonce
//@category CTF
//@keybinding K
;
;
As shown above, we're creating a DecompInterface and hooking it into our current program. We then get the main function by getting a list of all functions that have "main" within their namespace and grabbing the first (and, in our case, only) one. Lastly, we decompile the function and store the result in a variable to be printed.
Going back to Ghidra and hitting 'k', we see the following in our scripting console.
Now that we have the decompiled function in a variable, we can just use regular ol' regex to grab the local_48's within the if's, sort them by the hex within the brackets, and grab the associated character.
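For anyone following along, here is a minimal sketch of that flow. It only runs inside Ghidra, and it is a sketch under assumptions rather than a copy of the original script: the 30-second timeout, the variable names, the error check, and the use of getGlobalFunctions to look up "main" are all placeholders chosen for illustration.

```java
//@author numonce
//@category CTF
//@keybinding K

import ghidra.app.decompiler.DecompInterface;
import ghidra.app.decompiler.DecompileResults;
import ghidra.app.script.GhidraScript;
import ghidra.program.model.listing.Function;

public class SortVar48 extends GhidraScript {
    @Override
    public void run() throws Exception {
        // Create a decompiler instance and attach it to the program open in the CodeBrowser.
        DecompInterface decompiler = new DecompInterface();
        decompiler.openProgram(currentProgram);
        try {
            // Assumption: the binary has exactly one global function named "main".
            Function main = getGlobalFunctions("main").get(0);

            // Assumption: 30 seconds is an arbitrary timeout picked for this sketch.
            DecompileResults results = decompiler.decompileFunction(main, 30, monitor);
            if (!results.decompileCompleted()) {
                println("Decompilation failed: " + results.getErrorMessage());
                return;
            }

            // Keep the C-like text in a variable so later steps can run a regex over it.
            String decompiled = results.getDecompiledFunction().getC();
            println(decompiled);
        } finally {
            // Release the native decompiler process.
            decompiler.dispose();
        }
    }
}
```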
I'll be the first to admit that I'm awful at regex, and Java has its own flavor, so this part took a while. I ended up with the pattern (backslashes doubled for the Java string literal): local_48\\[([0-9a-fA-FxX]+)\\]\\s*==\\s*'(.)'
which matches on the entire string local_48[0x22] == '4', the hex within the brackets, 0x22, and the character at the end, 4, in different groups.
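To see the groups in isolation, here is a small standalone sketch that runs the pattern against one sample comparison outside of Ghidra. The class name Local48Pattern, the helper groups, and the sample line are hypothetical and exist only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Local48Pattern {
    // The pattern from the text, written as a Java string literal (hence the doubled backslashes).
    static final Pattern CHECK =
            Pattern.compile("local_48\\[([0-9a-fA-FxX]+)\\]\\s*==\\s*'(.)'");

    // Returns {whole match, index text, character}, or null when the line has no comparison.
    static String[] groups(String line) {
        Matcher m = CHECK.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // Made-up line shaped like one clause of the decompiled if statement.
        String[] g = groups("(local_48[0x22] == '4') &&");
        System.out.println("group 0: " + g[0]);
        System.out.println("group 1: " + g[1]);
        System.out.println("group 2: " + g[2]);
    }
}
```

Group 0 is always the whole match, so only the two parenthesized parts of the pattern need explicit capture groups.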
<-- SNIP -->
Pattern pattern ;
List matches ;
Matcher matcher ;
while
for
}
}
<-- SNIP -->
First, I created a class Match to hold the data: the index 0x22, the match local_48[0x22] == '4', and the character '4', along with some methods to read the individual fields for debugging. Then, I create the aforementioned pattern and an empty list to hold the Matches, and execute the regex against the decompiled function. In a while loop, I parse out the data and append each Match to the matches list. To check that all of this is working, I print the match and character members of every Match in matches.
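The collection step can be sketched in plain Java, without Ghidra, roughly like this. The Match class here is a hypothetical stand-in with assumed field and getter names, and the sample string is made up rather than taken from the challenge. It also assumes indices show up either as plain decimal or as 0x-prefixed hex, which is what Integer.decode handles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CollectMatches {
    // Hypothetical stand-in for the Match class described above.
    static final class Match {
        private final int index;
        private final String match;
        private final char character;

        Match(int index, String match, char character) {
            this.index = index;
            this.match = match;
            this.character = character;
        }

        int getIndex() { return index; }
        String getMatch() { return match; }
        char getCharacter() { return character; }
    }

    static final Pattern CHECK =
            Pattern.compile("local_48\\[([0-9a-fA-FxX]+)\\]\\s*==\\s*'(.)'");

    // Walks every comparison in the decompiled text and records it as a Match.
    static List<Match> collect(String decompiled) {
        List<Match> matches = new ArrayList<>();
        Matcher matcher = CHECK.matcher(decompiled);
        while (matcher.find()) {
            // Assumption: the index is decimal ("3") or 0x-prefixed hex ("0x22").
            int index = Integer.decode(matcher.group(1));
            matches.add(new Match(index, matcher.group(0), matcher.group(2).charAt(0)));
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up snippet shaped like the decompiler output; not the real challenge data.
        String sample = "if ((local_48[0x22] == '4') && (local_48[3] == '{')) {";
        for (Match m : collect(sample)) {
            System.out.println(m.getMatch() + " -> " + m.getCharacter());
        }
    }
}
```

Parsing the index into an int at this point means the later sort compares numbers rather than strings like "0x22" and "3".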
Finally, I want to take the index member and sort the matches in numerical order. Then, I want to grab all of the characters, concatenate them into a flag variable, and print it.
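One way to sketch the sort-and-join step on its own is with Comparator.comparingInt and a StringBuilder. The Piece class is a hypothetical cut-down pair of index and character, and the scrambled sample data is made up, not the real flag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class AssembleFlag {
    // Hypothetical minimal (index, character) pair; the full Match class carries more.
    static final class Piece {
        final int index;
        final char character;

        Piece(int index, char character) {
            this.index = index;
            this.character = character;
        }
    }

    // Sorts the pieces by index and joins their characters into one string.
    static String assemble(List<Piece> pieces) {
        // Sort a copy so the caller's list keeps its original order.
        List<Piece> sorted = new ArrayList<>(pieces);
        sorted.sort(Comparator.comparingInt((Piece p) -> p.index));

        StringBuilder flag = new StringBuilder();
        for (Piece p : sorted) {
            flag.append(p.character);
        }
        return flag.toString();
    }

    public static void main(String[] args) {
        // Made-up pieces in scrambled order; not the real flag.
        List<Piece> pieces = Arrays.asList(
                new Piece(3, 't'), new Piece(0, 's'), new Piece(5, 'd'),
                new Piece(1, '0'), new Piece(4, '3'), new Piece(2, 'r'));
        System.out.println(assemble(pieces));
    }
}
```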
;
;
;
;
;
;
;
;
Running this script results in this simple output: