Introduction
Extracting specific content from a file can become complicated, especially when the desired data starts at an irregular byte offset. While the dd command is a versatile tool for such operations, it can be inefficient if not used correctly. In this tutorial, we’ll walk through a step-by-step process for extracting content from a file at non-standard byte offsets in a way that minimizes time and resource usage. We’ll use the VirtualBox guest ISO as our example file and demonstrate how to extract a tar archive embedded within it.
Tutorial Start
In this tutorial, we will explore an example using the VBoxLinuxAdditions.run file located on the VirtualBox guest ISO.
First, let’s examine the file using binwalk:
binwalk /run/media/VBoxLinuxAdditions.run
The output may look like the following:
DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
0             0x0             Executable script, shebang: "/bin/sh"
...
18888         0x49C8          POSIX tar archive (GNU), owner user name: "estAdditions-amd64.tar.bz2"
We’re interested in extracting the tar file, which starts at byte 18888.
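If you want to pick the offset out of the binwalk listing in a script rather than by eye, something like the following sketch could work. It runs on a copy of the sample output shown above instead of a live binwalk call, and the variable names (binwalk_output, offset) are only illustrative.

```shell
# Sample binwalk output copied from the listing above (a stand-in for a live run).
binwalk_output='DECIMAL HEXADECIMAL DESCRIPTION
0 0x0 Executable script, shebang: "/bin/sh"
18888 0x49C8 POSIX tar archive (GNU), owner user name: "estAdditions-amd64.tar.bz2"'

# Print the first column of the first line that describes a tar archive.
offset=$(printf '%s\n' "$binwalk_output" | awk '/POSIX tar archive/ { print $1; exit }')

echo "tar archive starts at byte $offset"
```

With a real file, the same awk filter could be fed from binwalk directly through a pipe.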
Using dd with Single-Byte Blocks
One way to achieve this is by using the dd command like so:
dd if=/run/media/VBoxLinuxAdditions.run bs=1 skip=18888 of=some.tar.bz2
This command skips 18888 bytes and copies the rest of the file into some.tar.bz2. However, this method is slow: with a block size of 1 byte, dd issues a separate read and write for every byte it copies.
Using dd with a Larger Block Size
You can speed up the process by setting a larger block size. For example:
dd if=/run/media/VBoxLinuxAdditions.run bs=18888 skip=1 of=some2.tar.bz2
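In this case, we’ve set the block size to 18888 bytes, so skip=1 skips exactly one block and lands on the start of the archive. The rest of the file is then copied in 18888-byte blocks, making the operation much faster.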
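To convince yourself that both invocations produce the same result, you can try them on a small stand-in file. The sketch below builds a hypothetical sample.bin (18888 filler bytes followed by a short marker string) in a temporary directory; the file names and the marker are made up for the demonstration.

```shell
workdir=$(mktemp -d)

# Stand-in input: 18888 filler bytes, then a known payload.
dd if=/dev/zero bs=18888 count=1 of="$workdir/sample.bin" 2>/dev/null
printf 'PAYLOAD-STARTS-HERE\n' >> "$workdir/sample.bin"

# Single-byte blocks: every byte is read and written individually.
dd if="$workdir/sample.bin" bs=1 skip=18888 of="$workdir/slow.out" 2>/dev/null

# One 18888-byte block skipped, the rest copied in large blocks.
dd if="$workdir/sample.bin" bs=18888 skip=1 of="$workdir/fast.out" 2>/dev/null

# Compare the two outputs byte for byte.
if cmp -s "$workdir/slow.out" "$workdir/fast.out"; then same=yes; else same=no; fi
fast_content=$(cat "$workdir/fast.out")

echo "outputs identical: $same"
echo "extracted: $fast_content"

rm -rf "$workdir"
```

On a file this small the speed difference is not noticeable; it only becomes visible with inputs of many megabytes.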
When the Target Offset is Not Divisible by the Block Size
What if the desired file starts at an offset that is not divisible by a convenient block size? You can use multiple dd commands in a sequence.
Calculations First
First, find a common block size smaller than the target offset. In this example, we can use 512 or 1024 bytes; we’ll go with 512 for demonstration purposes.
We can use bc to run quick calculations in Linux. Setting scale=5 makes the calculator show up to 5 decimal places; without it, the result is truncated to an integer and could be mistaken for an exact quotient when it is in fact a fraction. To calculate the number of blocks to skip, divide 18888 by 512:
$ bc
scale=5
18888/512
36.89062 # this means we need 36 blocks of 512 bytes
512*36
18432 # this is how far this next dd will skip
18888-18432
456 # that means that our tar file will begin after 456 bytes
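The same numbers can also be worked out with the shell's built-in integer arithmetic, which drops the fractional part for us. This is only a sketch; the variable names are illustrative, and the values are the ones from the bc session above.

```shell
offset=18888      # byte offset reported by binwalk
block_size=512    # block size chosen for the first dd pass

blocks=$((offset / block_size))      # whole blocks to skip (integer division)
skipped=$((blocks * block_size))     # bytes covered by those blocks
remainder=$((offset - skipped))      # bytes still to skip in a second pass

echo "skip=$blocks covers $skipped bytes, $remainder bytes remain"
# prints: skip=36 covers 18432 bytes, 456 bytes remain
```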
We need to skip 36 blocks of 512 bytes to get close to our target offset of 18888 bytes:
dd if=/run/media/VBoxLinuxAdditions.run bs=512 skip=36 of=step1.out
This command is much quicker than using a single-byte block size. However, the output is not yet a proper tar file: it still begins with the 456 bytes that precede the archive.
Fine-Tuning with a Second dd
After examining step1.out with binwalk, we find that the tar file starts at an additional offset of 456 bytes, matching the remainder calculated earlier:
binwalk step1.out
We can then skip these additional bytes with a second dd command:
dd if=step1.out bs=456 skip=1 of=Vbox.tar.bz2
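The two passes can be rehearsed end to end on a small stand-in file before running them on the real one. The sketch below assumes a hypothetical sample.bin with 18888 filler bytes followed by a marker string; only the offsets (36 blocks of 512 bytes, then 456 bytes) come from the tutorial.

```shell
workdir=$(mktemp -d)

# Stand-in input: 18888 filler bytes, then a known payload.
dd if=/dev/zero bs=18888 count=1 of="$workdir/sample.bin" 2>/dev/null
printf 'PAYLOAD-STARTS-HERE\n' >> "$workdir/sample.bin"

# Pass 1: skip 36 blocks of 512 bytes (18432 bytes in total).
dd if="$workdir/sample.bin" bs=512 skip=36 of="$workdir/step1.out" 2>/dev/null

# Pass 2: skip the remaining 456 bytes as a single block.
dd if="$workdir/step1.out" bs=456 skip=1 of="$workdir/payload.out" 2>/dev/null

payload=$(cat "$workdir/payload.out")
echo "$payload"

rm -rf "$workdir"
```

If the offsets are right, the final output contains the marker string and nothing before it.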
Verifying the Extracted Content
Finally, let’s verify that the extracted file is a valid tar archive:
tar tf Vbox.tar.bz2
The output should list the files contained within the tar archive, confirming that the extraction was successful.
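As an additional sanity check, tar archives in the ustar-based formats carry the magic string "ustar" at byte offset 257 of the first header, which is what tools like binwalk key on. The sketch below creates a small throwaway archive (demo.tar, a hypothetical name) and reads that magic with dd; it assumes your tar writes a ustar-style header, which is the default for common implementations.

```shell
workdir=$(mktemp -d)

# Create a small stand-in tar archive containing one file.
printf 'hello\n' > "$workdir/hello.txt"
tar -cf "$workdir/demo.tar" -C "$workdir" hello.txt

# Read the 5-byte magic field at offset 257 of the first header.
magic=$(dd if="$workdir/demo.tar" bs=1 skip=257 count=5 2>/dev/null)

# List the archive contents, as in the verification step above.
listing=$(tar -tf "$workdir/demo.tar")

echo "magic: $magic"
echo "contents: $listing"

rm -rf "$workdir"
```

Here a one-byte block size is harmless because only five bytes are read.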
Conclusion
While this method takes a little arithmetic, it lets you extract content that starts at a non-standard offset using two quick dd passes, and it is much faster than using a single-byte block size with dd.