File Slicing: Extracting Data from Non-Standard Offsets Efficiently with Binwalk and dd

Introduction

Extracting specific content from a file can become complicated when the desired data starts at an irregular byte offset. The dd command is a versatile tool for this job, but it is slow when used with a tiny block size. In this tutorial, we’ll walk through a step-by-step process for extracting content at non-standard byte offsets while keeping time and resource usage low. Our example file comes from the VirtualBox Guest Additions ISO, and we’ll extract a tar archive embedded within it.

Tutorial Start

Our example is the VBoxLinuxAdditions.run file found on the VirtualBox Guest Additions ISO.

First, let’s examine the file using binwalk:

binwalk /run/media/VBoxLinuxAdditions.run

The output may look like the following:

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
0             0x0             Executable script, shebang: "/bin/sh"
...
18888         0x49C8          POSIX tar archive (GNU), owner user name: "estAdditions-amd64.tar.bz2"

We’re interested in extracting the tar file, which starts at byte 18888.

Using dd with Single-Byte Blocks

One way to achieve this is by using the dd command like so:

dd if=/run/media/VBoxLinuxAdditions.run bs=1 skip=18888 of=some.tar.bz2

This command skips the first 18888 bytes and copies the rest of the file into some.tar.bz2. However, it is slow: with a block size of 1 byte, dd performs a separate read and write for every byte it processes.

Using dd with a Larger Block Size

You can speed up the process by setting a larger block size. For example:

dd if=/run/media/VBoxLinuxAdditions.run bs=18888 skip=1 of=some2.tar.bz2

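Because skip counts blocks of bs bytes rather than individual bytes, skipping one block of 18888 bytes lands exactly on the target offset. The rest of the file is then copied in 18888-byte blocks instead of one byte at a time, making the operation much faster.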
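To see that both forms select the same bytes, here is a minimal sketch on a synthetic file. The file names, the 100-byte header, and the payload text are made up for illustration; only dd, head, printf, cmp, and mktemp are assumed to be available.

```shell
#!/bin/sh
# Build a small synthetic file: a 100-byte header followed by a known payload.
# (Sizes and names are illustrative, not taken from the VirtualBox file.)
tmp=$(mktemp -d)
head -c 100 /dev/zero > "$tmp/sample.bin"
printf 'PAYLOAD-DATA\n' >> "$tmp/sample.bin"

# Slow form: skip 100 one-byte blocks, then copy the rest byte by byte.
dd if="$tmp/sample.bin" bs=1 skip=100 of="$tmp/slow.out" 2>/dev/null

# Fast form: skip one 100-byte block, then copy the rest in 100-byte blocks.
dd if="$tmp/sample.bin" bs=100 skip=1 of="$tmp/fast.out" 2>/dev/null

# cmp -s exits with status 0 only if the files are byte-for-byte identical.
if cmp -s "$tmp/slow.out" "$tmp/fast.out"; then
    status="outputs match"
else
    status="outputs differ"
fi
payload=$(cat "$tmp/fast.out")
echo "$status"
echo "$payload"

rm -rf "$tmp"
```

Both commands should produce the same output file; the only difference is how many read and write calls dd needs to get there.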

When the Target Offset is Not Divisible by the Block Size

What if the desired file starts at an offset that is not divisible by a convenient block size? You can use multiple dd commands in a sequence.

Calculations First

First, pick a conventional block size smaller than the target offset. In this example, we can use 512 or 1024 bytes; we’ll go with 512 for demonstration purposes.

We can use bc to run quick calculations on Linux. Setting scale=5 makes bc show up to 5 decimal places; without it, bc truncates the division to an integer, which could be mistaken for an exact result when the true quotient has a fractional part. To calculate the number of blocks to skip, divide 18888 by 512:

$ bc
scale=5    
18888/512
36.89062      # this means we need 36 blocks of 512 bytes

512*36
18432         # this is how far this next dd will skip
18888-18432   
456           # that means that our tar file will begin after 456 bytes
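The same numbers can also be computed with the shell’s built-in integer arithmetic, which may be handy inside a script. This is a small sketch; the variable names are arbitrary.

```shell
#!/bin/sh
offset=18888      # start of the tar archive, as reported by binwalk
bs=512            # block size chosen for the first dd

blocks=$(( offset / bs ))       # whole blocks to skip (integer division)
remainder=$(( offset % bs ))    # bytes still to skip after those blocks

echo "skip=$blocks blocks of $bs bytes ($(( blocks * bs )) bytes)"
echo "remaining bytes to skip: $remainder"
```

Integer division gives the block count directly, and the modulo operator gives the leftover bytes, so no decimal places are involved.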

We need to skip 36 blocks of 512 bytes to get close to our target offset of 18888 bytes:

dd if=/run/media/VBoxLinuxAdditions.run bs=512 skip=36 of=step1.out

This command is much quicker than using a single-byte block size. However, the output is not yet a proper tar file.

Fine-Tuning with a Second dd

Examining step1.out with binwalk confirms that the tar file now starts at an offset of 456 bytes, matching the remainder calculated earlier:

binwalk step1.out

We can then skip these additional bytes with a second dd command:

dd if=step1.out bs=456 skip=1 of=Vbox.tar.bz2
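The two passes can be wrapped in a small helper. The sketch below is one possible way to do it: slice_from is a hypothetical function name, not part of dd or binwalk, and the demo runs on a synthetic file rather than the real VBoxLinuxAdditions.run.

```shell
#!/bin/sh
# slice_from FILE OFFSET BS OUT  (hypothetical helper for illustration)
# Copies FILE from byte OFFSET to the end into OUT using the two-pass approach.
slice_from() {
    file=$1 offset=$2 bs=$3 out=$4
    blocks=$(( offset / bs ))
    rest=$(( offset % bs ))
    if [ "$rest" -eq 0 ]; then
        # Offset is a multiple of the block size: one pass is enough.
        dd if="$file" bs="$bs" skip="$blocks" of="$out" 2>/dev/null
    else
        step=$(mktemp)
        # Pass 1: skip the whole blocks.
        dd if="$file" bs="$bs" skip="$blocks" of="$step" 2>/dev/null
        # Pass 2: skip the leftover bytes as a single block.
        dd if="$step" bs="$rest" skip=1 of="$out" 2>/dev/null
        rm -f "$step"
    fi
}

# Demo: 18888 filler bytes followed by a marker line.
tmp=$(mktemp -d)
head -c 18888 /dev/zero > "$tmp/sample.bin"
printf 'TAR-WOULD-START-HERE\n' >> "$tmp/sample.bin"
slice_from "$tmp/sample.bin" 18888 512 "$tmp/slice.out"
result=$(cat "$tmp/slice.out")
echo "$result"
rm -rf "$tmp"
```

The zero-remainder branch matters because dd rejects a block size of 0, so the second pass is only run when there are leftover bytes to skip.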

Verifying the Extracted Content

Finally, let’s verify that the extracted file is a legitimate tar file:

tar tf Vbox.tar.bz2

The output should list the files contained within the tar archive, confirming that the extraction was successful.
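For a check that does not depend on having the VirtualBox file at hand, the whole procedure can be rehearsed on a synthetic container. In this sketch the directory and file names (payload, inner.tar, container.bin) are invented, and the 18888-byte filler only imitates the installer script that precedes the archive.

```shell
#!/bin/sh
tmp=$(mktemp -d)

# Build a small tar archive with one known member.
mkdir "$tmp/payload"
echo "hello" > "$tmp/payload/readme.txt"
tar cf "$tmp/inner.tar" -C "$tmp" payload

# Prepend 18888 filler bytes so the archive starts at the same offset as in the tutorial.
head -c 18888 /dev/zero > "$tmp/container.bin"
cat "$tmp/inner.tar" >> "$tmp/container.bin"

# Two-pass extraction: 36 blocks of 512 bytes, then one block of 456 bytes.
dd if="$tmp/container.bin" bs=512 skip=36 of="$tmp/step1.out" 2>/dev/null
dd if="$tmp/step1.out" bs=456 skip=1 of="$tmp/extracted.tar" 2>/dev/null

# List the members of the extracted archive.
listing=$(tar tf "$tmp/extracted.tar")
echo "$listing"

# The extracted archive should be byte-identical to the one that was embedded.
if cmp -s "$tmp/inner.tar" "$tmp/extracted.tar"; then
    verdict="identical"
else
    verdict="different"
fi
echo "$verdict"

rm -rf "$tmp"
```

Comparing against the original archive with cmp is a stricter check than listing the contents, since it also catches stray bytes at the start or end of the extracted file.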

Conclusion

While this method may appear complex, it’s a powerful way to extract content from files starting at non-standard offsets. It also offers a faster alternative to using a single-byte block size with dd.
