Uncovering GStreamer secrets

In this blog post, I’ll show the results of my recent security research on GStreamer, the open source multimedia framework at the core of GNOME’s multimedia functionality.

I’ll also go through the approach I used to find some of the most elusive vulnerabilities, generating a custom input corpus from scratch to enhance fuzzing results.

GStreamer

GStreamer is an open source multimedia framework that provides extensive capabilities, including audio and video decoding, subtitle parsing, and media streaming, among others. It also supports a broad range of container formats, such as MP4, MKV, OGG, and AVI.

GStreamer is distributed by default on any Linux distribution that uses GNOME as the desktop environment, including Ubuntu, Fedora, and openSUSE. It provides multimedia support for key applications like Nautilus (Ubuntu’s default file browser), GNOME Videos, and Rhythmbox. It’s also used by tracker-miners, Ubuntu’s metadata indexer, an application that my colleague, Kev, was able to exploit last year.

This makes GStreamer a very interesting target from a security perspective, as critical vulnerabilities in the library can open numerous attack vectors. That’s why I picked it as a target for my security research.

It’s worth noting that GStreamer is a large library that includes more than 300 different sub-modules. For this research, I decided to focus on only the “Base” and “Good” plugins, which are included by default in the Ubuntu distribution.

Results

During my research I found a total of 29 new vulnerabilities in GStreamer, most of them in the MKV and MP4 formats.

Below you can find a summary of the vulnerabilities I discovered:

GHSL	CVE	DESCRIPTION
GHSL-2024-094	CVE-2024-47537	OOB-write in isomp4/qtdemux.c
GHSL-2024-115	CVE-2024-47538	Stack-buffer overflow in vorbis_handle_identification_packet
GHSL-2024-116	CVE-2024-47607	Stack-buffer overflow in gst_opus_dec_parse_header
GHSL-2024-117	CVE-2024-47615	OOB-Write in gst_parse_vorbis_setup_packet
GHSL-2024-118	CVE-2024-47613	OOB-Write in gst_gdk_pixbuf_dec_flush
GHSL-2024-166	CVE-2024-47606	Memcpy parameter overlap in qtdemux_parse_theora_extension leading to OOB-write
GHSL-2024-195	CVE-2024-47539	OOB-write in convert_to_s334_1a
GHSL-2024-197	CVE-2024-47540	Uninitialized variable in gst_matroska_demux_add_wvpk_header leading to function pointer overwriting
GHSL-2024-228	CVE-2024-47541	OOB-write in subparse/gstssaparse.c
GHSL-2024-235	CVE-2024-47542	Null pointer dereference in id3v2_read_synch_uint
GHSL-2024-236	CVE-2024-47543	OOB-read in qtdemux_parse_container
GHSL-2024-238	CVE-2024-47544	Null pointer dereference in qtdemux_parse_sbgp
GHSL-2024-242	CVE-2024-47545	Integer underflow in FOURCC_strf parsing leading to OOB-read
GHSL-2024-243	CVE-2024-47546	Integer underflow in extract_cc_from_data leading to OOB-read
GHSL-2024-244	CVE-2024-47596	OOB-read in FOURCC_SMI_ parsing
GHSL-2024-245	CVE-2024-47597	OOB-read in qtdemux_parse_samples
GHSL-2024-246	CVE-2024-47598	OOB-read in qtdemux_merge_sample_table
GHSL-2024-247	CVE-2024-47599	Null pointer dereference in gst_jpeg_dec_negotiate
GHSL-2024-248	CVE-2024-47600	OOB-read in format_channel_mask
GHSL-2024-249	CVE-2024-47601	Null pointer dereference in gst_matroska_demux_parse_blockgroup_or_simpleblock
GHSL-2024-250	CVE-2024-47602	Null pointer dereference in gst_matroska_demux_add_wvpk_header
GHSL-2024-251	CVE-2024-47603	Null pointer dereference in gst_matroska_demux_update_tracks
GHSL-2024-258	CVE-2024-47778	OOB-read in gst_wavparse_adtl_chunk
GHSL-2024-259	CVE-2024-47777	OOB-read in gst_wavparse_smpl_chunk
GHSL-2024-260	CVE-2024-47776	OOB-read in gst_wavparse_cue_chunk
GHSL-2024-261	CVE-2024-47775	OOB-read in parse_ds64
GHSL-2024-262	CVE-2024-47774	OOB-read in gst_avi_subtitle_parse_gab2_chunk
GHSL-2024-263	CVE-2024-47835	Null pointer dereference in parse_lrc
GHSL-2024-280	CVE-2024-47834	Use-After-Free read in Matroska CodecPrivate

Fuzzing media files: The problem

Nowadays, coverage-guided fuzzers have become the “de facto” tools for finding vulnerabilities in C/C++ projects. Their ability to discover rare execution paths, combined with their ease of use, has made them the preferred choice among security researchers.

The most common approach is to start with an initial input corpus, which is then successively mutated by the different mutators. The standard method to create this initial input corpus is to gather a large collection of sample files that provide a good representative coverage of the format you want to fuzz.

But with multimedia files, this approach has a major drawback: media files are typically very large (often in the range of megabytes or gigabytes). So, using such large files as the initial input corpus greatly slows down the fuzzing process, as the fuzzer usually goes over every byte of the file.

There are various minimization approaches that try to reduce file size, but they tend to be quite simplistic and often yield poor results. And, in the case of complex file formats, they can even break the file’s logic.

It’s for this reason that for my GStreamer fuzzing journey, I opted for “generating” an initial input corpus from scratch.

The alternative: corpus generators

An alternative to gathering files is to create an input corpus from scratch, that is, without using any preexisting files as examples.

To do this, we need a way to transform the target file format into a program that generates files compliant with that format. Two possible solutions arise:

Use a grammar-based generator. This category of generators makes use of formal grammars to define the file format, and subsequently generate the input corpus. In this category, we can mention tools like Grammarinator, an open source grammar-based fuzzer that creates test cases according to an input ANTLR v4 grammar. In a past blog post, I also explained how I used AFL++ Grammar-Mutator to fuzz the Apache HTTP server.
Create a generator specifically for the target software. In this case, we rely on analyzing how the software parses the file format to create a compatible input generator.

Of course, the second solution is more time-consuming, as we need not only to understand the file format structure but also to analyze how the target software works.

But at the same time, it solves two problems in one shot:

On one hand, we’ll generate much smaller files, drastically speeding up the fuzzing process.
On the other hand, these “custom” files are likely to produce better code coverage and potentially uncover more vulnerabilities.

This is the method I opted for, and it allowed me to find some of the most interesting vulnerabilities in the MP4 and MKV parsers, vulnerabilities that the fuzzer had not detected until then.

Implementing an input corpus generator for MP4

In this section, I will explain how I created an input corpus generator for the MP4 format. I used the same approach for fuzzing the MKV format as well.

MP4 format

To start, I will show a brief description of the MP4 format.

MP4, officially known as MPEG-4 Part 14, is one of the most widely used multimedia container formats today, due to its broad compatibility and widespread support across various platforms and devices. It supports packaging of multiple media types such as video, audio, images, and complex metadata.

MP4 is basically an evolution of Apple’s QuickTime media format, which was standardized by ISO as MPEG-4. The .mp4 container format is specified by the “MPEG-4 Part 14: MP4 file format” section.

MP4 files are structured as a series of “boxes” (or “atoms”), each containing specific multimedia data needed to construct the media. Each box has a designated type that describes its purpose.

These boxes can also contain other nested boxes, creating a modular and hierarchical structure that simplifies parsing and manipulation.

Each box/atom includes the following fields:

Size: A 32-bit integer indicating the total size of the box in bytes, including the header and data.
Type: A 4-character code (FourCC) that identifies the box’s purpose.
Data: The actual content or payload of the box.

Some boxes may also include:

Extended size: A 64-bit integer that allows for boxes larger than 4GB.
User type: A 16-byte (128-bit) UUID that enables the creation of custom boxes without conflicting with standard types.
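The header layout described above can be sketched as a small parser. This is only an illustration under the ISO base media file format rules (a size of 1 signals a 64-bit extended size, a size of 0 means the box runs to the end of the file, and the uuid type is followed by a 16-byte user type). The names BoxHeader, read_be, and parse_box_header are hypothetical and are not taken from GStreamer or from my generator.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Parsed view of one box header. The struct and function names are
// illustrative; they are not GStreamer's or the generator's.
struct BoxHeader {
    uint64_t size = 0;       // total box size in bytes, header included
    std::string type;        // four-character code, e.g. "moov"
    size_t header_len = 0;   // bytes occupied by the header itself
};

// Read an n-byte unsigned big-endian integer starting at buf[off].
static uint64_t read_be(const std::vector<uint8_t> &buf, size_t off, size_t n) {
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++) v = (v << 8) | buf[off + i];
    return v;
}

// Parse the box header at offset `off`. Returns false if the buffer is too
// short or the declared size does not fit in the remaining bytes.
bool parse_box_header(const std::vector<uint8_t> &buf, size_t off, BoxHeader &out) {
    if (off > buf.size() || buf.size() - off < 8) return false;
    const size_t avail = buf.size() - off;
    out.size = read_be(buf, off, 4);
    out.type.assign(buf.begin() + off + 4, buf.begin() + off + 8);
    out.header_len = 8;
    if (out.size == 1) {            // extended size: 64-bit value after the type
        if (avail < 16) return false;
        out.size = read_be(buf, off + 8, 8);
        out.header_len = 16;
    } else if (out.size == 0) {     // size 0: box runs to the end of the file
        out.size = avail;
    }
    if (out.type == "uuid") {       // user type: 16-byte UUID after the size fields
        if (avail < out.header_len + 16) return false;
        out.header_len += 16;
    }
    return out.size >= out.header_len && out.size <= avail;
}
```

Note how many consistency checks even this tiny parser needs. Each of them is a place where a real demuxer can go wrong if a check is missing.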

An MP4 file is typically structured in the following way:

ftyp (File Type Box): Indicates the file type and compatibility.
mdat (Media Data Box): Contains the actual media data (for example, audio and video frames).
moov (Movie Box): Contains metadata for the entire presentation, including details about tracks and their structures:
    trak (Track Box): Represents individual tracks (for example, video, audio) within the file.
    udta (User Data Box): Stores user-defined data that may include additional metadata or custom information.

Once we understand how an MP4 file is structured, we might ask ourselves, “Why are fuzzers not able to successfully mutate an MP4 file?”

To answer this question, we need to take a look at how coverage-guided fuzzers mutate input files. Let’s take AFL, one of the most widely used fuzzers out there, as an example. AFL’s default mutators can be summarized as follows:

Bit/byte mutators: These mutators flip some bits or bytes within the input file. They don’t change the file size.
Block insertion/deletion: These mutators insert new data blocks or delete sections from the input file. They modify the file size.

The main problem lies in the latter category of mutators. As soon as the fuzzer inserts or deletes data within an MP4 box, the size field of that box must also be updated to reflect the new size. Furthermore, if the size of a box changes, the size fields of all its parent boxes must be recalculated and updated accordingly.

Implementing this functionality as a simple mutator can be quite complex, as it requires the fuzzer to track and update the implicit structure of the MP4 file.
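To make the problem concrete, here is a minimal sketch (not AFL code) of what a size-unaware block insertion does to a sequence of boxes. The helpers sizes_consistent and insert_block are hypothetical names, and the check is deliberately simplified: it only handles 32-bit sizes and a flat list of sibling boxes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if the bytes in [begin, end) are exactly tiled by boxes whose
// 32-bit big-endian size fields are consistent. Simplified on purpose:
// extended sizes and nested containers are not handled.
bool sizes_consistent(const std::vector<uint8_t> &buf, size_t begin, size_t end) {
    if (begin > end || end > buf.size()) return false;
    size_t off = begin;
    while (off < end) {
        if (end - off < 8) return false;  // truncated header
        const uint32_t size = (uint32_t(buf[off]) << 24) |
                              (uint32_t(buf[off + 1]) << 16) |
                              (uint32_t(buf[off + 2]) << 8) |
                              uint32_t(buf[off + 3]);
        if (size < 8 || size > end - off) return false;  // impossible size
        off += size;
    }
    return true;
}

// Mimics a block-insertion mutation: splice `count` filler bytes at `pos`
// without touching any size field.
std::vector<uint8_t> insert_block(std::vector<uint8_t> buf, size_t pos, size_t count) {
    buf.insert(buf.begin() + pos, count, 0x41);
    return buf;
}
```

With two sibling boxes of 12 and 8 bytes, inserting 4 bytes into the payload of the first one makes the check fail, because the parser now looks for the second header in the wrong place. Patching the first size field from 12 to 16 makes the buffer consistent again, which is exactly the bookkeeping a generic mutator doesn’t do.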

Generator implementation

The algorithm I used for implementing my generator follows these steps:

Step 1: Generating unlabelled trees

Structurally, an MP4 file can be visualized as a tree-like structure, where each node corresponds to an MP4 box. Thus, the first step in our generator implementation involves creating a set of unlabelled trees.

In this phase, we create trees with empty nodes that do not yet have a tag assigned. Each node represents a potential MP4 box. To make sure we have a variety of input samples, we generate trees with various structures and different node counts.

In the following code snippet, we see the constructor of the RandomTree class, which generates a random tree structure with a specified total number of nodes (total_nodes):

RandomTree::RandomTree(uint32_t total_nodes){
    uint32_t curr_level = 0;

    // Root node (a parent ID of -1 means "no parent")
    new_node(-1, curr_level);
    curr_level++;

    // The root is already created; this assumes total_nodes >= 1
    uint32_t rem_nodes = total_nodes - 1;

    while(rem_nodes > 0){

        // Number of nodes to place on the current level
        uint32_t num_children = rand_uint32(1, rem_nodes);
        // First and last node of the previous level bound the parent IDs
        uint32_t min_value = this->levels[curr_level-1].front();
        uint32_t max_value = this->levels[curr_level-1].back();

        for(uint32_t i=0; i<num_children; i++){
            uint32_t parent_id = rand_uint32(min_value, max_value);
            new_node(parent_id, curr_level);
        }

        curr_level++;
        rem_nodes -= num_children;
    }
}

This code builds the tree level by level: for each level, it picks a random number of nodes (num_children) with rand_uint32 and attaches each of them to a randomly chosen parent on the previous level. Because both the width of each level and the parent of each node are random, the resulting tree structures are highly diverse.

After all children are added for the current level, curr_level is incremented to move to the next level.

Once rem_nodes is 0, the RandomTree generation is complete, and we move on to generate another new RandomTree.
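The constructor above relies on helpers that are not shown (new_node, levels, rand_uint32). The following self-contained sketch fills them in under my own assumptions, so the class name RandomTreeSketch, the Node layout, and the seed parameter are illustrative rather than the exact code of the generator. It assumes node IDs are assigned in creation order, which makes the IDs on each level a contiguous range.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

struct Node {
    int32_t parent;                  // -1 for the root
    uint32_t level;                  // depth in the tree, root is 0
    std::vector<uint32_t> children;  // IDs of the child nodes
};

class RandomTreeSketch {
public:
    RandomTreeSketch(uint32_t total_nodes, uint32_t seed) : rng_(seed) {
        if (total_nodes == 0) return;  // avoid unsigned underflow below
        new_node(-1, 0);
        uint32_t level = 1;
        uint32_t remaining = total_nodes - 1;
        while (remaining > 0) {
            const uint32_t count = rand_uint32(1, remaining);
            // IDs on a level are contiguous, so front/back bound the parents.
            // Copy them first: new_node() may reallocate levels_.
            const uint32_t first = levels_[level - 1].front();
            const uint32_t last = levels_[level - 1].back();
            for (uint32_t i = 0; i < count; i++)
                new_node(int32_t(rand_uint32(first, last)), level);
            level++;
            remaining -= count;
        }
    }

    const std::vector<Node> &nodes() const { return nodes_; }

private:
    // Uniform random integer in the closed range [lo, hi].
    uint32_t rand_uint32(uint32_t lo, uint32_t hi) {
        return std::uniform_int_distribution<uint32_t>(lo, hi)(rng_);
    }

    // Append a node, register it on its level and link it to its parent.
    void new_node(int32_t parent, uint32_t level) {
        const uint32_t id = uint32_t(nodes_.size());
        nodes_.push_back(Node{parent, level, {}});
        if (levels_.size() <= level) levels_.resize(level + 1);
        levels_[level].push_back(id);
        if (parent >= 0) nodes_[size_t(parent)].children.push_back(id);
    }

    std::mt19937 rng_;
    std::vector<Node> nodes_;
    std::vector<std::vector<uint32_t>> levels_;
};
```

Seeding the generator explicitly is a design choice of this sketch: it makes every generated tree reproducible, which helps when a particular corpus file needs to be regenerated.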

Step 2: Assigning tags to nodes

Once we have a set of unlabelled trees, we proceed to assign random tags to each node.

These tags correspond to the four-character codes (FOURCCs) used to identify the types of MP4 boxes, such as moov, trak, or mdat.

In the following code snippet, we see two different arrays of fourcc_info structs: FOURCC_LIST, which holds the tags for the leaf nodes of the tree, and CONTAINER_LIST, which holds the tags for the rest of the nodes.

The fourcc_info struct includes the following fields:

fourcc: A 4-byte FourCC ID
description: A string describing the FourCC
min_size: The minimum size of the data associated with this FourCC

const fourcc_info CONTAINER_LIST[] = {

{FOURCC_moov, "movie", 0,},
{FOURCC_vttc, "VTTCueBox 14496-30", 0},
{FOURCC_clip, "clipping", 0,},
{FOURCC_trak, "track", 0,},
{FOURCC_udta, "user data", 0,},
…

const fourcc_info FOURCC_LIST[] = {

{FOURCC_crgn, "clipping region", 0,},
{FOURCC_kmat, "compressed matte", 0,},
{FOURCC_elst, "edit list", 0,},
{FOURCC_load, "track load settings", 0,},

Then, the MP4_labeler constructor takes a RandomTree instance as input, iterates through its nodes, and assigns a label to each node based on whether it is a leaf (no children) or a container (has children):

…

MP4_labeler::MP4_labeler(RandomTree *in_tree) {
    …
    for(int i=1; i < this->tree->size(); i++){

        Node &node = this->tree->get_node(i);
        …
        if(node.children().size() == 0){
            //LEAF
            uint32_t random = rand_uint32(0, FOURCC_LIST_SIZE-1);
            fourcc = FOURCC_LIST[random].fourcc;
            …
        }else{
            //CONTAINER
            uint32_t random = rand_uint32(0, CONTAINER_LIST_SIZE-1);
            fourcc = CONTAINER_LIST[random].fourcc;
            …
        }
        …
        node.set_label(label);
    }
}

After this stage, all nodes will have an assigned tag.
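As a self-contained illustration of the labeling rule (leaf nodes draw from one table, container nodes from the other), here is a minimal sketch. It uses short excerpts of the two tables with tags stored as strings, and it takes the tree as a plain list of child counts. TagInfo, assign_tags, and the seed parameter are hypothetical names chosen for this example. Unlike the constructor above, which starts at node 1, this sketch labels every node, including the root.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

struct TagInfo {
    const char *fourcc;       // four-character code
    const char *description;  // human-readable name
    uint32_t min_size;        // minimum payload size in bytes
};

// Short excerpts only; the real tables contain many more entries.
static const TagInfo CONTAINER_TAGS[] = {
    {"moov", "movie", 0}, {"trak", "track", 0}, {"udta", "user data", 0},
};
static const TagInfo LEAF_TAGS[] = {
    {"crgn", "clipping region", 0}, {"elst", "edit list", 0},
    {"load", "track load settings", 0},
};

// child_counts[i] is the number of children of node i. A node without
// children gets a random leaf tag, any other node a random container tag.
std::vector<std::string> assign_tags(const std::vector<size_t> &child_counts,
                                     uint32_t seed) {
    std::mt19937 rng(seed);
    std::vector<std::string> tags;
    tags.reserve(child_counts.size());
    for (size_t n : child_counts) {
        const bool leaf = (n == 0);
        const TagInfo *table = leaf ? LEAF_TAGS : CONTAINER_TAGS;
        const size_t count =
            leaf ? sizeof(LEAF_TAGS) / sizeof(LEAF_TAGS[0])
                 : sizeof(CONTAINER_TAGS) / sizeof(CONTAINER_TAGS[0]);
        std::uniform_int_distribution<size_t> pick(0, count - 1);
        tags.push_back(table[pick(rng)].fourcc);
    }
    return tags;
}
```

Keeping leaves and containers in separate tables means a container tag never ends up on a node without children, so the demuxer is more likely to descend into the nested boxes instead of rejecting the file early.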

Step 3: Adding random-size data fields

The next step is to add a random-size data field to each node. This data simulates the content within each MP4 box.
In the following code, we first read the minimum data size (min_size) from the selected fourcc_info entry and use it as the padding length. Then, we append padding null bytes (\x00) to the label, followed by random_data filler bytes (\x41), which are only added to leaf nodes:

if(node.children().size() == 0){
    //LEAF
    …
    padding = FOURCC_LIST[random].min_size;
    random_data = rand_uint32(4, 16);
}else{
    //CONTAINER
    …
    padding = CONTAINER_LIST[random].min_size;
    random_data = 0;
}
…
std::string label = uint32_to_string(fourcc);
label += std::string(padding, '\x00');
label += std::string(random_data, '\x41');

By varying the data sizes, we make sure the fuzzer has sufficient space to inject data into the box data sections, without needing to modify the input file size.
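The label layout from this step (tag, then zero padding, then filler) can be checked in isolation with a tiny sketch. The function make_label is a hypothetical helper, and the tag is passed as a string here to avoid guessing how uint32_to_string orders the bytes.

```cpp
#include <cstddef>
#include <string>

// Build the bytes that follow the size field of a box: the four-character
// tag, `padding` zero bytes, and `filler` bytes of 0x41 ('A').
std::string make_label(const std::string &fourcc, size_t padding, size_t filler) {
    std::string label = fourcc;
    label += std::string(padding, '\x00');  // minimum payload, zero-filled
    label += std::string(filler, '\x41');   // room for the fuzzer to mutate
    return label;
}
```

For example, make_label("elst", 8, 5) returns 17 bytes: the tag, eight zero bytes, and five 'A' bytes. std::string works as the byte container here because it stores embedded null bytes without truncating.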

Step 4: Calculating box sizes

Finally, we calculate the size of each box and recursively update the tree accordingly.

The traverse method recursively walks the tree, serializing each node’s data and calculating the resulting box size (size). Because each call first serializes its children (traverse(child)), size changes propagate up the tree and every parent box includes the sizes of its child boxes:

std::string MP4_labeler::traverse(Node &node){
    …
    for(int i=0; i < node.children().size(); i++){
        Node &child = tree->get_node(node.children()[i]);

        output += traverse(child);
    }

    uint32_t size;
    if(node.get_id() == 0){
        size = 20;
    }else{
        size = node.get_label().size() + output.size() + 4;
    }

    std::string label = node.get_label();
    uint32_t label_size = label.size();

    output = uint32_to_string_BE(size) + label + output;
    …
}
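The size calculation can be reproduced in a few lines with a simplified tree type. In this sketch, Box, u32_be, and serialize are hypothetical names, the label is assumed to already contain the tag plus its payload bytes, and the special case for the root node (the fixed size of 20 above) is left out.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct Box {
    std::string label;          // four-character tag followed by payload bytes
    std::vector<Box> children;  // nested boxes, serialized after the label
};

// Encode a 32-bit value as four big-endian bytes.
std::string u32_be(uint32_t v) {
    std::string s(4, '\0');
    s[0] = char((v >> 24) & 0xff);
    s[1] = char((v >> 16) & 0xff);
    s[2] = char((v >> 8) & 0xff);
    s[3] = char(v & 0xff);
    return s;
}

// Serialize a box and everything below it. Children are serialized first,
// so their total length is known when the parent size is written.
std::string serialize(const Box &box) {
    std::string body;
    for (const Box &child : box.children) body += serialize(child);
    // 4 bytes for the size field itself, plus the label, plus all children.
    const uint32_t size = uint32_t(4 + box.label.size() + body.size());
    return u32_be(size) + box.label + body;
}
```

For a moov box containing a trak box containing an elst leaf with four payload bytes, the three size fields come out as 28, 20, and 12. Because sizes are derived bottom-up from the serialized children, they are consistent by construction, no matter how the tree was generated.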

The number of generated input files can vary depending on the time and resources you can dedicate to fuzzing. In my case, I generated an input corpus of approximately 4 million files.

Code

You can find my C++ code example here.

Acknowledgments

A big thank you to the GStreamer developer team for their collaboration and responsiveness, and especially to Sebastian Dröge for his quick and effective bug fixes.

I would also like to thank my colleague, Jonathan Evans, for managing the CVE assignment process.


The post Uncovering GStreamer secrets appeared first on The GitHub Blog.
