WebP: A Case Study in the Hidden Vulnerabilities of Image Formats

8 minute read Published: 2024-05-13

Hacked by an image? How could that be possible? This is the story of how a seemingly innocuous image format, WebP, was discovered to have a critical vulnerability that could be exploited to execute arbitrary code on a victim's machine.

Table of Contents

Understanding Image Formats and Background on WebP

Under the hood, images are just binary data that represent pixels on a screen. Different image formats encode this data in various ways, each with its own strengths and weaknesses. Some formats, like JPEG and PNG, are well-known and widely used. Others, like Google's WebP format, are newer and less common but offer advantages such as better compression and transparency support. When a user opens any picture, the picture has to be rendered on the screen by the computer by decoding the binary data contained in the image file. This decoding process is handled by an image decoder, which reads the image data, interprets it, and displays the image on the screen. The decoder must understand the specific format of the image file to perform this task correctly.

Above: The process of encoding and decoding a JPEG image.

You'll often hear about image formats being "lossy" or "lossless." Lossy formats, like JPEG, discard some image data to achieve higher compression ratios, while lossless formats, like PNG, preserve all the original data. WebP is unique in that it can be both lossy and lossless, depending on the settings used during encoding. This flexibility makes it a popular choice for web developers looking to balance image quality and file size.

The libwebp is a popular open-source library that is used to encode and decode WebP images. It is widely used in web browsers, image editing software, and other applications that support the WebP format. The library provides functions for reading and writing WebP images, as well as encoding and decoding the pixel data. The vulnerability in WebP that we will discuss later is related to the lossless encoding and decoding process used by the libwebp library.

The Complexity of Huffman Encoding and Decoding

WebP's lossless variant is based on a technique called Huffman encoding, which is used to compress data by assigning shorter codes to more frequent symbols. This process involves building a binary tree where each symbol corresponds to a unique path from the root to a leaf node. The tree is then used to encode and decode the data efficiently.

To illustrate this concept, consider a simple example where we have four symbols (A, B, C, and D) whose frequencies are as follows:

A: 50%
B: 25%
C: 15%
D: 10%

The Huffman tree for these symbols might look like this:

   root
  /    \
 A      root
       /    \
      B      root
            /    \
           C      D

In this tree, the symbol A is encoded as 0, B as 10, C as 110, and D as 111. When encoding a sequence of symbols, we traverse the tree from the root to the leaf node corresponding to each symbol, generating a binary code along the way. This process ensures that more frequent symbols are assigned shorter codes, reducing the overall size of the encoded data.

Huffman tables are used to store the mapping between symbols and their corresponding codes. These tables are typically constructed during the encoding process and are required for decoding the data. For the example above, the Huffman table might look like this:

Symbol	Code
A	0
B	10
C	110
D	111

As you can see, the most frequent symbol (A) is assigned the shortest code, while the least frequent symbol (D) is assigned the longest code. This property of Huffman encoding helps achieve efficient compression by reducing the average code length. Huffman tables effectively capture the structure of the Huffman tree, with one table per alphabet size (number of symbols) used in the encoding process.

In the case of WebP, the Huffman tree is stored in the image file along with the encoded pixel data. When the image is decoded, the tree is reconstructed, and the pixel data is decoded using the tree's structure.

The WebP Vulnerability (CVE-2023-4863)

In September 2023, a critical vulnerability was discovered in the WebP image format that allowed attackers to execute arbitrary code on a victim's machine. The vulnerability, assigned CVE-2023-4863, was caused by a heap buffer overflow in the Huffman decoding process. By crafting a specially designed WebP image, an attacker could trigger the overflow and overwrite memory in the decoder, leading to a variety of potential exploits.

But triggering the vulnerability is not as simple as sending a malicious image file to a victim. The attacker must carefully construct the image to exploit the specific conditions that trigger the heap buffer overflow. This requires a deep understanding of the WebP format and the internal workings of the decoder, making it a non-trivial task even for experienced security researchers.

Ben Hawkes, the Project Zero team lead at Google, described the following steps to trigger the vulnerability:

Construct a sequence of 4 valid Huffman tables that are maximally sized for two different alphabet sizes (280 and 256).
Next, construct a very specific type of invalid Huffman table for a third alphabet size (40).
Finally, the vulnerability will be manifested in the BuildHuffmanTable function in the libwebp library when the invalid Huffman table is processed. The issue is that there was no length check on the buffer that was allocated for the Huffman table, which could lead to an out-of-bounds write when the invalid table is unpacked by the decoder. This means the function could be coerced into writing data beyond the allocated buffer, leading to an out-of-bounds write and potential memory corruption. The overflow will lead to a double-free vulnerability, which can be exploited to execute arbitrary code.

If any of the above steps are not followed correctly, the vulnerability will not be triggered and the image decoder will simply throw an error. Practically, this means that exploiting the vulnerability requires a high level of expertise and knowledge of the WebP format, making it unlikely to be used in widespread attacks.

mistymntncop, a security researcher, created a proof-of-concept exploit for the WebP vulnerability, and produced this visualization of the unbalanced Huffman table the is constructed from the crafted WebP image:

Above: An unbalanced Huffman tree created from a crafted WebP image.

In contrast, here is the visualization balanced Huffman tree created from a valid WebP image:

Above: A balanced Huffman tree created from a valid WebP image.

How Was This Not Caught Earlier?

Even though there are many fuzzing tools and techniques available to detect vulnerabilities in software, the reason this bug was not caught earlier is mostly due to the sheer complexity and specificity of the conditions required to trigger it. As Hawkes mentions in his zero-day disclosure, "[o]ut of billions of possibilities," only a 4 table sequence of a specific size and an invalid table of a specific size will trigger the vulnerability.

To see for yourself how complex the conditions are, you can check out the nearly 600-line C code that was used to trigger the vulnerability here.

Conclusion

This vulnerability has since been patched, so if you're using the most recent versions of your browsers than you can rest easy that you're not susceptible to this attack. However, the WebP vulnerability serves as a reminder that security vulnerabilities can lurk in unexpected places, even in seemingly benign image formats. As software systems become more complex and interconnected, it is essential to conduct thorough security assessments of all components, including image decoders and parsers. By identifying and addressing vulnerabilities proactively, we can help prevent potential exploits and protect users from malicious attacks.

Understanding Image Formats and Background on WebP

The Complexity of Huffman Encoding and Decoding

The WebP Vulnerability (CVE-2023-4863)

How Was This Not Caught Earlier?

Conclusion

References