Automate PDF Testing

Advanced Topics — March 5, 2021

Introduction

Automated verification of PDFs has traditionally been a challenging task in test automation. As a result, teams often automate the testing of their applications but leave PDF testing to manual review, which is error-prone. In this article we review the requirements of PDF testing and an approach to automating PDF tests using Applitools.

Why PDF Testing?

As more organizations digitally transform, their operating models require documents to be produced electronically and sent to customers. Suppose a customer visits an insurance company or bank to open an account. Increasingly, this work is done exclusively with electronic records, and after a successful setup the customer receives a digital copy of the record. PDF offers sophisticated document layout and the security features needed to serve as an electronic record. Account statements, invoices, receipts, documentation, and disclaimers are all distributed as PDFs.

When organizations produce their transactional or customer-related documents as PDFs, they need to be able to test that output. Documents that are incorrectly formatted or that carry the wrong content can cause significant financial loss or have legal implications. Testing the generated output is therefore mandatory from both a quality and a legal perspective.

What to Automate?

In sectors such as insurance, healthcare, and banking, end-user documents need to be highly accurate, so PDFs must be fully tested before they are sent to recipients. Consider an application that produces customer letters from a PDF template: various sections of the template are dynamically populated with customer data to produce each output file.

Testing for layout means checking that the document is fully formed, with the expected sections present in the right location and in the right order. Testing for content means checking that the content is accurate and appears on the desired page, and sometimes also in the right location, because a misplaced value can affect how downstream systems process the document. A PDF test therefore needs to verify that both the content and the layout of the output document are correct.
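
The combined check described above can be sketched in a few lines of Python. This is a minimal illustration, not the Applitools implementation: it assumes the PDF text has already been extracted into blocks with bounding boxes, and the names `TextBlock`, `Region`, and `content_in_region`, along with the sample coordinates and address, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBlock:
    """Text extracted from one PDF page, with its bounding box (hypothetical model)."""
    page: int
    text: str
    x: float
    y: float
    width: float
    height: float


@dataclass(frozen=True)
class Region:
    """A rectangular area of a page in which some content is expected."""
    page: int
    x: float
    y: float
    width: float
    height: float

    def contains(self, block: TextBlock) -> bool:
        # The block must be on the same page and lie fully inside the region.
        return (
            block.page == self.page
            and block.x >= self.x
            and block.y >= self.y
            and block.x + block.width <= self.x + self.width
            and block.y + block.height <= self.y + self.height
        )


def content_in_region(blocks, expected_text, region):
    """True if some block inside the region contains the expected text."""
    return any(expected_text in b.text and region.contains(b) for b in blocks)


# Made-up sample data: a customer address block near the top of page 1.
sample_blocks = [
    TextBlock(page=1, text="Mary Jane, 12 Example Street", x=50, y=100, width=200, height=40),
    TextBlock(page=1, text="Account summary", x=50, y=300, width=150, height=20),
]
address_region = Region(page=1, x=40, y=90, width=250, height=80)

print(content_in_region(sample_blocks, "Mary Jane", address_region))
```

A check like this fails both when the text is wrong and when the right text lands outside the expected area, which is the distinction that matters for downstream processing.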

How Have Organizations Traditionally Tested PDFs?

Consider a sample bank statement that summarizes the transaction details and other critical information for a customer, Mary Jane. In this sample, the customer address, branch address, account number, and account transaction summary are dynamic data, while the rest of the information is static.

Organizations usually validate the data through API testing and then use a library such as Apache PDFBox to check that data on a page. However, testing of the fully formatted document is rarely automated, and most organizations rely on manual testing to validate the output document. As more and more organizations generate electronic documents, reviewing each one becomes impractical, so PDF documents are tested on a sample basis.

Thus, organizations have traditionally not attempted end-to-end automation of PDF testing. Instead, they have manually checked the data published to the PDF without testing the entire layout of the final document.

Applying Visual AI to PDF Testing

Applitools is an AI-powered visual testing platform. Using a range of algorithms, it enables testing of any user interface with 99.99% accuracy, reporting only real differences visible to the human eye, including changes to color, contrast, position, size, or content.

For PDFs, we can automate testing of all or selected pages of the fully formatted document and highlight any visible difference. We can further refine our tests by either targeting specific sections of a page or ignoring sections that are not relevant to the test.

Sometimes we also need to validate the structure of the document without testing its content, which can be achieved using a layout algorithm.
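
To make the idea of a structure-only comparison concrete, here is a rough Python sketch. It is not the Applitools layout algorithm: it assumes each page element has already been reduced to a bounding box, and the `Box` type, the `same_layout` function, the 2-unit tolerance, and the sample values are all hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """One page element: position, size, and text (hypothetical model)."""
    page: int
    x: float
    y: float
    width: float
    height: float
    text: str


def same_layout(baseline, candidate, tolerance=2.0):
    """Compare structure only: element count, positions, and sizes.

    The text of each element is deliberately ignored, so documents with
    different customer data but the same structure are treated as equal.
    """
    if len(baseline) != len(candidate):
        return False

    def reading_order(box):
        return (box.page, box.y, box.x)

    pairs = zip(sorted(baseline, key=reading_order), sorted(candidate, key=reading_order))
    for old, new in pairs:
        if old.page != new.page:
            return False
        deltas = (old.x - new.x, old.y - new.y,
                  old.width - new.width, old.height - new.height)
        if any(abs(d) > tolerance for d in deltas):
            return False
    return True


# Made-up sample data: same structure, different customer.
baseline = [Box(1, 50, 100, 200, 40, "Mary Jane"), Box(1, 50, 300, 400, 200, "Transactions")]
other_customer = [Box(1, 50, 100, 200, 40, "John Smith"), Box(1, 50, 300, 400, 200, "Transactions")]
shifted = [Box(1, 50, 160, 200, 40, "Mary Jane"), Box(1, 50, 300, 400, 200, "Transactions")]

print(same_layout(baseline, other_customer))  # structure matches, content differs
print(same_layout(baseline, shifted))         # address block moved down the page
```

Pairing elements by sorted position is a simplification that suits a fixed template; a document whose sections can reorder would need a more careful matching step.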

PDF Testing Solution

Applitools PDF Tester is a codeless utility that automates PDF testing of small or large documents using Visual AI. It also validates content in a page or a region, across selected pages or all pages of the PDF.

In the example above, suppose we want to validate that the customer address and branch address are correct on the PDF and also test the remaining layout of the document. This can be tested using Applitools PDF Tester. First, to set up a job, we identify which pages to test for layout (only needed for multi-page PDFs) and then add the specific content assertions we want to validate. The following is the job XML:

The XML above can be created manually or built programmatically with any script, and then executed with the Pdftesting.jar application. The utility can be run from the command line or from any batch process.

Once executed it provides the following results:

Notice that the utility reports on all our content assertions in the PDF document, marking each result as ‘Passed’ or ‘Failed’. In addition, it tests the fully formatted output document against a baseline and reports any differences found by Applitools.
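
The shape of such a per-assertion report can be illustrated with a small Python sketch. It does not reproduce the PDF Tester's job format or output: the function `run_content_assertions`, the tuple layout of each assertion, and the sample page text are assumptions made for illustration only.

```python
def run_content_assertions(page_texts, assertions):
    """Evaluate content assertions against extracted page text.

    page_texts: dict mapping page number to the text of that page.
    assertions: list of (name, expected_text, pages) tuples, where pages is
                a list of page numbers, or None to mean every page.
    Returns a dict mapping each assertion name to 'Passed' or 'Failed'.
    """
    results = {}
    for name, expected_text, pages in assertions:
        targets = list(page_texts) if pages is None else pages
        # The expected text must appear on every targeted page.
        found = all(expected_text in page_texts.get(page, "") for page in targets)
        results[name] = "Passed" if found else "Failed"
    return results


# Made-up sample data for a two-page statement.
page_texts = {
    1: "Statement for Mary Jane\nAccount 0000-1111",
    2: "Transactions\nAccount 0000-1111",
}
assertions = [
    ("customer name on page 1", "Mary Jane", [1]),
    ("account number on every page", "0000-1111", None),
    ("branch address on page 1", "1 Example Road", [1]),
]

for name, outcome in run_content_assertions(page_texts, assertions).items():
    print(f"{name}: {outcome}")
```

Keeping the assertions as plain data means they can be generated from the same customer records that feed the document template.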

Logging into the Applitools dashboard, we can review all the differences spotted by the AI:

Here, the AI highlights all the positioning differences and missing elements, but it can also report any color, contrast, position, or font size changes compared with the baseline document, ensuring the document is accurate before it is published.

These tests can be further improved by using annotations (on the Applitools dashboard) to ignore specific sections, such as images, or to test the transactions region as Layout. This way we can target only specific items to be tested in our published document.
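
The effect of an ignore region can be shown with a deliberately naive Python sketch. Visual AI does not work by raw pixel comparison, and in Applitools the annotations are drawn on the dashboard rather than written in code. The function `diff_pixels`, the rectangle format, and the tiny sample "pages" below are hypothetical.

```python
def diff_pixels(baseline, candidate, ignore=()):
    """Return (row, col) positions that differ outside the ignored rectangles.

    baseline, candidate: equally sized 2D lists of pixel values.
    ignore: rectangles as (top, left, bottom, right), with bottom and right
            exclusive, marking areas that should not be compared.
    """
    if len(baseline) != len(candidate) or any(
        len(a) != len(b) for a, b in zip(baseline, candidate)
    ):
        raise ValueError("baseline and candidate must have the same dimensions")

    def is_ignored(row, col):
        return any(
            top <= row < bottom and left <= col < right
            for top, left, bottom, right in ignore
        )

    return [
        (r, c)
        for r, line in enumerate(baseline)
        for c, pixel in enumerate(line)
        if pixel != candidate[r][c] and not is_ignored(r, c)
    ]


# Made-up 3x4 "pages": the candidate differs in the top-left corner
# (say, a logo image) and in the bottom-right corner (real content).
baseline_page = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
candidate_page = [
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 7],
]
logo_area = (0, 0, 1, 2)  # row 0, columns 0 and 1

print(diff_pixels(baseline_page, candidate_page))
print(diff_pixels(baseline_page, candidate_page, ignore=[logo_area]))
```

With the logo area ignored, only the difference in the real content is reported, which is the point of excluding sections that are not relevant to the test.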

Conclusion

While organizations have largely automated the testing of their web and mobile applications, they still struggle to automate PDF testing within their processes. Using AI to test the completed document, along with testing dynamic data, helps teams include PDF testing in their end-to-end testing instead of relying on a manual approach.
