Can Image-Based Functional Test Automation Tools Automate Visual Testing?

Advanced Topics — Published September 23, 2013

In our previous blog post we discussed the importance of Visual Testing and highlighted the basic requirements for an effective Visual Test Automation tool. A common misconception is that Functional Test Automation tools with Image-based testing capabilities (such as Sikuli, Ranorex, eggPlant, SeeTest, and others) are suitable for automating Visual Testing. In this post we focus on the strengths and weaknesses of Image-based Functional Testing tools and show why they are inadequate for automating Visual Testing.

Functional vs Visual Testing

No doubt some of you have used, or are currently using, Image-based testing tools to automatically verify the correctness of certain visual elements of your applications. For example, you may be using Sikuli within your Selenium tests to verify that certain images or buttons are correctly displayed on screen. Although this type of automatic validation has substantial value for some applications, it is merely a drop in the ocean of Visual Testing.

Our requirements for a Visual Test Automation tool are much stricter.

We expect such a tool to validate the layout and appearance of each and every (complete) screen of the application under test (AUT), running on different browsers, screen resolutions, devices, and operating systems. We expect it to validate the correctness of the UI when it is zoomed in and out, when hovering over various UI elements, and when the application’s window is resized. Moreover, we expect the tool to facilitate fast and easy test maintenance and failure analysis. In other words, we expect the tool to eliminate the need for subsequent Manual Visual Testing.

All UI Test Automation tools rely on 3 basic mechanisms to define and execute tests:

  1. UI element locators: allow the tester to locate UI elements to act upon
  2. Human Interface Device (HID) event generation: allows the tester to simulate keystrokes, mouse clicks and similar HID events on a UI element in order to drive the AUT.
  3. UI data extraction: allows the tester to read data from a UI element in order to validate the correct behavior of the AUT.

Clearly, the ability to identify and locate UI elements is key to UI Test Automation. Different UI frameworks (Flash, HTML, Android/iOS UI, Windows Forms, WPF, Java Swing, etc.) use entirely different UI element representations and expose different APIs to locate and manipulate them. However, in all these frameworks, application developers can assign unique identifiers to UI elements, which allow them to be easily located by UI Testing tools regardless of the execution environment of the AUT. For example, the id property of an HTML DOM element (an HTML UI element) can be used to identify it on any browser, operating system and device on which the web page is presented.
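To make the id-based lookup concrete, here is a minimal sketch that uses Python's standard html.parser module. The two HTML snippets and the id "submit-order" are hypothetical, and a real Object-based tool would query the live DOM through the browser rather than parse a string. The point is only that the same identifier finds the element even though the surrounding markup differs.

```python
from html.parser import HTMLParser


class IdLocator(HTMLParser):
    """Records the tag name and attributes of the first element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found = None  # becomes (tag, attrs_dict) once the element is seen

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if self.found is None and attr_map.get("id") == self.target_id:
            self.found = (tag, attr_map)


def find_by_id(html, target_id):
    """Return (tag, attrs) for the element with the given id, or None."""
    locator = IdLocator(target_id)
    locator.feed(html)
    return locator.found


# Two hypothetical renderings of the same page: the layout differs, the id does not.
desktop = '<div class="wide"><button id="submit-order">Buy</button></div>'
mobile = '<ul><li><button id="submit-order" class="small">Buy</button></li></ul>'

print(find_by_id(desktop, "submit-order"))
print(find_by_id(mobile, "submit-order"))
```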

For the purpose of Functional Test Automation, which focuses on testing the functionality of applications rather than their visual appearance, Object-based Test Automation tools are by far the preferred choice. These tools are tightly coupled with the UI framework of the AUT and natively implement the 3 basic automation mechanisms above. Provided that the UI elements of the AUT are properly identifiable, it is relatively easy to develop robust, cross-platform tests using these tools.

However, there are situations in which Object-based tools cannot be used effectively:

  1. The AUT is based on a new UI technology that is not yet supported by an object-based tool.
  2. The UI elements of the AUT are not easily identifiable or not accessible. Improving the testability of the AUT is sometimes not possible due to lack of development resources or lack of cooperation between testers and developers. Testability improvements are especially difficult or even impossible when testing legacy systems or when using 3rd party GUI components.
  3. UI element identifiers frequently change. This situation can occur when developers carelessly modify the application’s UI or when using development tools that automatically generate the UI code in a way that does not preserve the identities of UI elements.
  4. An entire UI region of the AUT that consists of multiple functional sub-regions is represented by a single UI element (e.g., a UI element representing a city map). The functional sub-parts (e.g., points of interest on the map) cannot be identified by an Object-based tool.

In such situations, Image-based testing tools come to the rescue. Unlike Object-based tools, Image-based tools are completely independent of the underlying UI framework.

Image-based Testing tools implement the 3 basic automation mechanisms using image processing techniques as follows:

  1. UI element locators: a tester identifies UI elements by images manually cropped from screenshots of the AUT screens. During test execution, the automation tool takes screenshots of the AUT and searches for the specified image. If found, the coordinates of the image can be used as targets for subsequent operations.
  2. HID event generation: generated events are targeted at the coordinates of detected images or relative to them.
  3. UI data extraction: image-based tools are limited to extracting images and their locations. Further image processing techniques such as OCR are required to obtain the data represented in these images.
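The locator and event-targeting steps above can be sketched in a few lines of plain Python. This is an illustration under simplifying assumptions, not how any particular tool is implemented: screenshots are modelled as small grids of grayscale values, matching is exact, and the names find_template and click_target are hypothetical.

```python
def find_template(screen, template):
    """Return (row, col) of the first exact match of template in screen, or None."""
    screen_h, screen_w = len(screen), len(screen[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    for r in range(screen_h - tmpl_h + 1):
        for c in range(screen_w - tmpl_w + 1):
            # Compare the template row by row against the screen region at (r, c).
            if all(screen[r + i][c:c + tmpl_w] == template[i] for i in range(tmpl_h)):
                return (r, c)
    return None


def click_target(match, template):
    """Centre of the matched region, a typical target for a simulated click."""
    r, c = match
    return (r + len(template) // 2, c + len(template[0]) // 2)


# A toy 5x6 "screenshot" containing one 3x3 "button".
screen = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 9, 9, 9, 0],
    [0, 0, 9, 5, 9, 0],
    [0, 0, 9, 9, 9, 0],
    [0, 0, 0, 0, 0, 0],
]
button = [
    [9, 9, 9],
    [9, 5, 9],
    [9, 9, 9],
]

match = find_template(screen, button)
if match is not None:
    print("found at", match, "click at", click_target(match, button))
```

Because the match is exact, changing a single pixel of the button on screen makes the lookup return None, which is the sensitivity discussed next.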

Relying on UI element images for identification is the main strength of Image-based test automation tools but also their main weakness. Unlike Object-based tools, image-based tools are extremely sensitive to the visual environment in which the AUT executes. As a result, writing robust, cross-platform tests using these tools is much more challenging than with object-based tools.

Here are a few examples of common challenges:

  1. Different web-browsers and operating systems render texts and images differently, resulting in different UI element images in each execution environment.
  2. The application’s UI layout usually adapts to the size of the application’s window, causing the size or location of UI elements such as buttons, text boxes and images to change. This is especially problematic for full screen applications because their size is dynamically determined by the screen resolution of the hosting device.
  3. Identical UI controls may appear multiple times in a single application screen (e.g., multiple text boxes in a form). Further tuning is required to identify a specific image occurrence.
  4. When the AUT evolves and its UI changes, the images of all affected UI elements must be carefully recaptured and adjusted manually by the tester.

To address these challenges, several best practices and tool features have been developed over the years. UI element images must be as small as possible to avoid distortion due to layout changes, but large enough to allow the UI element to be uniquely identified on the screen. Basic image processing techniques (such as thresholding, edge detection, similarity matching and downsizing) can be applied to allow similar (but not identical) images to match in order to overcome rendering differences. Multiple images may also be specified to identify a single UI element in various execution environments. Nevertheless, due to the inherent complexity of this approach, cross-platform image-based tests are generally considered much harder to develop and maintain than object-based tests.
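As a rough illustration of two of these techniques, the sketch below thresholds grayscale images before comparing them and accepts a match against any of several reference images. The pixel values, the cutoff of 128 and the function names are assumptions made for the example and are not taken from any specific tool.

```python
def binarize(image, cutoff=128):
    """Thresholding: map each grayscale pixel (0-255) to 0 or 1."""
    return [[1 if px >= cutoff else 0 for px in row] for row in image]


def matches_any(candidate, references, cutoff=128):
    """True if the thresholded candidate equals any thresholded reference image."""
    target = binarize(candidate, cutoff)
    return any(target == binarize(ref, cutoff) for ref in references)


# Hypothetical reference crops of one icon, captured in two execution environments.
reference_a = [[255, 40, 255], [40, 40, 40], [255, 40, 255]]
reference_b = [[40, 255], [255, 40]]

# The same icon as rendered at test time, with softer anti-aliasing than reference_a.
observed = [[250, 60, 250], [70, 30, 70], [250, 60, 250]]

print(observed == reference_a)                            # raw pixels differ
print(matches_any(observed, [reference_a, reference_b]))  # thresholded images agree
```

The trade-off is visible in the code itself: every pixel value at or above the cutoff is treated as identical, so shade and color information is discarded in exchange for robustness.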

Figure 1: An example web-application screen with multiple visual UI elements to verify

Since Image-based tools were designed and refined for Functional Test Automation purposes, their limitations become even more severe when applied to Visual Testing:

  • Full screen validation: A visual test must verify that the appearance and layout of complete application screens are correct. As described above, using full screen image identifiers results in fragile image-based tests and requires massive maintenance overhead, as images must be recaptured whenever the application changes. Identifying multiple small elements and programmatically validating their respective positions is also impractical, as there are tens or even hundreds of UI elements in any normal application screen (see Figure 1). Focusing on a subset of elements does not guarantee that other elements of the screen are displayed correctly or are even visible, and thus reduces the coverage and effectiveness of the test.


  • Accuracy: The ultimate goal of visual test automation is to eliminate the need for subsequent manual visual testing, and therefore the test results should be highly accurate. The image processing techniques offered by Image-based testing tools are effective for robust UI element identification but not for validating the appearance of those elements. Thresholding and edge detection completely ignore the color and style of UI elements, while image downsizing and similarity matching can easily result in false matching of images with different content. For example, ’+’ is only a few pixels different from ’-’.


  • Dynamic content: Most applications consist of dynamic content such as ads, dates, currency rates, user information, etc. Image-based tools do not provide the means to ignore dynamic regions while validating their position, size or style.


  • Failure analysis: When a visual test fails, the differences between the expected screen and the actual screen must be investigated in order to understand whether the failure is due to a bug or due to a valid change in the application’s UI. In many cases, the differences can be very difficult and time-consuming for a human to detect without supporting tools. Even massive differences can be due to a single UI element that has moved or changed its size – identifying such rogue elements can be a frustrating and tedious process without proper tool support. Image-based tools do not provide any support for image difference root-cause analysis.


  • Automatic maintenance: Application UI changes usually affect the layout and appearance of multiple application screens. Image-based tools require the tester to manually update the identifying image of each and every affected UI element, and to take into account the different execution environments in which tests are expected to run. The lack of automated maintenance capabilities in image-based tools (e.g., automatically collecting candidate images, automatic approval of new expected images across different execution environments, and automatic propagation of match settings) makes them impractical for automating Visual Testing.
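The accuracy, dynamic content and failure analysis points above can be illustrated with a small sketch. It is not a description of any existing tool: the 5x5 glyphs, the 0.8 threshold mentioned in the comments, and the helpers diff_pixels, similarity and bounding_box are hypothetical and only show the kind of computation involved.

```python
PLUS = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
MINUS = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]


def diff_pixels(expected, actual, ignore=()):
    """List (row, col) of differing pixels, skipping any ignored rectangles.

    Each ignore rectangle is (top, left, bottom, right), bottom/right exclusive.
    """
    def ignored(r, c):
        return any(top <= r < bottom and left <= c < right
                   for top, left, bottom, right in ignore)

    return [(r, c)
            for r, (exp_row, act_row) in enumerate(zip(expected, actual))
            for c, (e, a) in enumerate(zip(exp_row, act_row))
            if e != a and not ignored(r, c)]


def similarity(expected, actual):
    """Fraction of pixels that are identical in two equally sized images."""
    total = len(expected) * len(expected[0])
    return 1 - len(diff_pixels(expected, actual)) / total


def bounding_box(pixels):
    """Smallest (top, left, bottom, right) rectangle containing all given pixels."""
    if not pixels:
        return None
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return (min(rows), min(cols), max(rows) + 1, max(cols) + 1)


# Accuracy: only 4 of 25 pixels differ, so a 0.8 similarity threshold accepts '-' as '+'.
print(similarity(PLUS, MINUS))

# Dynamic content: treating the top two rows as a dynamic region hides differences there.
print(diff_pixels(PLUS, MINUS, ignore=[(0, 0, 2, 5)]))

# Failure analysis: a bounding box points the reviewer at the region that changed.
print(bounding_box(diff_pixels(PLUS, MINUS)))
```

On real screenshots the same ideas apply at a much larger scale, which is why tool support for ignore regions and difference localization matters.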

In Conclusion…

Image-based Test Automation tools are excellent for automating Functional UI Testing, especially in situations where Object-based Test Automation tools cannot be used. Several open-source and commercial Image-based tools have reached maturity and perform well in practice. Image-based tools can also be used for validating the visual correctness of key UI elements of software applications. However, because of the inherent limitations described above, Image-based Test Automation tools are inadequate for comprehensive Automated Visual Testing. The limitations are especially apparent for Web and mobile applications that run on multiple browsers, screen resolutions and devices.

To read more about Applitools’ visual UI testing and Application Visual Management (AVM) solutions, check out the resources section on the Applitools website. To get started with Applitools, request a demo or sign up for a free Applitools account.
