extractTextRegions method

Class: Eyes
Platform: Selenium 3
Language: Java

Search a region for instances of text that match a pattern using OCR and return their location, dimensions and the text found.

An example use of this method is to find the location of a particular instance of text, such as on a button, so as to click on it as part of a test.

For more information, see Eyes OCR support.

This feature is experimental. Please note that the functionality and/or API may change.

Syntax


Map<String, List<TextRegion>> value = eyes.extractTextRegions(textRegionSettings);

Parameters

textRegionSettings
Type: TextRegionSettings
A list of one or more string literals or patterns to be searched for using OCR. For details of the possible patterns, see Defining Patterns and Hints below.

Return value

Type: Map<String, List<TextRegion>>
The returned object is a map of key/value pairs. Each key is one of the patterns passed in the textRegionSettings parameter, and each value is a list of the lines that matched that pattern or literal. Note that this behavior differs from that of the extractText method, which returns multiple lines as a single string with a separator indicating where each line break was detected. Each matched line is represented as an object that consists of the following properties:
  • The text that was found.
  • The top left corner of the text's bounding rectangle in the captured image.
  • The dimensions of the text's bounding rectangle.
Only patterns that matched are returned. If no text is found, then an empty object is returned (a map without any keys).
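
As a rough illustration of how the returned position and dimensions might be used for the click scenario mentioned in the introduction, the sketch below computes the center of a bounding rectangle. Box and ClickPoint are hypothetical stand-ins written for this example and are not SDK classes; with the real SDK the four values would presumably come from the TextRegion getters getX(), getY(), getWidth(), and getHeight(). The numbers are made up.

```java
// Hypothetical stand-in for one matched region; NOT part of the Eyes SDK.
final class Box {
    final int x, y, width, height;

    Box(int x, int y, int width, int height) {
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }
}

final class ClickPoint {
    // Returns {centerX, centerY} of the bounding rectangle, in the same
    // coordinate space as the input (top-left corner plus half the size).
    static int[] centerOf(Box box) {
        return new int[] { box.x + box.width / 2, box.y + box.height / 2 };
    }

    public static void main(String[] args) {
        Box button = new Box(40, 100, 60, 20); // made-up values
        int[] center = centerOf(button);
        System.out.printf("click at (%d, %d)%n", center[0], center[1]);
    }
}
```

The coordinates are relative to the captured image, so any click logic built on them has to assume that the page has not scrolled or changed since the capture.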

Remarks

In the current implementation the search region is implicitly the current browser viewport (what you see when you load a URL in the browser without scrolling).

Defining Patterns and Hints

An OCR pattern/hint may be composed of any of the following characters:

.        Matches any character.
\d       Matches any digit 0-9.
\l       (Lowercase L) Matches any letter a-z or A-Z.
\w       Matches any word character a-z, A-Z, or _.
\S       Matches any non-space character.
+        Repeats the previous literal character or character class one or more times. For example, "\d+" matches a sequence of one or more digits, and "\w+" matches a word that contains only letters or underscores. This pattern cannot cross a line break.
\        Escapes a character that has a special meaning. Specifically, use this to specify the literals "\", ".", and "+" by using "\\", "\.", and "\+".
space    The OCR is tolerant of spaces between characters, so you don't have to add them to the pattern. Where a space is detected in the image, it is translated into a single space. If you add an explicit space in the pattern, then it matches any number of spaces.
Any other character represents itself.

Depending on the programming language you use, the back-slashed character classes may need to be specially encoded in the string, for example, by using a double back-slash such as "\\w".
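
For Java specifically, the double back-slash rule can be checked with plain string handling. The short sketch below (the class name PatternEscaping is made up for this example) shows that the Java literal "\\d+" holds the three characters \d+, and that an escaped back-slash in a pattern needs four back-slashes in Java source.

```java
final class PatternEscaping {
    // Java source "\\d+" is the 3-character string: \ d +
    static final String DIGITS = "\\d+";

    // The pattern \\ (an escaped literal back-slash) is 2 characters,
    // so the Java source needs four back-slashes.
    static final String LITERAL_BACKSLASH = "\\\\";

    // The pattern \. (a literal dot) is written "\\." in Java source.
    static final String LITERAL_DOT = "\\.";

    public static void main(String[] args) {
        System.out.println(DIGITS);            // prints \d+
        System.out.println(LITERAL_BACKSLASH); // prints \\
        System.out.println(LITERAL_DOT);       // prints \.
    }
}
```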

Example patterns

  • "\w+": Match a word

  • "\d+": Match a number

  • "\S+": Match any run of non-space characters, such as mixed alphabetic and numeric data

  • "\d+/\d+/\d+": Match a date, such as 01/04/1972

  • "$\d+\.\d+": Match an amount of money, such as $150.00
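
To get a feel for what such patterns accept, it can help to approximate them with java.util.regex. The sketch below is only a mental model built from the character table above. It is not how Eyes OCR is implemented, it does not model the OCR's tolerance of spaces between characters, and it assumes that an explicit space means "one or more spaces". The class OcrPatternSketch and its method toRegex are hypothetical names made up for this example.

```java
import java.util.regex.Pattern;

final class OcrPatternSketch {
    // Translates an OCR pattern into a roughly equivalent Java regex.
    // Approximation only; see the assumptions stated in the text.
    static Pattern toRegex(String ocrPattern) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < ocrPattern.length(); i++) {
            char c = ocrPattern.charAt(i);
            if (c == '\\' && i + 1 < ocrPattern.length()) {
                char next = ocrPattern.charAt(++i);
                switch (next) {
                    case 'd': regex.append("[0-9]"); break;     // any digit
                    case 'l': regex.append("[a-zA-Z]"); break;  // any letter
                    case 'w': regex.append("[a-zA-Z_]"); break; // word character
                    case 'S': regex.append("\\S"); break;       // non-space
                    default:  // escaped literal such as \. or \+ or \\
                        regex.append(Pattern.quote(String.valueOf(next)));
                }
            } else if (c == '.') {
                regex.append('.');   // any character except a line break
            } else if (c == '+') {
                regex.append('+');   // one or more of the previous item
            } else if (c == ' ') {
                regex.append(" +");  // assumption: one or more spaces
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // itself
            }
        }
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        // Java source doubles each back-slash of the OCR pattern.
        Pattern date = toRegex("\\d+/\\d+/\\d+");
        System.out.println(date.matcher("01/04/1972").matches()); // prints true
        System.out.println(date.matcher("01-04-1972").matches()); // prints false
    }
}
```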

Example

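// Requires java.util.List and java.util.Map, plus TextRegion and
// TextRegionSettings from the Eyes SDK.
// Back-slashes in the patterns are doubled because these are Java string literals.
Map<String, List<TextRegion>> resultRegions = eyes.extractTextRegions(
        new TextRegionSettings(
                ".+",
                "applitools",
                "Click here",
                "\\d+-\\d+-\\d+",
                "\\S+: \\d+"));
// Only patterns that matched appear as keys in the result.
for (Map.Entry<String, List<TextRegion>> entry : resultRegions.entrySet()) {
    System.out.printf("for pattern %s found:%n", entry.getKey());
    // Print the location, size, and text of every line matched by this pattern.
    for (TextRegion info : entry.getValue()) {
        System.out.printf("x: %d, y: %d, width: %d, height: %d, text '%s'%n",
                info.getX(), info.getY(), info.getWidth(), info.getHeight(), info.getText());
    }
}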