extractText method

Class: Eyes | Platform: Appium | Language: Java | SDK:

This method uses OCR to find and return the text that appears in one or more regions of an application page.

To specify where to search, pass one or more OcrRegion objects to this method. The OcrRegion constructor takes the region where OCR should be performed, specified as a CSS locator, a DOM element, or a rectangle. You can chain additional methods onto the constructor, fluent-API style, to set further options. In particular, use the OcrRegion.hint method to specify the expected text, either as literal text or as a pattern, to resolve ambiguities that can arise in OCR, such as distinguishing the digit 0 (zero) from the letter O.

For more information, see Eyes OCR support.

This feature is experimental. Please note that the functionality and/or API may change.

Syntax


List<String> value = eyes.extractText(ocrRegions);

Parameters

ocrRegions
Type: BaseOcrRegion ... or OcrRegion ...
Pass one or more objects that define where Eyes should search for text and, optionally, other attributes such as the expected text.

Return value

Type: List<String>
The method returns a list of strings, one for each region defined by the parameter(s). If no text is found in a region, the corresponding element is an empty string. If the OCR finds multiple lines of text in a region (text at different vertical offsets), it returns them in a single string, in left-to-right, top-to-bottom order, with the lines separated by newline (\n) characters.
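If you need the individual lines of a region's text, you can split each element of the returned list on the newline character. The following plain-Java sketch shows one way to do that. The class OcrResults and its methods are hypothetical helpers, not part of the Eyes SDK, and the sample values in main are made up for illustration rather than real OCR output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers (not part of the Eyes SDK) for post-processing the
// List<String> returned by extractText.
final class OcrResults {

    private OcrResults() {}

    // Splits one region's result into its text lines.
    // An empty string (no text found in the region) yields an empty list.
    static List<String> linesOf(String regionText) {
        List<String> lines = new ArrayList<>();
        if (regionText == null || regionText.isEmpty()) {
            return lines;
        }
        for (String line : regionText.split("\n")) {
            lines.add(line);
        }
        return lines;
    }

    // Returns true if at least one region produced some text.
    static boolean anyTextFound(List<String> results) {
        for (String text : results) {
            if (text != null && !text.isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up result for two regions: the first contained two lines of
        // text, the second contained no text.
        List<String> results = List.of("Total\n$150.00", "");
        for (int i = 0; i < results.size(); i++) {
            // prints "Region 0: [Total, $150.00]" then "Region 1: []"
            System.out.println("Region " + i + ": " + linesOf(results.get(i)));
        }
        System.out.println("Any text found: " + anyTextFound(results)); // prints "Any text found: true"
    }
}
```

Treating an empty string as "no lines" keeps the per-region handling uniform, since the method signals a region with no text by an empty element rather than by omitting it.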

Remarks

The search area can be anywhere in the application window, not only within the viewport, but it cannot be inside a sub-frame.

Defining Patterns and Hints

An OCR pattern/hint may be composed of any of the following characters:

  • . (period): Matches any character.

  • \d: Matches any digit 0-9.

  • \l (lowercase L): Matches any letter a-z or A-Z.

  • \w: Matches any word character: a-z, A-Z, or _.

  • \S: Matches any non-space character.

  • +: Repeats the previous literal character or character class one or more times. For example, "\d+" matches a number made up of one or more digits, and "\w+" matches a word made up only of letters and underscores. A repetition cannot cross a line break.

  • \: Escapes a character that has a special meaning. Use it to specify the literal characters "\", ".", and "+" as "\\", "\.", and "\+" respectively.

  • Space: The OCR is tolerant of spaces between characters, so you don’t have to include them in the pattern. A space detected in the image is translated into a single space. An explicit space in the pattern matches any number of spaces.

Any other character represents itself.

Depending on the programming language you use, the backslashes in these character classes may need to be escaped in string literals. In Java, every backslash in a string literal must be doubled, so the pattern \w is written as "\\w".
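To experiment with a hint locally before using it, it can help to see how the syntax above maps onto Java regular expressions. The following sketch is a hypothetical helper named HintPatternSketch; it is not part of the Eyes SDK and does not reflect how Eyes actually applies hints during OCR. It translates a hint pattern into a java.util.regex pattern following the rules listed above. Note that \w is translated per the definition above (letters and underscore only), which differs from Java's own \w. The sketch does not model the OCR's tolerance of spaces that the pattern leaves out.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Eyes SDK: translates an OCR hint
// pattern into an approximately equivalent java.util.regex pattern.
final class HintPatternSketch {

    private HintPatternSketch() {}

    static String toRegex(String hint) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < hint.length(); i++) {
            char c = hint.charAt(i);
            if (c == '\\') {
                if (i + 1 >= hint.length()) {
                    throw new IllegalArgumentException("Trailing backslash in hint: " + hint);
                }
                char next = hint.charAt(++i);
                switch (next) {
                    case 'd': regex.append("[0-9]"); break;      // any digit
                    case 'l': regex.append("[a-zA-Z]"); break;   // any letter
                    case 'w': regex.append("[a-zA-Z_]"); break;  // letter or underscore, per the list above
                    case 'S': regex.append("\\S"); break;        // any non-space character
                    default:  appendLiteral(regex, next);        // escaped special character, e.g. \. or \+
                }
            } else if (c == '.') {
                regex.append('.');    // any character; Java's '.' excludes line terminators by default
            } else if (c == '+') {
                regex.append('+');    // one or more of the previous element; never spans a line break
            } else if (c == ' ') {
                regex.append(" *");   // an explicit space matches any number of spaces
            } else {
                appendLiteral(regex, c);  // any other character represents itself
            }
        }
        return regex.toString();
    }

    // True if the whole text matches the translated hint.
    static boolean matches(String hint, String text) {
        return Pattern.matches(toRegex(hint), text);
    }

    private static void appendLiteral(StringBuilder regex, char c) {
        if (Character.isLetterOrDigit(c)) {
            regex.append(c);               // unescaped letters and digits are never regex metacharacters
        } else {
            regex.append('\\').append(c);  // a backslash before a non-alphanumeric character is always a literal
        }
    }
}
```

For example, HintPatternSketch.matches("\\d+/\\d+/\\d+", "01/04/1972") returns true, while HintPatternSketch.matches("\\w+", "abc123") returns false because, under the definition above, \w does not include digits.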

Example patterns

  • "\w+": Match a word

  • "\d+": Match a number

  • "\S+": Match a run of non-space characters, such as mixed alphabetic and numeric data

  • "\d+/\d+/\d+": Match a date, such as 01/04/1972

  • "$\d+.\d+": Match an amount of money, such as $150.00
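In Java source code, the example patterns above must be written with doubled backslashes. The following sketch collects them as string constants; the class name ExampleHints and the constant names are hypothetical. At run time each constant holds the pattern with single backslashes, which is the text a call such as OcrRegion.hint would receive.

```java
import java.util.List;

// Hypothetical constants holding the example hint patterns.
// Each backslash in a pattern is doubled inside the Java string literal.
final class ExampleHints {

    private ExampleHints() {}

    static final String WORD   = "\\w+";            // pattern text: \w+
    static final String NUMBER = "\\d+";            // pattern text: \d+
    static final String MIXED  = "\\S+";            // pattern text: \S+
    static final String DATE   = "\\d+/\\d+/\\d+";  // pattern text: \d+/\d+/\d+
    static final String MONEY  = "$\\d+.\\d+";      // pattern text: $\d+.\d+

    static final List<String> ALL = List.of(WORD, NUMBER, MIXED, DATE, MONEY);

    public static void main(String[] args) {
        // Prints each pattern with single backslashes, one per line.
        for (String hint : ALL) {
            System.out.println(hint);
        }
    }
}
```

In the money pattern, the unescaped period matches any character. To require a literal decimal point, escape it as described above, which in Java is written "$\\d+\\.\\d+".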

Example

Example not yet available.