Automated Testing for SEO Data

Advanced Topics — Published October 21, 2021

HTML metadata is critical for websites across the web. Plenty of articles explain the importance of metadata, more commonly known as SEO data, yet little or no emphasis is placed on automated testing of that data. Yes, you will find tools that help validate the structure of some metadata, such as Google’s Structured Data Testing Tool, but what about regression testing of the actual metadata? In this article, I will discuss automated SEO testing and demonstrate how you can use Cypress to validate SEO data like the title, description, and JSON-LD structured data within a web page.

In 2021, Google accounted for over 70% of all desktop search traffic, and SEO is what helps you get a better ranking on Google.

But what does Google have to do with SEO?

What is SEO?

It is through SEO (Search Engine Optimization) that search engines like Google can make sense of our web pages and, in turn, let end users discover our website or business. It’s a structured way to tell search engines and other websites about your website.

Search engines use bots to crawl pages on the web, automatically finding sites, gathering information about those pages, and storing it in a database. This database is indexed so that information about these pages can be found quickly. When you search on your favorite search engine, it pulls from this index to return the web pages most relevant to what you are looking for.

By optimizing their sites with SEO data, individuals and organizations make it much more likely that their content will be found through search engines and will attract organic traffic.

Factors that Contribute to SEO

There are many factors, but we will focus on the page title, description, and JSON-LD structured data. At a minimum, you want every page to have a title and a corresponding description. If you want to go further and have your content displayed more richly in Google’s results, JSON-LD structured data is the way to go.

Title Tag

It was confirmed that Google uses HTML title tags 87% of the time when serving search results. I would take this as a good indication that we should be paying more attention to our web page titles.

The <title> tag defines the title of your HTML document or web page. It sits within the <head> tag of the HTML document and is shown in the browser’s title bar or tab. It is also the title typically shown in search results. Note that a valid HTML document cannot contain more than one title tag.

Browser tab with the title displayed

Description meta-tag

The <meta> tag is used to specify metadata about a web page, such as its description and keywords. Like the title, this tag belongs within the <head> tag of the HTML document. Metadata like the description is used by search engines and other applications to display information about the web page, for example in link previews.

Slack message with webpage description displayed

Script Tag with JSON-LD Structured Data

The <script> tag is normally used to include JavaScript on your web page, but in this case it embeds JSON-LD structured data in your HTML. Google uses this data to learn more about your page, and it also enables special search result features and enhancements.

Special Google search results based on structured data

Why and When Would You Want to Automate SEO Testing?

Companies are releasing software much faster today, so having a way to easily test and validate the SEO data on your web pages can be a real confidence boost. Because SEO data lives in the page document (the HTML code) rather than in the visible interface, it’s not something we actively think about.

As they say, “Out of sight, out of mind!” This makes SEO a great candidate for automated testing, since reading page source by hand is tedious and it is easy to miss small details.

Page source of a web page

To keep your ranking on Google or any other search engine, you have to ensure that your SEO data is always present on your web pages. If it’s missing, your site may not be indexed properly, or at all, depending on what is missing.

If you are migrating your site to a different tech stack, say from WordPress to React, you want to ensure that all of the HTML metadata that contributed to your site’s SEO is carried over correctly to the new React pages. Likewise, if a company is constantly updating the frontend of its website, it is a good idea to have tests in place that confirm the metadata is still present on the page before each release.

These scenarios highlight that with every release or migration, there is a real chance that your SEO data gets removed or changed, and we need to ensure that it stays intact over time.

Automating SEO Verification with Cypress (Demonstration)

Most automation frameworks give us an easy way to get the title of a web page, and getting meta tags and structured data is just as simple once you know what you are looking for.

Before we get started with the exciting part of this article (the code), take a second to observe the different tags of the index.html file below. We will be using this as our webpage for verifying the different SEO factors we have mentioned so far.

Page Title Verification

Using the cy.title command in Cypress, we can get the title of the page we are currently on and verify that it matches the title we expect, for example with cy.title().should("eq", "Expected page title").

Title tag with text of a web page

As you can see, verifying the title is straightforward: the Cypress title command grabs the title from the HTML document, so we do not have to specify any selector.

On the other hand, we don’t have that luxury with meta tags and script tags. However, since we can select any element in the HTML document, we can use the cy.get command to select those tags and extract the text and JSON data that we need.

Page Description Verification

When selecting the description of the page, we target the meta tag whose name attribute equals description. The head of a page usually contains several meta tags, so we have to be explicit about the one we want. After getting the meta tag named description, we confirm that its content attribute matches the text we expect, for example with cy.get('meta[name="description"]').should("have.attr", "content", "Expected description"). Pretty simple, right?

Meta tag with description of a web page
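Sometimes you may want to read every meta tag once and then check several of them in plain JavaScript. The same “be explicit about which tag you want” idea can be written as a small helper. This is a sketch under two assumptions. First, getMetaContent is a hypothetical name. Second, the tags have already been collected as { name, content } objects, for example inside cy.get("head meta").then(($metas) => ...) by mapping $metas.toArray() with el.getAttribute("name") and el.getAttribute("content").

```javascript
// Hypothetical helper: return the content of the one meta tag with the given name.
// metaTags is assumed to be an array of { name, content } objects read from <head>.
function getMetaContent(metaTags, name) {
  // <head> usually holds several meta tags, so filter explicitly by name
  const matches = metaTags.filter((tag) => tag.name === name);
  if (matches.length === 0) {
    throw new Error(`No <meta name="${name}"> tag found`);
  }
  // Duplicate tags with the same name are ambiguous, so treat them as a failure too
  if (matches.length > 1) {
    throw new Error(`Expected one <meta name="${name}"> tag, found ${matches.length}`);
  }
  return matches[0].content;
}

// Example input, shaped like tags collected from a page's <head>
const tags = [
  { name: null, content: null }, // e.g. <meta charset="utf-8"> has no name attribute
  { name: "viewport", content: "width=device-width, initial-scale=1" },
  { name: "description", content: "Learn how to test SEO data with Cypress." },
];
console.log(getMetaContent(tags, "description")); // Learn how to test SEO data with Cypress.
```

Failing on duplicates is a deliberate choice in this sketch. A second description tag is an easy mistake to introduce during a migration, and a single attribute assertion on the first match would not necessarily catch it.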

Page Structured Data Verification (JSON-LD)

Querying a script tag might seem odd at first, because most automation engineers are used to selecting the common elements that users interact with, such as buttons and inputs.

The good thing is that a script tag is just that: a “tag.” We can query it like any other tag on the page; the trick is the data we get back. In this case, we target the script tag with type application/ld+json. Its content is a JSON object, but when we read it from the element we get it back as a string.

While we could validate the string directly, it is better to parse it into a JavaScript object, which is much easier to work with.

Script tag with news article JSON LD structured data

Let’s break down what is happening:

First, we query the script tag using the cy.get command, which yields a jQuery object. We can then call jQuery’s .text() method on that object to get the text of the script tag.

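    // Query the script tag with type application/ld+json
    cy.get("script[type='application/ld+json']").then((scriptTag) => {
      // scriptTag is a jQuery object wrapping the matched <script> element;
      // parsing and assertions go here, inside the callback
    });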

Using the built-in JavaScript JSON.parse function, we can then transform that string into the actual object defined within the script tag.

    // Inside the .then() callback: parse the JSON-LD text into an object so it is easy to test
    const jsonLD = JSON.parse(scriptTag.text());
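A single JSON.parse(scriptTag.text()) call assumes the page has exactly one JSON-LD script. If there are several, cy.get matches all of them and jQuery’s .text() returns their combined text, which is not valid JSON. Below is a minimal sketch of a more defensive approach. It assumes you pass in the text of each script separately, for example $scripts.toArray().map((el) => el.textContent) inside the .then() callback. The helpers parseJsonLdBlocks and findByType are hypothetical names.

```javascript
// Hypothetical helper: parse each JSON-LD script body on its own, so several
// application/ld+json scripts on one page do not break JSON.parse.
function parseJsonLdBlocks(scriptTexts) {
  const nodes = [];
  scriptTexts.forEach((text, index) => {
    let data;
    try {
      data = JSON.parse(text);
    } catch (err) {
      throw new Error(`JSON-LD block ${index} is not valid JSON: ${err.message}`);
    }
    // A block may hold one object, an array of objects, or an object with @graph
    const items = Array.isArray(data) ? data : [data];
    for (const item of items) {
      if (item && Array.isArray(item["@graph"])) {
        nodes.push(...item["@graph"]);
      } else {
        nodes.push(item);
      }
    }
  });
  return nodes;
}

// Hypothetical helper: first node whose @type matches (a string or an array of types)
function findByType(nodes, type) {
  return nodes.find((node) => {
    const t = node && node["@type"];
    return Array.isArray(t) ? t.includes(type) : t === type;
  });
}

const texts = [
  '{"@context":"https://schema.org","@type":"Organization","name":"Example Co"}',
  '{"@context":"https://schema.org","@type":"NewsArticle","headline":"Example headline"}',
];
const article = findByType(parseJsonLdBlocks(texts), "NewsArticle");
console.log(article.headline); // Example headline
```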

Now we have a JSON-LD object that we can use for verification, which can be done in different ways depending on how much data we need to verify. When working with objects, we ideally want to verify both the structure of the data and the values it contains.

The simplest way to start verifying that your structured data is present and valid is to assert each key and its corresponding value.

    // Once parsed, we can easily test individual data points
    expect(jsonLD["@context"]).to.equal("https://schema.org");
    expect(jsonLD.author).to.have.length(2);
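One expect per key stops at the first failure. If you would rather see every mismatched field at once, one option is to compare a whole map of expected values. This is a sketch only: findMismatches is a hypothetical helper, and the sample object below is illustrative and not taken from the article’s page.

```javascript
// Hypothetical helper: compare expected top-level key/value pairs with the parsed
// JSON-LD object and collect every mismatch instead of stopping at the first one.
function findMismatches(jsonLD, expected) {
  const mismatches = [];
  for (const [key, value] of Object.entries(expected)) {
    if (!(key in jsonLD)) {
      mismatches.push(`missing key "${key}"`);
    } else if (jsonLD[key] !== value) {
      // Strict equality: intended for primitive values such as strings and numbers
      mismatches.push(
        `"${key}": expected ${JSON.stringify(value)}, got ${JSON.stringify(jsonLD[key])}`
      );
    }
  }
  return mismatches;
}

const sample = {
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  headline: "Example headline",
};
const problems = findMismatches(sample, {
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  datePublished: "2021-01-01",
});
console.log(problems);
```

Inside the .then() callback you could then write expect(findMismatches(jsonLD, expectedFields)).to.be.empty, where expectedFields is your own object of expected values. The failure message would then list every mismatched key. Nested values such as the author array need a structural check rather than strict equality.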

There are cases where the data is dynamic across pages, and rather than hardcoding values we can reference data from other parts of the page, such as the title. In our structured data example, there is a headline key, and the headline should match the title of the page.

    // Cross-reference SEO data between the page title and the headline
    // in the JSON-LD data, which is great for dynamic data
    cy.title().then((currentPageTitle) => {
      expect(jsonLD.headline).to.equal(currentPageTitle);
    });

In this case, we get the title of the page and compare it against the value of the headline. This ensures that the structured data carries the correct title for the current page.
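An exact match works when the headline and title are identical. Many sites, however, append a site name to every title. For that case, here is a sketch that strips a known suffix before comparing. The helper headlineMatchesTitle and the " | Example Blog" suffix are hypothetical; use whatever suffix your pages actually carry, if any.

```javascript
// Hypothetical helper: compare the JSON-LD headline with the page title,
// optionally ignoring a known site-name suffix appended to every title.
function headlineMatchesTitle(headline, pageTitle, titleSuffix = "") {
  if (typeof headline !== "string" || typeof pageTitle !== "string") {
    return false;
  }
  let title = pageTitle.trim();
  if (titleSuffix && title.endsWith(titleSuffix)) {
    // Drop the suffix so only the page-specific part is compared
    title = title.slice(0, -titleSuffix.length).trim();
  }
  return headline.trim() === title;
}

console.log(headlineMatchesTitle("Automated Testing for SEO Data", "Automated Testing for SEO Data")); // true
console.log(
  headlineMatchesTitle(
    "Automated Testing for SEO Data",
    "Automated Testing for SEO Data | Example Blog",
    " | Example Blog"
  )
); // true
console.log(headlineMatchesTitle("Another headline", "Automated Testing for SEO Data")); // false
```

In Cypress this could sit inside the same callback: cy.title().then((title) => expect(headlineMatchesTitle(jsonLD.headline, title, " | Example Blog")).to.be.true).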

What if the structured data is too large to verify key by key, or contains nested data like the author key, which is an array of objects? In that case, I would recommend looking into schema validation options, which let you verify both the structure of the JSON and its data.
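For nested data like the author array, the idea behind schema validation can be sketched as a small, dependency-free structural check. A JSON Schema validator such as Ajv is the more complete route. This sketch assumes each author entry is a schema.org Person object with a non-empty name. That may not match your data, since schema.org also allows an Organization as an author. validateAuthors is a hypothetical name.

```javascript
// Minimal hand-rolled structure check for the JSON-LD author array.
// Assumes each entry should be a schema.org Person with a non-empty name.
function validateAuthors(authors) {
  if (!Array.isArray(authors) || authors.length === 0) {
    return ["author must be a non-empty array"];
  }
  const errors = [];
  authors.forEach((author, i) => {
    if (typeof author !== "object" || author === null) {
      errors.push(`author[${i}] must be an object`);
      return;
    }
    if (author["@type"] !== "Person") {
      errors.push(`author[${i}]["@type"] should be "Person"`);
    }
    if (typeof author.name !== "string" || author.name.trim() === "") {
      errors.push(`author[${i}].name must be a non-empty string`);
    }
  });
  return errors;
}

const authors = [
  { "@type": "Person", name: "Jane Doe" },
  { "@type": "Person", name: "" },
];
console.log(validateAuthors(authors));
```

In the Cypress test this could be used as expect(validateAuthors(jsonLD.author)).to.be.empty, which reports every structural problem in a single failure message.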

Understanding Automated SEO Testing

Automating SEO testing is not the hardest thing to do, but it’s often the last thing we think about because of the nature of SEO and where its data lives.

Depending on the requirements of your organization, you might want to test beyond the title, description, and structured data, but that should no longer be a daunting request now that you understand how to automate it.

If you want to execute the code for yourself, you can use this repo. Have fun automating!
