Abel Wang, Principal Cloud Advocate for Microsoft Azure DevOps, spoke in an Applitools webinar recently. He called his webinar: DevOps and Quality in the Era of CI-CD: An Inside Look at How Microsoft Does It. Because he knows the workings of the team, Abel knows what it takes to operate Azure DevOps. In this webinar, Abel described why the build-test-release model was too slow, and he explained which changes in their development team enabled them to deliver software with both speed and quality. He explained that the combination of Azure DevOps plus visual testing results in better developer productivity and higher quality, even with frequent releases on a rapid cadence. And, Abel described how he came across Applitools as a way to provide visual testing for the Microsoft Azure DevOps development team, and why he likes Applitools so much.
I get sweaty palms thinking about some things that just freak me out. For example, my hands sweated profusely when I watched Free Solo, about climber Alex Honnold, who climbed El Capitan in Yosemite National Park without the benefit of a rope. I also get sweaty palms considering a change from a build-test-release model to a continuous integration – continuous deployment (CI-CD) model. So I appreciate hearing from someone who has done something successfully and is willing to share lessons they learned along the way.
Requirements Conflict With Conventional Wisdom
Conventional wisdom says that, for any project, you can set targets for scope, quality, and resources such as time. You can hold two of those targets fixed, but the third must give way. When applied to build-test-release, this conventional wisdom means you’re going to miss either your scope, your date, or your quality target.
Companies that consider a shift to CI-CD are doing so hoping to break this conventional wisdom. Abel described the following as key issues that the Azure DevOps team at Microsoft needed to address:
- Rapid Change – Customers demand new capabilities, and Azure DevOps wants to respond. The platform changes every three weeks, with bug fixes released in the interim.
- Quality – Customers expect existing features to continue to behave properly, and they expect new features not to impact existing behavior.
- Customer Choice – In today’s Microsoft, the company cannot dictate which tools customers can or should use. Customers want to choose their own tools and interface with other tools using standards.
- Reliability – All with no downtime
So, any CI-CD solution requires flexibility, quality, speed, and reliability. That clearly violates conventional wisdom. Microsoft maps out all the potential capabilities needed, and software tools to use, in a map like this:
Image courtesy Abel Wang, Microsoft
Clearly, Microsoft Azure DevOps expects customers to use any kind of tool for any purpose and have it work in their flow.
Why Waterfall Won’t Work
Next, Abel explained the old way of doing things at Microsoft, and why it just wouldn’t work for Azure DevOps.
Abel related his stories of development at Microsoft – specifically his motivations when he was a developer. His job, he recalled, was to develop code. Once it was done, checked in, and validated, he was done – and hopefully onto another project that was interesting and exciting. Testing was QA’s responsibility. Developers didn’t know much about the quality of any unit tests they wrote.
QA spent the bulk of their time writing and running functional tests. Small products could take months. Large products like a Windows release could take as long as a year. During the testing time, bugs were discovered and fixed, and the bug list burned down. Eventually, the release date would arrive, and the big issue for the product team was whether they thought customers could live with the remaining bugs.
The time scales
Development (Months) > QA (Months) > Release
cannot work for services like Azure DevOps. Microsoft needed a different workflow for developing software that brought speed and quality into the development process.
Microsoft’s Solution: Shift Left
Abel then relayed the five strategic changes that Microsoft employed to deliver speed and quality to their software development. They were:
- Make developers responsible for software quality
- Make developers feel the pain of their bad code
- Employ heavy use of Code Reviews / Pull Requests
- Move away from functional testing toward unit testing
- Feature Flags
This approach, commonly called “Shift Left” among developers, places the responsibility for software quality squarely on the developers. Compared with the way things used to happen at Microsoft, the new approach for CI-CD required a radical shift for developers. And, as Abel said, most people were freaked out by their responsibilities.
Responsible for Quality
Making developers responsible for software quality meant that they had to think beyond their piece. Developers were required to consider how their code would be used in the real world. As a result, they started having to measure code coverage in their unit tests. Most of them had never considered coverage seriously. At first, they were frustrated by the extra work this entailed. Until they experienced the real pain of the code they had written.
Feel The Pain
What pain? Microsoft required each team to assign a single developer to be responsible for the team’s code during off hours, and the assignment rotated from person to person. If a production bug was found, the responsible engineer received a notification and had to join a bridge call – with all the executives for the business on the call. And, while the executives were pushing for outcomes, it was up to the software engineer to fix the problem, push the fix into production, and ensure that the root cause was identified and resolved so the problem never, ever happened again.
Abel related from experience that, when you knew that the top people of the cloud business at Microsoft were on the call and demanding outcomes, that was both stressful and painful. The engineer who resolved that bug never wanted to be on a call like that again. And that pain made its way back to the team.
Use Code Reviews/Pull Requests
Once engineers had first-hand experience of the pain of an executive-level bridge call due to bad code, they all understood the need for quality. This drove the next behavior – code reviews and pull requests. Teams agreed that individuals could not simply push code into a build. Quality depended on the team doing reviews to ensure that the code would work across all the cases to which the team was developing.
Move To Better Unit Tests
The pain from bridge calls also drove another quality behavior in development – better unit tests. While the initial response to being responsible for the quality of their own code was frustration, developers changed their behavior. They started understanding the value of comprehensive test code to validate their work. And they began writing code with more comprehensive coverage. As a result, team code improved, and the unit tests could be run during code check-in.
These better unit tests also dropped the need for large functional test suites. These tests, which could take hours to run and even longer for root cause identification of failures, were needed for fewer and fewer test cases. In most situations, the unit tests validated functional behavior sufficiently so feature tests were unnecessary. Functional testing for behavior accounted for a huge chunk of the QA time in the build-test-release model. The increased coverage of unit testing decreased the amount of functional testing that was required.
Use Feature Flags
The last Shift Left process change involved feature flags. New features could be released to production flagged, so only certain users could see how they were behaving. As a result, new features could be tested in production! If the feature behaved, the flag could be changed and anyone might be able to use this feature. If errors were discovered, the feature could be removed or fixed in the next build without impacting paying customers.
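The flag mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not Microsoft's implementation: the `FeatureFlag` class, the `render_dashboard` function, and the user names are all hypothetical, and a real system would typically load flag state from a configuration service rather than hold it in memory.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlag:
    """Hypothetical flag: on for everyone, or only for an allow-list of users."""
    name: str
    enabled_for_all: bool = False
    allowed_users: set = field(default_factory=set)

    def is_enabled(self, user: str) -> bool:
        return self.enabled_for_all or user in self.allowed_users


def render_dashboard(user: str, flag: FeatureFlag) -> str:
    # Users outside the flag keep the existing behavior unchanged.
    if flag.is_enabled(user):
        return "new dashboard"
    return "classic dashboard"


# The new feature ships to production, visible only to selected users.
flag = FeatureFlag("new-dashboard", allowed_users={"early-adopter"})
early = render_dashboard("early-adopter", flag)
other = render_dashboard("paying-customer", flag)

# Once the feature behaves, flipping the flag opens it to everyone.
flag.enabled_for_all = True
after_rollout = render_dashboard("paying-customer", flag)
```

Because the check happens at run time, turning a misbehaving feature off is a flag change rather than a new deployment.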
Test for Real Eyeballs: Applitools
While the developer process changes with Shift Left resulted in faster delivery of features with quality, one big change was missing: visual errors. A small CSS change could have a big impact on the visual behavior of the application, and neither unit testing nor functional testing alone could spot this kind of error consistently.
In the past, Microsoft needed to run the functional tests in front of a group of manual testers. They could see the behavior of the application on-screen and determine whether or not the application behaved as expected. An army of manual testers had to run the tests and do comparisons to see how individual pages responded on individual browser/operating system combinations.
Abel relates that, in this quest for tools, Microsoft came across Applitools. As Abel said:
“There is a really cool technology from Applitools, it’s not even a Microsoft product, that really helps with this. It uses artificial intelligence to do visual testing. Basically, you write your automated UI test, (I write my test using Selenium). You can use Applitools to take pictures of your screen for you. Then they can do comparisons between what your baseline image should be and what the images are for your latest code check-in, and they can flag those differences for you.”
“Or if you want to keep those differences you can say, ‘You know what, this is correct. The change is correct.’ Well, let’s go ahead and set that as my new baseline moving forward. So you have the power to do that. And now what is so incredibly cool about this is you’re shifting even left. Even your testing is being shifted left – your manual or your automated UI testing has shifted left – where instead of needing human eyeballs all the time you can use Applitools to act as your human eyeballs. And only if there is a difference it will let you know and then you can now decide, ‘Is this a change you want or did somebody mess up my CSS file and now things look wonky?’ So it is super powerful and super useful.”
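To make the baseline workflow concrete, here is a small Python sketch of the compare-then-accept loop Abel describes. It is only a stand-in: Applitools uses AI-based comparison, whereas this sketch uses an exact pixel diff on tiny grayscale grids, and the `BaselineStore` class and the status names `"new"`, `"passed"`, and `"unresolved"` are hypothetical.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # a tiny grayscale "screenshot": rows of pixel values


def diff_pixels(baseline: Image, latest: Image) -> List[Tuple[int, int]]:
    """Return the (row, col) coordinates where the two images differ."""
    same_shape = len(baseline) == len(latest) and all(
        len(b) == len(l) for b, l in zip(baseline, latest)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    return [
        (r, c)
        for r, (brow, lrow) in enumerate(zip(baseline, latest))
        for c, (b, l) in enumerate(zip(brow, lrow))
        if b != l
    ]


class BaselineStore:
    """Hypothetical store that keeps one baseline image per checkpoint tag."""

    def __init__(self) -> None:
        self._baselines: Dict[str, Image] = {}

    def check(self, tag: str, latest: Image) -> str:
        baseline = self._baselines.get(tag)
        if baseline is None:
            # First run: the screenshot becomes the baseline.
            self._baselines[tag] = latest
            return "new"
        return "unresolved" if diff_pixels(baseline, latest) else "passed"

    def accept(self, tag: str, latest: Image) -> None:
        # A reviewer decided the change is correct: promote it to baseline.
        self._baselines[tag] = latest


store = BaselineStore()
home_v1 = [[0, 0], [255, 255]]
home_v2 = [[0, 0], [255, 0]]  # one pixel changed, e.g. by a CSS edit

first = store.check("home", home_v1)
unchanged = store.check("home", home_v1)
changed = store.check("home", home_v2)
store.accept("home", home_v2)
after_accept = store.check("home", home_v2)
```

The key design point is that a difference is not automatically a failure: it waits for a human decision, and accepting it replaces the baseline for future runs.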
Demonstrating Azure DevOps Plus Visual Testing with Applitools
Abel then went into a demo of his own environment.
“So what I want to do is show you just how easy it is to use Applitools and to integrate that into our Azure DevOps pipelines.”
Using Page Objects in Selenium
“I’m using Selenium to write my test. And when I write my automated UI test, I use a page object type of pattern where I create an object for every single page that my application has. In this page object, it’s just a simple object but it has all the actions that you can do on that particular screen. And it also has all the verifications that you can do on that screen. So for instance actions would be like, oh, well, you can click on the exercise link or maybe I can enter in text into this specific text box. Those are all the actions that you can do on that screen.”
“I also include the verification that you can do. So maybe I can enter this text, click on next, and then there should be a text box that pops up, or whatever. I need to verify that that text box shows up. I write tests that can actually show me that as well.
“When you write your test in this page object pattern, maintaining your code becomes incredibly easy. You separate the test code from the page object. If your page changes, you just need to modify your page object.”
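The page object pattern can be sketched as follows. This is an illustration under stated assumptions, not Abel's code: `FakeDriver` is a hypothetical stand-in for a Selenium WebDriver so the example runs without a browser, and the page classes, method names, and URL are invented for the example. In a real test, each method body would locate and drive elements through Selenium.

```python
class FakeDriver:
    """Hypothetical stand-in for a Selenium WebDriver (no browser needed)."""

    def __init__(self) -> None:
        self.url = ""
        self.rows: list = []  # pretend contents of the food table

    def get(self, url: str) -> None:
        self.url = url


class NutritionPage:
    """Page object: all actions and verifications for one screen live here."""

    def __init__(self, driver: FakeDriver) -> None:
        self.driver = driver

    # Actions
    def remove_all(self, name: str) -> "NutritionPage":
        self.driver.rows = [r for r in self.driver.rows if r != name]
        return self  # returning self lets a test chain steps

    def add_food(self, name: str) -> "NutritionPage":
        self.driver.rows.append(name)
        return self

    # Verifications
    def verify_food_in_table(self, name: str) -> "NutritionPage":
        assert name in self.driver.rows, f"{name!r} not found in table"
        return self


class HomePage:
    def __init__(self, driver: FakeDriver) -> None:
        self.driver = driver

    def verify_page_reached(self, expected_url: str) -> "HomePage":
        assert self.driver.url == expected_url
        return self

    def click_nutrition_link(self) -> NutritionPage:
        return NutritionPage(self.driver)


def browse_to_home(driver: FakeDriver, url: str) -> HomePage:
    driver.get(url)
    return HomePage(driver)


# The test reads as a list of steps; page details stay in the page objects.
driver = FakeDriver()
page = (
    browse_to_home(driver, "http://localhost/healthyfood")
    .verify_page_reached("http://localhost/healthyfood")
    .click_nutrition_link()
    .remove_all("donut")
    .add_food("donut")
    .verify_food_in_table("donut")
)
```

If the markup of a page changes, only the methods of its page object need to change; the test at the bottom stays the same.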
Reading Selenium Tests
“The other benefit is that my automated UI tests become incredibly easy to read as well. For instance, here’s my sample Healthy Food web app, and I’m going to go ahead and launch it. I pass in the browser, which is going to be Chrome, and then…
“Let’s go ahead and browse to the home page of the application. I pass in the URL. Then I verify my home page will be reached. I click on the Nutrition link in the app. I remove all the donuts from the fields. Next, I create a new link. Now, I verify the page, then I add a donut in there, and I verify that the donut shows up on the table. So even if you’re not a coder you can look at this and be like: ‘Oh I know exactly what this test is doing.’ It just makes for a very very clean way to write your automated UI test.
“If we go ahead and jump into my launch code you can see this is where I actually do my Selenium. So the first thing that I do is I check the browser, and if I pass in the correct browser, Chrome, I create a Chrome driver. Then I launch the browser, I set the size of the window, and then the next thing that I do is I browse to my homepage. So if I go to browse to my homepage, I pass in the URL and I just say ‘driver.navigate’. I go to the URL and I return my home page and I pass in the driver.”
“Now that I have these tests, I want to add Applitools: the ability to take pictures of my screens as I’m running through my tests, so that I can compare future tests with my base images to make sure nothing has changed or things haven’t turned wonky or weird with my UI from a visual perspective. It’s actually super easy to add Applitools into your tests.”
“If you look at my test now, there’s a couple of things that I need to gather before I even run. Number one is I need to have a specific Applitools key. So I need to have that key if I want to run my Applitools type stuff in my build and release pipeline in Azure DevOps. I also grab a batch name and batch ID that automatically populates my environmental variables as well. Let me add that little chunk of code.
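Gathering those values before a test run might look like the Python sketch below. The environment variable names `APPLITOOLS_API_KEY`, `APPLITOOLS_BATCH_NAME`, and `APPLITOOLS_BATCH_ID` are assumptions made for illustration (the talk does not name them), and `VisualTestConfig` and `load_config` are hypothetical helpers.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VisualTestConfig:
    api_key: str
    batch_name: Optional[str]
    batch_id: Optional[str]


def load_config(env: Mapping[str, str] = os.environ) -> VisualTestConfig:
    """Read the key and batch info; variable names here are assumed."""
    api_key = env.get("APPLITOOLS_API_KEY")
    if not api_key:
        # Fail early: without a key the visual checks cannot run.
        raise RuntimeError("APPLITOOLS_API_KEY is not set")
    return VisualTestConfig(
        api_key=api_key,
        # Batch values may be absent on a local run outside the pipeline.
        batch_name=env.get("APPLITOOLS_BATCH_NAME"),
        batch_id=env.get("APPLITOOLS_BATCH_ID"),
    )


# Example with an explicit mapping instead of the real environment.
config = load_config({
    "APPLITOOLS_API_KEY": "example-key",
    "APPLITOOLS_BATCH_NAME": "Build 42",
    "APPLITOOLS_BATCH_ID": "42",
})
```

Keeping the key in the environment rather than in the test code means the same test can run locally and in the pipeline without edits.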
“Now let’s go into my test. Let’s go into this. I launch my web app. Next, I create a Chrome driver, but now I grab the batch name and the batch ID. Then, I create an Eyes object, which is part of the Applitools API which you can just get. This is the important code. I create my eyes object and I call eyes.Open. I set Firefox to my window size. And here I set Chrome to the same window size as well. So the next thing I do is browse to my home page and verify that the home page has been reached. Then I have this method called take a visual picture.”
“When I take a visual picture, it’s literally one line of code: I call eyes.CheckWindow, and of course I pass in a tag that will show me all the data, just helping organize stuff a little bit easier. So now what that does is, on this particular page, Applitools will snap a screenshot and it can set it as my baseline. Or I can manually set it as a baseline, and then in future tests it will compare the pictures that it takes with the baseline to say has anything changed, and if it has, is this a mistake or do you want to set your new changes as your new baseline. So super super super simple to do so once I add this check.
“If something looks wonky even though I’m on the right page, like for instance the CSS got changed, not a big deal, because guess what: Applitools will flag it for you and let you know. Super powerful, super easy to use. The code changes that you make, it’s almost nothing. So what does this look like? How do you set this up so it’s automated in your build and release pipelines? Well, let me go ahead and jump and show you what that looks like.”
Azure DevOps Plus Visual Testing with Applitools
“Here’s my build pipeline and here’s my Applitools build and I’ll go ahead and edit it to show you guys what that looks like now. This build should look incredibly similar to the previous build because it’s literally the same thing but I added a couple of things. I added the Applitools build task and you can get this build task from the marketplace. You can add it to your marketplace and I’ve already installed it so I can now drag and drop it where I need to. Add it right before you compile your application. Then I run my unit test and then if everything looks good I go ahead and deploy my application into a CI environment. And now I run my automated UI test using Selenium and that is literally all I have to do.”
“So we can come in here and we can look at the build summary. We see that 100 percent of our tests have passed (hooray for that). That looks good. And if we look at the Applitools tab (you get a brand new Applitools tab now), it will actually show you all the pages that were done. Everything looked good. Everything is passed. If you notice, if I click on these you’ll be able to see screenshots of every time I took a picture.”
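The ordering Abel walks through can be modeled as a simple list of stages that stops at the first failure. This Python sketch only illustrates that ordering; it is not an Azure DevOps pipeline definition, the stage names are paraphrased from the quote, and the stop-on-first-failure rule is an assumption based on "if everything looks good".

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order; stop at the first one that reports failure."""
    completed = []
    for name, step in stages:
        if not step():
            break
        completed.append(name)
    return completed


# Each step is a placeholder that simply reports success.
stages: List[Stage] = [
    ("Applitools build task", lambda: True),
    ("compile", lambda: True),
    ("unit tests", lambda: True),
    ("deploy to CI environment", lambda: True),
    ("Selenium UI tests", lambda: True),
]
all_green = run_pipeline(stages)

# If the unit tests fail, nothing is deployed and no UI tests run.
broken = [
    (name, (lambda: False) if name == "unit tests" else step)
    for name, step in stages
]
stopped = run_pipeline(broken)
```

Placing the fast unit tests ahead of deployment keeps the slower browser-driven UI tests from running against a build that is already known to be bad.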
Resolving Visual Testing Differences with Applitools
“We’ll go ahead and jump into my summary and you’ll notice here 90.9 percent of my tests have passed. Two of them are in the ‘Other’ category. Well, that’s weird. You’ll notice two of my automated UI tests are returning back ‘inconclusive’. If I look at Applitools (because that’s what my automated UI tests are running) I’ll notice that things are starting to look weird and Applitools flagged it for me. And if I jump in here and actually take a look at this you’ll notice that the table is no longer nicely formatted. Somebody must’ve messed up the CSS. Logically everything still works but visually it doesn’t look good. But I didn’t need a human to tell me that. Applitools actually told me. So now I can either say, ‘This is the way I want it,’ or not. Thumbs up, it’s ok. Thumbs down, nope.”
“Let’s go ahead and thumbs down. Right now. It will mark this as failed. Just like that. Now, future builds will be able to figure out what’s going on.”
“This makes me so incredibly happy because it’s really powerful. It shifts automated UI testing to the left. And it makes our pipelines go faster and smoother. Super cool. Super, super useful. I am not an Applitools expert. Not even close to it. I’m just a code slinger. I ran into this toolset. It was freaking amazing how easy it was to use and how useful.”
See The Full Webinar
The webinar is also covered in this blog post link.
Abel’s full code examples on GitHub.
The original slide deck on slideshare.com:
Find out more about Applitools
If you liked reading this, here are some more Applitools posts and webinars for you.
- Visual UI Testing as an Aid to Functional Testing by Gil Tayar
- How to Do Visual Regression Testing with Selenium by Dave Haeffner
- Why Visual UI Testing and Agile are a Perfect Fit
- The ROI of Visual Testing by Justin Rohrman