Ny Media: On mistakes and the tools needed to learn from your mistakes

eirik, March 5, 2021

The excitement around releasing something was in the air that morning, and by the time the code was rolled out, everything seemed to be running smoothly and according to plan. This was also expected, as we have a staging site and a range of automated tests. With that kind of confidence, one would think a bug never makes it all the way to production, right? Unfortunately not. In complex solutions like the ones we build, some parts are heavily customized and have edge cases or use cases that our tests do not cover. So that morning we quickly received reports of a bug that prevented a specific part of the site from working as it should.

One reaction would be to find the bug and assume that whoever wrote that part was responsible, complain publicly about this person in Slack, hotfix it on production, add a post-mortem PR and be done with it.

Another reaction would be to analyze how the bug was introduced, fix the bug while carefully making sure the changes that introduced it are kept, and write tests that make sure we do not roll out a release that breaks this part of the site again.

Let’s try the latter approach in this article.

Analyze how the bug was introduced

We use git on all of our projects, which keeps a version history of all changes introduced. This also lets us find out what went wrong here. To find out when this bug was introduced, we use git bisect, a tool built into git. I first learned of this tool in a blog post from Webchick more than 10 (!) years ago, and have since used that blog post several times as a cheat sheet. So please don’t take down that site (or article), but just in case, here is a brief summary of the steps needed:

Start by finding a commit in the history where the functionality was working. In my case I found one at the SHA 0c6ec6e2330cf7fb89f1aee7bb059edf764fd695. Then note a commit where it is not working, in our case the tip of the production branch, which was the SHA a0c161c0067442fa028d80f19f3a5642c653b820. So I started bisecting:

git bisect start
git bisect good 0c6ec6e2330cf7fb89f1aee7bb059edf764fd695
git bisect bad a0c161c0067442fa028d80f19f3a5642c653b820

This will start the bisect, and it will say something like this:

Bisecting: 25 revisions left to test after this (roughly 5 steps)

Git then checks out a commit roughly in the middle of that range, since halving the range at every step is the most efficient way to find the commit that introduced the error. For each step, we have to tell git whether the current state is good or bad. I run the build step on this commit and see that the error is there. That is bad. Let’s tell git:

git bisect bad

It checks out another SHA for me, and I build the project again, checking if the bug is there. Yay, it’s gone. Let’s tell git:

git bisect good

This continues until you find the commit that introduced the error, with output like this:

72624098ee091143d5b1318e0912c0e1c8a65406 is the first bad commit
commit 72624098ee091143d5b1318e0912c0e1c8a65406
Author: xxx 
Date:   Wed Feb 10 19:51:54 2021 +0100
    Commit message

This gives us enough information to blame someone, but that is not constructive, especially since the author turned out to be me.
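Marking each step by hand works, but the good/bad loop can also be automated with git bisect run, which runs a command on every commit bisect checks out and uses its exit status: 0 marks the commit good, 125 skips it, and any other status from 1 to 127 marks it bad. The sketch below is a self-contained demo in a throwaway repository; the ten commits, the status.txt file and the grep check are hypothetical stand-ins for a real build step and test. It also shows two details the walkthrough leaves out: git bisect start accepts the bad and good revisions directly, and git bisect reset ends the session and returns you to the branch you started from.

```shell
#!/usr/bin/env bash
# Demo of automated bisecting in a throwaway repository.
# The "bug" (status.txt switching to "broken") and the grep check are
# hypothetical stand-ins for a real build step and test.
set -euo pipefail

export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

# Create ten commits; commit 6 introduces the bug.
for i in $(seq 1 10); do
  if [ "$i" -lt 6 ]; then echo "working" > status.txt; else echo "broken" > status.txt; fi
  echo "$i" > counter.txt
  git add status.txt counter.txt
  git commit -q -m "Commit $i"
  if [ "$i" -eq 1 ]; then good_sha=$(git rev-parse HEAD); fi
  if [ "$i" -eq 6 ]; then culprit_sha=$(git rev-parse HEAD); fi
done

# Shorthand for: git bisect start; git bisect bad HEAD; git bisect good <sha>
git bisect start HEAD "$good_sha" > /dev/null

# grep exits 0 (good) when the file says "working", and 1 (bad) otherwise.
git bisect run grep -q working status.txt > /dev/null

# While the session is still open, refs/bisect/bad points at the first bad commit.
first_bad=$(git rev-parse refs/bisect/bad)

# End the session and return to the branch we started from.
git bisect reset > /dev/null 2>&1

echo "First bad commit: $(git log -1 --format=%s "$first_bad")"
```

In a real project, the grep line would be replaced by a script that builds the project and runs the relevant test, exiting non-zero when the bug is present.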

Fix the bug in a careful way

Now that we know which commit introduced the bug, the error itself might be easy to spot, which helps with the fix. More importantly, you know why the change was introduced, so you can fix the error while making sure the fix does not revert whatever the author wanted to achieve with the commit. This step is left to the reader, since it varies greatly depending on the bug and the contents of the commit.
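Whatever the fix ends up being, reading the offending commit is a good starting point, since it shows both the mistake and the intent behind the change. The sketch below demonstrates this in a throwaway repository with a hypothetical commit: git log --format=%B prints the full message, git diff-tree lists the touched files, and git show prints the message together with the diff. In a real project you would pass the SHA that bisect reported instead.

```shell
#!/usr/bin/env bash
# Read the intent (message) and the changes (diff) of a suspect commit.
# The repository, file and commit messages below are hypothetical.
set -euo pipefail

export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

echo ".some-wrapper { display: block; }" > style.css
git add style.css
git commit -q -m "Add base styles"

echo ".some-wrapper { display: none; }" > style.css
git commit -q -a -m "Hide wrapper" -m "Only meant for small viewports."
culprit=$(git rev-parse HEAD)   # in a real project: the SHA bisect reported

# Full commit message: why the change was made.
message=$(git log -n 1 --format=%B "$culprit")
echo "$message"

# Files touched by the commit (plumbing command, easy to script against).
changed=$(git diff-tree --no-commit-id --name-only -r "$culprit")
echo "$changed"

# Message plus full diff, for reading by eye.
git --no-pager show "$culprit"
```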

Analyze what went wrong

The bug was introduced, and we were able to fix it in a timely manner. But what actually got us into this situation? There can be several answers to this question:

Missing test coverage
Undocumented functionality
Undocumented dependencies
Misuse of functionality that introduced a bug when you fixed something else
Misunderstandings in code reviews or pull requests

An analysis of the bug should uncover whether one or several of these things were at play, or whether there are other structural issues with the project that need to be addressed. We can talk through all this and make a plan (with corresponding issues) to tackle as many of them as possible. In our case, missing test coverage was quite obvious: the change would never have been committed had we known it would break this specific functionality. Which brings us to the last step of this story:

Add the missing test coverage, and make sure it covers the bug introduced

While fixing the bug is priority number one, and was initially done quite fast, the follow-up is to write a test that illustrates what went wrong and confirms that the fix actually fixes the bug.

In practice this means we will do these things:

Fix the bug and create a pull request
Working from the develop branch without the fix applied, we make a test
Confirm this test is failing locally

We are writing functional tests like this with Behat, so running the test locally would look something like this:

$ ./vendor/bin/behat --tags=some_functionality

@api @javascript @issue @more_tags @some_functionality
Feature: Use this specific thing
  
  We introduced a bug with something specific, and here is a description.
  
  We also include a link here: http://example.com/issue/677

  Background:
    Given viewport is desktop
    Given some entity

  Scenario: A user should be able to do something specific
    Given I am viewing the thing
    Then I click the 0 element ".bundle--item-wrapper .select-wrapper"
      No html element found for the selector ('.bundle--item-wrapper .select-wrapper') (Exception)

    Then I click the 1 element ".select-wrapper li span"
    Then I click the 1 element ".some-wrapper .select-wrapper"
    Then I click the 4 element ".select-wrapper li span"
    Then I should see "Expected text"
    Then I should see "More expected text"
    Then I click add to cart
    Then I go to checkout for last order
    Then I should see "special-variation"
    Then I should see "Expected text"
    Then I should see "More expected text"

--- Failed scenarios:

    tests/features/some-feature.feature:12

1 scenario (1 failed)
14 steps (3 passed, 1 failed, 10 skipped)

As expected, our test fails. Running it again with the fix applied:

@api @javascript @issue @more_tags @some_functionality
Feature: Use this specific thing
  
  We introduced a bug with something specific, and here is a description.
  
  We also include a link here: http://example.com/issue/677

  Background:
    Given viewport is desktop
    Given some entity

  Scenario: A user should be able to do something specific
    Given I am viewing the thing
    Then I click the 0 element ".some-wrapper .select-wrapper"
    Then I click the 1 element ".select-wrapper li span"
    Then I click the 1 element ".some-wrapper .select-wrapper"
    Then I click the 4 element ".select-wrapper li span"
    Then I should see "Expected text"
    Then I should see "More expected text"
    Then I click add to cart
    Then I go to checkout for last order
    Then I should see "special-variation"
    Then I should see "Expected text"
    Then I should see "More expected text"

1 scenario (1 passed)
14 steps (14 passed)
0m11.06s (66.01Mb)

🎉️🎉️🎉️
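The "confirm this test is failing locally" step can be made explicit instead of read off the Behat output by eye. The expect_failure function below is a hypothetical helper, not part of Behat or git: it runs any command and succeeds only if that command exits with a non-zero status, which is how Behat reports failed scenarios.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the wrapped command fails.
# Handy for proving that a new test catches a bug before the fix is applied.
set -euo pipefail

expect_failure() {
  # Run the command exactly as given; a zero exit status means it passed.
  if "$@"; then
    echo "Expected a failure, but '$*' passed" >&2
    return 1
  fi
  echo "'$*' failed as expected"
}
```

On a branch based on develop without the fix applied, expect_failure ./vendor/bin/behat --tags=some_functionality should then report that the command failed as expected; once the fix is in, the plain Behat run should pass.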

Our next step is to push this branch to GitHub. We use GitHub for code and GitHub Actions for continuous integration. Now, to prove our test actually covers the bug we fixed, we revert the fix and create a pull request from that.

(Screenshot: GitHub pull request including the reverted commit, which we now expect to fail.)

(Note: Commit messages and titles have been manipulated for illustration purposes).

As we can see, the pull request succeeded with just the test, but we want to make sure the test would have prevented us from pushing the broken code to production. So we remove the fix with a revert commit, which we created like this:

git revert ba31d6c9f4ffaa5508642a23a598b854124ac572 # - Our commit sha for the fix.

Then, once the test has failed as expected, we add the fix back, for example by reverting the revert commit:

git revert 3d3866985e24a19a65c05abaaa1afab9242bc76d # - Same SHA as the screenshot

Then we go back to our pull request and verify that it passes again (which it should, since it passed before we reverted the fix).
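The revert-then-restore round trip can also be rehearsed locally. The sketch below uses a throwaway repository in which a hypothetical feature.txt stands in for the fixed code: git revert --no-edit creates a new commit that undoes the fix (the state in which CI should go red), and reverting that revert commit brings the fix back. In the real workflow you would push after each revert and watch the GitHub Actions run rather than inspect a file.

```shell
#!/usr/bin/env bash
# Rehearse "revert the fix, expect red CI, restore the fix" in a throwaway repo.
# feature.txt is a hypothetical stand-in for the fixed code.
set -euo pipefail

export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

echo "broken" > feature.txt
git add feature.txt
git commit -q -m "Introduce bug"

echo "fixed" > feature.txt
git commit -q -am "Fix the bug"
fix_sha=$(git rev-parse HEAD)

# Step 1: undo the fix with a new commit (history is kept, nothing is rewritten).
git revert --no-edit "$fix_sha" > /dev/null
revert_sha=$(git rev-parse HEAD)
after_revert=$(cat feature.txt)    # "broken": the new test should fail here

# Step 2: revert the revert commit to bring the fix back.
git revert --no-edit "$revert_sha" > /dev/null
after_restore=$(cat feature.txt)   # "fixed": the test should pass again

echo "after revert: $after_revert, after restore: $after_restore"
git log --format=%s
```

Because both steps add commits instead of rewriting history, the pull request keeps a visible record of the red and green CI runs.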

We can now merge the test. This way we have accomplished the following:

Fixed the bug
Analyzed why the bug was introduced
Wrote a test that illustrated the problem
Made sure the functionality in question will not be broken by releases in the future

If you are looking for an agency with a focus on Quality Assurance, automation and stability, we can help! Our clients benefit from our focus on quality, and we always have long-term cooperation in mind while working with clients. This way we can deliver solutions that are of high quality, and that also increase in quality as the complexity grows. Contact us today if you are looking for a technical partner for your project!

https://www.nymedia.no/en/blog/mistakes-and-tools-needed-learn-your-mistakes