Tuesday, December 30, 2014

Why We Test, part 4: Specification

Warning: this post doesn't feel great to me. A bit disorganized and ill-balanced. But I needed to get it out for completeness. Feedback welcome, as always!

Llewellyn Falco (here) and Arlo Belshee (here) both talk about the way that tests can provide the value of "specification": that tests can explain to another human what the program is supposed to do.

How does this compare to the values of catching bugs, informing design, and psychological reward? There's a subtle way in which focusing your energy on specification is hugely important.

What does it mean for a test to be a good spec?
  • Name of the test is business-value oriented.
  • Expresses a single example of a business rule.
  • Uses terminology from the problem domain.
  • Meant to be read by a human (a programmer, not a customer).
  • At the appropriate level of abstraction for a human reader.
  • Doesn't make any non-business-value demands.
This gets you the design feedback you need. You can only meet the goal of "test as spec" if you listen to that feedback. You can't have a bunch of setup code (including mocks); that would distract from the core message of the test. The correct terminology gets pushed into the system under test. These short, simple, straightforward tests are only possible when code is decoupled, and when each business rule is expressed in exactly one place (DRY).
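To make that concrete, here's a minimal sketch of what such a test might look like. The xUnit attribute, the ShippingPolicy class, and the free-shipping rule are all invented for illustration; the point is the shape: a business-value-oriented name, domain terminology, a single example of a single rule, and no setup noise.

    using Xunit;

    // Hypothetical production class, invented for this sketch.
    public class ShippingPolicy
    {
        public decimal ShippingCostFor(decimal orderTotal) => orderTotal > 100m ? 0m : 7.99m;
    }

    public class FreeShippingSpec
    {
        [Fact]
        public void Orders_over_100_dollars_ship_for_free()
        {
            var policy = new ShippingPolicy();
            var cost = policy.ShippingCostFor(orderTotal: 120.00m);
            Assert.Equal(0m, cost);
        }
    }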

Simple, decoupled tests are inherently fast.

When they fail, they tell you clearly why the failure matters.

It also gets you the comprehensive safety net: if you're focusing your attention on writing and satisfying the spec, all your code will have purpose and will be covered by tests.

Because the test only makes demands for business value, you are free to refactor without unnecessarily breaking tests.

This value appears when you focus on how tests are read, but it also pays off when tests are written and run.


Monday, December 29, 2014

Why We Test, part 3: Psychology

Warning: I think this is a bit of a crappy blog post. I needed to get the ideas out there for completeness, but I haven't thought through this part thoroughly.

In "TDD as the crack cocaine of software", Jef Claes talks about the way that TDD (with really fast tests) can create the preconditions for Flow. Flow is emotionally rewarding, so that becomes its own reason to write tests this way, in addition to the desirable outcomes of catching bugs and informing design.

Other psychological (not technical) reasons to do TDD or other types of testing:

  • Confidence. 
Knowing that I have tests makes me feel safe that I can make changes and ship working software. 

Note that this confidence may be false! For example, if I base it on reported code coverage, it may be misplaced: code coverage often does not correlate with quality. (Even worse, if I ignore other quality-ensuring activities because I focus my attention on code coverage, my quality will suffer while my confidence increases.) It's very tempting to celebrate coverage numbers. Don't do it.

Interestingly, following The Three Rules of TDD will tend to result in very high coverage and good quality.
  • Incrementalism
TDD with a tiny Red/Green/Refactor cycle helps you take small steps. You get the feeling of making progress all the time. You're always a minute or so away from reverting to a green bar.
  • Tracking progress/status
If I get interrupted, I can look at my most recent unit tests to remind myself what I was doing. I can write my next failing test as a note to my future self about what I wanted to do next. (While I'm away, I'll probably change my mind, but the note is still valuable.)

Also, if I commit each passing test, another programmer can read the history to see the path I took. But I'm getting off topic for this post.

Maybe you can think of more examples, or a better way to organize these ideas - let me know!

Thursday, December 25, 2014

Two kinds of safety

While comparing the use of tests to catch bugs vs. improve design, I had a thought about safety (also inspired by Anzeneering).

The "catch bugs" approach provides one kind of safety - if I screw up, the tests will catch my mistake before it has a chance to do any harm.

The "improve design" approach provides a different kind of safety - I look for hazards and eliminate them, so the mistake doesn't happen.

To use a metaphor: if you have a high-wire to cross, the first kind of safety would come from a safety net under the wire; the second would come from replacing the wire with a wide, stable platform.

The wire + net is quick to install/change/remove; I can practice my balance on it; it's exciting. If I fall, I have to crawl back and try again.

The wide platform is expensive, but I can traverse it without taking great care; I can run across.

Working in untested legacy code is like living in a tree city where each home is connected by high wires.

Why we Test, part 2: Design

Previously, I talked about the perspective that the reason to write tests is to catch bugs, and this is a good thing all around. Now I want to talk about code design - about using tests to help me design my code well. Some people argue this is the "true meaning" of "Test-Driven Development".

Unit tests can point me towards good design: if it's difficult to write a good test, the code is poorly factored; usually the thing I want to test is inappropriately coupled with something I don't care about right now. Introducing indirection at this point can open my code up to a valuable abstraction.

The "good test" that is "easy to write" will look something like:

    testSubject = new Foo(/*initial state*/);
    result = testSubject.Bar(..);
    Assert(result...);

That's Arrange-Act-Assert, with one line of each. No need to write comments to that effect; no need for blank lines. (There are a few other similar forms in the 2-4 line range.)

This kind of test is only possible if the code under test is not tightly coupled to the rest of the system. More than just "programming to interfaces" so I can inject dependencies, I look to eliminate dependencies. I don't have much setup code. I don't use mocks because I don't need to.
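As a sketch of what "eliminating a dependency" can look like (all names here are invented): instead of injecting a clock interface and mocking it, the business rule below takes the relevant date as a plain value, so the test needs no setup and no test doubles. xUnit is assumed.

    using System;
    using Xunit;

    // Invented example: the rule takes "today" as a value instead of
    // depending on a clock, so the dependency is eliminated, not mocked.
    public class MembershipRules
    {
        public bool IsEligibleForSeniorDiscount(DateTime birthDate, DateTime onDate) =>
            birthDate.AddYears(65) <= onDate;
    }

    public class SeniorDiscountTests
    {
        [Fact]
        public void Members_aged_65_or_older_get_the_senior_discount()
        {
            var rules = new MembershipRules();
            var eligible = rules.IsEligibleForSeniorDiscount(
                birthDate: new DateTime(1950, 1, 1),
                onDate: new DateTime(2015, 1, 1));
            Assert.True(eligible);
        }
    }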

It's not just about "testing a class", it's also about "testing a business rule". If I can test my business rules this way, they are DRY: each piece of knowledge has exactly one canonical expression in my codebase. I minimize emergent phenomena, so my whole system is easier to reason about.

Since I have unit tests that express my business rules, I use terminology from my business domain in the tests. That terminology will flow into my system-under-test, giving rise to ubiquitous language (within my bounded context, of course).

This kind of test will naturally be super-fast and completely reliable, which supports the "catch bugs" value described before. But I also write a lot fewer bugs, because I have well-factored, well-named, decoupled, DRY code that is easy to reason about. Writing fewer bugs is more effective than trying to find-and-fix the bugs with tests.

I can treat bugs as another kind of design feedback: I can ask what made it easy for this bug to appear, and look for a way to eliminate the whole class of bugs. I may use a unit test for this purpose, but simply writing a test for the specific bug is not enough - I want to address the whole class.

Refactoring is still really important, and (unless I have great tools in C# or Java) I must count on the tests to protect me while refactoring. But now I have the advantage that a) my code is relatively well-factored already, and b) my tests are helping me figure out good ways to refactor, so refactoring is much more fruitful.

You only get this value if you listen to the feedback your tests are giving you.

This value appears when tests are written.


Why we Test, part 1: Bugs

I've noticed a small disagreement in the Agile world around the "true purpose" of unit tests. Mostly the two camps are "to catch mistakes" and "to direct design". I want to explore these ideas a bit further. Arlo Belshee gathered a bunch of great perspectives at What Makes a Good Test Suite?, and part of what I'm doing here is reorganizing those ideas, especially Llewellyn Falco's answer.

Value #1: Bugs.

Bugs ruin software, nullifying the value we work so hard to create. Tests catch bugs (an activity sometimes called "checking" or "regression" or "validation"), so our users and our reputations are not harmed. If another person (or my future self) works on this code later on, I count on tests to catch mistakes before our customers do.

Having tests means I can refactor safely (for some definitions of refactoring). Refactoring makes future work easier. Programmers are happier. We can say "yes" to our customers more often. When refactoring, tests are especially important in languages without great refactoring tools (basically everything except C# and Java).

Speed matters. Faster tests => I run them more often => less has changed since the last run, and what has changed is fresh in my brain => it's easy to understand what a failure means.

Granularity matters. When a test fails, a granular test will tell me what is broken without a lot of investigation. (Some tests can also provide good diagnostics around a failure, which helps in similar ways.)

Reliability matters. If tests are flaky or broken, you either ignore them (so they deliver 0 value) or you rerun them (which acts as a multiplier on runtime).

Coverage matters. Luckily, strictly following TDD means you won't write any untested code, so you can be confident in your coverage, which is especially important in manual refactoring. Sticking with TDD requires discipline.

When I do find a bug, the responsible thing to do is add a test for it when I fix it. Now I can be sure I'll never have that bug again.

In this mindset, mocks are a great tool, because they let me unit test my code in isolation, which makes my tests faster, more reliable, and easier to write. I'm likely to introduce indirection ("program to interfaces") and use dependency injection, and maybe even the Service Locator pattern.
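Here's a rough sketch of that style, with invented names and a hand-rolled stub standing in for a mocking framework (the structure is the same): the converter receives its dependency through the constructor, so the test can substitute a canned exchange rate and run fast, in isolation.

    using Xunit;

    // Invented example: "program to interfaces" plus constructor injection,
    // with a hand-rolled test double instead of a mocking framework.
    public interface IExchangeRates
    {
        decimal DollarsPerEuro();
    }

    public class FixedRates : IExchangeRates
    {
        public decimal DollarsPerEuro() => 1.25m;   // canned value for the test
    }

    public class PriceConverter
    {
        private readonly IExchangeRates _rates;

        public PriceConverter(IExchangeRates rates)
        {
            _rates = rates;   // injected, so tests can swap in a fake
        }

        public decimal EurosToDollars(decimal euros) => euros * _rates.DollarsPerEuro();
    }

    public class PriceConverterTests
    {
        [Fact]
        public void Converts_euros_to_dollars_at_the_current_rate()
        {
            var converter = new PriceConverter(new FixedRates());
            Assert.Equal(12.50m, converter.EurosToDollars(10m));
        }
    }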

You only get this value if you have the right tests.

The bug-catching value appears when tests are run.

Monday, December 22, 2014

Extract Method to eliminate duplication

ReSharper recognizes duplication when I introduce a new variable, but not when I Extract Method.

Consider these two rules:
  • When two methods in the same class are textually identical, they are semantically identical.
  • When two methods are semantically identical, you can replace the body of one with a call to the other without changing behavior.
Here's my recipe for eliminating duplication with Extract Method:
  1. Extract Method at each site, giving the new methods nearly-identical names ("Foo", "Foo2" is fine).

  2. Use a text diffing tool to compare the two methods, including the signature.

  3. Use automated refactoring tools to normalize (eliminate the differences). For example, rename a parameter in one method to match the other.
  4. When the two methods are textually identical, except for their names, forward one to the other.
  5. Inline the forwarding method.
This may take several attempts, plus the application of other refactorings (mostly Introduce Variable and Rename), to get everything just right.
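Here's a compressed sketch of what the recipe leaves behind, using invented code (the 10% tax calculation is just a stand-in for whatever logic was duplicated). It shows the state right after step 4, when one extracted method forwards to the other.

    public class InvoiceCalculations
    {
        // Step 1 extracted this method at the first duplication site.
        private decimal TotalWithTax(decimal subtotal)
        {
            return subtotal * 1.10m;
        }

        // Step 1 also extracted this one at the second site. After steps 2-3
        // (diff, then rename until the bodies are textually identical), step 4
        // replaces its body with a forwarding call; step 5 inlines it away,
        // leaving every caller using the single TotalWithTax.
        private decimal TotalWithTax2(decimal subtotal)
        {
            return TotalWithTax(subtotal);
        }
    }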

Note that you don't have to understand what the code does to make this work. You just have to see the duplication and strive to eliminate it.

I like to do fully-automated, highly-reliable refactorings instead of manual edits, where possible. Because they make me confident that I'm not breaking anything, I can refactor without test coverage, which is key to recovering legacy code.

Monday, December 8, 2014

"We don't want to waste time on retrospectives."

As I've been running retrospectives, I've noticed something interesting around how much time we spend on them.

I've been on teams that decided to run retrospectives every 3 weeks. They picked a list of oversized changes, which didn't get implemented, so people stopped seeing the retros as valuable. If there was a scheduling conflict, a retro would get skipped, which just made the problem worse.

On my current team, we do retrospectives every week. The first one took 90 minutes, and triggered a lot of comments complaining about how long it took. I knew that if I wanted to get people to actually show up to weekly retros, I would have to limit the size to the more socially acceptable 60 minutes.

One tool I used was to ask each person to arrive with exactly one item (rant or rave) that they want to give attention to. That helped a lot with the time spent, without hurting value too much (although after a few iterations, we got more efficient from practice and people asked to go back to multiple items.)

I actually think we're optimizing the wrong way, though: if 90 minutes / week isn't worth it, then instead of reducing the cost, I'd rather increase the value. I prefer effectiveness over efficiency. There's a lot of latent upside here; focusing on efficiency optimizes for the status quo. I'd like to accelerate our improvements by having retros even more often (daily?!?!), but I am sure that idea would meet heavy resistance (and undermine my credibility).

I've noticed an odd phenomenon that suggests that optimizing for time is not the right choice: at the end of each formal retrospective, when the meeting is officially done, about 1/2 the group sticks around and continues to discuss how we work. This sometimes goes on for another 90 minutes. Something similar happens at lunch. So clearly, there is a real need for more of this introspection. I just think that the formality of the scheduled retro is something people can't tolerate for more than 60 minutes / week.

It's strange that I hear comments about "wasting too much time on retros" while people's actual actions show that they really are interested in devoting their time to this kind of introspection.

How we do retrospectives

I've been facilitating formal retrospectives on this team for a few months now. We use this structure:

  • Meet at the end of each week for an hour.
  • Everyone writes observations on a sticky note (something that sucked, or something that was awesome by accident that you want to make sure we don't lose)
  • Dot-voting to select one item. 
  • Open discussion to deeply understand this item
  • Propose possible changes
  • Dot-voting to select one
  • Refine the item into a crisp experiment

Most of the time is spent in the open discussion. I think this is the most interesting part. What causes this? What value do we get from it? What are some changes we could make, and what might the consequences be? How do other teams avoid or address this issue? Are there non-obvious changes that might make this issue disappear? (Causality may not be obvious.)

We have tried a few experiments with how we do retros:

  • Allow outsiders (including our manager) to attend. 

We learned that it's OK if they stay quiet.

  • Each person can only put up one sticky note in the first round.

This speeds things up, and lets us devote more time to the open discussion.

  • Only call out good things that we want more of.

Assuming that bad things will fall away. Focusing only on fixing bad things will eventually get you up to "tolerable"; you need to increase good things to get to "awesome".

Overall I am very happy with our results. 90% of our decisions have been fully implemented; most produced valuable improvements. Where things didn't get better, we learned something valuable. For example, we learned that retrospective outcomes that require people to do more work are not likely to be adopted, so at least for now, we should focus on changes that don't require more work. (Maybe later, when our overload is reduced and we can carve out more slack, we can start trying those things.)