Maintaining Software Correctness

2022-11-16

This article is a write-up of a talk I gave at MinneBar 2022. Instead of reading this, you could also watch the recording or view the slides.

The title of this talk is "maintaining software correctness." But what exactly do I mean by “correctness”? Let me set the scene with an example.

Years ago, when Trello Android adopted RxJava, we also encountered a memory leak problem. Before RxJava, we might have, say, a button and a click listener; when that button went away, so would its click listener. But with RxJava, we now had a button click stream and a subscription, and that subscription could leak memory.
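A minimal Java sketch of why that subscription can leak. It assumes a hypothetical `ClickStream` in place of RxJava (none of these names are RxJava's API): the stream keeps a reference to each subscriber, and a subscriber lambda that touches a screen's fields keeps the whole screen reachable until someone remembers to unsubscribe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a button click stream (not RxJava's API).
final class ClickStream {
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    // Subscribing returns a handle that the caller must remember to run later.
    Runnable subscribe(Consumer<String> subscriber) {
        subscribers.add(subscriber);
        return () -> subscribers.remove(subscriber);
    }

    void emit(String event) {
        // Iterate over a copy so a subscriber may unsubscribe while handling an event.
        for (Consumer<String> subscriber : new ArrayList<>(subscribers)) {
            subscriber.accept(event);
        }
    }

    int subscriberCount() {
        return subscribers.size();
    }
}

// Hypothetical screen, standing in for an Activity or Fragment.
final class Screen {
    private int clicks = 0;
    private Runnable unsubscribe = () -> { };

    void onCreate(ClickStream buttonClicks) {
        // The lambda updates a field, so it captures `this`: while the stream
        // holds the subscriber, the whole Screen stays reachable.
        unsubscribe = buttonClicks.subscribe(event -> clicks++);
    }

    void onDestroy() {
        // Forget this call and the Screen can never be garbage collected.
        unsubscribe.run();
    }

    int clicks() {
        return clicks;
    }
}
```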

We could avoid the leak by unsubscribing from each subscription, but manually managing all those subscriptions was a pain, so I wrote RxLifecycle to handle that for me. I’ve since disavowed RxLifecycle due to its numerous shortcomings, one of which was that you had to remember to apply it correctly to every subscription:

observable
  .subscribeOn(Schedulers.io())
  .observeOn(AndroidSchedulers.mainThread())
  .compose(bindToLifecycle()) // RxLifecycle is applied via compose(); forget this and leak memory!
  .subscribe();

If you put bindToLifecycle() before subscribeOn() and observeOn(), it might fail. And if you forget to add bindToLifecycle() entirely, it doesn’t work either!

There were hundreds (perhaps thousands) of subscriptions in our codebase. Did everyone remember to add that line of code every time, and in the right place? No, of course not! People forgot constantly, and while code review caught it sometimes, it didn’t always, leading to memory leaks.

It’s easy to blame people for messing this up, but in reality the design of RxLifecycle itself was at fault. Depending on people to “just do it right” will eventually fail.

Designing For Humans

Let’s generalize this story.

Suppose you’ve just created a new architecture, library, or process. Over time you notice some issues that stem from people incorrectly using your creation. If people would just use everything correctly there wouldn’t be any problems, but to your horror everyone continues to make mistakes and cause your software to fail.

This is what I call the correctness dilemma: it’s easy to create but hard to maintain. Getting people to agree on a code style, properly contribute to an OSS project, or consistently release good builds - all of these processes are easy to come up with, but errors eventually creep in when people don't use them correctly.

The core mistake is designing without keeping human fallibility in mind. Expecting people to be perfect is not a tenable solution.

If you pay no attention to this aspect of software design (like I did for much of my career), you are setting yourself up for long-term failure. However, once I started focusing on this problem, I discovered many good (and often easy) solutions. All you have to do is try, just a little bit, and sometimes you’ll build a product that lasts forever.

Correctness Strategies

How do we design for correctness?

Human error is a problem in any industry, but I think that in the software industry we have a unique superpower that lets us sidestep this problem: we can readily turn human processes into software processes. We can take unreliable chores done by people and turn them into reliable code, and faster than anyone else, because we’ve got all the software developers.

What do we do with this power to avoid human fallibility? We constrain. The key idea is that the less freedom you give, the more likely you’ll maintain correctness. If you have the freedom to do anything, then you have the freedom to make every mistake. If you’re constrained to only do the correct thing, then you have no choice but to do the right thing!

There are all sorts of strategies we can employ for correctness, lying on a spectrum from flexibility to rigidity: institutional knowledge, documentation, affordances, software checks, constraints, and automation.

Let’s look at each strategy in turn.

Institutional Knowledge

Otherwise known as “stuff in your head.”

This is less of a strategy and more of a starting point. Everything has to start somewhere, and usually that’s in the collective consciousness of you and your teammates.

Thoughts are great! Thinking comes naturally to most people, and thoughts have many advantages:

Thoughts are extremely cheap; the going rate has been unaffected by inflation, so it’s still just a penny for a thought. Brainstorming is based on how cheap thoughts are; “Where should this button go?” you might ask, and you’ll have fifteen different possible locations in the span of a few minutes.

Thoughts are extremely flexible. You can pitch a new process to your team to try out for a week, see how it goes, then drop it if it fails. “Let’s try posting a quick status message each morning”, you might suggest, and when everyone inevitably hates it, you can quickly give it up a week later.

Institutional knowledge can explain and summarize code. Would you rather read through every line of code, or have someone discuss its structure and goals? Trello Android could operate offline, which means writing changes to the client’s database then syncing those changes with the server - I’ve just now described tens of thousands of lines of code in one sentence.

Institutional knowledge can explain the “why” of things. By itself, code can only describe how it gets things done, but not why. Any hack you write to solve a problem in a roundabout way should include a comment on why the hack was necessary, lest future generations wonder why you wrote such wacky code. There might have been a series of experiments that determined this is the best solution, even though that’s not obvious.

Institutional knowledge can describe human problems. There’s only so much you can do with code. Your vacation policy cannot be fully encoded because employees get to choose when they take vacation, not computers!

There’s a lot to like about thinking, but when it comes to correctness, institutional knowledge is the worst. Cheap and flexible does not make for a strong correctness foundation:

Institutional knowledge can be misremembered, forgotten, or leave the company. I tend to forget most things I did after just a few months. Coworkers with expert knowledge can quit whenever they want.

Institutional knowledge is hard to share. Every new teammate has to be taught every bit of institutional knowledge by someone else during onboarding. Whenever you come up with a new idea, you have to communicate it to every existing teammate, too. Scale is impossible.

Institutional knowledge can be difficult to communicate. The game “telephone” is predicated on just how hard it is to pass along even simple messages. Now imagine playing telephone with some difficult technical concept.

Institutional knowledge does not remind people to do something. Do you need someone to press a button every week to deploy the latest build to production? What if the person who does it… just forgets? What if they’re on vacation and no one else remembers that someone has to push the button?

Like I said, institutional knowledge is good and important - it’s the starting point, and a cheap, flexible way to experiment. But any institutional knowledge that is regularly used should be codified in some way. Which leads us to…

Documentation

I’m sure that someone was screaming at their monitor while reading the last section, saying things like “Documentation! Duh! That is the answer!”

Documentation is institutional knowledge that is written down. That makes it harder to forget and easier to transmit.

Documentation has many of the advantages of institutional knowledge - though not quite as cheap or flexible, it is also able to summarize code and describe human problems. It is also much easier to distribute documentation; you don’t have to sit down and have a conversation with every person who needs to learn.

There are also a couple of bonuses to written knowledge. Documentation can use pictures or video. A good flow diagram or architecture summary is worth 1000 words - I could spend a bunch of time talking about how Trello Android’s offline architecture works, or you could look at the flow charts in this article. I personally find that video can click with me more easily than just talking; I suspect this is why the modern video essay exists (over written articles).

Documentation can also create checklists for complex processes. We automated much of it, but the process of releasing a new version of Trello Android still involved many unavoidably manual steps (e.g. writing release notes or checking crash reports for new issues). A good checklist can help cut down on human error.

Despite documentation’s benefits, there’s a reason this talk was originally titled “documentation is not enough.”

Here’s a common situation we’d run into at work: we’d come up with a new team process or architecture, and people would say “this is great, but we’ve got to write it down so people won’t make mistakes in the future.” We’d take the time to write some great documentation… only to discover that mistakes kept occurring. What gives?

Well, it turns out there are many problems that can arise with documentation:

Documentation can be badly written or misunderstood. A document can explain a concept poorly or inaccurately, or the reader might simply misunderstand its meaning. There’s also no way to double-check that the information was transmitted effectively; talking to another person allows for clarifying questions, but reading documentation is a one-way transmission.

Documentation can be poorly maintained and go out of date. Perhaps your document was accurate when first written, but years later, it’s a page of lies. Keeping documentation up-to-date is expensive and tedious, if you even remember to go back and update it.

Documentation can be hard to find or simply ignored. Even if the document is perfect, you need to be able to find it! Maybe you know it’s somewhere on Confluence but who knows where. Even worse, people might not even know they need to read some documentation! “I’m sorry I took down the server, I didn’t know that you couldn't cut releases at 11PM because I never saw the release process document.”

Documentation cannot serve as a reminder. Much like with institutional knowledge, there’s no way for documentation to tell you to do something at a certain time. Checklists get you slightly closer, but there’s no guarantee that a person will remember to check the checklist! Trello Android had a release checklist, but oftentimes the release would roll around and we’d discover that someone forgot to check it, and now we couldn’t translate the release notes in time.

Documentation is necessary. Some concepts can only be documented, not codified (like high-level architecture explanations). And ultimately, software development is about working with humans. Humans are messy, and only written language can handle that messiness. However, it’s only one step above institutional knowledge in terms of correctness.

Affordances

Let’s take a detour into the dictionary.

An affordance is “the quality or property of an object that defines its possible uses or makes clear how it can or should be used.”

I was first introduced to this concept by “The Design of Everyday Things” by Don Norman, which goes into detail studying seemingly minor design choices that have huge impacts on usage.

The cover of the book is classic, showing how wacky a tea kettle with the spout and handle on the same side would be.

A classic example of good and bad affordances is doors. Good doors have an obvious way to open them. Crash bar doors are a good example of that; there’s no universe in which you’d think to pull these doors open.

https://commons.wikimedia.org/wiki/File:Set_of_Crash_Bar_Doors.jpg

The opposite is what is known as a Norman door (named after the same Don Norman). Norman doors invite you to do the wrong thing, for example by having a handle that begs to be pulled but, in fact, should be pushed.

https://www.flickr.com/photos/79157069@N03/40530223463

Here’s why I find all this interesting: we can use affordances in software to invisibly guide people towards correctness. If you make “doing the right thing” natural, people will just do it without even realizing they’re being guided.

Here’s an example of an affordant API: in Android, there’s no one stopping you from opening a connection to a database whenever you want. A dozen developers each doing their own custom DB transactions would be a nightmare, so instead, on Trello Android we added a “modification” API that would update the DB on request. The modification API was easy - you would just say “create a card” and it’d go do it. That’s a lot simpler than opening your own connection, setting up a SQL query, and committing it - thus we never had to worry about anyone doing it manually. Why would you, when the modification API was right there?
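A rough sketch of that idea in Java. `LocalDb` and `Modifications` are hypothetical stand-ins (an in-memory map instead of a real database), not Trello's actual code; what matters is that the easy call is also the correct one, with validation and thread safety handled in one place.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory stand-in for the app's local database. A real one
// would involve connections, SQL and transactions.
final class LocalDb {
    private final Map<String, String> cardNames = new HashMap<>();

    synchronized void insertCard(String id, String name) {
        cardNames.put(id, name);
    }

    synchronized String cardName(String id) {
        return cardNames.get(id);
    }
}

// Hypothetical "modification" API: callers say *what* should change, and this
// one class owns *how* the database gets updated.
final class Modifications {
    private final LocalDb db;

    Modifications(LocalDb db) {
        this.db = db;
    }

    // One obvious call replaces "open a connection, write a query, commit".
    String createCard(String name) {
        if (name == null || name.trim().isEmpty()) {
            throw new IllegalArgumentException("Card name must not be blank");
        }
        String id = UUID.randomUUID().toString();
        db.insertCard(id, name.trim());
        return id;
    }
}
```

Nothing here stops someone from calling `db.insertCard` directly, which is the usual limit of an affordance: it guides, but doesn't restrict.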

What about improving non-software situations? One example that comes to mind is filing bug reports. The harder it is to file a bug report, the less likely you are to get one (which, hey, maybe that’s a feature for you, but not for me). The groups that put the onus on the filer to figure out exactly where and how to file a bug tended not to hear important feedback, whereas the teams that said “we accept all bugs, we’ll filter out what’s not important” got lots of feedback all the time.

If, for some reason, you can’t make the “right” way of doing things any more affordant, you can instead do the opposite and make the wrong way un-affordant (aka hard and obtuse). Is there an escape hatch API that most people shouldn’t use? Hide it so that only those who need it can even find it. Getting too many developer job applications? Add a simple algorithm filter to the start of your interview pipeline.
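One way to make an escape hatch un-affordant, sketched in Java with hypothetical names (`CardList`, `unsafeEscapeHatch`, `addUnvalidated`): the validated method is the obvious one, while the unvalidated path sits behind a loudly named accessor that nobody calls by accident and that stands out in code review.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class CardList {
    private final List<String> names = new ArrayList<>();

    // The affordant path: short, obvious, validated.
    void add(String name) {
        String trimmed = name.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("Card name must not be blank");
        }
        names.add(trimmed);
    }

    List<String> names() {
        return Collections.unmodifiableList(names);
    }

    // The un-affordant path: you have to ask for it by a loud name, so it
    // never shows up by accident and every call site is easy to spot.
    EscapeHatch unsafeEscapeHatch() {
        return new EscapeHatch();
    }

    final class EscapeHatch {
        private EscapeHatch() { }

        // Skips validation; meant only for rare cases such as data migrations.
        void addUnvalidated(String name) {
            names.add(name);
        }
    }
}
```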

I think of this concept like how governments can shape economic policy through subsidies and taxes: make what you want people to do cheap; make what you don't want people to do expensive.

Though not exactly an affordance, I also consider peer pressure a related way to invisibly nudge people in the right direction. I don’t think I’m alone when I say that the first thing I do in a codebase is look around and try to copy the local style and logic. If someone asks me to add a button that makes a network request, I’m going to find another button that does it first, copy and paste, then edit. If there are 50 different ways to write that code, well, I hope I found the right one to copy; if there’s just one, then I’m going to copy the right method. Consistency creates a flywheel for itself.

I love affordances because they guide people without them being consciously aware of it. A lot of the correctness strategies I’ll discuss later are more heavy-handed and obtrusive; affordances are gentle and invisible.

Their main downside is that affordances and peer pressure can only guide, not restrict. These strategies are most useful when you can’t stop someone from doing the wrong thing because the coding language/framework is too permissive, you need to provide exceptions for rare cases, or you’re dealing with human processes (and anything can go off the rails there).

Software Checks

Software checks are when code can check itself for correctness.

If you’re anything like me, you’ve just started skimming this section because you think I’m gonna be talking about unit tests. Well… okay, yes, I am, but software checks are so much more than unit tests. Unit tests are just one form of a software check, but there are many others, such as the compiler checking syntax.

What interests me here is the timing of each software check. These checks can happen as early as when you’re writing code or as late as when you’re running the app.

The earlier you can get feedback, the better. Fast feedback creates a tight loop - you forget a semicolon, the IDE warns you, you fix it before even compiling. By contrast, slow feedback is painful - you’ve just released the latest version of your app and oops, it’s crashing for 25% of users, it’ll be at least a day before you can roll out a fix, and you’ll have to undo some architecture choices along the way.

Let’s look at the timing of software checks, from slowest to fastest:

The slowest software check is a runtime check, wherein you check for correctness as the program is running. Collecting analytics/crash data from your software as it runs is good for finding problems. For example, in OkHttp, each Call can only be used once; try to reuse it and you get an exception. This check is impossible to make before running the software.

There are big drawbacks to runtime checks: your users end up being your testers (which won’t make them happy) and there’s a long turnaround from finding a problem to deploying a fix (which also won’t make your users happy). It’s also an inconsistent way to test your code - there might be a bug on a code path that’s only accessed once a month, making the feedback loop even slower. Runtime checks are worth embracing as a last resort, but relying on them alone is poor practice.

The next slowest software check is a manual test, where you manually execute code that runs a check. These can be unit tests, integration tests, regression tests, etc. There can be a lot of value in writing these tests, but you have to foster a culture of testing (since it takes time & effort to write and verify the correctness of tests). I think it’s worth investing in these sorts of tests; in the long run, good tests not only save you effort but also force you to architect your code in (what I consider) a far superior way.

One step up from manual tests are automated tests, which are just manual tests that run automatically. The core problem with manual tests is that they require someone to remember to run them. Why not make a computer remember to do it instead? Bonus points if failed checks prevent something bad from happening (e.g. blocking code merges that break the build).

Next up are compile time checks, wherein the compilation step checks for errors. Typically this is about the compiler enforcing its own rules, such as static type safety, but you can integrate so much more into this step. You can have checks for code style, linting, coverage, or even run some automated tests during compilation.

Finally, the fastest feedback is given at design time, where your editor itself tells you that you made a mistake while you are writing code. Instead of finding out you misnamed a variable during compilation, the editor can instantly tell you that there’s a typo. Or when you’re writing an article, the spellchecker can find mistakes before you post the article online. Much like compile time checks, while these tend to be about syntax errors, you can sometimes insert your own design-time style/lint/etc. checks.

While fast feedback is better, the faster timings tend to constrain what you can test. Design-time checks can only check specific bits of logic, whereas runtime checks can cover basically anything your software can do. In my experience, while it’s easier to implement runtime checks, it’s often worth putting in a bit of extra effort to make those checks happen earlier (and run more consistently).
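To contrast the two ends of that spectrum, here is a minimal Java sketch. `OneShotCall` is a hypothetical class that enforces a one-use rule like the one described for OkHttp's `Call` (it is not OkHttp's implementation), so misuse is only detected when the second `execute()` actually runs. `CardId`, `BoardId`, and `Cards.move` are also hypothetical: wrapping raw strings in distinct types turns a swapped-argument bug into a compile error instead.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Runtime check: a hypothetical one-shot call. Misuse is only detected when
// the second execute() actually runs, possibly on a user's device.
final class OneShotCall<T> {
    private final Supplier<T> request;
    private final AtomicBoolean executed = new AtomicBoolean(false);

    OneShotCall(Supplier<T> request) {
        this.request = Objects.requireNonNull(request);
    }

    T execute() {
        if (!executed.compareAndSet(false, true)) {
            throw new IllegalStateException("This call was already executed");
        }
        return request.get();
    }
}

// Compile-time check: hypothetical wrapper types. Both hold a String, but
// the compiler treats them as unrelated types.
final class CardId {
    final String value;
    CardId(String value) { this.value = Objects.requireNonNull(value); }
}

final class BoardId {
    final String value;
    BoardId(String value) { this.value = Objects.requireNonNull(value); }
}

final class Cards {
    // With two String parameters, swapped arguments would compile and only
    // misbehave at runtime. Here, Cards.move(board, card) does not compile.
    static String move(CardId card, BoardId destination) {
        return "moved " + card.value + " to " + destination.value;
    }
}
```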

Constraints

Constraints make it so that the only path is the correct one, such that it is impossible to do the wrong thing. Let’s look at a few cases:

Enums vs. strings. If you can constrain to just a few options (instead of any string) it makes your life easier. For example, people are often tempted to use stringly-typing when interpreting data from server APIs (e.g. “card”, “board”, “list”). But strings can be anything, including data that your software is not able to handle. By using an enum instead (CARD, BOARD, LIST) you can constrain the rest of your application to just the valid options.

Stateless functions vs. stateful classes. Anything with state runs the risk of ending up in a bad state, where two variables are in stark disagreement with each other. If you can execute the same logic in a self-contained, stateless function, there’s no risk that some long-lived variables can end up out of sync with each other.

Pull requests vs. merging to main. If you let anyone merge code to main, then you’ll end up with failing tests and broken builds. By requiring people to go through a pull request - thus giving continuous integration a chance to run - you can force better habits in your codebase.
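The stateless vs. stateful case above can be illustrated with a small, hypothetical checklist in Java. The stateful version keeps a `completed` counter alongside the items, and one forgetful method is enough to make them disagree; the stateless version derives the count from the items every time, so there is nothing to fall out of sync.

```java
import java.util.ArrayList;
import java.util.List;

// Stateful: `completed` duplicates information already held in `items`,
// so every method has to remember to keep the two in step.
final class ChecklistStateful {
    final List<Boolean> items = new ArrayList<>();
    int completed = 0;

    void add(boolean done) {
        items.add(done);
        if (done) {
            completed++;
        }
    }

    void remove(int index) {
        items.remove(index);
        // Bug: `completed` is not adjusted, so the object is now in a bad state.
    }
}

// Stateless: the count is derived from its input on every call, so there is
// no second copy of the truth that could disagree with the first.
final class ChecklistMath {
    static int completed(List<Boolean> items) {
        int count = 0;
        for (boolean done : items) {
            if (done) {
                count++;
            }
        }
        return count;
    }
}
```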

Not only can constraints guarantee correctness, they also limit how much you need to wrap your mind around. Instead of needing to consider every string, you can consider a limited number of enum values. In the same vein, they also limit the number of tests you need to cover your logic.
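A minimal Java sketch of the enums vs. strings case. `ModelType` and `Icons` are hypothetical, and both the case-insensitive matching and mapping unrecognized server values to `UNKNOWN` are assumptions about how you might handle input this build doesn't understand. The string is parsed once, at the edge of the app, and everything downstream works with a handful of known values.

```java
import java.util.Locale;

enum ModelType {
    CARD, BOARD, LIST, UNKNOWN;

    // Parse once, at the boundary. Matching ignores case (an assumption), and
    // anything unrecognized becomes UNKNOWN instead of leaking further in.
    static ModelType fromServer(String raw) {
        if (raw == null) {
            return UNKNOWN;
        }
        switch (raw.toLowerCase(Locale.ROOT)) {
            case "card": return CARD;
            case "board": return BOARD;
            case "list": return LIST;
            default: return UNKNOWN;
        }
    }
}

final class Icons {
    // Downstream code only ever sees enum values, never arbitrary strings.
    // The icon names are hypothetical placeholders.
    static String iconFor(ModelType type) {
        switch (type) {
            case CARD: return "card-icon";
            case BOARD: return "board-icon";
            case LIST: return "list-icon";
            default: return "generic-icon";
        }
    }
}
```

With a Java 14+ switch expression and no `default`, the compiler will also reject a `switch` that misses one of the enum's constants, which is the "fewer cases to consider" benefit in action.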

Automation

When you automate, a computer does everything for you. This is like a constraint but better, because people don’t even have to do anything. You only have to write the automation once, then the computers will take over doing your busywork.

One effective use of this strategy is code generation. A classic example is Java POJOs, which don’t come with meaningful equals(), hashCode(), or toString() implementations. In the old days, you had to write these by hand; these implementations would quickly go stale as you modified the POJO’s fields. Now, we have libraries like AutoValue (which generate implementations based on annotations) or languages like Kotlin (which generate implementations as a language feature).
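To see the staleness problem concretely, here is a hypothetical hand-written Java POJO in which an `archived` field was added later without updating `equals()`, `hashCode()`, or `toString()`. Nothing fails to compile; two cards that differ only in `archived` simply compare as equal.

```java
import java.util.Objects;

// Hypothetical POJO. `archived` was added later, but nobody updated the
// three hand-written methods below to match.
final class Card {
    final String id;
    final String name;
    final boolean archived; // added later

    Card(String id, String name, boolean archived) {
        this.id = id;
        this.name = name;
        this.archived = archived;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Card)) return false;
        Card other = (Card) o;
        // Stale: `archived` is never compared.
        return Objects.equals(id, other.id) && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, name); // Stale as well.
    }

    @Override
    public String toString() {
        return "Card{id=" + id + ", name=" + name + "}"; // And here.
    }
}
```

Java 16 and later offer the same kind of automation through records: declaring `record Card(String id, String name, boolean archived) {}` has the compiler generate all three methods from the components, so they can't drift out of date this way.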

Continuous integration is another great automation strategy. Having trouble remembering to run all your checks before merging new code? Just get CI to force you to do it by not permitting a merge until you pass all the tests. You can even have CI do automatic deployments, such that you barely have to do anything between merging code and releasing it.
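The shape of that gate can be sketched in a few lines of Java. `CheckRunner` is hypothetical (a real project would use JUnit or similar, driven by its build tool); the point is that every check runs on every change and the result becomes something a machine can act on. A `main` method would end with `System.exit(failures == 0 ? 0 : 1)`, and CI systems treat a non-zero exit code as a failed job, which, combined with a rule that the job must pass, is what blocks the merge.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical, minimal check runner. It runs every registered check and
// reports how many failed; an exception thrown by a check counts as a failure.
final class CheckRunner {
    private final Map<String, BooleanSupplier> checks = new LinkedHashMap<>();

    CheckRunner add(String name, BooleanSupplier check) {
        checks.put(name, check);
        return this;
    }

    int runAll() {
        int failures = 0;
        for (Map.Entry<String, BooleanSupplier> entry : checks.entrySet()) {
            boolean passed;
            try {
                passed = entry.getValue().getAsBoolean();
            } catch (RuntimeException e) {
                passed = false;
            }
            System.out.println((passed ? "PASS " : "FAIL ") + entry.getKey());
            if (!passed) {
                failures++;
            }
        }
        // A main method would finish with System.exit(failures == 0 ? 0 : 1)
        // so that CI sees the failure.
        return failures;
    }
}
```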

There are two main drawbacks of automation. The first is that it’s expensive to write and maintain, so you have to check that the payoff is worth the cost. The second is that automation can do the wrong thing over and over again, so you have to be careful to check that you implemented the automation correctly in the first place.

Analysis

Now that we’ve reviewed the strategies, allow me to demonstrate how we use them in the real world.

For any given problem, you should take a step back and figure out which of these strategies to apply (if any) before committing to a solution. You’ll probably end up with a mix of strategies, not just one. For example, it’s rarely the case that you can just implement constraints or automation without also documenting what you did.

There are a few meta-considerations to take into account as well:

First, while rigid solutions (like constraints or automation) are better for correctness, they are worse for flexibility. They are expensive to change after implementation and unforgiving of exceptions. Thus, you need to balance correctness and flexibility for each situation. In general, I lean towards flexibility early on, then move towards correctness as necessary.

Second, you might implement correctness badly. You can have flaky software checks, overbearing code contribution processes, difficult automation maintenance, or no escape hatches for new features or exceptions. Correctness is an investment, and you need to make sure you can afford to invest in it and maintain it.

Last, you need buy-in from your teammates. I tend to make the mistake of thinking that because I like a solution, everyone else will also like it, but that’s definitely not always the case. If you get buy-in from others, correctness is easier to implement (especially for team processes); people will go along with your plans, or even pitch in ideas to improve them.

Disagreements, on the other hand, can lead to toxicity, such as people ignoring or purposefully undermining your creation. At my first job they tried to implement a code style checker that prevented merges, but didn't have a plan for how to fix old files. There was no automatic formatter (because it was a custom markup language), so no one ever wanted to fix the big files; instead everyone just kept using a workaround to avoid the code style checker! Whoops!

Taking some time to gather evidence, then presenting the case to your coworkers, can make a world of difference.

Now, let’s look at a few examples and analyze them…

Code Style

For example, how do you get everyone to consistently use spaces over tabs?

❌ Institutional knowledge - Bad; this doesn’t prevent people from going off the code style at all.

❌ Documentation - Just as bad as institutional knowledge, but written down.

✅ Affordances - Semi-effective. You can configure your editor to always use spaces instead of tabs. Even better, some IDEs let you check a code style definition into source control so everyone is on the same page style-wise. However, in terms of correctness, it guides but doesn’t restrict.

✅ Software checks - Using lint or code style checkers to verify code style is a great use of CPU cycles. People can’t merge code that goes off style with this in place.

❌ Constraints - Not really possible from what I can tell. I’m not sure how you’d enforce this - ship everyone keyboards without the tab key?

❌ Automation - You could have some tool automatically rewrite tabs to spaces, but honestly this gives me the heebie-jeebies a bit!

In the end, I like enforcing your style with software checks, but making it easier to avoid failures with affordances.
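A minimal sketch of such a software check, in Java. `TabCheck` is hypothetical; in practice you would reach for an existing linter or formatter check rather than writing your own. It only inspects each line's indentation and reports tab-indented lines with a file name and line number, so an empty result means the file passes and a non-empty one can fail the build.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal style check: flag every line whose indentation
// contains a tab. Only leading whitespace is inspected, so tabs later in a
// line (inside strings or comments) are not reported.
final class TabCheck {
    static List<String> violations(String fileName, String source) {
        List<String> problems = new ArrayList<>();
        String[] lines = source.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            for (int j = 0; j < line.length(); j++) {
                char c = line.charAt(j);
                if (c == '\t') {
                    problems.add(fileName + ":" + (i + 1) + ": indent with spaces, not tabs");
                    break;
                }
                if (c != ' ') {
                    break; // reached the end of the indentation
                }
            }
        }
        return problems;
    }
}
```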

Code Contribution to an OSS Project

How do people contribute code to an open source codebase? If you’ve got a particular process (like code reviews, running tests, deploying), how do you ensure those happen when a random person donates code?

❌ Institutional knowledge - Impossible for strangers.

✅ Documentation - If you write solid instructions, you can create a more welcoming environment for anyone to contribute code. However, documentation alone will not result in a reliable process, because not everyone reads the manual.

✅ Affordances - There’s plenty you can do here, like templates for explaining your code contribution, or giving people clear buttons for different contributor actions (like signing the contributor license agreement).

✅ Software checks - Having plenty of software checks in place makes it much easier for people to contribute code that doesn’t break the existing project.

✅ Constraints - Repository hosts let you put all sorts of nice constraints on code contribution: prevent merging directly to main, require code reviews, require contributor licenses, require CI to pass before merging.

✅ Automation - CI is necessary because it feeds information into the constraints you’ve set up.

For this, I use a mix of all the different strategies to try to get people to do the right thing.

Cleaning Streams

Let’s revisit the story from the beginning of this article - how to clean up resources in reactive streams of data (specifically with RxJava).

❌ Institutional knowledge - You can teach people to clean up streams, but they will forget.

❌ Documentation - No more correct than institutional knowledge, just easier to spread the information.

✅ Affordances - We used an RxJava tool called CompositeDisposable to clean up a bunch of streams at once. AutoDispose adds easier ways to clean up streams automatically as well. However, all these solutions still require remembering to use them.

✅ Software checks - We added RxLint to verify that we actually handle the returned stream subscription. However, this does not guarantee you avoid a leak, just that you made an effort to avoid it. If you’re using AutoDispose, it provides a lint check to make sure it’s being used.

✅ Constraints - I’m pretty excited by Kotlin coroutines’ scopes here. Instead of putting the onus on the developer to remember to clean up, a coroutine scope requires that you define the lifespan of the coroutine.

❌ Automation - Knowing when a stream of data is no longer needed is something only humans can determine.

What strategy you use here depends on the library. The best solution IMO is constraints, where the library itself forces you to avoid leaks. If you’re using a library that can’t enforce it (like RxJava), then affordances and software checks are the way to go.
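To show what a scope-style constraint buys you, here is a stdlib-only Java sketch. `Scope` is hypothetical and is neither RxJava's `CompositeDisposable` nor Kotlin's `CoroutineScope`; it borrows the idea from both: work can only be started through a scope, and closing the scope cancels everything it started. Pairing it with try-with-resources ties the lifespan to a block, so there is no per-subscription cleanup call to forget.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scope: the only way to start work is through launch(), and
// close() cancels everything that was started, in one place.
final class Scope implements AutoCloseable {
    private final List<Runnable> cancellations = new ArrayList<>();
    private boolean closed = false;

    // The caller never receives a handle it could forget to dispose.
    void launch(Runnable start, Runnable cancel) {
        if (closed) {
            throw new IllegalStateException("Scope is already closed");
        }
        start.run();
        cancellations.add(cancel);
    }

    int active() {
        return cancellations.size();
    }

    @Override
    public void close() {
        if (closed) {
            return;
        }
        closed = true;
        for (Runnable cancel : cancellations) {
            cancel.run();
        }
        cancellations.clear();
    }
}
```

In an Android app the scope would be closed from a lifecycle callback rather than a try block, but the constraint is the same: the lifespan is declared once, up front.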

Conclusion

Obviously, not every option is available for every problem - you can’t automate your way out of all software development! However, at its core, the fewer choices people have to make, the better for correctness. Free people’s minds up for what really matters - developing software, rather than wrestling with avoidable mistakes.