IEEE Software - The Pragmatic Designer: Healthy Code Reveals the Problem and Solution

This column was published in IEEE Software, The Pragmatic Designer column, Sep-Oct 2019, Vol 36, number 5.

ABSTRACT: Source code reveals abstractions from two places: the problem and the solution. It’s easier to design and evolve a system when you understand each of them separately before you combine them in code. With skill, it’s possible to separate those concerns in the code. Declarative understanding of the abstractions is the most useful and easy to convey. However, current software development processes rarely guide developers to do this.


Pre-publication draft. Please click this official link so your view counts in the IEEE's records of article views – plus the IEEE site has professionally typeset PDFs.

A while ago, I was involved with a project that followed a typical software development process that included iterations, standup meetings, refactoring, and retrospectives. The process had no guidance on technical practices other than a code style guide, which mostly covered syntactic issues such as variable and method names. Based on the surveys I have read, this is what many teams across our industry are doing.

For a while things went just fine. There were no obvious danger signs: the user interface worked well, deployments happened on a weekly cadence without much drama, and features were being added at an acceptable rate. That is, if you were a manager, there were no obvious danger signs.

Inevitably, there was a change in the requirements: we shifted from building software for consumers to building it for businesses. Other than that, the problem domain was the same and the features were largely unchanged. However, the change was hard to accommodate: it required deep surgery on the code to change assumptions embedded there, and it therefore took several quarters.

Reflecting on what happened, I think we had only a limited understanding of the problem domain and what we knew was not well expressed in the code. It was easy to see that the code did things with data structures but it was hard to turn that into declarative understanding such as “customers have at most one shipping address”. It was similarly difficult to understand the concepts in the solution domain, for example how asynchronous jobs were meant to behave. What we did understand boiled down to how to handle specific cases. We could reason about those cases using the program’s data structures, not using the concepts in the problem domain.

It took a long time to handle the new requirements because we had to first re-acquire clarity about the concepts in the problem and solution domains. This was an undesirable and vulnerable position that no team wants to be in. In this article, let’s dig in a bit more to understand how the problem domain, solution domain, and the code must line up if we hope to react to changing requirements.

Problem domain and solution domain

It’s standard design advice to separate concerns, for example to separate what we know about the problem from what we know about its solution. Our problem domain could include customers, accounts, and addresses while our solution could include web browsers, multithreaded application servers, and relational databases. Being a software developer means creatively intersecting these two domains to create things like CustomerTables and Address DOM elements.

When there is a change to the requirements and we learn that customers can now have multiple shipping addresses, it clutters our reasoning to include details like database tables and browser DOM elements. That is, it’s easier to reason about this change if we have separated the concern of the problem domain and we can limit our thinking to customers, accounts, and addresses. Once we understand the implications of the change there, we can move on to include the technology we used in programming the solution.

Changes will happen in the solution domain too. A while ago, the Java programming language added lambdas and I, like many developers, changed my code to take advantage of them. Again, it was easier to focus my thinking on the separated concern first and, once the implications of the change were clear, move on to the intersection of the problem and solution.

Use the Principle of Least Expressiveness

When I write code, I’m trying to reveal what I know via the medium of code. I’m hoping that the ideas that are in my head as I write the code also appear in yours as you read it. That’s a difficult task and just wishing it to happen is not enough.

I try to express what I know about the problem and solution in the least expressive way I can. For example, don’t use a method when you can use a state machine or lookup table. That’s the Principle of Least Expressiveness (PLE) that I wrote about in an earlier article [4]. So instead of writing methods or functions that respect the idea that customers can have at most one shipping address, I make that idea explicit by declaring a data structure named Customer that has an optional field called ShippingAddress. My code would behave correctly written in either style, but the less expressive code is more likely to convey my understanding to readers.
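
As a minimal sketch of the shipping-address example, assuming Python and hypothetical names (Address, Customer, shipping_address, shipping_label) that come from no real project, the rule “at most one shipping address” can be stated in a declaration rather than enforced by scattered methods:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Address:
    street: str
    city: str


@dataclass(frozen=True)
class Customer:
    name: str
    # Declares the domain rule directly: zero or one shipping address.
    shipping_address: Optional[Address] = None


def shipping_label(customer: Customer) -> str:
    """Format a label, or report that no shipping address is on file."""
    if customer.shipping_address is None:
        return f"{customer.name}: no shipping address on file"
    addr = customer.shipping_address
    return f"{customer.name}, {addr.street}, {addr.city}"
```

If the requirement later changes to allow several shipping addresses, the change would show up as one edited field declaration, which is where a reader is likely to look for it.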

Another example: Instead of writing code using Turing-complete mechanisms that discount the invoice total if the customer has a long order history, I define that the Customer has one of two statuses (Normal or Premium), I define a relationship between the order history and being a Premium Customer, and I define a relationship between Premium customers and an Invoice Discount. Again, either way of writing the code would compute invoices correctly, but the style of writing code that favors definitions and declarations more directly expresses my ideas about the problem domain.
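
The Premium customer example might be sketched as follows in Python. The threshold of 10 orders and the 5 percent discount are invented for illustration, as are all of the names; the point is only that the status, the rule for earning it, and the discount it grants each get their own named declaration:

```python
from dataclasses import dataclass
from enum import Enum


class CustomerStatus(Enum):
    NORMAL = "normal"
    PREMIUM = "premium"


# Assumed value for illustration only; the article gives no numbers.
PREMIUM_ORDER_THRESHOLD = 10

# Lookup table: the relationship between status and invoice discount.
# The percentages are likewise assumptions.
INVOICE_DISCOUNT_PERCENT = {
    CustomerStatus.NORMAL: 0,
    CustomerStatus.PREMIUM: 5,
}


@dataclass(frozen=True)
class Customer:
    name: str
    completed_orders: int

    @property
    def status(self) -> CustomerStatus:
        # The relationship between order history and Premium status.
        if self.completed_orders >= PREMIUM_ORDER_THRESHOLD:
            return CustomerStatus.PREMIUM
        return CustomerStatus.NORMAL


def invoice_total_cents(customer: Customer, subtotal_cents: int) -> int:
    """Apply the discount that the customer's status entitles them to."""
    percent = INVOICE_DISCOUNT_PERCENT[customer.status]
    return subtotal_cents * (100 - percent) // 100
```

A single function that inspected the order count and multiplied the subtotal would compute the same totals, but it would never name Premium status or the invoice discount.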

It’s certainly possible that a reader would have accurately inferred the problem domain with either style of expression in the code. However, I think more people will receive my intended ideas about the problem domain if they read code that follows the PLE.

One reason for that is names. We can write code that never gives a name to the concepts of Premium Customer or an Invoice Discount. However, giving names to ideas helps our readers see the problem domain the same way we do. Expressing those names through code declarations is better than expressing them through code comments or Turing-complete code, like methods or functions.

Preserving what was known

Even when using the PLE, we cannot fully separate the problem and solution concerns, so they will be tangled in our source code. Once they are tangled, can we ever see either of them clearly again?

Not unless we take care. It’s quite easy to write code that ensures the tangling of problem and solution is a one-way street. The code can so tightly connect the problem and solution that from then on the only way to evolve it is to intellectually master the quirky code and start your reasoning from there.

Perhaps you have been on projects where new requirements are expressed in terms of the program’s data structures. I certainly have and it was because the only way to reason about those programs was in terms of the quirky abstractions in the code rather than problem domain concepts.

So there is a real danger that early in the project we had clear knowledge about the problem domain but once we started coding we lost that clarity. But as described above, with effort it’s possible to write code that preserves our clear domain understanding. What’s more, by using the PLE our understanding of the domain may grow over time, expressing declaratively ideas that were originally implicit.

Reasoning about the intersection

When we intersect the problem and solution, interesting things happen that aren’t present in either domain. Consider a web application that handles customers with an HTTP request to create a new customer. In the problem domain, either a customer exists or it doesn’t, which maps neatly to the time before the request and the time after the request.

In our solution, we’d like to ensure the data we’ve been handed adds up to a valid customer, but it takes longer to validate the data than we’d like the HTTP request to be open. To satisfy that, our web application creates the customer during the request and then schedules the validation to run asynchronously. That way the web request is answered quickly and the customer data still gets validated.

Notice, however, that the customer as we’ve built it now has some states: the customer is pending when we first save it, then either valid or invalid once the validation completes. Those states arise only because of details in the solution domain. Our idea of a customer in the problem domain is never pending.

Now imagine someone reading the source code and trying to infer how the problem domain works. There can be states of the customer that arise in the problem domain, for example regular and premium customers. So it’s a challenge to read the code and conclude that regular vs premium is a concept from the domain but pending vs valid is found only in the code.
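
One way to ease that reading task, sketched here in Python with hypothetical names, is to declare the two kinds of state separately and say in the declaration which domain each one comes from. The transition table assumes that valid and invalid are final states, which the scenario above does not spell out:

```python
from dataclasses import dataclass, replace
from enum import Enum


class CustomerTier(Enum):
    """Problem domain: a distinction the business itself makes."""
    REGULAR = "regular"
    PREMIUM = "premium"


class ValidationState(Enum):
    """Solution domain: exists only because validation runs asynchronously."""
    PENDING = "pending"
    VALID = "valid"
    INVALID = "invalid"


# Allowed transitions, written as a table instead of branching logic.
# Assumption: VALID and INVALID are terminal.
VALIDATION_TRANSITIONS = {
    ValidationState.PENDING: {ValidationState.VALID, ValidationState.INVALID},
    ValidationState.VALID: set(),
    ValidationState.INVALID: set(),
}


@dataclass(frozen=True)
class CustomerRecord:
    name: str
    tier: CustomerTier = CustomerTier.REGULAR
    validation: ValidationState = ValidationState.PENDING


def finish_validation(record: CustomerRecord, result: ValidationState) -> CustomerRecord:
    """Return a copy of the record with the asynchronous job's outcome applied."""
    if result not in VALIDATION_TRANSITIONS[record.validation]:
        raise ValueError(f"cannot go from {record.validation.name} to {result.name}")
    return replace(record, validation=result)
```

With this split, a reader can see that tier is something to discuss with the business, while validation is an artifact of how the web application was built.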

What about literate programming?

You may have heard about Literate Programming [1]. When writing the TeX text processing system, Donald Knuth wanted to not only write code that worked, but also explain to readers why it worked. He invented a new programming language that allowed him to interleave textual descriptions with the source code. This language was then compiled into documentation and executable code.

Despite continued interest in literate programming, it seems that few programs are written in this style, perhaps because it requires special pre-processing to turn it into executable programs. Simply making an effort to clearly reveal the problem and solution domains in your code does not mean you are doing literate programming.

Donald Knuth mentions that it was so fun to write programs in a literate style that he was tempted to go back and rewrite existing code. He also says that his programs in this style are better programs, in part because the style encourages him to do a good job explaining the program to people reading it. He does not highlight the declarative nature of his explanations but I notice it in the literate programs I’ve seen.

What about expressing design intent?

Software developers are often advised to express design intent in the code. Broadly speaking, design intent covers anything the code author was thinking. In a narrower sense it usually omits domain details, especially those that aren’t needed in the current design but would be helpful if requirements changed just slightly. So even if someone did a great job expressing their design intent, you might not learn what you want to know about the problem domain.

The best expression of design intent is declarative. Kent Beck, who has been a driving force behind agile software development, said it this way: “Another principle behind the implementation patterns is to express as much of my intention as possible declaratively.” [2]

What about refactoring?

Refactoring is necessary but not sufficient. As requirements change and your understanding of the problem and solution evolves, you will need to refactor or else your code will always reveal your ideas from day one, not your current best understanding.

Refactoring is often described as a mechanical activity that removes duplicated or redundant code. While it’s true that mechanical refactoring can clean up the code and enable you to understand things better, you must go beyond mechanical refactoring if you want the code to reveal ideas that are actually simpler or a better fit for the problem or solution.

One such refactoring is the use of PLE to simplify code into a less expressive form. This may require you to invent and name ideas that are implicit in the code because you are shifting the expression from operational to declarative. This kind of refactoring is incredibly valuable, yet I’ve noticed that it’s easy to take it for granted. Once an idea exists, it’s hard to imagine how you thought before the idea. So, in a perverse twist, those who do the hardest refactoring work can be easiest to overlook.

What about functional programming?

Functional programming starts from a better position because it’s easy to make declarative statements, but it is not a magic wand. Functional languages let us express functions to compute what we want but they do not lead us to express why that’s the right thing to do or what we know about the problem domain. I have seen functional code that does an excellent job of separating the problem and solution, but I’ve also seen it impossibly tangled together.

What about Domain Driven Design?

Like what I’m suggesting here, Domain Driven Design (DDD) emphasizes that the software team pay attention to the problem domain, seek out and evolve domain abstractions, and reveal them in the source code. It has a set of patterns that are suitable for Information Technology (IT) systems but probably not all kinds of software. To my knowledge it has no connection to the Principle of Least Expressiveness. So, if you are a fan of DDD then treat this as strong support for the core idea.

Isn’t the problem domain the product owner’s job?

Michael Jackson said it better but I’ll paraphrase: the buck stops here [3]. It’s reasonable to ask the product owner to build a clean set of abstractions for the problem domain. However, in my experience, product owners rarely have the skills to deliver descriptions of the problem domain that programmers can use directly.

The precision needed to write an essay or user stories is less than what’s needed to write code. Unless the product owner is using a formal or quasi-formal notation that helps identify fuzzy thinking, it’s unlikely that the product owner’s prose is going to meet your needs as a software developer. So in the best case you will be filling the gaps and in the worst case you will be doing all of the work.

Development process

In this article, I’ve talked about what can happen when projects follow typical software development processes that do not guide them to express the problem and solution domains clearly. If we hope to react quickly to changing requirements, we must align our code with our best understanding of the problem and solution. If something seems like a small change in the problem domain, it should translate into a small change in the code.

I can imagine a team full of highly experienced developers following a simple process based on stories and iterations who keep their code in good shape. They would be applying hard-won lessons from their experience, refactoring their code as they progressed. The process itself wouldn’t tell them to do this, but their shared values would make that possible.

What I see in practice is that what gets measured gets done. If the team tracks story points and the movement of stories on a Kanban board, then that is what will be done. It’s hard to track and measure the separation of concerns or refactoring using the PLE, so those don’t get done well or at all.

I have seen people argue that the team’s development process should aim directly for business value. That sounds like a moral imperative because of course we want to do the thing the business needs. Any suggestion of an alternative path, for example that the team should create declarative models of the problem and solution domains, would seem at face value to be a distraction.

Yet that is exactly the nature of process. There’s no need for a process when our gut instincts yield perfect results. We need processes to tell us to do something non-obvious, something that’s been learned through experience.

References

  1. Donald Knuth, Literate Programming, The Computer Journal, vol. 27, no. 2, pp. 97-111, May 1984.
  2. Kent Beck, Implementation Patterns, Addison-Wesley, 2007.
  3. Michael Jackson, “The world and the machine”, Proc. 17th Int. Conf. Software Engineering (ICSE95), pp. 283-292, 1995.
  4. George Fairbanks, The Principle of Least Expressiveness, IEEE Software, The Pragmatic Designer, vol. 36, no. 3, pp. 116-119, May/June 2019.