IEEE Software - The Pragmatic Designer: Steering Software Qualities

This column was published in IEEE Software, The Pragmatic Designer column, July-August 2025, Vol 42, number 4.

ABSTRACT: Developers want to steer the quality attributes in their system. For example, they want to keep latency low, security high, and the code maintainable. To achieve this, they typically roll up their sleeves and get to work rather than deliberately employing design techniques that would steer the desired qualities. That’s because the connection between software design and quality attributes is not widely understood. This column explains the connection and shows how software design can steer software qualities.

Pre-publication draft. Please click this official link so your view counts in the IEEE's records of article views – plus the IEEE site has professionally typeset PDFs.

Engineering is complicated, so it’s easy to overlook a basic truth: The qualities that a system exhibits derive primarily from its design. But there’s a catch: We influence these qualities indirectly. So, if a bridge withstands high winds safely, you can thank the engineers who used abstractions. Yes, the bridge is constructed from bolts and steel, but stresses and strains guided how those bolts and steel could make it resilient.

Formal study of software architecture abstractions started in the 1990s and revealed how design was connected to qualities. In hindsight, it’s now clear that the reason a software developer would pay attention to architecture is precisely because it can steer quality attributes. To steer something, you need to be able to grasp a control, like a rudder, commanding it to do your bidding. Software architecture reveals the essential abstractions that give you the ability to steer.

Most developers do not learn this in school, and developer-oriented articles and blog posts often cast abstraction as the villain, not the hero. Because developers don’t understand how to steer the qualities of their software, they try things that don’t work so well. Software stubbornly ignores job titles, developer enthusiasm, and story points. This column provides a capsule understanding of how you can steer the qualities in your system using software design.

Sidebar: What's in a name?

People care about what their software does (printing, sorting, etc.) and how well it does it (security, usability, latency, etc.). Let's call these two things features and qualities. Both terms have many synonyms and longer names, such as functionality, characteristics, and quality attributes.

You have probably heard one synonym more than others: non-functional requirements (NFRs). Why do we instead say quality attributes? Let’s take it word-by-word. If you say something is non-functional, you’re saying it’s broken. Maybe you could say extra-functional, but that’s even clunkier. The other word is requirements. Are qualities requirements? If you compare two designs, one might be faster and the other more secure. These are characteristics that you can see in the designs, not specifications you demand. Both systems could meet your speed requirements, yet one has the quality of being faster. When you say one is faster, you are describing, not prescribing. The term quality attributes goes back to at least the 1950s when it was used by the American Society for Quality Control.

Steering at Netflix

In the 2000s, Netflix had a problem: when one of their services slowed down, its callers repeated their requests, which slowed the service down further. A vicious cycle. They fixed this with the Chaos Monkey [1], which, counter-intuitively, improved reliability by stopping services that were running in production. Reliability emerges from that technique because rare overload events become commonplace, so developers handle them, and the system becomes more reliable as a result. It’s ingenious.

What’s less well-known is that Netflix developers already had a library that could handle the service overload problem, but they ignored that library. Overload conditions were rare enough that developers didn’t seek out solutions. Netflix could have changed developer behavior in various ways, such as adding compliance steps to their process, sending developers to training, or punishing them harshly. Chaos Monkey changed developer behavior by making failures more common, thereby making the reliability problem visible. It’s even more ingenious.
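The column does not say what the Netflix library actually did, so the following is only an illustration of the general idea: caller-side code that stops repeating requests to a struggling service. It is a minimal circuit breaker sketch in Python. The names `CircuitBreaker`, `CircuitOpenError`, `max_failures`, and `reset_after` are hypothetical and are not taken from any Netflix code.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the downstream service."""


class CircuitBreaker:
    """Hypothetical caller-side guard; all names here are illustrative."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # failures tolerated before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None             # None means the breaker is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast instead of piling more load on a slow service.
                raise CircuitOpenError("downstream service is resting")
            # Cool-down elapsed: allow one trial call; one failure re-opens.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # a success resets the failure count
        return result
```

A caller would wrap each remote call, for example `breaker.call(fetch_recommendations, user_id)`, where `fetch_recommendations` stands in for any function that talks to another service. The point of the sketch is that such code only helps if developers actually use it, which is the problem Chaos Monkey addressed.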

Features are local, qualities are emergent. You can point to the code in Netflix that handles the streaming or account activation features. There’s no single place that handles qualities like security, usability, latency, or resiliency. To build a feature, you write some code. To influence a quality, you typically need to write code, but only after using abstractions that help you reason about how that quality can emerge from the system as a whole.

Netflix improved reliability by structurally changing the overall system. They created Chaos Monkey as a lever of control and they wrote a software library to handle overload conditions. In those actions, you can glimpse a general framework to steer software qualities: choose a strategy, create levers of influence, and apply design techniques. The next few sections describe that framework. The figures summarize the ideas and introduce terminology, so you may want to glance at them first, then return to the text.

Strategy

Figure 1 shows three broad strategies for steering software qualities. The first is named ad hoc because developers using this strategy work on qualities only when the need becomes acute. Some developers at Netflix appear to have been focused on feature development, oblivious to design techniques to achieve reliability.

Many teams create a stream of work that combines both feature and quality requests, prioritized by customer needs. Developers metaphorically roll up their sleeves and get to work on qualities, which works OK if they lucked into an architectural style that matched their needs, say a three-tier style for an IT system. Too often they are unlucky: quality needs become apparent long after the system design is set in stone.

A local strategy recognizes the need to steer qualities from the beginning. Developers may be advised to always make certain modules idempotent, or stateless, or free of side effects. The advice often takes the form: when X, don’t forget Y. If such advice were easy to follow, we would have many fewer buffer overruns and memory leaks.
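As a small illustration of the "when X, don’t forget Y" form of advice, here is a sketch of an idempotent module in Python. The class `PaymentLedger` and the `request_id` parameter are hypothetical, invented for this example. The "Y" that a developer must remember is the duplicate check at the top of `deposit`.

```python
class PaymentLedger:
    """Hypothetical module; the names are illustrative only."""

    def __init__(self):
        self.balance = 0
        self._seen = {}  # request_id -> balance after that request

    def deposit(self, request_id, amount):
        # The "don't forget Y" step: check the key before applying the change.
        if request_id in self._seen:
            return self._seen[request_id]
        self.balance += amount
        self._seen[request_id] = self.balance
        return self.balance


ledger = PaymentLedger()
ledger.deposit("req-1", 50)
ledger.deposit("req-1", 50)  # a retried request does not double the deposit
```

Nothing in the language or the build forces a developer to write that check in the next module, which is why a local strategy depends on vigilance.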

A structural strategy is the one I wish all developers understood. Often, you can set up initial conditions on a project so that it’s easy to achieve the qualities you want with little effort or vigilance. Chaos Monkey set up conditions by which Netflix became reliable without constant vigilance. It creates the opportunity for a few experts to own something complicated, such as the queueing theory needed to optimize message flow between services, removing that burden from most developers.

Figure 1: Strategies for achieving desired qualities

Few projects start with a clear understanding of quality goals and tradeoffs, e.g., latency is prioritized over maintainability. Those that do tend to work locally and with vigilance to achieve the quality. Few use a structural strategy: promoting desired qualities via the system’s architecture. [3]

Levers of influence

Trying to change the qualities in a system or organization feels like pushing Jello. You must first create levers of influence, then you can operate them. Before Chaos Monkey, Netflix had a clear desire to improve reliability – they had even written a library – but lacked a suitable lever of influence. If you are lucky, your organization already has some levers, but typically they are missing. There might also be obstacles: for example, your company may have an education curriculum that you cannot change.

Levers of influence exist in different scopes. Figure 2 shows three scopes: software architecture and design, the software lifecycle, and systems engineering. There is a power-generality trade-off between them [2]. Systems engineering is the most general, but the least powerful; software architecture is the most powerful but least general.

Figure 2: Three scopes: architecture & design, software lifecycle, and systems engineering

They exhibit a power-generality tradeoff [2]: Systems engineering has many levers of influence over quality attributes that apply to many problems. Software architecture has a few levers that act only on the software itself, but they affect quality attributes directly and forcefully.

Figure 3 shows several examples of levers of influence in systems engineering. Chaos Monkey, educating developers, and exerting management pressure to use the library are levers in the scope of systems engineering. Perhaps Chaos Monkey should be in the scope of software architecture or the lifecycle – it can be hard to categorize some levers.

Changing hiring practices and curriculum can have a big impact on the software, though indirectly and after a long time. And consider that what gets staffed gets done: are there quality-specific job roles such as user experience engineer and site reliability engineer?

  • Hiring different engineers
  • Education and curriculum
  • Choice of software development process or programming style
  • Build / buy: Outsource to third party with expertise
  • Role choices: Systems / requirements / operations engineers

Figure 3: Systems engineering levers of influence

Systems engineering is much broader than just software development. If people complain your elevators are too slow, a software engineer might investigate the scheduling algorithms while a systems engineer might put mirrors next to the elevators so people adjust their clothing and don’t notice the wait.

Between architecture and systems engineering is the software development lifecycle. Figure 4 shows several levers of influence in the software lifecycle (except design, detailed next). Every stage in the software development lifecycle is an opportunity to steer qualities. Do not skip past the requirements / analysis stage, as systems become tangled when quality priorities and tradeoffs are unknown or chosen inconsistently on each sub-team. Notice that Chaos Monkey by itself reduces reliability, but works great in combination with monitoring and alerting.

  • Requirements / analysis
    • “A problem well-defined is half-solved” -- Charles Kettering
  • Review
    • Process “gates” for compliance / governance
    • Code review
    • Static analysis “linters”
  • Testing
    • Regression testing
  • Deployment
    • Continuous integration & delivery automation
  • Monitoring
    • SLO monitoring and alerting (SLO targets = quality targets)

Figure 4: Software development lifecycle levers of influence

Each stage in the lifecycle is an opportunity to steer qualities. The design stage is elaborated in Figure 5.
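The monitoring lever in Figure 4 treats SLO targets as quality targets. A minimal Python sketch of that idea, assuming a latency SLO expressed as "a given fraction of requests finish within a threshold", might look like this. The function names, the 300 ms threshold, and the 99% target are assumptions chosen for illustration.

```python
def slo_compliance(latencies_ms, threshold_ms):
    """Return the fraction of requests that finished within the threshold."""
    if not latencies_ms:
        raise ValueError("no latency samples to evaluate")
    good = sum(1 for latency in latencies_ms if latency <= threshold_ms)
    return good / len(latencies_ms)


def should_alert(latencies_ms, threshold_ms=300, target=0.99):
    """Alert when measured compliance falls below the SLO target."""
    return slo_compliance(latencies_ms, threshold_ms) < target
```

For example, a window of 98 fast requests and 2 slow ones gives a compliance of 0.98, which falls below a 0.99 target and would raise an alert. The quality target is now something the team can observe rather than merely hope for.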

Software architecture and design has the most direct influence on qualities. Figure 5 shows four primary levers of influence in software architecture. Two classic levers of influence are architectural styles and tactics [3]. Both are large-scale patterns, with tactics typically nested within a style. Each style is known to promote (or inhibit) certain qualities. Once the style is chosen, tactics can be applied according to Attribute-Driven Design to further refine qualities.

More recently recognized levers include architectural hoisting [4] and thinking [5]. When you move a responsibility out of developers’ hands and into the infrastructure, that’s hoisting. Without hoisting, developers require vigilance. With hoisting, the infrastructure ensures that it’s done. Architectural thinking is an umbrella term for the use of architecture abstractions to achieve intellectual control over a system [5]. An alternative to architectural thinking is testing and statistical control.
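To make hoisting concrete, here is a minimal Python sketch in which a tiny, hypothetical routing layer owns an authorization check so that individual handlers cannot forget it. The `Router` class, its `route` and `dispatch` methods, and the role names are all invented for this illustration and are not drawn from any real framework.

```python
class Router:
    """Hypothetical mini-framework that owns the authorization check."""

    def __init__(self):
        self._handlers = {}

    def route(self, name, required_role):
        # A handler cannot be registered without declaring a required role.
        def register(handler):
            self._handlers[name] = (required_role, handler)
            return handler
        return register

    def dispatch(self, name, user_roles, *args):
        required_role, handler = self._handlers[name]
        # The infrastructure enforces the check on every request.
        if required_role not in user_roles:
            raise PermissionError(f"{name} requires role {required_role!r}")
        return handler(*args)


router = Router()


@router.route("delete_account", required_role="admin")
def delete_account(account_id):
    # No authorization code here: that responsibility was hoisted.
    return f"deleted {account_id}"
```

Calling `router.dispatch("delete_account", {"admin"}, 42)` runs the handler, while the same call with `{"viewer"}` raises `PermissionError`. The handler author needs no vigilance because the check lives in one place that a few people maintain.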

Figure 5: Software architecture’s levers of influence

In principle, these levers are always available to steer qualities, but on any given system you may need to develop them. For example, a system that has become a big ball of mud lacks an architectural style [6], so you cannot use the style lever to promote a quality. [3] [4] [5]

Architecture and design techniques

Design techniques are the most fine-grained influence. Figure 6 shows several design techniques organized by viewtype: compile-time, run-time, and deployment/allocation. All developers will have experience with these techniques as they are daily or at least weekly activities. For example, refactoring code into a reusable library (compile-time viewtype) or reusable service (run-time viewtype) is common.

Design techniques work with architectural thinking. Most developers employ heuristics for writing code. If a system has challenging latency needs, however, developers must trade off latency with modifiability. In a few computation-intensive modules, they would optimize the code for speed, not modifiability. They are thinking architecturally: keeping quality priorities and tradeoffs in mind and identifying critical paths of computation.

Compile-time viewtype
  • Documentation and reference architecture
  • Reusable library / toolkit
  • Typeful programming
  • Static analysis
  • Algorithm choice and design
  • Patterns and tactics
Run-time viewtype
  • Reusable service
  • Dynamic analysis
  • Patterns and tactics
Deployment / Allocation viewtype
  • Blue-green deployment
  • Staging / production allocation
  • Monitoring / alerting
  • Chaos Monkey
  • Patterns and tactics

Figure 6: Software architecture techniques by viewtype

Architecture and design techniques influence the software most directly. Patterns and tactics exist in all three viewtypes.

The value of software design

You can promote or inhibit quality attributes through your software design choices. This column sketches a framework for understanding how to do that. With the ideas and terms here, you are in good shape to dig into each idea further.

From a bird’s eye perspective, this approach looks reasonable and perhaps even obvious. In practice, however, software development is often filled with intense pressure to build features and fix bugs. A simple version of the system is often demanded within a few weeks – little more than a prototype – but this prototype defines the architecture of the system. With these pressures, it’s tempting to assemble something from what’s handy or use your last system as a template for the next one. The project’s conditions can make it hard to use design techniques to steer qualities.

It’s increasingly common to see software development as a factory, with developers expected to be increasingly efficient at their stations. From this perspective, value is created only when deployed features are making money for the company, and you seek improvement by running a tighter factory, efficiently moving features into deployed code. [7]

I have a different perspective. The developer role is a combination of engineer and factory worker. A developer’s attention must be balanced between engineering work (such as design) and implementation work (such as coding and testing). If we embrace the factory metaphor too strongly then we’ll seek coding efficiency at the expense of engineering and design quality. In a tight factory, when will developers have time to learn and apply software design techniques?

The levers of influence over quality attributes aren’t just there for the pulling and pushing: to be effective, they must be created and nurtured. If a developer joins a project that’s already a big ball of mud, it may require heroic efforts to create those levers and steer quality attributes.

Some experts think that software architecture will naturally make its way into standard industrial practice: it’s following historical technology adoption patterns. I think software design has a marketing problem. People conflate software design with waterfall processes, which they reject, and with know-nothing corner-office architects, whom they resent.

I want to overcome this marketing problem. I wish developers could see software design the way I do, that developers can design every day, and that design can range from tiny tasks all the way to systems engineering. Asking for time for design and architecture might sound like I’m asking for the software process to slow down, but that’s the opposite of what I want. There’s no evidence that cutting corners makes software development – or any kind of engineering – go faster, except for a temporary boost at the start. Ask anyone working on a big ball of mud about their velocity. No, I don’t want slow, I want to see developers at the top of their game, zooming along with every design technique available to them.


References

  1. S. Anand, Keeping Movies Running Amid Thunderstorms, Nov 20, 2011, QCon San Francisco.

  2. A. C. Bock, “The Power/Generality Trade-Off in Decision and Problem Modeling: Theoretical Background and Multi-level Modeling as a Resolution,” Lecture Notes in Business Information Processing, vol 318, Springer, Cham., 2018, doi: 10.1007/978-3-319-91704-7_14.

  3. L. Bass, P. Clements and R. Kazman, Software Architecture in Practice, Addison Wesley Longman, 2021.

  4. G. Fairbanks, Architectural Hoisting, in IEEE Software, vol. 31, no. 4, pp. 12-15, July-Aug. 2014, doi: 10.1109/MS.2014.82.

  5. G. Fairbanks, Intellectual Control, in IEEE Software, vol. 36, no. 1, pp. 91-94, Jan.-Feb. 2019, doi: 10.1109/MS.2018.2874294.

  6. B. Foote and J. Yoder, Big Ball of Mud, in Pattern Languages of Program Design, vol. 4, 1997.

  7. G. Fairbanks, Why Is It Getting Harder To Apply Software Architecture?, in IEEE Software, vol. 38, no. 4, pp. 126-129, July-Aug. 2021, doi: 10.1109/MS.2021.3071520.