What is Technical Debt?
A technical debt is a technical decision that incurs future obligations, just as a financial debt does.
Technical debt corresponds to the cost of making and verifying future changes in a complex technical system. Even if the desired changes are not known in advance, it is often possible to predict change costs for the system as a whole and key parts of it.
The term was first used by the wiki inventor Ward Cunningham, who said: “Shipping first time code is like going into debt. … Entire engineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation.” The term has since spread throughout the software development community and beyond.
Practitioners involved in the design of technical systems find the term evocative, but its theoretical underpinnings have yet to be explored. Consider how technological decisions give rise to future technological costs or constraints. If a product never needs to change, then the technologists who created it have no future obligations, hence no debt.
But software-intensive systems differ from many physical products in that they last a very long time and can be modified by changing the code, unlike the walls of a building, which are generally fixed in steel and concrete. Because software systems can change, both developers and users expect that they will change. At the same time, much of the legacy code will continue to exist for years and even decades.
Changes are desirable, because users will come to see and demand new features. System designers, in turn, must deal with defects, errors, and ‘bugs’. The software system evolves, but there is almost never a chance to start over from scratch with a blank slate. Technical debt is related to the cost of making such changes. The debt burden is borne by future developers (and users) who must work with the code as it evolves.
There is a close correspondence between technical debt and cost of exercising ‘real’ options, whose nature only becomes apparent as history unfolds. This mapping of technical debt onto the cost of exercising options, in turn, suggests ways in which such debts can be located, measured, and quantified in financial terms.
What are Options?
In finance, an option is “the right but not the obligation” to choose a course of action and obtain an associated payoff. In engineering, a new design creates the ability but not the necessity—the right but not the obligation—to do something in a different way. In general (if the designers are rational), the new design will be adopted only if it is better than its alternatives. Thus the economic value of a new design is properly modeled as an option using the methods of modern finance theory.

A fundamental property of designs is that at the start of any design process, the final outcome is uncertain. Once the full design has been specified and is certain, then the development process for that design is over. Uncertainty about the final design translates into uncertainty about the design’s eventual value. How well will the end-product of the design process perform its intended functions? And what will it be worth to users? These questions can never be answered with certainty at the beginning of any substantive development process. Thus the ultimate value of a design is unknown when the development process begins.

Uncertainty about final value in turn causes new designs to have “option-like” properties. The option-like structure of designs has three important but counterintuitive consequences.
1. When payoffs take the form of options, taking more risk creates more value. Risk here is defined as the ex ante dispersion of potential outcomes. Intuitively, a risky design is one with high technical potential but no guarantee of success. “Taking more risk” means accepting the prospect of a greater ex ante dispersion. Thus a risky design process is one that has a very high potential value conditional on success but, symmetrically, a very low, perhaps negative, value conditional on failure. What makes the design an option, however, is that the low-valued outcomes do not have to be passively accepted. As we said, the new design does not have to be adopted; rationally, it will be adopted only if it is better than the alternatives, including the status quo alternative. In effect, then, the downside potential of a risky design is limited by the option to reject it after the fact. This means that “risk” creates only upside potential. More risk, in turn, means more upside potential, hence more value.
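This asymmetry can be seen in a small Monte Carlo sketch. Assuming, purely for illustration, that design outcomes are normally distributed around zero, the option to reject the design truncates the downside at the status quo, so the expected payoff grows with dispersion:

```python
import random
import statistics

def option_value(sigma, status_quo=0.0, trials=100_000, seed=42):
    """Monte Carlo estimate of E[max(design outcome, status quo)].

    The design outcome is modeled (an illustrative assumption) as a
    normal random variable with mean 0 and standard deviation sigma;
    rejecting a bad design truncates the downside at status_quo.
    """
    rng = random.Random(seed)
    payoffs = [max(rng.gauss(0.0, sigma), status_quo) for _ in range(trials)]
    return statistics.mean(payoffs)

low_risk = option_value(sigma=1.0)   # modest dispersion of outcomes
high_risk = option_value(sigma=3.0)  # same mean, three times the dispersion
```

With the rejection option in place, `high_risk` comes out well above `low_risk`: more ex ante dispersion translates into more expected value, exactly as the text argues.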
2. When payoffs take the form of options, seemingly redundant efforts may be value-increasing. Two attempts to create a new design may arrive at different endpoints. In that case, the designers will have the option to take the better of the two. The option to take the better of two or best of several outcomes is valuable. Thus when faced with a risky design process, which has a wide range of potential outcomes, it is often desirable to run multiple “design experiments” with the same functional goal. These experiments may take place in parallel or in sequence, or in a combination of both modes. But whatever the mode, more risk calls for more experimentation.
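The value of seemingly redundant experiments can be sketched the same way, again assuming standard-normal outcomes for illustration: the expected best of k independent design experiments rises with k.

```python
import random
import statistics

def best_of_k(k, trials=50_000, seed=7):
    """Expected value of the best of k independent design experiments,
    each modeled (an illustrative assumption) as a standard normal draw."""
    rng = random.Random(seed)
    return statistics.mean(
        max(rng.gauss(0.0, 1.0) for _ in range(k)) for _ in range(trials)
    )

solo = best_of_k(1)  # a single experiment: expected value near zero
pair = best_of_k(2)  # keep the better of two attempts
quad = best_of_k(4)  # keep the best of four attempts
```

Each additional parallel experiment raises the expected value of the best outcome, though with diminishing returns, which is why more risk (more dispersion) calls for more experimentation.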
3. Options interact with modularity in a powerful way. By definition, a modular architecture allows module designs to be changed and improved over time without undercutting the functionality of the system as a whole. This is what it means to be “tolerant of uncertainty” and to “welcome experiments” in the design of modules. As a result, modules and design experiments are economic complements: an increase in one makes the other more valuable. The act of splitting a complex engineering system into modules multiplies the valuable design options in the system. At the same time, this modularization moves decisions from a central point of control to the individual modules. The newly decentralized system can then evolve in new ways. Notice, however, that modularizing lowers a barrier to entry for competitors: the high cost of developing an entire complex engineering system (like an automobile, a computer, or a large software package) is reduced to the cost of developing individual modules. Thus the modularization of a large, complex system, even as it creates options and option value, also sows the seeds of increased competition focused on the modules.
Higher degrees of modularity can increase the value of a complex design through option value. This result is a special case of a well-known theorem, first stated by Robert Merton in 1973. For general probability distributions, assuming aggregate value is conserved, Merton showed that a “portfolio of options” is more valuable than an “option on a portfolio.”
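Merton’s inequality can be checked numerically with a small sketch. Modeling module payoffs as independent standard normals (an assumption made purely for illustration), a system with a per-module accept/reject option (a portfolio of options) is worth more than the same system with a single all-or-nothing option (an option on a portfolio):

```python
import random

def compare(n_modules=6, trials=100_000, seed=1):
    """Monte Carlo comparison of a 'portfolio of options' against an
    'option on a portfolio', with module payoffs drawn (an illustrative
    assumption) as independent standard normals."""
    rng = random.Random(seed)
    sum_of_options = 0.0  # accept/reject each module separately
    option_on_sum = 0.0   # accept/reject the whole system at once
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_modules)]
        sum_of_options += sum(max(x, 0.0) for x in xs)
        option_on_sum += max(sum(xs), 0.0)
    return sum_of_options / trials, option_on_sum / trials

portfolio_of_options, option_on_portfolio = compare()
```

The per-module options let the designers keep each module’s gains while discarding each module’s losses, so `portfolio_of_options` exceeds `option_on_portfolio`, in line with Merton’s theorem.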
In short, technical debt is cruft in a system’s design that forces an organization to pay unwanted costs on an ongoing basis. Designers working on these inelegant parts may have lower productivity, less success delivering on schedule, and more sleepless nights. Inelegant pieces may also have more quality problems, be less flexible, and have a higher probability of catastrophic failure.
All of these organizational costs translate into dollars lost. Technical debt – analogous to financial debt – is a useful way for managers and engineers to think rationally about when to overhaul something and when to leave “well enough” alone. If the cost of fixing a system is less than the cost of maintaining it over the long haul, then it should be fixed. Otherwise, nothing should be done.
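The fix-versus-maintain comparison can be expressed as a simple present-value calculation. All figures and the discount rate below are hypothetical:

```python
def npv_of_maintenance(annual_cost, years, discount_rate):
    """Present value of an ongoing annual maintenance drag."""
    return sum(annual_cost / (1 + discount_rate) ** t
               for t in range(1, years + 1))

refactor_cost = 120_000  # one-time cost to pay down the debt (assumed)
debt_service = npv_of_maintenance(annual_cost=40_000,  # assumed yearly drag
                                  years=5,
                                  discount_rate=0.10)
should_fix = refactor_cost < debt_service  # fix only if cheaper than servicing
```

Here the discounted cost of servicing the debt for five years exceeds the one-time refactoring cost, so the rational choice is to fix; with a smaller annual drag or a shorter horizon the same arithmetic would say to leave well enough alone.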
While technical debt is a useful way to think about prioritizing resources when managing a complex design, a number of challenges remain.
- Can we reliably and repeatably measure technical debt in different parts of a system?
- Can we measure the cost of servicing that debt?
- What kind of product-related complexity metrics correlate with significant maintenance costs?
- Can we use them as one way of measuring our technical debt?
Many software complexity metrics commonly used today, such as McCabe’s cyclomatic complexity score, look at properties of functions, classes, or files in isolation. While some of these metrics are useful, they pay no attention to how the system as a whole is interconnected. As a result, these metrics miss system-level properties that designers know are important.
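As an illustration of such a per-function metric, here is a rough sketch of McCabe-style cyclomatic complexity (one plus the number of decision points) computed from a Python syntax tree. The set of node types counted is an approximation; real tools are more thorough:

```python
import ast

# Node types treated as decision points here are an approximation of
# McCabe's definition, chosen for illustration.
DECISIONS = (ast.If, ast.For, ast.While, ast.ExceptHandler,
             ast.BoolOp, ast.IfExp)

def cyclomatic_complexity(source):
    """Roughly: 1 + the number of decision points in the code."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, DECISIONS) for node in ast.walk(tree))

snippet = """
def classify(x):
    if x > 0:
        return "pos"
    elif x < 0:
        return "neg"
    return "zero"
"""
score = cyclomatic_complexity(snippet)  # two branch points -> score of 3
```

Note that the score is computed from the file’s own text alone: nothing in it reflects how many other files depend on `classify`, which is exactly the blind spot the text describes.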
Thanks to the work of Herbert Simon and others, we know that systems should be structured as hierarchies from a macro-perspective. Simon and Parnas have told us that these hierarchies should be composed of modules – elements with defined public interfaces, private internals, high internal cohesion, and lower external coupling. Good systems might also contain platforms or abstraction layers. Commonly used software complexity metrics don’t adequately capture this wisdom.
In response to these limitations, Harvard professors Alan MacCormack and Carliss Baldwin invented a set of metrics that can be used to think about a source code file’s architectural complexity – the complexity that arises within a system due to a lack or breakdown of hierarchy or modularity. MacCormack and Baldwin do this by creating a Design Structure Matrix (DSM) containing files and inter-file dependencies. They then algorithmically classify files based on their position within the rest of the system.
| Architectural complexity classification | Description |
| --- | --- |
| Peripheral | Peripheral files do not influence, and are not influenced by, much of the rest of the system. |
| Utility | Utility files are relied upon (directly or indirectly) by a large portion of the system but do not depend on many other files themselves. They have the potential to be self-contained and stable. |
| Control | Control files invoke the functionality or access the data of many other files. They may coordinate system behavior. |
| Core | Core files sit in regions of the system that form highly integral clusters containing large cycles, in which files are directly or indirectly co-dependent. These regions are hard to decompose into smaller parts and may become unmanageable if they grow too large. |
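The classification can be sketched on a toy dependency graph. This is a crude stand-in for the MacCormack and Baldwin procedure, not their actual algorithm; the file names, the dependency list, and the fan-in/fan-out threshold below are all illustrative assumptions:

```python
from itertools import product

# Hypothetical file-level dependency list: (a, b) means a depends on b.
DEPS = [("app", "ui"), ("app", "util"), ("ui", "core_a"),
        ("core_a", "core_b"), ("core_b", "core_a"),
        ("core_a", "util"), ("core_b", "util"), ("docs", "util")]

def transitive_closure(deps):
    """reach[f] = every file that f depends on, directly or indirectly."""
    files = sorted({f for pair in deps for f in pair})
    reach = {f: {t for s, t in deps if s == f} for f in files}
    for k, i in product(files, files):  # Warshall's algorithm over sets
        if k in reach[i]:
            reach[i] |= reach[k]
    return reach

def classify_files(deps, threshold=2):
    """Label files by transitive fan-in and fan-out (threshold is assumed)."""
    reach = transitive_closure(deps)
    fan_in = {f: sum(f in reach[g] for g in reach) for f in reach}
    labels = {}
    for f in reach:
        hi_out = len(reach[f]) >= threshold
        hi_in = fan_in[f] >= threshold
        if hi_in and hi_out:
            labels[f] = "core"        # inside or around a cyclic cluster
        elif hi_in:
            labels[f] = "utility"     # widely depended upon, depends on little
        elif hi_out:
            labels[f] = "control"     # depends on much, little depends on it
        else:
            labels[f] = "peripheral"  # weakly connected to the rest
    return labels

labels = classify_files(DEPS)
```

In this toy graph the mutually dependent `core_a` and `core_b` form the cyclic core, `util` is a utility, `app` and `ui` act as control files, and `docs` is peripheral, mirroring the four categories in the table.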
Eliminating Technical Debt
It is widely believed that architecture significantly impacts a firm’s financial performance and that refactoring projects that address architectural problems have the potential to eliminate technical debt, thereby freeing up money for more productive use.
With the proper tooling and databases, it should be possible for firms to begin to estimate the financial cost of complexity by assigning a monetary value to the cost drivers it influences (such as productivity, defect density, and turnover). Firms could use such a system to plan, manage, and estimate the value of architectural improvements to large complex systems in relation to the balance and interest rate of their latent debt.