Large IT and software projects suffer — and fail — from complexity
Complexity that contributes to technical debt has been responsible for a variety of high-profile system failures and accidents that have led to loss of life. A few notable examples of failed software projects include a multibillion-dollar attempt to create an FAA air traffic control system and an automated baggage handling system at the Denver airport that mutilated and lost bags.
Successful designs must be structured in such a way that they can evolve over time in response to learning, new requirements, and new opportunities. Designers still have little desire to be surprised by the behavior of large systems that are in operation – when this happens, the results are generally undesirable.
Problems with the Denver baggage system and their consequences included:
- Misaligned conveyor belts
- Carts jammed in tracks
- Mutilated and lost bags
- Increased costs
- Reduced performance
- Lower cost effectiveness
A major goal of a designer is to manage structural complexity in a design so as to keep the dynamic and emergent complexity of a system in operation well understood and controlled. This is because accidents are often caused by unanticipated interactions between parts. For example, the Tacoma Narrows bridge collapse in 1940 was caused by harmonic properties of the bridge as a whole. This scenario was hardly considered by its designers.
More recently, a cascading power failure in India affected more than half a billion people in July 2012. While the exact cause is presently unclear, it is evident that the power distribution network was structured in such a way that a positive feedback loop could amplify a local problem and cripple an entire country. These types of failures are pernicious because they arise when the structure of the whole system permits insufficiently constrained behavior. When emergent properties of the system as a whole are unanticipated, they often cause emergencies.
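The feedback loop described above can be illustrated with a toy model. The sketch below is not a model of the Indian grid or of any real network. It assumes a hypothetical system in which the load of every failed line is split evenly among the surviving lines, and the function name `simulate_cascade` and all numbers are invented for illustration.

```python
def simulate_cascade(loads, capacities, first_failure):
    """Toy cascade: load from failed lines is split evenly among survivors.

    Hypothetical model for illustration only; real power grids redistribute
    load according to network physics, not an even split.
    """
    loads = list(loads)          # copy so the caller's list is not modified
    failed = set()
    to_fail = [first_failure]
    while to_fail:
        # Remove the lines failing in this round and collect their load.
        shed = 0.0
        for i in to_fail:
            failed.add(i)
            shed += loads[i]
            loads[i] = 0.0
        survivors = [i for i in range(len(loads)) if i not in failed]
        if not survivors:
            break
        # Positive feedback: survivors absorb the shed load.
        share = shed / len(survivors)
        for i in survivors:
            loads[i] += share
        # Any survivor pushed past its capacity fails in the next round.
        to_fail = [i for i in survivors if loads[i] > capacities[i]]
    return sorted(failed)


if __name__ == "__main__":
    tight = simulate_cascade([80, 80, 80, 80], [100, 100, 100, 100], 0)
    slack = simulate_cascade([80, 80, 80, 80], [150, 150, 150, 150], 0)
    print("failed lines with little headroom:", tight)
    print("failed lines with ample headroom:", slack)
```

In this toy setup, with only 20 units of headroom per line a single failure takes down all four lines, while with 70 units of headroom the same failure stays local. The structure (how much slack each part has and how load is shared) determines whether a local problem is amplified.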
While the system architecture community focuses more on structural complexity, and the system dynamics community focuses more on behavioral (or dynamic) complexity, both communities agree that the structure of a system is a key driver of its long-term behavior and dynamic characteristics. Both view structures and the resulting dynamic interactions as directly responsible for how a system performs during periods of stability and for how likely various catastrophic events are.
Technical debt sneaks up on you … and it can be fatal. See below to find out more about recent examples of systems on which technical debt has had notable ramifications.
Research in Motion
Research in Motion, creator of the BlackBerry smartphone (the first mobile device that could receive “push” email), was a technological and financial powerhouse in the early 2000s. In January 2010, it had 21 million active subscribers and commanded a 43% share of smartphone subscribers. But even then, there were indications that technical debt was accumulating in its operating system (OS) software.
“RIM currently has huge technical debt — insurmountable technical debt. BlackBerry is considerably more difficult for developers to develop for and considerably more difficult for users to use.”
“[Past technical decisions] created complexity and very slow OS updates for end users. Developers are forced to develop not for the latest features, but for a two or three year old feature set.”
“I’ve heard through the grapevine that RIM actually maintains two separate codebases for their O/S – one for GSM devices and another for CDMA.”
“[T]he OS (according to an app developer I spoke with at CES last year) is too inflexible to develop on quickly. Supposedly, even to fix the display issues around magnifying pages quickly and smoothly, requires major rip-out-and-replace.”
“I’m not surprised. Any OS as old as theirs is usually crusted up with a lot of inflexible legacy code and kludges.”
The launch of its new platform (BlackBerry 10) was delayed by more than a year because of technical problems. Partly because of these difficulties, in 2013 the company suffered a 40% drop in sales and a financial loss of $628 million.
From “What’s really wrong with BlackBerry (and what to do about it),” October 2010.
Goldman Sachs
Technical debt is incurred not only by software companies and startups. Large companies with legacy codebases suffer from it as well. Goldman Sachs is one of the largest and most profitable banks in the world.
From “Flash Boys: A Wall Street Revolt,” Michael Lewis, 2014, on Cryptome.
After a few months working on the forty-second floor at One New York Plaza, [Sergey Aleynikov] came to the conclusion that the best thing they could do with Goldman’s high-frequency trading platform was to scrap it and build a new one from scratch. His bosses weren’t interested. “The business model of Goldman Sachs was, if there is an opportunity to make money right away, let’s do that,” he says. “But if there was something long-term, they weren’t that interested.” Something would change in the stock market— an exchange would introduce a new, complicated rule, for instance— and that change would create an immediate opportunity to make money. “They’d want to do it immediately,” says Serge. “But if you think about it, it’s just patching the existing system constantly. The existing code base becomes an elephant that’s difficult to maintain.”
That is how he spent the vast majority of his two years at Goldman, patching the elephant.
OPOWER
OPOWER is a “software as a service” company that contracts with utility companies to provide their customers with information about energy usage and personalized ways to reduce consumption. Early on, as it sought to gain customers, it incurred technical debt in its software systems:
“We were very good at sales from the start, so we accrued lots of technical debt. We would chase three big deals at the same time and win them all. Because we didn’t have a focused roadmap, we would build new stuff independently for all three clients, stretching our engineering staff and creating inefficiencies.” [President]
As the company matured, its managers began to manage their debt more carefully. But debt still got in the way of scaling the system.
“[D]ue to interdependencies in our code base and the resulting need for developers to talk to each other, you can’t simply double the number of engineers and double output. Every time you add a feature, you make the next feature more difficult to develop, because the new feature must take the old one into account.” [VP-Engineering]
OPOWER’s managers looked for ways to trade off the cost of technical debt against features that were valuable to customers. In one instance:
“[S]upporting multiple meter owners would effectively … increase the cost of building most future products by 20%.”
“Of course, it is difficult to scope this future cost accurately, since we don’t know what we’ll build. But we are sure that the impact would be significant.” [VP-Product Management]
From “Product Development at OPOWER,” Eisenmann and Go. February 2011
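The arithmetic behind the OPOWER managers' reasoning can be sketched roughly as follows. This is an illustrative model, not one used by OPOWER: the function `roadmap_cost`, the unit base cost, and the 5% per-feature penalty are all assumptions. Only the 20% surcharge figure comes from the quote above.

```python
def roadmap_cost(base_cost, n_features, per_feature_penalty=0.0,
                 one_time_surcharge=0.0):
    """Total cost of building n_features under a hypothetical debt model.

    per_feature_penalty: assumed fractional increase in cost caused by each
        feature already built (every feature makes the next one harder).
    one_time_surcharge: assumed fractional increase applied to every future
        feature by a single design decision (e.g. 0.20 for 20%).
    """
    total = 0.0
    for k in range(n_features):
        # Feature k pays a compounding penalty for the k features before it.
        cost = base_cost * (1 + per_feature_penalty) ** k
        total += cost * (1 + one_time_surcharge)
    return total


if __name__ == "__main__":
    n = 10
    print("no debt:            ", round(roadmap_cost(1.0, n), 2))
    print("20% surcharge:      ", round(roadmap_cost(1.0, n, one_time_surcharge=0.20), 2))
    print("5% per-feature drag:", round(roadmap_cost(1.0, n, per_feature_penalty=0.05), 2))
```

A one-time surcharge scales the whole roadmap by a fixed factor, while a per-feature penalty compounds, so its effect grows with the length of the roadmap. As the quote notes, the real difficulty is that the future roadmap, and therefore these parameters, are not known in advance.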