Cost of an Error: Who Pays for Programming Blunders?
Modern programmers live in a very special period of time, when software is penetrating literally all spheres of human life and is installed on countless devices that are part of our everyday life. Nobody is surprised by software in fridges, watches and coffee machines. However, people's dependence on smart technology is growing too. The inevitable consequence: software reliability becomes priority number one. It's hard to scare anyone with a coffee maker gone haywire, although it can do a lot of harm (liters of boiling coffee flowing over your white marble countertop...). But the growing requirements for software quality really do matter, so let's talk about errors in code that led to a significant waste of time and money.
The aim of these stories is to fight the idea that defects in programs can be treated as lightly as they used to be. Errors in programs are no longer just incorrectly drawn units in a game; code is now responsible for people's health and the safety of property. In this article I would like to cover several new examples that show why code must be treated with real care.
It's undeniable that complex programs are entering our lives ever more actively: household appliances controlled from a smartphone, gadgets with features that were hard to imagine 10 years ago and, of course, more complex software in factories, cars and so on. Any program is created by a person, and the smarter the program is, the more dangerous its failure.
Let's talk about money lost because of software errors and about our growing dependence on program code. This topic has been discussed repeatedly (including by my colleague Andrey Karpov in "The Big Calculator Gone Crazy"), and every new example proves the same thing: code quality is not something you can ignore.
An Expensive Overline
The Mariner 1 spacecraft was supposed to reach Venus. Launched from Cape Canaveral, the rocket almost immediately veered off its trajectory, which created a threat of it falling to the ground. To prevent a possible catastrophe, NASA decided to activate the self-destruct system. Mariner 1 was destroyed 293 seconds after launch.
The review board conducted an investigation and found that the cause of the accident was a programming error that resulted in incorrect control signals.
According to the most detailed and consistent account, the error occurred when a mathematical symbol was transcribed by hand into the specification for the guidance program. The writer missed the superscript bar (or overline) in the formula, which was meant to denote "the n-th smoothed value of the time derivative of a radius R".
Since the smoothing function indicated by the bar was left out of the specification for the program, the implementation treated normal minor variations of velocity as if they were serious, causing spurious corrections that sent the rocket off course (source).
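To get a feeling for why one missing bar matters, here is a minimal sketch in Python. It is not the actual Mariner guidance code: the running mean, the sample readings, the tolerance and the function names are all assumptions made up for illustration. It only shows how the same noisy readings can trigger many corrections when used raw and none when smoothed first.

```python
def smoothed(samples, window=4):
    """Running mean over the last `window` samples (a stand-in for the overbar)."""
    out = []
    for i in range(len(samples)):
        recent = samples[max(0, i - window + 1): i + 1]
        out.append(sum(recent) / len(recent))
    return out


def corrections(rates, nominal, tolerance):
    """Count the samples that deviate enough to trigger a steering correction."""
    return sum(1 for r in rates if abs(r - nominal) > tolerance)


# Hypothetical velocity readings: steady around 100 with small sensor jitter.
raw = [100, 103, 97, 104, 96, 103, 97, 104]

raw_corrections = corrections(raw, nominal=100, tolerance=2.5)
smoothed_corrections = corrections(smoothed(raw), nominal=100, tolerance=2.5)

print(raw_corrections)       # 7: almost every jittery sample looks "serious"
print(smoothed_corrections)  # 0: the jitter is averaged out
```

With these made-up numbers the unsmoothed input demands a correction on seven of the eight samples, while the smoothed input demands none.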
The cost of the "missing overbar": 18 million dollars (in the money of that time).
Russian GPS that Drowned
Another vivid example of how millions can be lost because of a programming error is a relatively recent case. It would seem that the 21st century offers everything needed for writing reliable programs, especially in the space industry: experienced professionals with excellent education, funding, and the best tools for testing software. None of this helped. On December 5, 2010, a Proton-M carrier rocket with three Glonass-M satellites, part of the Russian equivalent of GPS, crashed into the Pacific Ocean.
The cause of the accident was announced after the investigation by Alexander Kurennoy, an official representative of the Prosecutor General's Office of the Russian Federation: "The investigation has established that the crash was due to the application of a wrong formula, which resulted in putting additional 1,582 kilograms of liquid oxygen into the acceleration unit's oxidizer tank. This error led to the carrier rocket's injection into an open orbit and its subsequent fall into the Pacific Ocean". (source)
An interesting point is that a document on the need to adjust the formula had been submitted to the relevant department of the organization, but an engineer wrote it off as completed. Management didn't verify how its directives were carried out. All those involved in the accident were convicted of a criminal offense and given large fines. Still, that doesn't compensate for the loss of 138 million dollars.
Back in 2009, Manfred Broy, a professor of informatics at the Technical University of Munich and a leading expert on software in cars, said: "it [every premium-class automobile] probably contains close to 100 million lines of software code" (source). Eight years have passed since then, and even if you aren't a fan of Top Gear, you may have noticed that modern cars have become truly intelligent machines.
According to the expert, software and electronics account for about 40% of a car's market price. And that applies to cars with gasoline engines; just think about hybrids and electric cars, where this figure is approximately 70%!
When the electronics inside a car become more complex than the mechanics, more responsibility falls on software developers. A bug in a key system such as braking is far more dangerous than a torn brake hose.
So here is a question: should you drive modern, comfortable and "smart" cars or old-school but simple ones? It's up to you to decide; I just offer you a selection of bugs in car software.
Japanese Toyota cars have a good reputation in general, but from time to time the media report recalls of some of its vehicles. Our blog already has an article about a software bug in Toyota, "Toyota: 81 514 issues in the code", but unfortunately that is not the only case.
In 2005, 160 thousand Toyota Prius hybrids manufactured in late 2004 and early 2005 were recalled. The problem was that the car could stall and stop. It took about 90 minutes to fix the bug on one vehicle, which adds up to about 240 thousand man-hours in total.
Chrysler and Volkswagen
In May 2008, Chrysler recalled 24,535 Jeep Commander cars manufactured in 2006. The reason was a programming error in the automatic transmission control module. The failure caused the engine to shut down unexpectedly.
In June of the same year, Volkswagen recalled approximately 4,000 Passat and 2,500 Tiguan cars. In this case the software error caused the engine speed to increase: the tachometer reading went up when the air conditioner was turned on.
Needless to say, a recall is associated with enormous financial losses. But what is much more dangerous for such huge manufacturers than the financial expenses is the loss of consumer trust. Given how tough the competition on the automotive market is, such a mistake may have very negative consequences. Restoring a reputation as a reliable manufacturer may be very difficult.
So far we have been speaking about ordinary vehicles manufactured not so long ago. As you can see, even they may have software errors, so what can we say about the actively marketed, environmentally friendly electric cars?
Let's talk about the Tesla Model S, of course. On May 7, 2016, Joshua Brown, who had become famous for his YouTube videos praising the car, got into an accident while driving a Tesla Model S. Fully confident in the car's intelligence, he entrusted the driving to the autopilot. The result of this trust was tragic: Joshua died at the scene from injuries received in the crash.
The accident gained wide publicity, and an investigation started. It showed that, apparently, Brown wasn't keeping his eyes on the road, and the autopilot got into a situation that its code didn't provide for. A truck with a trailer was moving in front of Joshua's Tesla. The truck was about to make a manoeuvre, a left turn, which required it to slow down. But the Tesla behind it didn't start slowing down, because the autopilot systems didn't recognize the object ahead.
Most likely, this happened because of the bright sun. Immediately after the crash, the explanation put forward by Tesla was that the car had likely failed to distinguish the white tractor trailer from the sky. The official report states the following: "braking for crossing path collisions, such as that present in the Florida fatal crash, are outside the expected performance capabilities of the system." (source). The complete accident report is freely available to the public.
In other words, the autopilot is meant to help the driver (a more advanced cruise control, so to speak), not to replace the driver. Of course, such an excuse didn't help Tesla much. Work on the software continued, but the Tesla Model S cars weren't recalled.
Company representatives shared the following road statistics: "On average, one person dies on U.S. highways for every 90 million vehicle miles traveled. In contrast, people had driven Tesla's autopilot 130 million miles before the first confirmed death. That number is now up to 200 million." (source).
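The two figures in the quote are easier to compare on the same scale. The small Python sketch below only does the arithmetic on the quoted numbers; the function name and the "per 100 million miles" unit are my own choice for illustration, not something taken from the report.

```python
MILES_PER_UNIT = 100_000_000  # express every rate per 100 million miles


def deaths_per_100m_miles(deaths, miles):
    """Scale a death count over `miles` to deaths per 100 million miles."""
    return deaths * MILES_PER_UNIT / miles


# Figures from the quote: one death per 90 million miles on U.S. highways,
# one confirmed death after 130 million autopilot miles.
us_average = deaths_per_100m_miles(1, 90_000_000)
autopilot = deaths_per_100m_miles(1, 130_000_000)

print(f"U.S. average: {us_average:.2f}")  # 1.11
print(f"Autopilot:    {autopilot:.2f}")   # 0.77
```

Keep in mind that the autopilot figure rests on a single accident, so the comparison is only a rough one.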
On the one hand, such statistics suggest that cars driven by the autopilot are safer, but are you ready to entrust your life, the lives of your passengers and of other road users to a program?
And this is not a rhetorical question. Judging by the stock market news, despite the sensational accident, Tesla shares grew by up to 50% in 2017. Two major factors are behind this: the popularity of environmental and eco-friendly movements, and the high personal rating of Tesla's head, Elon Musk.
The World Level: the Year 2038 Problem
I couldn't help including this example at the end of the article. You can read more about the Year 2038 problem in the article "2038: only 21 years away", but I would like to highlight one important point.
Factory equipment (all kinds of machines and conveyors), home appliances and other complex devices with specialized software have quite a long service life. The probability that a machine manufactured in 2017 will still be working in 2038 is quite high. Hence the logical conclusion: the problem of 32-bit time_t values no longer being able to represent dates correctly is relevant already!
If software developers don't take this into account now, what will happen in 2038? There is every chance that software for embedded systems will bring quite a number of surprises. I believe we will witness that.
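For those who have never seen the overflow itself, here is a small Python sketch. It assumes time_t is a signed 32-bit count of seconds since January 1, 1970 (UTC), as it is on many older 32-bit systems. The helper names wrap_int32 and to_utc are hypothetical, and the wrap-around is only simulated, since Python integers don't overflow.

```python
from datetime import datetime, timedelta, timezone

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit time_t can hold


def wrap_int32(value):
    """Simulate storing an integer in a signed 32-bit variable (two's complement)."""
    return (value + 2**31) % 2**32 - 2**31


def to_utc(seconds):
    """Convert seconds since the Unix epoch to a UTC datetime."""
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(seconds=seconds)


last_good = to_utc(INT32_MAX)
after_overflow = to_utc(wrap_int32(INT32_MAX + 1))

print(last_good.isoformat())       # 2038-01-19T03:14:07+00:00
print(after_overflow.isoformat())  # 1901-12-13T20:45:52+00:00
```

One second after 03:14:07 UTC on January 19, 2038, the simulated counter jumps back to December 1901, which is exactly the kind of surprise an embedded device built today may run into.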
Perhaps the examples given in the article seem too epic. Of course, only tragic cases get public attention. But I am sure that every software development company has a story about how a single mistake caused a lot of problems, albeit local ones.
Is it possible to find someone to blame? Sometimes yes, sometimes no. But there is no point in finding the guilty person and chastising them. The point is that programs are getting more complex and are penetrating our lives more and more, which means that the requirements for code reliability are growing too. The price of typical errors is increasing, and the responsibility for code quality falls on the shoulders of developers.
What is the solution? Modernize the development process. Give programmers assistants: special programs for detecting and fixing bugs. The combined use of modern techniques significantly decreases the probability that a bug will go undetected at the development stage.
I wish you code free of errors, so that your projects never end up on a list like the one you have just read.