The price of an error, illustrated by one incident at PVS-Studio
We often write articles about software errors that we detect with our PVS-Studio static code analyzer. These errors vary: simple and complex, obvious and hard to find, self-explanatory and ones that take a few minutes to explain. All of them have one thing in common: a cost. We often disagree with our readers about how high that price can be. Some say: what's the big deal about an error? We'll find it and fix it. And if we don't, we don't. Or, put a bit differently: so the app crashes because of this error, no big deal. We'll restart it and that's it.
I'd like to tell you a true story from our company's life. It isn't really about static analysis, but it illustrates the concept of the "price of an error" well.
On our site, we have the option to request a trial version. A person who is interested in the analyzer submits a request through a form. I then immediately receive an automatically generated email with the information they provided. If the request contains any additional questions, I answer them. If not, an automatically generated email with a one-week trial key goes straight to the person. The reply usually arrives by email within an hour of the request.
The process is simple and it works. The site has a request form handler, and cron runs the jobs that check and send mail, so what could break here? However, on December 19, cron crashed on the server. Why it crashed isn't really important, and we can't show a code fragment here anyway. I admit there may have been no bug at all, just a bad configuration. But it broke, and emails with keys stopped being sent. Completely...
As for me, I kept answering the questions that came in. There were still some support conversations, though not as many as usual. Well, it's Christmas time, everyone is busy... I noticed the problem only on December 24, five days later. Two users wrote to say that we weren't responding to requests for keys. We checked immediately and were horrified.
This incident cost us about 50 trial requests that we didn't answer in time. Our marketing and advertising efforts had brought those 50 users to our site, and in the end we failed to handle their requests.
You might say: "Well, why don't you test sending mail from the site? It's an important part of the process!" Yes, it's important. That's why we do test it. "You probably send an email in case of an error, but this time it wasn't sent?" No, we know that error handlers often contain errors, too. So we got creative. Every day at the same time, I get an email saying that the mail works correctly. The last such email arrived on December 18. Unfortunately, I didn't notice that they had stopped coming.
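The weak point of a daily "all is well" email is that it only helps if someone notices its absence. One way to close that gap is to let a separate check look at when the last such email arrived and treat prolonged silence as an alarm. Below is a minimal sketch in C, purely illustrative and not our actual setup: the function name `heartbeat_is_stale` and the 25-hour limit are assumptions made for this example.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical limit: a daily heartbeat plus one hour of slack. */
#define HEARTBEAT_MAX_AGE_SECONDS (25.0 * 60.0 * 60.0)

/*
 * Returns true when the last heartbeat is older than max_age_seconds,
 * i.e. when silence itself should be treated as an alarm.
 * A last_seen value of (time_t)-1 means "never seen" and is always stale.
 */
bool heartbeat_is_stale(time_t last_seen, time_t now, double max_age_seconds)
{
    if (last_seen == (time_t)-1) {
        return true;
    }
    /* difftime(end, start) returns end - start in seconds as a double. */
    return difftime(now, last_seen) > max_age_seconds;
}
```

Such a check is most useful when it runs somewhere other than the server that sends the mail, so that a dead cron on one machine cannot silence both the heartbeat and the alarm.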
So what's the point of this small note? Mistakes (in programs, in their configuration, or plain human error) lead to losses. If you can do something to reduce these risks, be sure to do it. For example, introduce a static code analyzer.
This whole case explains well why our articles so often draw readers' attention to the importance of checking the results of functions such as malloc. It's not normal for a program to stop working (crash) when something goes wrong. This is a denial of service (CWE-400). This is a potential vulnerability. By reducing the number of such defects, you reduce the risk of losses.
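To make the malloc point concrete, here is a small sketch of what such a check looks like. The helper `duplicate_string` is a hypothetical function written for this note, not code from any project we have analyzed; it simply shows the allocation result being tested before the memory is used.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Returns a heap-allocated copy of text, or NULL if text is NULL
 * or the allocation fails. The caller owns the result and must free() it.
 */
char *duplicate_string(const char *text)
{
    if (text == NULL) {
        return NULL;
    }
    size_t size = strlen(text) + 1;   /* +1 for the terminating '\0' */
    char *copy = malloc(size);
    if (copy == NULL) {
        /* Report the failure instead of writing through a null pointer. */
        return NULL;
    }
    memcpy(copy, text, size);
    return copy;
}
```

Without the `copy == NULL` branch, a failed allocation would lead straight to a write through a null pointer. With it, the caller gets a chance to decide what to do: log the problem, retry, or shut down in a controlled way.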