Discussion on Static Code Analysis

Andrey Karpov

One of my articles drew a few comments so prejudiced against static analysis that I decided to post my reply as a separate article for others to see. I hope it will give the author of the comments and other skeptics a different perspective on static analysis tools in general and PVS-Studio in particular.

Comments

The comments are in Russian and relate to the article "Give my Best Regards to Yandex Developers". Here is the first one:

While they might be useful, all these analyzers are also a headache because they slow down the build process, produce false positives and meaningless output, and give you a false sense of security. In my opinion, professional programmers are better off without such tools - the fewer, the better - because the trouble by far outweighs the benefit. It's of no use to noobs either; they should learn how to write unit tests instead of wasting time fighting the warnings. Perhaps the guys from Yandex have the same view on it.

On the other hand, if you run analysis before releasing, that will be less of a headache so you could get a tolerable time/bugs ratio. So, who knows...

Disclaimer: I haven't tried PVS-Studio.

And here's an addition:

By the way, don't forget about your real competitors (not de jure but de facto): dynamic analyzers (ASAN, MSAN, TSAN, UBSAN). Sure, they have a weak point: they are not that fast and use more resources. But they've also a strong one: if, after fuzzing your code, *SAN says it's found "something", it's no use denying you have a bug that must be fixed. And static analyzers are not so... definite.

So you have to squeeze into the niche between the built-in analyzer from clang/gcc/msvc (free and always at hand) and *SAN (complicated and costly, but almost zero false positives so that nearly every issue reported can be submitted to the bug tracker). Is that niche wide enough?

I like these comments very much because they paint a perfect composite picture of the misconceptions about static analysis. Let's examine each point made by the critic in detail.

I'm talking from the standpoint of a PVS-Studio evangelist, but everything said below applies to any other static analysis tool as well.

Analyzers mostly produce distracting information noise

While they might be useful, all these analyzers are also a headache because they slow down the build process, produce false positives and meaningless output, and give you a false sense of security. In my opinion, professional programmers are better off without such tools - the fewer, the better - because the trouble by far outweighs the benefit.

A static analyzer is a higher-level and more intelligent version of compiler warnings. In fact, some diagnostics originally implemented in static analyzers are gradually adopted by compilers. In the same way, some functionality from the boost library is adopted by std, yet boost is still doing well. Maybe it's not the best analogy, but the central idea should be clear.

Anyway, analyzers provide a better and deeper analysis of source code than compilers do. Plus, good analyzers produce far fewer false positives than compilers do. Static analyzers also come with a bunch of additional features that make them easier to use, and that's what the analyzer developers are paid for.
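A minimal C++ sketch of the kind of defect this is about. The `Point` type and both functions are hypothetical, invented purely for illustration. The buggy version is valid code that a compiler is not obliged to reject; whether a particular compiler or analyzer reports the repeated sub-expression depends on the tool and its settings, though identical operands around an operator is a pattern static analyzers commonly look for.

```cpp
// Hypothetical type used only for this illustration.
struct Point {
    int x;
    int y;
    int z;
};

// Meant to compare all three coordinates. The last comparison was
// copy-pasted and never updated, so z is silently ignored.
bool same_point_buggy(const Point& a, const Point& b) {
    return a.x == b.x && a.y == b.y && a.y == b.y;  // duplicated sub-expression
}

// Corrected version: the third comparison checks z.
bool same_point_fixed(const Point& a, const Point& b) {
    return a.x == b.x && a.y == b.y && a.z == b.z;
}
```

For `Point{1, 2, 3}` and `Point{1, 2, 4}`, the buggy version reports the points as equal, while the fixed one does not.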

I can hear you arguing that I'm downplaying the false-positive issue; that your compiler produces almost zero warnings, while the static analyzer issues thousands of them.

The explanation is simple. You have to customize the analyzer's settings before using it, so comparing it with a compiler you have already tuned is unfair. Try compiling your code with a different compiler, one you haven't used before, and you'll get just as many thousands, if not tens of thousands, of warnings. Not to mention that you'll be lucky if the code compiles on the first try without prior tweaking.

As I said, good static analyzers produce few false positives and provide means for easily suppressing those few. You just have to spend some time configuring the tool. For more reflections on that topic, please refer to my article "Characteristics of PVS-Studio Analyzer by the Example of EFL Core Libraries, 10-15% of False Positives".
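As a rough sketch of what line-level suppression can look like: the `is_nan` helper below is hypothetical, and the trailing `//-V501` comment follows the marker style described in the PVS-Studio documentation (treat the exact syntax as an assumption and check your analyzer's manual; other tools use their own markers, such as `// NOLINT` in clang-tidy). The self-comparison is deliberate, so a report about identical operands would be a false positive for this line.

```cpp
// Hypothetical helper: NaN is the only floating-point value that
// compares unequal to itself, so the self-comparison is intentional.
bool is_nan(double value) {
    // The trailing marker is assumed PVS-Studio syntax for silencing one
    // diagnostic (V501, identical sub-expressions) on this line only.
    return value != value;  //-V501
}
```

The marker is an ordinary comment to the compiler, so it does not change how the code builds or runs.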

I got a bit carried away, but I had to explain why a static analyzer is a more powerful tool than compiler warnings.

So, rejecting static analysis is as silly as turning off all compiler warnings. Now, let's rewrite the original comment as follows:

While they might be useful, all these compiler warnings are also a headache because they slow down the build process, produce false positives and meaningless output, and give you a false sense of security. In my opinion, professional programmers are better off without such tools - the fewer, the better - because the trouble by far outweighs the benefit.

Any sensible programmer would call it nonsense. Yes, compiler warnings are sometimes just noise, but that doesn't make them less useful.

If people say compiler warnings give them a false sense of security, the problem is with them, not with the compiler.

Lastly, if compiler warnings do more harm than good, that's a sign of the programmer's professional ineptitude rather than skill. They just don't know how to use compiler warnings properly.

This obviously applies to static analysis as well. That settles the first batch of misconceptions; let's move on to the next one.

No use to beginners

It's of no use to noobs either; they should learn how to write unit tests instead of wasting time fighting the warnings.

I agree that beginners should, above all, focus on learning the programming language, learning how to debug their code, write unit tests, and so on. All the auxiliary tools are of no use without fundamental knowledge.

Yes, static analysis is probably not the kind of technology one should start with. But static analysis is always your friend, no matter how experienced you are. College students may well benefit from it, as it could tell them what's wrong with their code and inspire further learning. By the way, we have a nice article about analysis of students' code: "Of Evil Accidentally Summoned by a Sorcerer's Disciples".

With new hires who already have some experience, static analysis can help team leads quickly gauge their skill and see where they need additional training.

Note. By the way, unit tests are not a cure-all either; static analysis complements unit-testing rather than competing with it. See the article "How to complement TDD with static analysis".

Running a static analyzer before releasing is fine

On the other hand, if you run analysis before releasing, that will be less of a headache so you could get a tolerable time/bugs ratio. So, who knows...

This is a totally wrong way to use static analysis! The only worse options are not using it at all or running it once every five years.

If you stick to this approach, you will spend the whole development cycle picking bugs out slowly and painfully by means of unit tests, debugging, testers, and so on. If you then run a static analyzer on the final release version, all it will find is minor bugs, which went unnoticed earlier precisely because they are insignificant.

The whole point of static analysis is to catch bugs as early as possible, that is, at the coding stage! That is how the analyzer saves you time, money, and nerve cells.

Again, let's try it with compiler warnings. Suppose you have them turned off and spend 3 months developing your project. Then, the day before releasing, you turn them back on. Silly, isn't it?

I also recommend my colleague's article about how single-time checks make little sense: "Philosophy of Static Code Analysis: We Have 100 Developers, the Analyzer Found Few Bugs, Is Analyzer Useless?".

Dynamic analysis vs static analysis

By the way, don't forget about your real competitors (not de jure but de facto): dynamic analyzers (ASAN, MSAN, TSAN, UBSAN). Sure, they have a weak point: they are not that fast and use more resources. But they've also a strong one: if, after fuzzing your code, *SAN says it's found "something", it's no use denying you have a bug that must be fixed. And static analyzers are not so... definite.

True, dynamic analyzers have strong points, but they also have weak ones. They can detect bugs that static analyzers can't. And vice versa! Dynamic analyzers produce almost zero false positives. On the other hand, they are poorly suited to testing certain parts of the code, or can't do it in a reasonable time, because they only see the paths that actually get executed. Static analyzers, by contrast, can check the entire source code quickly.
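A hypothetical sketch of a defect that is easy to miss at runtime (all names below are invented for illustration). The out-of-bounds read happens only when `level` is exactly 3, so a sanitizer run reports it only if some test or fuzzer input reaches that value. An analyzer that tracks value ranges can, in principle, flag the index without executing anything, although whether a given tool actually does so is not guaranteed.

```cpp
#include <cstddef>

// Hypothetical log-level table: valid levels are 0, 1, and 2.
constexpr std::size_t kLevelCount = 3;
const char* const kLevelNames[kLevelCount] = {"info", "warning", "error"};

// Buggy: the guard uses '>' instead of '>=', so level == 3 slips through
// and indexes one element past the end of the array.
const char* level_name_buggy(std::size_t level) {
    if (level > kLevelCount) {
        return "unknown";
    }
    return kLevelNames[level];
}

// Fixed: every level outside 0..2 is rejected before indexing.
const char* level_name_fixed(std::size_t level) {
    if (level >= kLevelCount) {
        return "unknown";
    }
    return kLevelNames[level];
}
```

Calls with levels 0 to 2 behave identically in both versions, which is why ordinary runs and tests can pass for a long time before the defect shows up.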

The point is that you shouldn't view this as "static analysis vs dynamic analysis". These types of analysis don't compete; they complement each other. By using static and dynamic analysis together, you can catch tons of bugs of all kinds.

We don't see dynamic analyzers as competitors because professionals never ask, "Which technology should I pick?" They use both, because both are answers to the question, "What else can I do to make my code better?"

Note

By the way, for some reason, some programmers believe that dynamic analyzers do just the same job as static analyzers and are therefore better because they produce fewer false positives. That's wrong. There are a lot of things that dynamic analyzers can't do. Again, this is nicely covered in the following articles:

It's difficult to squeeze into the niche between compilers and dynamic analyzers

So you have to squeeze into the niche between the built-in analyzer from clang/gcc/msvc (free and always at hand) and *SAN (complicated and costly, but almost zero false positives so that nearly every issue reported can be submitted to the bug tracker). Is that niche wide enough?

Our niche is wide enough to accommodate not only PVS-Studio but also tools by many other companies such as SonarSource, Synopsys (formerly known as Coverity), Gimpel Software, Rogue Wave Software, and so forth.

What makes it wide? That's easy: nothing in the comment actually restricts static analyzers. Dynamic analyzers are on one border, but, as we've figured out, they are partners, not competitors.

Compilers are on the other border. Sure, they grow smarter. But static analyzers don't stand still and are evolving quickly too.

Skeptics should take a look at some of my articles where I show how easily PVS-Studio catches bugs in these compilers:

I haven't tried PVS-Studio

You really should :) Many grow to like it and become our customers.

The good news is that it's very easy to get started. Just visit the product page and download the demo version.

If you have any questions, don't hesitate to email us. We'll help you with checking a large project and sorting out the warnings, and advise you on licensing options.

Thank you for reading, and may your code stay bug-free!