A few words about interaction between PVS-Studio and Clang

Feb 18 2013

Author: Andrey Karpov

What is PVS-Studio based on?
Why do we have Clang in the distribution kit?
Are there overlapping diagnostics?
Conclusion

I am writing this post because many programmers think that the PVS-Studio analyzer is based on Clang. It is not. I'd like to explain briefly why the PVS-Studio distribution kit nevertheless contains the Clang compiler and what it is used for.

What is PVS-Studio based on?

The PVS-Studio analyzer is based on the open-source library OpenC++. Very little of that library remains by now, though. The parsing code has been significantly reworked: for instance, it now supports C and C++11 as well as the non-standard language extensions of different compilers.

The reason why we took the OpenC++ library, not the Clang compiler, as the basis for our tool is very simple: when we started, nobody had ever heard of Clang. The development of the Clang project was only beginning then. We learned about it later, but it didn't suit our needs anyway, as it was initially oriented toward the Linux world.

If we were starting to develop the static analyzer today, we would certainly choose Clang. However, we don't intend to switch to it. Yes, we would get a more capable code parsing mechanism, but PVS-Studio already handles parsing well enough, except for cases with complex templates. On the other hand, there is not much to search for in complex templates, and even when there is something to be found, it is hard to generate an adequate diagnostic message for it. That's why the analysis quality depends very little on the completeness of template parsing.

Thus, if we decided to switch to Clang, it would take us at least a year, while you wouldn't see any significant improvement in the quality of code analysis. It's definitely an interesting programming task but quite an unreasonable enterprise from the commercial viewpoint. Since we are programmers and managers at the same time, we have to suppress our urge to make everything as ideal as possible from the technical viewpoint. We'd rather spend this year creating a hundred new diagnostics.

Why do we have Clang in the distribution kit?

Although, as you now know, we don't use Clang for code parsing, it is still included in the distribution kit. The point is that the OpenC++ library didn't have a preprocessor. Neither did we. Initially we used the Visual C++ compiler (cl.exe) for preprocessing. It's a nice and simple solution, except for one thing. Visual C++ seems to have two preprocessing mechanisms. One is fast and is used during compilation. The other is slow and is the one that creates preprocessed *.i files. I don't know why it is done that way. What matters is that we have to use the slow version, and it significantly slows down the analysis: preprocessing a file often takes more time than analyzing it.

So we found an alternative way of generating *.i files. The Clang preprocessor turned out to work very fast and to be capable of processing the header files included with Visual C++. Well, almost always capable: although the authors announced it to be completely compatible, you cannot have complete compatibility in practice, of course. That's not a problem, though. If you cannot use Clang, you can always select the safer but slower Visual C++ (cl.exe) in the settings.
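To make the two routes concrete, here is a minimal sketch of how a tool might build the command line for each preprocessor. It relies only on the documented options `-E` and `-o` for Clang and `/P` and `/Fi` for cl.exe. The function names are hypothetical, and the article does not say which options PVS-Studio actually passes, so this is an illustration rather than a description of the product. A real invocation would also need the project's include paths and macro definitions.

```cpp
#include <string>

// Hypothetical helper: wrap a path in double quotes so that spaces survive.
std::string quoted(const std::string& path) {
    return "\"" + path + "\"";
}

// Clang: -E stops after the preprocessing stage, -o names the output file.
std::string clangPreprocessCommand(const std::string& source,
                                   const std::string& output) {
    return "clang -E " + quoted(source) + " -o " + quoted(output);
}

// Visual C++: /P preprocesses to a file, /Fi names that file.
std::string clPreprocessCommand(const std::string& source,
                                const std::string& output) {
    return "cl.exe /nologo /P /Fi" + quoted(output) + " " + quoted(source);
}
```

Both functions only assemble a string; launching the process and checking its exit code is left to the caller.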

Note. By default, PVS-Studio first tries to preprocess a file with Clang. If that fails, Visual C++ is used instead. This often goes unnoticed by users, who may not even suspect what is happening. For instance, suppose Clang fails to process 5% of the files. The tool then has to launch cl.exe additionally to preprocess those remaining files. But despite this extra work, the analysis is still much faster than when cl.exe is used for every file.
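The default strategy from the note can be sketched as follows. This is only an illustration under stated assumptions: the `Preprocessor` type, `preprocessWithFallback` and `expectedTimePerFile` are hypothetical names, the two preprocessors are passed in as callables that stand in for Clang and cl.exe, and nothing here reflects the actual PVS-Studio implementation.

```cpp
#include <functional>
#include <optional>
#include <string>

// A preprocessor takes a source path and returns the preprocessed text,
// or std::nullopt if it could not handle the file.
using Preprocessor =
    std::function<std::optional<std::string>(const std::string&)>;

struct PreprocessResult {
    std::string text;    // contents of the would-be *.i file
    bool usedFallback;   // true if the slow preprocessor had to be launched
};

// Try the fast preprocessor first; launch the slow one only on failure.
std::optional<PreprocessResult> preprocessWithFallback(
    const std::string& path,
    const Preprocessor& fast,    // stands in for Clang
    const Preprocessor& slow) {  // stands in for cl.exe
    if (auto text = fast(path)) {
        return PreprocessResult{*text, false};
    }
    if (auto text = slow(path)) {
        return PreprocessResult{*text, true};
    }
    return std::nullopt;  // neither preprocessor could handle the file
}

// Average cost per file: every file pays for the fast attempt, and the
// share of files on which it fails additionally pays for the slow run.
double expectedTimePerFile(double fastTime, double slowTime,
                           double failureRate) {
    return fastTime + failureRate * slowTime;
}
```

With purely hypothetical timings of 1 time unit for the fast preprocessor and 10 for the slow one, a 5% failure rate gives an average of about 1.5 units per file, against 10 units when the slow preprocessor is always used. The real ratio depends on the project and is not given in the article.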

Are there overlapping diagnostics?

Some programmers think that PVS-Studio is based on Clang because some of the diagnostic capabilities of the two tools overlap. There is indeed some overlap, but the reasons for it are quite different:

1. Some ideas lie on the surface. Independent teams may implement the same diagnostic rules in different tools in a very similar way.

2. The Clang developer team borrows some ideas implemented in PVS-Studio, and vice versa. It all started after the article "PVS-Studio vs Clang" was written.

Conclusion

I hope I have clarified how PVS-Studio and Clang relate to each other. If you want to know more about the errors PVS-Studio can detect, please see this collection of defects detected in some open-source projects.

P.S. You are welcome to follow us on Twitter, where we regularly post links to articles on C/C++ programming and related subjects.