The other day Google revealed the sources of the robots.txt parser. Why not give a run for the already far and wide checked project using PVS-Studio and possibly find a bug. So said so done. But I wish we could find something meaningful. Well, then let it be just a reason to give full marks for Google developers.
robots.txt - is an index file that contains rules for search robots. It works for https, http and FTP protocols. Google made the parser of the robots.txt file available for everyone. Read more about this news here: Google opens the source code of the robots.txt parser
I think most of our readers know what PVS-Studio does. But in case it's your first time on our blog, I'll give a brief reference. PVS-Studio is a static code analyzer that allows you to find a variety of bugs, vulnerabilities, and flaws in projects written in C, C++, C# and Java. In other words, PVS-Studio is a SAST solution and it can work both on user machines, build servers and in the cloud. The PVS-Studio team also likes writing articles on checks of various projects. So let's get to the point and try to find errors in the source code of the parser from Google.
Unfortunately, but to the delight of everyone else, no mistakes were found. Only a couple of minor flaws, which I will tell about. Well, I have to write something about the project :). The lack of errors is due to the small amount of the project and high quality of the code itself. This doesn't mean that there are no hidden errors, but static analysis was helpless at that moment.
So this article happened to be in the spirit of another our post "The Shortest Article about a Check of nginx".
I found a case with possible optimization:
V805 Decreased performance. It is inefficient to identify an empty string by using 'strlen(str) > 0' construct. A more efficient way is to check: str[0] != '\0'. robots.cc 354
bool RobotsTxtParser::GetKeyAndValueFrom(char **key, ....)
{
....
*key = line;
....
if (strlen(*key) > 0) {
....
return true;
}
return false;
}
It's inefficient to call the strlen function to find out if a string is empty. This check can be much simpler: if (*key[0] != '\0'). This way you don't have to traverse the entire string, if it's not empty.
V808 'path' object of 'basic_string' type was created but was not utilized. robots.cc 123
std::string GetPathParamsQuery(....)
{
std::string path;
....
}
The string is declared, but not used further. In some cases, unused variables may indicate an error. In this case, it looks like this variable was used somehow, but after making changes it became unnecessary. Thus, the analyzer often helps to make the code cleaner and helps to avoid errors by simply removing prerequisites for their appearance.
In the next case, the analyzer recommends to add a default return after the entire main is executed. Perhaps it is worth adding a return statement at the very end in order to understand that everything has really worked out. However, if such behavior was intended, nothing needs to be changed. If you don't want to see this warning, in PVS-Studio you can suppress it and never see it again :).
V591 The 'main' function does not return a value, which is equivalent to 'return 0'. It is possible that this is an unintended behavior. robots_main.cc 99
int main(int argc, char** argv)
{
....
if (filename == "-h" || filename == "-help" || filename == "--help")
{
ShowHelp(argc, argv);
return 0;
}
if (argc != 4)
{
....
return 1;
}
if (....)
{
....
return 1;
}
....
if (....)
{
std::cout << "...." << std::endl;
}
}
I also found that two functions below which had different names were implemented in the same way. Perhaps this is the result of the fact that earlier these functions had different logic, but came to one. It may be that a typo crept somewhere, so such warnings should be carefully checked.
V524 It is odd that the body of 'MatchDisallow' function is fully equivalent to the body of 'MatchAllow' function. robots.cc 645
int MatchAllow(absl::string_view path, absl::string_view pattern)
{
return Matches(path, pattern) ? pattern.length() : -1;
}
int MatchDisallow(absl::string_view path, absl::string_view pattern)
{
return Matches(path, pattern) ? pattern.length() : -1;
}
It's the only place I'm suspicious of. It should be checked by the project's authors.
Thus, the check of the robots.txt parser from Google showed that this project, which have been checked multiple times and is widely used, is of great quality. Even some found flaws cannot spoil the impression of cool Google coders writing this project :).
We suggest you as well to download and try PVS-Studio on the project you're interested in.