Webinar: C++ semantics - 06.11
As software developers, we always want our software to work properly. We'll do everything to improve the software quality. To find the best solution, we are ready to use parallelizing or applying any various optimization techniques. One of these optimization techniques is the so-called string interning. It allows users to reduce memory usage. It also makes string comparison faster. However, everything is good in moderation. Interning at every turn is not worth it. Further, I'll show you how not to slip up with creating a hidden bottleneck in the form of the String.Intern method for your application.
In case you've forgotten, let me remind you that string is a reference type in C#. Therefore, the string variable itself is just a reference that lies on the stack and stores an address. The address points to an instance of the String class located on the heap.
There are several ways to calculate how many bytes a string object takes on the heap: the version by John Skeet and the version by Timur Guev (the last article is in Russian). In the picture above, I used the second option. Even if this formula is not 100 % true, we can still estimate the size of string objects. For example, about 4.7 million lines (each is 100 characters long) are enough to take up 1 GB of RAM. Let's say there's a large number of duplicates among the strings in a program. So, it's just worth using the interning functionality built into the framework. Now, why don't we briefly recap what is string interning?
The idea of string interning is to store only one instance of the String type in memory for identical strings. When running an app, the virtual machine creates an internal hash table, called the interning table (sometimes it is called String Pool). This table stores references to each unique string literal declared in the program. In addition, using the two methods described below, we can get and add references to string objects to this table by ourselves. If an application contains numerous strings (which are often identical), it makes no sense to create a new instance of the String class every time. Instead, you can simply refer to an instance of the String type that has already been created on the heap. To get a reference to it, access the interning table. The virtual machine itself interns all string literals in the code. We may choose one of two methods: String.Intern and String.IsInterned.
The first one takes a string as input. If there's an identical string in the interning table, it returns a reference to an object of the String type that already exists on the heap. If there's no such string in the table, the reference to this string object is added to the interning table. Then, it is returned from the method. The IsInterned method also accepts a string as input and returns a reference from the interning table to an existing object. If there's no such object, null is returned (everyone knows about the non-intuitive return value of this method).
Using interning, we reduce the number of new string objects by working with existing ones through references obtained via the Intern method. Thus, we do not create a large number of new objects. So, we save memory and improve program performance. After all, many string objects, references to which quickly disappear from the stack, can lead to frequent garbage collection. It will negatively affect the overall program performance. Interned strings won't disappear up to the end of the process, even if the references to these objects are no longer in the program. This thing is worth paying attention to. Using interning to reduce memory consumption can produce the opposite effect.
Interning strings can boost performance when comparing these very strings. Let's take a look at the implementation of the String.Equals method:
public bool Equals(String value)
{
if (this == null)
throw new NullReferenceException();
if (value == null)
return false;
if (Object.ReferenceEquals(this, value))
return true;
if (this.Length != value.Length)
return false;
return EqualsHelper(this, value);
}
Before calling the EqualsHelper method, where a character-by-character comparison of strings is performed, the Object.ReferenceEquals method checks for the equality of references. If the strings are interned, the Object.ReferenceEquals method returns true when the strings are equal (without comparing the strings themselves character-by-character). Of course, if the references are not equal, then the EqualsHelper method will be called, and the subsequent character-by-character comparison will occur. After all, the Equals method does not know that we are working with interned strings. Also, if the ReferenceEquals method returns false, we know that the compared strings are different.
If you are sure that the input strings are interned at a specific place in the program, then you can compare them using the Object.ReferenceEquals method. However, it's not the greatest approach. There is always a chance that the code will change in the future. Also, it may be reused in another part of the program. So, non-interned lines can get into it. In this case, when comparing two identical non-interned strings via the ReferenceEquals method, we will assume that they are not identical.
Interning strings for later comparison seems justified only if you plan to compare interned strings quite often. Remember that interning an entire set of strings also takes some time. Therefore, you shouldn't perform it to compare several instances of strings once.
Well, we revised what string interning is. Now, let's move on to the problem I've faced.
In our bug tracker, there was a task created long ago. It required some research on how parallelizing the C++ code analysis can save analysis time. It would be great if the PVS-Studio analyzer worked in parallel on several machines when analyzing a single project. I chose Incredibuild as the software that allows such parallelization. Incredibuild allows you to run different processes in parallel on machines located on the same network. For example, you can parallelize source files compiling on different company machines (or in a cloud). Thus, we save time on the building process. Game developers often use this software.
Well, I started working on this task. At first, I selected a project and analyzed it with PVS-Studio on my machine. Then, I ran the analysis using Incredibuild, parallelizing the analyzer processes on the company's machines. At the end, I summed up the results of such parallelization. So, having positive results, we'll offer our clients such solutions to speed up the analysis.
I chose the Unreal Tournament project. We managed to persuade the programmers to install Incredibuild on their machines. As a result, we had the combined cluster with about 145 cores.
I analyzed the Unreal Tournament project using the compilation monitoring system in PVS-Studio. So, I worked as follows: I ran the CLMonitor.exe program in monitor mode and performed a full build of Unreal Tournament in Visual Studio. Then, after building process, I ran CLMonitor.exe again, but in the analysis launch mode. Depending on the value specified in the PVS-Studio settings for the ThreadCount parameter, CLMonitor.exe simultaneously runs the corresponding number of PVS-Studio.exe child processes at the same time. These processes are engaged in the analysis of each individual source C++ file. One PVS-Studio.exe child process analyzes one source file. After the analysis, it passes the results back to CLMonitor.exe.
Everything is easy: in the PVS-Studio settings, I set the ThreadCount parameter equal to the number of available cores (145). I run the analysis getting ready for 145 processes of PVS-Studio.exe executed in parallel on remote machines. Incredibuild has Build Monitor, a user-friendly parallelization monitoring system. Using it, you can observe the processes running on remote machines. The same I observed in the process of analysis:
It seemed that nothing could be easier. Relax and watch the analysis process. Then simply record its duration with Incredibuild and without. However, in practice, it turned out to be a little bit complicated...
During the analysis, I could switch to other tasks. I also could just meditate looking at PVS-Studio.exe running in the Build Monitor window. As the analysis with Incredibuild ended, I compared its duration with the results of the one without Incredibuild. The difference was significant. However, the overall result could have been better. It was 182 minutes on one machine with 8 threads and 50 minutes using Incredibuild with 145 threads. It turned out that the number of threads increased by 18 times. Meanwhile, the analysis time decreased by only 3.5 times. Finally, I glimpsed the result in the Build Monitor window. Scrolling through the report, I noticed something weird. That's what I saw on the chart:
I noticed that PVS-Studio.exe executed and completed successfully. But then for some reason, the process paused before starting the next one. It happened again and again. Pause after pause. These downtimes led to a noticeable delay and did their bit to prolong the analysis time. At first, I blamed Incredibuild. Probably it performs some kind of internal synchronization and slows down the launch.
I shared the results with my senior colleague. He didn't jump to conclusions. He suggested looking at what's going on inside our CLMonitor.exe app right when downtime appears on the chart. I ran the analysis again. Then, I noticed the first obvious "failure" on the chart. I connected to the CLMonitor.exe process via Visual Studio debugger and paused it. Opening the Threads, my colleague and I saw about 145 suspended threads. Reviewing the places in the code where the execution paused, we saw code lines with similar content:
....
return String.Intern(settings == null ? path
: settings
.TransformToRelative(path.Replace("/", "\\"),
solutionDirectory));
....
analyzedSourceFiles.Add( String.Intern(settings
.TransformPathToRelative(analyzedSourceFilePath,
solutionDirectory))
);
....
What do these lines have in common? Each of them uses the String.Intern method. And it seems justified. Because these are the places where CLMonitor.exe handles data from PVS-Studio.exe processes. Data is written to objects of the ErrorInfo type, which encapsulates information about a potential error found by the analyzer. Also, we internalize quite reasonable things, namely paths to source files. One source file may contain many errors, so it doesn't make sense for ErrorInfo objects to contain different string objects with the same content. It's fair enough to just refer to a single object from the heap.
Without a second thought, I realized that string interning had been applied at the wrong moment. So, here's the situation we observed in the debugger. For some reason, 145 threads were hanging on executing the String.Intern method. Meanwhile, the custom task scheduler LimitedConcurrencyLevelTaskScheduler inside CLMonitor.exe couldn't start a new thread that would later start a new PVS-Studio.exe process. Then, Incredibuild would have already run this process on the remote machine. After all, from the scheduler's point of view, the thread has not yet completed its execution. It performs the transformation of the received data from PVS-Studio.exe in ErrorInfo, followed by string interning. The completion of the PVS-Studio.exe process doesn't mean anything to the thread. The remote machines are idle. The thread is still active. Also, we set the limit of 145 threads, which does not allow the scheduler to start a new one.
A larger value for the ThreadCount parameter would not solve the problem. It would only increase the queue of threads hanging on the execution of the String.Intern method.
We did not want to remove interning at all. It would increase the amount of RAM consumed by CLMonitor.exe. Eventually, we found a fairly simple and elegant solution. We decided to move interning from the thread that runs PVS-Studio.exe to a slightly later place of code execution (in the thread that directly generates the error report).
As my colleague said, we managed to make a very accurate edit of just two lines. Thus, we solved the problem with idle remote machines. So, we ran the analysis again. There were no significant time intervals between PVS-Studio.exe launches. The analysis' time decreased from 50 minutes to 26, that is, almost twice. Now, let's take a look at the overall result that we got using Incredibuild and 145 available cores. The total analysis time decreased by 7 times. It's far better than by 3.5 times.
It's worth noting that once we saw the threads hanging at the places where we call the String.Intern method, we almost instantly thought that under the hood this method has a critical section with some kind of lock. Since each thread can write to the interning table, there must be some synchronization mechanism inside the String.Intern method. It prevents several threads from overwriting each other's data. To confirm my assumptions, we decided to look at the implementation of the String.Intern method on the reference source. We noticed that inside our interning method there had been a call to Thread.GetDomain().GetOrInternString(str) method. Well, take a look at its implementation:
internal extern String GetOrInternString(String str);
Now, it's getting more interesting. This method is imported from some other build. Which one? Since the CLR VM itself does the strings interning, my colleague guided me directly to the .NET runtime repository. After downloading the repository, we went to the CoreCLR solution. We opened it and viewed the entire solution. There we found the GetOrInternString method with the appropriate signature:
STRINGREF *BaseDomain::GetOrInternString(STRINGREF *pString)
So, we saw a call to the GetInternedString method. In the body of this method, we noticed the following code:
....
if (m_StringToEntryHashTable->GetValue(&StringData, &Data, dwHash))
{
STRINGREF *pStrObj = NULL;
pStrObj = ((StringLiteralEntry*)Data)->GetStringObject();
_ASSERTE(!bAddIfNotFound || pStrObj);
return pStrObj;
}
else
{
CrstHolder gch(&(SystemDomain::GetGlobalStringLiteralMap()
->m_HashTableCrstGlobal));
....
// Make sure some other thread has not already added it.
if (!m_StringToEntryHashTable->GetValue(&StringData, &Data))
{
// Insert the handle to the string into the hash table.
m_StringToEntryHashTable->InsertValue(&StringData, (LPVOID)pEntry, FALSE);
}
....
}
....
The execution thread gets into the else branch only if the method that searches for a reference to the String object (the GetValue method) in the interning table returns false. Let's move on to the code in the else branch. Here we are interested in the line where an object of the CrstHolder type named gch is created. Now, we turn to the CrstHolder constructor and see the following code:
inline CrstHolder(CrstBase * pCrst)
: m_pCrst(pCrst)
{
WRAPPER_NO_CONTRACT;
AcquireLock(pCrst);
}
We notice the call to the AcquireLock method. It's getting better. Here's the code of the AcquireLock method:
DEBUG_NOINLINE static void AcquireLock(CrstBase *c)
{
WRAPPER_NO_CONTRACT;
ANNOTATION_SPECIAL_HOLDER_CALLER_NEEDS_DYNAMIC_CONTRACT;
c->Enter();
}
In fact, that's the entry point to the critical section – the call to the Enter method. After I'd read the comment "Acquire the lock", I had no doubts that this method deals with locking. I didn't see much point in diving further into the CoreCLR code. So, we were right. When a new entry is entered into the interning table, the thread enters the critical section, forcing all other threads to wait for the lock to release. Just before calling the m_StringToEntryHashTable->InsertValue method, the object of the CrstHolder type comes out, and therefore the critical section appears.
The lock disappears immediately after we exit the else branch. In this case, the destructor which calls the ReleaseLock method is called for the gch object:
inline ~CrstHolder()
{
WRAPPER_NO_CONTRACT;
ReleaseLock(m_pCrst);
}
When there are few threads, the downtime can be small. But when their number increases, for example to 145 (as happened with Incredibuild), each thread that tries to add a new entry to the internment table temporarily blocks the other 144 threads that also try to add a new entry to it. The results of these locks we observed in the Build Monitor window.
I hope that this case will help you to apply string interning more carefully and thoughtfully, especially in multithreaded code. After all, these locks, adding new records to the internment table, may become a bottleneck, as in our case. It's great that we were able to find out the truth and solve the detected problem. That made the analyzer work faster.
Thank you for reading.
0