Oct 12 2011

Sequence point

A sequence point is a point in a program's execution at which all side effects of previous evaluations are guaranteed to be complete and no side effects of subsequent evaluations have yet taken place.

Sequence points come up most often in discussions of C and C++, because in these languages it is especially easy to write an expression whose value depends on an unspecified order in which side effects take place. Adding one or more sequence points constrains that order and is one way to obtain a stable, predictable result.

Sequence points matter when one and the same variable is modified more than once in a single expression. The expression i=i++ is often given as an example: the variable 'i' is assigned to and incremented at the same time. What value will 'i' possess afterwards? A language standard must either define one of the possible behaviors as the only correct one, define a range of acceptable behaviors, or state that the program's behavior is completely undefined in such a case. In C and C++, evaluating i=i++ causes undefined behavior, because the expression contains no sequence point between the two modifications of 'i'.
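
A minimal sketch of how the ambiguity can be removed, assuming the two most likely intentions behind i=i++: either 'i' should end up incremented, or it should keep its old value. The function names increment and keep_old_value are hypothetical and serve only as illustration. Each modification of 'i' sits in its own statement, so a sequence point separates it from the next one.

```c
/* Intention A (assumed): "i" should end up incremented by one. */
int increment(int i)
{
    i++;            /* the ';' ends a full expression: a sequence point */
    return i;
}

/* Intention B (assumed): "i" should keep the value it had before. */
int keep_old_value(int i)
{
    int old = i;    /* read the old value first */
    i++;            /* the increment is complete at the ';' */
    i = old;        /* separate statement, no conflict with the increment */
    return i;
}
```

Whichever intention applies, writing it out as separate statements makes the result independent of the compiler.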

The following sequence points are defined in C and C++:

Between the evaluation of the left and the right operands of the && (logical AND), || (logical OR) and comma operators. For instance, in the expression *p++ != 0 && *q++ != 0, all side effects of the left operand *p++ != 0 are complete before evaluation of the right operand begins.
Between the evaluation of the first operand of the conditional operator (?:) and the evaluation of its second or third operand. In the statement a = (*p++) ? (*p++) : 0, the sequence point is located after the first operand *p++. By the time the second operand is evaluated, p has already been incremented by 1.
At the end of a full expression. This category includes expression statements (a=b;), expressions in 'return' statements, the controlling expressions in parentheses of 'if' and 'switch' statements and of 'while' and 'do-while' loops, and each of the three expressions within the parentheses of a 'for' loop.
Before entering a called function. The order in which the arguments are evaluated is not defined, but this sequence point guarantees that all of their side effects are complete before the function is entered. In the expression f(i++) + g(j++) + h(k++), each of the three variables i, j and k takes its new value before f, g and h, respectively, is entered. However, the order in which f(), g() and h() are called is not defined, so the order in which i, j and k are incremented is not defined either. It is therefore not defined whether j and k have already been incremented when the body of f executes. Note that the commas in a call with several arguments, such as f(a,b,c), are not comma operators and do not determine the order in which the arguments are evaluated.
When returning from a function, at the point when the returned value is copied into the calling context. (This sequence point is described explicitly only in the C++ standard, not in the C standard.)
In a declaration with an initializer, at the moment when evaluation of the initializing value is complete; for example, when evaluation of (1+i++) is complete in the declaration int a = (1+i++);.
In C++, overloaded operators act as functions, so a call to an overloaded operator is a sequence point.
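
Some of the rules above can be sketched in C as follows. This is only an illustration: common_length, second_if_first_nonzero, identity and call_order_demo are hypothetical names introduced here. Every expression is well defined, because no object is modified twice, or modified and separately read, without an intervening sequence point.

```c
#include <stddef.h>

/* && rule: p is already advanced before *q++ is evaluated.
   Counts the leading positions where both strings have a character. */
size_t common_length(const char *p, const char *q)
{
    size_t n = 0;
    while (*p++ != 0 && *q++ != 0)
        n++;
    return n;
}

/* ?: rule: after the first operand is evaluated, p has already been
   incremented, so the second operand reads the next element. */
int second_if_first_nonzero(const int *p)
{
    return (*p++) ? (*p++) : 0;
}

static int identity(int x) { return x; }

/* Function-call rule: the order of the three calls is not defined, but
   each argument uses a different variable, so the sum and the final
   values of i, j and k are. Returns 1 if they are as expected. */
int call_order_demo(void)
{
    int i = 1, j = 10, k = 100;
    int sum = identity(i++) + identity(j++) + identity(k++);
    return sum == 111 && i == 2 && j == 11 && k == 101;
}
```

For the array {3, 7}, second_if_first_nonzero yields the second element; for {0, 7} it yields 0 and never evaluates the second *p++.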

Now let's examine several examples causing undefined behavior:

int i, j;
...
X[i] = ++i;
X[i++] = i;
j = i + X[++i];
i = 6 + i++ + 2000;
j = i++ + ++i;
i = ++i + ++i;

In all these cases the result cannot be predicted. Of course, these samples are artificial and the danger is visible at first glance, so consider a code fragment found by the PVS-Studio analyzer in a real-life application:

while (!(m_pBitArray[m_nCurrentBitIndex >> 5] &
         Powers_of_Two_Reversed[m_nCurrentBitIndex++ & 31]))
{}
return (m_nCurrentBitIndex - BitInitial - 1);

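The compiler may evaluate either operand (left or right) of the '&' operator first. This means that m_nCurrentBitIndex may or may not already have been incremented when "m_pBitArray[m_nCurrentBitIndex >> 5]" is evaluated. Formally the behavior is undefined, because the variable is both read and modified with no sequence point in between.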

This code can work correctly for a long time. But keep in mind that it is only likely to keep working as long as it is built with the same compiler version and the same set of compilation options. This is the correct code:

while (!(m_pBitArray[m_nCurrentBitIndex >> 5] &
         Powers_of_Two_Reversed[m_nCurrentBitIndex & 31]))
{ ++m_nCurrentBitIndex; }
return (m_nCurrentBitIndex - BitInitial);

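This code no longer contains any ambiguity, and as a bonus the magic constant "-1" is gone. Note that this version leaves m_nCurrentBitIndex pointing at the bit that was found, whereas the original was meant to leave it one position past that bit; code that uses the variable afterwards has to take this into account.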
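
The corrected loop can be tried out in isolation with a sketch like the following. It is not the application's actual code: count_zero_run and reversed_mask are hypothetical names, and the layout of the Powers_of_Two_Reversed table is an assumption (entry k selects bit 31 - k of a 32-bit word), since the article does not show the table.

```c
#include <stdint.h>

/* Assumption: entry k of the table selects bit (31 - k), i.e. bits are
   numbered from the most significant end of each 32-bit word. */
static uint32_t reversed_mask(uint32_t k)
{
    return UINT32_C(1) << (31u - k);
}

/* Returns the number of zero bits between bit index "start" and the next
   set bit. Like the original loop, it does no bounds checking: the caller
   must guarantee that a set bit exists at or after "start". */
uint32_t count_zero_run(const uint32_t *bits, uint32_t start)
{
    uint32_t index = start;
    while (!(bits[index >> 5] & reversed_mask(index & 31u)))
    {
        ++index;    /* the only modification of index, in its own statement */
    }
    return index - start;
}
```

Because the index is read twice in the condition and modified only in the loop body, the result does not depend on the order in which the operands of '&' are evaluated.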

Programmers often think that undefined behavior can occur only with postincrement, while preincrement is safe. This is not true. Consider an example from a discussion on this topic.

Question:

I downloaded a demo version of PVS-Studio, ran it on my project and got the following warning: V567 Undefined behavior. The 'i_acc' variable is modified while being used twice between sequence points.

Code

i_acc = (++i_acc) % N_acc;

It seems to me there's no undefined behavior here because the i_acc variable doesn't participate in the expression twice.

Answer:

There is undefined behavior here, although the probability of an actual error is very low in this case. The '=' operator is not a sequence point. This means the compiler may first load the value of i_acc into a register and increment it there. It may then evaluate the expression and write the result into i_acc, and only after that write the incremented value from the register back into the variable. The resulting code will look like this:

REG = i_acc;
REG++;
i_acc = (REG) % N_acc;
i_acc = REG;

The compiler is fully entitled to do this. Of course, in practice it will most likely increment the variable right away and everything will work as the programmer expects. But you should not rely on that.
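
One way to avoid the problem, sketched here on the assumption that the intent is to advance a cyclic index, is to read the variable and write it exactly once. The helper next_index is a hypothetical name introduced for illustration.

```c
/* Returns the successor of i_acc modulo n_acc. The variable is only read
   here; the single write happens in the assignment at the call site.
   Assumes n_acc > 0 and 0 <= i_acc < n_acc. */
int next_index(int i_acc, int n_acc)
{
    return (i_acc + 1) % n_acc;
}
```

At the call site this becomes i_acc = next_index(i_acc, N_acc); or, written inline, i_acc = (i_acc + 1) % N_acc;. Either form modifies i_acc only once per statement.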