Webinar: Evaluation - 05.12
Lines of code (LOC) is a metric commonly used to measure the size and complexity of a software project. It is used for estimating effort during project planning, estimating time during development, and estimating labor productivity after project completion.
There are two common techniques for counting lines of code: counting "physical" lines and counting "logical" lines. Note that these terms are not strictly defined, and the nuances of their meanings may vary from case to case. In general, the number of "physical" lines is simply the number of source code lines, including comments and possibly even empty lines. "Logical" line counting attempts to count executable statements (loops, function calls, and so on), but the definition of such a statement varies from one programming language to another.
Each approach has its trade-offs. The number of "physical" lines is easier to determine, but it strongly depends on the coding style and formatting of the source code. "Logical" lines do not have this disadvantage, but they are considerably harder to count.
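As an illustration of how cheap the "physical" metric is to compute, here is a minimal C sketch of a physical line counter. It reads a source file from standard input and reports the total number of physical lines, along with how many of them are blank or whole-line // comments. The classification rules here (only whole-line // comments are recognized; block comments and trailing comments are ignored) are a simplifying assumption of this sketch, not a standard definition.
#include <stdio.h>

/* Minimal physical-line counter: reads C source from stdin and reports
   the total number of physical lines, blank lines, and whole-line
   "//" comments. Block comments and trailing comments are ignored,
   and lines longer than the buffer are counted more than once,
   which is an acceptable simplification for a sketch. */
int main(void)
{
    char line[4096];
    int physical = 0, blank = 0, comment = 0;

    while (fgets(line, sizeof line, stdin) != NULL)
    {
        const char *p = line;
        while (*p == ' ' || *p == '\t')      /* skip leading whitespace */
            ++p;

        ++physical;                          /* every line is a physical line */
        if (*p == '\n' || *p == '\r' || *p == '\0')
            ++blank;
        else if (p[0] == '/' && p[1] == '/')
            ++comment;
    }

    printf("physical: %d, blank: %d, comment: %d\n", physical, blank, comment);
    return 0;
}
A counter like this takes only a couple of dozen lines, which is exactly why the "physical" metric is so easy to obtain, and also why it is so sensitive to formatting.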
Let's look at the following code fragment:
for (i=0; i<100; ++i) printf("%d bottles of beer on the wall\n", i);
// How many LOCs are here?
In this case, the code contains two physical lines, two logical lines (the for loop statement and the printf function call statement), and one comment line.
If we change the code formatting, we get five physical lines of code, but we still have the same two logical lines of code, and one comment line:
for (i=0; i<100; ++i)
{
printf("%d bottles of beer on the wall\n ");
}
// How many LOCs are here?
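To show why "logical" lines are harder to count, here is a rough sketch of a logical line counter for C-like code. It counts statement-terminating semicolons (those outside parentheses and string literals) plus control-flow keywords such as for, while, if, and switch that form a statement of their own. This heuristic is an assumption of the sketch rather than an established definition, and it is nowhere near a real parser, but it reports the same two logical lines for both layouts of the fragment above.
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Rough "logical line" estimate for a fragment of C-like code:
   counts semicolons that terminate statements (i.e. are outside
   parentheses and string literals) plus control-flow keywords that
   form a statement of their own. A crude heuristic, not a parser. */
static int logical_lines(const char *src)
{
    static const char *keywords[] = { "for", "while", "if", "switch" };
    int count = 0, depth = 0, in_string = 0;

    for (const char *p = src; *p != '\0'; ++p)
    {
        if (in_string)
        {
            if (*p == '\\' && p[1] != '\0')
                ++p;                        /* skip escaped character */
            else if (*p == '"')
                in_string = 0;
            continue;
        }
        if (*p == '"') { in_string = 1; continue; }
        if (*p == '(') { ++depth; continue; }
        if (*p == ')') { --depth; continue; }
        if (*p == ';' && depth == 0) { ++count; continue; }

        /* a keyword must stand on its own, not inside an identifier */
        if (isalpha((unsigned char)*p) &&
            (p == src || !isalnum((unsigned char)p[-1])))
        {
            for (size_t k = 0; k < sizeof keywords / sizeof keywords[0]; ++k)
            {
                size_t len = strlen(keywords[k]);
                if (strncmp(p, keywords[k], len) == 0 &&
                    !isalnum((unsigned char)p[len]))
                {
                    ++count;
                    p += len - 1;
                    break;
                }
            }
        }
    }
    return count;
}

int main(void)
{
    const char *one_liner =
        "for (i=0; i<100; ++i) printf(\"%d bottles of beer on the wall\\n\", i);";
    const char *reformatted =
        "for (i=0; i<100; ++i)\n"
        "{\n"
        "    printf(\"%d bottles of beer on the wall\\n\", i);\n"
        "}\n";

    /* Both layouts of the fragment report the same two logical lines. */
    printf("one-liner:   %d logical lines\n", logical_lines(one_liner));
    printf("reformatted: %d logical lines\n", logical_lines(reformatted));
    return 0;
}
A production-quality counter would need a real tokenizer or parser for each supported language, which is exactly what makes logical lines "rather difficult to count".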
The number of code lines obviously relates to system complexity: the more code there is, the more complex the system. For example, the Windows NT 3.1 operating system is estimated at 4-5 million lines of code, while Windows XP is about 45 million. The Linux kernel contains 5.6 million lines in version 2.6 and 15.9 million in version 3.6.
However, when it comes to quality and safety, things are not so simple. In the real world, all programs contain errors, and the bigger a program is, the more errors it is likely to have. This is fairly obvious: even if the error density (the ratio of errors to lines of code) stays constant, the absolute number of errors grows with the size of the program. Intuition, however, tells us that as the code grows, the number of errors increases even faster, because the complexity of the system itself increases. And it is not just intuition (see the diagram: "typical error density"). Similar ideas underlie design principles such as KISS, DRY, and SOLID. To support the idea, here is a quote from E. W. Dijkstra: "Simplicity is prerequisite for reliability", and a paragraph from his work "The Fruits of Misunderstanding":
...Yet people talk about programming as if it were a production process and measure "programmer productivity" in terms of "number of lines of code produced". In so doing they book that number on the wrong side of the ledger: we should always refer to "the number of lines of code spent".
As the number of code lines in a program grows, the code becomes more complex and the number of errors increases as well. Unfortunately (or perhaps fortunately), technological progress is inevitable: systems will keep growing in complexity and will require more and more resources to find and fix bugs (and, of course, new bugs will keep appearing). This is where static analysis and specialized tools can help reduce the number of errors and improve the efficiency of the entire development process.