The article describes seven types of metrics and more than fifty of their representatives, provides a detailed description of them and of the calculation algorithms used, and touches upon the role of metrics in software development.
The article is the first stage of work carried out by OOO "Program Verification Systems" in the sphere of creating new specialized tools for calculating metrics [1]. It is a review of the existing metrics which will help you better understand how certain tasks can be solved. We are planning to develop a methodology for estimating the complexity of porting software to other platforms and of parallelizing program code. This direction extends the abilities of the PVS-Studio product not simply as a static analyzer but as a tool for forecasting programmers' man-hours when mastering 64-bit systems and adapting algorithms to multi-core systems.
This article describes a great range of software metrics. Naturally, it is unreasonable to describe all the existing ones, since most of them are never used in practice: either their results cannot be applied, or their calculation cannot be automated, or they are too specialized. But there are metrics which are used rather frequently, and it is these metrics that are reviewed below.
In general, metrics allow managers of projects and businesses to estimate the complexity of a project that has been developed or is still being developed, to estimate the scope of work, the stylistics of the program and the efforts of every developer participating in a particular solution. However, metrics can only serve as recommendations; one cannot rely fully on them, because when developing software programmers try to minimize or maximize this or that aspect of their program and may resort to various tricks, up to reducing the efficiency of the program's operation. Besides, if, for example, a programmer has written only a few code lines or made only a few structure alterations, it does not mean that he has done nothing; it can mean that a defect was very difficult to find. The last problem, however, can be partly addressed by complexity metrics, as it is more difficult to find an error in a more complex program.
First of all, we should consider the quantitative characteristics of program source code (since they are simple). The simplest metric is the number of code lines (SLOC). It was originally developed to estimate man-hours for a project. However, since the same functionality can be split into several lines or written in a single line, this metric is nearly impossible to use with languages in which more than one command can be written in one line. That is why logical and physical code lines are distinguished: logical code lines count the commands of a program. This variant of the metric has its own disadvantages, because it greatly depends on the programming language and coding style used.
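As a rough illustration of the difference, here is a minimal sketch in Python that counts physical lines and approximates logical lines for C-like code; the comment and statement heuristics are assumptions of the sketch, while real tools rely on a full parser.

```python
# Sketch: physical vs. logical SLOC for C-like code. The "//" comment rule
# and the ';'/'{' statement heuristic are simplifying assumptions.

def count_sloc(source: str):
    physical = 0   # non-blank, non-comment-only lines
    logical = 0    # rough number of statements
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("//"):
            continue
        physical += 1
        logical += stripped.count(";") + stripped.count("{")
    return physical, logical

code = "int a = 1; int b = 2;\n// comment\nif (a < b) { a = b; }\n"
print(count_sloc(code))   # (2, 4): two physical lines, four "statements"
```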
Besides SLOC, there are other quantitative characteristics, such as the number of blank lines, the number of comments, the percentage of comments (the ratio of comment lines to the total number of lines), the average number of lines per function (class, file) and the average number of lines per module.
Sometimes the estimate of a program's stylistics (F) is singled out separately. It consists in splitting the program into n equal fragments and evaluating each fragment by the formula Fi = SIGN(Ncom.i / Ni - 0.1), where Ncom.i is the number of comments in the i-th fragment and Ni is the total number of code lines in the i-th fragment. The overall estimate for the whole program is F = SUM Fi.
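A possible reading of this formula in code (a sketch: the number of fragments and the "//" comment-detection rule are assumptions, since the article does not fix them):

```python
# Sketch of the stylistics estimate F: the program is split into fragments
# and each fragment contributes SIGN(Ncom.i / Ni - 0.1) to the total.

def sign(x):
    return (x > 0) - (x < 0)

def stylistics_estimate(lines, fragments=10):
    size = max(1, len(lines) // fragments)                  # fragment length
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    total = 0
    for chunk in chunks:
        n_com = sum(1 for line in chunk if "//" in line)    # comments in fragment
        total += sign(n_com / len(chunk) - 0.1)             # F_i
    return total                                            # F = SUM F_i

print(stylistics_estimate(["int a; // counter", "a++;"], fragments=1))   # 1
```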
Halstead's metrics also belong to the group of metrics based on counting certain units in the program code [2 [RU]]. These metrics are based on the following indices:
n1 - the number of unique operators of a program including separation symbols, names of procedures and operators (operator dictionary),
n2 - the number of unique operands of a program (operand dictionary),
N1 - the general number of operators in a program,
N2 - the general number of operands in a program,
n1' - a theoretical number of unique operators,
n2' - a theoretical number of unique operands.
Taking into consideration the denotations introduced, we can estimate:
n=n1+n2 - the program's dictionary,
N=N1+N2 - the program's length,
n' = n1' + n2' - the program's theoretical dictionary,
N' = n1*log2(n1) + n2*log2(n2) - the theoretical length of the program (for stylistically correct programs, the deviation of N from N' does not exceed 10%),
V = N*log2(n) - the program's volume,
V' = N'*log2(n') - the theoretical volume of the program, where n' is the theoretical dictionary of the program,
L = V'/V - the programming quality level; for an ideal program L = 1,
L' = (2*n2)/(n1*N2) - the programming quality level based only on the parameters of the real program, without taking theoretical parameters into account,
EC = V/(L')^2 - the program's understanding complexity,
D = 1/L' - the labor-intensiveness of coding the program,
Y' = V/D^2 - the level of the expression language,
I = V/D - the information content of the program; this characteristic allows estimating the intellectual effort of creating the program,
E = N'*log2(n/L) - the estimate of the intellectual effort needed to develop the program, characterizing the number of elementary decisions required when writing it.
Halstead's metrics partly compensate for the disadvantage related to the possibility of writing the same functionality with a different number of lines and operators.
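Once the operators and operands have been extracted (which by itself requires a language-specific parser), the basic indices are straightforward to compute. A sketch assuming the token lists are already available:

```python
from math import log2

# Sketch of the basic Halstead indices; the operator and operand lists are
# assumed to have been produced elsewhere by a parser for the target language.

def halstead(operators, operands):
    n1, n2 = len(set(operators)), len(set(operands))   # unique operators / operands
    N1, N2 = len(operators), len(operands)             # total occurrences
    n = n1 + n2                                        # dictionary
    N = N1 + N2                                        # length
    N_hat = n1 * log2(n1) + n2 * log2(n2)              # theoretical length N'
    V = N * log2(n)                                    # volume
    L_hat = (2 * n2) / (n1 * N2)                       # quality level L'
    D = 1 / L_hat                                      # coding labor-intensiveness
    I = V / D                                          # information content
    return {"n": n, "N": N, "N'": N_hat, "V": V, "L'": L_hat, "D": D, "I": I}

# a = b + c;  ->  operators: =, +, ;   operands: a, b, c
print(halstead(["=", "+", ";"], ["a", "b", "c"]))
```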
One more type of quantitative software metrics is Gilb's metrics. They estimate the complexity of software on the basis of the program's saturation with conditional or loop operators. Despite its simplicity, this metric reflects rather well the complexity of writing and understanding a program, and when such an index as the maximum nesting level of conditional and loop operators is added, the efficiency of this metric increases significantly.
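A sketch of how the saturation index and the maximum nesting level could be approximated for C-like source text; the keyword list and the brace-based depth tracking are simplifying assumptions:

```python
import re

# Sketch of Gilb's metric: the share of conditional/loop operators among all
# operators (cl), plus the maximum nesting depth of blocks.

DECISION = re.compile(r"\b(if|else|for|while|switch)\b")

def gilb_metrics(source: str, total_operators: int):
    cl = len(DECISION.findall(source))          # conditional and loop operators
    depth = max_depth = 0
    for ch in source:                           # depth is tracked by braces
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
    return cl / total_operators, max_depth      # saturation and nesting level

src = "if (a) { while (b) { b--; } } else { c++; }"
print(gilb_metrics(src, total_operators=10))    # (0.3, 2)
```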
The next large class of metrics is based not on quantitative indices but on analysis of the program's control graph; these are called metrics of program control flow complexity.
Before describing the metrics themselves, let us describe the control graph of a program and how it is built.
Suppose we have a program. For it we build a directed graph containing only one entry node and one exit node; the nodes of the graph correspond to those sections of the program's code that contain only sequential calculations and no branch or loop operators, while the arcs correspond to transitions from block to block and to branches of program execution. The condition for building this graph is that every node is reachable from the initial node and the final node is reachable from any other node [4].
The most popular estimate based on analysis of this graph is the cyclomatic complexity of the program (McCabe's cyclomatic number) [3]. It is calculated by the formula V(G) = e - n + 2p, where e is the number of arcs, n is the number of nodes and p is the number of connected components. The number of connected components can be thought of as the number of arcs that must be added to the graph to turn it into a strongly connected one. A strongly connected graph is a graph any two nodes of which are mutually reachable. For graphs of correct programs, i.e. graphs with no sections unreachable from the entry point and no dangling entry and exit points, a strongly connected graph is usually obtained by linking the node representing the end of the program to the node representing its entry with an arc. Essentially, V(G) counts the number of linearly independent circuits in the strongly connected graph. So p = 1 in correctly written programs, and the formula for cyclomatic complexity turns into:
V(G)=e - n + 2.
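Given the control graph as an adjacency list, the calculation is trivial. A sketch assuming a single connected component (p = 1):

```python
# Sketch: McCabe's cyclomatic complexity V(G) = e - n + 2p for a control
# graph given as an adjacency list; one connected component is assumed.

def cyclomatic_complexity(graph: dict, p: int = 1) -> int:
    n = len(graph)                                   # nodes
    e = sum(len(succ) for succ in graph.values())    # arcs
    return e - n + 2 * p

# Graph of a single if-else: entry branches to A and B, both joining at exit.
graph = {"entry": ["A", "B"], "A": ["exit"], "B": ["exit"], "exit": []}
print(cyclomatic_complexity(graph))   # 4 - 4 + 2 = 2
```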
Unfortunately, this estimate cannot distinguish between loop and conditional constructions. Another essential disadvantage of the approach is that programs represented by the same graphs can have predicates of completely different complexity (a predicate is a logical expression containing at least one variable).
To correct this defect, G. Myers developed a new methodology. He suggested taking as the estimate the interval [V(G), V(G)+h] (this estimate is therefore also called an interval estimate), where h equals zero for simple predicates and n-1 for n-place predicates. This method makes it possible to distinguish between predicates of different complexity, but in practice it is used very rarely.
There is one more modification of McCabe's method - Hansen's method. In this case the measure of a program's complexity is the pair (cyclomatic complexity, number of operators). The advantage of this measure is its sensitivity to the structuredness of the software.
Chen's topological measure expresses a program's complexity through the number of crossings of the boundaries between the regions formed by the program's graph. This approach is applicable only to structured programs that allow only sequential connection of control constructions. For unstructured programs Chen's measure depends strongly on conditional and unconditional jumps. In this case we can specify an upper and a lower bound of the measure: the upper bound is m+1, where m is the number of mutually nested logical operators, and the lower bound equals 2. When the control graph of the program has only one connected component, Chen's measure coincides with McCabe's cyclomatic measure.
Continuing the topic of control graph analysis, we can mention one more subgroup of metrics - Harrison's and Magel's metrics.
These measures take into account the nesting level and the length of the program.
Each node is assigned its own complexity in accordance with the operator it represents. This initial complexity of a node can be estimated by any method, including Halstead's metrics. With each predicate node we associate a subgraph generated by the nodes that are the ends of the arcs emerging from it, by the nodes reachable from each of those nodes (the lower boundary of the subgraph), and by the nodes lying on the paths between the predicate node and some lower boundary. This subgraph is called the influence area of the predicate node.
The reduced complexity of a predicate node is the sum of the initial and reduced complexities of the nodes included in its influence area, plus the initial complexity of the predicate node itself.
Functional measure (SCOPE) of a program is a sum of reduced complexities of all the nodes of the control graph.
Functional relation (SCORT) is the ratio of the number of nodes in the control graph to its functional complexity, with terminal nodes excluded from the node count.
SCORT can take different values for graphs with the same cyclomatic number.
Pivovarovskiy's metric is another modification of the cyclomatic complexity measure. It allows tracing differences not only between sequential and nested control constructions but also between structured and unstructured programs. It is expressed by the relation N(G) = v*(G) + SUM Pi, where v*(G) is a modified cyclomatic complexity calculated in the same way as V(G), with the difference that a CASE operator with n outputs is considered to be one logical operator rather than n-1 operators.
Pi is the nesting level of the i-th predicate node. To estimate the nesting level of predicate nodes the number of their "influence areas" is used. By nesting level we understand the number of all the "influence areas" of predicates which are either fully included in the area of the node under consideration or intersect with it. The nesting level increases due to the nesting of "influence areas", not of the predicates themselves. Pivovarovskiy's measure increases when passing from sequential programs to nested ones and further to unstructured ones, which is a great advantage of this measure over many other measures of this group.
Woodward's measure is the number of crossings of the control graph's arcs. As such crossings should not appear in a well-structured program, this metric is used mostly for weakly structured languages (Assembler, Fortran). A crossing point appears when control goes beyond the bounds of two nodes that are sequential operators.
The boundary value method is also based on analysis of the program's control graph. To define this method we need to introduce a few additional notions.
Let G be the control graph of a program with a single initial and a single final node.
In this graph, the number of incoming arcs of a node is called the negative degree of the node, and the number of outgoing arcs its positive degree. The set of the graph's nodes can then be divided into two groups: nodes whose positive degree is <=1 and nodes whose positive degree is >=2.
The nodes of the first group are called receiving nodes, and the nodes of the second group sampling nodes.
Each receiving node has a reduced complexity of 1, except for the final node, whose reduced complexity is 0. The reduced complexities of all the nodes of graph G are summed to give the absolute boundary complexity of the program. After that the relative boundary complexity is calculated:
S0=1-(v-1)/Sa,
where S0 is the relative boundary complexity of the program, Sa is the absolute boundary complexity of the program and v is the general number of the nodes of the program's graph.
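A small sketch of the final step: given the reduced complexity assigned to every node (1 for receiving nodes, 0 for the final node, and whatever values the method assigns to sampling nodes), the absolute and relative boundary complexities are computed as follows. The example graph and the sampling node's value are hypothetical.

```python
# Sketch of the boundary value estimate: Sa is the sum of the nodes' reduced
# complexities, S0 = 1 - (v - 1)/Sa, where v is the number of nodes.

def boundary_complexity(reduced: dict):
    v = len(reduced)                   # total number of nodes in the graph
    s_abs = sum(reduced.values())      # absolute boundary complexity Sa
    s_rel = 1 - (v - 1) / s_abs        # relative boundary complexity S0
    return s_abs, s_rel

# Hypothetical graph: four receiving nodes (1 each), the final node (0) and
# one sampling node whose reduced complexity is assumed to be 3.
print(boundary_complexity({"n1": 1, "n2": 1, "n3": 1, "n4": 1, "end": 0, "s1": 3}))
```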
There is also Schneidewind's metric, expressed through the number of possible paths in the control graph.
The next type of metrics is metrics of data flow complexity.
Chapin's metric: this method consists in estimating the information strength of a separate program module by examining how the variables from its input-output list are used.
The whole set of variables comprising the input-output list is divided into 4 functional groups:
1. P - input variables for calculations and output,
2. M - modified (or created inside the program) variables,
3. C - variables participating in controlling the program module's operation (control variables),
4. T - variables not used in the program ("parasite" variables).
As each variable can simultaneously perform several functions, it should be counted in each corresponding functional group.
Chapin's metric:
Q = a1*P + a2*M + a3*C + a4*T,
where a1, a2, a3, a4 are weighting ratios.
The weighting ratios are used to reflect the different influence of each functional group on the program's complexity. As the author of the metric supposes, the functional group C has the largest weight, 3, because it influences the control flow of the program. The weighting ratios of the other groups are: a1=1, a2=2, a4=0.5. The weighting ratio of the T group is not 0, since "parasite" variables, although they do not increase the complexity of the program's data flow, sometimes make it more difficult to understand. Taking the weighting ratios into account:
Q = P + 2M + 3C + 0.5T
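A sketch of the calculation for one module; classifying the module's variables into the P, M, C and T groups is assumed to have been done beforehand (by hand or by a data-flow analyzer):

```python
# Sketch of Chapin's metric Q = a1*P + a2*M + a3*C + a4*T with the author's
# weights 1, 2, 3 and 0.5.

def chapin_q(p: int, m: int, c: int, t: int) -> float:
    return 1.0 * p + 2.0 * m + 3.0 * c + 0.5 * t

# e.g. 4 input variables, 2 modified, 1 control, 1 "parasite" variable
print(chapin_q(4, 2, 1, 1))   # 4 + 4 + 3 + 0.5 = 11.5
```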
The span metric is based on the localization of data accesses within each program section. Span is the number of statements containing a given identifier between its first and last appearances in the program text; consequently, an identifier that appears n times has a span of n-1. The larger the span, the more complex testing and debugging become.
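A sketch of the span calculation as described above; representing statements simply as lists of the identifiers they contain is an assumption made for brevity:

```python
from collections import Counter

# Sketch of the span metric: an identifier occurring n times has span n - 1.

def spans(statements):
    occurrences = Counter(name for idents in statements for name in idents)
    return {name: count - 1 for name, count in occurrences.items()}

stmts = [["x", "y"], ["y"], ["z"], ["x", "z"]]
print(spans(stmts))   # {'x': 1, 'y': 1, 'z': 1}
```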
One more metric dealing with data flow complexity is the metric that binds the complexity of a program to accesses to global variables.
The pair "module - global variable" is identified as (p,r), where p is a module possessing access to the global variable r. Depending on presence or absence of a real call to r variable in the program there are two types of pairs "module - global variable": actual and possible. A possible call to r with the help of p shows that the existence domain of r includes p.
This characteristic is denoted Aup and shows how many times the Up modules actually accessed global variables, while the number Pup shows how many times they could have accessed them.
Ratio of the number of actual calls to possible ones is estimated in this way:
Rup = Aup/Pup.
This formula shows the approximate probability of an arbitrary module referencing an arbitrary global variable. Evidently, the higher this probability, the higher the probability of an "unauthorized" change of some variable, which can greatly complicate modification of the program.
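A minimal sketch of this ratio; the sets of actual and possible "module - global variable" pairs are assumed to have been collected elsewhere, for example from the code and the scope table:

```python
# Sketch: Rup = Aup / Pup, the share of actual accesses to global variables
# among all possible ones.

def global_access_ratio(actual_pairs: set, possible_pairs: set) -> float:
    return len(actual_pairs) / len(possible_pairs)

actual = {("mod_a", "g_cfg")}
possible = {("mod_a", "g_cfg"), ("mod_b", "g_cfg"), ("mod_c", "g_cfg")}
print(global_access_ratio(actual, possible))   # 1/3
```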
Kafura's measure is based on the concept of information flows. To use this measure we should introduce the notions of local and global flows: a local information flow from A to B exists if:
1. A module calls B module (direct local flow)
2. B module calls A module and A returns into B the value which is used in B (indirect local flow)
3. C module calls modules A and B and transfers the result of A module's execution into B.
Further, we should introduce the notion of a global information flow: a global information flow from A into B through the global data structure D exists if A module puts information into D and B module uses information from D.
On the basis of these notions the value I is introduced - the information complexity of a procedure:
I = length * (fan_in * fan_out)^2
Here:
length is the complexity of the procedure's text (measured with the help of some size metric, such as Halstead's, McCabe's or LOC metrics),
fan_in is the number of local flows incoming into the procedure plus the number of data structures from which the procedure gets information
fan_out is the number of local flows outgoing from the procedure plus the number of data structures which are updated by the procedure
We can define information complexity of a module as a sum of information complexities of the procedures included into it.
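A sketch of this calculation; the procedure descriptions below (lengths, fan_in and fan_out values) are made up for illustration and would normally come from a flow analyzer:

```python
# Sketch of Kafura's information flow measure: I = length * (fan_in * fan_out)^2,
# summed over the procedures of a module.

def kafura_complexity(length: int, fan_in: int, fan_out: int) -> int:
    return length * (fan_in * fan_out) ** 2

procedures = {
    "read_config":  {"length": 20, "fan_in": 2, "fan_out": 1},
    "process_data": {"length": 55, "fan_in": 3, "fan_out": 4},
}
module_complexity = sum(kafura_complexity(**p) for p in procedures.values())
print(module_complexity)   # 20*4 + 55*144 = 8000
```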
The next step is to consider the information complexity of a module with respect to some data structure. The measure of the information complexity of a module with respect to a data structure is:
J = W*R + W*RW + RW*R + RW*(RW - 1)
W is the number of sub-procedures that only update the data structure;
R is the number of sub-procedures that only read information from the data structure;
RW is the number of sub-procedures that both read and update information in the data structure.
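A sketch of this formula; the counts W, R and RW are assumed to have been obtained by classifying the sub-procedures beforehand:

```python
# Sketch of the information complexity of a module with respect to one data
# structure: J = W*R + W*RW + RW*R + RW*(RW - 1).

def module_data_complexity(w: int, r: int, rw: int) -> int:
    return w * r + w * rw + rw * r + rw * (rw - 1)

print(module_data_complexity(w=2, r=3, rw=1))   # 6 + 2 + 3 + 0 = 11
```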
One more measure of this group is Oviedo's measure. It consists in splitting the program into linear non-intersecting sections - rays of operators - which form the control graph of the program.
The author of the metric proceeds from the following assumptions: it is easier for a programmer to find the relation between the defining and the using occurrences of a variable than between the rays, and the number of defining occurrences in each ray is more important than the total number of using occurrences of variables in each ray.
Let R(i) denote the set of defining occurrences of variables located within the range of ray i (a defining occurrence of a variable is within the range of a ray if the variable either is local to the ray and has a defining occurrence there, or has a defining occurrence in some preceding ray and no local redefinition). Let V(i) denote the set of variables whose using occurrences are present in ray i. Then the complexity measure of ray i is defined as:
DF(i) = SUM(DEF(vj)), j = 1...||V(i)||,
where DEF(vj) is the number of defining occurrences of the variable vj from the set R(i), and ||V(i)|| is the cardinality of the set V(i).
The fourth class of metrics is close to the class of quantitative metrics, to the metrics of program control flow complexity and to the metrics of data flow complexity at the same time (strictly speaking, this class and the class of control flow metrics belong to the same class of topological metrics, but it makes sense to separate them here for clarity). This class of metrics estimates the complexity of a program's structure on the basis of both quantitative calculations and analysis of control structures.
The first such metric is the testing M-measure [4]. It is a complexity measure that increases with the nesting level and takes the program's length into account. Close to the testing measure is the measure based on regular nestings: this measure of program complexity consists in counting the total number of symbols (operands, operators, brackets) in a regular expression with the minimum necessary number of brackets that describes the program's control graph. All the measures of this group are sensitive to the nesting level of control constructions and to the program's length, but their calculation is more labor-intensive.
Besides, program modules' cohesion also serves as a measure of software quality [5]. This measure cannot be expressed numerically. The types of module cohesion:
Data cohesion - modules interact through parameter transfer and each parameter is an elementary information object. This is the most preferable type of cohesion.
Data structure cohesion - one module sends a composite information object (a structure) to the other for the purpose of data transfer.
Control cohesion - one module sends an information object to the other - a flag intended for controlling its inner logic.
Common-area cohesion (common coupling) - the modules refer to one and the same area of global data. This type is not desirable because, firstly, an error in a module using the global area can show up unexpectedly in any other module; secondly, such programs are difficult to understand, for it is hard for a programmer to determine exactly which data are used by a particular module.
Content cohesion - one module refers to the inside of another. This is an inadmissible type of cohesion, since it contradicts the principle of modularity, i.e. treating a module as a black box.
External cohesion - two modules use external data, for example a communication protocol.
Message cohesion - the loosest type of cohesion: the modules are not connected directly but interact through messages having no parameters.
Absence of cohesion - the modules do not interact.
Subclass cohesion - a relation between the parent class and the descendant class, the descendant is related to the parent while the parent is not related to the descendant.
Time cohesion - two actions are grouped in one module only because they take place simultaneously due to some circumstances.
One more measure, concerning the stability of a module, is Collofello's measure [6]. It can be defined as the number of changes that have to be introduced into modules other than the module whose stability is being checked, where those changes concern the module being checked.
The next metric of this class is McClure's metric. Its calculation consists of three steps:
1. For each control variable i the value of its complexity function is calculated by this formula: C(i) = (D(i) * J(i))/n,
where D(i) is a value estimating the influence area of the variable i, J(i) is a measure of the complexity of the modules' interaction through the variable i, and n is the number of separate modules in the partitioning scheme.
2. For all the modules included in the partitioning scheme, the values of their complexity function M(P) are calculated by the formula M(P) = fp * X(P) + gp * Y(P),
where fp and gp are, correspondingly, the numbers of modules directly preceding and directly following the module P, X(P) is the complexity of accessing the module P,
and Y(P) is the complexity of the control of calling other modules from the module P.
3. The general complexity MP of the hierarchical scheme of partitioning the program into modules is defined by the formula:
MP = SUM(M(P)) over all possible values of P, i.e. over all modules of the program.
This metric is meant for programs that are well structured and composed of hierarchical modules defining the functional specification and the control structure. It is also assumed that each module has one entry point and one exit point, that each module performs only one function, and that modules are called in accordance with a hierarchical control system which defines the call relation on the set of the program's modules.
There is also a metric based on the concept of information - Berlinger's metric [7]. The complexity measure is calculated as M = SUM(fi * log2(pi)), where fi is the frequency of appearance of the i-th symbol and pi is the probability of its appearance.
The disadvantage of this metric is that a program containing many unique symbols, each appearing a small number of times, will have the same complexity as a program containing few unique symbols appearing many times.
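A literal sketch of the formula for a stream of program symbols (tokens); taking pi as the relative frequency of the symbol is an assumption of the sketch:

```python
from collections import Counter
from math import log2

# Sketch of Berlinger's measure M = SUM(fi * log2(pi)), with pi taken as the
# relative frequency of the i-th symbol. Since pi < 1, each term is negative,
# exactly as the formula is stated.

def berlinger(symbols):
    counts = Counter(symbols)
    total = len(symbols)
    return sum(f * log2(f / total) for f in counts.values())

print(berlinger(["a", "=", "a", "+", "b", ";"]))
```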
Due to the development of object-oriented programming languages a new class of metrics has appeared, correspondingly called object-oriented metrics. The most frequently used in this group are the set of Martin's metrics and the set of Chidamber and Kemerer's metrics. Let's study the first subgroup.
Before discussing Martin's metrics we need to introduce the notion of a class category. In reality, a class can rather rarely be reused separately from other classes. Nearly every class has a group of classes with which it cooperates and from which it cannot be easily separated. To reuse such classes you have to reuse the whole group. Such a group of classes is strongly connected and is called a class category. A class category exists under the following conditions:
The classes within the class category are closed to any changes all together. It means that if one class changes, all the other classes in this category are likely to change too. If any of the classes is open to some changes all of them are open to these changes as well.
The classes in the category are reused only together. They are so interdependent that they cannot be separated. Thus, if you try to reuse one class of the category, all the other classes must be used together with it.
The classes in a category share some common function and achieve some common goal.
Responsibility, independence and stability of a category can be estimated by counting the dependencies that interact with this category. Three metrics are defined:
1. Ca: Afferent (centripetal) coupling. The number of classes outside this category that depend on the classes inside this category.
2. Ce: Efferent (centrifugal) coupling. The number of classes inside this category that depend on classes outside this category.
3. I: Instability: I = Ce / (Ca + Ce). This metric has the range [0,1].
I = 0 denotes a maximally stable category.
I = 1 denotes a maximally unstable category.
You can define the metric estimating abstractness of a category (a category is abstract if it is rather flexible and can be easily extended) in the following way:
A: Abstractness: A = nA / nAll.
nA - the number of abstract classes in the category,
nAll - the total number of classes in the category.
The values of this metric vary within the range [0,1].
0 = the category is absolutely concrete,
1 = the category is absolutely abstract.
Now, on the basis of Martin's metrics described above, we can build a graph reflecting the dependence between abstractness and instability. If we draw on it a straight line defined by the formula I+A=1, then the categories lying on this line have the best balance between abstractness and instability. This straight line is called the main sequence.
Further, we can introduce two more metrics:
Distance from the main sequence: D = |A + I - 1| / sqrt(2),
Normalized distance from the main sequence: Dn = |A + I - 1|.
For nearly any category it is true that the closer it is to the main sequence, the better.
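All of the above fits in a few lines once the dependency counts and the number of abstract classes are known; the category in the example is hypothetical:

```python
# Sketch of Martin's metrics for one class category: instability I,
# abstractness A, distance D from the main sequence and normalized distance Dn.

def martin_metrics(ca: int, ce: int, n_abstract: int, n_all: int):
    i = ce / (ca + ce)                 # instability: 0 = stable, 1 = unstable
    a = n_abstract / n_all             # abstractness: 0 = concrete, 1 = abstract
    d = abs(a + i - 1) / 2 ** 0.5      # distance from the main sequence
    dn = abs(a + i - 1)                # normalized distance
    return {"I": i, "A": a, "D": d, "Dn": dn}

# Hypothetical category: 6 outside classes depend on it, it depends on 2,
# and 3 of its 10 classes are abstract.
print(martin_metrics(ca=6, ce=2, n_abstract=3, n_all=10))
```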
The next subgroup of metrics is Chidamber and Kemerer's metrics [9]. These metrics are based on analysis of methods of a class, an inheritance tree etc.
WMC (Weighted Methods per Class) - the total complexity of all the methods of a class: WMC = SUM(ci), i = 1...n, where ci is the complexity of the i-th method, estimated by any metric (Halstead's metrics etc., depending on the criterion of interest). If all the methods have the same complexity, then WMC = n.
DIT (Depth of Inheritance Tree) - the depth of the inheritance tree (the longest path through the class hierarchy from the root to the class under consideration). The deeper, the better, since data abstraction increases with depth while the saturation of the class with methods decreases; however, at a large enough depth the program becomes very difficult to understand and write.
NOC (Number of children) - the number of descendants (direct). The more children, the higher data abstractness.
CBO (Coupling Between Object classes) - coupling between classes. It shows the number of classes the given class is coupled with. Everything said above about module cohesion holds for this metric as well: with a high CBO, data abstraction decreases and it becomes harder to reuse the class.
RFC (Response For a Class) - RFC = |RS|, where RS is the response set of the class, i.e. the set of methods that can potentially be called by a method of the class in response to data received by an object of the class. That is, RS is the union of {M} and all the sets {Ri}, i = 1...n, where M is the set of all possible methods of the class and Ri is the set of all methods that can be called by the i-th method of the class; RFC is then the cardinality of this set. The higher RFC is, the more complicated testing and debugging become.
LCOM (Lack of Cohesion in Methods) - lack of cohesion between methods. To define this parameter, consider a class C with methods M1, M2, ..., Mn and let {I1}, {I2}, ..., {In} be the sets of variables used in these methods. Now define P as the set of pairs of methods that have no shared variables and Q as the set of pairs of methods that do share variables; then LCOM = |P| - |Q|. Lack of cohesion can be a signal that the class should be split into several other classes or subclasses, so it is better to keep cohesion high in order to increase the encapsulation of data and reduce the complexity of classes and methods.
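As an illustration, here is a sketch of two of these metrics, WMC and LCOM, for a class described as a mapping from method names to the sets of instance variables they use; taking every method's complexity ci as 1 (so that WMC = n) is an assumption of the sketch:

```python
from itertools import combinations

# Sketch of WMC and LCOM for a class given as {method name: variables it uses}.

def wmc(methods: dict, complexity=lambda name: 1) -> int:
    return sum(complexity(name) for name in methods)      # WMC = SUM ci

def lcom(methods: dict) -> int:
    p = q = 0
    for vars1, vars2 in combinations(methods.values(), 2):
        if vars1 & vars2:
            q += 1          # the pair of methods shares at least one variable
        else:
            p += 1          # the pair shares no variables
    return max(p - q, 0)    # LCOM = |P| - |Q|, taken as 0 when negative

cls = {"open": {"path", "mode"}, "read": {"path", "buf"}, "close": {"handle"}}
print(wmc(cls), lcom(cls))   # 3 1
```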
The next type of metrics is close to the quantitative ones but is based on the number of errors and defects in a program. There is no sense in examining the peculiarities of each of these metrics; it is enough just to enumerate them: the number of structure alterations made since the previous check, the number of errors detected during code review, the number of errors detected during program testing, and the number of structure alterations needed for correct operation of the program. For most projects these indices are considered per one thousand code lines, i.e. as the average number of defects per thousand code lines.
In conclusion, we should mention one more class of metrics, called hybrid metrics. The metrics of this class are based on simpler metrics and represent their weighted sums. The first representative of this class is Cocol's metric. It is defined in the following way:
H_M = (M + R1 * M(M1) + ... + Rn * M(Mn)) / (1 + R1 + ... + Rn),
where M is the base metric, Mi are the other metrics of interest, Ri are appropriately selected ratios, and M(Mi) are functions.
The functions M(Mi) and the ratios Ri are calculated with the help of regression analysis and task analysis for a particular program.
As a result of his investigation, the author of the metric singled out three models of measures: McCabe's, Halstead's and SLOC, where Halstead's measure is used as the base metric. These models were called "the best", "random" and "linear".
The metric of Zolnovskiy, Simmons and Thayer is also a weighted sum of different indices. There are two variants of this metric:
(structure, interaction, size, data) SUM(a, b, c, d).
(interface complexity, computational complexity, input/output complexity, readability) SUM(x, y, z, p).
The metrics used are selected in each case depending on the task, and the ratios depending on the role a particular metric plays in making the decision in that case.
To sum up, I would like to note that there is no universal metric. Any metric characteristics of a program that are to be controlled must be controlled either in relation to each other or in relation to a particular task. Besides, you can use hybrid metrics, but they too depend on simpler metrics and cannot be universal either. Strictly speaking, any metric is only an index which depends greatly on the language and the programming style, so you should not absolutize any metric or make decisions based on it alone.