
Principal Components of Orthogonal Object-Oriented Metrics (323-08-14)
White Paper Analyzing Results of NASA Object-Oriented Data
Submitted By: Victor Laing
And
Charles Coleman, Manager, SATC
October 12, 2001
Technical POC: Dr. Linda Rosenberg | Administrative POC: Dennis Brennan
Phone #: 301-286-0087 | Phone #: 301-286-6582
Fax #: 301-286-1701 | Fax #: 301-286-1667
 | Email: Dennis.Brennan@gsfc.nasa.gov
Mail Code: 304 | Mail Code: 300
This report presents the results of Task 323-08-14, Principal Components
of Orthogonal Object-Oriented Metrics, performed by the Software Assurance
Technology Center (SATC) at NASA Goddard Space Flight Center. The task developed an approach to
formulating a set of Orthogonal Object-Oriented metrics. The research is intended to find a way to
produce cheaper and higher quality software. The Orthogonal Object-Oriented set
of metrics will be selected from the core set of metrics that the SATC uses for
code analysis.
The development of a large software system is a time- and resource-consuming activity. Even with
the increasing automation of software development activities, resources are
still scarce. There is also great interest in software metrics because of their
potential as a cost-saving device.
This paper presents the research results of the Software Assurance
Technology Center (SATC) at NASA Goddard Space Flight Center, to develop an
approach to formulating a set of Orthogonal Object-Oriented metrics. The research is intended to find a way to
produce cheaper and higher quality software. The Orthogonal Object-Oriented set
of metrics will be selected from the core set of metrics that the SATC uses for
code analysis.
The set of Orthogonal Object-Oriented metrics obtained in the study was
applied to three real-world, industrial-strength object-oriented systems to
predict their overall quality. The level of quality found in these three systems
is classified into three types: low, medium, and high. The classifications obtained for the systems
using the Orthogonal Object-Oriented metrics set were validated against the results of the
code analysis approach that the SATC uses to identify code quality.
Table of Contents
2.3 Orthogonal Object-Oriented Metrics
3.1 Overview of the Object-Oriented Paradigm
3.2 A Reduced CK Metrics Suite
3.3 Applying the Reduced Metrics Suite
4.1 Descriptions of the Applications
4.2 Definitions for Statistical Analysis
5. Conclusions and Future Work
Table 1: Summary of Applications used to Validate Reduced CK Metrics Set
Table 2: Descriptive Statistics for System A
Table 3: Correlation Analysis for System A
Table 4: Descriptive Statistics for System B
Table 5: Correlation Analysis for System B
Table 6: Descriptive Statistics for System C
Table 7: Correlation Analysis for System C
Table 8: Correlation Analysis Summary
Table 9: Regression Analysis Summary
Figure 1: Distributions of the Reduced CK Metrics Set for System A
Figure 2: Distributions of the Reduced CK Metrics Set for System B
Figure 3: Distributions of the Reduced CK Metrics Set for System
The Software Assurance Technology Center (SATC) at NASA Goddard Space Flight Center is currently conducting research on an approach to formulating a set of Orthogonal Object-Oriented metrics. The research is intended to find a way to produce cheaper and higher quality software. The SATC determined that little research is being conducted in the area of Orthogonal Object-Oriented metrics. This research could benefit both NASA and industry.
Object-Oriented Programming (OOP) is a programming paradigm that is based on abstractions of object types in the application [HUD]. The key difference between object-oriented programming and structured programming is that the former identifies the object types in the applications, while the latter models the applications as a set of functions.
There is a general shift in the industry from the structured (traditional) programming and development environment to an object-oriented paradigm, and NASA is no exception in adhering to this shift. If organizations wish to make a successful change, they need the appropriate metrics for this new paradigm. One such metrics suite was proposed by Chidamber and Kemerer [CHI].
There is also a great deal of interest in software metrics due to their potential for use in procedures to control cost of system development and maintenance activities [TEG]. However, metrics programs are sometimes viewed as being costly with little return on investment. One way of reducing the cost of a metrics program, increasing efficiency, and decreasing the intrusiveness caused by the metrics program is to measure “less”. This is feasible only if we can obtain at least the same level of accuracy with the reduced information obtained by measuring “less”. A good candidate for accomplishing this task is Orthogonal Object-Oriented metrics, if one is working in an object-oriented environment.
The set of Orthogonal Object-Oriented metrics proposed in this paper is designed to evaluate key features of object-oriented design such as encapsulation, inheritance, and polymorphism. This evaluation is used to classify the quality level of object-oriented software systems.
The remainder of the paper is organized as follows. Section 2 provides an overview of both traditional and object-oriented metrics, and defines and discusses the concept of Orthogonal Object-Oriented metrics. The theoretical framework for the Orthogonal Object-Oriented metrics set is developed in Section 3. In Section 4, an empirical investigation is conducted on three industrial-strength object-oriented systems to validate the metrics set. Section 5 gives our conclusions from this empirical study and the research results, and presents future research directions.
Traditional and object-oriented programming are fundamentally different and therefore different metrics are needed for their evaluation. According to Moreau [MOR87], [MOR89], [MOR90], traditional metrics are inappropriate for object-oriented systems. However, research conducted by Tegarden [TEG] has shown that a combination of both traditional and object-oriented metrics may give the best results when analyzing object-oriented systems with respect to their overall quality.
In Subsection 2.1 we define three traditional metrics that are popular with practitioners and researchers. The Chidamber and Kemerer (CK) metrics suite for object-oriented design is given in Subsection 2.2. Subsection 2.3 defines and explains the concept of Orthogonal Object-Oriented metrics. The SATC uses most of the metrics listed below or a modification of them (see [ROS95], [ROS97], [ROS98]).
Traditional metrics have been applied to the measurement of software complexity of structured systems since 1976 [MCC76]. This subsection presents the McCabe Cyclomatic Complexity metric along with two other popular traditional software design metrics, Source Lines of Code and Comment Percentage.
McCabe Cyclomatic Complexity (CC): Cyclomatic complexity is a measure of a module's control flow complexity based on graph theory [MCC99]. The cyclomatic complexity of a module uses control structures to create a control flow matrix, which in turn is used to generate a connected graph. The graph represents the control paths through the module. The complexity of the graph is the complexity of the module [MCC76], [MCC99]. Fundamentally, the CC of a module is roughly equivalent to the number of decision points plus one, and is a measure of the minimum number of test cases that would be required to cover all linearly independent execution paths. A high cyclomatic complexity indicates that the code may be of low quality and difficult to test and maintain.
Source Lines of Code (SLOC): The SLOC metric measures the number of physical lines of active code, that is, excluding blank lines and comment lines [LOR94]. Counting the SLOC is one of the earliest and easiest approaches to measuring complexity. It is also the most criticized approach [TEG]. In general, the higher the SLOC in a module, the less understandable and maintainable the module is.
Comment Percentage (CP): The CP metric is defined as the number of commented lines of code divided by the number of non-blank lines of code. A CP of about 20% usually indicates adequate commenting for C or Fortran code [ROS95]. A high CP value makes a system easier to maintain.
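As a rough illustration of the three traditional metrics above, the sketch below computes them for a small C-like fragment. It assumes the usual graph formulation of cyclomatic complexity, V(G) = E - N + 2P, and it treats only whole-line `//` comments as comment lines. The function names and the sample source are hypothetical and are not part of the SATC tooling.

```python
def cyclomatic_complexity(num_edges, num_nodes, num_components=1):
    """McCabe's V(G) = E - N + 2P for a control flow graph."""
    return num_edges - num_nodes + 2 * num_components


def line_counts(source):
    """Return (sloc, comment_lines, non_blank_lines) for C-like source text.

    Simplifying assumption: a comment line is a line whose first
    non-blank characters are '//'; trailing comments are ignored.
    """
    sloc = comment_lines = 0
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines are not counted at all
        if line.startswith("//"):
            comment_lines += 1
        else:
            sloc += 1
    return sloc, comment_lines, sloc + comment_lines


def comment_percentage(source):
    """CP = comment lines / non-blank lines, expressed as a percentage."""
    _, comments, non_blank = line_counts(source)
    return 100.0 * comments / non_blank if non_blank else 0.0


SAMPLE = """\
// Return the larger of two values.
int max(int a, int b) {

    if (a > b) {
        return a;
    }
    return b;
}
"""

# Control flow graph of max(): 4 nodes (test, return a, return b, exit)
# and 4 edges (test->return a, test->return b, and each return->exit).
cc = cyclomatic_complexity(num_edges=4, num_nodes=4)
sloc, comments, non_blank = line_counts(SAMPLE)
cp = comment_percentage(SAMPLE)
```

For this fragment the single decision point gives a cyclomatic complexity of 2, and one comment line out of seven non-blank lines gives a CP of roughly 14%, below the 20% guideline quoted above.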
In 1994 Chidamber and Kemerer [CHI] proposed a now widely accepted suite of metrics for an object-oriented system. Basili validated the metrics suite in 1996 [BAS] and Tang in 1999 [TAN]. The six object-oriented metrics are listed below.
Weighted Methods Per Class (WMC): WMC measures the complexity of an individual class. Two different approaches are used to calculate the WMC metric. The first uses the sum of the complexity of each method contained in the class. The second approach assigns a complexity of 1 for each method in the class and then sums the result. This is equivalent to using the number of methods per class as a measure for WMC [CHI]. The number of methods and complexity of methods involved is a direct predictor of how much time and effort is required to develop and maintain the class.
Depth of Inheritance Tree of a Class (DIT): DIT is defined as the length of the longest path of inheritance ending at the current module [CHI]. In cases involving multiple inheritance, the DIT is the maximum length from the node to the root of the tree [CHI]. The deeper the inheritance tree for a class, the harder it might be to predict its behavior due to the interaction between the inherited features and new features. However, the deeper a particular class is in the hierarchy, the greater the potential for reuse of inherited methods.
Number of Children (NOC): NOC represents the number of immediate subclasses subordinated to a class in the class hierarchy [CHI]. A moderate value for NOC indicates scope for reuse and high values may indicate an inappropriate abstraction in the design. Classes with a large number of children have to provide more generic service to all the children in various contexts and must be more flexible, a constraint that can introduce more complexity into the parent class.
Coupling Between Objects (CBO): The CBO of a class is defined as the count of the number of other classes to which it is coupled [CHI]. A class is coupled to another class if it uses the member methods and/or instance variables of the other class. Excessive coupling indicates weakness of class encapsulation and may inhibit reuse. High coupling also indicates that more faults may be introduced due to inter-class activities.
Response for a Class (RFC): RFC gives the number of methods that can potentially be executed in response to a message received by an object of that class [CHI]. If a large number of methods can be invoked in response to a message, the testing and debugging of the class becomes more complicated, since it requires a greater level of understanding on the part of the tester.
Lack of Cohesion in Methods (LOCM): LOCM is the number of method pairs whose similarity is zero minus the number of method pairs whose similarity is not zero. The larger the number of similar methods in a class, the more cohesive the class is [CHI]. Cohesiveness of methods within a class is desirable, since it promotes encapsulation. Lack of cohesion implies that a class should probably be split into two or more subclasses.
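The inheritance and cohesion metrics above can be sketched on a toy class model. The example below is a minimal illustration, not the SATC's tool: the class names, the `CLASSES` dictionary, and the function names are hypothetical, only single inheritance is handled, two methods are taken to be "similar" when they share at least one instance variable, and the cohesion count is floored at zero, which is an assumption based on the usual statement of the metric.

```python
from itertools import combinations

# Hypothetical class model: parent name (or None) and, for each method,
# the set of instance variables that the method uses.
CLASSES = {
    "Employee": {"parent": None,
                 "methods": {"get_name": {"name"},
                             "get_address": {"address"},
                             "compute_pay": {"rate"}}},
    "Hourly":   {"parent": "Employee",
                 "methods": {"compute_pay": {"rate", "hours"},
                             "log_hours": {"hours"}}},
    "Salaried": {"parent": "Employee",
                 "methods": {"compute_pay": {"salary"}}},
    "Student":  {"parent": "Hourly",
                 "methods": {"compute_pay": {"rate", "hours", "cap"}}},
}


def dit(name):
    """Depth of Inheritance Tree: number of edges from the class to the root."""
    depth = 0
    while CLASSES[name]["parent"] is not None:
        name = CLASSES[name]["parent"]
        depth += 1
    return depth


def noc(name):
    """Number of Children: immediate subclasses only."""
    return sum(1 for c in CLASSES.values() if c["parent"] == name)


def lack_of_cohesion(name):
    """Method pairs sharing no variables minus pairs sharing some, floored at 0."""
    disjoint = sharing = 0
    for a, b in combinations(CLASSES[name]["methods"].values(), 2):
        if a & b:
            sharing += 1
        else:
            disjoint += 1
    return max(disjoint - sharing, 0)
```

In this model `dit("Student")` is 2 and `noc("Employee")` is 2. `lack_of_cohesion("Employee")` is 3 because none of its three methods share a variable, while `lack_of_cohesion("Hourly")` is 0 because its two methods both use `hours`.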
2.3 Orthogonal Object-Oriented Metrics
This subsection describes what it means for two or more object-oriented metrics to be orthogonal. The main focus of this study is to produce a minimal set of Orthogonal Object-Oriented metrics capable of analyzing code quality with the same degree of accuracy as afforded by a metrics set of a significantly larger cardinality.
Orthogonal: Two metrics are orthogonal if they measure intrinsically different characteristics of the code, so that any correlation among the measured values is due to relationships among the target modules and not due to any relationship between the metrics themselves. For example, total lines of code (which include comment lines) and number of comments are said to be non-orthogonal, since adding comments simultaneously increases the total lines of code. However, source lines of code (which exclude comment lines) and number of comments are said to be orthogonal, since source lines of code can be increased without any change in the comment count.
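The distinction drawn in this example can be checked mechanically. The sketch below reads "lines of code" as total non-blank physical lines including comments, which is an assumption about the intended meaning; the `measure` function and the sample strings are hypothetical.

```python
def measure(source):
    """Return (total_lines, sloc, comments) for '#'-commented source text."""
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    comments = sum(1 for ln in lines if ln.startswith("#"))
    return len(lines), len(lines) - comments, comments


base = "x = 1\ny = x + 1\n"
with_comment = "# set up x\n" + base   # one comment line added
with_code = base + "z = x * y\n"       # one code line added

base_counts = measure(base)
comment_counts = measure(with_comment)
code_counts = measure(with_code)
```

Adding the comment raises the total line count and the comment count together while SLOC stays the same, and adding the code line raises SLOC while the comment count stays the same.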
Orthogonal Object-Oriented (OOO): Two object-oriented metrics are said to be Orthogonal Object-Oriented metrics if they are orthogonal.
This section provides the foundation and justification of a theoretical framework for developing an Orthogonal Object-Oriented metrics suite. Subsection 3.1 gives an overview of the more important aspects of the object-oriented paradigm. The CK suite of object-oriented metrics is reduced to a single equation and one standalone metric in Subsection 3.2. A discussion is presented in Subsection 3.3 on how to apply this reduced CK metrics suite.
3.1 Overview of the Object-Oriented Paradigm
Object-oriented modeling and design is a way of thinking about problems using models organized around real-world concepts. The fundamental construct is the object, which combines both data structure and behavior in a single entity [RUM]. There are three fundamental characteristics required for an object-oriented approach: encapsulation, polymorphism, and inheritance. Encapsulation is not unique to the object-oriented paradigm; however, polymorphism and inheritance are unique to the object-oriented approach [TEG]. These three aspects of the object-oriented paradigm are described below.
Encapsulation: Encapsulation consists of separating the external aspects of an object, which are accessible to other objects, from the internal implementation details of the object, which are hidden from other objects. Encapsulation prevents a program from becoming so interdependent that a small change has massive effects. For example, the implementation of an object can be changed without affecting the applications that use it. One may want to change the implementation of an object to improve performance, fix a bug, consolidate code, or port it to another platform.
Polymorphism: Polymorphism means having the ability to take several forms. For object-oriented systems, polymorphism allows the implementation of a given operation to be dependent on the object that contains the operation. For example, there can be different operations to pay employees based on the employee’s object type, e.g., part-time, hourly, or salaried. Each type of employee object can have its own customized compute-pay operation. When a new type of employee is created, e.g., student, the programmer simply creates a new type of employee object and a new implementation of compute-pay for the new type of employee. When an instance of student receives the compute-pay message, it uses the operation defined in the new object to perform the calculation. The compute-pay operations of the other types of employees are not affected by the payment operations required for the student. In contrast, structured systems often have all pay operations contained in one program. The program must be capable of differentiating between the different types of employees and applying the appropriate operation. Adding a new type of employee typically requires existing structured code to be changed.
Inheritance: Inheritance is a reuse mechanism that allows programmers to define objects incrementally by reusing previously defined objects as the basis for new objects. For example, when defining a new type of employee (e.g., student), the new employee type can inherit the characteristics common to all employees (e.g., name, address), from a generic type of employee. In this approach, the programmer needs only to be concerned with the difference between student employees and generic employees. Structured systems do not have an inheritance mechanism as part of their formal specification.
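The employee example used in the three descriptions above can be written out as a short sketch. The class layout, attribute names, and pay rules below are hypothetical choices made for illustration; only the idea of a generic employee, type-specific compute-pay operations, and a student type added later comes from the text.

```python
class Employee:
    """Generic employee: holds the characteristics common to all employees."""

    def __init__(self, name, address):
        self.name = name
        self.address = address

    def compute_pay(self):
        raise NotImplementedError("each employee type supplies its own pay rule")


class Hourly(Employee):
    def __init__(self, name, address, rate, hours):
        super().__init__(name, address)   # inherited: name and address
        self._rate = rate                 # leading underscore: internal detail
        self._hours = hours               # by convention, hidden from callers

    def compute_pay(self):
        return self._rate * self._hours


class Salaried(Employee):
    def __init__(self, name, address, annual_salary):
        super().__init__(name, address)
        self._annual_salary = annual_salary

    def compute_pay(self):
        return self._annual_salary / 12


class Student(Employee):
    """Added later; none of the existing classes has to change."""

    def __init__(self, name, address, stipend):
        super().__init__(name, address)
        self._stipend = stipend

    def compute_pay(self):
        return self._stipend


def payroll(employees):
    # The caller sends the same message to every object; the type of
    # each object selects the implementation that runs.
    return [e.compute_pay() for e in employees]


staff = [Hourly("A. Lee", "1 Main St", 10, 30),
         Salaried("B. Kim", "2 Oak Ave", 24000),
         Student("C. Roy", "3 Elm Rd", 500)]
pay = payroll(staff)
```

`payroll` never inspects the employee type, which is the contrast with the structured approach described above, where one program would have to branch on every type and be edited whenever a new one is added.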
3.2 A Reduced CK Metrics Suite
In Section 2 we listed the complete CK suite of object-oriented metrics; however, they were not rigorously defined. A subset of the suite is formally defined (name, definition, and theoretical basis) in this subsection in order to develop the reduced suite. See [CHI] for the formal definitions of the complete CK suite of object-oriented metrics.
Metric 1: Weighted Methods Per Class (WMC)
Definition 1: Consider a class C_1, with methods M_1, …, M_n that are defined in the class. Let c_1, …, c_n be the complexity of the methods. Then:
$$\mathrm{WMC} = \sum_{i=1}^{n} c_i$$
If all method complexities are considered to be unity, then WMC = n, the number of methods.
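A small worked instance of Definition 1 is sketched below. The method names and complexity values are hypothetical, and using cyclomatic complexity as the per-method weight is an assumption, since the definition leaves the choice of complexity measure open.

```python
def wmc(method_complexities, unit_weights=False):
    """Weighted Methods per Class: the sum of c_i, or n when every c_i is 1."""
    if unit_weights:
        return len(method_complexities)
    return sum(method_complexities.values())


# Hypothetical per-method complexities c_1..c_4 for one class.
complexities = {"open": 1, "parse": 4, "validate": 3, "close": 1}

weighted = wmc(complexities)                    # 1 + 4 + 3 + 1
unweighted = wmc(complexities, unit_weights=True)  # n = 4 methods
```

The two results differ (9 versus 4) because the weighted form reflects how complex the methods are, while the unit-weight form only counts them.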
Metric 2: Coupling Between Objects (CBO)
Definition 2: CBO for a class is the count of the number of other classes to which it is coupled.
Theoretical Basis 2: CBO relates to the notion that an object is coupled to another object if one of them acts on the other, i.e., methods of one use methods or instance variables of another. Since objects of the same class have the same properties, two classes are coupled when methods declared in one class use methods or instance variables defined by the other class.
Metric 3: Response for a Class (RFC)
Definition 3: RFC = |RS| where RS is the response set for the class.
Theoretical Basis 3: The response set for the class can be expressed as
$$RS = \{M\} \cup \bigcup_{\text{all } i} \{R_i\}$$
where {R_i} = set of methods called by method i and {M} = set of all methods in the class.
The response set of a class is a set of methods that can potentially be executed in response to a message received by an object of that class. The cardinality of this set is a measure of the attributes of objects in the class. Since it specifically includes methods called from outside the class, it is also a measure of the potential communication between the class and other classes.
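The response set in Definition 3 is a plain set union, which the following sketch makes concrete. The class, its methods, and the call table are hypothetical, and only direct calls are followed, as in the definition.

```python
def response_set(methods, calls):
    """RS = {M} united with R_i for every method i of the class."""
    rs = set(methods)
    for m in methods:
        rs |= calls.get(m, set())   # R_i: methods called directly by method i
    return rs


# Hypothetical class Account with three methods of its own.
methods = {"Account.deposit", "Account.withdraw", "Account.balance"}

# Direct calls made by each method; Account.balance calls nothing.
calls = {
    "Account.deposit": {"Ledger.record", "Account.balance"},
    "Account.withdraw": {"Ledger.record", "Audit.flag", "Account.balance"},
}

rs = response_set(methods, calls)
rfc = len(rs)
```

Here the response set holds the three methods of the class plus the two distinct methods called in other classes, so RFC is 5. Because RS is a set, `Ledger.record` is counted once even though two methods call it.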
The formal definitions above are now used to construct an equation relating two of the three more important aspects of the object-oriented paradigm (encapsulation, polymorphism, and inheritance), namely encapsulation and polymorphism. From Definition 3 we have:
RFC = NLM + NRM (1)
where NLM = number of local methods in a class and NRM = number of remote methods called from a class. Definition 1 stated that if all the method complexities in a class are considered to be unity, then WMC = n, the number of methods in the class, which gives NLM = WMC [CHI].
It was mentioned in Section 2 that excessive coupling between objects indicates weakness of class encapsulation and may inhibit reuse. However, some coupling between objects is necessary for objects to be able to interact with each other. Ideally, objects should be coupled as loosely as possible in order to promote encapsulation and reusability. This is accomplished by having objects interact with each other exclusively through their interfaces. There are other types of coupling. The CBO metric defined in Definition 2 describes objects that are tightly coupled; the objects are accessed internally through remote method calls (NRM) to other objects' methods or instance variables.
Because tight coupling between objects is undesirable, we shall use this fact as one of the cornerstones in identifying low quality software. On average, the number of remote method calls is much larger than the number of instance variables accessed from one object to the next. Thus, under tight coupling of objects, the number of remote method calls approximates the measure of coupling between objects, that is, NRM ≈ CBO.
Substituting the metrics WMC and CBO for NLM and NRM respectively into Equation 1 (RFC = NLM + NRM) gives:
RFC = WMC + CBO (2).
Equation 2 and the DIT metric shall be used to identify low quality software in Section 4, Empirical Investigation.
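To make the substitution behind Equation 2 concrete, the sketch below evaluates both RFC = NLM + NRM and RFC = WMC + CBO for a toy class. The function, the class names, and the call lists are hypothetical. Method complexities are taken as unity so that WMC = NLM, and coupling through instance variables is ignored, which is a simplifying assumption.

```python
def reduced_suite(local_methods, remote_calls):
    """Return NLM, NRM, WMC (unit weights), CBO and both forms of RFC.

    remote_calls holds (class_name, method_name) pairs for the methods
    of other classes that the class under study calls.
    """
    nlm = len(local_methods)
    nrm = len(set(remote_calls))                   # distinct remote methods
    wmc = nlm                                      # unit method complexity
    cbo = len({cls for cls, _ in remote_calls})    # distinct coupled classes
    return {"NLM": nlm, "NRM": nrm, "WMC": wmc, "CBO": cbo,
            "RFC_eq1": nlm + nrm, "RFC_eq2": wmc + cbo}


local = ["deposit", "withdraw", "balance"]

# One remote method per coupled class: NRM equals CBO.
one_call_per_class = [("Ledger", "record"), ("Audit", "flag")]

# Two remote methods in the same class: NRM exceeds CBO.
two_calls_same_class = [("Ledger", "record"), ("Ledger", "rollback")]

a = reduced_suite(local, one_call_per_class)
b = reduced_suite(local, two_calls_same_class)
```

In the first case both equations give RFC = 5. In the second, Equation 1 still gives 5 while WMC + CBO gives 4, which shows in this toy setting that NRM ≈ CBO is an approximation whose accuracy depends on how remote calls are spread across the coupled classes.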
3.3 Applying the Reduced Metrics Suite
There are three fundamental aspects of the object-oriented paradigm, namely encapsulation, polymorphism, and inheritance. The equation RFC = WMC + CBO captures both encapsulation and polymorphism and their relationship to each other. High values for WMC and CBO indicate low encapsulation [HUD], [LOR93], [LOR94]: a class may be implementing too much of a system’s functionality or may be coupled too tightly. The RFC metric measures polymorphism by recording the number of remote method calls made by the class; for example, virtual methods in the C++ programming language are a direct implementation of the concept of polymorphism. The DIT metric quantifies the inheritance aspect of the object-oriented paradigm directly.
Examining the equation RFC = WMC + CBO more closely suggests that if WMC and CBO increase then RFC also increases, and if WMC and CBO decrease, RFC also decreases. If objects in a system are loosely coupled (one indicator of high quality code), then the CBO metric will be low and increases in RFC are due to WMC. That is, including more methods in a class increases the RFC metric, but since coupling is loose, the CBO metric should not significantly increase. On the other hand, if objects in a system are tightly coupled (one indicator of low quality code), then the CBO metric will be high and increases in RFC would be due to both CBO and WMC. Researchers such as Lorenz and Kidd [LOR94] and Rosenberg [ROS97] give similar preferred maximum values for these object-oriented metrics, along with values for the DIT metric.
Thus, as coupling between objects increases, the equation RFC = NLM + NRM approaches the equation RFC = WMC + CBO. This relationship can be used to identify low quality object-oriented systems and, together with the DIT metric, captures the three more important aspects of the object-oriented paradigm: encapsulation, polymorphism, and inheritance.
This section provides theoretical justification for suggesting the completeness and sufficiency of a minimal set of object-oriented metrics for analyzing the quality of an object-oriented system, where some of the metrics were related in the form of an equation. The next section applies this reduced set of object-oriented metrics to three real world projects to measure their overall quality and compare the results with the results obtained from using a larger set of both traditional and object-oriented metrics.
This section applies the results obtained in the study to three industrial strength software systems. The outcome is validated with previous results obtained from extensive full-scale code analysis performed by the SATC on the same three systems.
Subsection 4.1 describes in detail the software applications used in validating the reduced object-oriented metrics set. The statistical terms used in the investigation are defined and discussed in Subsection 4.2, and the statistical analysis is conducted in Subsection 4.3.
4.1 Descriptions of the Applications
The three applications used in this empirical study to validate the reduced object-oriented metrics set are industrial-strength software. Two of the applications were NASA systems and one was a commercial product. We labeled the applications System A, System B, and System C. System A was the commercial software; it was implemented in Java, consisted of approximately 50,000 lines of code, and had 46 classes. System B was a NASA application, also implemented in Java, that consisted of approximately 300,000 lines of code and contained 1,000 classes. The last application, System C, was also a NASA product and consisted of approximately 500,000 lines of code distributed over 1,617 classes. System C was implemented in the C++ programming language.
Table 1 summarizes this descriptive information along with other information for each System. The last two rows in the table were obtained from the SATC full-scale code analysis of these systems. The table shows a direct positive correlation between the degree of object-oriented constructs and the level of quality for each software application.
|
System
|
A |
B |
C |
Lines of Code |
50k |