Developing An Effective Metrics Program

L. Rosenberg, Ph.D.
Unisys/GSFC, Bld 6 Code 300.1
Greenbelt, MD 20771 USA
Tel: 301-286-0087, Fax: 301-286-0304

L. Hyatt
GSFC NASA, Bld 6 Code 302
Greenbelt, MD 20771 USA
Tel: 301-286-7475, Fax: 301-286-1701

ABSTRACT

Software metrics programs can be very cost effective or a total waste of resources. This paper discusses how to develop an effective, affordable metrics program that will help project managers monitor project risks and evaluate product quality. The Goal/Question/Metric paradigm (GQM) is used to demonstrate how a meaningful metrics program can be started, and data from projects at Goddard Space Flight Center (GSFC) are used to demonstrate some analysis and application techniques.

This paper supplies both project managers and software developers with techniques to initiate a metrics program that yields timely, relevant, usable information at minimal cost.

1. INTRODUCTION

The Software Assurance Technology Center (SATC) was established in 1992 in the Systems Reliability and Safety Office at NASA's Goddard Space Flight Center (GSFC). The SATC has programs in four areas: Software Standards and Guidebooks; Software Metrics Research and Development; Assurance Tools and Techniques; and Project Support and Outreach. The SATC, as a center of excellence in software assurance, is dedicated to making measurable improvement in the quality and reliability of software developed for GSFC and NASA.

But developing a metrics program is not easy. It has many possible pitfalls that, if the program is incorrectly applied, can lead to the ruin of the metrics program itself and possibly of the project. This paper starts with a discussion of the costs versus benefits of a metrics program. How to develop and implement a metrics program using the Goal/Question/Metric Paradigm is then discussed, followed by an example using data from GSFC projects.

2. METRIC PROGRAM COSTS VS. BENEFITS

It is difficult to pin down the costs of a metrics program because metrics are usually just one aspect of an overall improvement program. When investigating the feasibility of starting a metrics program, it is often found that managers are already individually collecting some form of data. This decreases the initial program start-up cost. Accurate and complete measurements are not inexpensive; for a comprehensive metrics program covering software products and processes, the annual cost of collecting hard data can be 2 to 3 percent of the total software budget.[3] Attempts to pin down the cost of metrics hide the real issue, however: developers don't really have a choice. The cost of not implementing a software metrics program can be measured in terms of project and business failures. Those projects and companies that make the investment in metrics have a competitive advantage over those that do not. They have the advantage of more informed and timely decisions that will ultimately make them more successful, with the best track records in terms of bringing software projects to completion and achieving high levels of user satisfaction.[2]

It is difficult, if not impossible, to place a dollar amount on the benefits of a metrics program because, as in the case of risk management, you are trying to measure something that did not happen. The benefits derived also apply not only to the current project but to future projects. As with any new project, whether it is implementing a new engineering design or a metrics program, start-up costs are high. But as management and staff become familiar with the tasks and tools are developed, the costs decrease to a low maintenance level.

3. DEVELOPING A METRICS PROGRAM

3.1 Where to Start

Once a developer decides to implement a metrics program, the next question is how. How a metrics program is developed can determine its success or failure. One approach is to investigate tools available for metrics collection, purchase a tool, then collect and attempt to apply whatever metrics the tool provides. This may work but has a major hurdle - what will the data collected tell management and developers about their specific project? Data collected just because it is available has minimal value at best and usually ends up a waste of resources.

Successful metrics programs generally begin by focusing on a problem. At the start of the metrics program, goals must be established that address the problem. Related questions that management wants answered are identified, then the data needed to answer these questions are specified. This leads to the tool specification for purchase or in-house development. Data collection can be expensive if not carefully monitored - the temptation is to collect all possible data and decide how to use it later.

The earlier benefits are seen by both management and developers, the sooner metrics programs are accepted. Metrics programs should be designed to show visible benefits as soon as possible; this is the key to continued support.

3.2 Goal/Question/Metric (GQM) Paradigm

The Goal/Question/Metric (GQM) Paradigm is a mechanism that provides a framework for developing a metrics program. It was developed at the University of Maryland as a mechanism for formalizing the tasks of characterization, planning, construction, analysis, learning and feedback. The GQM paradigm was developed for all types of studies, particularly studies concerned with improvement issues. The paradigm does not provide specific goals but rather a framework for stating goals and refining them into questions to provide a specification for the data needed to help achieve the goals.[1]

The GQM paradigm consists of three steps:

  1. Generate a set of goals
  2. Derive a set of questions
  3. Develop a set of metrics

1 - Generate a set of goals based upon the needs of the organization - Determine what it is you want to improve. This provides a framework for determining whether or not you have accomplished what you set out to do. Goals are defined in terms of purpose, perspective and environment using generic templates:

 
Purpose: To (characterize, evaluate, predict, motivate, etc.) the (process, product, model, metric, etc.) in order to (understand, assess, manage, engineer, learn, improve, etc.) it.

Perspective: Examine the (cost, effectiveness, correctness, defects, changes, product metrics, reliability, etc.) from the point of view of the (developer, manager, customer, corporate perspective, etc.)

Environment: The environment consists of the following: process factors, people factors, problem factors, methods, tools, constraints, etc.

2 - Derive a set of questions - The purpose of the questions is to quantify the goals as completely as possible. This requires the interpretation of fuzzy terms within the context of the development environment. Questions are classified as product-related or process-related and provide feedback from the quality perspective. Product-related questions define the product and the evaluation of the product with respect to a particular quality (e.g., reliability, user satisfaction). Process-related questions include the quality of use, domain of use, effort of use, effect of use and feedback from use.

3 - Develop a set of metrics and distributions that provide the information needed to answer the questions - In this step, the actual data needed to answer the questions are identified and associated with each of the questions. As data items are identified, it must be understood how valid each data item will be with respect to accuracy and how well it responds to the specific question. The metrics should include both objective and subjective measures and should have interpretation guidelines, i.e., which values of the metric indicate higher product quality. Generally, a single metric will not answer a question; a combination of metrics is needed.

Once goals are defined, questions derived, and metrics developed, matrices are created to indicate their relationships and to identify single relationships that may not be cost effective.
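
The relationship matrices described above can be sketched in a few lines of Python. This is only an illustrative sketch: the goal, question and metric names are hypothetical placeholders rather than the contents of the paper's figures, and the helper names (`invert`, `single_use`, `print_matrix`) are assumptions introduced here. The sketch flags any question or metric that supports only one parent, since such single relationships are the ones that may not be cost effective.

```python
from collections import defaultdict

# Hypothetical goals, questions and metrics, named only for illustration.
goal_questions = {
    "G1: monitor project risk": ["Q1", "Q2"],
    "G2: evaluate product quality": ["Q2", "Q3"],
}
question_metrics = {
    "Q1": ["effort hours"],
    "Q2": ["requirement count", "requirement changes"],
    "Q3": ["errors found", "requirement changes"],
}


def invert(mapping):
    """Turn {parent: [children]} into {child: sorted list of parents}."""
    inverted = defaultdict(set)
    for parent, children in mapping.items():
        for child in children:
            inverted[child].add(parent)
    return {child: sorted(parents) for child, parents in inverted.items()}


def single_use(mapping):
    """Return the children that support exactly one parent."""
    return sorted(
        child for child, parents in invert(mapping).items() if len(parents) == 1
    )


def print_matrix(mapping):
    """Print a relationship matrix: rows are parents, columns are children."""
    columns = sorted(invert(mapping))
    print("columns:", ", ".join(columns))
    for parent, children in mapping.items():
        marks = ["X" if column in children else "." for column in columns]
        print(f"{parent:30s} " + " ".join(marks))


if __name__ == "__main__":
    print_matrix(goal_questions)
    print_matrix(question_metrics)
    print("Metrics used by a single question:", single_use(question_metrics))
```

A metric that appears in `single_use` is not necessarily wasteful, but it is a candidate for review before collection costs are committed.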

3.3 GQM Example

The most effective way to understand a methodology is to review an example. This section demonstrates how a small metrics program would be developed using the GQM. The program starts with the goals, questions and proposed metrics, then demonstrates how GSFC data could be used to answer some of the questions and satisfy the goals.

Figure 1 demonstrates sample goals, questions and metrics. The goals are general and could be adapted with minor modifications to any project development. Questions are derived to quantify the goals, often supporting more than one goal, as shown in parentheses ( ). The metrics needed to provide the answers to the questions are then chosen and shown in italics.

Figure 1: Goals/Questions/Metrics

Matrices similar to Figure 2 are developed showing direct and indirect correlations between goals and questions, then questions and metrics.

Figure 2: Goals -> Questions

In the remainder of the paper, we will demonstrate some of the metrics and data that can be used to answer the questions and satisfy the goals.

Question 1: Expected vs. actual effort level

Effort is usually measured in hours worked on specific project tasks, such as training, requirements, design, coding, and testing. In Figure 3, the Rayleigh Manpower Curve for effort expenditure for a typical software project is plotted against some project effort data. [4] Projects ramp up to full speed fairly quickly, then taper off as the maintenance phase approaches. Applying this curve helps managers adjust personnel levels to the expected work load.

Figure 3: Rayleigh Manpower Curve
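
A comparison of expected and actual effort can be sketched as follows. The paper does not state the equation behind Figure 3, so this sketch assumes the commonly used Norden/Putnam form of the Rayleigh curve, in which the staffing rate is m(t) = 2·K·a·t·exp(-a·t²) with a = 1/(2·t_peak²), K is total effort and t_peak is the time of peak staffing. The function names and the sample numbers are hypothetical.

```python
import math


def rayleigh_staffing(t, total_effort, t_peak):
    """Expected staffing rate at time t (assumed Norden/Putnam Rayleigh form)."""
    a = 1.0 / (2.0 * t_peak ** 2)
    return 2.0 * total_effort * a * t * math.exp(-a * t ** 2)


def rayleigh_cumulative(t, total_effort, t_peak):
    """Expected cumulative effort spent by time t under the same assumption."""
    a = 1.0 / (2.0 * t_peak ** 2)
    return total_effort * (1.0 - math.exp(-a * t ** 2))


def effort_deviation(actual_by_month, total_effort, t_peak):
    """Actual minus expected staffing for months 1..n."""
    return [
        actual - rayleigh_staffing(month, total_effort, t_peak)
        for month, actual in enumerate(actual_by_month, start=1)
    ]


if __name__ == "__main__":
    # Hypothetical project: 400 staff-months in total, peak staffing at month 10.
    actual = [5, 9, 14, 18, 20, 22, 24, 25]
    for month, delta in enumerate(effort_deviation(actual, 400, 10), start=1):
        print(f"month {month:2d}: actual - expected = {delta:+.1f}")
```

Persistent positive or negative deviations are the signal of interest; a single month off the curve says little.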

Question 2: Requirement volatility

Late requirement changes are costly and may cause a ripple effect and additional changes. The earlier in the life cycle the requirements stabilize, the lower the risk. Figure 4 shows total requirements per schedule, indicating stabilizing requirements. Modifications should also be tracked.

Figure 4: Stabilization of Requirements
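
One simple way to check for stabilization from a series of total requirement counts is sketched below. The stabilization rule used here (every change in the last few reporting periods stays within a small fraction of the current total) is an assumption made for illustration, not a criterion given by the SATC; the `window` and `tolerance` defaults and the sample counts are hypothetical.

```python
def requirement_changes(totals):
    """Period-over-period change in the total requirement count."""
    return [later - earlier for earlier, later in zip(totals, totals[1:])]


def is_stabilizing(totals, window=3, tolerance=0.02):
    """True if each of the last `window` changes is within `tolerance`
    (as a fraction) of the latest total. Both defaults are hypothetical."""
    changes = requirement_changes(totals)[-window:]
    if len(changes) < window:
        return False  # not enough history to judge
    latest = totals[-1]
    return all(abs(change) <= tolerance * latest for change in changes)


if __name__ == "__main__":
    totals = [120, 180, 230, 250, 254, 255, 255]  # hypothetical counts per period
    print(requirement_changes(totals))
    print("stabilizing:", is_stabilizing(totals))
```

Because a constant total can hide modifications, the same check could be applied to a separate count of modified requirements.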

Question 4: Locating 90% of the errors

This implies the ability to estimate the total number of errors in the software. One industry guideline is to expect approximately 7 errors per 1000 Source Lines of Code. This guideline is helpful in an overall estimate of the number of errors, but does not take into account the rate at which errors are removed. The SATC is working to release the Waterman Error Trending Model for determining the status of testing by projecting the number of errors remaining in the software and the expected time to find some percentage of errors. The SATC is developing this model rather than using the standard Musa model because it provides for nonconstant testing resource levels and is less sensitive to data inaccuracy. Figure 5 is an example of the model's application.

Figure 5: Waterman Error Trending Model

If this project is expected to release at week 52, Figure 5 indicates that 96% of the errors will be found.
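
The guideline of 7 errors per 1000 source lines can be turned into a rough static check, sketched below. This is not the Waterman Error Trending Model, whose equations are not given in this paper; it only applies the guideline quoted above, and so it ignores the rate at which errors are removed. The function names and the sample project size are hypothetical.

```python
ERRORS_PER_KSLOC = 7  # industry guideline quoted in the text


def expected_total_errors(sloc, errors_per_ksloc=ERRORS_PER_KSLOC):
    """Rough estimate of total errors from size alone."""
    if sloc <= 0:
        raise ValueError("sloc must be positive")
    return sloc / 1000 * errors_per_ksloc


def percent_found(errors_found, sloc, errors_per_ksloc=ERRORS_PER_KSLOC):
    """Errors found so far as a percentage of the guideline estimate."""
    return 100.0 * errors_found / expected_total_errors(sloc, errors_per_ksloc)


if __name__ == "__main__":
    sloc = 50_000  # hypothetical project size
    found = 315    # hypothetical number of errors located so far
    print(f"expected errors: {expected_total_errors(sloc):.0f}")
    print(f"percent found:   {percent_found(found, sloc):.1f}%")
```

A result above 100% simply means the guideline underestimated the error content of this particular project.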

Question 6: Modules exceeding guidelines

Figure 6 is a graph template developed by the SATC to use as an indicator of the module risk levels. The x-axis represents the number of executable statements in a module; the y-axis is the extended cyclomatic complexity (number of test paths) for the module.

There are many different guidelines for both code measures as to when risk increases or decreases and what levels are acceptable. The parameters in Figure 6 are based on guidelines from various industry, military and NASA sources as well as error correlations from GSFC data.

Figure 6: Classification of Module Risk

To apply the graph in Figure 6, each module of code is plotted as shown in Figure 7. The percentage of modules in each region and a list of module names in Regions 4, 5, and 6 are supplied to the project management. It is recommended that developers investigate the modules in these regions further, using metrics such as fan-in/fan-out, comment percentage and number of errors. One observation made by the SATC is that C and C++ code have a lower percentage of modules in Regions 4, 5 and 6 than FORTRAN.

Figure 7: Modules at Risk in a FORTRAN Project
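
The report supplied to project management can be sketched as below. The region boundaries of Figure 6 are not reproduced in the text, so this sketch assumes each module has already been assigned a region number by reading it off the graph; only the tallying step is shown. The function name `region_report` and the module names are hypothetical.

```python
from collections import Counter

HIGH_RISK_REGIONS = {4, 5, 6}  # regions whose module names are reported


def region_report(module_regions):
    """Given {module name: region number}, return the percentage of modules
    in each region and the sorted names of modules in Regions 4, 5 and 6."""
    if not module_regions:
        raise ValueError("no modules supplied")
    counts = Counter(module_regions.values())
    total = len(module_regions)
    percentages = {
        region: 100.0 * count / total for region, count in sorted(counts.items())
    }
    flagged = sorted(
        name for name, region in module_regions.items()
        if region in HIGH_RISK_REGIONS
    )
    return percentages, flagged


if __name__ == "__main__":
    # Hypothetical region assignments for four modules.
    modules = {"init": 1, "parse": 1, "solve": 4, "output": 2}
    percentages, flagged = region_report(modules)
    for region, pct in percentages.items():
        print(f"Region {region}: {pct:.0f}% of modules")
    print("Investigate further:", ", ".join(flagged))
```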

Question 7: High risk modules

To answer question 7, inputs from multiple metrics are needed. One factor is the risk derived from Table 3. The number of errors by criticality also serves as an input. Comment percentage also supplies information relative to risk. Factors such as fan-in/fan-out and internal data flow also influence risk but are not shown in Table 3.

Table 3: Risk evaluation of code modules

4. CONCLUSION

The SATC applies goals to evaluate the quality of products (requirement documents through test applications) and provide risk information to project managers. Metrics programs are initiated to answer a question or provide numerical input to solve a problem.

The first step in developing a metrics program is to identify the goals or objectives of the program, then stay focused on them. The objectives can be expanded into specific goals using the structure of the Goal/Question/Metric templates.

The second step is to define the attributes that are to be measured. These attributes are a subset of the quality attributes and are chosen based on the project objectives and goals. If the GQM is used, some of the goals will relate to the attributes.

The third step in developing the metrics program is to clarify and quantify the goals. This is done by specifying questions and identifying the metrics and data that are needed. At this point a tool is chosen based on the needs of the project.

The final and very critical step is to close the loop - provide management with answers to their questions based on the metric analysis. The key to continued success of a metrics program is immediate, visible benefits. It must do the job it was designed to do and supply management with usable information to solve their current problem in a timely fashion.

REFERENCES

[1] Basili, Victor R., and Rombach, H. Dieter, "Tailoring the Software Process to Project Goals and Environments", Department of Computer Science, University of Maryland, ACM, 1987.

[2] Grady, Robert, Practical Software Metrics For Project Management and Process Improvement, Prentice Hall, 1992.

[3] Gillies, Alan, Software Quality, Theory and Management, Chapman & Hall Computing, 1992.

[4] Putnam, L., Myers, W., Measures for Excellence: Reliable Software on Time, Within Budget, Yourdon Press, 1992.

Presented at the European Space Agency Software Assurance Symposium, Netherlands, March, 1996.