Measuring Requirements Testing
Experience Report
Theodore Hammer NASA/GSFC Code 303 Greenbelt, MD 20771
Linda Rosenberg, Ph.D. Unisys Federal Systems Code 300.1 Greenbelt, MD 20771
Lenore Huffman SATC Code 300.1 Greenbelt, MD 20771
Lawrence Hyatt NASA/GSFC Code 302 Greenbelt, MD 20771
ABSTRACT
Requirements are written to specify the functionality of a completed software system.
Software systems are often released in segments called builds, each one adding new
functionality and satisfying an additional set of requirements. New software requirements
tools are allowing quality assurance engineers to develop and use new metrics to assist
them in evaluating the relationships of requirements to tests, thus ensuring the required
functionality in the new system. NASA's Goddard Space Flight Center is applying new tools
and technology to measure the effectiveness of requirements testing. This paper discusses
an effort that uses project data to demonstrate metrics effectiveness.
KEYWORDS
Requirements, Testing, Metrics, Quality Assurance, CASE Tools
INTRODUCTION
Requirements development and management have always been critical in the implementation of software systems; engineers are unable to build what analysts can't define. Recently, automated tools have become available to support requirements management. These tools not only support the definition and tracing of requirements, but also open the door to effective use of metrics in characterizing and assessing risks. Metrics are important because of the benefits associated with early detection and correction of problems with requirements. Problems that are not found until testing are approximately 14 times more costly to fix than problems found in the requirements phase.[2] Automated requirements management tools are being used on several large projects at NASA's Goddard Space Flight Center (GSFC).
In support of these projects, the Software Assurance Technology Center (SATC) and the project Quality Assurance Office, members of whom are part of the GSFC Data Systems Assurance Group, are working together to develop and apply a metrics program that utilizes the information available through the application of requirements management tools. Metrics based on this information provide insight into the testing of requirements; this information assists the Quality Assurance Office in its project oversight role.
The requirements segment of a metrics program is multifaceted: it evaluates the requirements document for quality indicators, such as weak phrases [9]; characterizes the expansion of requirements between levels of detail [4]; measures design and code element traceability; and tracks testing as a way to ensure that all requirements have been satisfied. The focus of this paper is the application of metrics available through the use of a requirements management CASE tool. These metrics assist project managers and quality assurance engineers in identifying risks associated with ensuring that the functionality specified by the requirements is contained in the completed software system. This new metrics technology is being developed at GSFC by the SATC in conjunction with a very large software development project. The requirement testing metrics discussed in this paper, and their interpretation, are nevertheless applicable to any software development effort.
There are no published or industry standard guidelines for these metrics; intuitive
interpretations, based on experience and supported by project feedback, are used in this
paper. Project management has reacted favorably to these metrics and has used the analysis
results to mitigate a perceived risk. The SATC continues working on methods to
mathematically validate the intuitive guidelines. Joint work also continues to identify
new metrics available through the application of a requirement management CASE tool. The
objective is to assist project management in producing high-quality requirements and test
plans while identifying and minimizing project risks.
DEVELOPMENT ENVIRONMENT
While the project must remain anonymous, a general understanding of the project's development environment is necessary. For clarity, we will also describe some development aspects that may not be standard in all environments. The project being analyzed is implementing a large system in three main incremental builds. The development of these builds overlaps: design and coding of the second and third builds started before the first build was completed. Each build adds new functionality to the previous build and satisfies a further set of requirements.
The definition of requirements for this system started with the formulation of System Level Requirements, referred to as "Level 1" requirements. These are mission-level requirements for the spacecraft and ground system; they are at a very high level and rarely, if ever, change. We will not discuss requirements at this level because they are not stored in the requirements database under scrutiny. Level 1 requirements then undergo several levels of decomposition to produce Allocated Requirements, called "Level 2"; these requirements are also high-level, and change should be minimal. Project development started at this requirement level. Level 2 requirements are then divided into subsystems, and a further level is derived in greater detail: the "Level 3 Derived Requirements." Generally, contracts are bid using this level of requirement detail. Each requirement in Level 2 traces to one or more requirements in Level 3. This tracing is bi-directional, with Level 3 requirements tracing back to Level 2 requirements. The Detailed Requirements are found in "Level 4" requirements; these requirements are used to design and code the system. There is also a bi-directional tracing between Level 3 requirements and Level 4 requirements. This traceability is critical to ensuring that the completed system contains all functionality specified at each level and that no additional functionality creeps into the final system, increasing risks, cost, and time to complete.
The project used Marconi's Requirements and Traceability Management (RTM) CASE tool for
storing and tracking requirements. RTM is one of a collection of tools designed to support
a project throughout its life cycle by providing requirements identification and
engineering, traceability of requirements, and configuration management.[7] The metrics in
this paper were derived through the analysis of the requirements information contained in
the RTM.
REQUIREMENT TESTING
Once requirements are written, methods for ensuring that the system contains the functionality specified must be developed. There are a number of methods for validating functionality and determining if the software reacts in the expected way: testing, inspections, analysis, and demonstrations.[2,8] The main method used in all systems, and the only one considered in this paper, is testing. To validate the requirements, test plans are written that contain multiple test cases; each test case is based on one system state and tests some functions that are based on a related set of requirements.[6]
In the total set of test cases, each requirement must be tested at least once, and some requirements will be tested several times because they are involved in multiple systems in varying scenarios and in different ways. But as always, time and funding are issues; while each requirement must be comprehensively tested, limited time and limited budget are always constraints upon writing and running test cases. It is important to ensure that each requirement is adequately, but not excessively, tested. In some cases, the requirements can be grouped together using criticality to mission success as their common thread; these must be extensively tested. In other cases, requirements can be identified as low criticality; if a problem occurs, their functionality does not affect mission success, so less extensive testing can still be considered successful.[2,6,8] In order to ascertain the point at which testing benefits become marginal, the Quality Assurance Office asked the SATC for a solution using the application of metrics. The SATC's objective then became to identify data available in the RTM that would yield metrics for requirement testing.
In this paper, although the data is sometimes shown for Build 1, we will focus on the
testing of Build 2 and Build 3, and the satisfaction of the associated Level 3 Derived
Requirements and Level 4 Detailed Requirements. When testing requirements, Level 4
requirement testing is referred to as System Test and is done first; Level 3 requirements
are then validated through Acceptance Testing. Table 1 shows the life-cycle status of each
Build and the number of requirements for each level. Because the project is still
evolving, most data in this paper is for the June 1996 time period.
REQUIREMENT METRICS
One metric is usually insufficient to conclusively evaluate anything; multiple metrics must be applied.[5] Metrics must also be viewed from different perspectives; in one view, the picture may show a perfect project, but when viewed from another perspective, a very flawed picture may appear. In this paper we investigate three metrics that assist in the evaluation of requirements testing: Requirement Links to Tests, Test Span, and Test Complexity. For each metric, we will first describe the metric and the analysis of the data. We will then investigate how this metric assists the quality assurance engineer and project management. All metrics will be discussed in generic terms so as to be applicable to any project; however, the data used to demonstrate the metric is real project data.
Build | Life-Cycle Status      | Level 3 Requirements | Level 4 Requirements
1     | Released               | 155                  | 371
2     | Start Integration Test | 1,070                | 2,498
3     | Start Coding           | 1,626                | 2,830
Table 1: Requirement Counts by Build and Level
Metric 1: Requirement Links to Tests
The first objective is to verify that each requirement will be tested; the implication is that if the software passes the test, the requirement's functionality is included in the system. This is done by determining that each requirement is linked to at least one test case. It is expected that each requirement will be linked to multiple test cases, and that each test case will test multiple requirements [2,6,8]. These relationships are shown in Figure 1. This linkage is counted at both the Level 3 and Level 4 requirements level.
Figure 1: Sample Linkage Requirement-Tests
Figure 2 shows three groups of bars: the total number of Level 3 requirements for each build, the number of linked requirements, and the number of unlinked requirements. Build 1 appears completely unlinked due to aforementioned database problems. In Build 2, 96% of the Level 3 requirements are linked to a test case, and in Build 3, all Level 3 requirements are linked to a test case. This data indicates there may be a problem with testing Build 2, since it is at the start of Integration Testing and all tests should have been specified. But since only 4% of the requirements are not linked, this may indicate a simple lapse in data entry. Build 3, currently in the coding phase, has all linkages established and indicates a "good" state.
Figure 2: Level 3 Requirement Linkage to Tests
Since Level 3 requirements are tested after the Level 4 requirements, Level 4
requirement linkage to test cases must be investigated to ascertain the risks implied in
Level 4 test results. The data shown in Figure 3 indicates that the possible concern in
Build 2, as indicated in Figure 2, is in fact justified.

Figure 3: Level 4 Requirement Linkage to Tests
Build 2 is currently testing Level 4 requirements, but 40% of the requirements are not linked to any test. This indicates that there is no way to verify whether the functionality of 999 requirements is included in the system. At this point, the quality assurance engineer and project managers have multiple tasks. First, it must be verified that the links are really non-existent, not just missing from the database. Second, it may be that some of the requirements have been deferred to a future build; in this case, traceability of this move must be verified. Third, if there are no test cases developed for 999 requirements, the requirements must be identified and their criticality specified. This information will assist in developing the additional test cases. It may be possible to link some of the requirements to existing test cases with minimum modification to the test data. If new test cases must be developed, budgetary problems may be created and the testing schedule must be extended. In all cases, further investigation of the missing links is warranted.
For Build 3, just starting the coding phase (coding continues for approximately 10
additional months), only 25% of the requirements are not linked to test cases. This
situation needs to be monitored on a monthly basis but is not one for major concern at
this time.
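The linkage check behind Metric 1 can be sketched in a few lines. The following Python fragment is an illustrative sketch only, not the analysis code used on the project: it assumes the requirement identifiers and the requirement-to-test links have already been exported from the requirements database as plain lists, and the function name `linkage_summary` and the sample identifiers are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple, Union


def linkage_summary(
    requirement_ids: Iterable[str],
    links: Iterable[Tuple[str, str]],
) -> Dict[str, Union[int, float, List[str]]]:
    """Count linked and unlinked requirements.

    `links` is an iterable of (requirement_id, test_id) pairs; this export
    format is an assumption made for the sketch.
    """
    all_reqs = set(requirement_ids)
    # A requirement is "linked" if it appears in at least one link.
    linked = {req for req, _test in links if req in all_reqs}
    unlinked = all_reqs - linked
    percent_unlinked = 100.0 * len(unlinked) / len(all_reqs) if all_reqs else 0.0
    return {
        "total": len(all_reqs),
        "linked": len(linked),
        "unlinked": len(unlinked),
        "percent_unlinked": percent_unlinked,
        "unlinked_ids": sorted(unlinked),
    }


# Hypothetical sample data: five requirements, three of them linked to tests.
requirements = ["R1", "R2", "R3", "R4", "R5"]
links = [("R1", "T1"), ("R2", "T1"), ("R2", "T2"), ("R4", "T3")]
summary = linkage_summary(requirements, links)
print(summary)
```

Applied to the counts reported for Build 2, 999 unlinked requirements out of 2,498 Level 4 requirements corresponds to the roughly 40% quoted above. Returning the list of unlinked identifiers, and not only the percentage, supports the follow-up tasks of checking for data-entry lapses, deferrals, and missing test cases.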
Metric 2: Test Span
The second metric for requirements testing attempts to characterize the test plan and
identify insufficient or excess testing. Requirements are usually tested by more than one
test, and one test usually covers more than one requirement.[1] Since each test costs
money and takes time, the obvious questions are how many requirements are covered by one
test, and how many tests cover only one requirement. On the other hand, if requirements
are insufficiently tested, functionality may not be verified. This metric is in two parts
because of the bi-directional linkage between the requirements and tests. Each direction
yields different information. Counting the number of unique tests used for a requirement
indicates tests that possibly include too many requirements and may not comprehensively
test those requirements. Counting the number of unique requirements tested indicates the
exclusivity of the testing.
Requirements Per Test
The requirements per test metric counts the number of requirements associated with each unique test. The data is then summarized to count the number of unique tests that are associated with a given number of requirements. The expected data curve is shown in Figure 4. The X-axis represents the Number of Requirements, and the Y-axis shows the Number of Unique Test Cases. Please note that the numbers used in Figure 4 have no relation to any theoretical or real project; they are for illustration only. The data shows that there are five different tests which verify only one requirement, seven different tests which verify only two requirements, etc. These are unique tests; those five tests are not included elsewhere on the graph. The requirements are not unique, however: a requirement in the "ones" bar may be counted again within another group of requirements per test. The sum of the number of tests for each requirement group (bar) is the total number of tests. The sum of the number of tests times the number of requirements will exceed the total number of requirements, but will equal the total number of individual links between requirements and tests.
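The summarization described above, and the two sums it implies, can be illustrated with a short Python sketch. This is a hedged example under stated assumptions: links are taken to be (requirement, test) pairs, the function name `requirements_per_test_histogram` is hypothetical, and the sample links are invented for illustration and bear no relation to project data.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def requirements_per_test_histogram(
    links: Iterable[Tuple[str, str]],
) -> Dict[int, int]:
    """Map "number of requirements" -> "number of unique tests".

    `links` holds (requirement_id, test_id) pairs; duplicates are ignored.
    """
    unique_links = set(links)
    # For each test, count how many distinct requirements it verifies.
    reqs_per_test = Counter(test_id for _req_id, test_id in unique_links)
    # Then count how many tests fall into each "requirements verified" group.
    histogram = Counter(reqs_per_test.values())
    return dict(sorted(histogram.items()))


# Hypothetical sample: four requirements, four tests, seven links.
sample_links = [
    ("R1", "T1"),
    ("R2", "T2"),
    ("R1", "T3"), ("R3", "T3"),
    ("R2", "T4"), ("R3", "T4"), ("R4", "T4"),
]
histogram = requirements_per_test_histogram(sample_links)

# Sum of the bars is the total number of tests.
total_tests = sum(histogram.values())
# Sum of (tests x requirements) per bar is the total number of links.
total_links = sum(n_reqs * n_tests for n_reqs, n_tests in histogram.items())
print(histogram, total_tests, total_links)
```

In this sample, two tests verify one requirement, one test verifies two, and one test verifies three. The bars sum to four tests, while the weighted sum is seven, which exceeds the four requirements but equals the number of links.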

Figure 4: Sample Curve, Requirements to Tests
Figure 4 shows the expected shape of the curve that results from plotting the data. The expected graph has two humps, the first of which is from integration testing and the second from acceptance testing. Each type of testing should produce a "bell shaped" curve, due to the nature of the testing process. Testing involves putting the system into one of its assumable "states," or defining a functional thread, and then testing requirements that are associated with that state or thread. The costs and time associated with testing lie in planning the test cases and test data and then setting up the system for the test cases. Thus, test cases that test few requirements are easy to design but (relatively) expensive to develop and run; test cases that test many requirements are difficult to plan, but relatively inexpensive to run per requirement. The curve should peak where test cases and data are not too hard and too costly to develop. These cases test a medium number of requirements.
Since acceptance testing is based more on operational scenarios than on system states and threads, the number of requirements tested per case would be expected to be greater than in integration testing, and the number of cases fewer (although they will be more encompassing). This produces the second peak in the curve.
Figure 4 shows that most tests will link to multiple requirements, but there are instances where a group of tests verify only a few requirements. This group may include quite specific tests due to the functionality or criticality of the requirement being tested. The focus is on weightiness of each test; the more requirements it tests, the more important it is.[8] Given this suggested shape for the data, we now investigate project data.
Although the data is available for both Level 3 and Level 4 requirements, we will look
at the data for Level 4 requirements for Build 2 and Build 3. This data is of interest
since Figures 2 and 3 indicate a potential problem with the test linkages for Level 4
requirements. (This data was not available for Build 1.) Figure 5 shows the distribution of
Level 4 requirements loading per test case for Build 2; Figure 6 shows the same
information for Build 3.
Figure 5: Build 2-Level 4
Figure 6: Build 3-Level 4
In Figure 5 for Build 2, 78 unique tests verify one requirement, 51 unique tests verify two requirements, etc. This shows there are a number of narrow-span tests, possibly designed for the testing of quite specific and/or unique requirements. The tail on the right side of the curve is also of interest, indicating that there are four tests that each verify 40 requirements. These tests may be too inclusive or general; either they are too ambitious for adequate testing, or they indicate an attempt to verify requirements that are too broad for this level of requirement decomposition. In certain cases, they may even raise awareness of anomalies in the requirements and test management procedures. Recall that many links from requirements to tests had not been identified (Figure 3). When this data is added, it may drastically alter the picture currently being presented. The graph in Figure 5 starts higher than the desired pattern but does show a slight second hump at 14 requirements.
While no data mimics the theoretical situation shown in Figure 4, the picture in Figure 6 is closer to the one expected. In this graph, however, we see a long tail at the end, indicating many instances where only one test is used to verify 49 to 63 requirements. This raises the questions of whether the requirements are being sufficiently tested and whether the breadth of the test matches the breadth of the requirements. As mentioned, there is always a trade-off between the time and cost of testing and the comprehensiveness of the testing.
Tests Per Requirement
The tails of Figures 5 and 6 show single tests for multiple requirements, but since the
requirements are not unique, they may also be counted in another test. This data only
represents uniqueness in one direction, so it is important to reverse the axes of the graphs
and look at the uniqueness of the tests.[1] Figure 7 shows an expected profile where the
X-axis is now the Number of Unique Requirements and the Y-axis the Number of Test Cases.
This data is interpreted to mean that 35 requirements are tested by only a single test case,
25 requirements are tested using only two test cases, etc. As in Figure 4, the numbers shown
are for discussion purposes only, not to imply expected counts for actual projects.
Figure 7: Sample-Tests to Requirements
The graph in Figure 7 shows the dilemma of structuring a test program. One testing criterion is to have a one-to-one relationship between tests and requirements. In this way the validation of requirements is isolated to single tests, and so the requirements are easily verified. The problem with large systems is that a one-to-one relationship between tests and requirements will cause a large test program to be developed, which will be too costly and too time-consuming to complete due to the number of test cases, the amount of test data required, and the large number of test sessions needed to execute all of the tests. Therefore a balance must be obtained where a one-to-one relationship between requirements and test cases is developed for critical requirements, but less critical requirements are tested in groups based on system states or functional threads. At the extreme end of the chart in Figure 7 there are a few requirements that have several test cases linked to them, indicating certain requirements that need several different test cases to thoroughly validate them.[8]
For consistency and comparison, we again look at the data for Level 4 requirements for
Build 2 and Build 3. Figure 8 shows the data for Build 2 and Figure 9 for Build 3. This is
derived from the same data as in Figures 5 and 6, but is now viewing unique requirements
as opposed to unique number of tests.
Figure 8: Build 2-Level 4
Both graphs depict a profile that is similar to what is expected. However, a closer
look is needed at the extreme ends. The questions are whether too many one-to-one
relationships exist, and whether certain requirements need complex testing and many
different test cases.
Neither graph exactly depicts the expected shape, so further investigation is needed to
determine if a problem is indicated by the data. Recall, this set of data must also be
combined with the other information presented to get a comprehensive and complete picture.
In Figure 8, we see a fairly compressed testing plan represented, although 443
requirements (18%) are tested by only 1 test. At a glance, it appears Figure 9 for Build 3
shows a similar picture, but on further investigation, we see 731 requirements (26%) are
tested only once. There is also a very large number of outliers, where many tests are
associated with only one requirement (51 tests for only one requirement). This may
represent redundant testing which should be removed, as this span of testing may make it
difficult to verify the requirement. This may be viewed as a pointer to a possible
problem: what is the nature of these requirements that they require so much testing?
Figure 9: Build 3-Level 4
Metric 3: Testing Complexity
Figure 9 indicates there may be excessive testing scheduled for Build 3 Level 4 requirements due to the large one to one requirement to test case ratio seen on the left hand side of the chart. The next step in applying metrics to evaluate requirements testing is to investigate the testing magnitude through the complexity of the linkages. One way this can be done is to look at the number of requirements, the number of linkages, and the number of tests. Recall, each link is a connection between a requirement and a test. This data presents a third view of the data previously presented. Figure 10 shows this information for Build 2 and Build 3.
Figure 10: Requirements, Links, and Tests
Figure 10 shows a number of different perspectives of the test information. First look at the ratio of the number of links to the number of requirements summarized in Table 2 below. In the Level 3 requirements for both Build 2 and Build 3, there are at least two links for each requirement. This means that on average, each requirement is tested by at least two different tests. For Level 4 requirements however, there is less than one link for each requirement. This indicates there are some requirements that are not linked to any test, hence their functionality may not be verified. This reinforces the conclusions drawn previously from Figure 3, indicating that many requirements were not linked to a test.
Build | Level | Ratio Links to Requirements | Ratio Links to Tests | Ratio Requirements to Tests
2     | 3     | 2 : 1                       | 5 : 1                | 2 : 1
3     | 3     | 4 : 1                       | 10 : 1               | 4.5 : 1
2     | 4     | 0.5 : 1                     | 3 : 1                | 4.5 : 1
3     | 4     | 0.75 : 1                    | 4 : 1                | 1.25 : 1
Table 2: Ratio Links to Requirements
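The three ratios in Table 2 follow directly from three counts: requirements, links, and tests. The Python sketch below shows one way to compute them. It is a hedged illustration: the function name `complexity_ratios` and the sample counts are hypothetical and are not taken from the project data, and links are assumed to be distinct requirement-test pairs.

```python
from typing import Dict


def complexity_ratios(n_requirements: int, n_links: int, n_tests: int) -> Dict[str, float]:
    """Compute the three linkage ratios for one build and requirement level."""
    if n_requirements <= 0 or n_tests <= 0:
        raise ValueError("requirement and test counts must be positive")
    return {
        "links_per_requirement": n_links / n_requirements,
        "links_per_test": n_links / n_tests,
        "requirements_per_test": n_requirements / n_tests,
        # With fewer distinct links than requirements, at least this many
        # requirements cannot be linked to any test.
        "min_unlinked_requirements": float(max(0, n_requirements - n_links)),
    }


# Hypothetical counts for illustration only.
ratios = complexity_ratios(n_requirements=1000, n_links=500, n_tests=200)
print(ratios)
```

A links-to-requirements ratio below one, as reported for the Level 4 requirements, guarantees that some requirements have no test; the last entry gives a lower bound on how many.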
Another way of viewing the data is to compare the total number of links to the total number of tests: are there too many? These ratios are also contained in Table 2. For Level 3 requirements in Build 3, the graph in Figure 10 is reinforced by the ratios in Table 2, which show a links-to-tests ratio of 10 to 1. This indicates that although the number of tests seems adequate for Build 3 (Figure 10), the complexity of the test program is too high; that is, the linkage between requirements and tests is complex (Table 2).
As stated, when metrics indicate a potential problem, further investigation is needed. The tentative conclusion is that the test plan for Build 3 Level 3 requirements is too complex, but the last column in Table 2 indicates the ratio of requirements to tests for Build 3 Level 3 requirements is not out of line. It is likely that the number of tests is sufficient, but the number of links is excessive and may need to be decreased, thus decreasing the complexity of the tests. But while the last column in Table 2 resolves one concern, another is raised. The ratio of requirements to tests for Build 3 Level 4 requirements is close to one to one. This indicates a very large number of tests for this Build and Level, suggesting a potential risk of failing to complete testing within schedule and budget. These factors may all be pointing to poor test case design, or they may simply reflect a lapse in procedures for requirements and test case management.
As discussed previously in this paper, there are no guidelines for these metrics since
they are in research infancy; in evaluating the data in Table 2, we were looking for
inconsistencies based on experience.
SUMMARY
A successful test plan verifies that all functions specified in the requirements are
included in the final system. This test plan must be accomplished with minimum cost to
resources and schedule. Three metrics for evaluating the quality of a project test plan
were discussed in this paper. The first metric, Requirement Links to Test, verified the
inclusion of functionality in the system by counting the number of requirements that
linked to a test case. The second metric, Test Span, identified insufficient or excessive
testing by counting the links first from requirements to test, then from tests to
requirement. This warned of possible cost or schedule overruns. The third metric evaluated
the test complexity by comparing the number of requirements, number of links, and numbers
of tests. This metric also yielded information on potential risks to cost and schedule.
CONCLUSION
The use of an automated tool to track requirements and their test cases has opened the
door to the use of metrics to characterize a test program in terms of its structure and
complexity, and to assess whether all requirements are verified by test cases. The use of
an automated tool for requirements management is essential for gaining insights not
otherwise available. The metrics presented here were the result of many different attempts
to display and use the data. Key to this analysis was access to the RTM database. This
afforded an opportunity to allow the metrics program to evolve into the metrics which
provided the best insight into requirements and test cases. Access to databases that are
being used with the requirements definition and test case development is essential to
allow the project to define and refine metrics that are appropriate to analyzing the
status of the implementation effort.
REFERENCES
Measuring Requirements Testing was presented at the International Conference on Software Engineering (ICSE '97), Boston, MA, May 1997.