VII. SOFTWARE AND SYSTEM SAFETY

A. Concepts and Definitions

System safety is concerned with the possibility of catastrophic failure of systems in such a way as to compromise the safety of people or property, or result in mission failure. Software safety is definable only in the system context. Software has no inherent dangers; however, systems controlled or monitored by software do fail, and some failures of some systems will have safety impacts. To the extent that system failures can be caused or fail to be prevented by software, there is a need for an activity called "software safety."

If we are to be concerned with the safety of software only in a system context, we must then be concerned with nonconformances in the software and with the software requirements as well. Indeed, the most serious problems with software-based systems are those that develop when the software requirements are incorrect, inappropriate, or incomplete for the system situation.

B. Software Problems

System failures that are caused by software are due to one of two types of software problems: nonconformances (failures to satisfy requirements) or an error or omission in the software requirements.

A nonconformance may be simple (the most common is a coding error or "bug") or more complex (e.g., a subtle timing error that delays a shuttle launch). The important point about nonconformances is that verification and validation techniques are designed to detect them and assurance techniques are designed to prevent them; improvements in these methods and a safety program based on specialized application of them are improving the safety and reliability of software-controlled systems.

An error or omission in requirements is less tractable. The software may perform exactly as required, but the requirements do not correctly deal with some system state. When the system enters the undefined state, unexpected and undesirable behavior may result.
This type of problem cannot be handled within the software discipline; it results from a failure of the system and software engineering processes which developed and allocated the system requirements to the software.

C. Methods for Improving Software Safety

Improving the software development process and building better software are ways to increase system reliability, i.e., by producing software with fewer faults. Intuitively, more reliable software is probably safer software, but from a safety standpoint more concentration on safety-related software functions is needed. A first order approach is to identify the critical software that controls system safety-related functions and give it special attention through the development and testing process. This is just a special case of the "build it better" method, but it focuses scarce resources on critical areas.

D. Software Safety Program (Example)

System hazard analysis may indicate that some software requires a more formal safety program because it is included in a safety critical system component. The software safety program begins with a preliminary software safety analysis. The purpose of the preliminary software safety analysis is to identify software controlled functions that affect the safety critical component and the software components that execute the functions. These software components are safety critical. When a safety critical software component is identified, then software safety activities are initiated on that component and continued through the requirements, design, and code analyses and testing phases in the software development process.

1. Requirements Analysis

Software safety requirements analysis forms the basis for subsequent software safety activities. The process of requirements analysis evaluates both software and interface requirements. The analysis is intended to identify errors and deficiencies in the software requirements that could result in the identified hazardous system states.
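
As an illustration of the kind of deficiency this analysis looks for, the following minimal sketch enumerates the states of a hypothetical valve controller and reports any state for which the requirements define no response. The state variables, the REQUIREMENTS table, and the function find_uncovered_states are assumptions made up for this example; a real analysis would work from the actual system hazard analysis and requirements specification.

```python
from itertools import product

# Hypothetical state variables for an illustrative valve controller.
VALVE = ("open", "closed")
PRESSURE = ("normal", "high")
SENSOR = ("ok", "failed")

# Hypothetical requirements table: (valve, pressure, sensor) -> required action.
REQUIREMENTS = {
    ("open", "normal", "ok"): "hold",
    ("open", "high", "ok"): "close_valve",
    ("closed", "normal", "ok"): "hold",
    ("closed", "high", "ok"): "raise_alarm",
    ("open", "normal", "failed"): "close_valve",
    ("closed", "normal", "failed"): "raise_alarm",
}


def find_uncovered_states(requirements):
    """Return every system state for which no required action is defined."""
    all_states = product(VALVE, PRESSURE, SENSOR)
    return [state for state in all_states if state not in requirements]


uncovered = find_uncovered_states(REQUIREMENTS)
for state in uncovered:
    print("No requirement covers state:", state)
```

In this made-up table the two states that combine high pressure with a failed sensor have no defined response, which is the sort of omission that leaves the system in an undefined state.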
Techniques employed in performing requirements analysis include criticality analysis; specification analysis; and timing, sizing, and throughput analysis.

Criticality analysis evaluates each requirement in terms of the safety objectives derived for a given software component. This evaluation is to determine whether the requirement has safety implications. If so, the requirement is deemed critical and must be tracked throughout the software development cycle; that is, through design, coding, and testing. It must be traceable from the highest level specification all the way to the code and documentation.

Specification analysis evaluates the completeness, correctness, consistency, and testability of identified software safety critical requirements. Specification analysis considers each requirement singly and all requirements as a set.

Timing, sizing, and throughput analysis evaluates software requirements that relate to execution time, memory allocation, and channel usage. Timing, sizing, and throughput analysis focuses on noting and defining program constraints based on maximum required and allowable execution times, maximum memory usage and availability, and throughput considerations based on I/O channel usage.

2. Design Analysis

Design analysis verifies that the program design correctly implements safety critical requirements.

Design logic analysis evaluates the equations, algorithms, and control logic of the software design.

Design data analysis evaluates the description and intended usage of each data item used in design of the critical component. Interrupts and their effect on data must receive special attention in safety critical areas to verify that interrupts and interrupt handling routines do not alter critical data items used by other routines.

Design interface analysis verifies the proper design of a software component's interfaces with other components of the system, including hardware, software, and operators.
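
The timing, sizing, and throughput constraints identified during requirements analysis can be carried into design as explicit budgets. The sketch below, offered only as one possible approach, compares hypothetical worst-case design estimates against such budgets. The Budget and Estimate classes, the numbers, and the 20 percent reserve margin are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Hypothetical resource limits for one safety critical component."""
    max_exec_ms: float      # maximum allowable execution time per cycle
    max_memory_kb: float    # memory available to the component
    max_channel_bps: float  # I/O channel capacity


@dataclass
class Estimate:
    """Hypothetical worst-case estimates taken from the design."""
    exec_ms: float
    memory_kb: float
    channel_bps: float


def check_constraints(budget, estimate, margin=0.2):
    """Return the names of resources whose estimate exceeds the budget
    after reserving the given fractional margin (an assumed policy)."""
    limits = {
        "execution time": (estimate.exec_ms, budget.max_exec_ms),
        "memory": (estimate.memory_kb, budget.max_memory_kb),
        "throughput": (estimate.channel_bps, budget.max_channel_bps),
    }
    return [name for name, (used, allowed) in limits.items()
            if used > allowed * (1.0 - margin)]


budget = Budget(max_exec_ms=10.0, max_memory_kb=64.0, max_channel_bps=9600.0)
estimate = Estimate(exec_ms=8.5, memory_kb=40.0, channel_bps=4800.0)
violations = check_constraints(budget, estimate)
print(violations)
```

With these example figures only the execution time estimate intrudes on the reserved margin, so it is the one item flagged for further analysis.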
Design constraint analysis evaluates the design solutions against restrictions imposed by requirements and real-world limitations. The design must be responsive to all known or anticipated restrictions on the software component. These restrictions may include timing, sizing, and throughput constraints, equation and algorithm limitations, input and output data limitations, and design solution limitations.

3. Code Analysis

Code analysis verifies that the coded program correctly implements the verified design and does not violate safety requirements. The techniques used in the performance of code analysis mirror those used in design analysis.

4. Safety Testing

Software safety testing verifies analysis results, investigates program behavior, and confirms that the program complies with safety requirements. Special safety testing, conducted in accordance with the safety test plan and procedures, establishes the compliance of the software with the safety requirements. Safety testing focuses on locating program weaknesses and identifying extreme or unexpected situations that could cause the software to fail in ways that would cause a violation of safety requirements. The safety testing effort is limited to those software requirements classified as safety critical items.

E. Techniques and Tools

In the last few years, there has been much effort to adapt methods used in hardware safety and reliability to software. Tools like fault tree analysis and sneak circuit analysis have been applied to software with some success. Modeling of software using Petri nets has been tried, and other modeling techniques have been advocated, but with only limited success to date. While some techniques may have some limited usefulness, their success depends heavily on the ability of the analyst that applies them.
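
To give a sense of what fault tree analysis looks like when applied to software, the following minimal sketch represents a small tree of AND and OR gates and finds, by brute force, the smallest combinations of basic events that produce the top event. The tree, the event names, and the functions evaluate and minimal_cut_sets are hypothetical and serve only to illustrate the idea; the enumeration is exponential in the number of events and assumes a tree built only from AND and OR gates.

```python
from itertools import combinations


def evaluate(node, basic_events):
    """Evaluate a fault tree node.

    A node is either a basic event name (str) or a tuple
    (gate, children) where gate is "AND" or "OR".
    """
    if isinstance(node, str):
        return basic_events.get(node, False)
    gate, children = node
    results = [evaluate(child, basic_events) for child in children]
    if gate == "AND":
        return all(results)
    if gate == "OR":
        return any(results)
    raise ValueError(f"unknown gate: {gate}")


def minimal_cut_sets(tree, events):
    """Return the minimal sets of basic events that cause the top event."""
    found = []
    for size in range(1, len(events) + 1):
        for combo in combinations(events, size):
            candidate = set(combo)
            # Skip supersets of a cut set that was already found.
            if any(cut <= candidate for cut in found):
                continue
            if evaluate(tree, {event: True for event in candidate}):
                found.append(candidate)
    return found


# Hypothetical top event: valve commanded open while pressure is high.
EVENTS = ["sensor_reads_low", "stale_data_used", "interlock_check_skipped"]
TREE = ("AND", [
    ("OR", ["sensor_reads_low", "stale_data_used"]),
    "interlock_check_skipped",
])

cut_sets = minimal_cut_sets(TREE, EVENTS)
for cut in cut_sets:
    print(sorted(cut))
```

In this example every minimal cut set includes the skipped interlock check, which points the analyst at the one software function that deserves the closest scrutiny. Deciding which events belong in the tree remains a matter of analyst judgment.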