Using Simics and Simulation in IEC61508 Safety-Critical Systems – an Interview with Andreas Buchwieser
At Wind River, we deal a lot with customers who work with various forms of safety-critical systems. Many of the most important systems in the world today are powered by Wind River operating systems and developed using Wind River tools. As a result, we have built up considerable expertise in how to develop safety-critical systems. In this blog post, I interview Andreas Buchwieser, who works with our safety-critical operating system platforms for transportation, industrial automation, process control, automotive, and medical systems. We will explore how simulation and Simics can be used to develop systems for those areas. Note that aerospace is a different market, with different requirements, processes, and standards from what we will discuss here – and Simics has seen a lot of use in that market, as discussed previously.
Jakob Engblom (JE): Please introduce yourself!
Andreas Buchwieser (AB):
I am Andreas Buchwieser, Director Product Management, managing the Safety portfolio and certification for Industrial, Medical, Auto and Transportation markets. Before I joined Wind River in 2006, I had been working as an architecture consultant for safety-related systems in the defense and automotive industry since 2003. With Wind River I have been in charge of various projects for developing industrial equipment (IEC 61508, IEC 60880), transportation systems (EN 50128, IEC 62279), and medical devices (IEC 62304). Representing Wind River, I also work closely with certification bodies such as TÜV Süd, TÜV Rheinland, TÜV Nord, and Lloyd’s Register Rail.
JE: So what are your thoughts about using simulation in the development of safety-critical systems?
AB: Simulation, modeling and prototyping are integral parts of the development of safety-critical systems: they are used to supplement the validation process, to verify the software architectural design and for the conceptual capture of the functionality to be realized (open/closed loop control, monitoring) as well as for the simulation of real physical system behaviors.
Standards like IEC 61508, EN 50128 and ISO 26262 highly recommend simulation techniques. I have been discussing simulation with various certification bodies, and there is a great deal of interest in using simulation for certified systems.
I see three main areas where Simics could be a great asset during a certification project – and this list is probably not exhaustive. The first is the use of fault injection during testing of diagnostic software. The second is early validation of requirements, to find issues before they become real problems or bugs. The third is increasing the coverage metrics during module testing.
JE: I happen to like fault injection, so let’s start with that topic. What are your thoughts on fault injection and safety standards?
AB: Safety-related devices have to implement built-in features to control failures during operation. These failures are typically random hardware failures. To control random hardware failures, a couple of measures – so-called “diagnostics” – have to be implemented. These diagnostics are part of the certified software, and have to be validated with a rigor corresponding to the calculated Safety Integrity Level (SIL). Indeed, validating diagnostic software is one of the most difficult parts of a safety or certification project. This type of software is there to detect problems with the hardware, stop them from spreading, and take the entire system to a safe state. This is very difficult to test.
One of the highly recommended techniques for validation in the safety standards is fault injection.
JE: How do you test validation software without a simulator?
AB: How do you make the hardware fail in a controlled way, in the way you want it to fail, and how do you repeat it? Today you typically use a debugger. There are a couple of issues when using a debugger for validating diagnostics:
- The executable in debug mode is not the same as the release version
- Identifying addresses and symbols is cumbersome
- The program execution has to be halted
- There is no traceability of registers and opcodes
With Simics, you can simulate the processor and target environment, and introduce faults into the system using the software release version.
JE: Have you used Simics for fault injection yourself?
AB: Right now I’m working as a Functional Safety Manager on developing VxWorks 7 Safety Profile. To keep VxWorks 7 Safety Profile modular and scalable, we are certifying the kernel only. Wind River will also provide the complementary piece of the kernel: support for a couple of architectures, e.g. PPC, Intel, and ARM processors. The architecture support will have integrated processor diagnostics, which is of high value for our customers. We are investigating ways to qualify Simics as a tool suitable for validating these integrated processor diagnostics, which is the first of the three areas I mentioned earlier.
JE: What do the standards have to say about fault injection and diagnostics testing with a simulator?
AB: Safety standards highly recommend simulation as a technique for safety validation. IEC 61508-7, section C.5.19, describes simulation as follows:
“The creation of a system, for testing purposes only, which mimics the behavior of the equipment under control (EUC).
The simulation may be software only or a combination of software and hardware. It shall
- Provide all the inputs of the system under test which will exist when the system is installed,
- Respond to outputs from the system in a way which faithfully represents the controlled equipment,
- Have provision for operator inputs to provide any perturbations with which the system under test is required to cope.
When software is being tested, the simulation may be a simulation of the target hardware with its inputs and outputs.”
The software being tested with the proposed Simics approach is the diagnostics; the “equipment under control” is the processor used. Using Simics makes it possible to execute the validation under conditions present during normal operation, anticipated occurrences, and undesired conditions requiring system action. Simics is a natural fit for this way of working.
JE: What would you do with fault injection in simulation more concretely?
AB: Let’s use a simple example: IEC 61508 mandates the detection of changes of information caused by soft errors in CPU registers and/or internal RAM in order to claim medium diagnostic coverage. Now you want to validate that the implemented diagnostics work correctly. With Simics you simply modify the information in a register and check whether the diagnostics detect this ‘fault’ and initiate the specified system reaction.
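The idea of flipping a register bit and checking the diagnostic reaction can be illustrated with a toy model. This is only a sketch in plain Python and does not use the Simics API; the class ToyCpu, its parity-based diagnostic, and all method names are hypothetical and were chosen for illustration.

```python
# Toy model only: this is NOT the Simics API. All names are hypothetical.

class ToyCpu:
    """Register file where each register carries a parity bit as its diagnostic."""

    def __init__(self, num_regs=4):
        self.regs = [0] * num_regs
        self.parity = [0] * num_regs
        self.safe_state = False

    @staticmethod
    def _parity(value):
        # 1 if the number of set bits is odd, else 0
        return bin(value).count("1") % 2

    def write(self, index, value):
        """Normal software write: value and parity bit are updated together."""
        self.regs[index] = value & 0xFFFFFFFF
        self.parity[index] = self._parity(self.regs[index])

    def inject_bit_flip(self, index, bit):
        """Fault injection: flip one stored bit without updating the parity bit."""
        self.regs[index] ^= (1 << bit)

    def run_diagnostics(self):
        """Diagnostic under test: list corrupted registers, enter safe state if any."""
        bad = [i for i, v in enumerate(self.regs)
               if self._parity(v) != self.parity[i]]
        if bad:
            self.safe_state = True
        return bad


cpu = ToyCpu()
cpu.write(2, 0x1234)
no_fault = cpu.run_diagnostics()   # nothing injected yet
cpu.inject_bit_flip(2, bit=5)      # simulate a soft error in register 2
detected = cpu.run_diagnostics()
print(no_fault, detected, cpu.safe_state)   # [] [2] True
```

The point of the sketch is that the fault is introduced from outside the software under test, so the diagnostic code itself runs unmodified.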
JE: What about requirements validation?
AB: It’s related to prototype development. Most of the problems or bugs come from the fact that requirements are not well specified. You can solve this by developing prototypes and using them for early validation of requirements. A system simulator that can simulate the entire system is very useful for such early prototyping. By using virtual hardware, you get insight into both the software and the hardware, and into how their dependencies and interactions work. In the IEC 61508 standard, prototyping using simulation is called out as a way to test software with its environment and better understand the requirements:
“Prototyping may be used in any phase to elicit requirements or to obtain a more detailed view on the requirements and their consequences.
IEC 61508: verification and validation tools such as simulators.
the software shall be exercised by simulation of:
1) input signals present during normal operation;
2) anticipated occurrences;
3) undesired conditions requiring system action;
Functional and black-box testing: Prototyping animation.”
JE: When IEC 61508 mentions prototyping, in what way does that relate to certification?
AB: The safety standards mandate functional and black-box testing to verify the software architecture, which includes the interactions with the underlying hardware. For the developers it is crucial to understand the resource requirements at a very early stage of the project; reliable estimates can be obtained using prototypes. A reliable Simics model helps to check the feasibility of implementing the system against any given constraints.
JE: Sounds like prototyping is done in the early phases?
AB: Yes. This starts early in the design flow. Let’s take a simple example: assume you have selected a processor and a certain amount of RAM, and you have to check the RAM for errors regularly. Typically, you would do this late in the process. However, what if the system is busy with the RAM checking all the time? With the simulator, you can see at an early stage whether the RAM check is feasible at all. Do you need to change to a faster processor, or change to ECC RAM to remove the need for RAM checking altogether?
This is not like fault injection, which comes in towards the end of the project when you validate that what you have implemented really works.
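The RAM check feasibility question boils down to simple arithmetic once the check throughput is known, and throughput is the kind of figure an early prototype on a simulator could provide. The sketch below uses made-up numbers; the function name and all figures are assumptions for illustration, not measurements.

```python
def ram_check_load(ram_bytes, check_bytes_per_second, check_interval_seconds):
    """Fraction of CPU time needed to run one full RAM check per interval."""
    check_duration = ram_bytes / check_bytes_per_second
    return check_duration / check_interval_seconds


# Assumed figures, for illustration only
ram_bytes = 64 * 1024 * 1024       # 64 MiB of RAM to check
throughput = 32 * 1024 * 1024      # check covers 32 MiB per second
interval = 4.0                     # one full check required every 4 seconds

load = ram_check_load(ram_bytes, throughput, interval)
faster_load = ram_check_load(ram_bytes, 2 * throughput, interval)

print(f"RAM check load: {load:.0%}")                    # RAM check load: 50%
print(f"With doubled throughput: {faster_load:.0%}")    # With doubled throughput: 25%
```

A load at or above 100% would mean the check cannot complete within its interval on the chosen hardware, which is the case where a faster processor or ECC RAM comes into the picture.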
JE: The last issue you mentioned at the start of our discussion was the increase of coverage metrics. What does that mean?
AB: The safety standards highly recommend so-called “structure-based testing” to test software modules. The goal is to exercise a large percentage of the program code. The required percentage of code coverage varies with the level of rigor required, but for highly critical software the following requirements generally apply:
- Structural test coverage (statements) 100%
- Structural test coverage (branches) 100%
- Structural test coverage (conditions, MC/DC) 100%
Software communicates at some points with the hardware and reacts to hardware events. This makes it very cumbersome, if not impossible, to achieve the coverage requirements of the safety standards. Testers cannot easily trigger a specific hardware event that causes the software to execute a specific part of the program code. And keep in mind that much of the module testing is done in a host environment only, and not on the real hardware. As a result, the coverage does not meet the 100% requirement and testers need to work around this by adding justification statements. This carries some certification risk, because assessors may not accept the justifications in all cases. Simics would help a lot to increase the resulting coverage, since it can simulate specific hardware events, which cause the corresponding program code to be executed.
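As a rough illustration of how a simulated hardware event can reach a branch that is hard to trigger otherwise, consider the sketch below. It is plain Python and not Simics code; SimulatedUart, force_overrun, and read_byte are hypothetical names, and the overrun condition stands in for any rare hardware event.

```python
class SimulatedUart:
    """Hypothetical device model whose status bits a test can force."""
    OVERRUN = 0x01

    def __init__(self):
        self.status = 0
        self.data = 0x41

    def force_overrun(self):
        # Simulated hardware event: set the receiver overrun flag
        self.status |= self.OVERRUN


def read_byte(uart, log):
    """Code under test: the error branch only runs on a receiver overrun."""
    if uart.status & SimulatedUart.OVERRUN:
        log.append("overrun")                   # hard to reach on real hardware
        uart.status &= ~SimulatedUart.OVERRUN   # clear the flag
        return None
    log.append("ok")
    return uart.data


log = []
uart = SimulatedUart()
normal = read_byte(uart, log)   # exercises the normal branch
uart.force_overrun()            # inject the rare event
faulty = read_byte(uart, log)   # exercises the error branch
print(log)                      # ['ok', 'overrun']
```

With both branches exercised by tests, no justification statement would be needed for the overrun path in this toy case.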
JE: That makes sense, we have seen many users use Simics to force rare events or extreme conditions to happen in order to test the code triggered by those events.
Still, one concern that people have is how you know that what you observe with Simics is relevant to the real world. What is your view on this issue?
AB: We think of Simics as a T2 tool, that is, a tool which supports the test or verification of the design or executable code, where errors in the tool can fail to reveal defects, but cannot directly create errors in the executable software. Part 3 of IEC 61508 has a section, 7.4.4, which provides guidance on how tool qualification has to be carried out. The Simics setup would have to be qualified in accordance with the safety standard requirements, which would be done by Wind River and the user together.
JE: That sounds like a practical and pragmatic approach taken by the standard. It appears to imply that Simics can be selected as a coherent part of the software development activities under the IEC 61508 standard. Is anything else needed to enable this?
AB: We have had a couple of discussions with certification bodies on how to use and qualify Simics as a T2 tool. One approach to qualifying the model is to execute existing processor test lists from vendors such as Intel on real processors and compare the results with those of the same tests executed on a Simics model. This approach has to be mutually agreed on with certification bodies, of course. Since safety projects usually use established, accessible processors (rather than completely new processors), there is real hardware to compare to, which facilitates this approach.
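The comparison step of such a qualification approach could look roughly like the sketch below. It assumes the results of both runs are already available as name/value pairs; the test names and values are made up, and compare_results is a hypothetical helper, not part of any vendor test suite or of Simics.

```python
def compare_results(reference, model):
    """Return names of tests whose model result differs from or is missing in the reference."""
    return sorted(name for name in reference
                  if model.get(name) != reference[name])


# Made-up test names and result values, for illustration only
hardware_run = {"add_overflow": 0x80000000, "shift_left": 0x10, "div_by_zero_trap": 1}
model_run = {"add_overflow": 0x80000000, "shift_left": 0x20, "div_by_zero_trap": 1}

mismatches = compare_results(hardware_run, model_run)
print(mismatches)   # ['shift_left']
```

An empty list would indicate that the model matched the real processor on every test in the list, which is the evidence the comparison is meant to produce.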
JE: Thank you, that was very interesting and I certainly learned something new about safety-critical systems and how Simics could be applied to their development and test.
For more reading on related topics, please see some previous blog posts:
- Cyberphysical system simulation with Simics (running a Simics simulation and a physics simulation together), which does include an element of fault injection and boundary case exploration.
- Systematically exploring software to find issues with it is a big topic for Simics. See for example the work of Ben Blum and Tingting Yu.
- Fault injection with Simics: overall concept, the work of Hyungmin Cho, and a concrete example of a serial port.
- Automatic testing is also very important to keep the tests running.