Internet of Things Automatic Testing – Using Simulation
Developing and testing Internet of Things applications and systems is a big challenge, since the systems are simply big – they contain a lot of units and they need a large space. And the bigger something is, the harder it is to get it into the software development lab for testing. When developing software that will run on a hundred or a thousand low-power sensor nodes, just how do you practically test that software in the daily workflow? Simulating the systems is a good answer.
The IoT systems that are being built today often follow this template:
There is a wireless mesh network consisting of hundreds of sensor nodes, connected to one or a few much more powerful gateway nodes (which can be powered by Wind River software) that connect to the Internet as well as to each other – redundancy is often necessary to obtain a reliable system. The gateways protect the wireless network from the outside world, and collect data from the sensor nodes that they forward to the server for processing and storage. There are often control messages going into the wireless network from the gateways, as well.
How do you manage an experiment and test setup for a system like this in the physical world? You want to have the wireless nodes spread out over a large area so that not all are in contact with each other, which requires using entire buildings or campuses as the “lab”. Setting up and maintaining such a network is a significant amount of work, with labor costs quickly dwarfing the cost of the nodes themselves. Having little nodes attached to furniture, ceilings, and walls all over an office building is not necessarily popular with co-workers and other development groups, and prone to accidents involving cleaning or moves of other equipment or furniture.
When I was a PhD student, a research group working with mesh networks had a clever solution to this problem. They equipped the students they were teaching with free laptops (which were quite rare back then), running their experimental mesh network software stack. In this way, they got quite a few nodes to spread out across campus, providing a nice test setup. This is not a solution that is likely to work today in professional industry. However, simulation is.
In a simulator, setting up a large network is really easy. You just write a program to virtually deploy and spread out the nodes over the virtual space you need, and then model the wireless reachability between the nodes (more on this below). Instead of manually handling and maintaining hundreds of physical items, you manage a single script. The simulated system would usually contain the wireless nodes and the gateways (since they are also part of the wireless mesh network), and sometimes the server or command center. Often, however, it makes more sense to leave the server in the real world, and connect to it using Simics real-world networking. The resulting Simics system setup would look something like this:
Note that each sensor node in the mesh network is a fully simulated node, just like the gateways. Simics simulates the hardware, with processor, memory, timers, LEDs, and wireless radio. Often, there is a serial port connected from the wireless unit to a sensor that receives the data to be passed along across the network. The simulated hardware runs the real embedded OS and target application, using the same binary as would run on the real hardware. This allows for testing of the entire system, including self-organizing mesh network algorithms, the integration of the code reading sensors with the wireless communications system, and the effect of nodes sleeping to save power. Deploying code on the nodes can be expedited using Simics backdoors; at the very least, Simics makes it very easy to change the software image stored in FLASH memory on the nodes.
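To make the idea of a deployment script concrete, here is a minimal sketch in plain Python. It is not the Simics API: the function names, the linear falloff model, and the 0 to 100 signal strength scale are all assumptions made for illustration. It scatters nodes over a virtual area and derives a signal strength for every node pair from their distance.

```python
import math
import random

def place_nodes(count, width_m, height_m, seed=0):
    """Scatter nodes uniformly over a virtual rectangle (seeded for repeatability)."""
    rng = random.Random(seed)
    return [(rng.uniform(0, width_m), rng.uniform(0, height_m)) for _ in range(count)]

def signal_strength(a, b, max_range_m):
    """Assumed model: linear falloff from 100 at zero distance to 0 at max_range_m."""
    distance = math.dist(a, b)
    if distance >= max_range_m:
        return 0
    return round(100 * (1 - distance / max_range_m))

def reachability_table(positions, max_range_m):
    """Map each node pair (i, j) with i < j to its nonzero signal strength."""
    table = {}
    for i, a in enumerate(positions):
        for j in range(i + 1, len(positions)):
            strength = signal_strength(a, positions[j], max_range_m)
            if strength > 0:
                table[(i, j)] = strength
    return table

positions = place_nodes(200, 500.0, 500.0, seed=42)
links = reachability_table(positions, max_range_m=60.0)
print(len(positions), "nodes,", len(links), "links with nonzero signal strength")
```

A real setup would replace the falloff function with whatever the domain expert knows about the site, such as walls, floors, or interference sources. The point is only that the whole layout lives in one script that can be rerun and versioned.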
With the entire network encapsulated inside of Simics, we can apply parallel and automatic testing, just like I wrote about in a previous blog post. Using a set of servers, multiple virtual IoT networks can be running in parallel, each with its own scenario or parameter set. Simics can provide many additional networks to augment the few (often a single one) physical networks typically available for IoT application testing.
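As a rough illustration of running several scenarios side by side, the sketch below fans a parameter grid out over a small worker pool. Everything in it is hypothetical: `run_scenario` is a placeholder that fakes a verdict, where a real test harness would start one simulated network per parameter set and wait for its result.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def run_scenario(params):
    """Placeholder for launching one simulated network and returning a verdict.

    A real version would start a simulator session with these parameters and
    wait for it to finish. Here the result is faked so that the sketch runs.
    """
    node_count, loss_percent = params
    passed = loss_percent < 50  # made-up pass criterion, for illustration only
    return {"nodes": node_count, "loss_percent": loss_percent, "passed": passed}

def run_all(node_counts, loss_percents, max_parallel=4):
    """Run every combination of parameters, at most max_parallel at a time."""
    scenarios = list(product(node_counts, loss_percents))
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_scenario, scenarios))

results = run_all([100, 500, 1000], [0, 10, 50])
failures = [r for r in results if not r["passed"]]
print(len(results), "scenarios run,", len(failures), "failed")
```

Threads are enough here because each worker would mostly wait on an external simulator process. The failing parameter sets are the ones worth saving and handing back to development.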
If issues are discovered, they can be encapsulated in a session checkpoint, and passed back to development for analysis and fixing. This flow was discussed in my previous blog post on continuous integration and Simics, and it applies just as well to a single machine as to a network of hundreds of machines. Once the system is encapsulated inside of Simics, checkpoints can be used to pass back test results.
But can it really work in practice to simulate hundreds or thousands of nodes? From what we have seen so far, it definitely can.
IoT sensor nodes often have a very low duty cycle. The sensors do not sense the world continuously, but rather wake up regularly to take a sample and report it. Each sample run might take a second or just a few milliseconds, and then the system can be idle for minutes or even hours. This saves power, and makes it possible to have nodes deployed in the real world for extended periods of time without having to service them to change batteries.
Thus, there is a large amount of idle time in the system, idle time that can be exploited to accelerate the simulation. Simics has always used hypersimulation or idle-loop optimization, where the simulator skips ahead in time to the next interesting event. Rather than playing out idle time cycle by cycle, Simics jumps straight to the point of the next external stimulus or internal interrupt that happens in a powered-down or idle node. Simics can do “nothing” very quickly indeed, and that means that a system that is mostly idle can be simulated many times faster than real time.
Hypersimulation makes it possible both to scale up the system simulation to encompass hundreds of nodes, and to do so while still executing faster than real time. Testing that a lightly loaded system stays stable over time is thus much faster on Simics than doing it in the real world, making it possible to regularly perform automated long-term tests. Note that virtual time inside the simulation is unchanged; the simulator just gets through the idle parts faster. From the perspective of the code running on the machines in the simulation, hypersimulation is invisible and does not affect its execution semantics at all. Repeatability is maintained.
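The principle behind skipping idle time can be shown with a small event-driven loop. This is a toy model and says nothing about how Simics implements hypersimulation internally: the function name, the fixed wake-up period, and the 5 ms sample time are all assumptions. Virtual time jumps from one wake-up to the next, so the cost of a run depends on the number of wake-ups, not on how much virtual time passes.

```python
import heapq

def simulate(node_count, wake_period_s, sample_time_s, duration_s):
    """Event-driven loop: virtual time jumps from one wake-up to the next."""
    # Stagger the first wake-ups so the nodes do not all fire at once.
    queue = [(i * wake_period_s / node_count, i) for i in range(node_count)]
    heapq.heapify(queue)
    busy_s = 0.0   # total active time, summed over all nodes
    events = 0
    while queue and queue[0][0] <= duration_s:
        now, node = heapq.heappop(queue)  # skip straight over the idle gap
        busy_s += sample_time_s           # only this part needs real work
        events += 1
        heapq.heappush(queue, (now + wake_period_s, node))
    return events, busy_s

events, busy_s = simulate(node_count=1000, wake_period_s=600.0,
                          sample_time_s=0.005, duration_s=3600.0)
print(events, "wake-ups,", round(busy_s, 3), "s of active time in one simulated hour")
```

With these made-up numbers, a thousand nodes are together active for only about thirty seconds out of 3.6 million node-seconds in the simulated hour. That imbalance is what a simulator can exploit.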
I actually tested this almost ten years ago, when I set up a network of one thousand mostly-idle sensor nodes in Simics as a research project. The result surpassed my expectations, in that it actually ran faster than the real world. And this was on a fairly slow single-core laptop running 32-bit Windows XP. Today, with wide multicore hosts and a parallel Simics simulator, this should run an order of magnitude faster. Technology is moving ahead, and Simics is moving with it.
IoT systems operate in a noisy, hostile, and difficult real-world environment. Thus, testing for robustness, fault tolerance, and reliability in the presence of network connection difficulties and individual nodes crashing is a necessary part of building IoT systems. This can be very hard to do in the real world, since controlling the environment and radio network connectivity is technically challenging, and so is repeating tests reliably. With Simics, you can directly inject faults into the system, as I have discussed a few times before. You can also vary the reachability of a wireless network: Simics lets you set the signal strength between any two nodes to any value. How these values are determined is entirely up to the user, the domain expert on the types of radio reachability issues seen in the real world.
In the picture above, we show some simple examples of reachability. Node pairs that are close to each other have high signal strength, while node pairs that are further apart have lower signal strength. If a node cannot be reached from some other node, the signal strength for the pair is set to zero, which is interpreted as unreachable (note that the signal strength numbers are really arbitrary values, but we use 100 for perfect and zero for unreachable). The signal strength parameters can be changed at any point during the simulation (Simics simulations can be reconfigured arbitrarily at runtime), allowing the simulation of changing real-world conditions and unexpected interruptions such as a train passing across the line of sight between two nodes.
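The effect of changing signal strengths mid-run can be sketched with a small stand-alone model. The `RadioEnvironment` class and its methods are hypothetical and are not a Simics interface. Only the convention of 100 for perfect and zero for unreachable comes from the description above. The sketch checks whether a node can still be reached over multiple hops after links are cut.

```python
from collections import deque

class RadioEnvironment:
    """Hypothetical table of pairwise signal strengths (0 = unreachable, 100 = perfect)."""

    def __init__(self):
        self._strength = {}

    def set_strength(self, a, b, value):
        """Set the symmetric signal strength between nodes a and b."""
        if not 0 <= value <= 100:
            raise ValueError("signal strength must be between 0 and 100")
        self._strength[frozenset((a, b))] = value

    def strength(self, a, b):
        """Return the strength for a pair, or 0 if no value was ever set."""
        return self._strength.get(frozenset((a, b)), 0)

    def can_route(self, source, target):
        """Breadth-first search over links with nonzero strength."""
        nodes = {n for pair in self._strength for n in pair}
        seen, todo = {source}, deque([source])
        while todo:
            current = todo.popleft()
            if current == target:
                return True
            for other in nodes - seen:
                if self.strength(current, other) > 0:
                    seen.add(other)
                    todo.append(other)
        return False

env = RadioEnvironment()
env.set_strength("gateway", "n1", 90)
env.set_strength("n1", "n2", 60)
env.set_strength("gateway", "n2", 20)

env.set_strength("n1", "n2", 0)             # a train blocks the line of sight
still_ok = env.can_route("gateway", "n2")   # the weak direct link remains
env.set_strength("gateway", "n2", 0)        # the direct link drops as well
cut_off = not env.can_route("gateway", "n2")
print("reachable after first cut:", still_ok, "| cut off after second:", cut_off)
```

A test scenario could apply such changes at chosen points in virtual time and then check that the mesh software on the nodes notices the change and reroutes, or recovers once the link returns.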
In the end, physical labs are needed to test the interaction with the real world, radio behavior, and other aspects not captured in the simulation. As always, you have to eventually test what you ship and ship what you test. But on the way there, Simics simulations can be used to test more, test faster, perform automated regression testing, do fault injection, collaborate, communicate, and make software development run smoother.
For more on Simics, visit here.