Predictable Performance Is Key to Successful OpenStack Telco Deployments
It’s taken a few years, but the telecom industry seems to have finally accepted that OpenStack is viable for deployments in networks that require high levels of service availability, performance and security. The key is selecting the right OpenStack platform: you need one that’s been properly hardened, packaged and stress-tested to ensure it meets the stringent requirements of service providers.
Telco clouds based on Wind River’s Titanium Cloud architecture have been proven to address the critical challenges presented by deployments in a wide range of core, edge and access use cases.
If you’d like to learn more about how the Titanium Core and Titanium Edge virtualization platforms solve some key problems for virtual CPE (vCPE), the white paper “Overcoming OpenStack Obstacles in vCPE” provides an in-depth explanation.
For more on the vCPE topic, you might also be interested in watching the recording of our webcast with Peter Willis from BT, in which Peter discusses specific OpenStack problems identified by BT and we review how Wind River’s products have solved them.
Now it’s time to take a look at another important OpenStack topic: how to guarantee predictable performance for applications such as telecom networks, whose performance requirements are far more challenging than those of the enterprise applications for which OpenStack was originally designed.
At the OpenStack Summit in Boston this month, two of Wind River’s leading OpenStack experts, Ian Jolliffe and Chris Friesen, will be presenting a talk titled “Behind the Curtains: Predictable Performance with Nova”. They will provide detailed insights not only into the techniques that enable predictable performance to be achieved, but also into how to troubleshoot problems when something goes wrong. If you’re going to be at the Summit you’ll want to check out this session; if not, you can always watch the recording later.
As Ian and Chris will explain in their talk, there are five fundamental problem areas that must be addressed in order to develop an OpenStack implementation that delivers truly predictable performance: CPU, memory, PCI bus, networking and storage. (At this Summit they’ll only cover the first four; you’ll have to join us at a subsequent event to learn about storage.)
The primary CPU-related challenges are all about contention. Multiple guests contending for the same resources can result in unpredictable performance for all, so various techniques must be employed to prevent this situation. These include reducing or disabling CPU overcommit, setting the correct CPU thread policy and using dedicated CPUs for each guest. CPU cache contention can be avoided by the use of an appropriate CPU thread policy and also by leveraging Intel’s Cache Allocation Technology. CPU contention between host and guest(s) should be eliminated by avoiding System Management Interrupts (SMIs) as well as by controlling CPU “stealing” by host processes and kernel threads.
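To make this concrete, here’s a rough sketch of what those knobs look like in stock OpenStack; the flavor name is hypothetical and the exact location of the overcommit option varies a little by release.

```
# Illustrative only: flavor name is hypothetical.
# Give each guest dedicated host CPUs and keep other workloads
# off the sibling hyperthreads of those CPUs.
openstack flavor set nfv.cpu-pinned \
    --property hw:cpu_policy=dedicated \
    --property hw:cpu_thread_policy=isolate

# nova.conf on compute nodes: reduce or disable CPU overcommit
#   [DEFAULT]
#   cpu_allocation_ratio = 1.0
```

With a dedicated CPU policy in place, each vCPU maps to exactly one host CPU, so a noisy neighbour can’t steal cycles from the guest.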
There are three memory-related areas that need to be addressed to avoid performance problems. Memory contention can be avoided by reducing overcommit, by turning it off altogether or by enabling hugepages. Memory bandwidth contention can be more complicated: it’s important to ensure that each guest NUMA node is mapped to a unique host NUMA node, while also distributing Virtual Machines (VMs) across host NUMA nodes to spread memory accesses. Finally, contention for the host TLB (the cache of virtual-to-physical page mappings) can be minimized through the use of hugepages as well as dedicated CPUs.
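As a hedged example (again with a made-up flavor name and placeholder values), hugepages and an explicit guest NUMA topology can be requested through flavor extra specs, provided the host has reserved hugepages of the matching size at boot:

```
# Illustrative sketch: back guest memory with 1G hugepages and
# split the guest across two NUMA nodes.
openstack flavor set nfv.numa-aware \
    --property hw:mem_page_size=1GB \
    --property hw:numa_nodes=2 \
    --property hw:numa_cpus.0=0,1 --property hw:numa_mem.0=2048 \
    --property hw:numa_cpus.1=2,3 --property hw:numa_mem.1=2048

# nova.conf: reduce or disable memory overcommit
#   ram_allocation_ratio = 1.0
```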
PCI bus contention is addressed by exploiting the fact that each PCI bus is connected to a single host NUMA node, so that instances using PCI devices can be affined to the same NUMA node as the device itself.
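In upstream Nova this largely comes down to exposing the right devices and letting the scheduler account for NUMA topology; the device IDs and alias name below are placeholders, and the option names vary slightly by release.

```
# nova.conf excerpts (illustrative; IDs and alias name are placeholders)
#   [pci]
#   passthrough_whitelist = { "vendor_id": "8086", "product_id": "154d" }
#   alias = { "vendor_id": "8086", "product_id": "154d", "device_type": "type-PF", "name": "fastnic" }
#
#   [filter_scheduler]
#   enabled_filters = ...,NUMATopologyFilter,PciPassthroughFilter

# Request one passthrough device per instance via a hypothetical flavor
openstack flavor set nfv.pci --property "pci_passthrough:alias"="fastnic:1"
```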
Networking must also be considered. It’s important to avoid cross-NUMA traffic, so VMs should be placed on the same NUMA node as the physical NIC they’re connected to, while virtual switch (vSwitch) processes should be configured with NUMA awareness of their physical NIC PCI termination and of instance placement. PCI PassThrough and SR-IOV performance will likewise suffer when traffic crosses NUMA nodes. To avoid network bandwidth contention, all instances connecting to the same host network should use the same host NIC, which ideally should be a 10G NIC or better.
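For an OVS-DPDK vSwitch, for example, NUMA awareness is typically expressed by pinning the packet-polling (PMD) threads and hugepage memory per socket so the datapath stays local to the NICs; the CPU mask and memory values below are placeholders for whatever matches your host topology.

```
# Illustrative OVS-DPDK tuning; values depend on your host's NUMA layout.
# Reserve hugepage memory for the vSwitch on each NUMA node (MB per node).
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem="1024,1024"

# Pin the PMD threads to cores on the same NUMA node(s) as the DPDK NICs.
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x0c
```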
There are several approaches that should be considered in order to address limitations in host-guest network bandwidth. Emulated NICs are slow, while para-virtualized NICs are faster as long as the guest supports them. PCI PassThrough and SR-IOV are faster still because the PCI device is “passed through” into the guest, though they require a suitable device driver in the guest, still carry the overhead of virtual interrupts and can be challenging to configure initially. The fastest option is a guest based on DPDK that uses a Poll Mode Driver (PMD), though this does consume more power because the guest is running a tight polling loop.
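As a sketch of the SR-IOV path (the network, port, image and flavor names here are all hypothetical), the guest is handed a virtual function directly by creating a port with the “direct” vNIC type and attaching it at boot:

```
# Illustrative SR-IOV attachment; all names are hypothetical.
# Create a port backed by an SR-IOV virtual function on a provider network.
openstack port create --network provider-net --vnic-type direct sriov-port0

# Boot the instance with that port; the guest image still needs the VF driver.
openstack server create --flavor nfv.cpu-pinned --image vnf-image \
    --nic port-id=sriov-port0 vnf-instance
```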
Lastly, Enhanced Platform Awareness (EPA) is critical in ensuring predictable performance for the applications themselves. EPA enables optimized VM placement through awareness of NUMA topology, memory requirements, NIC capabilities, acceleration engines and hyperthreading.
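Pulling these hints together, an EPA-aware flavor might look something like the hypothetical example below, which lets the scheduler place the VM only on a host that can actually honour the requested topology.

```
# Hypothetical EPA-oriented flavor combining the hints discussed above.
openstack flavor create nfv.epa --vcpus 4 --ram 4096 --disk 20
openstack flavor set nfv.epa \
    --property hw:cpu_policy=dedicated \
    --property hw:cpu_thread_policy=isolate \
    --property hw:mem_page_size=large \
    --property hw:numa_nodes=1
```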
By implementing all these techniques and more, the Titanium Cloud architecture has been proven to deliver the level of predictable performance required for telco networks, which is one of the reasons why it’s being widely adopted as service providers transition to network virtualization.
As a leading contributor to the OpenStack community, with a major focus on solving telco-related problems in Nova, Wind River is pleased to have the opportunity to share more details of these performance techniques with the rest of the community. We hope you can join Ian and Chris for their upcoming OpenStack Summit session, and if you can’t make it, we encourage you to watch the recording afterwards.