High Performance Software Switching



VALE is a software switch that interconnects physical or virtual network interfaces, similar to what the Linux bridge module or Open vSwitch do. Unlike these other software switches, VALE achieves:

  • High throughput with low CPU usage using the netmap API
  • Scalability to hundreds of ports through a novel packet forwarding algorithm
  • Modular packet processing (e.g., forwarding packets based on header fields), implemented as a loadable kernel module

As a result, VALE can act either as a stand-alone, high-performance switch (including in SDN deployments) or as a scalable and flexible virtualization back-end. Virtual ports are accessed from user space through the netmap API: they can be virtual machines (e.g., QEMU instances, Xen domains) or generic, netmap-enabled applications.

VALE architecture. A switch supports a large number of ports that can attach to virtual machines or processes, physical ports, or to the host’s network stack.


With mSwitch we have made important contributions to VALE's basic design and implementation, including:

  • The ability to connect NICs directly to the switch.

  • Increasing the maximum number of ports on the switch from 64 to 255 in order to accommodate a potentially large number of VMs.

  • Modifying the switch so that its switching logic is modular. Third-party kernel modules can easily extend the switch by implementing their own lookup functions.

  • Replacing the original bitmap-based algorithm with one that scales to a large number of ports.
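The modular switching logic described above can be illustrated with a small user-space model. This is only a sketch, not the kernel module's actual interface: the names `LookupFn`, `make_learning_lookup`, `Switch`, and the sentinel values `BROADCAST` and `DROP` are hypothetical and chosen for illustration. It shows a switch that delegates its forwarding decision to a pluggable lookup function, here one that behaves like a learning bridge keyed on Ethernet MAC addresses.

```python
from typing import Callable, Dict

# Hypothetical sentinel values; the real module's constants may differ.
BROADCAST = -1   # flood to every port except the source
DROP = -2        # discard the packet

# A lookup function maps (frame bytes, source port) to a destination port.
LookupFn = Callable[[bytes, int], int]


def make_learning_lookup() -> LookupFn:
    """Return a lookup function that behaves like a learning bridge."""
    mac_table: Dict[bytes, int] = {}

    def lookup(frame: bytes, src_port: int) -> int:
        if len(frame) < 14:                  # shorter than an Ethernet header
            return DROP
        dst_mac, src_mac = frame[0:6], frame[6:12]
        mac_table[src_mac] = src_port        # learn where the sender lives
        if dst_mac[0] & 1:                   # group bit set: broadcast/multicast
            return BROADCAST
        return mac_table.get(dst_mac, BROADCAST)  # flood unknown unicast

    return lookup


class Switch:
    """Toy switch whose forwarding decision is delegated to a pluggable lookup."""

    def __init__(self, lookup: LookupFn) -> None:
        self.lookup = lookup

    def forward(self, frame: bytes, src_port: int) -> int:
        return self.lookup(frame, src_port)
```

Swapping in a different function with the same signature (for example, one that matches on other header fields) changes the forwarding behavior without touching the `Switch` class, which is the design property the modular lookup is meant to provide.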


We compare existing software switches on a server with a Xeon E5-2695 CPU running at 3.2 GHz, 32 GB of quad-channel DDR3-1600 RAM, and an Intel X520-T2 dual-port 10 Gbps NIC.


In this experiment we forward packets from one NIC port to the other. VALE forwards packets at the 10 Gbps line rate for all but the shortest packets, where it reaches 10.28 Mpps out of a 14.88 Mpps line rate.
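For reference, the 14.88 Mpps figure follows from standard Ethernet framing: a minimum-size frame is 64 bytes including the 4-byte frame check sequence (often quoted as 60 bytes without it), and each frame additionally occupies 20 bytes on the wire for the preamble, start-of-frame delimiter, and inter-frame gap. The short calculation below is a sketch of that arithmetic; the function name `line_rate_pps` is ours, not part of any tool used in the experiments.

```python
def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Maximum packets per second for a given Ethernet frame size (incl. FCS)."""
    overhead = 7 + 1 + 12  # preamble + start-of-frame delimiter + inter-frame gap
    return link_bps / ((frame_bytes + overhead) * 8)


MIN_FRAME = 64  # bytes, including the 4-byte FCS

max_pps = line_rate_pps(10e9, MIN_FRAME)
fraction = 10.28e6 / max_pps  # measured rate from the text vs. line rate

print(f"{max_pps / 1e6:.2f} Mpps")   # 14.88 Mpps
print(f"{fraction:.1%}")             # 69.1%
```

In other words, the reported 10.28 Mpps corresponds to roughly 69% of line rate for minimum-size packets.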

This rate is slightly lower than what Intel DPDK vSwitch achieves (11.90 Mpps for 60-byte packets). However, VALE scales to many more ports and achieves these rates while consuming considerably fewer CPU cycles.


Since VALE adopts an interrupt-based model, it does not waste CPU cycles on idle ports. We run an experiment that forwards packets from a single NIC port to a number of virtual ports in round-robin fashion, assigning a CPU core to each of these virtual ports. We plot throughput and cumulative CPU usage across all the CPU cores.

With our contributions, VALE's throughput now scales better in the presence of many ports. We plot the cumulative throughput and CPU utilization of the original VALE versus the current one.