One aspect of open infrastructure that makes it an exciting field to work in
is its continually changing landscape. This is particularly true in
the arena of high-performance networking.
Open Infrastructure
The concept of Open Infrastructure may not be familiar to all. It
seeks to replicate the open source revolution in compute, exemplified
by Linux, for the distributed management frameworks of cloud
computing. At its core is OpenStack, the cloud operating system
that has grown to become one of the most popular open source projects
in the world.
Kayobe is an open source project
for deploying and operating OpenStack in a model that packages all of
OpenStack's components as containerised microservices and orchestrates
their deployment, reconfiguration and life cycle using Ansible.
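As a sketch of that model, a much-abbreviated Kayobe deployment boils down to a handful of commands:
kayobe control host bootstrap
kayobe overcloud host configure
kayobe overcloud service deploy
These bootstrap the Ansible control host, configure the overcloud hosts (including their networking), and deploy the containerised OpenStack services.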
Mellanox VF-LAG: Fault-tolerance for SR-IOV
In a resilient design for a virtualised data centre, hypervisors
use bonded NICs to provide network access for control plane services
and for workloads running in VMs. This design provides
active-active use of a pair of high-speed network interfaces, but
SR-IOV does not normally work over a bond, which would exclude the
most demanding network-intensive use cases.
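For illustration, a bond of this kind can be created by hand with iproute2 (the interface names here are illustrative; in practice the bond is normally defined declaratively in the deployment's network configuration):
# ip link add bond0 type bond mode 802.3ad miimon 100
# ip link set ens3f0 down
# ip link set ens3f0 master bond0
# ip link set ens3f1 down
# ip link set ens3f1 master bond0
# ip link set bond0 up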
Mellanox NICs have a feature, VF-LAG, which claims to enable SR-IOV
to work in configurations where the ports of a 2-port NIC are bonded
together.
In NICs that support it, VF-LAG uses the same technology that underpins
ASAP2 OVS hardware offloading;
much of the process for creating a VF-LAG configuration is shared
with ASAP2.
System Requirements
- VF-LAG requires Mellanox ConnectX-5 (or later) NICs.
- VF-LAG only works for two ports on the same physical NIC. It cannot
be used for LAGs created using multiple NICs.
- Open vSwitch version 2.12 or later (the Train release of Kolla shipped
with Open vSwitch 2.12).
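A quick way to confirm the driver and firmware of a given port is ethtool (the interface name here is illustrative):
# ethtool -i ens3f0
For a ConnectX NIC the driver field should report mlx5_core; the lspci output in the troubleshooting section below identifies the exact model.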
Using VF-LAG interfaces in VMs
In common with other SR-IOV and OVS hardware-offloaded ports, a port using VF-LAG and OVS hardware
offloading must be created explicitly, with custom parameters:
openstack port create --network $net_name --vnic-type=direct --binding-profile '{"capabilities": ["switchdev"]}' $hostname-vflag
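The VNIC type and binding profile can be verified on the new port (a quick check, using standard openstack CLI output filtering):
openstack port show $hostname-vflag -c binding_vnic_type -c binding_profile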
A VM instance can be created specifying the VF-LAG port. In this example, it is one of
two ports connected to the VM:
openstack server create --key-name $keypair --image $image --flavor $flavor --nic net-id=$tenant_net --nic port-id=$vflag_port_id $hostname
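The $vflag_port_id used here can be captured from the port created earlier, for example:
vflag_port_id=$(openstack port show $hostname-vflag -f value -c id)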
The VM image should include the Mellanox NIC kernel drivers in order to use the VF-LAG interface.
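Inside the guest, a quick way to confirm that the VF has bound to the right driver is ethtool (the guest interface name is illustrative):
# ethtool -i eth0
The driver field should report mlx5_core.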
Troubleshooting: Is it Working?
Check that the Mellanox Ethernet driver is managing the LAG correctly; the 'lag map' messages show how traffic is remapped between the two physical ports as the bond state changes:
# dmesg | grep 'mlx5.*lag'
[ 44.064025] mlx5_core 0000:37:00.0: lag map port 1:2 port 2:2
[ 44.196781] mlx5_core 0000:37:00.0: modify lag map port 1:1 port 2:1
[ 46.491380] mlx5_core 0000:37:00.0: modify lag map port 1:2 port 2:2
[ 46.591272] mlx5_core 0000:37:00.0: modify lag map port 1:1 port 2:2
Check that the VFs have been created during bootup:
# lspci | grep Mellanox
5d:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
5d:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
5d:00.2 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
5d:00.3 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
5d:00.4 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
5d:00.5 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
5d:00.6 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
5d:00.7 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
5d:01.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
5d:01.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
5d:01.2 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
5d:01.3 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
5d:01.4 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
5d:01.5 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
5d:01.6 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
5d:01.7 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
5d:02.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
5d:02.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
Data about VFs for a given NIC (PF) can also be retrieved using ip link (here for ens3f0):
# ip link show dev ens3f0
18: ens3f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
link/ether 24:8a:07:b4:30:8a brd ff:ff:ff:ff:ff:ff
vf 0 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
vf 1 link/ether 3a:c0:c7:a5:ab:b2 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
vf 2 link/ether 82:f5:8f:52:dc:2f brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
vf 3 link/ether 3a:62:76:ef:69:d3 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
vf 4 link/ether da:07:4c:3d:29:7a brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
vf 5 link/ether 7e:9b:4c:98:3b:ff brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
vf 6 link/ether 42:28:d1:6a:0d:5d brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
vf 7 link/ether 86:d2:c8:a4:1b:c6 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
The configured number of VFs should also be available via sysfs (shown here for NIC eno5):
# cat /sys/class/net/eno5/device/sriov_numvfs
8
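If the count is zero, VFs can be created by hand for testing (a sketch; in production the VF count is normally applied at boot):
# echo 8 > /sys/class/net/eno5/device/sriov_numvfs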
Check that the Mellanox NIC eSwitch has been put into switchdev mode, not legacy mode
(use the PCI bus address for the NIC from lspci, here 37:00.0):
# devlink dev eswitch show pci/0000:37:00.0
pci/0000:37:00.0: mode switchdev inline-mode none encap enable
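If the eSwitch is still in legacy mode, it can be switched with devlink (a sketch; the VFs' drivers typically need to be unbound first):
# devlink dev eswitch set pci/0000:37:00.0 mode switchdev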
Check that tc hardware offloads are enabled on the physical NICs and also the representor ports
(shown here for a NIC ens3f0 and a representor eth0):
# ethtool -k ens3f0 | grep hw-tc-offload
hw-tc-offload: on
# ethtool -k eth0 | grep hw-tc-offload
hw-tc-offload: on
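If either reports off, the feature can be enabled per interface (a sketch):
# ethtool -K ens3f0 hw-tc-offload on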
Check that Open vSwitch is at version 2.12 or later:
# docker exec openvswitch_vswitchd ovs-vsctl --version
ovs-vsctl (Open vSwitch) 2.12.0
DB Schema 8.0.0
Check that Open vSwitch has hardware offloads enabled:
# docker exec openvswitch_vswitchd ovs-vsctl get Open_vSwitch . other_config:hw-offload
"true"
Once a VM has been created and has network activity on an SR-IOV interface,
check for hardware-offloaded flows in Open vSwitch. Look for offloaded flows
both on bond0 and on the VF representor port (eth9 in this example):
# docker exec openvswitch_vswitchd ovs-appctl dpctl/dump-flows --names type=offloaded
in_port(bond0),eth(src=98:5d:82:b5:d2:e5,dst=fa:16:3e:44:44:71),eth_type(0x8100),vlan(vid=540,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:29, bytes:2842, used:0.550s, actions:pop_vlan,eth9
in_port(eth9),eth(src=fa:16:3e:44:44:71,dst=00:1c:73:00:00:99),eth_type(0x0800),ipv4(frag=no), packets:29, bytes:2958, used:0.550s, actions:push_vlan(vid=540,pcp=0),bond0
Watch out for Open vSwitch errors logged in this form as a sign that offloading is not being applied successfully:
2020-03-19T11:42:17.028Z|00001|dpif_netlink(handler223)|ERR|failed to offload flow: Operation not supported: bond0