LXD Networking
Posted on Fri 06 September 2019 in Containers
Starting with LXD 2.3, quite a bit of network functionality has been exposed to the operator. As before 2.3, the default lxdbr0
bridge is still created when LXD is installed on the host system, but several other options now exist.
LXD Bridges
Using the Default LXD Bridge
First things first, let's go over the default bridge and what it gives you. Out of the box, any new containers will be attached to this bridge, and the host system will be configured to NAT outbound connections from containers via iptables MASQUERADE rules using the default IP address of the host system. IP addresses are assigned to new containers by a built-in DHCP server using the range provided during the LXD init process. If you're looking to keep things simple, just stick with the default bridge.
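If you're curious what that looks like under the hood, both the bridge settings and the NAT rules are easy to inspect. A quick check, assuming the default lxdbr0 name and an iptables-based host (the exact rule wording varies by LXD version):
root@host# lxc network show lxdbr0
root@host# iptables -t nat -S | grep -i lxd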
Creating a Custom LXD Bridge
If for some reason the default bridge does not suffice (e.g. network namespaces need to be segregated, or there are differing requirements for addressing or NAT behavior), it is possible to create a new bridge. There are a few different options for configuring one.
To create an autoconfigured bridge named lxdbr1337:
root@host# lxc network create lxdbr1337
To configure the bridge by hand:
root@host# lxc network create lxdbr1337 ipv6.address=none ipv4.address=10.13.37.1/24 ipv4.nat=true
This will create a new bridge called lxdbr1337
with IPv6 disabled and an IPv4 address of 10.13.37.1, and will create MASQUERADE rules in iptables to NAT outbound traffic. The built-in DHCP server will hand out addresses to containers attached to the bridge from the 10.13.37.0/24 range (i.e. 10.13.37.2 - 254).
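To double-check what LXD actually configured, dump the network definition (output abridged; the exact keys vary a bit between LXD versions):
root@host# lxc network show lxdbr1337
config:
  ipv4.address: 10.13.37.1/24
  ipv4.nat: "true"
  ipv6.address: none
... snip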
Attach to a Bridge
So now we have an LXD bridge or two; how do we use it?
Let's attach the container app1's eth0 to the host's bridge lxdbr1337:
root@host# lxc network attach lxdbr1337 app1 eth0
Now let's restart the container to apply the new configuration and verify the bridge configuration on app1:
root@host# lxc restart app1
root@host# lxc info app1
... snip
devices:
eth0:
ipv4.address: 10.13.37.224
nictype: bridged
parent: lxdbr1337
type: nic
... snip
YAHTZEE
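If you want confirmation from inside the container as well, here's a quick sanity check of both the lease and the outbound NAT (the address shown will be whatever the DHCP pool handed out, and the ping target is just an example):
root@host# lxc exec app1 -- ip addr show eth0
root@host# lxc exec app1 -- ping -c 1 1.1.1.1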
Set Static IP
Now we have a new bridge and we have successfully attached a container to that bridge. The container successfully pulled an IP address from the DHCP pool and we should be able to NAT out to the Internet. What if we want to make 100% sure that the IP address never changes? This is where static IP addresses come in. One thing to note about static IP addresses on devices connected to an LXD bridge is that they are not actually static IP addresses; they are DHCP reservations. This has a few benefits:
- It avoids IP address conflicts as the DHCP server is aware the IP address is in use
- It allows the DHCP server to pass additional options to the container (e.g. gateway, DNS, NTP, etc values).
Let's add a "static" IP address to the container app1
's eth0 interface:
root@host# lxc config device set app1 eth0 ipv4.address 10.13.37.66
Time to restart and verify the change:
root@host# lxc restart app1
root@host# lxc info app1
... snip
devices:
eth0:
ipv4.address: 10.13.37.66
nictype: bridged
parent: lxdbr1337
type: nic
... snip
This also works for IPv6 with the ipv6.address flag, but our bridge has IPv6 disabled.
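For completeness, here's a hedged sketch of the IPv6 equivalent if the bridge did have IPv6 enabled; the ULA prefix is arbitrary, and depending on your LXD version you may also need stateful DHCPv6 on the bridge for the device-level reservation to take effect:
root@host# lxc network set lxdbr1337 ipv6.address fd42:1337:1337:1337::1/64
root@host# lxc network set lxdbr1337 ipv6.nat true
root@host# lxc network set lxdbr1337 ipv6.dhcp.stateful true
root@host# lxc config device set app1 eth0 ipv6.address fd42:1337:1337:1337::66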
Port Security
There may be scenarios where security measures need to be in place to prevent MAC address spoofing or to avoid forwarding traffic for other containers (e.g. nested containers). The following will accomplish that:
root@host# lxc config device set app1 eth0 security.mac_filtering true
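You can read the flag back to make sure it stuck (newer LXD releases also offer related knobs such as security.ipv4_filtering, so check the docs for your version):
root@host# lxc config device get app1 eth0 security.mac_filtering
true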
DNS
LXD runs a DNS server on the bridge, which allows the domain to be set via the dns.domain network property, and supports 3 different modes (via dns.mode):
- "managed" - one DNS record per container, matching its name and known IP addresses. This record cannot alter this record through DHCP.
- "dynamic" - containers can self-register through DHCP; the current hostname during DHCP negotiation is the domain name
- "none" - simple recursive DNS server with no local DNS records
The default mode is "managed".
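As a quick example tying those knobs together (the domain name is arbitrary, and this assumes the container uses the bridge's DNS server, which it picks up via DHCP):
root@host# lxc network set lxdbr1337 dns.domain lxd.example
root@host# lxc network set lxdbr1337 dns.mode managed
root@host# lxc exec app1 -- getent hosts app1.lxd.example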
External Bridging
It's possible for the container to participate in the host's LAN as well, using bridges outside the control of LXD, but at the expense of losing control of networking for the host through LXD itself. This means that things like DHCP, NAT, etc. will also need to be handled externally to LXD.
Bridging with bridgeutils
Let's start with the most basic option using bridgeutils. Bridgeutils creates an internal switch (a standard Linux bridge) on the host and attaches the configured containers and the configured physical interface. There are a lot of other options, but they are not explicitly germane to the topic at hand. Let's start by creating a new bridge:
root@host# brctl addbr lanbr
You've probably already deduced that we have created a new bridge named lanbr. Let's verify that:
root@host# brctl show
bridge name bridge id STP enabled interfaces
lanbr 8000.000000000000 no
Nice! There aren't any physical or logical interfaces attached to the bridge yet, so it's just chillin there waiting for some friends. Let's give him a physical friend:
root@host# brctl addif lanbr enp4s0f0
This adds the physical interface enp4s0f0 to the bridge. Don't believe me? See for yourself:
root@host# brctl show lanbr
bridge name bridge id STP enabled interfaces
lanbr 8000.00e0814cbc1c no enp4s0f0
Now we're getting somewhere. We still don't have any containers attached though. The default profile to which the container was attached uses the lxdbr0 interface, so we will need to override that inherited bridge for the eth0 interface on the container. Let's do that now:
root@host# lxc config device override app1 eth0 parent=lanbr
Let's see what we're looking at now:
root@host# lxc config device show app1
eth0:
nictype: bridged
parent: lanbr
type: nic
root@host# brctl show lanbr
bridge name bridge id STP enabled interfaces
lanbr 8000.00e0814cbc1c no enp4s0f0
vethce92439e
Awesome! So to recap, we have now created our bridge lanbr using the brctl interface to bridgeutils, added a physical interface to the bridge, then added a container to the bridge. There are a ton of other options available through bridgeutils, like VLAN tagging, loop avoidance through STP, and more, but that goes way outside the scope of this discussion. Poke around and see what you can do!
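One hedged housekeeping note: a bridge created with brctl starts out down and with no address, so if the host itself still needs to talk on the LAN, its LAN addressing should be moved onto lanbr. Something along these lines, assuming your LAN hands out DHCP leases (otherwise assign the address statically):
root@host# ip link set dev lanbr up
root@host# dhclient lanbr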
Bridging with macvlan
Say you don't need all the fancy knobs associated with bridgeutils and just want your containers to be able to interact directly with the LAN without worrying about a layer 2 loop. Enter macvlan. Macvlan creates a trivial bridge and attaches subinterfaces to that bridge. Subinterfaces use the naming convention subint@int, e.g. updog@enp3s0, where updog is an arbitrary interface name and enp3s0 is the parent interface to which the bridge is bound.
root@host# lxc config device set app1 eth0 nictype=macvlan
root@host# lxc config device set app1 eth0 parent=enp4s0f0
root@host# lxc config device show app1
eth0:
nictype: macvlan
parent: enp4s0f0
type: nic
Restart the container and you'll be in business. There are a few points of clarification that need to be made about macvlan to avoid pulling your hair out:
- Containers will not be able to communicate directly with the host via the physical interface to which the macvlan subinterface is bound. This means that if a container is using the host's physical enp3s0 as a macvlan parent, the container will be limited in how it can interact with the host; you will need a second physical interface up and running (a possible workaround is sketched after this list).
- Containers attached to a physical interface using macvlan share the fate of that interface. If there are 10 containers on a host all using the same parent interface and that interface goes down, the entire bridge goes down. You might expect inter-container traffic to continue uninterrupted, but that bridging is performed by hairpinning traffic through the physical interface. Kind of hard to do when the interface is down!
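A common workaround for the first caveat is to give the host its own macvlan subinterface on the same parent, so the host and containers can reach each other. This is a sketch with hypothetical interface names and addressing, using the enp3s0 parent from the example above:
root@host# ip link add mvlan0 link enp3s0 type macvlan mode bridge
root@host# ip addr add 192.168.1.250/24 dev mvlan0
root@host# ip link set dev mvlan0 up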
Bridging with openvswitch
Another option to provide host bridging is openvswitch. openvswitch is a virtual switch that comes with a whole host of nerd knobs you can tweak to your heart's content. You can even go buck wild by attaching vswitches from multiple hosts to an SDN controller (read: OpenFlow). For the sake of brevity we will stick to creating a bridge and adding our container to the bridge tagged for a specified VLAN.
Configuration
root@host# ovs-vsctl add-br ctr-bridge
root@host# ovs-vsctl add-port ctr-bridge enp4s0f1
root@host# ovs-vsctl add-br vlan409 ctr-bridge 409
root@host# lxc config device override app1 eth0 parent=vlan409
Explanation
Here we've created a bridge called ctr-bridge and assigned the host interface enp4s0f1 to the bridge. Next we've created a dummy bridge called vlan409 that is tied to the parent bridge ctr-bridge and is tagged for, you guessed it, VLAN 409. By attaching enp4s0f1 to the parent bridge, it will act as a VLAN trunk, as 802.1q trunking is the default behavior for OVS switches. Any other devices connected to the vlan409 bridge will also be tagged with VLAN 409.
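Restart the container and take a peek at the switch; the container's veth should show up with tag 409 (port names will differ on your system):
root@host# lxc restart app1
root@host# ovs-vsctl show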
Tunnelling
At this point we've configured bridging a few different ways. Let's say we have configured additional hosts and we want the containers on those hosts to be able to talk to each other without any bridging to the external network or NATing through an LXD bridge. This is where tunnelling comes into play. If you've worked with networking in any way in the past... oh, say 10 years... it's probably safe to assume you've heard of overlay networking. This is overlay networking. BUZZWORDS!
GRE Tunnelling
Generic Routing Encapsulation (GRE) provides a method to allow two networks to talk to each other that would not be able to otherwise. GRE takes the packet to be transmitted and wraps it inside a new header (i.e. encapsulates it) specifying the local router (in this case our local LXD host) as the source and the remote router (i.e. our remote LXD host) as the destination. When the GRE packet is received, it is decapsulated and routed as usual. This is very similar to VPNs, but it should be noted that GRE does NOT encrypt the payload like true VPNs (ipsec, openvpn, sslvpn, etc). If multiple hosts are participating via GRE, each endpoint will need to be configured on every other host.
Configuration
On host1:
root@host1# lxc network set br0 tunnel.host2.protocol gre
root@host1# lxc network set br0 tunnel.host2.id 10
root@host1# lxc network set br0 tunnel.host2.local 11.11.11.11
root@host1# lxc network set br0 tunnel.host2.remote 99.99.99.99
On host2:
root@host2# lxc network set br0 tunnel.host1.protocol gre
root@host2# lxc network set br0 tunnel.host1.id 10
root@host2# lxc network set br0 tunnel.host1.local 99.99.99.99
root@host2# lxc network set br0 tunnel.host1.remote 11.11.11.11
Explanation
On each host we're configuring the parameters for the other side along with the local IP address. This will allow the two hosts to understand the routing and be able to properly return the traffic.
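A quick way to sanity-check the tunnel once both sides are configured (the ping target is a hypothetical container on host2 attached to br0):
root@host1# lxc network show br0
root@host1# lxc exec app1 -- ping -c 1 10.10.10.12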
VXLAN Overlays
While GRE allows two networks to communicate where end-to-end communication wouldn't otherwise be possible, VXLAN instead allows those two networks to act as a single network. This means that a container on one host could use the other host as a gateway, or containers could be moved between the hosts without needing to change their IP addresses. This is a huge win when it comes to trying to cluster LXD hosts. There are two options when it comes to VXLAN networking in LXD: unicast and multicast. Unicast is going to look a lot like the GRE configuration above - we will explicitly define each endpoint. There is also an option to use multicast. The benefit of using multicast is that it will work without specifying each individual endpoint with one major caveat: they ALL have to be on the same layer 2 network. If you have a host in one datacenter and a host in another datacenter, multicast is not going to work (unless there are some major layer 2 shenanigans going on between datacenters).
Unicast
If the hosts do not have L2 connectivity between them, it is possible to create the tunnel in unicast mode.
On host1:
root@host1# lxc network set br0 tunnel.host2.protocol vxlan
root@host1# lxc network set br0 tunnel.host2.id 10
root@host1# lxc network set br0 tunnel.host2.local 11.11.11.11
root@host1# lxc network set br0 tunnel.host2.remote 99.99.99.99
On host2:
root@host2# lxc network set br0 tunnel.host1.protocol vxlan
root@host2# lxc network set br0 tunnel.host1.id 10
root@host2# lxc network set br0 tunnel.host1.local 99.99.99.99
root@host2# lxc network set br0 tunnel.host1.remote 11.11.11.11
root@host2# lxc network attach-profile br0 default eth0
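Both ends need to agree on the tunnel ID for the VXLAN to come up, so a quick check on each host doesn't hurt:
root@host1# lxc network get br0 tunnel.host2.id
root@host2# lxc network get br0 tunnel.host1.id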
Multicast
In this example, we can configure host1 to route all of the traffic and allow containers to communicate between hosts, then configure host2 to tunnel all traffic to host1. Since this is multicast, it is dependent on L2 connectivity between the hosts.
First create the bridge on host1:
root@host1# lxc network create br0 tunnel.lan.protocol=vxlan
Now attach the bridge to eth0 in the default profile:
root@host1# lxc network attach-profile br0 default eth0
Then create the bridge on host2:
root@host2# lxc network create br0 ipv4.address=none ipv6.address=none tunnel.lan.protocol=vxlan
Attach the bridge to eth0 in the default profile:
root@host2# lxc network attach-profile br0 default eth0
Fan Networking
Fan networking is another overlay option. Without going into too much detail, a fan consists of an underlay network with a /16 and an overlay network with a /8. The hosts in the underlay network pull IP addresses from the /16 and each provides a /24 carved out of the shared /8 overlay network. For example, the fan may use the underlay network 172.21.0.0/16 and the overlay network 10.0.0.0/8. The hosts are assigned addresses from the /16, so let's say host1 has the address 172.21.3.15 and host2 has the address 172.21.5.20. host1 will then assign containers addresses from the range 10.3.15.0/24 and host2 will assign containers addresses from the range 10.5.20.0/24. Notice the lower two octets of the underlay address are used as the second and third octets of the host's overlay range. Containers can move around to different hosts, but traffic will always route out through the original host. There's a lot to unpack when it comes to fan networking and we're only going to cover the basics, so if you're interested in learning more, I would suggest giving the fan networking docs a read.
Configuration
First we need to create the fan on our hosts:
root@host1# fanctl up -o 10.0.0.0/8 -u 172.21.3.15/16
root@host2# fanctl up -o 10.0.0.0/8 -u 172.21.5.20/16
Now lets verify the fans are configured:
root@host1# fanctl show
Bridge Underlay Overlay Flags
fan-10 172.21.3.15/16 10.0.0.0/8
root@host2# fanctl show
Bridge Underlay Overlay Flags
fan-10 172.21.5.20/16 10.0.0.0/8
Now that we have the fan interfaces defined we can update our default bridge to use the fans to overlay:
root@host1# lxc profile device set default eth0 parent=fan-10
root@host1# lxc profile device set default eth0 mtu=1498
root@host2# lxc profile device set default eth0 parent=fan-10
root@host2# lxc profile device set default eth0 mtu=1498
Explanation
On each host we have created a fan bridge with 10.0.0.0/8 as the overlay network and 172.21.0.0/16 as the underlay network. This makes the assumption that both hosts have interfaces configured with addresses from the same /16 and that they are on the same layer 2 network segment. Once the fan has been verified, the parent device for the containers' eth0 can be set to the newly-created fan bridge, and the MTU lowered to account for encapsulation overhead (to prevent unnecessary fragmentation).
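To sanity-check the overlay end to end, restart a container and ping across hosts; the container name and addresses below are hypothetical, but each container should land in its host's /24 slice of the fan:
root@host1# lxc restart app1
root@host1# lxc list app1
root@host1# lxc exec app1 -- ping -c 1 10.5.20.37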