Multi-Chassis LAG (MC-LAG)
Multi-Chassis LAG is a technology that allows two independent systems to present themselves as a single system while still operating independently. This is in contrast to Virtual Chassis, which makes multiple independent systems operate as a single system. At the highest level, MC-LAG works by forcing the two upstream devices to use the same LACP System ID, which makes the downstream device think that the upstream devices are just one device.
Before diving into the specifics of an MC-LAG configuration, it is critical to know which settings must match and which settings must be unique between peers. The lists below summarize both [1].
Must match:
- Service ID
- Redundancy Group ID
- MCAE ID
- LACP System ID
- LACP Admin Key
- MCAE Mode
Must be unique:
- MCAE Chassis ID
- MCAE Status Control
- Local ICCP IP
- Peer ICCP IP
- MC-LAG Protection
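To make the match/unique rule concrete, here is a minimal Python sketch that compares two peers' settings. The field names and the `check_peers` helper are hypothetical labels for the settings discussed in this document, not anything Junos exposes, and the values are illustrative.

```python
# Field names are hypothetical; they only label the settings discussed in this document.
MUST_MATCH = (
    "service_id",
    "redundancy_group_id",
    "mcae_id",
    "lacp_system_id",
    "lacp_admin_key",
    "mcae_mode",
)
MUST_DIFFER = ("mcae_chassis_id", "status_control", "local_iccp_ip")


def check_peers(a, b):
    """Return a list of problems found when comparing two peers' settings."""
    problems = []
    for key in MUST_MATCH:
        if a[key] != b[key]:
            problems.append(f"{key} must match")
    for key in MUST_DIFFER:
        if a[key] == b[key]:
            problems.append(f"{key} must be unique")
    # Each node's peer address should be the other node's local address.
    if a["peer_iccp_ip"] != b["local_iccp_ip"] or b["peer_iccp_ip"] != a["local_iccp_ip"]:
        problems.append("ICCP local/peer addresses are not mirrored")
    return problems


shared = {
    "service_id": 1,
    "redundancy_group_id": 100,
    "mcae_id": 1,
    "lacp_system_id": "00:00:00:11:11:11",
    "lacp_admin_key": 1,
    "mcae_mode": "active-active",
}
peer_a = dict(shared, mcae_chassis_id=0, status_control="active",
              local_iccp_ip="10.0.0.0", peer_iccp_ip="10.0.0.1")
peer_b = dict(shared, mcae_chassis_id=1, status_control="standby",
              local_iccp_ip="10.0.0.1", peer_iccp_ip="10.0.0.0")

print(check_peers(peer_a, peer_b))  # []
```

A check like this is only a sanity aid before committing; the devices themselves remain the source of truth.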
Finally, a diagram is worth a thousand words. If you have the Juniper MX Series book from O’Reilly [4], I can highly recommend Figure 9-4 as a reference. I won’t reproduce it here because that would be incredibly rude and inconsiderate, so please pick up that book if you can. Chapter 9, MC-LAG, is definitely worth it! Just in case you have a Safari subscription, here’s a link straight to the diagram. But again, definitely read the entire chapter. It’s absolutely worth it.
Note
I’m hoping the authors don’t mind me including a link directly to the diagram in the Safari Online book. But if you’re one of the authors or the publisher and want that removed, please reach out to me and let me know and I will gladly remove it.
Terminology
This section explains some of the terminology and configuration elements listed above. [2]
Service ID
The Service ID must match between two ICCP peers. The Service ID exists for use cases where MC-LAG is used in more than one routing instance. This ID is global.
Redundancy Group ID
The Redundancy Group ID must match between two ICCP peers. It contains a grouping of MC-LAG bundles and their associated VLANs. It is useful because it allows the ICCP peers to send an update once instead of sending an update for each MC-LAG. For example, when a MAC address is learned, the MAC only needs to be sent to the ICCP peer one time for that redundancy group instead of multiple times for each MC-LAG in the group. This ID is not required to be global, but can encompass one or more MC-LAG bundles.
Note
In vQFX 18.1R1.9, on which this document is based, only one Redundancy Group ID can be configured. Although a list is supported, the commit check will fail with an error if more than one is provided. This is probably because only one bridge domain is supported, and a Redundancy Group is a broadcast domain. On the MX router, multiple Redundancy Group IDs are supported.
MCAE ID
The Multi-Chassis Aggregated Ethernet Identifier must match between peers for a given MC-LAG. Each MC-LAG must use its own unique value.
MCAE Chassis ID
The MCAE Chassis ID uniquely identifies a chassis. Therefore, this ID must be unique between ICCP peers.
LACP System ID
LACP uses this ID to identify the system to which a link partner belongs. This ID must match between peers. It is used to “trick” the remote LACP peer into thinking it is talking to only one system.
LACP Admin Key
This is used in conjunction with the LACP System ID to uniquely identify an LACP peer. It must match between peers.
MCAE Mode
Either active-active or active-standby. The mode must match between peers.
Note
Only active-active is supported on the QFX and EX Series. However, both active-active and active-standby are supported on the MX Series with MPCs. active-active with a DPC on an MX is not supported; you must use active-standby in those scenarios. [3]
Status Control
Controls whether the node will be the active or standby node if ICCP fails. Only two options exist: active or standby. One node must be active and the other must be standby. If ICCP fails, the standby node will change its LACP System ID.
MC-LAG Protection
Multi-chassis Link Protection ensures the appropriate behavior of the node configured as Status Control standby when the ICL goes down. For example, if the ICL goes down, should the standby node change its LACP System ID, thereby taking its MC-LAG link down, or should it leave it as-is, making it actively receive traffic? Link Protection uses an out-of-band mechanism to determine whether the remote ICCP peer is still up. If the ICL is down but the remote ICCP peer is still up, then the standby node will change its LACP System ID. However, if the ICL is down and the remote ICCP peer is also down, then the standby node will not change its LACP System ID. This ensures that there is no outage.
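The decision described above can be written down as a tiny truth table. This is only a sketch of the logic as these notes describe it: the function name and boolean inputs are hypothetical, and the "ICL up" case is my assumption that nothing changes during normal operation. The real behavior lives inside Junos.

```python
def standby_keeps_system_id(icl_up, peer_alive):
    """Return True if the standby node keeps its LACP System ID.

    Keeping the System ID means its MC-LAG links stay up and carry traffic.
    """
    if icl_up:
        # Assumption: with a healthy ICL nothing needs to change.
        return True
    # ICL is down: only keep forwarding if the peer is really gone.
    return not peer_alive


for icl_up in (True, False):
    for peer_alive in (True, False):
        keeps = standby_keeps_system_id(icl_up, peer_alive)
        print(f"icl_up={icl_up!s:5} peer_alive={peer_alive!s:5} keeps_system_id={keeps}")
```

The interesting row is ICL down with the peer still alive: that is the split-brain case where the standby node has to step aside.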
Guidelines
The following sections describe high-level, general guidelines for deploying MC-LAG. None of these recommendations are hard-and-fast rules, but they should be taken into consideration with any MC-LAG deployment.
Inter-Chassis Link Redundancy
Although not required, it is strongly recommended that the ICL consist of at least two links. The ICL must trunk the downstream VLANs as well as the ICCP VLAN. The downstream VLANs are trunked so that traffic can cross between the switches if one of the MC-LAG member links goes down.
Compensating for Lack of LACP support
Generally speaking, your downstream LACP bundle will be a single-interface LAG per upstream device. If the downstream device does not support LACP, you can use the force-up flag. However, if the downstream device does support LACP, it is strongly recommended to use LACP and leave the force-up option off.
Layer 3 Connectivity
For simple layer 3 connectivity, such as when the downstream device is a server, both of the upstream MC-LAG peers can have their gateways configured with the same IP address when mcae-mac-synchronize is enabled on the VLAN. Unlike VRRP, this keeps traffic local to the switch that receives the packet. This works by allowing one upstream switch to respond with the peer switch’s MAC address. The end result is that each upstream switch treats a packet as if it is its own. mcae-mac-synchronize is required in this scenario because, without it, ARP requests are not sniffed or replicated by the peers.
Note
DHCP Relay is not supported with mcae-mac-synchronize. If a DHCP Relay is required, you must use VRRP.
When routing protocols are required with an MC-LAG, mcae-mac-synchronize is no longer an option. This is because the MAC synchronization option works similarly to an anycast service, but routing protocols require 1:1 direct relationships. For this reason, when protocol adjacency is required (such as OSPF between the aggregation and core layers), VRRP is required. In these instances, the devices should peer to the VRRP VIP, not the real IP. To compensate for potential issues, a static ARP entry for the remote peer’s IRB MAC and real IP should be configured.
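As a quick summary of the layer 3 guidance above, here is a small Python sketch of the choice between the two gateway approaches. The function and argument names are hypothetical; the rule itself is just the one stated in this section.

```python
def gateway_method(needs_routing_adjacency, needs_dhcp_relay):
    """Pick a first-hop gateway approach for a VLAN behind an MC-LAG."""
    if needs_routing_adjacency or needs_dhcp_relay:
        # Routing protocols need 1:1 relationships, and DHCP Relay is not
        # supported with MAC synchronization, so fall back to VRRP.
        return "vrrp"
    return "mcae-mac-synchronize"


print(gateway_method(False, False))  # mcae-mac-synchronize
print(gateway_method(True, False))   # vrrp
print(gateway_method(False, True))   # vrrp
```

In the example configuration in this document, the VLAN toward vQFX-3 falls into the first case and the VLAN toward vQFX-4, which runs OSPF, falls into the second.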
Spanning Tree Protocol
Although the goal of MC-LAG is to remove the need for a Spanning Tree Protocol, it is highly recommended that STP is enabled to prevent loops caused by miswiring as well as to prevent unintentional propagation of BPDUs.
When configuring STP, disable STP on the ICL. This is because STP could cause the traffic on the ICL to be blocked, thereby breaking the MC-LAGs. Configure all MC-LAG interfaces as edge. A downstream device should not be able to cause a loop in a properly designed network. Turn on bpdu-block-on-edge. This prevents malicious or unintentional BPDU propagation.
Note
Do not configure MSTP or VSTP. This can cause loops when not configured appropriately on all devices, including downstream devices. [3]
Configuration
In this example, there are four vQFX switches: vQFX-1 through vQFX-4. vQFX-1 and vQFX-2 are our MC-LAG head-end devices. Their ICL is ae0, which consists of members xe-0/0/0 and xe-0/0/1. They run an MC-LAG down to vQFX-3 on ae1 and to vQFX-4 on ae2. ae1 consists of xe-0/0/2 on vQFX-1 and xe-0/0/4 on vQFX-2. ae2 is made up of xe-0/0/3 on vQFX-1 and xe-0/0/5 on vQFX-2.
The interface numbers on vQFX-3 and vQFX-4 are the same as on their upstream devices, with the exception of the bundle interface, which is ae0 on both vQFX-3 and vQFX-4. You can see this below in the show lldp neighbors output for each device:
root@vQFX-1> show lldp neighbors
Local Interface    Parent Interface    Chassis Id          Port info    System Name
xe-0/0/0           ae0                 02:05:86:71:68:00   xe-0/0/0     vQFX-2
xe-0/0/1           ae0                 02:05:86:71:68:00   xe-0/0/1     vQFX-2
xe-0/0/2           ae1                 02:05:86:71:bc:00   xe-0/0/2     vQFX-3
xe-0/0/3           ae2                 02:05:86:71:c5:00   xe-0/0/3     vQFX-4
{master:0}
root@vQFX-2> show lldp neighbors
Local Interface    Parent Interface    Chassis Id          Port info    System Name
xe-0/0/4           ae1                 02:05:86:71:bc:00   xe-0/0/4     vQFX-3
xe-0/0/5           ae2                 02:05:86:71:c5:00   xe-0/0/5     vQFX-4
xe-0/0/1           ae0                 02:05:86:71:fa:00   xe-0/0/1     vQFX-1
xe-0/0/0           ae0                 02:05:86:71:fa:00   xe-0/0/0     vQFX-1
{master:0}
root@vQFX-3> show lldp neighbors
Local Interface    Parent Interface    Chassis Id          Port info    System Name
xe-0/0/4           ae0                 02:05:86:71:68:00   xe-0/0/4     vQFX-2
xe-0/0/2           ae0                 02:05:86:71:fa:00   xe-0/0/2     vQFX-1
{master:0}
root@vQFX-4> show lldp neighbors
Local Interface    Parent Interface    Chassis Id          Port info    System Name
xe-0/0/5           ae0                 02:05:86:71:68:00   xe-0/0/5     vQFX-2
xe-0/0/3           ae0                 02:05:86:71:fa:00   xe-0/0/3     vQFX-1
{master:0}
The MC-LAG to vQFX-3 provides normal layer 3 gateway services. We use MAC address synchronization here. The MC-LAG toward vQFX-4, however, runs OSPF. For this, we remove MAC address synchronization and implement VRRP with static ARP bindings. See Layer 3 Connectivity for more details about this.
The final test will be pinging between vQFX-3 and vQFX-4’s ae0 interfaces.
Note
I don’t have Visio or Omnigraffle, and I found making diagrams in anything else highly frustrating. So for now, there are no diagrams. :( However, if you’d like to contribute some, they would be greatly appreciated!
The very first thing to do with any LAG in Junos is to set the number of LAG interfaces:
# vQFX-1
chassis {
    aggregated-devices {
        ethernet {
            device-count 3;
        }
    }
}
# vQFX-2
chassis {
    aggregated-devices {
        ethernet {
            device-count 3;
        }
    }
}
# vQFX-3
chassis {
    aggregated-devices {
        ethernet {
            device-count 1;
        }
    }
}
# vQFX-4
chassis {
    aggregated-devices {
        ethernet {
            device-count 1;
        }
    }
}
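The device counts above follow from the bundle names in use: a device-count of N creates ae0 through ae(N-1), so the count has to exceed the highest ae index. A quick Python sketch of that arithmetic (the helper name is hypothetical):

```python
def required_device_count(ae_names):
    """Smallest device-count that makes every listed aeX interface exist."""
    # "ae2" -> 2; device-count N provides ae0 .. ae(N-1).
    return max(int(name[2:]) for name in ae_names) + 1


print(required_device_count(["ae0", "ae1", "ae2"]))  # 3 (vQFX-1 and vQFX-2)
print(required_device_count(["ae0"]))                # 1 (vQFX-3 and vQFX-4)
```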
Next, we configure the Service ID on the upstream devices:
# vQFX-1
switch-options {
    service-id 1;
}
# vQFX-2
switch-options {
    service-id 1;
}
Next, we’ll configure the ICL. For redundancy, we’ll configure the ICL as an LACP bundle:
# vQFX-1
interfaces {
    xe-0/0/0 {
        ether-options {
            802.3ad ae0;
        }
    }
    xe-0/0/1 {
        ether-options {
            802.3ad ae0;
        }
    }
    ae0 {
        aggregated-ether-options {
            lacp {
                active;
                periodic slow;
            }
        }
        unit 0 {
            family ethernet-switching {
                interface-mode trunk;
                vlan {
                    members all;
                }
            }
        }
    }
}
# vQFX-2
interfaces {
    xe-0/0/0 {
        ether-options {
            802.3ad ae0;
        }
    }
    xe-0/0/1 {
        ether-options {
            802.3ad ae0;
        }
    }
    ae0 {
        aggregated-ether-options {
            lacp {
                active;
                periodic slow;
            }
        }
        unit 0 {
            family ethernet-switching {
                interface-mode trunk;
                vlan {
                    members all;
                }
            }
        }
    }
}
Note
If you have a device with different line cards/slots/ASICs, consider spreading your bundle members across them for greater resiliency.
Now that the ICL has its layer 1 and layer 2 configuration done, we need to ensure it can route packets for ICCP. Since ICCP only requires TCP/IP connectivity to establish, this could be done between loopback interfaces. However, for the example, we’ll just use an IRB interface.
# vQFX-1
interfaces {
    irb {
        unit 100 {
            family inet {
                address 10.0.0.0/31;
            }
        }
    }
}
vlans {
    v100 {
        vlan-id 100;
        l3-interface irb.100;
    }
}
# vQFX-2
interfaces {
    irb {
        unit 100 {
            family inet {
                address 10.0.0.1/31;
            }
        }
    }
}
vlans {
    v100 {
        vlan-id 100;
        l3-interface irb.100;
    }
}
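If the /31 addressing looks odd (10.0.0.0 as a usable address), here is a small Python sketch using the standard ipaddress module to show that a /31 holds exactly the two addresses used above and that both IRB addresses land in the same subnet. The variable names are just for illustration.

```python
import ipaddress

# The point-to-point subnet used for ICCP in this example.
icl_net = ipaddress.ip_network("10.0.0.0/31")

# Iterating a network yields every address in it; a /31 has exactly two.
addresses = [str(ip) for ip in icl_net]
print(addresses)  # ['10.0.0.0', '10.0.0.1']

# The two IRB addresses from the configuration above.
qfx1 = ipaddress.ip_interface("10.0.0.0/31")
qfx2 = ipaddress.ip_interface("10.0.0.1/31")
same_subnet = qfx1.network == qfx2.network == icl_net
print(same_subnet)  # True
```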
Next, we need to configure ICCP:
# vQFX-1
protocols {
    iccp {
        local-ip-addr 10.0.0.0;
        peer 10.0.0.1 {
            session-establishment-hold-time 50;
            redundancy-group-id-list 100;
            liveness-detection {
                minimum-receive-interval 300;
                transmit-interval {
                    minimum-interval 300;
                }
            }
        }
    }
}
# vQFX-2
protocols {
    iccp {
        local-ip-addr 10.0.0.1;
        peer 10.0.0.0 {
            session-establishment-hold-time 50;
            redundancy-group-id-list 100;
            liveness-detection {
                minimum-receive-interval 300;
                transmit-interval {
                    minimum-interval 300;
                }
            }
        }
    }
}
Note that we’re configuring BFD for faster failure detection in this example. By default, this will create a new BFD session that runs on the control plane. However, if your platform supports it and you are not running ICCP via loopback interfaces (and ICCP is using IPs that are directly connected), you can push this down to the PFE with the set protocols iccp peer <ip> liveness-detection single-hop command.
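For a feel of what the 300 ms timers buy us, here is a rough Python sketch of the usual BFD detection-time arithmetic: the negotiated interval times the detection multiplier. The multiplier of 3 is my assumption (no multiplier is set in the configuration above), and the function name is hypothetical.

```python
def bfd_detection_time_ms(local_rx_min_ms, remote_tx_min_ms, remote_multiplier=3):
    """Approximate time before the local side declares the BFD session down."""
    # The effective interval is the slower of what we accept and what the peer sends.
    negotiated_ms = max(local_rx_min_ms, remote_tx_min_ms)
    return negotiated_ms * remote_multiplier


print(bfd_detection_time_ms(300, 300))  # 900
```

Under those assumptions, a dead ICCP peer would be noticed in a little under a second.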
Note
In production, a best practice would be to configure ICCP between the loopback interfaces to ensure the ICCP session stays up, even if the ICL is down.
At this point, everything has been done such that ICCP should come up. This can be verified with the show iccp command:
root@vQFX-1> show iccp
Redundancy Group Information for peer 10.0.0.1
  TCP Connection       : Established
  Liveliness Detection : Up
  Redundancy Group ID          Status
    100                           Up
Client Application: lacpd
  Redundancy Group IDs Joined: None
Client Application: MCSNOOPD
  Redundancy Group IDs Joined: None
Client Application: l2ald_iccpd_client
  Redundancy Group IDs Joined: None
{master:0}
root@vQFX-2> show iccp
Redundancy Group Information for peer 10.0.0.0
  TCP Connection       : Established
  Liveliness Detection : Up
  Redundancy Group ID          Status
    100                           Up
Client Application: lacpd
  Redundancy Group IDs Joined: None
Client Application: MCSNOOPD
  Redundancy Group IDs Joined: None
Client Application: l2ald_iccpd_client
  Redundancy Group IDs Joined: None
{master:0}
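If you collect this output from several devices, a small script can do the eyeballing for you. The following Python sketch checks the two fields that matter in the output above; the `iccp_healthy` helper is hypothetical and simply pattern-matches the text shown here, so treat it as an assumption about the output format rather than a supported API.

```python
import re


def iccp_healthy(output):
    """True if 'show iccp' text reports an established TCP session and liveness Up."""
    tcp = re.search(r"TCP Connection\s*:\s*(\S+)", output)
    live = re.search(r"Liveliness Detection\s*:\s*(\S+)", output)
    return bool(
        tcp and live
        and tcp.group(1) == "Established"
        and live.group(1) == "Up"
    )


sample = """\
Redundancy Group Information for peer 10.0.0.1
TCP Connection : Established
Liveliness Detection : Up
Redundancy Group ID Status
100 Up
"""

print(iccp_healthy(sample))  # True
```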
Before going on, we have two minor things to finish up: configuring MC-LAG Protection and configuring Spanning Tree Protocol:
# vQFX-1
multi-chassis {
    multi-chassis-protection 10.0.0.1 {
        interface ae0;
    }
}
protocols {
    rstp {
        interface ae0 {
            disable;
        }
        interface all {
            mode point-to-point;
        }
        bpdu-block-on-edge;
    }
}
# vQFX-2
multi-chassis {
    multi-chassis-protection 10.0.0.0 {
        interface ae0;
    }
}
protocols {
    rstp {
        interface ae0 {
            disable;
        }
        interface all {
            mode point-to-point;
        }
        bpdu-block-on-edge;
    }
}
All that’s left to finish our MC-LAG configuration is to start bringing up MCAEs! First, we’ll work on the MC-LAG toward vQFX-3.
# vQFX-1
interfaces {
    xe-0/0/2 {
        ether-options {
            802.3ad ae1;
        }
    }
    ae1 {
        aggregated-ether-options {
            lacp {
                active;
                periodic slow;
                system-id 00:00:00:11:11:11;
                admin-key 1;
            }
            mc-ae {
                mc-ae-id 1;
                redundancy-group 100;
                chassis-id 0;
                mode active-active;
                status-control active;
                init-delay-time 15;
            }
        }
        unit 0 {
            family ethernet-switching {
                interface-mode access;
                vlan {
                    members v1000;
                }
            }
        }
    }
    irb {
        unit 1000 {
            family inet {
                address 192.168.100.0/31;
            }
        }
    }
}
protocols {
    rstp {
        interface ae1 {
            edge;
        }
    }
}
vlans {
    v1000 {
        vlan-id 1000;
        l3-interface irb.1000;
        mcae-mac-synchronize;
    }
}
# vQFX-2
interfaces {
    xe-0/0/4 {
        ether-options {
            802.3ad ae1;
        }
    }
    ae1 {
        aggregated-ether-options {
            lacp {
                active;
                periodic slow;
                system-id 00:00:00:11:11:11;
                admin-key 1;
            }
            mc-ae {
                mc-ae-id 1;
                redundancy-group 100;
                chassis-id 1;
                mode active-active;
                status-control standby;
                init-delay-time 15;
            }
        }
        unit 0 {
            family ethernet-switching {
                interface-mode access;
                vlan {
                    members v1000;
                }
            }
        }
    }
    irb {
        unit 1000 {
            family inet {
                address 192.168.100.0/31;
            }
        }
    }
}
protocols {
    rstp {
        interface ae1 {
            edge;
        }
    }
}
vlans {
    v1000 {
        vlan-id 1000;
        l3-interface irb.1000;
        mcae-mac-synchronize;
    }
}
The system-id and admin-key statements under lacp, the mc-ae stanza, and mcae-mac-synchronize are specific to MC-LAG; everything else is just normal LACP configuration. Note that on both vQFX-1 and vQFX-2 the LACP System ID, LACP Admin Key, MCAE ID, Redundancy Group ID, and MCAE Mode are all the same.
However, the MCAE Chassis ID and Status Control are unique between the two devices.
Next we’ll configure vQFX-3, which is just standard LACP configuration:
# vQFX-3
interfaces {
    xe-0/0/2 {
        ether-options {
            802.3ad ae0;
        }
    }
    xe-0/0/4 {
        ether-options {
            802.3ad ae0;
        }
    }
    ae0 {
        aggregated-ether-options {
            lacp {
                active;
                periodic slow;
            }
        }
        unit 0 {
            family inet {
                address 192.168.100.1/31;
            }
        }
    }
}
routing-options {
    static {
        route 0.0.0.0/0 next-hop 192.168.100.0;
    }
    router-id 3.3.3.3;
}
We add a static default route so that we can ping vQFX-4 later once its configuration is complete. But first, let’s make sure we can ping our gateway!
root@vQFX-3> ping 192.168.100.0 rapid count 5
PING 192.168.100.0 (192.168.100.0): 56 data bytes
!!!!!
--- 192.168.100.0 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max/stddev = 306.100/365.522/380.486/29.711 ms
{master:0}
Success! Let’s move on to configuring the MCAE toward vQFX-4, which is slightly more complicated due to running OSPF. We’ll split up the layer 2 configuration and the layer 3 configuration to make it a little easier to digest.
First, let’s configure the upstream devices:
# vQFX-1
interfaces {
    xe-0/0/3 {
        ether-options {
            802.3ad ae2;
        }
    }
    ae2 {
        aggregated-ether-options {
            lacp {
                active;
                periodic slow;
                system-id 00:00:00:22:22:22;
                admin-key 2;
            }
            mc-ae {
                mc-ae-id 2;
                redundancy-group 100;
                chassis-id 1;
                mode active-active;
                status-control standby;
                init-delay-time 15;
            }
        }
        unit 0 {
            family ethernet-switching {
                interface-mode access;
                vlan {
                    members v2000;
                }
            }
        }
    }
}
protocols {
    rstp {
        interface ae2 {
            edge;
        }
    }
}
vlans {
    v2000 {
        vlan-id 2000;
        l3-interface irb.2000;
    }
}
# vQFX-2
interfaces {
    xe-0/0/5 {
        ether-options {
            802.3ad ae2;
        }
    }
    ae2 {
        aggregated-ether-options {
            lacp {
                active;
                periodic slow;
                system-id 00:00:00:22:22:22;
                admin-key 2;
            }
            mc-ae {
                mc-ae-id 2;
                redundancy-group 100;
                chassis-id 0;
                mode active-active;
                status-control active;
                init-delay-time 15;
            }
        }
        unit 0 {
            family ethernet-switching {
                interface-mode access;
                vlan {
                    members v2000;
                }
            }
        }
    }
}
protocols {
    rstp {
        interface ae2 {
            edge;
        }
    }
}
vlans {
    v2000 {
        vlan-id 2000;
        l3-interface irb.2000;
    }
}
The MC-LAG-specific configuration here is the system-id and admin-key statements under lacp and the mc-ae stanza.
Note
Notice in this example that we did not configure mcae-mac-synchronize on the VLAN. This is because we will be using VRRP due to the OSPF requirement, and these two configurations are mutually exclusive on the QFX Series switches.
Now let’s examine the layer 3 configuration on vQFX-1 and vQFX-2:
# vQFX-1
interfaces {
    irb {
        unit 2000 {
            family inet {
                address 192.168.200.2/29 {
                    arp 192.168.200.3 l2-interface ae0.0 mac 02:05:86:71:93:00;
                    vrrp-group 1 {
                        virtual-address 192.168.200.1;
                        priority 200;
                        accept-data;
                    }
                }
            }
        }
    }
}
routing-options {
    router-id 1.1.1.1;
}
protocols {
    ospf {
        area 0.0.0.0 {
            interface irb.2000;
            interface irb.1000 {
                passive;
            }
        }
    }
}
# vQFX-2
interfaces {
    irb {
        unit 2000 {
            family inet {
                address 192.168.200.3/29 {
                    arp 192.168.200.2 l2-interface ae0.0 mac 02:05:86:71:62:00;
                    vrrp-group 1 {
                        virtual-address 192.168.200.1;
                        priority 100;
                        accept-data;
                    }
                }
            }
        }
    }
}
routing-options {
    router-id 2.2.2.2;
}
protocols {
    ospf {
        area 0.0.0.0 {
            interface irb.2000;
            interface irb.1000 {
                passive;
            }
        }
    }
}
The only line that is different compared to standard layer 3 configuration is the static ARP entry. Recall from Layer 3 Connectivity that we need a static ARP entry pointing to the remote device’s IRB MAC address via the ICL. That’s all this configuration line does: create a static ARP entry for the real IP of the remote switch with its IRB MAC via the ICL.
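Since the static ARP line is the one piece that is easy to get wrong (it needs the peer's real IP, the ICL unit, and the peer's IRB MAC), here is a small Python sketch that builds it from those three inputs with some basic validation. The helper name is hypothetical; the output format simply mirrors the statement used in the configuration above.

```python
import ipaddress
import re

MAC_RE = re.compile(r"^([0-9a-f]{2}:){5}[0-9a-f]{2}$", re.IGNORECASE)


def static_arp_statement(peer_real_ip, icl_unit, peer_irb_mac):
    """Build the static ARP statement for the peer's real IP via the ICL."""
    ipaddress.ip_address(peer_real_ip)  # raises ValueError on a malformed IP
    if not MAC_RE.match(peer_irb_mac):
        raise ValueError(f"not a MAC address: {peer_irb_mac!r}")
    return f"arp {peer_real_ip} l2-interface {icl_unit} mac {peer_irb_mac};"


# Values taken from vQFX-1's configuration above.
print(static_arp_statement("192.168.200.3", "ae0.0", "02:05:86:71:93:00"))
# arp 192.168.200.3 l2-interface ae0.0 mac 02:05:86:71:93:00;
```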
We can now examine vQFX-4’s configuration:
# vQFX-4
interfaces {
    xe-0/0/3 {
        ether-options {
            802.3ad ae0;
        }
    }
    xe-0/0/5 {
        ether-options {
            802.3ad ae0;
        }
    }
    ae0 {
        aggregated-ether-options {
            lacp {
                active;
                periodic slow;
            }
        }
        unit 0 {
            family inet {
                address 192.168.200.4/29;
            }
        }
    }
}
routing-options {
    router-id 4.4.4.4;
}
protocols {
    ospf {
        area 0.0.0.0 {
            interface ae0.0;
        }
    }
}
This is just standard configuration. No special tricks required! Let’s check OSPF:
root@vQFX-1> show ospf neighbor
Address          Interface    State    ID         Pri  Dead
192.168.200.4    irb.2000     Full     4.4.4.4    128    30
192.168.200.3    irb.2000     Full     2.2.2.2    128    36
{master:0}
root@vQFX-2> show ospf neighbor
Address          Interface    State    ID         Pri  Dead
192.168.200.4    irb.2000     Full     4.4.4.4    128    32
192.168.200.2    irb.2000     Full     1.1.1.1    128    34
{master:0}
root@vQFX-4> show ospf neighbor
Address          Interface    State    ID         Pri  Dead
192.168.200.2    ae0.0        Full     1.1.1.1    128    33
192.168.200.3    ae0.0        Full     2.2.2.2    128    39
{master:0}
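With three routers on the segment, each one should see the other two in Full state. Here is a rough Python sketch that pulls the router ID and state out of the text above; the helper name is hypothetical and it assumes the six-column layout shown in this output.

```python
def ospf_neighbor_states(output):
    """Map neighbor router ID -> adjacency state from 'show ospf neighbor' text."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows have six columns and start with the neighbor's IP address;
        # the header row and the prompt line are skipped.
        if len(fields) == 6 and fields[0][0].isdigit():
            states[fields[3]] = fields[2]
    return states


sample = """\
Address Interface State ID Pri Dead
192.168.200.4 irb.2000 Full 4.4.4.4 128 30
192.168.200.3 irb.2000 Full 2.2.2.2 128 36
{master:0}
"""

states = ospf_neighbor_states(sample)
print(states)  # {'4.4.4.4': 'Full', '2.2.2.2': 'Full'}
print(all(state == "Full" for state in states.values()))  # True
```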
Finally, let’s ensure we can ping between vQFX-3 and vQFX-4:
root@vQFX-3> ping 192.168.200.4 rapid count 5
PING 192.168.200.4 (192.168.200.4): 56 data bytes
!!!!!
--- 192.168.200.4 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max/stddev = 239.258/425.293/601.453/129.111 ms
{master:0}
root@vQFX-4> ping 192.168.100.1 rapid count 5
PING 192.168.100.1 (192.168.100.1): 56 data bytes
!!!!!
--- 192.168.100.1 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max/stddev = 114.584/234.628/287.023/67.643 ms
{master:0}
And that’s it! MC-LAG configuration for simple gateway and advanced layer 3 routing is working.
Conclusion
We’ve scratched the surface of MC-LAG, hopefully enough for the JNCIP-DC. For more detailed information, check out Juniper MX Series, Second Edition [4].
The following blogs were helpful in compiling these notes:
Footnotes
[1] Juniper Ambassador’s Cookbook 2019, Recipe #5
[2] Juniper MX Series, Chapter 9, ICCP Hierarchy Section
[3] Understanding Multichassis Link Aggregation Groups
[4] Juniper MX Series, Second Edition