Multi-Chassis LAG (MC-LAG)

Multi-Chassis LAG is a technology that allows two independent systems to present themselves to a downstream device as a single system while still operating independently. This is in contrast to Virtual Chassis, which makes multiple independent systems operate as a single system. At the highest level, MC-LAG works by having the two upstream devices advertise the same LACP System ID, which makes the downstream device think that the upstream devices are just one device.

Before diving into the specifics of an MC-LAG configuration, it is critical to know which settings must match and which must be unique between peers [1].

Must match:

  • LACP System ID
  • LACP Admin Key
  • MCAE ID
  • MCAE Mode
  • VLANs
  • Redundancy Group ID [2]
  • Service ID [2]

Must be unique:

  • MCAE Chassis ID
  • MCAE Status Control
  • Local ICCP IP
  • Peer ICCP IP
  • MC-LAG Protection
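The two lists above work well as a pre-flight check before committing anything. The following Python sketch compares two peers' parameters and reports violations. The dictionary keys are hypothetical names chosen for illustration, not Junos statements. The sample values are taken from the ae1 MC-LAG built in the Configuration section.

```python
# Minimal pre-flight check for an MC-LAG peer pair. The key names are
# hypothetical; they mirror the "must match" / "must be unique" lists.
MUST_MATCH = (
    "lacp_system_id", "lacp_admin_key", "mcae_id", "mcae_mode",
    "vlans", "redundancy_group_id", "service_id",
)
MUST_BE_UNIQUE = (
    "mcae_chassis_id", "status_control",
    "local_iccp_ip", "peer_iccp_ip", "protection_peer_ip",
)


def check_peers(a: dict, b: dict) -> list:
    """Return a list of problems; an empty list means the pair looks sane."""
    problems = []
    for key in MUST_MATCH:
        if a[key] != b[key]:
            problems.append(f"{key} must match: {a[key]!r} != {b[key]!r}")
    for key in MUST_BE_UNIQUE:
        if a[key] == b[key]:
            problems.append(f"{key} must be unique: both are {a[key]!r}")
    # Each node's ICCP peer address should be the other node's local address.
    if a["peer_iccp_ip"] != b["local_iccp_ip"] or b["peer_iccp_ip"] != a["local_iccp_ip"]:
        problems.append("ICCP local/peer addresses are not mirror images")
    return problems


# Values from the ae1 MC-LAG in the Configuration section.
vqfx1 = {
    "lacp_system_id": "00:00:00:11:11:11", "lacp_admin_key": 1,
    "mcae_id": 1, "mcae_mode": "active-active", "vlans": {"v1000"},
    "redundancy_group_id": 100, "service_id": 1,
    "mcae_chassis_id": 0, "status_control": "active",
    "local_iccp_ip": "10.0.0.0", "peer_iccp_ip": "10.0.0.1",
    "protection_peer_ip": "10.0.0.1",
}
vqfx2 = dict(
    vqfx1, mcae_chassis_id=1, status_control="standby",
    local_iccp_ip="10.0.0.1", peer_iccp_ip="10.0.0.0",
    protection_peer_ip="10.0.0.0",
)

print(check_peers(vqfx1, vqfx2))  # → []
```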

Finally, a diagram is worth a thousand words. If you have the Juniper MX Series book from O’Reilly [4], I can highly recommend Figure 9-4 as a reference. I won’t reproduce it here because that would be incredibly rude and inconsiderate, so please pick up that book if you can. Chapter 9, MC-LAG, is definitely worth it! Just in case you have a Safari subscription, here’s a link straight to the diagram. But again, definitely read the entire chapter. It’s absolutely worth it.

Note

I’m hoping the authors don’t mind me including a link directly to the diagram in the Safari Online book. But if you’re one of the authors or the publisher and want that removed, please reach out to me and let me know and I will gladly remove it.

Terminology

This section explains some of the terminology and configuration elements listed above. [2]

Service ID

The Service ID must match between the two ICCP peers. It exists for use cases where MC-LAG is used in more than one routing instance. This ID is global.

Redundancy Group ID

The Redundancy Group ID must match between the two ICCP peers. It groups MC-LAG bundles and their associated VLANs, which lets the ICCP peers send an update once instead of once per MC-LAG. For example, when a MAC address is learned, it only needs to be sent to the ICCP peer one time for that redundancy group rather than once for each MC-LAG in the group. This ID does not need to be global, and a single Redundancy Group can encompass one or more MC-LAG bundles.

Note

In vQFX 18.1R1.9, on which this document is based, there can only be one Redundancy Group ID configured. Although a list is supported, the commit check will fail with an error if more than one is provided. This is probably because only one bridge domain is supported, and a Redundancy Group is a broadcast domain. In the MX router, multiple Redundancy Group IDs are supported.

MCAE ID

The Multi-Chassis Aggregated Ethernet (MCAE) ID must match between peers. It must also be unique per MC-LAG: each MC-LAG bundle gets its own ID.

MCAE Chassis ID

The MCAE Chassis ID uniquely identifies a chassis. Therefore, this ID must be unique between ICCP peers.

LACP System ID

LACP uses this ID to identify the system that a remote link belongs to. It must match between peers, because it is what “tricks” the remote LACP device into thinking it is talking to only one system.

LACP Admin Key

This is used in conjunction with the LACP System ID to uniquely identify an LACP peer. It must match between peers.
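To see why matching the LACP System ID and Admin Key is enough to “trick” the downstream device, here is a deliberately simplified Python model of the downstream side: it only bundles links whose partner advertises the same (system ID, key) pair. Real LACP also considers system priority and port information, which this sketch ignores, and the per-switch system IDs in the second example are made up.

```python
from typing import NamedTuple


class Partner(NamedTuple):
    """What the downstream switch learns about the far end of one link."""
    system_id: str
    key: int


def can_bundle(links: dict) -> bool:
    """Simplified rule: all member links must see the same (system ID, key)."""
    return len(set(links.values())) <= 1


# MC-LAG: both upstream switches advertise the shared values.
mc_lag = {
    "xe-0/0/2": Partner("00:00:00:11:11:11", 1),  # toward vQFX-1
    "xe-0/0/4": Partner("00:00:00:11:11:11", 1),  # toward vQFX-2
}

# No MC-LAG: each switch advertises its own (hypothetical) system ID.
independent = {
    "xe-0/0/2": Partner("02:00:00:00:00:01", 1),
    "xe-0/0/4": Partner("02:00:00:00:00:02", 1),
}

print(can_bundle(mc_lag))       # → True
print(can_bundle(independent))  # → False
```

The same model shows why Status Control works: once the standby node changes its system ID, the check fails and the downstream device can no longer bundle the links toward both switches together.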

MCAE Mode

Either active-active or active-standby. This must match between peers.

Note

Only active-active is supported on the QFX and EX Series. Both active-active and active-standby are supported on the MX Series with MPCs; active-active is not supported with DPCs on the MX, so you must use active-standby in those scenarios. [3]

Status Control

Controls whether the node will be the active or standby node if ICCP fails. Only two options exist: active or standby. One node must be active and the other must be standby. If ICCP fails, the standby node changes its LACP System ID, so the downstream device no longer bundles the standby node's links with the active node's links.

MC-LAG Protection

Multi-chassis link protection determines how the node configured with Status Control standby behaves when the ICL goes down. For example, if the ICL goes down, should the standby node change its LACP System ID, thereby taking its MC-LAG links down, or should it leave the ID as-is so that it keeps receiving traffic? Link Protection uses an out-of-band mechanism to determine whether the remote ICCP peer is still up. If the ICL is down but the remote ICCP peer is still up, the standby node changes its LACP System ID. However, if the ICL is down and the remote ICCP peer is also down, the standby node does not change its LACP System ID, which ensures there is no outage.
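The decision described above boils down to a small truth table. The Python sketch below encodes it; the function and parameter names are hypothetical, and `peer_alive` stands in for the result of the out-of-band liveness check.

```python
def standby_keeps_system_id(icl_up: bool, peer_alive: bool) -> bool:
    """True if the status-control standby node keeps the shared LACP System ID
    (its MC-LAG links stay in the downstream bundle)."""
    if icl_up:
        return True   # normal operation: nothing changes
    if peer_alive:
        return False  # ICL down, peer up: change system ID, links drop out
    return True       # ICL down, peer down: keep forwarding to avoid an outage


for icl_up in (True, False):
    for peer_alive in (True, False):
        print(icl_up, peer_alive, standby_keeps_system_id(icl_up, peer_alive))
```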

Guidelines

The following sections describe high-level, general guidelines when deploying MC-LAG. None of these recommendations are hard-and-fast rules, but they should be taken into consideration with any MC-LAG deployment.

Compensating for Lack of LACP Support

Generally speaking, your downstream LACP bundle will be a single-interface LAG per upstream device. If the downstream device does not support LACP, you can use the force-up flag. However, if the downstream device does support LACP, it is strongly recommended to use LACP and leave the force-up option off.

Layer 3 Connectivity

For simple layer 3 connectivity, such as when the downstream device is a server, both upstream MC-LAG peers can configure their gateways with the same IP address when mcae-mac-synchronize is enabled on the VLAN. Unlike VRRP, this keeps traffic local to the switch that receives the packet. This works by allowing one upstream switch to respond with the peer switch’s MAC address. The end result is that each upstream switch treats a packet as if it were its own. mcae-mac-synchronize is required in this scenario because, without it, ARP requests are not sniffed or replicated by the peers.

Note

DHCP Relay is not supported with mcae-mac-synchronize. If DHCP Relay is required, you must use VRRP.

When routing protocols are required with an MC-LAG, mcae-mac-synchronize is no longer an option. This is because MAC synchronization works similarly to an anycast service, whereas routing protocols require 1:1 direct relationships. For this reason, when protocol adjacency is required (such as OSPF between the aggregation and core layers), VRRP is required. In these instances, the devices should peer to the VRRP VIP, not the real IP. To compensate for potential issues, configure a static ARP entry that maps the remote peer’s real IP to its IRB MAC address.

Spanning Tree Protocol

Although the goal of MC-LAG is to remove the need for Spanning Tree Protocol, it is highly recommended that STP be enabled to prevent loops caused by miswiring as well as unintentional propagation of BPDUs.

When configuring STP, disable it on the ICL; otherwise STP could block traffic on the ICL and break the MC-LAGs. Configure all MC-LAG interfaces as edge ports, since a downstream device should not be able to cause a loop in a properly designed network. Finally, turn on bpdu-block-on-edge to prevent malicious or unintentional BPDU propagation.
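Put together, these STP guidelines translate into only a handful of statements. This Python helper renders them as Junos set commands for a given ICL and list of MC-LAG interfaces. The function itself is hypothetical; the statements correspond to the rstp stanzas used in the Configuration section.

```python
def rstp_set_commands(icl: str, mc_lag_interfaces: list) -> list:
    """Render the STP guidelines as Junos 'set' commands."""
    commands = [f"set protocols rstp interface {icl} disable"]  # never block the ICL
    commands += [f"set protocols rstp interface {ifd} edge" for ifd in mc_lag_interfaces]
    commands.append("set protocols rstp bpdu-block-on-edge")  # drop BPDUs on edge ports
    return commands


for line in rstp_set_commands("ae0", ["ae1", "ae2"]):
    print(line)
```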

Note

Do not configure MSTP or VSTP. This can cause loops when not configured appropriately on all devices, including downstream devices. [3]

Configuration

In this example, there are four vQFX switches: vQFX-1 through vQFX-4. vQFX-1 and vQFX-2 are our MC-LAG head-end devices. Their ICL is ae0, which consists of members xe-0/0/0 and xe-0/0/1.

They run an MC-LAG down to vQFX-3 on ae1 and to vQFX-4 on ae2. ae1 consists of xe-0/0/2 on vQFX-1 and xe-0/0/4 on vQFX-2; ae2 consists of xe-0/0/3 on vQFX-1 and xe-0/0/5 on vQFX-2. On vQFX-3 and vQFX-4, the member interfaces use the same names as on the upstream devices, but the bundle interface is ae0 on both. You can see this in the show lldp neighbors output for each device:

root@vQFX-1> show lldp neighbors
Local Interface    Parent Interface    Chassis Id          Port info          System Name
xe-0/0/0           ae0                 02:05:86:71:68:00   xe-0/0/0           vQFX-2
xe-0/0/1           ae0                 02:05:86:71:68:00   xe-0/0/1           vQFX-2
xe-0/0/2           ae1                 02:05:86:71:bc:00   xe-0/0/2           vQFX-3
xe-0/0/3           ae2                 02:05:86:71:c5:00   xe-0/0/3           vQFX-4
{master:0}

root@vQFX-2> show lldp neighbors
Local Interface    Parent Interface    Chassis Id          Port info          System Name
xe-0/0/4           ae1                 02:05:86:71:bc:00   xe-0/0/4           vQFX-3
xe-0/0/5           ae2                 02:05:86:71:c5:00   xe-0/0/5           vQFX-4
xe-0/0/1           ae0                 02:05:86:71:fa:00   xe-0/0/1           vQFX-1
xe-0/0/0           ae0                 02:05:86:71:fa:00   xe-0/0/0           vQFX-1

{master:0}

root@vQFX-3> show lldp neighbors
Local Interface    Parent Interface    Chassis Id          Port info          System Name
xe-0/0/4           ae0                 02:05:86:71:68:00   xe-0/0/4           vQFX-2
xe-0/0/2           ae0                 02:05:86:71:fa:00   xe-0/0/2           vQFX-1

{master:0}

root@vQFX-4> show lldp neighbors
Local Interface    Parent Interface    Chassis Id          Port info          System Name
xe-0/0/5           ae0                 02:05:86:71:68:00   xe-0/0/5           vQFX-2
xe-0/0/3           ae0                 02:05:86:71:fa:00   xe-0/0/3           vQFX-1

{master:0}
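As the topology grows, it can help to check the cabling programmatically. The Python sketch below parses show lldp neighbors output in the column layout shown above and groups neighbor system names by parent bundle. It assumes whitespace-separated columns and system names without spaces; the function name is hypothetical.

```python
def neighbors_by_bundle(lldp_output: str) -> dict:
    """Map each parent interface (e.g. ae0) to the sorted neighbor names."""
    bundles = {}
    for line in lldp_output.splitlines():
        fields = line.split()
        # Data rows have five columns and start with a physical interface.
        if len(fields) == 5 and fields[0].startswith(("xe-", "ge-", "et-")):
            _local, parent, _chassis_id, _port, system = fields
            bundles.setdefault(parent, set()).add(system)
    return {parent: sorted(names) for parent, names in bundles.items()}


VQFX3_OUTPUT = """\
Local Interface    Parent Interface    Chassis Id          Port info          System Name
xe-0/0/4           ae0                 02:05:86:71:68:00   xe-0/0/4           vQFX-2
xe-0/0/2           ae0                 02:05:86:71:fa:00   xe-0/0/2           vQFX-1
"""

# A correctly cabled MC-LAG member sees both upstream switches on one bundle.
print(neighbors_by_bundle(VQFX3_OUTPUT))  # → {'ae0': ['vQFX-1', 'vQFX-2']}
```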

The MC-LAG to vQFX-3 provides normal layer 3 gateway services using MAC address synchronization. The MC-LAG toward vQFX-4, however, runs OSPF, so it does not use MAC address synchronization; instead, it uses VRRP with static ARP bindings. See Layer 3 Connectivity for more details.

The final test will be pinging between vQFX-3 and vQFX-4’s ae0 interfaces.

Note

I don’t have Visio or Omnigraffle, and I found making diagrams in anything else highly frustrating. So for now, there are no diagrams. :( However, if you’d like to contribute some, they would be greatly appreciated!

The very first thing to do with any LAG in Junos is to set the number of LAG interfaces:

# vQFX-1
chassis {                               
    aggregated-devices {
        ethernet {
            device-count 3;
        }
    }
}
# vQFX-2
chassis {                               
    aggregated-devices {
        ethernet {
            device-count 3;
        }
    }
}
# vQFX-3
chassis {                               
    aggregated-devices {
        ethernet {
            device-count 1;
        }
    }
}
# vQFX-4
chassis {                               
    aggregated-devices {
        ethernet {
            device-count 1;
        }
    }
}
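The device-count value must be large enough that every aeN interface you plan to use exists: with device-count N, Junos creates ae0 through ae(N-1). That is why vQFX-1 and vQFX-2, which use ae0 to ae2, need 3, while the downstream switches need only 1. A small, hypothetical Python helper makes the arithmetic explicit:

```python
import re


def required_device_count(interfaces: list) -> int:
    """Smallest device-count that creates every aeN interface in the list."""
    highest = -1
    for name in interfaces:
        match = re.fullmatch(r"ae(\d+)", name)
        if match:
            highest = max(highest, int(match.group(1)))
    return highest + 1  # aeN requires N + 1 devices (ae0 .. aeN)


print(required_device_count(["ae0", "ae1", "ae2"]))  # vQFX-1/vQFX-2 → 3
print(required_device_count(["ae0"]))                # vQFX-3/vQFX-4 → 1
```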

Next, we configure the Service ID on the upstream devices:

# vQFX-1
switch-options {
    service-id 1;
}
# vQFX-2
switch-options {
    service-id 1;                       
}

Next, we’ll configure the ICL. For redundancy, we’ll configure the ICL as an LACP bundle:

# vQFX-1
interfaces {
    xe-0/0/0 {
        ether-options {
            802.3ad ae0;
        }
    }
    xe-0/0/1 {
        ether-options {
            802.3ad ae0;                
        }
    }
    ae0 {
        aggregated-ether-options {
            lacp {
                active;
                periodic slow;
            }
        }
        unit 0 {
            family ethernet-switching {
                interface-mode trunk;
                vlan {
                    members all;
                }
            }
        }
    }
}
# vQFX-2
interfaces {
    xe-0/0/0 {
        ether-options {
            802.3ad ae0;
        }
    }
    xe-0/0/1 {
        ether-options {
            802.3ad ae0;
        }
    }
    ae0 {
        aggregated-ether-options {
            lacp {
                active;
                periodic slow;
            }
        }
        unit 0 {
            family ethernet-switching {
                interface-mode trunk;
                vlan {
                    members all;
                }
            }
        }
    }
}

Note

If you have a device with different line cards/slots/ASICs, consider spreading your bundle members across them for greater resiliency.

Now that the ICL has its layer 1 and layer 2 configuration done, we need to ensure it can route packets for ICCP. Since ICCP only requires TCP/IP connectivity to establish, this could be done between loopback interfaces. However, for the example, we’ll just use an IRB interface.

# vQFX-1
interfaces {
    irb {
        unit 100 {
            family inet {
                address 10.0.0.0/31;    
            }
        }
    }
}
vlans {
    v100 {
        vlan-id 100;
        l3-interface irb.100;
    }
}
# vQFX-2
interfaces {
    irb {
        unit 100 {
            family inet {
                address 10.0.0.1/31;    
            }
        }
    }
}
vlans {
    v100 {
        vlan-id 100;
        l3-interface irb.100;
    }
}

Next, we need to configure ICCP:

# vQFX-1
protocols {
    iccp {
        local-ip-addr 10.0.0.0;         
        peer 10.0.0.1 {
            session-establishment-hold-time 50;
            redundancy-group-id-list 100;
            liveness-detection {
                minimum-receive-interval 300;
                transmit-interval {
                    minimum-interval 300;
                }
            }
        }
    }
}
# vQFX-2
protocols {
    iccp {
        local-ip-addr 10.0.0.1;
        peer 10.0.0.0 {
            session-establishment-hold-time 50;
            redundancy-group-id-list 100;
            liveness-detection {
                minimum-receive-interval 300;
                transmit-interval {
                    minimum-interval 300;
                }
            }
        }
    }
}
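The two ICCP stanzas are mirror images: each node's local-ip-addr is the other node's peer address. Because this example runs ICCP over directly connected IRB addresses, both ends should also fall in the same /31. The Python sketch below checks both conditions with the standard ipaddress module. The function name is hypothetical, and the subnet check would not apply to loopback-based ICCP.

```python
import ipaddress


def iccp_pair_ok(a_local: str, a_peer: str, b_local: str, b_peer: str,
                 prefix_len: int = 31) -> bool:
    """Check that two ICCP configs mirror each other on one shared subnet."""
    mirrored = a_local == b_peer and b_local == a_peer and a_local != b_local
    net_a = ipaddress.ip_interface(f"{a_local}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{b_local}/{prefix_len}").network
    return mirrored and net_a == net_b


print(iccp_pair_ok("10.0.0.0", "10.0.0.1", "10.0.0.1", "10.0.0.0"))  # → True
print(iccp_pair_ok("10.0.0.0", "10.0.0.1", "10.0.0.2", "10.0.0.0"))  # → False
```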

Note that we’re configuring BFD for faster failure detection in this example. By default, this will create a new BFD session that runs on the control plane. However, if your platform supports it and you are not running ICCP via loopback interfaces (and ICCP is using IPs that are directly connected), you can push this down to the PFE with the set protocols iccp peer <ip> liveness-detection single-hop command.
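With 300 ms intervals, the time to detect a failed peer is the interval multiplied by the BFD multiplier. The example does not set a multiplier, so this Python sketch assumes the Junos default of 3 and assumes both ends use the same interval:

```python
def bfd_detection_time_ms(interval_ms: int, multiplier: int = 3) -> int:
    """Time without received BFD packets before the session is declared down.
    Assumes both ends use the same interval; multiplier defaults to 3."""
    return interval_ms * multiplier


print(bfd_detection_time_ms(300))  # → 900 (ms, with the default multiplier)
```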

Note

In production, a best practice would be to configure ICCP between the loopback interfaces to ensure the ICCP session stays up, even if the ICL is down.

At this point, everything is in place for ICCP to come up. This can be verified with the show iccp command:

root@vQFX-1> show iccp

Redundancy Group Information for peer 10.0.0.1
  TCP Connection       : Established
  Liveliness Detection : Up
  Redundancy Group ID          Status
    100                         Up

Client Application: lacpd
  Redundancy Group IDs Joined: None

Client Application: MCSNOOPD
  Redundancy Group IDs Joined: None

Client Application: l2ald_iccpd_client
  Redundancy Group IDs Joined: None

{master:0}

root@vQFX-2> show iccp

Redundancy Group Information for peer 10.0.0.0
  TCP Connection       : Established
  Liveliness Detection : Up
  Redundancy Group ID          Status
    100                         Up

Client Application: lacpd
  Redundancy Group IDs Joined: None

Client Application: MCSNOOPD
  Redundancy Group IDs Joined: None

Client Application: l2ald_iccpd_client
  Redundancy Group IDs Joined: None

{master:0}
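To script this verification, the Python sketch below parses show iccp output in the format shown above and reports whether the TCP session is Established, liveness detection is Up (this output spells it “Liveliness”), and every listed redundancy group is Up. It assumes a single ICCP peer; the function name is hypothetical.

```python
def iccp_healthy(output: str) -> bool:
    """Parse single-peer 'show iccp' output and report overall health."""
    fields = {}
    group_status = {}
    for raw in output.splitlines():
        line = raw.strip()
        if ":" in line:
            key, value = line.split(":", 1)
            fields.setdefault(key.strip(), value.strip())
        else:
            parts = line.split()
            if len(parts) == 2 and parts[0].isdigit():  # "<group id> <status>"
                group_status[parts[0]] = parts[1]
    return (
        fields.get("TCP Connection") == "Established"
        and fields.get("Liveliness Detection") == "Up"
        and bool(group_status)
        and all(status == "Up" for status in group_status.values())
    )


SAMPLE = """\
Redundancy Group Information for peer 10.0.0.1
  TCP Connection       : Established
  Liveliness Detection : Up
  Redundancy Group ID          Status
    100                         Up
"""

print(iccp_healthy(SAMPLE))  # → True
```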

Before going on, we have two minor things to finish up: configuring MC-LAG Protection and configuring Spanning Tree Protocol:

# vQFX-1
multi-chassis {
    multi-chassis-protection 10.0.0.1 { 
        interface ae0;
    }
}
protocols {
    rstp {
        interface ae0 {
            disable;
        }
        interface all {
            mode point-to-point;
        }
        bpdu-block-on-edge;
    }
}
# vQFX-2
multi-chassis {
    multi-chassis-protection 10.0.0.0 { 
        interface ae0;
    }
}
protocols {
    rstp {
        interface ae0 {
            disable;
        }
        interface all {
            mode point-to-point;
        }
        bpdu-block-on-edge;
    }
}

All that’s left to finish our MC-LAG configuration is to start bringing up MCAEs! First, we’ll work on the MC-LAG toward vQFX-3.

# vQFX-1
interfaces {
    xe-0/0/2 {
        ether-options {
            802.3ad ae1;
        }
    }
    ae1 {                               
        aggregated-ether-options {
            lacp {
                active;
                periodic slow;
                system-id 00:00:00:11:11:11;
                admin-key 1;
            }
            mc-ae {
                mc-ae-id 1;
                redundancy-group 100;
                chassis-id 0;
                mode active-active;
                status-control active;
                init-delay-time 15;
            }
        }
        unit 0 {
            family ethernet-switching {
                interface-mode access;
                vlan {
                    members v1000;
                }
            }                           
        }
    }
    irb {
        unit 1000 {
            family inet {
                address 192.168.100.0/31;
            }
        }
    }
}
protocols {
    rstp {
        interface ae1 {
            edge;
        }
    }
}
vlans {
    v1000 {                             
        vlan-id 1000;
        l3-interface irb.1000;
        mcae-mac-synchronize;
    }
}
# vQFX-2
interfaces {
    xe-0/0/4 {
        ether-options {
            802.3ad ae1;
        }
    }
    ae1 {                               
        aggregated-ether-options {
            lacp {
                active;
                periodic slow;
                system-id 00:00:00:11:11:11;
                admin-key 1;
            }
            mc-ae {
                mc-ae-id 1;
                redundancy-group 100;
                chassis-id 1;
                mode active-active;
                status-control standby;
                init-delay-time 15;
            }
        }
        unit 0 {
            family ethernet-switching {
                interface-mode access;
                vlan {
                    members v1000;
                }
            }                           
        }
    }
    irb {
        unit 1000 {
            family inet {
                address 192.168.100.0/31;
            }
        }
    }
}
protocols {
    rstp {
        interface ae1 {
            edge;
        }
    }
}
vlans {
    v1000 {
        vlan-id 1000;
        l3-interface irb.1000;
        mcae-mac-synchronize;
    }
}

The MC-LAG-specific statements are the LACP system-id and admin-key, the mc-ae stanza, and mcae-mac-synchronize on the VLAN; everything else is just normal LACP configuration. Note that on both vQFX-1 and vQFX-2 the LACP System ID, LACP Admin Key, MCAE ID, Redundancy Group ID, and MCAE Mode are all the same.

However, the MCAE Chassis ID and Status Control are unique between the two devices.
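One way to keep the matching values from drifting apart is to define them once and derive both peers' configuration from them. The Python sketch below renders the MC-LAG-specific statements of the ae1 bundle as Junos set commands. The function and dictionary keys are hypothetical, but the statements and values are the ones used above; only the chassis-id and status-control lines should differ between the two peers.

```python
def mcae_set_commands(ae: str, shared: dict, chassis_id: int, status: str) -> list:
    """Render MC-LAG 'set' commands; shared values are identical on both peers."""
    if status not in ("active", "standby"):
        raise ValueError("status-control must be 'active' or 'standby'")
    base = f"set interfaces {ae} aggregated-ether-options"
    return [
        f"{base} lacp system-id {shared['system_id']}",
        f"{base} lacp admin-key {shared['admin_key']}",
        f"{base} mc-ae mc-ae-id {shared['mc_ae_id']}",
        f"{base} mc-ae redundancy-group {shared['redundancy_group']}",
        f"{base} mc-ae mode {shared['mode']}",
        f"{base} mc-ae chassis-id {chassis_id}",    # unique per peer
        f"{base} mc-ae status-control {status}",    # unique per peer
    ]


AE1 = {"system_id": "00:00:00:11:11:11", "admin_key": 1, "mc_ae_id": 1,
       "redundancy_group": 100, "mode": "active-active"}

vqfx1 = mcae_set_commands("ae1", AE1, chassis_id=0, status="active")
vqfx2 = mcae_set_commands("ae1", AE1, chassis_id=1, status="standby")

# Print only the lines that differ between the two peers.
for left, right in zip(vqfx1, vqfx2):
    if left != right:
        print(f"{left}\n{right}\n")
```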

Next we’ll configure vQFX-3, which is just standard LACP configuration:

# vQFX-3
interfaces {
    xe-0/0/2 {
        ether-options {
            802.3ad ae0;
        }
    }
    xe-0/0/4 {
        ether-options {                 
            802.3ad ae0;
        }
    }
    ae0 {
        aggregated-ether-options {
            lacp {
                active;
                periodic slow;
            }
        }
        unit 0 {
            family inet {               
                address 192.168.100.1/31;
            }
        }
    }
}
routing-options {
    static {
        route 0.0.0.0/0 next-hop 192.168.100.0;
    }
    router-id 3.3.3.3;
}

We add a static default route so that we can ping vQFX-4 later once its configuration is complete. But first, let’s make sure we can ping our gateway!

root@vQFX-3> ping 192.168.100.0 rapid count 5 
PING 192.168.100.0 (192.168.100.0): 56 data bytes
!!!!!
--- 192.168.100.0 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max/stddev = 306.100/365.522/380.486/29.711 ms

{master:0}

Success! Let’s move on to configuring the MCAE toward vQFX-4, which is slightly more complicated because it runs OSPF. We’ll split the layer 2 and layer 3 configuration to make it a little easier to digest.

First, let’s configure the upstream devices:

# vQFX-1
interfaces {
    xe-0/0/3 {
        ether-options {
            802.3ad ae2;
        }
    }
    ae2 {
        aggregated-ether-options {
            lacp {
                active;
                periodic slow;
                system-id 00:00:00:22:22:22;
                admin-key 2;
            }
            mc-ae {
                mc-ae-id 2;
                redundancy-group 100;
                chassis-id 1;
                mode active-active;
                status-control standby;
                init-delay-time 15;
            }
        }
        unit 0 {
            family ethernet-switching {
                interface-mode access;
                vlan {                  
                    members v2000;
                }
            }
        }
    }
}
protocols {
    rstp {
        interface ae2 {
            edge;
        }
    }
}
vlans {
    v2000 {
        vlan-id 2000;
        l3-interface irb.2000;
    }
}
# vQFX-2
interfaces {
    xe-0/0/5 {
        ether-options {
            802.3ad ae2;
        }
    }
    ae2 {
        aggregated-ether-options {
            lacp {
                active;
                periodic slow;
                system-id 00:00:00:22:22:22;
                admin-key 2;
            }
            mc-ae {
                mc-ae-id 2;
                redundancy-group 100;
                chassis-id 0;
                mode active-active;
                status-control active;
                init-delay-time 15;
            }
        }
        unit 0 {
            family ethernet-switching {
                interface-mode access;
                vlan {                  
                    members v2000;
                }
            }
        }
    }
}
protocols {
    rstp {
        interface ae2 {
            edge;
        }
    }
}
vlans {
    v2000 {
        vlan-id 2000;
        l3-interface irb.2000;
    }
}

As with ae1, the MC-LAG-specific statements are the LACP system-id and admin-key and the mc-ae stanza.

Note

Notice in this example that we did not configure mcae-mac-synchronize on the VLAN. This is because we will be using VRRP due to the OSPF requirement, and these two configurations are mutually exclusive on the QFX Series switches.

Now let’s examine the layer 3 configuration on vQFX-1 and vQFX-2:

# vQFX-1
interfaces {
    irb {
        unit 2000 {
            family inet {
                address 192.168.200.2/29 {
                    arp 192.168.200.3 l2-interface ae0.0 mac 02:05:86:71:93:00;
                    vrrp-group 1 {
                        virtual-address 192.168.200.1;
                        priority 200;
                        accept-data;
                    }
                }
            }
        }
    }
}
routing-options {
    router-id 1.1.1.1;
}
protocols {
    ospf {
        area 0.0.0.0 {
            interface irb.2000;
            interface irb.1000 {
                passive;
            }
        }
    }
}
# vQFX-2
interfaces {
    irb {
        unit 2000 {
            family inet {
                address 192.168.200.3/29 {
                    arp 192.168.200.2 l2-interface ae0.0 mac 02:05:86:71:62:00;
                    vrrp-group 1 {
                        virtual-address 192.168.200.1;
                        priority 100;
                        accept-data;
                    }
                }
            }
        }
    }
}
routing-options {
    router-id 2.2.2.2;
}
protocols {
    ospf {
        area 0.0.0.0 {
            interface irb.2000;
            interface irb.1000 {
                passive;
            }
        }
    }
}

The only line that is different compared to standard layer 3 configuration is the static ARP entry. Recall from Layer 3 Connectivity that we need a static ARP entry pointing to the remote device’s IRB MAC address via the ICL. That’s all this configuration line does: create a static ARP entry for the real IP of the remote switch with its IRB MAC via the ICL.
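Because each switch's static ARP entry must point at the other switch's real IP and IRB MAC, it is easy to swap values by mistake. The Python sketch below derives both sides from one table so they stay mirrored. The helper names are hypothetical; the addresses, MACs, priorities, and statements come from the configuration above.

```python
PEERS = {
    "vQFX-1": {"ip": "192.168.200.2", "irb_mac": "02:05:86:71:62:00", "priority": 200},
    "vQFX-2": {"ip": "192.168.200.3", "irb_mac": "02:05:86:71:93:00", "priority": 100},
}


def irb_commands(local: dict, remote: dict, unit: int = 2000, prefix_len: int = 29,
                 icl_unit: str = "ae0.0", vip: str = "192.168.200.1",
                 group: int = 1) -> list:
    """Static ARP toward the remote peer via the ICL, plus VRRP on the local IRB."""
    addr = f"set interfaces irb unit {unit} family inet address {local['ip']}/{prefix_len}"
    return [
        f"{addr} arp {remote['ip']} l2-interface {icl_unit} mac {remote['irb_mac']}",
        f"{addr} vrrp-group {group} virtual-address {vip}",
        f"{addr} vrrp-group {group} priority {local['priority']}",
        f"{addr} vrrp-group {group} accept-data",
    ]


def mirrored_commands(peers: dict) -> dict:
    """Build both peers' commands so each ARP entry points at the other peer."""
    (name_a, a), (name_b, b) = peers.items()
    return {name_a: irb_commands(a, b), name_b: irb_commands(b, a)}


for name, commands in mirrored_commands(PEERS).items():
    print(f"# {name}")
    print("\n".join(commands))
```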

We can now examine vQFX-4’s configuration:

# vQFX-4
interfaces {
    xe-0/0/3 {
        ether-options {
            802.3ad ae0;
        }
    }
    xe-0/0/5 {
        ether-options {                 
            802.3ad ae0;
        }
    }
    ae0 {
        aggregated-ether-options {
            lacp {
                active;
                periodic slow;
            }
        }
        unit 0 {
            family inet {
                address 192.168.200.4/29;
            }
        }
    }
}
routing-options {
    router-id 4.4.4.4;
}
protocols {
    ospf {
        area 0.0.0.0 {
            interface ae0.0;
        }
    }
}

This is just standard configuration. No special tricks required! Let’s check OSPF:

root@vQFX-1> show ospf neighbor 
Address          Interface              State     ID               Pri  Dead
192.168.200.4    irb.2000               Full      4.4.4.4          128    30
192.168.200.3    irb.2000               Full      2.2.2.2          128    36

{master:0}

root@vQFX-2> show ospf neighbor    
Address          Interface              State     ID               Pri  Dead
192.168.200.4    irb.2000               Full      4.4.4.4          128    32
192.168.200.2    irb.2000               Full      1.1.1.1          128    34

{master:0}

root@vQFX-4> show ospf neighbor    
Address          Interface              State     ID               Pri  Dead
192.168.200.2    ae0.0                  Full      1.1.1.1          128    33
192.168.200.3    ae0.0                  Full      2.2.2.2          128    39

{master:0}
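Each router should see the other two in the Full state; vQFX-1 and vQFX-2 are also adjacent to each other across the ICL. A Python sketch for checking this from saved show ospf neighbor output, assuming the column layout shown above (the function name is hypothetical):

```python
def ospf_neighbor_states(output: str) -> dict:
    """Map neighbor router ID to adjacency state."""
    states = {}
    for line in output.splitlines():
        parts = line.split()
        # Data rows: Address, Interface, State, ID, Pri, Dead.
        if len(parts) == 6 and parts[4].isdigit():
            _address, _interface, state, router_id, _pri, _dead = parts
            states[router_id] = state
    return states


VQFX4_OUTPUT = """\
Address          Interface              State     ID               Pri  Dead
192.168.200.2    ae0.0                  Full      1.1.1.1          128    33
192.168.200.3    ae0.0                  Full      2.2.2.2          128    39
"""

states = ospf_neighbor_states(VQFX4_OUTPUT)
print(states)  # → {'1.1.1.1': 'Full', '2.2.2.2': 'Full'}
print(all(state == "Full" for state in states.values()))  # → True
```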

Finally, let’s ensure we can ping between vQFX-3 and vQFX-4:

root@vQFX-3> ping 192.168.200.4 rapid count 5    
PING 192.168.200.4 (192.168.200.4): 56 data bytes
!!!!!
--- 192.168.200.4 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max/stddev = 239.258/425.293/601.453/129.111 ms

{master:0}

root@vQFX-4> ping 192.168.100.1 rapid count 5 
PING 192.168.100.1 (192.168.100.1): 56 data bytes
!!!!!
--- 192.168.100.1 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max/stddev = 114.584/234.628/287.023/67.643 ms

{master:0}
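For repeated testing, the summary lines of Junos ping output are easy to parse. This Python sketch (hypothetical helper name) extracts the packet loss and round-trip statistics from output like the above:

```python
import re


def ping_summary(output: str) -> dict:
    """Return packet loss (%) and min/avg/max/stddev round-trip times (ms)."""
    loss = re.search(r"([\d.]+)% packet loss", output)
    rtt = re.search(r"= ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms", output)
    if loss is None or rtt is None:
        raise ValueError("ping summary not found")
    summary = {"loss_pct": float(loss.group(1))}
    summary.update(zip(("min", "avg", "max", "stddev"), map(float, rtt.groups())))
    return summary


SAMPLE = """\
--- 192.168.100.1 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max/stddev = 114.584/234.628/287.023/67.643 ms
"""

result = ping_summary(SAMPLE)
print(result["loss_pct"], result["avg"])  # → 0.0 234.628
```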

And that’s it! MC-LAG configuration for simple gateway and advanced layer 3 routing is working.

Conclusion

We’ve only scratched the surface of MC-LAG, but hopefully enough for the JNCIP-DC. For more detailed information, check out Juniper MX Series, Second Edition [4].

The following blogs were helpful in compiling these notes:

Footnotes

[1] Juniper Ambassador’s Cookbook 2019, Recipe #5
[2] Juniper MX Series, Chapter 9, ICCP Hierarchy Section
[3] Understanding Multichassis Link Aggregation Groups
[4] Juniper MX Series, Second Edition