Data Center Deployment and Management¶
This blueprint item covers a number of topics:
Unfortunately, the blueprint doesn’t indicate to what level those items should be known, and all of those options seem to require physical hardware, which I do not have. So this guide will only be theory until someone contributes configuration sections.
Zero-Touch Provisioning¶
ZTP is a process for upgrading software and applying configuration
automatically on first boot. This is accomplished by using a stadard
DHCP server with a little bit of extra configuration so that each
DISCOVER
is associated with the correct switch so that the correct
configuration can be retrieved.
When configuring the DHCP server, the following DHCP options can be used:
12
: Switch Hostname42
: NTP Server43.00
: Software image filename43.01
: Configuration file filename43.02
: Symbolic link flag for filename43.03
: Transfer mode (HTTP, FTP, TFTP)43.04
: Alternate software image filename66
: DNS FQDN for the HTTP/FTP/TFTP server150
: IP Address for the HTTP/FTP/TFTP server
The reason that both 44.00
and 44.04
can both be used is that
some DHCP servers may not support 44.00
. If both options are
defined, 44.00
takes precedence.
When setting the 44.00
or 44.04
location to a symlink, you also
need to set the 44.02
option to the value symlink
so that the
system knows that the image is a symlink.
44.03
lets you specify whether the file transfer method will be FTP,
TFTP, or HTTP. The default is TFTP.
If DHCP options 66
and 150
are both specified, then 150
takes precedence.
When the configuration file starts with a shebang (#!
), Junos will
attempt to run the file as a script. This means you can use one Python
or shell script to dynamically configure your switches instead of
handcrafting a configuration file for each switch.
By default, a QFX5100 will attempt to perform ZTP through both its management interface and its revenue ports. This means you can perform ZTP through a dedicated OOB management network (recommended) or in-band.
After the switch knows the details of its ZTP process, it first downloads the required configuration file and then the required software image (if necessary). If a software image was downloaded, then the switch performs the software upgrade. Finally, it applies the configuration file that was downloaded.
ZTP can also be performed by Junos Space with Network Director. When this is done, the switch is automatically added to Network Director for future management.
For more information on ZTP (including configuration examples), see the following blogs/articles:
- Zero Touch Provisioning: How to Build a Network Without Touching Anything
- Zero Touch Provisioning on Juniper devices using Linux
You can also read Chapter 6 of the QFX5100 Series book from O’Reilly [1].
High Availability¶
The QFX5100 has a virtualized control plane. There is a Linux host running KVM, and Junos runs as a VM. The hypervisor can run up to four VMs: two of them are reserved for Junos, one is a guest of the operator’s choice, and the last is reserved.
Each Junos VM has four management interfaces. The first two, em0
and em1
, map to the physical management ports on the switch. The
third, em2
, is used for communicating with the hypervisor. The last
interface, em3
, is used when performing a Topology-independent
In-Service Software Upgrade, or TISSU. The second Junos VM is only
created during a TISSU. In order to perform a TISSU, NSR, NSB, and GRES
must be configured.
The high-level process for TISSU is as follows [2]:
1: Create the backup Junos VM running the new version requested
2: Synchronize state between the Junos VMs using ksyncd
3: Makes the new VM the master RE
4: Renames the slot ID of the new VM from 1
to 0
5: The former master Junos VM is shut down
When performing a TISSU, keep the following in mind:
- Downgrades and rollbacks are not supported
- TISSU should not be used when transitioning between different base
images (e.g.,
standard
toenhanced-automation
) - The CLI is inaccessible during a TISSU
- Log files are located in
/var/log/vjunos-log.tgz
- BFD timers need to be >= 1 second [3]
system internet-options no-tcp-reset drop-all-tcp
must not be configured [3]
A configuration sample for enabling NSR, NSB, and GRES is below.
system {
commit synchronize;
}
chassis {
redundancy {
graceful-switchover;
}
}
routing-options {
nonstop-routing;
}
protocols {
layer2-control {
nonstop-bridging;
}
}
Once the configuration is in place, you can perform a TISSU with the
request system software in-service-upgrade <path-to-image>
command.
For more information, see Chapter 2 of the Juniper QFX5100 Series book [1].
Monitoring¶
Junos supports streaming telemetry as well as more traditional methods of monitoring such as SNMP.
TODO: Add configuration for SNMPv2c, SNMPv3, and Streaming Telemetry.
Analytics¶
The QFX5100 supports two methods of analytics: sFlow and Enhanced Analytics. These are described below, but it’s important to understand that they should be used together as neither provides the entire picture on its own.
sFlow¶
The QFX5100 supports sFlow, which samples every n
packets. This
sampled data is exported every 1500 bytes or every 250ms. Any alerting
on this data must be performed off-box with the sFlow collector. On the
QFX5100, only switchports can be sampled. Layer 3 interfaces cannot be
sampled. The first 128 bytes of the packet are sampled, and this
includes information such as the source and destination MACs, IPs, and
Ports. Higher sampling rates require more processing power.
To combat this on high-traffic switches, the QFX5100 can dynamically adjust sample rates based on interface traffic. This is known as adaptive sampling. An agent checks the interfaces every 5 seconds. A list of the top five interfaces is created. An algorithm reduces the load by half for the top five interfaces and allocates those samples to lower traffic interfaces.
An sFlow configuration is shown below.
protocols {
sflow {
polling-interval 10;
sample-rate {
ingress 50;
egress 50;
}
collector 10.0.0.30 {
udp-port 9000;
}
interfaces xe-0/0/0.0;
}
}
Note
The polling-interval
tells Junos how frequently, in seconds, to
poll for data; the sampling-rate
tells Junos how many packets to
sample.
Juniper Enhanced Analytics¶
The QFX5100 also supports Juniper Enhanced Analytics. This system can poll as frequently as every 8ms, and data is exported as soon as it is collected. You can set thresholds on-box down to 1ns. Enhanced Analytics can monitor:
- Traffic statistics
- Queue depth
- Latency
- Jitter
This data can be streamed using the following formats:
- Protobuf
- JSON
- CSV
- TSV
Enhanced Analytics is performed by two systems: the Analytics Daemon and the Analytics Manager.
The Analytics Daemon (analyticsd
) collects information from the
Analytics Manager’s ring buffers and exports it to collectors.
The Analytics Manager runs in the PFE and collects the data that is
placed into ring buffers for analyticsd
to collect.
TODO: Add configuration examples for Enhanced Analytics.
For more information, see Chapter 9 of the Juniper QFX5100 Series book [1].
Footnotes
[1] | (1, 2, 3) Juniper QFX5100 Series |
[2] | Understanding In-Service Software Upgrade (ISSU) |
[3] | (1, 2) Performing an In-Service Software Upgrade (ISSU) with Non-Stop Routing |