Tuesday 21 March 2017

VxLAN Underlay Fabric Convergence

VxLAN underlay fabric convergence is based on 3 factors:

  • IGP Convergence 
  • PIM Convergence 
  • BGP Convergence 
Each factor must be addressed separately to achieve high availability for VxLAN overlay flows

Generally 4 factors affect IGP convergence time
  1. Failure detection time          : is the neighbor down?
  2. Event propagation time          : tell neighbors about the change
  3. Recalculation time              : run the SPF/DUAL/etc. calculation
  4. Forwarding table update time    : install the new paths
  • Failure Detection Time
      - How long does it take me to realize there is a failure?
      - Example failure detection:
          - Link up/down event
          - Routing protocol hello/dead timers
          - IP SLA & EEM
          - Bidirectional Forwarding Detection (BFD)
  • Event Propagation Time
      - How long does it take to tell everyone else?
      - Example event propagation:
          - EIGRP Query/Reply
          - OSPF LSA flooding procedure
          - BGP update/withdraw
  • Recalculation Time
      - How long does it take me to decide on the new topology?
      - Example recalculation:
          - EIGRP DUAL / OSPF SPF / BGP best path selection
  • Forwarding Table Update Time
      - How long does it take me to install the changes?
      - Example update time:
          - EIGRP topology table to RIB download
          - RIB to S/W FIB download
          - S/W FIB to H/W TCAM download

Example: OSPF reconvergence

OSPF failure detection
   - Neighbor dead interval expires
OSPF event propagation
   - LSA flooding procedure
OSPF recalculation time
    - SPF runtime
Forwarding table update time
    - OSPF database to RIB installation , RIB to FIB , FIB to TCAM
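Of the four stages, failure detection is usually the biggest lever. A minimal NX-OS sketch enabling BFD for OSPF (the interface name and process tag are assumptions):

feature bfd
!
router ospf 1
  bfd
!
interface Ethernet1/1
  ip ospf bfd

This swaps the dead-interval wait (tens of seconds by default) for sub-second BFD detection.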

How do we affect convergence?
Some factors are s/w & configuration dependent
e.g. a smaller EIGRP query domain is better
e.g. OSPF stub areas are better
e.g. unnumbered fabric links or prefix-suppression is better
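A sketch of an unnumbered fabric link on NX-OS (interface names are assumptions; support varies by platform and release):

interface loopback0
  ip address 1.1.1.51/32
!
interface Ethernet1/1
  no switchport
  medium p2p
  ip unnumbered loopback0

Fewer fabric-link prefixes means fewer LSAs to flood and fewer routes to reinstall after a failure.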

Some factors are h/w dependent
e.g. SPF runtime is a function of CPU speed
e.g. TCAM download is a function of the line card

Methods of modifying convergence time
Can be both reactive & proactive.

Reactive optimizations:
e.g. carrier delay & link debounce timers
e.g. fast hellos & BFD
e.g. OSPF LSA & SPF pacing
e.g. FIB prefix prioritization
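A hedged NX-OS sketch of the reactive knobs above (values are illustrative only, not recommendations):

interface Ethernet1/1
  link debounce time 0
!
router ospf 1
  timers throttle spf 50 100 5000
  timers throttle lsa 50 100 5000

"link debounce time 0" reports link-down immediately; the throttle timers set the initial/hold/max wait (in msec) for SPF runs and LSA generation.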

Proactive optimizations
EIGRP feasible successors
OSPF Loop-Free Alternate (LFA)
BGP Prefix Independent Convergence (PIC)
MPLS Traffic Engineering Fast Reroute (TE FRR)

--------------------------------------+++++++++++++++--------------------------

VxLAN Routing

Asymmetric vs Symmetric IRB:


  • EVPN Integrated Routing & Bridging (IRB) has two options:
  • Asymmetric IRB
  • Symmetric IRB

Asymmetric IRB 

  • Ingress VTEP does both L2 & L3 lookup
  • Egress VTEP does L2 lookup only
  • i.e Bridge - Route - Bridge 

Symmetric IRB

  • Ingress VTEP does both L2 & L3 lookup
  • Egress VTEP does both L3 & L2 Lookup
  • i.e Bridge-Route-Route-Bridge

      A----(L2)--SVI--(L3)---SVI-------(L3)-------SVI----(L2)------B

Asymmetric IRB issues

Every VTEP must have all VNIs configured that require routing; the result is increased ARP cache and
CAM table sizes, i.e. a control plane scaling issue.

VxLAN BGP EVPN With L3 VNIs : 

install feature-set virtualization 
install feature-set fabric
feature-set fabric
feature fabric forwarding 
nv overlay evpn
feature ospf 
feature bgp 
feature pim
feature interface-vlan
feature nv overlay
feature vn-segment-vlan-based
vlan 11
   vn-segment 11111
vlan 33
   vn-segment 33333
vrf context CUSTOMER1
   vni 33333
   rd auto
address-family ipv4 unicast
   route-target both auto
   route-target both auto evpn

interface vlan 11 
no shut
vrf member CUSTOMER1
ip address 11.0.0.254/24
fabric forwarding mode anycast-gateway
!
interface vlan 33 
no shut
vrf member CUSTOMER1
ip forward 
!
interface nve1 
no shut
source-interface loopback0
host-reachability protocol bgp 
member vni 11111
mcast-group 224.11.11.11
member vni 33333 associate-vrf 
route-map PERMIT permit 10 
!
router bgp 12345
    neighbor 1.1.1.71
    remote-as 12345
update-source loopback0
address-family l2vpn evpn
send-community both 
!
neighbor 1.1.1.72 
remote-as 12345
update-source loopback0
address-family l2vpn evpn 
send-community both
!
vrf CUSTOMER1

evpn 
  vni 11111 l2 
   rd auto
  route-target import auto
  route-target export auto
vrf default


                                      vPC & VxLAN

vPC & VxLAN BGP traffic flow problems: 
  • VxLAN traffic is tunneled over the overlay network using the BGP next-hop address of the remote VTEP
  • NVE source interface (i.e. loopback0) is the default BGP next-hop for advertised routes
  • In a vPC, both vPC peers advertise duplicate EVPN MAC/IP routes to the spine RRs.
  • With other attributes equal, next-hop is the tie breaker in BGP best path selection
  • This implies that one vPC peer is always preferred for dual-attached hosts.
  • Result is that egress traffic from the vPC members is load balanced, but ingress traffic is polarized
  • Workaround is to use an Anycast VTEP address

                                        vPC Anycast VTEP

vPC peers share a duplicate IP address on the NVE source interface
  • Peer1 - interface loopback0 ; ip address 1.1.1.51/32
  • Peer2 - interface loopback0 ; ip address 1.1.1.52/32
  • Both peers - interface loopback0 ; ip address 1.1.1.111/32 secondary
BGP next-hop is automatically set to the secondary address for locally originated routes
  • i.e. L2VPN EVPN MAC/IP routes for vPC member ports
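The anycast VTEP addressing above as a config sketch for peer1 (peer2 is identical except for the primary address):

interface loopback0
  ip address 1.1.1.51/32
  ip address 1.1.1.111/32 secondary
!
interface nve1
  source-interface loopback0

Both peers now advertise EVPN routes with next-hop 1.1.1.111, so the spines ECMP ingress traffic across both vPC members.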

                         Nexus 5600 & NVE Peer-Link-vlan 

  • On Nexus 5600, all traffic across the vPC Peer Link must be VxLAN encapsulated due to the ASIC implementation
  • Normal vPC Peer Link is a classical Ethernet trunk
  1. Result is that East/West flows over the vPC Peer Link are all broken by default
  2. i.e. the VNI number is lost when the packet is sent out the peer link
  • Peer Link is normally only used for orphans or in failure scenarios
  • Result is that everything looks fine until the failure occurs
  • Traffic to orphans & single-attached members is black holed over the vPC Peer Link
  • Workaround is to maintain VxLAN encapsulation across the peer link
  • Implemented as "vpc nve peer-link-vlan"

                Configuring NVE Peer-Link-Vlan

  • Create new VLAN & Specify as NVE Peer Link VLAN ( vlan 999; vpc nve peer-link-vlan 999)
  • Establish layer 3 peering across NVE peer link VLAN (interface vlan 999 ; ip router ospf 1 area 0 )
  • Traffic engineering so the other vPC peer's VTEP loopback is preferred over the vPC Peer Link
  1. ip ospf cost 10
  2. isis metric 10 level-2
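Putting the three steps together, a sketch for one vPC peer (the SVI address is an assumption):

vlan 999
!
vpc nve peer-link-vlan 999
!
interface vlan 999
  no shutdown
  ip address 99.99.99.1/30
  ip router ospf 1 area 0
  ip ospf cost 10

Adjusting the OSPF cost keeps the path across the peer link as a last resort, used only when the fabric path to the other peer's VTEP loopback is unavailable.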


--------------------------------------ooooooooooooooo----------------------------------------------------
































Sunday 19 March 2017

VxLAN Configuration || Vxlan flood & Learn || BGP EVPN on NX-OS

VxLAN Prerequisites:

Prerequisites are hardware/software specific:
For Nexus 5600 as hardware VTEP 
Set switching mode to store-and-forward
  • hardware ethernet store-and-forward-switching
  • Requires a reboot

Establish IP unicast reachability between VTEPs
Establish PIM BIDIR reachability between VTEPs (Spines can be phantom RPs for redundancy)

Enable features:
  1. feature vn-segment-vlan-based (VNI to VLAN mapping)
  2. feature nv overlay (nve interface ) 
Bidirectional PIM is a multicast control plane scaling technique: no (S,G) entries are installed, only (*,G), and all traffic is drawn through the RP; in BIDIR PIM the RP sits in the data plane.

example: (config)# ip pim rp-address 1.1.1.72 group-list 224.0.0.0/4 bidir ; on all devices
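For the phantom RP redundancy mentioned above, a commonly used BIDIR design is to point all devices at an RP address that is not configured on any box, and have each spine advertise a covering loopback with a different prefix length, so the longest match decides which spine is the active RP. A sketch with assumed addresses:

! On all devices (10.1.1.2 is the phantom RP, assigned to no interface)
ip pim rp-address 10.1.1.2 group-list 224.0.0.0/4 bidir
!
! Spine1 - primary RP (longest match, /30 covers 10.1.1.2)
interface loopback254
  ip address 10.1.1.1/30
  ip router ospf 1 area 0
  ip pim sparse-mode
!
! Spine2 - backup RP (/29 also covers 10.1.1.2)
interface loopback254
  ip address 10.1.1.1/29
  ip router ospf 1 area 0
  ip pim sparse-mode

If Spine1 fails, the /30 route is withdrawn and traffic for 10.1.1.2 follows the /29 to Spine2; no PIM reconfiguration is needed.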

Note: VxLAN & FabricPath are mutually exclusive; the ASIC will not take both

VxLAN Flood & Learn Config Steps 

  1. Map VLAN to VxLAN (vn-segment under vlan config mode: vlan 10 ; vn-segment 11111)
  2. Create Network Virtualization Edge (NVE) interface (interface nve0)
  3. Specify VTEP source (source-interface loopback 0)
  4. Specify VNI membership (member vni [vnid] ; member vni 11111)
  5. Specify multicast group for BUM replication (mcast-group [group] ; mcast-group 228.9.10.11)

VxLAN Flood & Learn Verification 

  • show interface nve id 
  • show platform fwm info nve peer [all]
  • show mac address-table
  • show nve peer
  • show nve vni
  • show platform fwm info nve vni

Config Summary : 

feature nv overlay 
feature vn-segment-vlan-based
!
vlan 10
   vn-segment 11111
!
interface nve1 ; no shut ; source-interface loopback 0; member vni 11111; mcast-group 228.9.10.11
!
show ip route | in /32

Implementing VxLAN BGP EVPN on NX-OS

Note: Make sure the unicast/multicast control plane is working first, otherwise ARP will not work

Prerequisites are hw/sw specific 
  • For Nexus 5600 as h/w VTEP
  • set switching mode to store-and-forward (hw ethernet store-and-forward)
  • requires reboot
  • Establish IP unicast reachability between VTEPs
  • Establish PIM BIDIR reachability between VTEPs (Spines can be phantom RPs for redundancy)
  • features to be enabled:  
                           install feature-set virtualization
                           install feature-set fabric
                           feature-set fabric
                           feature fabric forwarding
                           nv overlay evpn
                           feature nv overlay
                           feature vn-segment-vlan-based 


-----------  Config 
  • Map vlan to vxlan (vn-segment under vlan config mode )
  • Create n/w virtualization edge (NVE) (interface nve 0 )
  • Specify VTEP source ( Source interface loopback 0 ) 
  • Specify VNI Membership ( member vni [vnid] )
  • Specify multicast group for BUM replication ( mcast-group [group] )
  • Specify BGP as control plane protocol ( host-reachability protocol bgp )
  • Establish BGP EVPN Peering ( address-family l2vpn evpn )
  • extended community required 
SPINE SIDE CONFIGURATION : 
feature bgp ; router bgp 1 ; neighbor 1.1.1.71 ; remote-as 1 ; update-source lo0
address-family l2vpn evpn ; send-community extended ; route-reflector-client 

LEAF SIDE CONFIGURATION : 
feature bgp ; router bgp 1 ; neighbor 1.1.1.51 ; remote-as 1 ; update-source lo0
address-family l2vpn evpn ; send-community extended 

Verification : 
show interface nve id
show platform fwm info nve peer [all]
show mac address-table
show nve peer
show nve vni
show platform fwm info nve vni
show bgp l2vpn evpn summary 
show bgp l2vpn evpn
show bgp l2vpn evpn neighbor [neighbor] advertised-routes
!
evpn 
  vni 11111 l2 
    rd auto
    route-target import auto
    route-target export auto 




























Saturday 18 March 2017

What is VxLAN


Virtual eXtensible Local Area Network
An L2-in-L3 overlay tunnel

  • Specifically an ethernet in UDP tunnel 
  • technically agnostic to the data plane encapsulation 


Why use VxLAN
  • Expands the VLAN namespace: VLAN ID = 2^12 (4096), VNI = 2^24 (~16 million)
  • Allows layer 2 multipathing 
  1. don't need stp for loop prevention
  2. uses layer 3 ECMP over CLOS fabric
  • similar logic to FabricPath
  • Includes scaling enhancements 
  1.  Optimizes control plane, e.g  MAC learning , ARP Tables, BUM replication etc.
  • Does not break layer 2 adjacency requirements
  1. Allows for any to any stateless layer 2 & layer 3 transport E.g vMotion
  • Allows for multi tenancy  
  • Separation of customer traffic over a shared underlay fabric 
  • Allows for overlapping L2 & L3 addresses e.g VLANs & ips are locally significant 

VxLAN Terminology 
  • Underlay Network: Provides transport for VxLAN  , i.e ospf , eigrp , is-is routed fabric
  • overlay Network  : Uses the service provided by VxLAN 
  • VNI/VNID : VxLAN network Identifier
  • VTEP : VxLAN tunnel end point
  1. Box that performs VxLAN encap/decap
  2. Could be H/W or S/W
  3. E.g Nexus 5600 vs Nexus 1000V
  • VxLAN segment : - the resulting layer 2 overlay n/w
  • VxLAN Gateway 
  1. Device that forwards traffic between Vxlans
  2. Can be both L2 & L3 forwarding 
  • NVE : Network Virtualization Edge 
  1.  Logical representation of the VTEP
  2. i.e NVE is the tunnel interface 


VxLAN Encapsulation : 
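As a text sketch in place of the missing figure: the original Ethernet frame is wrapped in a VxLAN header, UDP, outer IP, and outer Ethernet, adding roughly 50 bytes of overhead:

Outer Ethernet | Outer IP (VTEP-to-VTEP) | UDP (dst port 4789) | VxLAN header (8 bytes, 24-bit VNI) | Original L2 frame

The 24-bit VNI in the VxLAN header is what replaces the 12-bit VLAN ID on the wire.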




Basic VxLAN Workflow:

  • Receive ARP From local host
  • Assume a miss occurs
  • Find the remote VTEP
  1. Multicast flood & Learn
  2. Ingress replication
  3. MP-BGP L2VPN EVPN
  • Unicast encapsulate the frame towards the remote VTEP
  • Throws away the VLAN tag
  • Replaces it with the VNID






Wednesday 15 March 2017

vPC Initialization Order of Operations

vPC Processes :


  1. vPC process starts
  2. IP/UDP 3200 peer keepalive connectivity established
  3. Peer-link adjacency forms
  4. vPC primary/secondary role election
  5. vPC consistency check performed
  6. Layer 3 SVIs move to UP/UP state
  7. vPC member ports move to UP/UP state

vPC Consistency Check 
  1. vPC peers sync control plane over the peer link with Cisco Fabric Services (CFS)
  2. Includes advertisement of "consistency parameters" that must match for the vPC to form successfully, e.g. line card type (M or F), speed, duplex, trunking, LACP mode, STP configs etc.
3 types of Consistency checks: 
Type 1 Global:
  • Mismatch results in vPC failing to form
  • E.g. STP mode Rapid-PVST vs MST
Type 1 Interface:
  • Mismatch results in the VLAN being suspended on the vPC member
  • E.g. STP port type network vs. normal
Type 2:
  • Mismatch results in a syslog message but not vPC failure; can result in data plane failures
  • E.g. MTU mismatch
Peer Keepalive & Peer link Fate Sharing

  • Keepalive configured via layer 3 SVIs
  • SVI VLAN is allowed on the peer link
  • STP always prefers the peer link
  • Peer link fails, but primary is still up
  • No layer 2 path for the SVI exists, and the secondary disables its SVIs
  • Secondary cannot ping primary
  • Secondary promoted to operational primary
  • Split brain occurs
vPC Peer link failure detection 
  • vPC Peer Link Fails (e.g line card outage)
vPC Secondary Pings Primary over peer keepalive 
  • if vPC primary is alive 
  • Disable vPC member ports on secondary 
  • Disable SVI's on secondary 
  • Goal is to force end host to forward via primary 
If vPC primary is dead 
  • Promote vPC secondary to operational primary 
  • Continue to forward traffic on new primary
Peer Keepalive and peer link must not share fate in order to prevent split brain
  • E.g Separate MGMT Switch , Separate Port Channels on Separate line cards.