Segment Routing | Control and Data plane review
Hi all!
Today I'm going to talk about Segment Routing, specifically SR-MPLS. The best source of theory is, of course, the RFCs, but Segment Routing is a huge topic and it's difficult to sort things out. I will cover the basic concepts of SR-MPLS, and we will walk through the basic control plane and data plane tasks of SR.
A good network engineer always tries to optimize the network, operational tools and workflows. I'm sure the engineers who developed the Segment Routing concepts followed the same idea.
Why do I think so? Here are a few short facts about SR-MPLS:
SR is an alternative to the main label distribution protocols, LDP and RSVP-TE.
SR reduces the number of control plane protocols because it is part of the IGP (IS-IS or OSPF).
SR uses a stateless paradigm, unlike RSVP-TE (this helps reduce CPU consumption).
Let’s investigate basic SR concepts.
"Segment Routing" consists of two words. Let's start with the first one: what is a "segment", and what types of segments are there?
Segments are instructions. The head-end encodes these instructions into the MPLS header. It's an interesting concept: we can steer a traffic flow with packets that carry a stack of MPLS labels, that is, a stack of instructions. This eliminates per-LSP state on the transit LSRs.
I will define the test topology and common terms for the rest of the explanation. I perform all labs on Nokia SR-OS VMs.
Topology description and basic SR concepts
SRGB
SRGB stands for Segment Routing Global Block. It's a block of MPLS labels used for global segments. The SRGB range is configurable, but common sense tells us to use the same SRGB on all nodes inside the SR domain. This makes troubleshooting and operation easier.
Our example:
- start label 20000, end label 20200.
Configuration:
/configure { router "Base" mpls-labels sr-labels start 20000 }
/configure { router "Base" mpls-labels sr-labels end 20200 }
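A minimal Python sketch of the SRGB arithmetic (the class and method names are my own, not an SR-OS API). The range is inclusive, so start 20000 and end 20200 give 201 labels, which matches the "Range:201" that R1 advertises in IS-IS. The sketch also shows that a SID index must be smaller than the SRGB size to be mapped into the block.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Srgb:
    """Segment Routing Global Block: an inclusive range of MPLS labels."""
    start: int
    end: int

    def __post_init__(self) -> None:
        # MPLS labels are 20-bit values and 0-15 are reserved.
        if not 16 <= self.start <= self.end <= 0xFFFFF:
            raise ValueError(f"invalid SRGB {self.start}-{self.end}")

    @property
    def size(self) -> int:
        # Both the start and the end label belong to the block, hence +1.
        return self.end - self.start + 1

    def label_for_index(self, index: int) -> int:
        """Map a SID index to a label; the index must fit inside the block."""
        if not 0 <= index < self.size:
            raise ValueError(f"SID index {index} does not fit an SRGB of {self.size} labels")
        return self.start + index


srgb = Srgb(start=20000, end=20200)   # values from the configuration above
print(srgb.size)                      # 201
print(srgb.label_for_index(2))        # 20002
```

An index of 201 or more would raise an error here; on a real router such a SID cannot be mapped into the SRGB either.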
Segment types
Our topology has three segments:
R1 Node segment = interface Lo0
R2 Node segment = interface Lo0
Adjacency segment between R1 and R2
Let's look at the segment types. The first one is the Node segment (a special case of a Prefix segment). Typically it's the lo0 (aka "system") interface. We have to define the segment ID (SID) explicitly; for example, the Node SID can be equal to the last octet of the loopback IP address. The Node SID MUST be unique because it has global scope.
Our example:
- R1 lo0 = 10.10.10.1, SID = 1
- R2 lo0 = 10.10.10.2, SID = 2
Configuration (on R1 and R2 respectively):
/configure { router "Base" isis 0 interface "system" ipv4-node-sid index 1 }
/configure { router "Base" isis 0 interface "system" ipv4-node-sid index 2 }
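The "last octet of the loopback" convention and the uniqueness rule can be checked before configuration with a short Python sketch. The helper names are hypothetical, and the convention is only this lab's choice, not a protocol rule.

```python
import ipaddress


def node_sid_index(loopback: str) -> int:
    """Derive a Node SID index from the last octet of an IPv4 loopback.

    This is only the naming convention used in this lab, not a protocol rule.
    """
    return ipaddress.IPv4Address(loopback).packed[-1]


def assign_node_sids(loopbacks: dict) -> dict:
    """Return {router: SID index}, rejecting duplicates (Node SIDs have global scope)."""
    owners = {}
    for router, loopback in loopbacks.items():
        index = node_sid_index(loopback)
        if index in owners:
            raise ValueError(f"index {index} of {router} is already used by {owners[index]}")
        owners[index] = router
    return {router: index for index, router in owners.items()}


print(assign_node_sids({"R1": "10.10.10.1", "R2": "10.10.10.2"}))  # {'R1': 1, 'R2': 2}
```

The convention breaks as soon as two loopbacks share a last octet (for example 10.10.10.1 and 10.10.20.1); the check rejects such a plan instead of letting two routers claim the same global SID.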
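R1 advertises its SR capabilities, including the SRGB, in the Router Capability TLV of its IS-IS LSP:
TLVs :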
Supp Protocols:
Protocols : IPv4
IS-Hostname : R1
Router ID :
Router ID : 10.10.10.1
Router Cap : 10.10.10.1, D:0, S:0
TE Node Cap : B E M P
SR Cap: IPv4 MPLS-IPv6
SRGB Base:20000, Range:201
SR Alg: metric based SPF
Node MSD Cap: BMI : 12 ERLD : 15
Below we can see a new sub-TLV, the Prefix-SID sub-TLV [RFC 8667]. It carries the index value, the algorithm number and flags. This information lets routers calculate the correct outgoing label toward the target node. We will look at the calculation process later.
R1 Node SID
TE IP Reach :
Default Metric : 10
Control Info: , prefLen 31
Prefix : 10.0.0.0
Default Metric : 0
Control Info: S, prefLen 32
Prefix : 10.10.10.1
Sub TLV :
Prefix-SID Index:1, Algo:0, Flags:NnP
R2 Node SID
TE IP Reach :
Default Metric : 10
Control Info: , prefLen 31
Prefix : 10.0.0.0
Default Metric : 0
Control Info: S, prefLen 32
Prefix : 10.10.10.2
Sub TLV :
Prefix-SID Index:2, Algo:0, Flags:NnP
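To see what actually travels in the LSP, here is a Python sketch that encodes and decodes an index-based Prefix-SID sub-TLV following the RFC 8667 layout (type 3, flags, algorithm, 4-octet index). The function names are hypothetical. I set only the N-flag (Node-SID) as an assumption; I don't try to map SR-OS's "Flags:NnP" display onto individual bits.

```python
import struct

PREFIX_SID_TYPE = 3  # IS-IS Prefix-SID sub-TLV (RFC 8667)
# Flags octet, most significant bit first: R N P E V L (RFC 8667, section 2.1)
R_FLAG, N_FLAG, P_FLAG, E_FLAG, V_FLAG, L_FLAG = 0x80, 0x40, 0x20, 0x10, 0x08, 0x04


def encode_prefix_sid(index: int, algorithm: int = 0, flags: int = N_FLAG) -> bytes:
    """Encode a Prefix-SID sub-TLV that carries a 4-octet SID index (V and L clear)."""
    if flags & (V_FLAG | L_FLAG):
        raise ValueError("this sketch only encodes index-based SIDs")
    value = struct.pack("!BBI", flags, algorithm, index)
    return struct.pack("!BB", PREFIX_SID_TYPE, len(value)) + value


def decode_prefix_sid(data: bytes) -> dict:
    """Decode an index-based Prefix-SID sub-TLV produced by encode_prefix_sid()."""
    tlv_type, length = data[0], data[1]
    if tlv_type != PREFIX_SID_TYPE or length != 6:
        raise ValueError("not an index-based Prefix-SID sub-TLV")
    flags, algorithm, index = struct.unpack("!BBI", data[2:8])
    return {"flags": flags, "algorithm": algorithm, "index": index}


tlv = encode_prefix_sid(index=1)      # R1: Index 1, Algo 0 (metric-based SPF)
print(tlv.hex())                      # 0306400000000001
print(decode_prefix_sid(tlv))         # {'flags': 64, 'algorithm': 0, 'index': 1}
```

Note that only the index is advertised, not a label: each receiver turns the index into a label with an SRGB.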
Each LSR allocates an Adjacency label for every IGP adjacency. An Adjacency label has local scope, so the routers at both ends of the same adjacency may allocate the same value. Our example proves that: both routers generated label 524287.
R1 Adj-SID
TE IS Nbrs :
Nbr : R2.00
Default Metric : 10
Sub TLV Len : 19
IF Addr : 10.0.0.0
Nbr IP : 10.0.0.1
Adj-SID: Flags:v4VL Weight:0 Label:524287
R2 Adj-SID
TE IS Nbrs :
Nbr : R1.00
Default Metric : 10
Sub TLV Len : 19
IF Addr : 10.0.0.1
Nbr IP : 10.0.0.0
Adj-SID: Flags:v4VL Weight:0 Label:524287
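Unlike the Prefix-SID, the Adj-SID here carries a label value directly. A Python sketch of the RFC 8667 Adj-SID sub-TLV (type 31) with the V and L flags set, which makes the SID a 3-octet local label. The function names are hypothetical, and reading SR-OS's "Flags:v4VL" as "IPv4 (F-flag clear), V and L set" is my interpretation.

```python
import struct

ADJ_SID_TYPE = 31  # IS-IS Adj-SID sub-TLV (RFC 8667)
# Flags octet, most significant bit first: F B V L S P (RFC 8667, section 2.2.1)
F_FLAG, B_FLAG, V_FLAG, L_FLAG, S_FLAG, P_FLAG = 0x80, 0x40, 0x20, 0x10, 0x08, 0x04


def encode_adj_sid(label: int, weight: int = 0, flags: int = V_FLAG | L_FLAG) -> bytes:
    """Encode an Adj-SID sub-TLV carrying a local label (V and L set -> 3-octet label)."""
    if (flags & (V_FLAG | L_FLAG)) != (V_FLAG | L_FLAG):
        raise ValueError("this sketch only encodes label-based (V+L) Adj-SIDs")
    if not 0 <= label <= 0xFFFFF:
        raise ValueError("an MPLS label must fit in 20 bits")
    # The label sits in the 20 rightmost bits of a 3-octet field.
    value = struct.pack("!BB", flags, weight) + label.to_bytes(3, "big")
    return struct.pack("!BB", ADJ_SID_TYPE, len(value)) + value


def decode_adj_sid(data: bytes) -> dict:
    """Decode a label-based Adj-SID sub-TLV produced by encode_adj_sid()."""
    if data[0] != ADJ_SID_TYPE or data[1] != 5:
        raise ValueError("not a label-based Adj-SID sub-TLV")
    flags, weight = data[2], data[3]
    return {
        "ipv6": bool(flags & F_FLAG),   # F clear -> IPv4 adjacency ("v4")
        "value": bool(flags & V_FLAG),
        "local": bool(flags & L_FLAG),
        "weight": weight,
        "label": int.from_bytes(data[4:7], "big") & 0xFFFFF,
    }


tlv = encode_adj_sid(524287)           # R1 -> R2: Flags:v4VL Weight:0 Label:524287
print(tlv.hex())                       # 1f05300007ffff
print(decode_adj_sid(tlv)["label"])    # 524287
```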
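Label calculation rules
1) Local label (own Node segment) = own SRGB start label + own Node SID index. Our example on R1: 20000 + 1 = 20001. R1 installs this label in the LFIB with the Terminating type, because R1 is the end of this segment.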
A:admin@R1# tools dump router segment-routing tunnel in-label 20001
===================================================================================================
Legend: (B) - Backup Next-hop for Fast Re-Route
(D) - Duplicate
label stack is ordered from top-most to bottom-most
===================================================================================================
--------------------------------------------------------------------------------------------------+
Prefix |
Sid-Type Fwd-Type In-Label Prot-Inst(algoId) |
Next Hop(s) Out-Label(s) Interface/Tunnel-ID |
--------------------------------------------------------------------------------------------------+
10.10.10.1
Node Terminating 20001 ISIS-0
--------------------------------------------------------------------------------------------------+
No. of Entries: 1
--------------------------------------------------------------------------------------------------+
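2) Node segment label (global scope) = SRGB start label + Node SID index. Strictly speaking, a router computes its in-label from its own SRGB and its out-label from the next hop's SRGB; because all our nodes use the same SRGB, both values are equal.
Our example (calculating the label for R2's Node segment on R1):
20000 + 2 = 20002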
R1 uses this label both when it originates traffic toward R2 and when it transits traffic for R2 (hence the Orig/Transit type). The outgoing interface is selected by SPF.
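Rules 1 and 2 as a small Python sketch, following the RFC 8402 model: the in-label comes from the local SRGB and the out-label from the next hop's SRGB. The function names are mine, and the SRGB starting at 30000 in the last line is hypothetical; it only shows why identical SRGBs make labels predictable across the domain.

```python
def in_label(local_srgb_start: int, sid_index: int) -> int:
    """Label a router installs in its own LFIB for a Prefix/Node SID."""
    return local_srgb_start + sid_index


def out_label(next_hop_srgb_start: int, sid_index: int) -> int:
    """Label a router pushes (or swaps to) when forwarding toward the next hop."""
    return next_hop_srgb_start + sid_index


LAB_SRGB_START = 20000  # identical on R1 and R2 in this lab

print(in_label(LAB_SRGB_START, 1))   # 20001: R1's own Node SID (Terminating)
print(in_label(LAB_SRGB_START, 2))   # 20002: R2's Node SID on R1 (Orig/Transit)
print(out_label(LAB_SRGB_START, 2))  # 20002: label sent to R2
print(out_label(30000, 2))           # 30002: if R2 used a hypothetical SRGB starting at 30000
```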
A:admin@R1# tools dump router segment-routing tunnel in-label 20002
===================================================================================================
Legend: (B) - Backup Next-hop for Fast Re-Route
(D) - Duplicate
label stack is ordered from top-most to bottom-most
===================================================================================================
--------------------------------------------------------------------------------------------------+
Prefix |
Sid-Type Fwd-Type In-Label Prot-Inst(algoId) |
Next Hop(s) Out-Label(s) Interface/Tunnel-ID |
--------------------------------------------------------------------------------------------------+
10.10.10.2
Node Orig/Transit 20002 ISIS-0
10.0.0.1 20002 to_R2
--------------------------------------------------------------------------------------------------+
No. of Entries: 1
--------------------------------------------------------------------------------------------------+
3) As we know, an Adjacency label has local scope. Routers install all their local Adjacency labels in the LFIB as in-labels with the Transit type. The outgoing interface is the one toward the IGP neighbor, and the out-label is 3 (implicit null). So if an LSR receives a packet with its own Adjacency label on top, it pops the label and forwards the packet out of that interface.
A:admin@R1# tools dump router segment-routing tunnel in-label 524287
===================================================================================================
Legend: (B) - Backup Next-hop for Fast Re-Route
(D) - Duplicate
label stack is ordered from top-most to bottom-most
===================================================================================================
--------------------------------------------------------------------------------------------------+
Prefix |
Sid-Type Fwd-Type In-Label Prot-Inst(algoId) |
Next Hop(s) Out-Label(s) Interface/Tunnel-ID |
--------------------------------------------------------------------------------------------------+
10.0.0.1
Adjacency Transit 524287 ISIS-0
10.0.0.1 3 to_R2
--------------------------------------------------------------------------------------------------+
No. of Entries: 1
--------------------------------------------------------------------------------------------------+
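The three rules boil down to one LFIB lookup per packet. Below is a Python sketch of R1's LFIB, with entries copied from the tools dump outputs above; the data model and the `lookup` function are my own simplification, not how SR-OS stores its LFIB.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

IMPLICIT_NULL = 3  # out-label 3 (implicit null) means "pop the top label"


@dataclass(frozen=True)
class LfibEntry:
    fwd_type: str              # "Terminating", "Orig/Transit" or "Transit"
    out_label: Optional[int]   # None for terminating entries
    interface: Optional[str]


# R1 entries as shown by "tools dump router segment-routing tunnel in-label ..."
R1_LFIB = {
    20001: LfibEntry("Terminating", None, None),
    20002: LfibEntry("Orig/Transit", 20002, "to_R2"),
    524287: LfibEntry("Transit", IMPLICIT_NULL, "to_R2"),
}


def lookup(lfib: dict, stack: List[int]) -> Tuple[str, List[int], Optional[str]]:
    """Apply the LFIB entry for the top label (stack[0]); return (action, stack, interface)."""
    top, rest = stack[0], stack[1:]
    entry = lfib.get(top)
    if entry is None:
        return "drop", stack, None          # unknown label
    if entry.fwd_type == "Terminating":
        return "terminate", rest, None      # this router is the segment endpoint
    if entry.out_label == IMPLICIT_NULL:
        return "pop", rest, entry.interface
    return "swap", [entry.out_label] + rest, entry.interface


print(lookup(R1_LFIB, [524287]))  # ('pop', [], 'to_R2')
print(lookup(R1_LFIB, [20002]))   # ('swap', [20002], 'to_R2')
```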
R1's tunnel table now contains SR-ISIS tunnels for the adjacency (10.0.0.1/32) and for R2's Node SID (10.10.10.2/32):
A:admin@R1# show router tunnel-table
===============================================================================
IPv4 Tunnel Table (Router: Base)
===============================================================================
Destination      Owner     Encap  TunnelId  Pref  Nexthop    Metric  Color
-------------------------------------------------------------------------------
10.0.0.1/32      isis (0)  MPLS   524289    11    10.0.0.1   0
10.10.10.2/32    isis (0)  MPLS   524290    11    10.0.0.1   10
-------------------------------------------------------------------------------
Let's check the SR-ISIS tunnel toward R2 with LSP ping:
A:R1# oam lsp-ping sr-isis prefix 10.10.10.2/32 send-count 3
LSP-PING 10.10.10.2/32: 80 bytes MPLS payload
Seq=1, send from intf to_R2, reply from 10.10.10.2
udp-data-len=32 ttl=255 rtt=3.25ms rc=3 (EgressRtr)
Seq=2, send from intf to_R2, reply from 10.10.10.2
udp-data-len=32 ttl=255 rtt=3.37ms rc=3 (EgressRtr)
Seq=3, send from intf to_R2, reply from 10.10.10.2
udp-data-len=32 ttl=255 rtt=3.25ms rc=3 (EgressRtr)
---- LSP 10.10.10.2/32 PING Statistics ----
3 packets sent, 3 packets received, 0.00% packet loss
round-trip min = 3.25ms, avg = 3.29ms, max = 3.37ms, stddev = 0.055ms
The packet capture shows that the echo requests carry a single label, 20002, with the bottom-of-stack bit set:
16:00:04.668820 MPLS (label 20002, exp 0, [S], ttl 255) IP 10.10.10.1.49166 > 127.0.0.1.3503: LSP-PINGv1, MPLS Echo Request, seq 1, length: 48
16:00:05.675398 MPLS (label 20002, exp 0, [S], ttl 255) IP 10.10.10.1.49166 > 127.0.0.1.3503: LSP-PINGv1, MPLS Echo Request, seq 2, length: 48
16:00:06.685005 MPLS (label 20002, exp 0, [S], ttl 255) IP 10.10.10.1.49166 > 127.0.0.1.3503: LSP-PINGv1, MPLS Echo Request, seq 3, length: 48
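Each "MPLS (label ..., exp ..., [S], ttl ...)" that tcpdump prints is one 32-bit label stack entry. A Python sketch that packs and unpacks it according to the RFC 3032 layout (the helper names are hypothetical):

```python
import struct


def encode_lse(label: int, tc: int = 0, bottom: bool = True, ttl: int = 255) -> bytes:
    """Pack one MPLS label stack entry (RFC 3032): Label(20) | TC(3) | S(1) | TTL(8)."""
    if not (0 <= label <= 0xFFFFF and 0 <= tc <= 7 and 0 <= ttl <= 255):
        raise ValueError("field out of range")
    word = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)


def decode_lse(data: bytes) -> dict:
    """Unpack the first 4 bytes into the same fields tcpdump prints."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "label": word >> 12,
        "tc": (word >> 9) & 0x7,        # tcpdump still calls this field "exp"
        "bottom": bool((word >> 8) & 0x1),
        "ttl": word & 0xFF,
    }


lse = encode_lse(20002)   # label 20002, exp 0, [S], ttl 255 as in the capture
print(lse.hex())          # 04e221ff
print(decode_lse(lse))    # {'label': 20002, 'tc': 0, 'bottom': True, 'ttl': 255}
```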
We can already use this MPLS infrastructure for service deployment, but I'm going to test more interesting scenarios and SR features.
Segment Routing Traffic Engineering
Well, I think we're done with the basic SR concepts. Now we can move on to SR-TE.
Every MPLS network (e.g. backbone, mobile backhaul, DCI) needs more flexible traffic steering tools than LDP or plain SR-MPLS can offer, and SR-TE provides them. Let's look at the theory first and then try it in the lab.
Application-Specific Link Attributes
What do we need to compute a TE LSP? A TE database.
Of course, current IGPs support TE extensions, so that's not a problem. But historically we had to keep RSVP-TE enabled just to get TE link attributes advertised (at least this is true for SR-OS, and I guess for other vendors too).
I mentioned control plane optimization as one of the reasons to move from RSVP/LDP to SR, yet here we come back to RSVP again. Fortunately, the IETF keeps working on this, and as a result we have RFC 8919 (for IS-IS). Now the IGP can advertise TE link attributes together with the applications allowed to use them: SR-TE, RSVP-TE, or both simultaneously. This removes the need to keep RSVP enabled.
Here's a new topology.
Let's investigate RFC 8919 (IS-IS Application-Specific Link Attributes) in the lab. We will look inside the IS-IS LSP originated by PE2 while changing PE2's configuration. For illustration, I've configured admin groups and SRLGs on PE2's interfaces.
1)
- RSVP is disabled
- "IS-IS application-specific link" knob is disabled.
Obviously, we don't see any link attributes, only the Adj-SIDs.
TE IS Nbrs :
Nbr : P4.00
Default Metric : 10
Sub TLV Len : 19
IF Addr : 10.0.0.15
Nbr IP : 10.0.0.14
Adj-SID: Flags:v4VL Weight:0 Label:524285
TE IS Nbrs :
Nbr : P3.00
Default Metric : 10
Sub TLV Len : 19
IF Addr : 10.0.0.13
Nbr IP : 10.0.0.12
Adj-SID: Flags:v4VL Weight:0 Label:524284
2)
- RSVP is enabled
- "IS-IS application-specific link" knob is disabled.
Here we can see the standard set of RSVP-TE link attributes: bandwidth, admin groups, TE metric and SRLGs. But I don't want to keep RSVP-TE enabled, so let's go further.
TE IS Nbrs :
Nbr : P4.00
Default Metric : 10
Sub TLV Len : 76
IF Addr : 10.0.0.15
Nbr IP : 10.0.0.14
MaxLink BW: 99999997 kbps
Resvble BW: 99999997 kbps
Unresvd BW:
BW[0] : 99999997 kbps BW[1] : 99999997 kbps
BW[2] : 99999997 kbps BW[3] : 99999997 kbps
BW[4] : 99999997 kbps BW[5] : 99999997 kbps
BW[6] : 99999997 kbps BW[7] : 99999997 kbps
Admin Grp : 0x100000
TE Metric : 10
Adj-SID: Flags:v4VL Weight:0 Label:524285
TE SRLGs :
Nbr : P4.00
Lcl Addr : 10.0.0.15
Rem Addr : 10.0.0.14
Num SRLGs : 1
20
TE IS Nbrs :
Nbr : P3.00
Default Metric : 10
Sub TLV Len : 76
IF Addr : 10.0.0.13
Nbr IP : 10.0.0.12
MaxLink BW: 99999997 kbps
Resvble BW: 99999997 kbps
Unresvd BW:
BW[0] : 99999997 kbps BW[1] : 99999997 kbps
BW[2] : 99999997 kbps BW[3] : 99999997 kbps
BW[4] : 99999997 kbps BW[5] : 99999997 kbps
BW[6] : 99999997 kbps BW[7] : 99999997 kbps
Admin Grp : 0x400
TE Metric : 10
Adj-SID: Flags:v4VL Weight:0 Label:524284
TE SRLGs :
Nbr : P3.00
Lcl Addr : 10.0.0.13
Rem Addr : 10.0.0.12
Num SRLGs : 1
10
3)
- RSVP is disabled
- "IS-IS application-specific link" knob is enabled.
This time we can see "TE APP LINK ATTR" and "TE APP SRLGs". Pay attention to "SABM-flags: S": this is the S-bit (Segment Routing Policy) of the Standard Application Identifier Bit Mask defined in RFC 8919. It tells head-ends that they may use these link attributes for SR-TE LSP calculation.
TE IS Nbrs :
Nbr : P4.00
Default Metric : 10
Sub TLV Len : 36
IF Addr : 10.0.0.15
Nbr IP : 10.0.0.14
TE APP LINK ATTR :
SABML-flag:Non-Legacy SABM-flags: S
MaxLink BW: 99999997 kbps
Admin Grp : 0x100000
Adj-SID: Flags:v4VL Weight:0 Label:524285
TE APP SRLGs :
Nbr : P4.00
SABML-flag:Non-Legacy SABM-flags: S
IF Addr : 10.0.0.15
Nbr IP : 10.0.0.14
Num SRLGs : 1
SRLGs : 20
TE IS Nbrs :
Nbr : P3.00
Default Metric : 10
Sub TLV Len : 36
IF Addr : 10.0.0.13
Nbr IP : 10.0.0.12
TE APP LINK ATTR :
SABML-flag:Non-Legacy SABM-flags: S
MaxLink BW: 99999997 kbps
Admin Grp : 0x400
Adj-SID: Flags:v4VL Weight:0 Label:524284
TE APP SRLGs :
Nbr : P3.00
SABML-flag:Non-Legacy SABM-flags: S
IF Addr : 10.0.0.13
Nbr IP : 10.0.0.12
Num SRLGs : 1
SRLGs : 10
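A Python sketch of how a head-end could interpret the Standard Application Identifier Bit Mask and keep only the attribute sets that SR-TE may use. The bit values follow RFC 8919 (bit 0 is the most significant bit); the dictionary layout and function names are hypothetical, and the PE2-P4 values are copied from the output above.

```python
from typing import List

# Standard Application Identifier Bit Mask (RFC 8919).
# Bit 0 is the most significant bit of the first octet.
SABM_BITS = {
    0x80: "R (RSVP-TE)",
    0x40: "S (Segment Routing Policy)",
    0x20: "F (Loop-Free Alternate)",
}


def decode_sabm(first_octet: int) -> List[str]:
    """Return the applications enabled in the first SABM octet."""
    return [name for bit, name in SABM_BITS.items() if first_octet & bit]


def attrs_for_sr_te(advertisements: List[dict]) -> List[dict]:
    """Keep only the attribute sets a head-end may use for SR-TE path computation."""
    return [adv for adv in advertisements if adv["sabm"] & 0x40]


# PE2 -> P4 link as advertised with the knob enabled ("SABM-flags: S")
pe2_p4 = [{"sabm": 0x40, "max_link_bw_kbps": 99999997, "admin_group": 0x100000, "srlgs": [20]}]
print(decode_sabm(0x40))                          # ['S (Segment Routing Policy)']
print(attrs_for_sr_te(pe2_p4)[0]["admin_group"])  # 1048576 (0x100000)
```

An attribute set advertised only for RSVP-TE (R-bit) would be filtered out, which is exactly why the S-bit matters once RSVP is switched off.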
In the next section, I'm going to investigate how the head-end uses the TE DB.
SR-TE LSP
We use the CSPF computation method for an LSP when we want to apply path constraints. Nothing new here.
First, we will examine an LSP without CSPF to better understand SR-TE path calculation.
Let's create an SR-TE LSP from PE2 to PE1 (10.10.10.10). The path is "totally loose": it has no hops configured.
/configure { router "Base" mpls path "loose" }
/configure { router "Base" mpls path "loose" admin-state enable }
/configure { router "Base" mpls lsp "to_PE1_no_CSPF" }
/configure { router "Base" mpls lsp "to_PE1_no_CSPF" admin-state enable }
/configure { router "Base" mpls lsp "to_PE1_no_CSPF" type p2p-sr-te }
/configure { router "Base" mpls lsp "to_PE1_no_CSPF" to 10.10.10.10 }
/configure { router "Base" mpls lsp "to_PE1_no_CSPF" primary "loose" }
*A:PE2# oam lsp-ping sr-te "to_PE1_no_CSPF" send-count 3 detail
LSP-PING to_PE1_no_CSPF: 80 bytes MPLS payload
Seq=1, send from intf to_P3, reply from 10.10.10.10
udp-data-len=32 ttl=255 rtt=5.78ms rc=3 (EgressRtr)
Seq=2, send from intf to_P3, reply from 10.10.10.10
udp-data-len=32 ttl=255 rtt=5.55ms rc=3 (EgressRtr)
Seq=3, send from intf to_P3, reply from 10.10.10.10
udp-data-len=32 ttl=255 rtt=6.77ms rc=3 (EgressRtr)
---- LSP to_PE1_no_CSPF PING Statistics ----
3 packets sent, 3 packets received, 0.00% packet loss
round-trip min = 5.55ms, avg = 6.03ms, max = 6.77ms, stddev = 0.528ms
Without CSPF, PE2 pushes a single label, 20010, which is PE1's Node SID label (SRGB start 20000 + PE1's SID index 10); the IGP shortest path does the rest:
12:52:12.013172 MPLS (label 20010, exp 0, [S], ttl 255) IP 10.10.10.20.49163 > 127.0.0.1.3503: LSP-PINGv1, MPLS Echo Request, seq 1, length: 48
12:52:13.016037 MPLS (label 20010, exp 0, [S], ttl 255) IP 10.10.10.20.49163 > 127.0.0.1.3503: LSP-PINGv1, MPLS Echo Request, seq 2, length: 48
12:52:14.026498 MPLS (label 20010, exp 0, [S], ttl 255) IP 10.10.10.20.49163 > 127.0.0.1.3503: LSP-PINGv1, MPLS Echo Request, seq 3, length: 48
Now let's enable CSPF and see how the label stack changes. The LSP again uses the "totally loose" path.
/configure { router "Base" mpls lsp "to_PE1_CSPF" }
/configure { router "Base" mpls lsp "to_PE1_CSPF" admin-state enable }
/configure { router "Base" mpls lsp "to_PE1_CSPF" type p2p-sr-te }
/configure { router "Base" mpls lsp "to_PE1_CSPF" to 10.10.10.10 }
/configure { router "Base" mpls lsp "to_PE1_CSPF" path-computation-method local-cspf }
/configure { router "Base" mpls lsp "to_PE1_CSPF" primary "loose" }
First, let's look at the control plane and how PE2 calculates the label stack.
The following output shows the label stack. To understand it fully, we will then look inside the IS-IS database and cover all segments one by one.
show router mpls sr-te-lsp "to_PE1_CSPF" path detail | match "Actual Hops" post-lines 3
Actual Hops :
10.0.0.14(10.10.10.4)(A-SID) Record Label : 524284
-> 10.0.0.8(10.10.10.2)(A-SID) Record Label : 524287
-> 10.0.0.2(10.10.10.10)(A-SID) Record Label : 524287
The first segment is PE2-P4: PE2's own Adjacency label toward P4. As we will see in the next steps, PE2 doesn't push this label; it only uses it to select the outgoing interface toward P4.
*A:PE2# show router isis database PE2.00-00 detail
TE IS Nbrs :
Nbr : P4.00
Default Metric : 10
Sub TLV Len : 30
IF Addr : 10.0.0.15
Nbr IP : 10.0.0.14
TE APP LINK ATTR :
SABML-flag:Non-Legacy SABM-flags: S
MaxLink BW: 99999997 kbps
Adj-SID: Flags:v4VL Weight:0 Label:524284
The second segment is P4-P2: the Adjacency label allocated by P4 toward P2.
*A:PE2# show router isis database P4.00-00 detail
TE IS Nbrs :
Nbr : P2.00
Default Metric : 10
Sub TLV Len : 30
IF Addr : 10.0.0.9
Nbr IP : 10.0.0.8
TE APP LINK ATTR :
SABML-flag:Non-Legacy SABM-flags: S
MaxLink BW: 99999997 kbps
Adj-SID: Flags:v4VL Weight:0 Label:524287
The third segment is P2-PE1: the Adjacency label allocated by P2 toward PE1.
*A:PE2# show router isis database P2.00-00 detail
TE IS Nbrs :
Nbr : PE1.00
Default Metric : 10
Sub TLV Len : 30
IF Addr : 10.0.0.3
Nbr IP : 10.0.0.2
TE APP LINK ATTR :
SABML-flag:Non-Legacy SABM-flags: S
MaxLink BW: 99999997 kbps
Adj-SID: Flags:v4VL Weight:0 Label:524287
Let's take a look at the forwarding plane tunnel table. As we can see, PE2 is going to push a label stack that contains only two labels (instructions): 524287/524287.
A:admin@PE2# show router fp-tunnel-table 1 protocol sr-te
===============================================================================
IPv4 Tunnel Table Display
Legend:
label stack is ordered from bottom-most to top-most
B - FRR Backup
===============================================================================
Destination Protocol Tunnel-ID
Lbl
NextHop Intf/Tunnel
Lbl (backup)
NextHop (backup)
-------------------------------------------------------------------------------
10.10.10.10/32 SR-TE 655363
524287/524287
10.0.0.14 SR
-------------------------------------------------------------------------------
Total Entries : 1
-------------------------------------------------------------------------------
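How PE2 turns the CSPF hop list into this two-label stack can be sketched in Python. This mirrors the behaviour observed in the lab (the first, local Adj-SID only selects the outgoing interface) and is not SR-OS's actual code; the function name and the `Hop` tuple are hypothetical.

```python
from typing import List, Tuple

Hop = Tuple[str, str, int]  # (from_node, to_node, Adj-SID label allocated by from_node)


def sr_te_label_stack(hops: List[Hop]) -> Tuple[str, List[int]]:
    """Turn a CSPF hop list into (first next hop, label stack with stack[0] on top).

    The head-end's own first Adj-SID only selects the outgoing interface
    (its out-label is 3), so only the downstream routers' Adj-SIDs are pushed.
    """
    if not hops:
        raise ValueError("empty path")
    (_, first_next_hop, _), *downstream = hops
    return first_next_hop, [label for _, _, label in downstream]


hops = [("PE2", "P4", 524284), ("P4", "P2", 524287), ("P2", "PE1", 524287)]
next_hop, stack = sr_te_label_stack(hops)
print(next_hop, stack)                                     # P4 [524287, 524287]
# fp-tunnel-table prints the stack bottom-most first, separated by "/"
print("/".join(str(label) for label in reversed(stack)))  # 524287/524287
```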
Now let's follow the data path.
PE2 doesn't push the first label, 524284: as the Out-Label(s) column shows, its out-label is 3 (implicit null), so PE2 simply sends the packet with the two remaining labels toward P4.
*A:PE2# tools dump router segment-routing tunnel in-label 524284
===================================================================================================
Legend: (B) - Backup Next-hop for Fast Re-Route
(D) - Duplicate
label stack is ordered from top-most to bottom-most
===================================================================================================
--------------------------------------------------------------------------------------------------+
Prefix |
Sid-Type Fwd-Type In-Label Prot-Inst(algoId) |
Next Hop(s) Out-Label(s) Interface/Tunnel-ID |
--------------------------------------------------------------------------------------------------+
10.0.0.14
Adjacency Transit 524284 ISIS-0
10.0.0.14 3 to_P4
--------------------------------------------------------------------------------------------------+
No. of Entries: 1
--------------------------------------------------------------------------------------------------+
P4 pops the top label, 524287, and forwards the packet to P2.
A:P4# tools dump router segment-routing tunnel in-label 524287
===================================================================================================
Legend: (B) - Backup Next-hop for Fast Re-Route
(D) - Duplicate
label stack is ordered from top-most to bottom-most
===================================================================================================
--------------------------------------------------------------------------------------------------+
Prefix |
Sid-Type Fwd-Type In-Label Prot-Inst(algoId) |
Next Hop(s) Out-Label(s) Interface/Tunnel-ID |
--------------------------------------------------------------------------------------------------+
10.0.0.8
Adjacency Transit 524287 ISIS-0
10.0.0.8 3 to_P2
--------------------------------------------------------------------------------------------------+
No. of Entries: 1
--------------------------------------------------------------------------------------------------+
P2 does the same: it pops the top label, 524287, and forwards the packet to PE1. This is the end of the packet's journey.
A:P2# tools dump router segment-routing tunnel in-label 524287
===================================================================================================
Legend: (B) - Backup Next-hop for Fast Re-Route
(D) - Duplicate
label stack is ordered from top-most to bottom-most
===================================================================================================
--------------------------------------------------------------------------------------------------+
Prefix |
Sid-Type Fwd-Type In-Label Prot-Inst(algoId) |
Next Hop(s) Out-Label(s) Interface/Tunnel-ID |
--------------------------------------------------------------------------------------------------+
10.0.0.2
Adjacency Transit 524287 ISIS-0
10.0.0.2 3 to_PE1
--------------------------------------------------------------------------------------------------+
No. of Entries: 1
--------------------------------------------------------------------------------------------------+
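The whole walk from P4 to PE1 can be replayed with a small Python simulation. The LFIB entries are copied from the P4 and P2 outputs above; the data model and the `forward` function are a hypothetical simplification, and no service label is modelled, so the packet arrives at PE1 with an empty stack.

```python
from typing import Dict, List, Tuple

IMPLICIT_NULL = 3

# in-label -> (out-label, next node), from the tools dump outputs on P4 and P2.
# The same value 524287 means a different adjacency on each router (local scope).
LFIBS: Dict[str, Dict[int, Tuple[int, str]]] = {
    "P4": {524287: (IMPLICIT_NULL, "P2")},
    "P2": {524287: (IMPLICIT_NULL, "PE1")},
}


def forward(first_hop: str, stack: List[int]) -> Tuple[str, List[Tuple[str, int, str]]]:
    """Follow a labelled packet hop by hop until the label stack is empty."""
    node, stack, trace = first_hop, list(stack), []
    while stack:
        top = stack[0]
        if top not in LFIBS.get(node, {}):
            raise LookupError(f"{node} has no LFIB entry for label {top}")
        out_label, next_node = LFIBS[node][top]
        # out-label 3 (implicit null) pops; any other value swaps the top label.
        stack = stack[1:] if out_label == IMPLICIT_NULL else [out_label] + stack[1:]
        trace.append((node, top, next_node))
        node = next_node
    return node, trace


# PE2 sends the packet to P4 with the stack [524287, 524287] (top first).
destination, trace = forward("P4", [524287, 524287])
print(destination)  # PE1
for hop in trace:
    print(hop)      # ('P4', 524287, 'P2') then ('P2', 524287, 'PE1')
```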
When we use the local-cspf method, the head-end calculates the full path to the target FEC, much like RSVP-TE with an ERO. The head-end builds the complete instruction list from Adjacency segments. We've examined only an LSP with a loose path, but the control and data plane concepts are the same for LSPs with other path constraints:
1) Routers populate the IGP and TE databases.
2) The head-end calculates the label stack according to the path constraints.
3) The head-end pushes the labels and forwards packets toward the target FEC.
4) Transit LSRs do their usual job: look up the LFIB and forward packets.
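Steps 1 and 2 on the head-end can be illustrated with a small Python sketch of prune-then-Dijkstra CSPF with a single exclude-any admin-group constraint. This is a textbook simplification, not SR-OS's local-cspf implementation. Only PE2's admin groups and the Adj-SIDs along PE2-P4-P2-PE1 come from the lab; node P1, the other links and their labels are hypothetical, and admin groups are treated as link-wide rather than per direction.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# (node_a, node_b, IGP metric, admin-group bits). Hypothetical topology: only PE2's
# admin groups (0x100000 toward P4, 0x400 toward P3) come from the lab.
LINKS = [
    ("PE2", "P4", 10, 0x100000),
    ("PE2", "P3", 10, 0x400),
    ("P4", "P2", 10, 0),
    ("P3", "P1", 10, 0),
    ("P2", "PE1", 10, 0),
    ("P1", "PE1", 10, 0),
]
# Adj-SID allocated by the first node of each directed link. The PE2-P4-P2-PE1 values
# come from the lab; the PE2-P3-P1-PE1 values are hypothetical.
ADJ_SIDS = {
    ("PE2", "P4"): 524284, ("P4", "P2"): 524287, ("P2", "PE1"): 524287,
    ("PE2", "P3"): 524285, ("P3", "P1"): 524287, ("P1", "PE1"): 524287,
}


def cspf(src: str, dst: str, exclude_any: int = 0) -> Optional[Tuple[int, List[str]]]:
    """Prune links carrying any excluded admin group, then run Dijkstra on the rest."""
    graph: Dict[str, List[Tuple[str, int]]] = {}
    for a, b, metric, groups in LINKS:
        if groups & exclude_any:
            continue
        graph.setdefault(a, []).append((b, metric))
        graph.setdefault(b, []).append((a, metric))
    queue = [(0, src, [src])]
    done = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for neighbor, metric in graph.get(node, []):
            if neighbor not in done:
                heapq.heappush(queue, (cost + metric, neighbor, path + [neighbor]))
    return None


def label_stack(path: List[str]) -> List[int]:
    """Adj-SIDs along the path minus the head-end's own first one (top label first)."""
    return [ADJ_SIDS[(a, b)] for a, b in zip(path[1:], path[2:])]


cost, path = cspf("PE2", "PE1", exclude_any=0x400)   # avoid the PE2-P3 link
print(cost, path)           # 30 ['PE2', 'P4', 'P2', 'PE1']
print(label_stack(path))    # [524287, 524287]
```

A real CSPF also evaluates bandwidth, SRLG and other constraints from the TE DB, but the shape is the same: filter the graph first, then run SPF on what is left.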
Conclusion
Scrolling down, I see a lot of CLI output, which can be boring, but I tried to explain the SR control and data plane as clearly as possible. And again, my advice: get your hands dirty! Build your own labs; it will help you understand SR concepts (or any other concepts) better than any book or blog post. You never know when you will have to deal with a new technology.
SR remains a hot topic and development is still ongoing. There are many interesting SR features left uncovered here: SR policies, ECMP, TI-LFA, seamless BFD, etc. Feedback is always welcome, and thank you for reading.