ECMP next hop on Juniper M, T and SRX series routers

If, like me, you have to jump around customer requirements, you may one day find yourself needing to utilise the capacity of two or more links between locations. My preference is to bond my uplinks with LACP (802.1AX, formerly 802.3ad) and let the upstream provider deal with the rest. Sometimes the providers let you down: they can’t run LACP from their edge device to you, and they can’t transit your LACP frames so that you can run your own LACP between locations. Sometimes you also have multiple links from different providers.

In this situation your last resort is Equal Cost Multi Path (ECMP) next hop. If you have two or more routes to the same destination in your routing table with exactly the same metrics, and none is more preferred, an ECMP decision is triggered. On Juniper routing platforms the default behaviour is quite rudimentary: one of the next hops is chosen (at random or based on src/dst hashes) for each route and installed in the FIB (the forwarding table used by the forwarding hardware). This means that the effectiveness of the traffic spread is limited by the number of routes in your table in a particular direction.
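To see why the route count matters, here is a minimal Python sketch of that default behaviour. It is not how Junos is implemented: the link names, the function names and the "network address modulo link count" selection are all assumptions made for illustration. Each prefix is pinned to exactly one of its equal-cost next hops, so a lone default route can only ever use one link, while a pile of /32s spreads out.

```python
import ipaddress
from collections import Counter

LINKS = ["link-a", "link-b"]  # hypothetical equal-cost next hops


def pick_next_hop(prefix, links=LINKS):
    """Pin a prefix to a single next hop.

    Taking the network address modulo the link count is an assumed
    stand-in for whatever selection the router really makes.
    """
    network = ipaddress.ip_network(prefix)
    return links[int(network.network_address) % len(links)]


def build_fib(prefixes, links=LINKS):
    """Install one next hop per prefix, as happens without a forwarding-table policy."""
    return {prefix: pick_next_hop(prefix, links) for prefix in prefixes}


def links_in_use(fib):
    """Count how many prefixes were pinned to each link."""
    return Counter(fib.values())


# Towards the "remote" node: lots of individual /32 customer routes.
customer_routes = [f"10.0.0.{i}/32" for i in range(200)]
# From the "remote" node: nothing but a default route.
default_only = ["0.0.0.0/0"]

print(links_in_use(build_fib(customer_routes)))  # prefixes land on both links
print(links_in_use(build_fib(default_only)))     # one prefix, so one link
```

With 200 routes both links carry traffic. With a single default route the second link sits idle no matter how many flows follow that route.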

My typical implementation involves running OSPF between routers on each link with identical metrics. From the “remote” end of the network I do not aggregate the advertised prefixes, as this would reduce the pool of routes, and instead advertise all prefixes individually. This is often a whole bunch of /32 point-to-point customer IPs, which is partly why I choose OSPF for this.

Advertising from the core, however, is a bit more of a problem. Here you are typically advertising mainly the default route. There may be some peering routes that you have on either end and you may include those too, but you typically do not want to send a full table to some remote end of the network, as usually the reason you are here in the first place is that you are resource constrained.

The practical upshot is that traffic will balance OK in the direction towards the “remote” node, but very little or not at all inbound from it. Traffic towards the “remote” node is typically the “download” direction, which usually carries most of the load in any case, but the situation is not ideal.

To achieve a better spread, and to not have to worry too much about how many routes you are using, you need to implement a policy on the forwarding table. I know it sounds like I made that up, but yes, that’s a real thing. If you do not do this, your traffic spread/diversity will be constrained by the points discussed above.

So we create the policy:

set policy-options policy-statement my-default-balancing-policy then load-balance consistent-hash

And then apply it to the forwarding table:

set routing-options forwarding-table export my-default-balancing-policy

This will now let your traffic use all of the equal-cost next hops for a route instead of just the selected one.
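As a rough picture of what changes, the following Python sketch models a forwarding table entry that keeps every equal-cost next hop, with the choice made per flow rather than per route. Everything here is an assumption for illustration (the dictionary layout, the link names, the addresses and the simple additive hash), not the actual Junos data structures or algorithm.

```python
import ipaddress
from collections import Counter

# Hypothetical FIB entry: with the policy applied, the default route keeps
# both equal-cost next hops instead of a single selected one.
FIB = {"0.0.0.0/0": ["link-a", "link-b"]}


def forward(src, dst, proto, prefix="0.0.0.0/0", fib=FIB):
    """Pick one of the installed next hops for a flow.

    Adding the addresses and protocol number together is an assumed
    stand-in for the real hash; it only needs to be stable per flow.
    """
    next_hops = fib[prefix]
    key = int(ipaddress.ip_address(src)) + int(ipaddress.ip_address(dst)) + proto
    return next_hops[key % len(next_hops)]


# 200 hypothetical customer hosts, all following the same default route
# to one destination over TCP (protocol 6).
flows = [(f"10.0.0.{i}", "203.0.113.1", 6) for i in range(200)]
spread = Counter(forward(*flow) for flow in flows)

print(spread)  # both links are used even though there is only one route
```

The point is that the number of routes no longer limits the spread: the number of distinct flows does.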

Your two balancing options are consistent-hash and per-packet. Per-packet will send packets down each link in a round-robin fashion and will result in a nearly perfect load spread. However, this causes out-of-order packet delivery between the sites, as there will always be performance differences between the links, which is why I never use it. The performance impact of out-of-order packets, on TCP specifically, is significant. (Do check the behaviour on your platform: Juniper’s documentation notes that on newer forwarding hardware the per-packet keyword actually balances per flow.) Consistent-hash looks at the IP source, destination and protocol fields of the traffic and uses those values to calculate which link to use. This is good at keeping each traffic flow on one path and packet delivery in order.
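The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration: the link names, the host addresses, the key layout and the use of CRC32 are my assumptions, not the hash Junos actually computes.

```python
import itertools
import zlib

LINKS = ["link-a", "link-b"]  # hypothetical equal-cost next hops


def hash_link(src, dst, proto, links=LINKS):
    """Choose a link from the source, destination and protocol fields.

    CRC32 is an assumed stand-in hash; the point is only that the same
    flow key always maps to the same link.
    """
    key = f"{src}|{dst}|{proto}".encode()
    return links[zlib.crc32(key) % len(links)]


def round_robin_links(packet_count, links=LINKS):
    """Alternate links packet by packet, ignoring which flow a packet belongs to."""
    return list(itertools.islice(itertools.cycle(links), packet_count))


# Five packets of one TCP flow (protocol 6) between two hypothetical hosts.
flow = ("192.0.2.10", "198.51.100.20", 6)
hashed = [hash_link(*flow) for _ in range(5)]
sprayed = round_robin_links(5)

print(set(hashed))  # a single link: the flow's packets stay on one path
print(sprayed)      # alternating links: packets of the flow can overtake each other
```

Round robin gives the even spread, but consecutive packets of the same flow travel over links with different latency, which is where the reordering comes from. Hashing trades some evenness for keeping each flow on one path.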

ECMP algorithm choice on the MX series platform is performed quite differently, but many of the points discussed above are still valid. This is to be expected, as the MX is a routing and switching platform, so hashing at multiple layers (L2/L3/L4) is possible. There are many more options to consider and we will leave that for another time.

A final note: the above hash uses L3 information as its key, and on an MPLS-enabled network this may not be enough. You can also set ECMP options for MPLS with the following statement.

set chassis maximum-ecmp 16

Options are 16/32/64 and allow for up to that many alternate LSPs to load balance across (that’s if you have multiple LSPs to your destination).