How to ensure business continuity through WAN redundancy

Written by Daniel Noworatzky | Sep 18, 2019 2:48:00 PM

For most organizations these days, WAN connectivity is so mission-critical that investing in a redundant WAN infrastructure is well worth the extra cost and effort. In this article, we explore the available options for WAN redundancy so you can help your customers decide which is the best for them.

WAN technologies

The term Wide Area Network (WAN) refers to the physical circuit, and the related technology, through which a particular building or campus connects to the internet. It also refers to the technology through which remote sites are interconnected. WAN technologies that deliver internet connectivity include cable modem, xDSL, fiber optic connections, and wireless microwave links. WAN technologies used to interconnect remote sites in a multisite deployment include VPNs, Metro Ethernet, and MPLS. Regardless of the technology or medium, and regardless of whether it carries internet or inter-site traffic, the WAN is the network's gateway to the outside world. Its availability is therefore critical: without it, internal users cannot reach anything outside the local network.

The WAN connection is always located at the edge of the local network, that is, at the border between the internal enterprise network and the outside world.

Why redundancy?

Why bother implementing a redundant WAN topology? Isn't it the ISP's responsibility to provide you with a resilient network? Well, yes, but no matter how robust an ISP's network is, it can still fail. In addition, the failure may occur in a component of your own network rather than in the ISP's. Redundancy ensures that, whatever happens (power failure, edge router failure, ISP failure), connectivity both to and from the outside world is maintained. A disruption in WAN services can be costly. For example, if a company with 1,000 employees who rely on the internet or VoIP services experiences 15 minutes of WAN downtime, that translates into 250 lost man-hours (1,000 × 15 minutes). If each hour of work costs $50 on average, that's $12.5K down the drain. The larger the company, the bigger the potential losses, and the more beneficial redundancy becomes.

Think of it as an insurance policy. In the event of a failure, traffic moves to the surviving path, so downtime is limited to the time the failover takes rather than the time the repair takes.
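The downtime arithmetic from the cost example above can be captured in a small helper, which also makes it easy to compare the yearly cost of a redundant WAN against the outages it would avoid. This is a minimal sketch: the function names are hypothetical, and the comparison makes the simplifying assumption that redundancy removes the downtime entirely (ignoring failover time).

```python
def downtime_cost(employees: int, outage_minutes: float, cost_per_hour: float) -> tuple[float, float]:
    """Return (lost person-hours, lost money) for a single WAN outage."""
    lost_hours = employees * outage_minutes / 60
    return lost_hours, lost_hours * cost_per_hour


def redundancy_pays_off(annual_redundancy_cost: float, outages_per_year: float,
                        employees: int, outage_minutes: float, cost_per_hour: float) -> bool:
    """Compare the yearly cost of redundancy with the yearly downtime cost it avoids.

    Simplifying assumption: redundancy eliminates the outage time completely.
    """
    _, cost_per_outage = downtime_cost(employees, outage_minutes, cost_per_hour)
    return outages_per_year * cost_per_outage > annual_redundancy_cost


hours, cost = downtime_cost(employees=1_000, outage_minutes=15, cost_per_hour=50)
print(f"{hours:.0f} lost hours, ${cost:,.0f}")  # 250 lost hours, $12,500
```

Plugging in your own outage history (how often the WAN went down and for how long) turns the "How much does a WAN disruption cost me?" question into a concrete number.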

WAN redundancy deployment scenarios

WAN redundancy can be applied in multiple ways, depending on the type of WAN connection being used. These scenarios include:

SD-WAN

Software-defined WAN (SD-WAN) technology is one way of achieving WAN redundancy. It involves terminating multiple WAN links on one or more SDN (software-defined networking) devices. The SDN device uses algorithms to distribute WAN traffic across all available links, which provides both load balancing and redundancy.
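Real SD-WAN products use their own, vendor-specific steering policies, but the idea of combining load balancing with redundancy can be sketched as follows. In this hypothetical example, each link is probed for packet loss and latency, links that exceed the (assumed) thresholds are excluded, and each new flow is placed on the healthy link with the most spare capacity.

```python
from dataclasses import dataclass

# Hypothetical health thresholds; real SD-WAN devices typically let you tune these.
MAX_LOSS_PCT = 2.0
MAX_LATENCY_MS = 150.0


@dataclass
class WanLink:
    name: str
    capacity_mbps: float
    loss_pct: float       # packet loss measured by periodic probes
    latency_ms: float     # round-trip latency measured by the same probes
    load_mbps: float = 0.0

    def healthy(self) -> bool:
        return self.loss_pct < MAX_LOSS_PCT and self.latency_ms < MAX_LATENCY_MS


def place_flow(links: list[WanLink], flow_mbps: float) -> str:
    """Assign a new flow to the healthy link with the most spare capacity."""
    candidates = [link for link in links if link.healthy()]
    if not candidates:
        raise RuntimeError("no healthy WAN link available")
    # Load balancing: prefer the link with the most headroom (ties go to the first link).
    best = max(candidates, key=lambda link: link.capacity_mbps - link.load_mbps)
    best.load_mbps += flow_mbps
    return best.name


links = [WanLink("cable", 100, loss_pct=0.1, latency_ms=20),
         WanLink("dsl", 100, loss_pct=0.2, latency_ms=30)]
print([place_flow(links, 10) for _ in range(4)])  # ['cable', 'dsl', 'cable', 'dsl']

links[0].loss_pct = 8.0        # the cable link degrades...
print(place_flow(links, 10))   # ...so the new flow lands on dsl
```

While both links are healthy, flows alternate between them (load balancing); once one link degrades, new traffic avoids it (redundancy).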

Multiple WAN connection scenarios

Having two or more WAN connections increases redundancy for outbound services (services hosted on the internet). Think web, email, cloud services, and hosted VoIP, to name a few. Such a scenario requires either equal-cost multipath (ECMP) routing, a feature of dynamic routing protocols that distributes traffic across multiple connections of equal cost, or a gateway redundancy protocol such as HSRP or VRRP, which lets internal devices keep using a single virtual default gateway address while traffic fails over between edge routers and their WAN links.
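The two mechanisms mentioned above can be illustrated with a short sketch. The ECMP function hashes a flow's 5-tuple so that every packet of one flow uses the same link while different flows spread across links; the hash function and modulo bucketing are assumptions, since routers use their own, platform-specific hashing. The VRRP function shows only the election rule (highest priority wins, a tie goes to the higher IP address); real VRRP also involves advertisement timers and preemption settings, which are omitted here.

```python
import hashlib
import ipaddress
from dataclasses import dataclass
from typing import Optional


def ecmp_next_hop(flow: tuple, next_hops: list[str]) -> str:
    """Choose an equal-cost next hop by hashing the flow's 5-tuple.

    Every packet of a flow hashes to the same link (no reordering),
    while different flows spread across all usable links.
    """
    if not next_hops:
        raise RuntimeError("no usable next hop")
    key = "|".join(str(field) for field in flow).encode()
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return next_hops[bucket % len(next_hops)]


@dataclass
class VrrpRouter:
    name: str
    priority: int   # 1-254 for routers that do not own the virtual address
    address: str    # the router's own interface address
    alive: bool = True


def vrrp_master(routers: list[VrrpRouter]) -> Optional[VrrpRouter]:
    """Highest priority wins; a tie goes to the higher interface IP address."""
    live = [r for r in routers if r.alive]
    if not live:
        return None
    return max(live, key=lambda r: (r.priority, ipaddress.ip_address(r.address)))


flow = ("10.0.0.5", "198.51.100.7", "tcp", 50000, 443)
print(ecmp_next_hop(flow, ["wan1", "wan2"]))  # same link every time for this flow
print(ecmp_next_hop(flow, ["wan2"]))          # wan1 down: the flow moves to wan2

edge1 = VrrpRouter("edge1", priority=110, address="192.0.2.2")
edge2 = VrrpRouter("edge2", priority=100, address="192.0.2.3")
print(vrrp_master([edge1, edge2]).name)  # edge1
edge1.alive = False
print(vrrp_master([edge1, edge2]).name)  # edge2 takes over the virtual gateway
```

In both cases, internal hosts need no reconfiguration when a link or edge router fails; the routers make the switch on their behalf.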

There are several WAN configurations that can be applied at the edge of the network to deliver various levels of redundancy. These configurations are further described below:

Single homed – This is a typical connection where you have a single edge device on the enterprise network connected to a single ISP device. Such a configuration does not provide redundancy at any level.

Dual homed – This scenario adds a second physical connection between the enterprise edge device and the ISP's device. It provides redundancy only against the failure of one of the links; the enterprise edge device and the ISP device remain single points of failure.

Additional dual homed scenarios, shown below, increase redundancy by duplicating the edge and ISP devices. Notice, however, that the service is still provided by a single ISP.

Single multi-homed – This scenario involves a single enterprise edge device that connects to two independent ISPs. If one of the two ISPs fails, connectivity is maintained through the other.

The following is also a single multi-homed scenario, this time with redundant enterprise edge devices, so the enterprise edge device is no longer a single point of failure.

Dual multi-homed – This is the scenario with the highest level of redundancy, since it has dual connections to each of two ISPs.

Additional dual multi-homed scenarios can be seen below with redundant enterprise network edge devices.

Each duplication in the above topologies adds to the level of redundancy provided. Duplicating the enterprise edge devices means that a device failure does not cause a loss of connectivity. Duplicating the links between each edge device and ISP router means that a link failure does not cause a loss of connectivity. Using two independent ISPs means that even if the whole network of one ISP fails (which has been known to happen more than once), connectivity is maintained.
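One way to see what each duplication buys is to model a topology as a graph and check which single component, when it fails, cuts the LAN off from the internet. In this sketch the node names are hypothetical, each "isp" node stands for the ISP's router and network together, the LAN and internet attachments are assumed to stay up, and only single failures are tested (so it cannot show the extra margin that dual multi-homing gives against combined failures).

```python
from collections import defaultdict, deque
from typing import Optional

Link = tuple[str, str]


def reachable(links: list[Link], src: str, dst: str,
              failed_node: Optional[str] = None, failed_link: Optional[int] = None) -> bool:
    """Breadth-first search from src to dst, skipping one failed node or link."""
    adj = defaultdict(list)
    for i, (a, b) in enumerate(links):
        if i == failed_link or failed_node in (a, b):
            continue
        adj[a].append(b)
        adj[b].append(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(links: list[Link], src: str = "lan", dst: str = "internet") -> list[str]:
    """List every device or edge-to-ISP link whose failure alone cuts src off from dst."""
    devices = sorted({n for link in links for n in link} - {src, dst})
    spofs = [d for d in devices if not reachable(links, src, dst, failed_node=d)]
    for i, (a, b) in enumerate(links):
        if src in (a, b) or dst in (a, b):
            continue  # LAN and internet attachments are assumed to stay up
        if not reachable(links, src, dst, failed_link=i):
            spofs.append(f"link {a}-{b} #{i}")
    return spofs


SINGLE_HOMED = [("lan", "edge"), ("edge", "isp1"), ("isp1", "internet")]
DUAL_HOMED = [("lan", "edge"), ("edge", "isp1"), ("edge", "isp1"), ("isp1", "internet")]
SINGLE_MULTI_HOMED = [("lan", "edge"), ("edge", "isp1"), ("edge", "isp2"),
                      ("isp1", "internet"), ("isp2", "internet")]
SINGLE_MULTI_HOMED_DUAL_EDGE = [("lan", "edge1"), ("lan", "edge2"),
                                ("edge1", "isp1"), ("edge2", "isp2"),
                                ("isp1", "internet"), ("isp2", "internet")]

for name, topo in [("single homed", SINGLE_HOMED), ("dual homed", DUAL_HOMED),
                   ("single multi-homed", SINGLE_MULTI_HOMED),
                   ("single multi-homed, dual edge", SINGLE_MULTI_HOMED_DUAL_EDGE)]:
    print(f"{name}: {single_points_of_failure(topo) or 'none'}")
```

The single homed topology reports the edge device, the ISP, and the access link; adding a second link leaves only the two devices; adding a second ISP leaves only the edge device; and duplicating the edge devices as well leaves no single point of failure.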

Needless to say, as the number of redundant components increases, both equipment procurement and subscription costs increase. For this reason, the redundancy topology should be chosen to match the requirements of the organization in question.

Special treatment of incoming traffic

While redundancy is relatively simple to deploy for outbound traffic, the issue becomes more complicated for inbound traffic. Redundancy is necessary for incoming traffic when an organization hosts services on its internal network, such as a web portal, a software update service, or even a voice server that is made available to external users on the internet at large. When an outside host attempts to communicate with an internal service, which route does it choose?

The routing of incoming connections to internal services is primarily controlled with the Border Gateway Protocol (BGP), the routing protocol that interconnects the networks that make up the internet. In a multi-homed design, the ISP equipment typically exchanges routes with the customer's equipment using BGP. This is why it is important to negotiate with each ISP the routing policy by which external users will be routed to internal services.

Enterprise and ISP edge devices communicate with each other using BGP to define the primary and secondary (backup) routes for incoming traffic, based on the agreed-upon policies. An example of this can be seen below.

In this way, if one ISP fails, the second ISP is still an option for incoming traffic.
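The article does not say which BGP mechanism the agreed-upon policy uses. One common technique is AS-path prepending: the customer repeats its own AS number on announcements sent through the backup ISP, so that path looks longer to the rest of the internet. The sketch below uses a heavily simplified best-path rule (highest LOCAL_PREF, then shortest AS_PATH; real BGP has more tie-breakers), with AS numbers and a prefix taken from the documentation ranges as stand-ins. Note that prepending only influences other networks' choices; a network whose own policy sets a higher LOCAL_PREF may still prefer the backup path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Route:
    prefix: str
    learned_via: str
    as_path: list[int]
    local_pref: int = 100  # a common default value


def advertise(prefix: str, customer_asn: int, isp_asn: int, isp_name: str,
              prepend: int = 0) -> Route:
    """Route for the customer prefix as seen by networks beyond the ISP.

    The customer repeats (prepends) its own ASN `prepend` extra times on the
    backup link so that path looks longer to other networks.
    """
    return Route(prefix, isp_name, [isp_asn] + [customer_asn] * (1 + prepend))


def best_path(routes: list[Route]) -> Optional[Route]:
    """Heavily simplified BGP decision: highest LOCAL_PREF, then shortest AS_PATH."""
    if not routes:
        return None
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


PREFIX = "203.0.113.0/24"  # documentation prefix standing in for the customer's block
primary = advertise(PREFIX, customer_asn=64500, isp_asn=64501, isp_name="ISP-A")
backup = advertise(PREFIX, customer_asn=64500, isp_asn=64502, isp_name="ISP-B", prepend=3)

print(best_path([primary, backup]).learned_via)  # ISP-A (AS path length 2 vs 5)
print(best_path([backup]).learned_via)           # ISP-B once ISP-A's route is withdrawn
```

Another option, where the ISP supports it, is tagging the backup announcement with a BGP community that tells the ISP to lower its LOCAL_PREF for that route; which options are available is exactly what needs to be negotiated with each ISP.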

Questions to ask to obtain the most suitable solution

In order to determine the most appropriate solution, questions like the following should be answered:

- How much does a WAN disruption cost me?
- If I choose to use two different ISPs:
  - Which ISP connection will be the primary one?
  - Will both be used simultaneously for incoming traffic?
  - Will the load be equally balanced between the two?
- If I choose to use a single ISP:
  - Can I afford to duplicate my edge equipment and my edge links?
  - Can I afford to ask the ISP to duplicate their edge equipment?

Other considerations

Beyond a redundant design at the edge of your network and a reliable ISP network, it is also important to duplicate other parts of the organization's infrastructure to ensure redundancy. These include:

Redundant power – Network equipment should be powered through a UPS (uninterruptible power supply), ideally backed by a generator that supplies electricity during prolonged power outages. This is especially important in regions where extreme weather or a poor-quality power grid results in long periods without power.
Redundant power supplies – Individual network components such as routers and switches should have two or more power supplies. In the event one fails, the network device can continue to function by drawing power from the backup supply.
Multiple physical links should take alternate routes – Many of the above scenarios involve multiple physical links between devices. These links should take different routes, so if a cable gets cut or is damaged, only one of the two redundant links is disrupted.
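The advice that redundant links should take alternate routes can be checked mechanically if each circuit's physical route (building entry, ducts, poles, and so on) is documented as a list of segments, for example from the carriers' route surveys. This is a hypothetical sketch: the segment names are invented, and the check is only as good as the route documentation behind it.

```python
def shared_segments(path_a: list[str], path_b: list[str]) -> set[str]:
    """Physical segments (building entries, ducts, poles...) used by both links."""
    return set(path_a) & set(path_b)


def physically_diverse(path_a: list[str], path_b: list[str]) -> bool:
    """True when a single cut anywhere along one route cannot affect the other."""
    return not shared_segments(path_a, path_b)


# Hypothetical route surveys for two redundant circuits.
isp1_route = ["north-entry", "duct-12", "street-A-conduit", "isp1-pop"]
isp2_route = ["south-entry", "duct-12", "street-B-conduit", "isp2-pop"]

print(shared_segments(isp1_route, isp2_route))     # {'duct-12'}
print(physically_diverse(isp1_route, isp2_route))  # False: one cut in duct-12 takes down both
```

Two circuits from different ISPs can still share a duct or a building entry, so checking the physical routes matters even in a multi-homed design.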

Conclusion

There are many options when it comes to WAN redundancy. For incoming and outgoing traffic alike, as well as for inter-site connectivity, a functioning WAN is vital for mission-critical services. By applying these principles, and weighing the cost of redundancy against the potential cost of WAN failures, you can design the WAN redundancy scenario that best suits your requirements.