Author: Tim Fiola
Synopsis: this article will discuss the parts of a network model, how the network model works, and why a network model is necessary and strategic.
Let's briefly touch on that last point first: why do I need a network model? Modern, carrier-class wide area networks today are a strategic asset for a company and can cost millions and sometimes billions of dollars to build and millions more to maintain. As these networks have grown and become increasingly meshy, understanding failover becomes much less intuitive, as does understanding where it makes sense to augment. If there's a link break between KCY and DEN, where will the traffic from SEA to MIA route? Will the SEA to WDC traffic fail over to that same path, or take a different path? What existing links on the network warrant an augment? What is driving utilization on a given link? If I want to modify my topology by adding a link, where is the most cost-effective place to do that? How large should that new link be?
Adding to this uncertainty is the often-used RSVP auto-bandwidth LSP overlay, wherein RSVP LSPs can make reservations for a certain amount of bandwidth across the entire network or a subset of the network (RSVP and how it works is outside the scope of this article, but you can always purchase the book This Week: Deploying MPLS* that I wrote with my buddy Jamie Panagos on Amazon.com or download the FREE pdf here or from Juniper's website if you have a login). These RSVP LSPs, when faced with an increase or decrease in required bandwidth or a change in topology, can reroute their paths such that traffic from LAX to NYC can be taking a specific path across the network at one moment, and be taking a very different path the next moment. Assuming there's a large amount of LAX to NYC traffic, it's in the carrier's best interest to understand what path that traffic will take across the network during steady state and failover.
*The ebook costs $1.99 (paper versions are also available for varying prices). For each book purchased I receive exactly $0.00. So, I figure if it sells one million copies I should get about . . . . what's one million times $0.00? And DON'T say zero. The pdf is free, so the same math applies.
Enter the network model. The network model is itself a strategic asset because it can answer questions like the ones above and more, allowing carriers to make better decisions about how to plan and maintain the network. It can give Planners, Architects, Engineers, and Operations the information and data to make decisions based on evidence from simulations, versus (sometimes educated) guessing and spreadsheets. In the next sections, we'll examine the parts of the network model and how those parts work together to produce the simulation results that can answer those questions.
A network model captures the state of the network and allows users to perform simulations of changes in that state. By state of the network, I mean all information required to derive how each traffic demand and LSP will route during steady state and failover such that the amount of traffic on each RSVP LSP (if applicable) and interface can be derived programmatically. Below is a list of items that are commonly needed to capture network state in a model:
The above is not meant to be a comprehensive list, and the requirements for each to be present in a model entirely depends on the use cases for the model. For example, if the model's purpose is only to understand layer 3 failover, independent of layer 1 failures, then the layer 1 items are not necessary. Nor would the RSVP/LSP items be required for an IGP routed network.
A network model requires two main inputs, the topology and the traffic matrix. We'll examine what each of those is and how they work together.
In reality, the items listed above for understanding network state comprise much of the information about the topology of the network.
Here is a sample layer 3 topology for a network:
Understand that the topology pictured only includes the layer 3 adjacencies. A modeled topology would include information for the entire state of the network. Items such as RSVP LSP paths and traffic carried by each LSP for a given time period would need to be discovered and incorporated into the topology model, along with the basics such as layer 3 link metrics. Many commercially-offered network modeling solutions provide the capability to discover some or all of the network topology items listed above and create a topology model that reflects the network state.
The wide area network (WAN) traffic matrix describes how much traffic is flowing from a given source to a given sink on the WAN. Here is a simple example of a traffic matrix for the sample network:
A simple, sample traffic matrix. Each entry in the traffic matrix is called a demand.
This traffic matrix example lists the source nodes vertically on the left and the destination nodes horizontally at the top. The cell where a source row and a destination column intersect shows the amount of traffic sent from that source to that destination. For example, there are 2,389 Mbps of traffic sourced from LAX destined to NYC. A traffic matrix can be more complex than this and take on a different format (such as modeling different traffic classes), but such complexities are not necessary here.
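To make the idea of a demand concrete, here is a minimal Python sketch of a traffic matrix as a nested dictionary. Only the LAX to NYC value (2,389 Mbps) comes from the example above; every other value, and the helper names `demands` and `total_sourced`, are made up for illustration.

```python
# Traffic matrix as a nested dict: source -> destination -> Mbps.
# Only LAX -> NYC (2,389 Mbps) is from the article; the rest is hypothetical.
traffic_matrix = {
    "LAX": {"NYC": 2389, "WDC": 850},
    "NYC": {"LAX": 1200, "WDC": 400},
    "WDC": {"LAX": 300, "NYC": 650},
}


def demands(matrix):
    """Flatten the matrix into a list of (source, destination, mbps) demands."""
    return [
        (src, dst, mbps)
        for src, row in matrix.items()
        for dst, mbps in row.items()
    ]


def total_sourced(matrix, src):
    """Total traffic (Mbps) that a given node sources onto the WAN."""
    return sum(matrix.get(src, {}).values())


demand_list = demands(traffic_matrix)
for src, dst, mbps in demand_list:
    print(f"{src} -> {dst}: {mbps} Mbps")
```

A flat list of demands is usually the more convenient form for a simulation engine, since each demand is routed independently.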
The traffic matrix tends to change over time. For example, the amount of traffic from LAX to NYC may be relatively small at 6am LAX time, but that amount of traffic would surely be much greater at 10:30am LAX time as the business day progresses. The path(s) that LAX to NYC traffic transits the network may be very different between those times as well. As such, when someone refers to your network's traffic matrix, it's entirely appropriate to ask for a certain time period. And, because I know enough to know that I don't know it all, I'm quite certain that the data science community would have some input as to what other types of statistical values would populate a traffic matrix. For the purposes of this article, we will use the simple traffic matrix and assume it's a snapshot in time.
A fair question at this point is how to create the traffic matrix. The answer to that question alone is a book unto itself and getting a reliable traffic matrix is indeed the toughest part of this whole exercise. There are a few general methods to do this:
Once we have the topology and traffic matrix information, we are ready to run a simulation. Not mentioned explicitly until now is the need for the simulation engine (the third part of the model). There are many commercial options offered; this can be coded/produced internally as well. This engine must be capable of reading the topology and traffic matrix. Once that is done, generating a network simulation is simply a matter of applying the traffic matrix entries to the topology. The behavior of routers for any given topology is very well understood and fairly trivial to code, so as the traffic matrix is applied to the topology, the simulated routers in the engine make the same decisions as a real router would. As each traffic matrix entry is processed, the simulation takes shape. After the model converges, it should resemble very closely the state of the actual network for that time period. Markers for this would include:
A quick note, before we talk about simulations, regarding the difference between simulation and emulation. In short, a simulation attempts to accurately replicate the end state; it does not necessarily attempt to replicate how that end state is achieved. In contrast, emulation attempts not only to get to the accurate end state, but also replicate how exactly that end state is achieved. An example to illustrate this: during my time as a Sales Engineer for a network modeling/simulation product, network engineers would sometimes ask how the model handles RSVP timers associated with LSP resignaling and IGP timers associated with adjacency formation. The answer is this: a simulator does not need to account for these things to arrive at a correct converged state; a model can arrive at an accurate converged state without accounting for specific timers for things such as exactly when a given LSP is resignaled. A network model that simulates the end state after a failure can follow any process to get to that end state, as long as it matches the end state as it would be in reality. A network model that emulates the network will have to account for all the associated timers (among other things) and will attempt to replicate every step along the way to reach the end state; this can be expensive in terms of time and computation.
Illustrating this with an example, take a layer 3 OSPF network with an RSVP overlay, link-bypass LSPs, and LSP metrics defaulting to the shortest path between their source and destination routers. There are 4 phases this type of network can go thru after a link failure:
An emulator would attempt to go thru each of these phases, to include the resignaling timers for the LSPs and any relevant OSPF timers. This can be computationally intensive and take time. A simulator, on the other hand, can derive the end state after the failure and jump right to that result, skipping the intermediate steps. Understanding the network state at each of those intermediate phases can still be very interesting and valuable, though, since each phase can have a markedly different associated state, and a simulator can be configured to model each of those phases if that information is valuable. In this type of scenario, the simulator makes more granular, incremental simulations on the way to the end state and acts more like an emulator. Looking at it this way, you could argue that a simulator with enough granularity in the incremental stages of convergence starts to take on attributes of an emulator.
Now, with an understanding of our network model and what a simulation is, let's discuss failure simulations. In basic terms, a simulated failure or group of failures in the model is simply a topology change. Understanding that, creating a failure simulation is simply presenting a new topology to the traffic matrix. The results of that simulation, however, can provide amazing insights as to how the network will act under a failure or a series of simultaneous failures. The results of these simulations can often be counter-intuitive and/or impossible to comprehend without a model, but that's why they provide so much value: with a complex traffic matrix and a network with more than a handful of links, it's almost impossible to understand how that network will converge around a given failure by simply using a spreadsheet, understanding of the network topology, and intuition. With the knowledge provided by the model, Planners, Engineers, and Operators can start to address actual problems in the network that they did not even know existed and could not even comprehend before.
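The idea that a failure is just a new topology presented to the same traffic matrix can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any commercial engine (or pyNTM) works: the topology, metrics, and the SJC to WDC demand are hypothetical, each demand follows a single lowest-metric IGP path (no ECMP, no RSVP LSPs), and the function names are my own.

```python
import heapq
from collections import defaultdict

# Hypothetical topology: (node_a, node_b, IGP metric), links are bidirectional.
LINKS = [
    ("LAX", "DEN", 10), ("DEN", "KCY", 10), ("KCY", "NYC", 10),
    ("LAX", "SJC", 5), ("SJC", "WDC", 35), ("WDC", "NYC", 5),
]
# Demands as (source, destination, Mbps); SJC -> WDC is made up.
DEMANDS = [("LAX", "NYC", 2389), ("SJC", "WDC", 1000)]


def shortest_path(adj, src, dst):
    """Dijkstra: return the lowest-metric node path, or None if unreachable."""
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nbr, metric in adj[node].items():
            if nbr not in visited:
                heapq.heappush(queue, (cost + metric, nbr, path + [nbr]))
    return None


def simulate(links, demands):
    """Route every demand; return per-interface load and unroutable demands."""
    adj = defaultdict(dict)
    for a, b, metric in links:
        adj[a][b] = metric
        adj[b][a] = metric
    load = defaultdict(int)  # (from_node, to_node) -> Mbps
    unrouted = []
    for src, dst, mbps in demands:
        path = shortest_path(adj, src, dst)
        if path is None:
            unrouted.append((src, dst, mbps))
            continue
        for hop in zip(path, path[1:]):
            load[hop] += mbps
    return dict(load), unrouted


def fail_link(links, a, b):
    """A failure is a topology change: return the links without a-b."""
    return [link for link in links if {link[0], link[1]} != {a, b}]


steady, _ = simulate(LINKS, DEMANDS)
failed, dropped = simulate(fail_link(LINKS, "DEN", "KCY"), DEMANDS)
print("steady SJC->WDC:", steady[("SJC", "WDC")], "Mbps")
print("after DEN-KCY failure SJC->WDC:", failed[("SJC", "WDC")], "Mbps")
```

In this toy network, failing DEN-KCY pushes the LAX to NYC demand onto the SJC-WDC link, which is exactly the kind of knock-on effect that is hard to see from a spreadsheet.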
Another valuable use case for the network model is to simulate topology changes. In the sample topology above, notice that there is an express link between SJC and WDC. If you were considering adding that express link to the network, a network model can answer some very important questions to let you determine if such a change makes sense:
Another powerful use case for the network model is understanding what traffic is driving utilization on an interface. A simple example would be an interface showing 90% egress traffic utilization. What is driving the utilization on that interface? Since a network model derives its results by routing each traffic matrix entry (called a demand) thru the modeled topology, it has the data for every path that a demand takes across the topology, and thus it understands all the demands that egress a given interface in the model. So a user could, for example, fail a link in the model, see high utilization on a given interface and determine the top 3 demands driving the high utilization. This type of insight is often impossible without a network model.
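Because the model keeps the path of every demand, finding what drives utilization on an interface is essentially a filter and a sort. Here is a minimal sketch that assumes the demands have already been routed; the paths, traffic values (other than LAX to NYC), the 10 Gbps capacity, and the function names are all hypothetical.

```python
# Each routed demand carries the node path the simulation chose for it.
# All values are hypothetical except the 2,389 Mbps LAX -> NYC demand.
routed_demands = [
    {"src": "LAX", "dst": "NYC", "mbps": 2389, "path": ["LAX", "DEN", "KCY", "NYC"]},
    {"src": "SEA", "dst": "MIA", "mbps": 900, "path": ["SEA", "DEN", "KCY", "MIA"]},
    {"src": "SEA", "dst": "WDC", "mbps": 1500, "path": ["SEA", "DEN", "KCY", "WDC"]},
    {"src": "DEN", "dst": "KCY", "mbps": 300, "path": ["DEN", "KCY"]},
    {"src": "LAX", "dst": "SEA", "mbps": 700, "path": ["LAX", "SEA"]},
]


def demands_on_interface(routed, a, b):
    """Demands whose path egresses node a toward node b."""
    return [d for d in routed if (a, b) in zip(d["path"], d["path"][1:])]


def top_demands(routed, a, b, n=3):
    """The n largest demands on the a -> b interface."""
    on_interface = demands_on_interface(routed, a, b)
    return sorted(on_interface, key=lambda d: d["mbps"], reverse=True)[:n]


def utilization(routed, a, b, capacity_mbps):
    """Fraction of the a -> b interface capacity in use."""
    total = sum(d["mbps"] for d in demands_on_interface(routed, a, b))
    return total / capacity_mbps


top = top_demands(routed_demands, "DEN", "KCY")
for d in top:
    print(f'{d["src"]} -> {d["dst"]}: {d["mbps"]} Mbps')
print(f'DEN->KCY utilization: {utilization(routed_demands, "DEN", "KCY", 10_000):.1%}')
```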
Another use case for the network model is planning for additional traffic increments. For example, you have just signed a new customer and are planning for that customer's traffic to result in an additional peak of 1,200 Mbps (1.2 Gbps) from NYC to WDC. It may in fact be trivial to understand how that may affect steady state utilization and which links would carry that traffic (but then again, with RSVP LSPs, it may NOT). But what about failover? If the NYC-WDC link fails, how will that traffic fail over? What other traffic may drop as a result of that traffic failing over to a given path? Could that NYC-WDC failure affect traffic on the West Coast? A network model can answer these questions, and the answers are often surprising.
Finally, consider organic traffic growth. If an operator is expecting 5% organic traffic growth across the network for the next year, how do they plan for that? With a network model, this exercise is fairly trivial: simply increment the magnitude of each demand in the traffic matrix by 5% and run the simulation. More complex scenarios can involve increasing different demands by different percentages. The resultant simulation should go a great way in helping to understand where to augment in steady state. Running failure simulations on that model can then lend insight as to where to augment for failures.
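Both planning exercises, organic growth and a new traffic increment, amount to editing the traffic matrix before re-running the simulation. A minimal sketch, assuming the nested-dictionary matrix format; the matrix values other than LAX to NYC and the function names `grow` and `add_increment` are hypothetical.

```python
# Hypothetical traffic matrix (source -> destination -> Mbps);
# only LAX -> NYC is taken from the article's example.
traffic_matrix = {
    "LAX": {"NYC": 2389},
    "NYC": {"LAX": 1200, "WDC": 400},
}


def grow(matrix, default_pct=5.0, overrides=None):
    """Return a new matrix with every demand grown by default_pct percent.

    overrides maps (src, dst) to a different growth percentage.
    """
    overrides = overrides or {}
    grown = {}
    for src, row in matrix.items():
        grown[src] = {}
        for dst, mbps in row.items():
            pct = overrides.get((src, dst), default_pct)
            grown[src][dst] = mbps * (1 + pct / 100)
    return grown


def add_increment(matrix, src, dst, mbps):
    """Return a new matrix with extra traffic added to one demand."""
    updated = {s: dict(row) for s, row in matrix.items()}
    updated.setdefault(src, {})
    updated[src][dst] = updated[src].get(dst, 0) + mbps
    return updated


# 5% organic growth everywhere, but 10% on NYC -> LAX.
grown = grow(traffic_matrix, overrides={("NYC", "LAX"): 10.0})
# New customer: an additional 1,200 Mbps peak from NYC to WDC.
with_customer = add_increment(traffic_matrix, "NYC", "WDC", 1200)
print(grown)
print(with_customer)
```

Both helpers return a new matrix rather than modifying the original, so the baseline stays available for side-by-side comparison of simulation results.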
A major pain point for any network operator is documenting the network: what is out there? How big is the direct link between CHI and DEN? What is the metric on any given interface? How many LSPs are provisioned between any two given points? It goes on and on. Oftentimes, some unlucky soul is tasked to document the network, an almost impossible task due to the constant stream of maintenances, augments, and decommissions required on a wide area network of any reasonable size. Very often this situation devolves into this: the person tasked with documenting the network maintains a combination of spreadsheets, Word documents, and Visio diagrams on the network. However, due to the constant changes on the network, these documents are never up to date and become untrusted. In order to get their questions answered, people just eventually resort to logging into routers to get the information they need. As Raymond Hettinger would say, there MUST be a better way!
The network model is the better way. Some of the off-the-shelf network modeling solutions provide a dynamic network discovery process wherein they discover the network elements, collect the required data, and create a model multiple times in a day. For example, Cisco MATE can create multiple models per day, sometimes up to 48 models throughout the day. Each new model reflects the new state of the network, so any changes will be picked up in the next model created. If the network model has a reasonable user interface (UI), the network becomes its own documentation: each new model contains the updated network state with no human intervention or need for an engineer to manually do anything, and the information in each model is easily accessible and viewable using the UI. For example, if an engineer is tasked with finding the deployed capacity between NYC and WDC, finding how many LSPs are provisioned between the two sites, and then determining if a link metric change makes sense, she can pull up the most recent model and perform all these tasks in minutes, versus hours via legacy methods. A network modeling solution that periodically creates updated models frees the people charged with documenting the network to spend their time on more technical matters where they can better use their expertise.
For many network operators, it is simply not an option anymore to throw money and people at the problem of understanding the production WAN; this only leads to waste, frustration, and a network that may not be any more resilient. A network model is a force multiplier in that it allows Planners, Engineers, Operations, and others to ask questions and get answers that may not otherwise be available or available only after a good amount of work and time. With an asset worth millions and perhaps billions in capital costs, operators need a solution that allows agile and intelligent planning and operation of that asset. The network model is a critical piece in that solution.
There are several off-the-shelf products for network modeling. Cisco (formerly Cariden) MATE, Juniper WANDL, and Aria Networks' products are the three commercial offerings I am currently aware of. For full disclosure, I formerly worked for Cariden and then Cisco via the Cariden acquisition.
You can also take a look at my open source network modeling engine (pyNTM). Full documentation and simple training modules are available.