Network Working Group                                   Greg Bernstein
Internet Draft                                       Grotto Networking
Intended status: Informational                               Young Lee
                                                                Huawei
                                                         July 16, 2012

   Use Cases for High Bandwidth Query and Control of Core Networks
          draft-bernstein-alto-large-bandwidth-cases-02.txt

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

This Internet-Draft will expire on January 16, 2013.

Copyright Notice

Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents carefully,
as they describe your rights and restrictions with respect to this
document.

Bernstein & Lee, et al.    Expires January 16, 2013           [Page 1]
Internet-Draft   Cross Stratum Optimization Use-cases        July 2012

Abstract

This draft describes two generic use-cases that illustrate application
layer traffic optimization applied to high bandwidth core networks.
The types of information and interactions needed to perform various
optimizations are described. In addition, extensions to the existing
ALTO protocol that are widely applicable to high bandwidth
applications are suggested.
These include bandwidth constraint representations for a diverse range
of control and data plane technologies as well as advanced filtering
based on constraints.

Table of Contents

1. Introduction
   1.1. Computing Clouds, Data Centers, and End Systems
2. End System Aggregate Networking
   2.1. Aggregated Bandwidth Scaling
   2.2. Cross Stratum Optimization Example
   2.3. Data Center and Network Faults and Recovery
3. Data Center to Data Center Networking
   3.1. Cross Stratum Optimization Examples
   3.2. Network and Data Center Faults and Reliability
4. Cross Stratum Control Interfaces
5. Potential ALTO Protocol Extensions
6. Bandwidth Constraint Information
   6.1. Introduction
      6.1.1. Example Network: Provider's View
   6.2. Data and Control Plane Path Choices
   6.3. ALTO Extensions
      6.3.1. Mutually Constrained Paths
         6.3.1.1. Simple IP Network Example
         6.3.1.2. TDM Network Example
         6.3.1.3. JSON Encoding
      6.3.2. Cost-Capacity Graphs
         6.3.2.1. Simple TDM Example with Graph Reduction
         6.3.2.2. Ethernet MSTP Example with Multiple Graphs
         6.3.2.3. JSON Encoding
7. Constraint Based Filtering
8. Conclusion
9. Security Considerations
10. IANA Considerations
11. References
   11.1. Informative References
Author's Addresses
Intellectual Property Statement
Disclaimer of Validity

1. Introduction

Cloud computing, network applications, Software as a Service (SaaS),
Platform as a Service (PaaS), and Infrastructure as a Service (IaaS)
are just a few of the terms used to describe situations where multiple
computation entities interact with one another across a network. When
the communication resources consumed by these interacting entities are
significant compared with link or network capacity, opportunities may
exist for more efficient utilization of available computation and
network resources if both computation and network stratums cooperate
in some way.

The application layer traffic optimization (ALTO) working group is
tackling the similar problem of "better-than-random peer selection"
for distributed applications based on peer to peer (P2P) or client
server architectures [1]. In addition, such optimization is important
in content distribution networks (CDNs) as illustrated in [2].

In the network stratum, particularly at the lower layers such as MPLS
and optical, there are many restoration and recovery mechanisms to
deal with network faults. The emergence of network based applications
or cloud based disaster recovery/business recovery brings a new
dimension to fault management, but also opportunities to more
efficiently deliver higher levels of reliability. For example, the
reliability requirements for mission critical applications are
typically quantified by two key time parameters.
The first is the Recovery Time Objective (RTO), which is the time to
get the application back up and functioning and is similar to network
recovery time notions. The second is the Recovery Point Objective
(RPO), which quantifies in terms of time the amount of data loss that
can be tolerated when a disaster occurs. Different applications and
organizations can have greatly different demands, from milliseconds to
12 hours. In addition, the amount of data that may need to be
transferred to meet these objectives can vary greatly amongst
different application types.

With recovery point objectives of, say, an hour or more, a dynamic
optical network layer could be very efficiently shared so as to reduce
the overall cost to achieve a given level of reliability. However, to
do so requires cooperation between application and network stratum.

Generalized multi-protocol label switching (GMPLS) [3] can be, and is
being, applied to various core networking technologies such as
SONET/SDH and wavelength division multiplexing (WDM) [4]. GMPLS
provides dynamic network topology and resource information, and the
capability to dynamically allocate resources (provision label switched
paths). Furthermore, the path computation element (PCE) [5] provides
for traffic engineered path optimization. However, neither GMPLS nor
PCE provides interfaces that are appropriate for an application layer
entity to use, for the following reasons:

. GMPLS routing exposes full network topology information which tends
  to be proprietary to a carrier or to require specialized knowledge
  and techniques to make use of, e.g., the routing and wavelength
  assignment (RWA) problem in WDM networks [4].

. Core networks typically consist of two or more layers, while
  applications typically only know about the IP layer and above.
  Hence applications would not be able to make direct use of PCE
  capabilities.

. GMPLS signaling interfaces are defined for either peer GMPLS nodes
  or via a user network interface (UNI) [6]. Neither of these is
  appropriate for direct use by an application entity.

In this draft we discuss two general use-cases that can generate core
network flows that have significant bandwidth and may vary
significantly over time. The "cross stratum optimization" problems
generated by these use cases are discussed. Finally, we look at
interfaces between the application and network "stratums" that can
enable these types of optimizations and how they can be created via
extensions to the current ALTO protocol [7].

1.1. Computing Clouds, Data Centers, and End Systems

While the definition of cloud computing or compute clouds is somewhat
nebulous (or "foggy" if you will) [8], the physical instantiation of
compute resources with network connectivity is very real and bounded
by physical and logical constraints. For the purposes of this draft,
we will call any network connected compute resources a data center if
its network connectivity is significant compared either to the
bandwidth of an individual WDM wavelength or with respect to the
network links in which it is located. Hence we include in our
definition very large data centers that feature multiple fiber access
and consume more than 10MW of power, moderate to large content
distribution network (CDN) installations located in or near major
internet exchange points, medium sized business centers, etc.

We will refer to those computational entities that don't meet our
bandwidth criteria for a data center as "end systems".

2. End System Aggregate Networking

In this section we consider the fundamental use case of end systems
communicating with data centers as shown in Figure 1. In this figure
the "clients" are end systems with relatively small access bandwidth
compared to a WDM wavelength, e.g., under 100Mbps.
We show these clients roughly partitioned into three network related
end user regions ("A", "B", and "C"). Given a particular network
application, in a static network application situation, each client in
a region would be associated with a particular data center.

[Figure: clients A1, A2, ..., AN (Region A), B1, B2, ..., BM
(Region B), and C1, C2, ..., CK (Region C), together with Data
Centers 1, 2, and 3, all attached to a common network.]

Figure 1. End system to data center communications.

2.1. Aggregated Bandwidth Scaling

One of the simplest examples where the aggregation of end system
bandwidth can quickly become significant to the "network" is for video
on demand (VoD) streaming services. Unlike a live streaming service,
where IP or lower layer multicast techniques can be generally applied,
in VoD the transmissions are unique between the data center and
clients. For regular quality VoD we'll use an estimate of 1.5Mbps per
stream (assuming H.264 coding); for HD VoD we'll use an estimate of
10Mbps per stream. Filling a 10Gbps capacity optical wavelength
requires either 6,666 or 1,000 clients for regular or high definition,
respectively. Note that special multicasting techniques such as those
discussed in [9] and peer assistance techniques such as provided in
some commercial systems [10] can reduce the overall network bandwidth
requirements.
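
The client counts quoted above follow from dividing the wavelength
capacity by the per-stream rate. A minimal sketch of that arithmetic is
given here; the function name streams_per_wavelength is hypothetical
and used only for illustration, and the rates are the estimates from
this section rather than measured values.

```python
def streams_per_wavelength(capacity_bps: int, stream_bps: int) -> int:
    """Return the number of whole unicast streams that fit in a capacity."""
    return capacity_bps // stream_bps


# Figures taken from the text of this section.
WAVELENGTH_BPS = 10_000_000_000  # one 10Gbps optical wavelength
SD_STREAM_BPS = 1_500_000        # regular quality VoD estimate (1.5Mbps)
HD_STREAM_BPS = 10_000_000       # HD VoD estimate (10Mbps)

sd_streams = streams_per_wavelength(WAVELENGTH_BPS, SD_STREAM_BPS)
hd_streams = streams_per_wavelength(WAVELENGTH_BPS, HD_STREAM_BPS)
print(sd_streams, hd_streams)
```

Integer division is used because a partial stream is of no use to a
client; multicast or peer assistance would change the per-client rate
and is not modeled.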
With current high speed internet deployment such numbers of clients
are easily achieved; in addition, demand for VoD services can vary
significantly over time, e.g., new video releases, inclement weather
(increases number of viewers), etc.

2.2. Cross Stratum Optimization Example

In an ideal world both data centers and networks would have unlimited
capacity; however, in actuality both can have constraints and possibly
marginal costs that vary with load or time of day. For example,
suppose that in Figure 1 Data Center 3 has been primarily serving VoD
to region "C" but that it has, at a particular period in time, run out
of computation capacity to serve all the client requests coming from
region "C". At this point we have a fundamental cross stratum
optimization (CSO) problem. We want to see if we can accommodate
additional client requests from region "C" by using a different data
center than the fully utilized data center #3. To answer this question
we need to know (a) available capacity on other data centers to meet a
request, (b) the marginal (incremental) cost of servicing the request
on a particular data center with spare capacity, (c) the ability of
the network to provide bandwidth between region "C" and a data center,
and (d) the incremental cost of bandwidth from region "C" to a data
center.

[Figure: the topology of Figure 1 with aggregated flows drawn across
the network between client regions and data centers.]

Figure 2. Aggregated flows between end systems and data centers.

In Figure 2 we show a possible result of solving the previously
mentioned CSO problem. Here we show the additional client requests
from region "C" being serviced by data center #2 across the network.
Figure 2 also illustrates the possibility of setting up "express"
routes across the network at the MPLS level or below. Such techniques,
known as "optical grooming" or "optical bypass" [11], [12] at the
optical layer, can result in significant equipment and power savings
for the network by "bypassing" higher level routers and switches.

2.3. Data Center and Network Faults and Recovery

Data center failures, whether partial or complete, can have a major
impact on revenues in the VoD example previously described. If there
is excess capacity in other data centers within the network associated
with the same application, then clients could be redirected to those
other centers if the network has the capacity. Moreover, MPLS and
GMPLS controlled networks have the ability to reroute traffic very
quickly while preserving QoS. As with general network recovery
techniques [13], various combinations of pre-planning and "on the fly"
approaches can be used to trade off between recovery time and excess
network capacity needed for recovery.

In the case of network failures there is the potential for clients to
be redirected to other data centers to avoid failed or over utilized
links.

3. Data Center to Data Center Networking

There are a number of motivations for data center to data center
communications: on demand capacity expansion ("cloud bursting"),
cooperative exchanges between business partners, offsite data backup,
"rent before building", etc. In Figure 3 we show an example where a
number of businesses, each with an "internal data center", contract
with a large external data center for additional computational (which
may include storage) capacity. The data centers may connect to each
other via IP transit type services or more typically via some type of
Ethernet virtual private line or LAN service.

[Figure: a Large Data Center and business data centers #1 DC, #2 DC,
..., #N DC, all attached to a common network.]

Figure 3. Basic data center to data center networking.

3.1. Cross Stratum Optimization Examples

In the DC-to-DC example of Figure 3 we can have computational
constraints/limits at both local and remote data centers; fixed and
marginal computational costs at local and remote data centers; and
network bandwidth costs and constraints between data centers. Note
that computing costs could vary by the time of day along with the cost
of power and demand. Some cloud providers have quite sophisticated
compute pricing models including reserved, on demand, and spot
(auction) variants.

In addition to possibly dynamically changing pricing, traffic loads
between data centers can be quite dynamic. Data movement between data
centers is another source of large network usage variation.
Such peaks can be due to scheduled daily or weekly offsite data
backup, bulk VM migration to a new data center, periodic virtual
machine migration, etc.

3.2. Network and Data Center Faults and Reliability

For networked applications that require high levels of
reliability/availability the network diagram of Figure 3 could be
enhanced with redundant business locations and external data centers
as shown in Figure 4. For example, cell phone subscriber databases and
financial transactions generally require what is called geographic
database replication, which results in extra communication between
sites supporting high availability. For example, if business #1 in
Figure 4 required a highly available database related service then
there would be additional communication flows from data center "1a" to
data center "1b". Furthermore, if business #1 has outsourced some of
its computation and storage needs to independent data center X then
for resilience it may want/need to replicate (hot-hot redundancy) this
information at independent data center Y.

[Figure: Independent Data Centers X and Y and redundant business data
centers (e.g., Business #1 DC-a and #1 DC-b, Business #2 DC-a and
#2 DC-b, Business #N), all attached to a common network.]

Figure 4. Data center to data center networking with redundancy.

4. Cross Stratum Control Interfaces

Two types of load balancing techniques are currently utilized in cloud
computing.
The first is load balancing within a data center and is sometimes
referred to as local load balancing. Here one is concerned with
distributing requests to appropriate machines (or virtual machines) in
a pool based on the current machine utilization. The second type of
load balancing is known as global load balancing and is used to assign
clients to a particular data center out of a choice of more than one
within the network; it is our concern here. A number of commercial
vendors offer both local and global load balancing products. Currently
global load balancing systems have very little knowledge of the
underlying network. To make better assignments of clients to data
centers many of these systems use geographic information based on IP
addresses. Hence we see that current systems are attempting to perform
cross stratum optimization, albeit with very coarse network
information. A more complete interface for CSO in the client
aggregation case that is also applicable in the "data center to data
center" case would be:

1. A Network Query Interface - Where the global load balancer can
   inquire as to the bandwidth availability between "client regions"
   and data centers.

2. A Network Resource Reservation Interface - Where the global load
   balancer can make explicit requests for bandwidth between client
   regions and data centers.

3. A Fault Recovery Interface - For the global load balancer to make
   requests for expedited bulk rerouting of client traffic from one
   data center to another, or for the network layer to make requests
   to the application to help deal with network faults.

The network query interface can be considered a superset of the
functionality supported by the current ALTO protocol [7]. Potential
extensions to ALTO for this purpose are given in the next section.

5. Potential ALTO Protocol Extensions

This section discusses the applicability of the ALTO protocol and
necessary extensions to support a network query interface suitable for
high bandwidth consuming applications. Before doing so we discuss
general properties of the high bandwidth scenarios that may differ
significantly from other uses of the ALTO protocol.

The first has to do with scope and scale. The consumer of high
bandwidth ALTO extensions is typically some type of application
controller within a data center, as opposed to an individual end user.
The number of such entities with a need for the high bandwidth related
information is orders of magnitude smaller than, say, peer to peer
networking users, or applications closer to the end user. Since a
network provider may consider this information sensitive, there may be
a desire to limit its distribution to a "pre-registered" set of
entities. Hence these extensions would be applicable to controlled or
partially controlled environments.

Secondly, there is the notion of time scales. In cloud services we
already see variants such as "on demand" compute instances and
"reserved" compute instances. For network resource queries we may be
concerned with (a) current bandwidth availability, (b) bandwidth
availability at a future time, or (c) bandwidth for a bulk data
transfer of a given amount that must take place within a given time
window. Time-dependent bandwidth information can be, and typically is,
considered in network planning and provisioning systems. For example,
a VoD provider knows ahead of time when the latest "blockbuster" film
will be available via its service and can make estimates based on
historical data on the bandwidth that it will need to deal with the
subsequent demand. The following discussions, however, are restricted
to "current time" for now.

Finally, another goal in the design of an interface between the
application and networking stratums is to minimize the need for either
stratum to know too much about the inner workings of the other. Hence
as much as possible it is desired to insulate the applications stratum
from technology specifics of the network. That said, data centers
providing IaaS may prefer to specify flows and connectivity at a layer
below IP such as Ethernet.

The key ALTO extensions useful for querying the network for high
bandwidth consuming applications are:

(a) Bandwidth Constraint Information

(b) Constraint Based Filtering

(c) Multi-cost information [MultiCost]

(d) Endpoint Access Bandwidth Capacity (a new endpoint property)

In the following sections we discuss (a) and (b).

6. Bandwidth Constraint Information

6.1. Introduction

The amount of bandwidth available between two entities or two sets of
entities can be of prime interest to applications that have stringent
bandwidth requirements relative to a network's capacity. Such entities
can be communicating across a WAN, a metro area, a LAN, or even within
a compute cluster. One may want to query the network as to the
available bandwidth in a number of different cases:

(a) Bandwidth available between a single source destination pair

(b) Bandwidth between one particular source and several other
    destinations

(c) Bandwidth between one set of sources and another set of
    destinations.

Case (a), bandwidth between two points, is well defined; however, in
cases (b) and (c) there is some ambiguity. In cases (b) and (c) one
may want to query for the bandwidth available to a single "flow" at a
time, or for multiple simultaneous "flows" between sources and
destinations.
If the bandwidth query is for potentially simultaneous flows then
there is the possibility that the flows of interest would (or could)
share network resources, e.g., link capacity. Such a situation leads
to what is known as a multi-commodity flow problem [NetOpt]. General
formulations of this problem [NetOpt] allow for arbitrary path
selection and can permit splitting of user demands across multiple
paths if inverse multiplexing like techniques are available.
Alternative formulations of multi-commodity flow problems exist [RWA]
when path choices between a source and destination are restricted to
an explicit list of paths (or a single path). In both formulations
link capacities form a key optimization constraint.

To perform better application layer traffic optimization, the presence
and capacity of such "mutual bottleneck" links would need to be
considered by "large bandwidth applications". This draft shows how a
combination of abstract path link vectors and/or constrained cost
graphs can be used to enable enhanced application layer traffic
optimization. These techniques are illustrated with connectionless
technologies such as IP and Ethernet, as well as MPLS and circuit
switched technologies that can be controlled via GMPLS.

6.1.1. Example Network: Provider's View

In Figure 1 we show an example network consisting of five nodes and
six links. This is the network provider's view of the network and not
necessarily information to be shared in detail with applications. We
will use this same network to illustrate bandwidth constraint
representations for different technologies. For illustrative purposes
we only consider a single weight (cost) and bandwidth constraint per
link. The units of bandwidth could be Mbps, Gbps, or wavelengths
depending upon the technology.
These costs and constraints are from the network provider's
perspective and may or may not be the sole guidance in path selection,
e.g., non-shortest paths may be chosen depending upon data and control
plane technologies. However, when considering a path between a source
and destination across this network we sum the weights for each link
along the path to obtain the total cost for the path.

   Link   Endpoints   Wt   BW
   L0     N0-N3       10   50
   L1     N0-N1       10   45
   L2     N3-N2       12   30
   L3     N1-N2       15   42
   L4     N0-N4        7   40
   L5     N1-N4       10   45

Figure 1 Generic Constrained Network Example

6.2. Data and Control Plane Path Choices

In this section we survey common data and control plane technologies
with respect to the path choices that they may allow as well as the
methods one can use to infer available paths. Methods for inferring
paths influence how efficiently the network layer can convey cost and
constraint information to the application layer, i.e., even if the
control plane limits us to a single fixed path between a source and
destination, if we need many paths between many sources and
destinations it can be very efficient if such information can be
derived from a simple graph representation.

Technologies that allow arbitrary placement of paths across a network
include: circuit switched technologies (WDM, TDM), strictly connection
oriented packet technologies (MPLS, ATM, and Frame Relay), and
connection oriented modes of multi-purpose protocols such as
InfiniBand's CO service. In these cases a network provider can furnish
a graph representation of the network suitable for the application
optimizer to choose routes.
In some cases, for example in WDM networks due to optical impairments,
the usable paths may be restricted in a way not readily discerned from
a simple graph representation. In such a case a list of possible paths
would need to be furnished.

For IP, a connectionless technology, one typically thinks of a single
path between each source and destination (not considering equal cost
multipath). Although no choice in path selection is available, in the
case of single area OSPF the paths can be derived from a graph, while
BGP [BGP4] uses techniques based on policies and path vectors
(AS_PATH) as part of its route selection process and these are not
derived from graphs. Multi-Topology Routing enhancements to OSPF
[MT-OSPF] can allow multiple path choices between a source and
destination, and such paths could be derived from their corresponding
graphs.

Ethernet switching offers the greatest variety of path selection
capabilities depending upon the control plane employed. The basic
Ethernet bridge specification in 802.1D [802.1D] utilizes a single
tree structure as the communication backbone between all nodes. Hence,
one has no choice in path between nodes and the paths can be easily
derived from a graph of the spanning tree. We will also see that such
graphs are easy to reduce. IEEE 802.1Q [802.1Q] includes virtual LANs
(VLANs) and allows for multiple spanning trees. The multiple spanning
tree protocol (MSTP) allows for the assignment of VLANs to trees.
Hence we have more than one choice in paths, but all flows within the
same VLAN have to share the same tree. Note that trees can be given as
graphs, so this is a case where we may want multiple graphs.

OpenFlow [OpenFlow] capable switches permit general forwarding
behavior based on general packet header matching.
These can include Ethernet destination and source addresses, IP
destination and source addresses, as well as other protocol related
fields. Since both source and destination information can be utilized
in forwarding, OpenFlow can enable traffic engineering like a
connection oriented packet switching technology. Hence arbitrary path
selection based on a graph is possible.

6.3. ALTO Extensions

In this section we give two different models for representing
bandwidth constraints, give several examples of both approaches, and
furnish an initial JSON encoding for both approaches. We end this
section with a discussion of which approach a network provider may
want to choose within a given context.

6.3.1. Mutually Constrained Paths

As discussed in Section 6.2, the network's data or control plane may
dictate the paths taken between a source and destination. Even if such
paths could be derived from a graph, the network provider may choose
to provide information about the paths to promote information hiding
or to minimize the amount of information needed to be transferred via
ALTO. For example, if the application is asking for cost/capacity
information between a few sources and destinations, providing path
information for these few paths may take much less space than a
corresponding graph.

In the following we give examples of paths with shared link bandwidth
constraints for two different technologies, then we provide a
tentative JSON encoding for use with the ALTO protocol.

6.3.1.1. Simple IP Network Example

Consider Figure 1 as a single OSPF area with N0 representing a large
data center and nodes N2 and N3 as potential clients.
The corresponding path link vectors, with their corresponding cost (sum of weights) and link bandwidth constraints, are:

   Path  Src-Dest  Path Vector  Path Cost
   P1    N0-N2     {L0, L2}     22
   P2    N0-N3     {L0}         10
   ----------------------------------
   Link  Bandwidth
   L0    50
   L2    30

   Table 1. Path Vectors for paths P1 and P2, and used link capacities.

From an optimization perspective each (capacitated) link is a potential traffic constraint. From Table 1, since the paths from N0-N2 and N0-N3 share a common link, L0, the sum of their bandwidth flows must not exceed the capacity of L0 (50 units). In addition, the capacity constraint on link L2 tells us that the bandwidth of the traffic from N0-N2 must not exceed 30 units. This information, together with the total costs of the two paths, is all that is needed for a constrained joint optimization to proceed. Detailed information on link costs (as seen by the network) is not necessary, nor is information on unused links.

6.3.1.2. TDM Network Example

Now suppose the network of Figure 1 is a TDM network controlled by GMPLS. Once again N0 represents a large data center and nodes N2 and N3 are potential clients. However, in this case the network provider offers an additional path, P3, from N0 to N2.

   Path  Src-Dest  Path Vector  Path Cost
   P1    N0-N2     {L0, L2}     22
   P2    N0-N3     {L0}         10
   P3    N0-N2     {L1, L3}     25
   ----------------------------------
   Link  Bandwidth
   L0    50
   L1    45
   L2    30
   L3    42

   Table 2. Path Vectors for P1-P3 and used link capacities.

Once again no information in addition to that shown in Table 2 is required to perform a constrained optimization. However, path P3 is the only path using links L1 and L3. Link L3's capacity is 42 units, which is less than link L1's capacity of 45 units.
Satisfying link L3's capacity constraint (for the set of paths P1-P3) implies that link L1's capacity constraint is always satisfied, and hence no information on link L1 needs to be sent from the network. In particular the network could send the information shown in Table 3, where we have replaced links L1 and L3 with an "abstract link" (AL13) with capacity equal to that of link L3.

   Path  Src-Dest  Path Vector  Path Cost
   P1    N0-N2     {L0, L2}     22
   P2    N0-N3     {L0}         10
   P3    N0-N2     {AL13}       25
   ----------------------------------
   Link  Bandwidth
   L0    50
   L2    30
   AL13  42

   Table 3. Path Vectors for P1-P3 and abstract link capacities.

Note that simplifications such as the previous one can frequently be performed and can result in significant information savings. Also, this constraint information reduction was performed without the network provider having knowledge of the application layer's traffic demands. Methods for performing these reductions may be specific to service providers and not subject to standardization.

6.3.1.3. JSON Encoding

In some cases there may be more than one path given between a source and destination. In this case the network needs to furnish with each path the following information: (source, destination), (path id if more than one between source and destination), costs, overall path constraint (if any), and list of mutual abstract links for this path. In addition we need to furnish capacities for all mutual abstract links mentioned.

   object {
       PIDName source;
       PIDName dest;
       JSONNumber wt;     // A numerical path cost
       JSONNumber delay;  // A numerical path latency, optional
       JSONNumber bw;     // A numerical bandwidth constraint, optional
       LIDName mutual-links<1..*>;  // shared constrained links, optional
   } PathData;

Note that "mutual-links" is a JSON array that contains the names of the shared links that this path depends upon (may be empty).
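
As an illustration of how an application might consume this information, the following minimal Python sketch holds the data of Table 3 in dictionaries keyed like the PathData fields and checks a set of per-path bandwidth flows against the shared link capacities. The dictionary layout, the function names, the bandwidth-weighted cost objective, and the choice to treat a load equal to the capacity as feasible are all assumptions made for illustration; none of them is defined by this draft or by the ALTO protocol.

```python
# Hypothetical in-memory form of Table 3. The layout mirrors the PathData
# field names but is an assumption, not an encoding defined by the draft.
paths = {
    "P1": {"source": "N0", "dest": "N2", "wt": 22, "mutual-links": ["L0", "L2"]},
    "P2": {"source": "N0", "dest": "N3", "wt": 10, "mutual-links": ["L0"]},
    "P3": {"source": "N0", "dest": "N2", "wt": 25, "mutual-links": ["AL13"]},
}
links = {"L0": {"bw": 50}, "L2": {"bw": 30}, "AL13": {"bw": 42}}


def link_loads(paths, flows):
    """Sum the bandwidth that the per-path flows place on each shared link."""
    loads = {}
    for name, bw in flows.items():
        for lid in paths[name]["mutual-links"]:
            loads[lid] = loads.get(lid, 0) + bw
    return loads


def violated_links(paths, links, flows):
    """Return the sorted names of links whose load exceeds their capacity."""
    loads = link_loads(paths, flows)
    return sorted(lid for lid, load in loads.items() if load > links[lid]["bw"])


def total_cost(paths, flows):
    """Bandwidth-weighted sum of path costs (one possible objective, assumed)."""
    return sum(paths[name]["wt"] * bw for name, bw in flows.items())


# Candidate allocation: bandwidth units requested on each path.
flows = {"P1": 30, "P2": 20, "P3": 42}
print(link_loads(paths, flows))
print(violated_links(paths, links, flows))  # prints []
```

With the flows shown, links L0, L2, and AL13 are each loaded exactly to capacity, so no link is reported. Raising the P2 flow to 25 units would place 55 units on L0 and cause it to be reported.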
Note that all costs are associated with path entities, while constraints may be associated with paths or links.

   object {
       JSONNumber bw;  // A numerical bandwidth constraint, optional
   } SharedAbstractLink;

Note that the shared abstract link only contains capacity information. This is quite different from the case where a graph is shared.

   object {
       PathData [pathname]<0..*>;            // The individual path info
       SharedAbstractLink [linkname]<0..*>;  // Shared link info
   } NetworkPathData;

6.3.2. Cost-Capacity Graphs

As discussed in section 6.2, the network's data or control plane may allow arbitrary path selection, and hence a cost-capacity graph representation would be needed for the optimization to take full advantage of this network flexibility.

In the case where path choice is limited, but the paths can be derived from a graph, it may be useful for the network to supply a graph to reduce the amount of information transferred via the ALTO protocol. Suppose the application is interested in many source destination pairs. In this case the amount of path information, including abstract link constraints, could significantly exceed the information size of a graph.

In the following we give examples of cost-capacity graphs for a technology (TDM) that can offer arbitrary path choice, and for a technology (MSTP Ethernet) that offers limited path choice but where specifying graphs can result in significant efficiencies. We then provide a tentative JSON encoding of cost-capacity graphs for use with the ALTO protocol.

6.3.2.1. Simple TDM Example with Graph Reduction

Consider again the case where Figure 1 represents a TDM network, and suppose that the provider permits the application to make path choices. Suppose that the application only involves nodes N0, N1, and N2, and not N3 or N4.
By studying the structure of the graph of Figure 1 one can derive the reduced graph shown in Figure 2, which maintains all relevant cost and capacity information from the point of view of nodes N0, N1, and N2. In particular we were able to remove nodes N3 and N4, substitute abstract link AL0M2 for links L0 and L2, and substitute abstract link AL4M5 for links L4 and L5. Note that any such reductions, approximate or exact, are at the network provider's discretion.

   +----+
   | N0 |-------------------------------------------+
   +----+ `.            AL0M2                       |
     |      `.          Wt=22,BW=30                 |
     |        `-.                                   |
     |           `.                                 |
     |             |    AL4M5                       |
     | L1          .    Wt=17,BW=40                 |
     | Wt=10      /                                 |
     | BW=45     /                                  |
     |          /                                   |
     |        .'                                    |
     |       /                                      |
     |      /                                       |
   +----+ .'         L3 Wt=15 BW=42               +----+
   | N1 |.........................................| N2 |
   +----+                                         +----+

   Figure 2. Reduced graph of Figure 1 from the perspective of nodes N0-N2.

The resulting information to be conveyed concerning this reduced graph is shown in Table 4.

   Link    End Nodes   Bandwidth   Cost
   AL0M2   (N0, N2)    30          22
   L1      (N0, N1)    45          10
   L3      (N1, N2)    42          15
   AL4M5   (N0, N1)    40          17

   Table 4. Representation of the graph of Figure 2.

6.3.2.2. Ethernet MSTP Example with Multiple Graphs

Consider the Ethernet network shown in Figure 3 running MSTP with three multiple spanning tree instances (MSTIs) defined. Suppose the application is interested in connectivity between nodes N1, N3, N5, N6, and N7. In Figures 4-6 we show the spanning tree instances along with a high fidelity graph reduction that removes nodes that are not of interest and abstracts links as needed.

Let's compare these reduced graph representations with that of a path representation. Since we have n=5 communicating nodes of interest, this leads to n*(n-1)/2 = 10 potential paths per MSTI for which the network would need to furnish cost and constraint information as in section 6.3.1.
In the case of graphs reduced for the nodes of interest from tree structures, it can be shown that the number of links in the reduced graph is equal to (n-1), e.g., each reduced graph here consists of 5 nodes and 4 links.

   [Figure 3 drawing: seven nodes, N1-N7, joined by links labeled L1-L9 and L11.]

   Figure 3. Ethernet Network supporting MSTP.

   [Figure 4 drawing, MSTI #1: (a) tree over N1-N7 using links L1, L2, L4, L6, L7, L8; (b) reduced graph using links AL1M2, AL4M6, L7, L8.]

   Figure 4. (a) Spanning tree instance #1, (b) Reduced graph from the perspective of nodes N1, N3, N5, N6, N7.

   [Figure 5 drawing, MSTI #2: (a) tree over N1-N7 using links L3, L4, L5, L7, L8, L9; (b) reduced graph using links AL4M5, L3, L7, L8.]

   Figure 5. (a) Spanning tree instance #2, (b) Reduced graph from the perspective of nodes N1, N3, N5, N6, N7.

   [Figure 6 drawing, MSTI #3: (a) tree over N1-N7 using links L2, L3, L4, L6, L8, L11; (b) reduced graph using links AL4M6, L3, L8, L11.]

   Figure 6. (a) Spanning tree instance #3, (b) Reduced graph from the perspective of nodes N1, N3, N5, N6, N7.

In many data center applications all communicating virtual machines (VMs) need to be placed within the same VLAN. MSTP allows the assignment of VLANs to MSTIs; hence a reduced graph representation can provide a very good mechanism for determining an optimum fit between communicating VM traffic patterns and MSTI VLAN assignment.

6.3.2.3. JSON Encoding

Like the current ALTO filtered cost map, a request for a cost-capacity graph would take source and destination PIDs as inputs. In JSON notation we could represent the returned graph or graphs as a JSON object containing link objects. As we saw in the Ethernet case, it may be useful to supply more than one graph. In addition, restrictions on routing may need to be conveyed, such as that only the shortest path between source and destination is a valid route, e.g., OSPF routing for IP, or that all routes must come from the same graph, e.g., VLAN assignment to MSTI in MSTP Ethernet. Hence we are led to a tentative JSON encoding which includes named link objects, named graph objects, and a versioned container for holding graphs and any other general information such as the previously mentioned restrictions.

   object {
       NIDName aend;      // Node ids are similar to PIDs but
       NIDName zend;      // may not have end points
       JSONNumber wt;     // A numerical routing cost
       JSONNumber delay;  // A numerical latency cost, optional
       JSONNumber bw;     // A numerical bandwidth "cost", optional
       // Other costs private or experimental could be added
       // for example stuff related to reliability or economic cost.
       // Only one cost of each type would be permitted.
       // Note a multi-cost like mechanism could be used.
   } LinkData;

   // Collection of links each identified by link id (LID) name.
   object {
       LinkData [lidname]<0..*>;  // Link id (LID) would be an identifier
       ...                        // similar to a PID or NID and identifies
                                  // the link
   } NetworkGraphData;

   // Finally Multiple graph encapsulation and versioning
   object {
       VersionTag map-vtag;
       NetworkGraphData [graphname]<1..*>;  // named graphs
       ...  // other information such as graph choice restrictions
            // or routing restrictions.
   } InfoResourceNetwork;

Here a graph name is formatted like a PIDName, but names a graph.

7. Constraint Based Filtering

Young's stuff here.

8. Conclusion

In this draft we have discussed two generic use cases that motivate the usefulness of general interfaces for cross stratum optimization in the network core. In our first use case network resource usage became significant due to the aggregation of many individually unique client demands. In the second use case, where data centers were communicating with each other, bandwidth usage was already significant enough to warrant the use of private line/LAN types of network services.

Both use cases result in optimization problems that trade off computational versus network costs and constraints. Both featured scenarios where advanced reservation, on demand, and recovery type service interfaces could prove beneficial. In the later sections of this document we showed how ALTO concepts [1] and the ALTO protocol could be used and extended to support joint application network optimization for large network bandwidth consuming applications.

9. Security Considerations

TBD

10. IANA Considerations

This informational document does not make any requests for IANA action.

11. References

11.1. Informative References

[1] "draft-ietf-alto-reqs-09." [Online].
Available: http://datatracker.ietf.org/doc/draft-ietf-alto-reqs/. [Accessed: 17-May-2011].

[2] J. Medved, N. Bitar, S. Previdi, B. Niven-Jenkins, and G. Watson, "Use Cases for ALTO within CDNs." [Online]. Available: http://tools.ietf.org/html/draft-jenkins-alto-cdn-use-cases-02. [Accessed: 06-Mar-2012].

[3] E. Mannie, Ed., "Generalized Multi-Protocol Label Switching (GMPLS) Architecture, RFC 3945." Oct-2004.

[4] Y. Lee, G. Bernstein, and W. Imajuku, Eds., "Framework for GMPLS and PCE Control of Wavelength Switched Optical Networks (WSON), RFC 6163." Apr-2011.

[5] A. Farrel, J. P. Vasseur, and J. Ash, "A Path Computation Element (PCE)-Based Architecture, RFC 4655." Aug-2006.

[6] G. Swallow, J. Drake, H. Ishimatsu, and Y. Rekhter, "Generalized Multiprotocol Label Switching (GMPLS) User-Network Interface (UNI): Resource ReserVation Protocol-Traffic Engineering (RSVP-TE) Support for the Overlay Model, RFC 4208." Oct-2005.

[7] Y. R. Yang, R. Alimi, and R. Penno, "ALTO Protocol." [Online]. Available: http://tools.ietf.org/html/draft-ietf-alto-protocol-10. [Accessed: 05-Mar-2012].

[8] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, "A view of cloud computing," Commun. ACM, vol. 53, pp. 50-58, Apr. 2010.

[9] K. A. Hua and S. Sheu, "Skyscraper broadcasting: a new broadcasting scheme for metropolitan video-on-demand systems," in Proceedings of the ACM SIGCOMM '97 conference on Applications, technologies, architectures, and protocols for computer communication, Cannes, France, 1997, pp. 89-100.

[10] "Adobe Flash Media Server 4.0 * Building peer-assisted networking applications." [Online]. Available: http://help.adobe.com/en_US/flashmediaserver/devguide/WSa4cb07693d123884520b86f312a354ba36d-8000.html. [Accessed: 13-May-2011].

[11] Rudra Dutta and George N.
Rouskas, "Traffic grooming in WDM networks: Past and future," IEEE Network, vol. 16, no. 6, pp. 46-56, 2002.

[12] Keyao Zhu and B. Mukherjee, "Traffic grooming in an optical WDM mesh network," Selected Areas in Communications, IEEE Journal on, vol. 20, no. 1, pp. 122-133, 2002.

[13] G. Bernstein, B. Rajagopalan, and D. Saha, Optical Network Control: Architecture, Protocols, and Standards. Addison-Wesley Professional, 2003.

[14] B. Awerbuch and Y. Shavitt, "Topology aggregation for directed graphs," Networking, IEEE/ACM Transactions on, vol. 9, no. 1, pp. 82-90, 2001.

[15] S. Uludag, K.-S. Lui, K. Nahrstedt, and G. Brewster, "Analysis of Topology Aggregation techniques for QoS routing," ACM Comput. Surv., vol. 39, Sep. 2007.

[16] K. Nichols, D. L. Black, S. Blake, and F. Baker, "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers." RFC 2474. Available: http://tools.ietf.org/html/rfc2474.

[17] D. O. Awduche and J. Agogbua, "Requirements for Traffic Engineering Over MPLS." RFC 2702. Available: http://tools.ietf.org/html/rfc2702.

Authors' Addresses

   Greg M. Bernstein
   Grotto Networking
   Fremont California, USA

   Phone: (510) 573-2237
   Email: gregb@grotto-networking.com

   Young Lee
   Huawei Technologies
   1700 Alma Drive, Suite 500
   Plano, TX 75075
   USA

   Phone: (972) 509-5599
   Email: ylee@huawei.com

Intellectual Property Statement

The IETF Trust takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in any IETF Document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights.

Copies of Intellectual Property disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement any standard or specification contained in an IETF Document. Please address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

All IETF Documents and the information contained therein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION THEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgment

Funding for the RFC Editor function is currently provided by the Internet Society.