Network Working Group                                              Y. Gu
Internet-Draft                                                     Y. Li
Intended status: Standards Track                                  Huawei
Expires: April 22, 2013                                     Oct 19, 2012


            The mechanism and signalling between TES and NVE
                   draft-gu-nvo3-tes-nve-mechanism-01

Abstract

   This draft introduces the interaction required between TES and NVE
   when the NVE is located in a box external to the TES.  The signaling
   between TES and NVE has to be designed carefully to reflect all the
   interaction requirements.  This document describes the relevant
   considerations for such a design and also provides a basic analysis
   of the potentially reusable protocols.  Currently this draft focuses
   on the general interaction procedures with relevant parameters and
   the signaling design considerations.  It may be extended with more
   detailed signalling design recommendations and/or solution
   recommendations in the future as NVO3's work progresses.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 22, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Gu & Li                  Expires April 22, 2013                 [Page 1]

Internet-Draft          NVO3 TES to NVE mechanism               Oct 2012

Table of Contents

   1.  Introduction
   2.  Terminologies and concepts
   3.  TES to NVE Interaction
     3.1.  Interaction Intentions
     3.2.  VM Lifetime Events
       3.2.1.  VM Creation
       3.2.2.  VM Pre-associate with NVE
       3.2.3.  VM Associate with NVE
       3.2.4.  VM Suspension
       3.2.5.  VM Resume
       3.2.6.  VM Migration
       3.2.7.  VM Termination
       3.2.8.  VM Full Lifecycle Sketch
     3.3.  Events, Interaction and Parameters
       3.3.1.  VM Pre-association
       3.3.2.  VM Association
       3.3.3.  VM Suspension
       3.3.4.  VM Resume
       3.3.5.  VM Emigration
       3.3.6.  VM Immigration
       3.3.7.  VM Termination
       3.3.8.  Keep-alive
       3.3.9.  NVE Local Changes
     3.4.  Signalling Design Considerations
       3.4.1.  General Requirements
       3.4.2.  Consideration
       3.4.3.  Signalling States Machine
   4.  Security Considerations
   5.  Appendix 1: Mechanism Analysis
     5.1.  IEEE 802.1Qbg
       5.1.1.  Brief Introduction
     5.2.  BGP
     5.3.  External Controller
   6.  References
     6.1.  Normative Reference
     6.2.  Informative Reference
   Authors' Addresses

1.  Introduction

   A Tenant End System (TES) is the physical host where a tenant deploys
   its applications.  A tenant's applications can be deployed directly
   on a physical server or on a virtual machine residing on a physical
   server.  A tenant's virtual network, also called a virtual data
   center, is an overlay network that is built on the underlying network
   but is logically independent of it.  A Network Virtualization Edge
   (NVE) implements virtualization functions that encapsulate and
   decapsulate a tenant's packets, allowing for L2 and/or L3 tenant
   separation and for hiding tenant addressing information (MAC and IP
   addresses).

   A Tenant End System attaches to a Network Virtualization Edge (NVE)
   node, either directly or via a switched network (typically Ethernet).
   TES and NVE can be on the same physical server or on separate
   devices.  Figures 1 to 3 show the different NVE location cases.
   When TES and NVE are on the same physical server, the interaction
   between them goes through a proprietary internal interface that does
   not require a standard signaling protocol.  Such a scenario is
   therefore not the target of this document.  For all the other
   scenarios, as long as the signaling between TES and NVE is visible to
   network developers, it is in the scope of this draft.  We examined
   the different locations of NVE to make sure the signaling interaction
   between NVE and TES covers as many scenarios as possible.

   o  (NVE Location 1) NVE and TES are co-located in a physical server.
      The VM connects to the NVE on the Hypervisor.  In this case, there
      should be some mechanism that lets the Hypervisor know of VM
      changes, including addition, deletion and migration.  The VM and
      the Hypervisor, as well as the network service appliance, are all
      controlled by the VM Manager.  The VM Manager is aware of every VM
      identity and event, hence it can easily pass the information to
      the NVE through some internal interface.  A publicly available
      standard protocol is not necessary in this case.  Refer to
      Figure 1.

                  +-------------+------------+
                  | +--------------------+   |
                  | |   +--------------+ |   |
                  | |   |Overlay Module| |   |
                  | |   +----+---------+ |   |
                  | |        | VN context|   |
                  | |  +-----+-------+   |   |
                  | |  |    VNI      |   |   |
                  | |  +-+---------+-+   |   |
                  | |    |  VAPs   |     |   |
                  | +----+---------+-----+   |
                  |      |         |         |
                  |   +--+---------+---+     |
                  |   |       VM       |     |
                  |   +----------------+     |
                  |                          |
                  +--------------------------+
                       Tenant End Systems

                               Figure 1

   o  (NVE Location 2) TES connects to NVE on an external network entity
      next to it.  The VM is controlled by the VM Manager, while the NVE
      is controlled by some other management entity such as a network
      management system.  Hence a proprietary protocol between TES and
      NVE may not fit all the scenarios.  A standard protocol to signal
      between TES and NVE is mandatory in this case.  Refer to Figure 2.

                   +------- L3 Network --------+
                   |                           |
                   |       Tunnel Overlay      |
      +------------+---------+       +---------+------------+
      | +----------+-------+ |       | +---------+--------+ |
      | |  Overlay Module  | |       | |  Overlay Module  | |
      | +---------+--------+ |       | +---------+--------+ |
      |           |VN context|       | VN context|          |
      |           |          |       |           |          |
      |  +--------+-------+  |       |  +--------+-------+  |
      |  |      VNI       |  |       |  |      VNI       |  |
 NVE1 |  +-+------------+-+  |       |  +-+-----------+--+  | NVE2
      |    |    VAPs    |    |       |    |    VAPs   |     |
      +----+------------+----+       +----+-----------+-----+
           |            |                 |           |
    -------+------------+-----------------+-----------+-------
           |            |     Tenant      |           |
           |            |   Service IF    |           |
      +----+------------+--------+    +---+-----------+-------+
      | +----------------+       |    | +---------------+     |
      | |   Hypervisor   |       |    | |  Hypervisor   |     |
      | +--------+-------+       |    | +-------+-------+     |
      |          |               |    |         |             |
      |  +-------+------+        |    |  +------+------+      |
      |  |      VM      |        |    |  |     VM      |      |
      |  +--------------+        |    |  +-------------+      |
      |                          |    |                       |
      +--------------------------+    +-----------------------+
           Tenant End Systems            Tenant End Systems

     Figure 2: NVE Location 2: VM connects to NVE on external network
                                  entity

   o  (NVE Location 3) TES and NVE are indirectly connected.  Refer to
      Figure 3.

                   +------- L3 Network ------+
                   |                         |
                   |      Tunnel Overlay     |
      +------------+--------+       +--------+------------+
      | +----------+------+ |       | +------+----------+ |
      | | Overlay Module  | |       | | Overlay Module  | |
      | +--------+--------+ |       | +--------+--------+ |
      |          |VN Context|       |          |VN Context|
      |          |          |       |          |          |
      |  +-------+-------+  |       |  +-------+-------+  |
      |  |      VNI      |  |       |  |      VNI      |  |
 NVE1 |  +-+-----------+-+  |       |  +-+-----------+-+  | NVE2
      |    |   VAPs    |    |       |    |   VAPs    |    |
      +----+-----------+----+       +----+-----------+----+ /\
           |           |                 |           |      |
        ...................           ...................   |
   -----: switched network:           : switched network:   |signalling
        ...................           ...................   |
           |           |     Tenant      |           |      |
           |           |   Service IF    |           |      \/
        Tenant End Systems            Tenant End Systems

    Figure 3: Reference model when TES and NVE are indirectly connected

   In the mailing list discussion, more than one mechanism to be used
   between TES and NVE was discussed, including VDP (VSI Discovery and
   Configuration Protocol), BGP and others.  This draft does not assert
   which protocol is better.  We believe that each candidate protocol
   can, with some revision or updating, be used to exchange the
   necessary events and information between TES and NVE.  The final
   decision on which one to use depends not only on functionality but
   also on other aspects, e.g. how lightweight it is to implement on a
   server, how widely it is deployed in the industry, and its efficiency
   and performance.

   This draft first presents the recommended procedures of the TES and
   NVE signalling, the key parameters of each step, and the issues that
   need to be addressed.  Then a set of signaling design considerations
   is provided, which can be used as design requirements for the future
   signalling definition.  In the appendix, we give a brief analysis of
   two existing protocols and also show how they can be revised to adapt
   to TES and NVE signaling.

2.  Terminologies and concepts

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   The document uses terms defined in [framework].

   VN: Virtual Network.  This is a virtual L2 or L3 domain that belongs
   to a tenant.

   VNI: Virtual Network Instance.  This is one instance of a virtual
   overlay network.  Two Virtual Networks are isolated from one another
   and may use overlapping addresses.

   Virtual Network Context or VN Context: Field that is part of the
   overlay encapsulation header which allows the encapsulated frame to
   be delivered to the appropriate virtual network endpoint by the
   egress NVE.  The egress NVE uses this field to determine the
   appropriate virtual network context in which to process the packet.
   This field MAY be an explicit, unique (to the administrative domain)
   virtual network identifier (VNID) or MAY express the necessary
   context information in other ways (e.g. a locally significant
   identifier).

   VNID: Virtual Network Identifier.  In the case where the VN context
   has global significance, this is the ID value that is carried in each
   data packet in the overlay encapsulation that identifies the Virtual
   Network the packet belongs to.

   NVE: Network Virtualization Edge.  It is a network entity that sits
   on the edge of the NVO3 network.  It implements network
   virtualization functions that allow for L2 and/or L3 tenant
   separation and for hiding tenant addressing information (MAC and IP
   addresses).  An NVE could be implemented as part of a virtual switch
   within a hypervisor, a physical switch or router, a Network Service
   Appliance or even be embedded within an End Station.

   Underlay or Underlying Network: This is the network that provides the
   connectivity between NVEs.  The Underlying Network can be completely
   unaware of the overlay packets.  Addresses within the Underlying
   Network are also referred to as "outer addresses" because they exist
   in the outer encapsulation.  The Underlying Network can use a
   completely different protocol (and address family) from that of the
   overlay.

   Data Center (DC): A physical complex housing physical servers,
   network switches and routers, Network Service Appliances and
   networked storage.  The purpose of a Data Center is to provide
   application and/or compute and/or storage services.
   One such service is virtualized data center services, also known as
   Infrastructure as a Service.

   VM: Virtual Machine.  Several Virtual Machines can share the
   resources of a single physical computer server using the services of
   a Hypervisor (see the definition below).

   Hypervisor: Server virtualization software running on a physical
   compute server that hosts Virtual Machines.  The hypervisor provides
   shared compute/memory/storage and network connectivity to the VMs
   that it hosts.  Hypervisors often embed a Virtual Switch (see below).

   Virtual Switch: A function within a Hypervisor (typically implemented
   in software) that provides similar services to a physical Ethernet
   switch.  It switches Ethernet frames between VMs' virtual NICs within
   the same physical server, or between a VM and a physical NIC
   connecting the server to a physical Ethernet switch.  It also
   enforces network isolation between VMs that should not communicate
   with each other.

   Tenant: A customer who consumes virtualized data center services
   offered by a cloud service provider.  A single tenant may consume one
   or more Virtual Data Centers hosted by the same cloud service
   provider.

   Tenant End System: An end system of a particular tenant, which can
   be, for instance, a virtual machine (VM), a non-virtualized server,
   or a physical appliance.

   Virtual Access Points (VAPs): Tenant End Systems are connected to the
   Tenant Instance through Virtual Access Points (VAPs).  In reality,
   the VAPs can be physical ports on a ToR or virtual ports identified
   through logical interface identifiers (VLANs, internal VSwitch
   Interface ID leading to a VM).

   VN Name: A globally unique name for a VN.  The VN Name is not carried
   in data packets originating from End Stations, but must be mapped
   into an appropriate VN-ID for a particular encapsulating technology.
   Using VN Names rather than VN-IDs to identify VNs in configuration
   files and control protocols increases the portability of a VDC and
   its associated VNs when moving among different administrative domains
   (e.g. switching to a different cloud service provider).

   VSI: Virtual Station Interface.  Typically, a VSI is a virtual NIC
   connected directly with a VM.  [Qbg]

3.  TES to NVE Interaction

3.1.  Interaction Intentions

   When the TES is a non-virtualized physical server, a single physical
   interface on the NVE is exclusively attached to a single tenant and
   the attachment does not change very frequently.  In this case, the
   NVE can be pre-configured with the tenant's network properties and
   policies to execute the appropriate packet processing.  When a
   physical server moves, which means the server changes its attachment
   point to the network, the new NVE to which the server is going to
   attach in the new location can also be pre-configured.  In this case,
   there is no need for signalling between TES and NVE.

   When the TES is a virtualized server with multiple VMs, interaction
   between TES and NVE becomes necessary.  A physical interface on the
   NVE can be attached to multiple VMs, which may belong to the same or
   different tenants, and VMs can be moved to new locations without a
   physical shutdown, which means the NVE is not able to learn of a VM's
   attachment and/or detachment by checking the physical port.  As
   described in [framework], the NVE needs to establish a Virtual
   Network Instance for each tenant virtual network attached to it
   through a physical interface, so the NVE must be able to know which
   tenants are attached to it and which VMs belong to each tenant.  The
   NVE must therefore be able to 1) identify and distinguish the VMs
   attached to it through the same physical interface; 2) identify
   which tenant each VM belongs to; 3) get the network policies that
   are associated with the tenant.
   That is why interaction signalling between TES and NVE is needed.

   Of course, the signalling between TES and NVE is not limited to the
   above intentions.  When looking into the detailed processing of VM
   events, we will find more signalling functions and processing on TES
   and NVE.

3.2.  VM Lifetime Events

   Not every VM has to pass through all the listed VM lifetime events.
   A VM goes through at least two of the following events, in some
   combination.

3.2.1.  VM Creation

   The VM Manager instructs the hypervisor to schedule resources on the
   server for a particular VM, including CPU, memory, storage and
   network resources.  After the VM is created on the server, it has the
   necessary resources and is ready to be launched.  The creation of a
   VM does not necessarily mean the VM is running.  The VM can stay
   created but not launched for as long as the manager likes, or it can
   be created and launched at once.  Launching a VM is just like
   starting up a physical computer.  Although VM creation is a very
   important event for the VM, the attached NVE need not be aware of it.

3.2.2.  VM Pre-associate with NVE

   The VM Manager can decide when to launch a VM and connect it to the
   network.  Before the VM connects to the network, the operator needs
   to provision the VM's network properties and policies to the NVE that
   the VM is attached to.  Examples of network properties are the VM MAC
   address and the tenant virtual network identifier.  Examples of
   policies are ACL and QoS.  These properties and policies are not
   activated on the NVE until the VM Manager instructs the VM to connect
   to the network.  This is called Pre-association.  Pre-association is
   an optional event.

3.2.3.  VM Associate with NVE

   This event means the VM is going to connect to the network.  The NVE
   has to get the VM's network properties and policies, assign resources
   and install these properties and policies.
   If there is a Pre-association before the Association, the NVE can
   reduce the time needed for Association.  Once the VM is associated,
   it can use network resources as a physical server does.

   Association can happen with or without Pre-association.  If there is
   a Pre-association before the Association, the NVE already has the
   network properties and policies stored, or even installed.  If the
   network properties and policies in the Association message are the
   same as in the Pre-association, the NVE can activate the installed
   network properties and policies.  If they are different, the old
   reserved resources should be released and the new network properties
   and policies are installed and activated.

3.2.4.  VM Suspension

   Creating and terminating a VM may take a considerable amount of time.
   Instead of performing these operations, operators can suspend a
   virtual machine for the required time and quickly resume it later.
   Suspending a VM is similar to putting a real computer into sleep
   mode.  When a VM is suspended, its current state (including the state
   of all applications and processes running in the VM) is stored.  When
   the suspended virtual machine is resumed, it continues operating from
   the point it had reached at the time of its suspension.

3.2.5.  VM Resume

   This event activates the suspended VM.  The suspended applications
   start again from the state in which the VM was suspended.  It is not
   always predictable when a suspended VM will be resumed.

3.2.6.  VM Migration

   There are two kinds of VM migration: hot migration (or live
   migration) and offline migration.  The processing of offline
   migration is similar to terminating the VM on one server and creating
   it on another server.  The applications running on the VM are
   interrupted and then restarted at the new location.  In live
   migration, the VM is migrated from one location to another while it
   is running, and the running applications should not be visibly
   disrupted.
   There is no termination or creation during live migration, so it is
   highly important to make the NVE aware of the migration so that the
   corresponding network properties and policies can be correctly
   obtained, installed and activated at the new location, and removed
   from the old location.  Otherwise, there might be a security risk,
   and running applications might be affected or even interrupted.

   VM migration has two sub-types: VM emigration and VM immigration.

   o  VM Emigrating: The VM is emigrating from this server.  All the
      relevant resources on the server and the attached NVE are
      therefore disabled, but not removed right away; they are ready to
      be removed once the VM has successfully migrated.  If the VM fails
      to immigrate at the new location, it has to be resumed at the old
      location with the states and policies that the old NVE disabled.

   o  VM Immigrating: The VM is immigrating to this server.  The server
      and the attached NVE have prepared the necessary resources and are
      ready to enable the VM's properties and policies once the VM has
      successfully migrated.

3.2.7.  VM Termination

   All applications and processing on the VM are terminated.  All the
   VM's resources on the server, including CPU, memory, storage and
   network resources, are released.  The VM no longer exists.

3.2.8.  VM Full Lifecycle Sketch

   Not every VM has to pass through all the lifetime events enumerated
   above.  The simplest VM life has only VM Creation, VM Association
   with NVE and VM Termination.  The most complex VM life has all the
   events listed above.  In this section, we show a sketch of a VM's
   full lifecycle with all the listed events.  This is helpful for the
   future signalling design.

   [State diagram.  States: NULL, VM Creation, VM Pre-Associate with
   NVE, VM Associate with NVE, VM Suspension, VM Immigrating, VM
   Emigrating, VM Terminate.  Transition labels: VM Terminate, Aged out
   (VM Terminate to NULL), Fail, VM Resume, Successful Immigration to
   this server, Failed Immigration to this server, Successful
   Immigration to other server, Failed Immigration to other server.]

                  Figure 4: VM Full Lifecycle Sketch

3.3.  Events, Interaction and Parameters

   In this section, we describe the interaction, parameters and special
   concerns for each VM event.  The interaction is strongly related to
   the VM lifetime events, but the mapping is not one-to-one; for
   example, there is no interaction for VM Creation.  For VM events, the
   interaction is initiated by the hypervisor on behalf of a VM and sent
   to the VNI on the attached NVE.  This is not always the case,
   however, since the NVE may also initiate interaction if changes
   happen on the NVE that particular VMs must learn about.

3.3.1.  VM Pre-association

   o  Interaction: This event triggers the Hypervisor to compose a
      Pre-association message and send it to the NVE.  On receiving the
      Pre-association message, the NVE needs to authorize the VM and/or
      Hypervisor, obtain the VM's network properties and policies, and
      install the properties and policies on the NVE.

   o  Parameters: The signalling from TES to NVE should include at least
      the following mandatory parameters.

      *  Operation, i.e. Pre-association.

      *  VMID, a globally unique ID in the Data Center for a VM.  A VM
         can have more than one MAC address and belong to more than one
         VNID, so a VMID is necessary for the NVE to associate the VNIDs
         and MACs with the particular VM.

      *  VNID(s), a globally unique ID in the Data Center for a tenant's
         virtual network.

      *  MAC addresses.  A VM may have more than one MAC address and may
         also belong to more than one virtual network, so the MAC
         address(es) and VNID(s) should be presented in a way that lets
         the NVE identify which MAC addresses belong to which VNID.

      *  Policies, including ACL, QoS, priority, etc.  If more than one
         VNID is associated with the VM, it should be explicitly
         indicated which VNID each policy belongs to.

   o  Response: After the NVE processes the Pre-association message, it
      responds to the TES with the processing result.  The response can
      be SUCCESS or FAIL, with indicated reasons such as FAILED
      AUTHORIZATION, CONFLICT POLICIES (e.g. the provisioned policies
      conflict with other existing policies on the NVE) or
      NON-SUFFICIENT RESOURCES (e.g. the NVE does not have enough
      resources to install the provisioned policies).

3.3.2.  VM Association

   o  Interaction: This event triggers the Hypervisor to compose an
      Association message and send it to the NVE.  Association can
      happen with or without a preceding Pre-association message.

      *  If there is a Pre-association message before the Association,
         the NVE needs to compare the information provided by the
         Pre-association and the Association.  If they are the same, the
         NVE can activate the pre-installed resources.  If they differ,
         the NVE needs to do some additional work depending on what
         information has changed from Pre-association to Association.
         For example, if a policy or VNID has changed, the NVE needs to
         update its stored information.

      *  If there is no Pre-association message before the Association,
         the NVE needs to perform authorization, obtain the VM's network
         properties and policies, and install and activate the
         properties and policies on the NVE.

      *  If there was another successful Association message before this
         Association, the NVE needs to compare the information
         provisioned by the previous Association with this Association.
         If everything is the same, the NVE does nothing except update
         the VM's timer.  If there is a difference, the NVE needs to do
         some additional work, depending on what information has
         changed.  For example, if policies or the VNID have changed,
         the NVE needs to update its stored information.

   o  Parameters: The signalling from TES to NVE should include at least
      the following mandatory parameters.

      *  Operation, i.e. Association.

      *  VMID

      *  VNID(s)

      *  MAC addresses

      *  Policies

   o  Response: After the NVE processes the Association message, it
      responds to the TES with the processing result.  The response can
      be SUCCESS or FAIL, with indicated reasons such as FAILED
      AUTHORIZATION, CONFLICT POLICIES (e.g. the provisioned policies
      conflict with other existing policies on the NVE) or
      NON-SUFFICIENT RESOURCES (e.g. the NVE does not have enough
      resources to install the provisioned policies).

3.3.3.  VM Suspension

   o  Interaction: This event triggers the Hypervisor to compose a
      Suspension message, or an Association message with a Suspension
      indication, and send it to the NVE.  Suspension must happen after
      a successful Association.  On receiving a Suspension message, the
      NVE deactivates, but does not remove, the VM's resources and
      prepares for the next Resume message.  In the suspension state,
      the NVE behaves much as it does in the Pre-association state.  The
      FDB can be aged out during VM suspension.

   o  Parameters: The signalling from TES to NVE should include at least
      the following mandatory parameters.

      *  Operation, i.e. Suspension or an Association message with
         Suspension indication

      *  VMID

   o  Response: After the NVE processes the Suspension message, it
      responds to the TES with the processing result.  The response can
      be SUCCESS or FAIL.  If it is FAIL, it may be because the NVE is
      too busy to process the message.

3.3.4.  VM Resume

   o  Interaction: This event triggers the Hypervisor to compose a
      Resume message, or an Association message with a Resume
      indication, and send it to the NVE.  Resume is supposed to happen
      after a successful Suspension message; otherwise, the NVE responds
      with a SUCCESS message and takes no action on the message.  On
      receiving a Resume message, the NVE activates the VM's resources.

   o  Parameters: The signalling from TES to NVE should include at least
      the following mandatory parameters.

      *  Operation, i.e. Resume or an Association message with Resume
         indication

      *  VMID

   o  Response: After the NVE processes the Resume message, it responds
      to the TES with the processing result.  The response can be
      SUCCESS or FAIL.  If it is FAIL, it may be because the NVE is too
      busy to process the message.

3.3.5.  VM Emigration

   o  Interaction: This event triggers the Hypervisor to compose an
      Emigration message, or an Association message with an Emigration
      indication, and send it to the NVE.  Emigration can happen after
      Pre-association, Association, Suspension or Resume.

   o  On receiving a VM Emigration message or indication, the NVE
      deactivates the VM's resources.  The NVE does not immediately
      remove the VM's resources and state, because an emigration may
      fail if the immigration on the remote server or NVE fails.  In
      that case, the emigrating VM may need to continue its work on the
      current server.  The NVE waits for a subsequent Termination
      message before removing the VM's resources and state.

   o  Parameters: The signalling from TES to NVE should include at least
      the following mandatory parameters.

      *  Operation, i.e. Emigration or an Association message with
         Emigration indication.

      *  VMID

   o  Response: After the NVE processes the VM Emigration, it responds
      to the TES with the processing result.  The response can be
      SUCCESS or FAIL.  If it is FAIL, it may be because the NVE is too
      busy to process the message.

3.3.6.  VM Immigration

   o  Interaction: This event triggers the Hypervisor to compose an
      Immigration message, or a Pre-association/Association message with
      an Immigration indication; call them Immigration(Pre-asso) and
      Immigration(Asso).  The NVE's reaction to VM Immigration is
      similar to its reaction to Pre-association or Association.  If the
      result of Immigration processing is FAIL, the VM does not migrate
      to the new location and continues its work on the old server.  The
      VM Manager may have to find another new location for the VM to
      migrate to.

   o  It is meaningful to distinguish Immigration from Pre-association
      and Association.  [statemigration-framework] shows the problem of
      migrating a VM's flow-coupled state in the case of VM live
      migration.  The Immigration message can be an indication or
      trigger for the flow-coupled state migration on middleboxes.

   o  Parameters: The signalling from TES to NVE should include at least
      the following mandatory parameters.

      *  Operation, i.e. Immigration or a (Pre-)Association message with
         Immigration indication.

      *  VMID

      *  VNID(s)

      *  MAC addresses

      *  Policies

   o  Response: After the NVE processes the Immigration message, it
      responds to the TES with the processing result.  The response can
      be SUCCESS or FAIL, with indicated reasons such as FAILED
      AUTHORIZATION, CONFLICT POLICIES (e.g. the provisioned policies
      conflict with other existing policies on the NVE) or
      NON-SUFFICIENT RESOURCES (e.g. the NVE does not have enough
      resources to install the provisioned policies).

3.3.7.  VM Termination

   o  Interaction: This event triggers the Hypervisor to compose a
      Termination message.  The NVE releases the VM's resources on the
      NVE and removes all state about this VM.
   o  Parameters: The signalling from TES to NVE should include at least
      the following mandatory parameters.

      *  Operation, i.e.  Termination

      *  VMID

   o  Response: After NVE processes the Termination message, it responds
      to TES with the processing result.  The response can be SUCCESS or
      FAIL.  A FAIL response may mean that the NVE is too busy to
      process the Termination message; the VM can nevertheless be
      terminated on the server.

3.3.8.  Keep-alive

   This is not a VM lifetime event.  Since resources on NVE are
   precious, if an associated, pre-associated or suspended VM stays idle
   for a pre-defined time, NVE removes the VM's resources so that it
   can serve other active VMs.  To keep the VM's resources on NVE, the
   Hypervisor has to send a Keep-alive message, or a
   Pre-association/Association message with Keep-alive indication.  NVE
   updates the VM's timer upon receiving the Keep-alive message.

   Parameters: The signalling from TES to NVE should include at least
   the following mandatory parameters.

   o  Operation, i.e.  Keep-alive or a (Pre-)Association message with
      Keep-alive indication.

   o  VMID

3.3.9.  NVE Local Changes

   When a VM associates with a VNID on NVE, NVE generates locally
   significant indicators for the VM and VNIDs, e.g. a VID.  If these
   indicators were sent to the Hypervisor in a previous response and
   they change later on, NVE needs to create an Associate message or a
   dedicated message carrying the changed indicators and send it to the
   Hypervisor.  The Hypervisor responds with the processing result.

   Note: Although this section uses the VM lifetime event names as the
   names of messages, this does not mean that the future signalling
   must define a dedicated message for each event.  Some of the events
   can be carried in one signalled message with different operation
   types, for example an Association message with Immigration
   indication or an Association message with Suspension indication.

3.4.  Signalling Design Considerations

3.4.1.  General Requirements

3.4.1.1.  Basic Requirements

   REQUIREMENT-1: The TNS (TES to NVE Signalling) MUST support TES to
   notify NVE about the VM's events, including but not limited to
   Pre-Association, Association, Emigration, Immigration and
   Termination.

   REQUIREMENT-2: The TNS MUST support TES to notify NVE about the VM's
   VNID, which can be one identifier or a combination of several
   identifiers.

   REQUIREMENT-3: The TNS MUST support TES to notify NVE about the VM's
   address.  The address MUST include one or both of the MAC address of
   the VM's virtual NIC and the VM's IP address.  It SHOULD be
   extensible to carry new address types.

   REQUIREMENT-4: The TNS MUST support NVE to notify TES about the VM's
   local tag.  The local tag types supported by TNS MUST include the
   IEEE 802.1Q tag.  It SHOULD be extensible to carry other types of
   local tag.

3.4.1.2.  Extension Requirements

   REQUIREMENT-5: The TNS SHOULD support NVE to notify TES about the
   VM's traffic PCP value.

   In a typical DC, where a physical server connects to an adjacent
   bridge, a data frame from the server can be tagged with PCP or
   untagged.  If a data frame is untagged, it can be tagged with PCP on
   the adjacent bridge.  In a virtualized DC, the adjacent bridge is the
   Hypervisor.  There are three options for handling the PCP tag: 1) the
   data frame is tagged with PCP by the VM, 2) the data frame is tagged
   with PCP by the Hypervisor, and 3) the data frame is tagged with PCP
   by NVE.  In a cloud service, the VM can belong to anybody and it may
   want a higher priority than it should have.  The VM can tag its data
   frames with a higher PCP value and get better service.  Based on the
   assumption that a PCP provided by the VM is not reliable, it is more
   reasonable to let the network define the PCP value based on the VM's
   priority, and to let bridges tag the PCP value, as in 2) or 3).  This
   problem is similar to the local VID, which can be tagged either by
   the Hypervisor or by NVE.
   The benefit of tagging PCP at the Hypervisor is that it reduces the
   load on NVE.

3.4.2.  Consideration

   To be added.

3.4.3.  Signalling State Machine

   The interaction should be stateful.  Both the Hypervisor and NVE need
   to record their signalling state.  The main states are
   Pre-association, Association, Suspension, and Termination.  The
   following diagram shows the state machine of TES to NVE signalling.
   Only reasonable situations are listed in the diagram.  More
   situations will be added to the state machine in the future.

         |-------------------->/~~~\---------------------|
         |                     \~~~/                     |
         |                       |Pre-Asso               |
         |                       |or                     |
         |                       |Immigration(Pre-Asso)  |
   /~~~~~~~~~~~\  Aged out       v                       |
   |Termination|<----|  /~~~~~~~~~~~~~~~~\              Asso
   \~~~~~~~~~~~/<-\  ---|Pre-Association |               or
         ^         \    \~~~~~~~~~~~~~~~~/       Immigration(Asso)
         |          \            |                       |
     Aged out    Aged out        |Asso                   |
        or          or           |or                     |
    Termination Termination      |Immigration(Asso)      |
         |            \----|     v                       |
   /~~~~~~~~~~~\Suspension/~~~~~~~~~~~~~\                |
   |Suspension |<---------| Association |<---------------|
   \~~~~~~~~~~~/--------->\~~~~~~~~~~~~~/
                  Resume    /        ^
                           /          \
   /~~~\                  |            |
   \~~~/ States           |-Emigration-|
                       or Immigration(Asso)
   ------ Message

             Figure 5: TES to NVE signalling State Machine

4.  Security Considerations

   There are some considerations on security in [overlay-cp].  Most of
   them concern the mechanism between NVE and an external controller,
   and attacks on the underlying networks, which cannot be resolved by
   the mechanism between TES and NVE alone.

   One security issue related to the mechanism between TES and NVE is
   the authentication of a VM that requests to associate with a
   particular VN.  There is a hypervisor between VMs and NVEs, and
   neither the VMs nor the hypervisor are always reliable.  For example,
   a compromised hypervisor may modify the VN Name, or an identifier
   with a similar purpose, in order to associate with a VN that it does
   not belong to.

5.  Appendix 1: Mechanism Analysis

5.1.  IEEE 802.1Qbg

5.1.1.  Brief Introduction

   VDP has four basic TLV types.

   o  Pre-Associate: Pre-Associate is used to pre-associate a VSI
      instance with a bridge port.  The bridge validates the request and
      returns a failure Status in case of errors.  Successful
      pre-association does not imply that the indicated VSI Type will be
      applied to any traffic flowing through the VSI.  The pre-associate
      enables a faster response to an associate, by allowing the bridge
      to obtain the VSI Type prior to an association.

   o  Pre-Associate with resource reservation: Pre-Associate with
      Resource Reservation involves the same steps as Pre-Associate, but
      on successful pre-association also reserves resources in the
      Bridge to prepare for a subsequent Associate request.

   o  Associate: The Associate TLV Type creates and activates an
      association between a VSI instance and a bridge port.  The Bridge
      allocates any required bridge resources for the referenced VSI.
      The Bridge activates the configuration for the VSI Type ID.  This
      association is then applied to the traffic flow to/from the VSI
      instance.

   o  Deassociate: The De-associate TLV Type is used to remove an
      association between a VSI instance and a bridge port.
      Pre-Associated and Associated VSIs can be de-associated.
      De-associate releases any resources that were reserved as a result
      of prior Associate or Pre-Associate operations for that VSI
      instance.
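   The effect of the four TLV types on a bridge can be sketched as a
   small model.  This is only an illustration of the descriptions above,
   not the behaviour mandated by [Qbg]: the class and method names
   (VdpTlvType, ToyBridge, handle) are hypothetical, the numeric TLV
   type codes are deliberately left out, and validation failures are
   not modelled.

```python
from enum import Enum, auto


class VdpTlvType(Enum):
    """The four basic VDP TLV types (numeric codes intentionally omitted)."""
    PRE_ASSOCIATE = auto()
    PRE_ASSOCIATE_WITH_RESERVATION = auto()
    ASSOCIATE = auto()
    DEASSOCIATE = auto()


class ToyBridge:
    """Hypothetical bridge tracking per-VSI state, for illustration only."""

    def __init__(self):
        self.state = {}        # vsi_id -> "pre-associated" | "associated"
        self.reserved = set()  # VSIs currently holding bridge resources

    def handle(self, tlv_type, vsi_id):
        """Apply one request and return the resulting state of the VSI."""
        if tlv_type is VdpTlvType.PRE_ASSOCIATE:
            # The VSI Type is obtained early; no resources are reserved.
            self.state[vsi_id] = "pre-associated"
        elif tlv_type is VdpTlvType.PRE_ASSOCIATE_WITH_RESERVATION:
            # Same steps as Pre-Associate, plus resource reservation.
            self.state[vsi_id] = "pre-associated"
            self.reserved.add(vsi_id)
        elif tlv_type is VdpTlvType.ASSOCIATE:
            # Resources are allocated and the association is activated.
            self.state[vsi_id] = "associated"
            self.reserved.add(vsi_id)
        elif tlv_type is VdpTlvType.DEASSOCIATE:
            # Works for both pre-associated and associated VSIs.
            self.state.pop(vsi_id, None)
            self.reserved.discard(vsi_id)
        return self.state.get(vsi_id, "none")
```

   In this sketch a plain Pre-Associate leaves the set of reserved VSIs
   untouched, which mirrors the statement that only the reservation
   variant prepares resources for a subsequent Associate.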
   |1       |2        |3       |4       |7       |8     |9      |25         |26      25+M|
   |--------+---------+--------+--------+--------+------+-------+-----------+------------|
   |TLV type|TLV info | Status |VSI Type|VSI Type|VSIID |VSIID  |Filter Info|Filter Info |
   |(7 bits)|string   |(1 oct) |   ID   |version |format|(16oct)|  format   | (M octets) |
   |        |length   |        |(3 oct) |(1 oct) |(1oct)|       | (1 octet) |            |
   |        |(9 bits) |        |        |        |      |       |           |            |
   |--------+---------+--------+--------+--------+------+-------+-----------+------------|
   |                  |        |<----VSI type & instance------->|<-------Filter--------->|
   |                  |        |<--------------------VSI attributes--------------------->|
   |<---TLV header--->|<--------------TLV information string = 23+M octets-------------->|

                       Figure 6: VDP TLV definitions

   Some important flag values in a VDP request:

   o  M-bit (Bit 5): Indicates that the user of the VSI (e.g., the VM)
      is migrating (M-bit = 1) or provides no guidance on the migration
      of the user of the VSI (M-bit = 0).  The M-bit is used as an
      indicator relative to the VSI that the user is migrating to.

   o  S-bit (Bit 6): Indicates that the VSI user (e.g., the VM) is
      suspended (S-bit = 1) or provides no guidance as to whether the
      user of the VSI is suspended (S-bit = 0).  A keep-alive Associate
      request with S-bit = 1 can be sent when the VSI user is suspended.
      The S-bit is used as an indicator relative to the VSI that the
      user is migrating from.
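   The field layout of Figure 6 can be made concrete with a short
   packing routine.  This is a minimal sketch that assumes network byte
   order and that the 7-bit TLV type occupies the most significant bits
   of the two-octet header; the function name pack_vdp_tlv and the
   sample field values are hypothetical, and the M-bit and S-bit are not
   modelled because their position within the TLV is not defined in
   this section.

```python
import struct


def pack_vdp_tlv(tlv_type, status, vsi_type_id, vsi_type_version,
                 vsiid_format, vsiid, filter_format, filter_info):
    """Pack one VDP TLV using the field order and widths of Figure 6."""
    if not 0 <= tlv_type < 2 ** 7:
        raise ValueError("TLV type must fit in 7 bits")
    if not 0 <= vsi_type_id < 2 ** 24:
        raise ValueError("VSI Type ID must fit in 3 octets")
    if len(vsiid) != 16:
        raise ValueError("VSIID must be exactly 16 octets")

    # TLV information string: 1 + 3 + 1 + 1 + 16 + 1 = 23 octets,
    # followed by M octets of filter information.
    info = (
        struct.pack("!B", status)
        + vsi_type_id.to_bytes(3, "big")
        + struct.pack("!BB", vsi_type_version, vsiid_format)
        + vsiid
        + struct.pack("!B", filter_format)
        + filter_info
    )
    if len(info) >= 2 ** 9:
        raise ValueError("information string must fit in a 9-bit length")

    # TLV header: 7-bit type (assumed high bits) and 9-bit length.
    header = struct.pack("!H", (tlv_type << 9) | len(info))
    return header + info
```

   For a filter information field of M = 4 octets the function returns
   2 + 23 + 4 = 29 octets, and the length carried in the header is 27,
   which matches the "23+M octets" annotation in the figure.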
   The filter information field supports the following formats:

   o  VID

      +---------+------+-------+--------+
      | #of     |  PS  |  PCP  |  VID   |
      |entries  |(1bit)|(3bits)|(12bits)|
      |(2octets)|      |       |        |
      +---------+------+-------+--------+
                |<--Repeated per entry->|

                         Figure 7

   o  MAC/VID

      +---------+--------------+------+-------+--------+
      | #of     | MAC address  |  PS  |  PCP  |  VID   |
      |entries  |  (6 octets)  |(1bit)|(3bits)|(12bits)|
      |(2octets)|              |      |       |        |
      +---------+--------------+------+-------+--------+
                |<--------Repeated per entry---------->|

                         Figure 8

   o  GroupID/VID

      +---------+--------------+------+-------+--------+
      | #of     |   GroupID    |  PS  |  PCP  |  VID   |
      |entries  |  (4 octets)  |(1bit)|(3bits)|(12bits)|
      |(2octets)|              |      |       |        |
      +---------+--------------+------+-------+--------+
                |<--------Repeated per entry---------->|

                         Figure 9

   o  GroupID/MAC/VID

      +---------+-----------+-------------+------+-------+--------+
      | #of     |  GroupID  | MAC address |  PS  |  PCP  |  VID   |
      |entries  |(4 octets) | (6 octets)  |(1bit)|(3bits)|(12bits)|
      |(2octets)|           |             |      |       |        |
      +---------+-----------+-------------+------+-------+--------+
                |<--------------Repeated per entry--------------->|

                         Figure 10

   In each format, the null VID can be used in the VDP Request.  In this
   case, the Bridge is expected to supply the corresponding local VID
   value in the VDP Response.

   The VSIID in a VDP request, which identifies a VM, can use one of
   the following formats: IPv4 address, IPv6 address, MAC address, UUID
   or locally defined.
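   As an illustration of the VID format in Figure 7, the following
   sketch packs and unpacks the filter information field.  It assumes
   network byte order and that PS, PCP and VID are laid out from the
   most significant bit downwards within each 16-bit entry, following
   the left-to-right order of the figure; the function names are
   hypothetical and the null VID is assumed to be the value 0.

```python
import struct


def pack_vid_filter(entries):
    """Encode a list of (ps, pcp, vid) tuples in the VID filter format."""
    out = struct.pack("!H", len(entries))  # number of entries, 2 octets
    for ps, pcp, vid in entries:
        if ps not in (0, 1) or not 0 <= pcp <= 7 or not 0 <= vid <= 0xFFF:
            raise ValueError("PS, PCP or VID out of range")
        # One 16-bit word per entry: PS (1 bit), PCP (3 bits), VID (12 bits).
        out += struct.pack("!H", (ps << 15) | (pcp << 12) | vid)
    return out


def unpack_vid_filter(data):
    """Decode the VID filter format back into (ps, pcp, vid) tuples."""
    (count,) = struct.unpack_from("!H", data, 0)
    entries = []
    for i in range(count):
        (word,) = struct.unpack_from("!H", data, 2 + 2 * i)
        entries.append((word >> 15, (word >> 12) & 0x7, word & 0xFFF))
    return entries
```

   A request that carries a single entry with the null VID, as in
   pack_vid_filter([(0, 0, 0)]), asks the Bridge to supply the local
   VID; the response could then be decoded with unpack_vid_filter to
   read the VID and PCP values that the Bridge filled in.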
   +--------------------------------------------------+----------------+
   | VDP features                                     | Requirements   |
   |                                                  | Matching       |
   +--------------------------------------------------+----------------+
   | Pre-Associate/ Pre-Associate with resource       | Requirement-1  |
   | reservation/ Associate/ Deassociate              |                |
   | M-bit/S-bit                                      | Requirement-1  |
   | VSI type&instance in VDP request                 | Requirement-2  |
   | Filter Info                                      | Requirement-3  |
   | VID info in VDP response                         | Requirement-4  |
   | PCP in VDP response                              | Requirement-5  |
   +--------------------------------------------------+----------------+

                              VDP TLV types

5.2.  BGP

   [server2nve] gives a brief analysis of how BGP can be reused for TES
   to NVE signalling.  Please refer to it for more information.

5.3.  External Controller

6.  References

6.1.  Normative Reference

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", March 1997.

   [Qbg]      "IEEE P802.1Qbg Edge Virtual Bridging".

6.2.  Informative Reference

   [framework]
              Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y.
              Rekhter, "draft-ietf-nvo3-framework-00", September 2012.

   [overlay-cp]
              Kreeger, L., Dutt, D., Narten, T., Black, D., and M.
              Sridharan, "draft-kreeger-nvo3-overlay-cp-00", Jan 2012.

   [server2nve]
              Kompella, K.,
              "draft-dunbar-nvo3-overlay-mobility-issues-00", July 2012.

   [statemigration-framework]
              Gu, Y., Shore, M., and S. Sivakumar, "A Framework and
              Problem Statement for Flow-associated Middlebox State
              Migration", October 2012.

Authors' Addresses

   Gu Yingjie
   Huawei
   No. 101 Software Avenue
   Nanjing, Jiangsu Province  210001
   P.R.China

   Phone: +86-25-56625392
   Email: guyingjie@huawei.com


   Yizhou Li
   Huawei
   No. 101 Software Avenue
   Nanjing, Jiangsu Province  210001
   P.R.China

   Phone:
   Email: liyizhou@huawei.com