<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<?rfc autobreaks="yes"?>
<rfc category="info"
     docName="draft-lennox-raiarea-rtp-grouping-taxonomy-03"
     ipr="trust200902">
  <front>
    <title abbrev="RTP Grouping Taxonomy">A Taxonomy of Grouping
    Semantics and Mechanisms for Real-Time Transport Protocol (RTP)
    Sources</title>

    <author fullname="Jonathan Lennox" initials="J." surname="Lennox">
      <organization abbrev="Vidyo">Vidyo, Inc.</organization>

      <address>
        <postal>
          <street>433 Hackensack Avenue</street>

          <street>Seventh Floor</street>

          <city>Hackensack</city>

          <region>NJ</region>

          <code>07601</code>

          <country>US</country>
        </postal>

        <email>jonathan@vidyo.com</email>
      </address>
    </author>

    <author fullname="Kevin Gross" initials="K." surname="Gross">
      <organization abbrev="AVA">AVA Networks, LLC</organization>

      <address>
        <postal>
          <street/>

          <city>Boulder</city>

          <region>CO</region>

          <country>US</country>
        </postal>

        <email>kevin.gross@avanw.com</email>
      </address>
    </author>

    <author fullname="Suhas Nandakumar" initials="S."
            surname="Nandakumar">
      <organization>Cisco Systems</organization>

      <address>
        <postal>
          <street>170 West Tasman Drive</street>

          <city>San Jose</city>

          <region>CA</region>

          <code>95134</code>

          <country>US</country>
        </postal>

        <email>snandaku@cisco.com</email>
      </address>
    </author>

    <author fullname="Gonzalo Salgueiro" initials="G."
            surname="Salgueiro">
      <organization>Cisco Systems</organization>

      <address>
        <postal>
          <street>7200-12 Kit Creek Road</street>

          <city>Research Triangle Park</city>

          <region>NC</region>

          <code>27709</code>

          <country>US</country>
        </postal>

        <email>gsalguei@cisco.com</email>
      </address>
    </author>

    <author fullname="Bo Burman" initials="B." surname="Burman">
      <organization>Ericsson</organization>

      <address>
        <postal>
          <street>Farogatan 6</street>

          <city>SE-164 80 Kista</city>

          <country>Sweden</country>
        </postal>

        <phone>+46 10 714 13 11</phone>

        <email>bo.burman@ericsson.com</email>
      </address>
    </author>

    <!-- Add more authors here! -->

    <date day="9" month="October" year="2013"/>

    <area>Real Time Applications and Infrastructure (RAI)</area>

    <keyword>I-D</keyword>

    <keyword>Internet-Draft</keyword>

    <!-- TODO: more keywords -->

    <abstract>
      <t>The terminology about, and associations among, Real-Time
      Transport Protocol (RTP) sources can be complex and somewhat
      opaque. This document describes a number of existing and
      proposed relationships among RTP sources, and attempts to define
      common terminology for discussing protocol entities and their
      relationships.</t>
    </abstract>
  </front>

  <middle>
    <section anchor="introduction" title="Introduction">
      <t>The existing taxonomy of sources in RTP is often regarded as
      confusing and inconsistent. Consequently, a deep understanding
      of how the different terms relate to each other becomes a real
      challenge. Frequently cited examples of this confusion are (1)
      how different protocols that make use of RTP use the same terms
      to signify different things and (2) how the complexities
      addressed at one layer are often glossed over or ignored at
      another.</t>

      <t>This document attempts to provide some clarity by reviewing
      the semantics of various aspects of sources in RTP. As an
      organizing mechanism, it approaches this by describing various
      ways that RTP sources can be grouped and associated
      together.</t>

      <t>All non-specific references to ControLling mUltiple streams
      for tElepresence (CLUE) in this document map to <xref
      target="I-D.ietf-clue-framework"/> and all references to Web
      Real-Time Communications (WebRTC) map to <xref
      target="I-D.ietf-rtcweb-overview"/>.</t>
    </section>

    <section title="Concepts">
      <t>This section defines concepts that serve to identify and name
      various transformations and streams in a given RTP usage. For
      each concept, an attempt is made to list any alternate
      definitions and usages that co-exist today, along with various
      characteristics that further describe the concept. These
      concepts are divided into two categories: one related to the
      chain of streams and transformations that media can be subject
      to, and the other for entities involved in the communication.</t>

      <section title="Media Chain">
        <t>This section contains the concepts that can be involved in
        taking a sequence of physical world stimuli (sound waves,
        photons, keystrokes) at a sender side and transporting it to a
        receiver, which may recover a sequence of physical stimuli.
        This chain of concepts is of two main types: streams and
        transformations. Streams are time-based sequences of samples
        of the physical stimulus in various representations, while
        transformations change the representation of the streams in
        some way.</t>

        <t>The examples below are basic ones, and it is important to
        keep in mind that this conceptual model enables more complex
        usages. Some will be further discussed in later sections of
        this document. In general, the following applies to this
        model:<list style="symbols">
            <t>A transformation may have zero or more inputs and one
            or more outputs.</t>

            <t>A Stream is of some type.</t>

            <t>A Stream has one source transformation and one or more
            sink transformations (with the exception of <xref
            target="physical-stimulus">Physical Stimulus</xref>, which
            may have no source or sink transformation).</t>

            <t>Streams can be forwarded from a transformation output
            to any number of inputs on other transformations that
            support that type.</t>

            <t>If the output of a transformation is sent to multiple
            transformations, those streams will be identical; it takes
            a transformation to make them different.</t>

            <t>There are no formal limitations on how streams are
            connected to transformations; this may include loops if
            required by a particular transformation.</t>
          </list> It is also important to remember that this is a
        conceptual model. Thus real-world implementations may look
        different and have different structure.</t>
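
        <t>As a rough illustration of the rules listed above, the
        model can be sketched as a small data structure. This is a
        minimal sketch and not part of the taxonomy: the Stream and
        Transformation classes, their fields, and the lower-case type
        labels ("raw", "source") are assumptions introduced only for
        this example.</t>

        <figure>
          <artwork><![CDATA[from dataclasses import dataclass, field


@dataclass(eq=False)
class Stream:
    """A typed stream with one source and any number of sinks."""
    kind: str
    source: "Transformation"
    sinks: list = field(default_factory=list)


@dataclass(eq=False)
class Transformation:
    """A node with zero or more inputs and one or more outputs."""
    name: str
    accepts: tuple = ()      # stream types usable as input
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

    def emit(self, kind):
        """Create a new output Stream of the given type."""
        stream = Stream(kind, self)
        self.outputs.append(stream)
        return stream

    def connect(self, stream):
        """Attach an existing Stream as an input of this node."""
        if stream.kind not in self.accepts:
            raise TypeError(
                f"{self.name} cannot take {stream.kind}")
        self.inputs.append(stream)
        stream.sinks.append(self)


capture = Transformation("Media Capture")
source = Transformation("Media Source", accepts=("raw",))
encoder_a = Transformation("Media Encoder A", accepts=("source",))
encoder_b = Transformation("Media Encoder B", accepts=("source",))

raw = capture.emit("raw")
source.connect(raw)
src = source.emit("source")
# One Source Stream forwarded to two encoders: both inputs are the
# same stream object, so they are identical by construction.
encoder_a.connect(src)
encoder_b.connect(src)
]]></artwork>
        </figure>

        <t>In this sketch, the only way to obtain two different
        streams from one output is to pass it through a further
        transformation, and an input of an unsupported type is
        rejected.</t>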

        <t>To provide a basic understanding of the relationships in
        the chain, we first introduce the concepts for the <xref
        target="fig-sender-chain">sender side</xref>. This covers
        the chain from physical stimulus until media packets are
        emitted onto the network.</t>

        <figure align="center" anchor="fig-sender-chain"
                title="Sender Side Concepts in the Media Chain">
          <artwork><![CDATA[   Physical Stimulus
          |
          V
+--------------------+
|    Media Capture   |
+--------------------+
          |
     Raw stream
          V
+--------------------+
|    Media Source    |<- Synchronization Timing
+--------------------+
          |
    Source Stream
          V
+--------------------+
|   Media Encoder    |
+--------------------+
          |
    Encoded Stream     +-----------+
          V            |           V
+--------------------+ | +--------------------+
|  Media Packetizer  | | |  Media Redundancy  |
+--------------------+ | +--------------------+
          |            |           |
          +------------+ Redundancy Packet Stream
   Source Packet Stream            |
          V                        V
+--------------------+   +--------------------+
|  Media Transport   |   |  Media Transport   |
+--------------------+   +--------------------+
]]></artwork>
        </figure>

        <t>In <xref target="fig-sender-chain"/> we have included a
        branched chain to cover the concepts for using redundancy to
        improve the reliability of the transport. The Media Transport
        concept is an aggregate that is decomposed below in <xref
        target="media-stream-decomposition"/>.</t>

        <t>Below we review a <xref
        target="fig-receiver-chain">receiver media chain</xref>
        matching the sender side, to look at the inverse
        transformations and their attempts to recover streams that
        are possibly identical to those in the sender chain. Note
        that the streams out of a reverse transformation, like the
        Source Stream out of the Media Decoder, are in many cases not
        the same as the corresponding ones on the sender side; thus
        they are prefixed with "Received" to denote a potentially
        modified version. The reason for not being the same lies in
        the transformations, which can be irreversible. For example,
        lossy source coding in the Media Encoder prevents the Source
        Stream out of the Media Decoder from being the same as the
        one fed into the Media Encoder. Other reasons include packet
        loss or late loss in the Media Transport transformation that
        even Media Repair, if used, fails to repair. It should be
        noted that some transformations are not always present, like
        Media Repair, which cannot operate without Redundancy Packet
        Streams.</t>

        <figure align="center" anchor="fig-receiver-chain"
                title="Receiver Side Concepts of the Media Chain">
          <artwork><![CDATA[+--------------------+   +--------------------+
|  Media Transport   |   |  Media Transport   |
+--------------------+   +--------------------+
          |                        |
Received Packet Stream   Received Redundancy PS
          |                        |
          |    +-------------------+
          V    V
+--------------------+
|    Media Repair    |
+--------------------+
          |
Repaired Packet Stream
          V
+--------------------+
| Media Depacketizer |
+--------------------+
          |
Received Encoded Stream
          V
+--------------------+
|   Media Decoder    |
+--------------------+
          |
Received Source Stream
          V
+--------------------+
|     Media Sink     |--> Synchronization Information
+--------------------+
          |
Received Raw Stream
          V
+--------------------+
|   Media Renderer   |
+--------------------+
          |
          V
  Physical Stimulus
]]></artwork>
        </figure>

        <section anchor="physical-stimulus" title="Physical Stimulus">
          <t>The physical stimulus is a physical event that can be
          captured and provided as media to a receiver. This includes
          sound waves making up audio, photons in a light field that
          is visible, or other excitations or interactions with
          sensors, like keystrokes on a keyboard.</t>
        </section>

        <section anchor="media-capture" title="Media Capture">
          <t>Media Capture is the process of transforming the <xref
          target="physical-stimulus">Physical Stimulus</xref> into
          captured media. The Media Capture performs a digital
          sampling of the physical stimulus, usually periodically, and
          outputs this in some representation as a <xref
          target="raw-stream">Raw Stream</xref>. Because it is sampled
          periodically, or at least consists of timed asynchronous
          events, this data forms a stream of media data. The Media
          Capture is normally instantiated in some type of device,
          i.e., a media capture device. Examples of different types of
          media capture devices are digital cameras, microphones
          connected to A/D converters, and keyboards.</t>

          <section title="Alternate Usages">
            <t>The CLUE WG uses the term "Capture Device" to identify
            a physical capture device.</t>

            <t>WebRTC WG uses the term "Recording Device" to refer to
            the locally available capture devices in an
            end-system.</t>
          </section>

          <section title="Characteristics">
            <t><list style="symbols">
                <t>A Media Capture is identified either by
                hardware/manufacturer ID or via a session-scoped
                device identifier as mandated by the application
                usage.</t>

                <t>A Media Capture can generate an <xref
                target="encoded-stream">Encoded Stream</xref> if the
                capture device supports such a configuration.</t>
              </list></t>
          </section>
        </section>

        <section anchor="raw-stream" title="Raw Stream">
          <t>The time progressing stream of digitally sampled
          information, usually periodically sampled, provided by a
          <xref target="media-capture">Media Capture</xref>.</t>
        </section>

        <section anchor="media-source" title="Media Source">
          <t>A Media Source is the logical source of a reference clock
          synchronized, time progressing, digital media stream, called
          a <xref target="source-stream">Source Stream</xref>. This
          transformation takes one or more <xref
          target="raw-stream">Raw Streams</xref> and provides a Source
          Stream as output. This output has been synchronized with
          some reference clock, even if just a system local wall
          clock.</t>

          <t>The output can be of different types. One type is
          directly associated with a particular Media Capture's Raw
          Stream. Others are more conceptual sources, typically based
          on other Media Sources, such as an <xref
          target="fig-media-source-mixer">audio mix of multiple Raw
          Streams</xref>, a mixed selection of the three loudest
          inputs regarding speech activity, or a selection of a
          particular video based on the current speaker.</t>

          <figure align="center" anchor="fig-media-source-mixer"
                  title="Conceptual Media Source in form of Audio Mixer">
            <artwork><![CDATA[   Raw       Raw       Raw
  Stream    Stream    Stream
    |         |         |
    V         V         V
+--------------------------+
|        Media Source      |<-- Reference Clock
|           Mixer          |
+--------------------------+
              |
              V
        Source Stream]]></artwork>
          </figure>


          <section title="Alternate Usages">
            <t>The CLUE WG uses the term "Media Capture" for this
            purpose. A CLUE Media Capture is identified via indexed
            notation. The terms Audio Capture and Video Capture are
            used to identify Audio Sources and Video Sources
            respectively. Concepts such as "Capture Scene", "Capture
            Scene Entry" and "Capture" provide a flexible framework to
            represent media captured spanning spatial regions.</t>

            <t>The WebRTC WG defines the term "RtcMediaStreamTrack" to
            refer to a Media Source. An "RtcMediaStreamTrack" is
            identified by the ID attribute.</t>

            <!--MW: I think the below SDP is a bit misplaced. Do we need a special section to discuss
    relation to SDP terminology. Or should this be focused and other interpretations
    be added?-->

            <t>Typically a Media Source is mapped to a single m=line
            via the Session Description Protocol (SDP) <xref
            target="RFC4566"/> unless mechanisms such as
            Source-Specific attributes are in place <xref
            target="RFC5576"/>. In the latter case, an m=line can
            represent either multiple Media Sources, multiple <xref
            target="packet-stream">Packet Streams</xref>, or both.</t>
          </section>

          <section title="Characteristics">
            <t><list style="symbols">
                <t>At any point, it can represent a physically captured
                source or a conceptual source.</t>

                <!--MW: Put back a discussion of relation between Media Capture and Media sources?-->
              </list></t>
          </section>
        </section>

        <section anchor="source-stream" title="Source Stream">
          <t>A time progressing stream of digital samples that has
          been synchronized with a reference clock and comes from a
          particular <xref target="media-source">Media
          Source</xref>.</t>
        </section>

        <section anchor="media-encoder" title="Media Encoder">
          <t>A Media Encoder is a transform that is responsible for
          encoding the media data from a <xref
          target="source-stream">Source Stream</xref> into another
          representation, usually more compact, that is output as an
          <xref target="encoded-stream">Encoded Stream</xref>.</t>

          <t>The Media Encoder step commonly includes pre-encoding
          transformations, such as scaling and resampling. The Media
          Encoder can have a significant number of configuration
          options that affect the properties of the Encoded Stream.
          These include properties such as bit-rate, start points for
          decoding, resolution, bandwidth, or other fidelity-affecting
          properties. The codec actually used, and not only its
          parameters, is also an important factor in many
          communication systems.</t>

          <t>Scalable Media Encoders deserve special mention, as they
          produce multiple outputs that are potentially of different
          types. A scalable Media Encoder takes one input Source
          Stream and encodes it into multiple output streams of two
          different types: at least one Encoded Stream that is
          independently decodable and one or more <xref
          target="dependent-stream">Dependent Streams</xref>, each of
          which requires at least one Encoded Stream and zero or more
          other Dependent Streams in order to be decoded. A Dependent
          Stream's dependency is one of the grouping relations this
          document discusses further in <xref target="svc"/>.</t>

          <figure align="center" anchor="fig-scalable-media-encoder"
                  title="Scalable Media Encoder Input and Outputs">
            <artwork><![CDATA[       Source Stream
             |
             V
+--------------------------+
|  Scalable Media Encoder  |
+--------------------------+
   |         |   ...    |
   V         V          V
Encoded  Dependent  Dependent
Stream    Stream     Stream
]]></artwork>
          </figure>


          <section title="Alternate Usages">
            <t>Within the SDP usage, an SDP media description (m=line)
            describes part of the necessary configuration required for
            encoding purposes.</t>

            <t>CLUE's "Capture Encoding" provides specific encoding
            configuration for this purpose.</t>
          </section>

          <section title="Characteristics">
            <t><list style="symbols">
                <t>A Media Source can be multiply encoded by different
                Media Encoders to provide various encoded
                representations.</t>
              </list></t>
          </section>
        </section>

        <section anchor="encoded-stream" title="Encoded Stream">
          <t>A stream of time synchronized encoded media that can be
          independently decoded.</t>

          <section title="Characteristics">
            <t><list style="symbols">
                <t>Due to temporal dependencies, an Encoded Stream may
                have limitations in where decoding can be started.
                These entry points, for example Intra frames from a
                video encoder, may require identification and their
                generation may be event based or configured to occur
                periodically.</t>
              </list></t>
          </section>
        </section>

        <section anchor="dependent-stream" title="Dependent Stream">
          <t>A stream of time synchronized encoded media fragments
          that depend on one or more <xref
          target="encoded-stream">Encoded Streams</xref> and zero or
          more other Dependent Streams in order to be decoded.</t>
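
          <t>The dependency relation can be illustrated by a short
          check of whether a given stream is decodable from the set of
          streams a receiver actually has. This is a minimal sketch
          under stated assumptions: the function name, the stream
          labels ("base", "enh1", "enh2"), and the dictionary layout
          are hypothetical and chosen only for this example.</t>

          <figure>
            <artwork><![CDATA[def decodable(stream, deps, received):
    """Return True if 'stream' and every stream it depends on,
    directly or transitively, is present in 'received'."""
    pending, seen = [stream], set()
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        if current not in received:
            return False
        pending.extend(deps.get(current, ()))
    return True


# Hypothetical layering: one Encoded Stream ("base") and two
# Dependent Streams that build on it.
deps = {"base": (), "enh1": ("base",), "enh2": ("base", "enh1")}

print(decodable("enh2", deps, {"base", "enh1", "enh2"}))  # True
print(decodable("enh2", deps, {"base", "enh2"}))          # False
]]></artwork>
          </figure>

          <t>The second call fails because "enh2" depends on "enh1",
          which the receiver in this example does not have.</t>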

          <section title="Characteristics">
            <t><list style="symbols">
                <t>Each Dependent Stream has a set of dependencies.
                These dependencies must be understood by the parties
                in a multi-media session that intend to use a
                Dependent Stream.</t>
              </list></t>
          </section>
        </section>

        <section anchor="media_packetizer" title="Media Packetizer">
          <t>The transformation of taking one or more <xref
          target="encoded-stream">Encoded</xref> or <xref
          target="dependent-stream">Dependent Streams</xref> and
          putting their content into one or more sequences of packets,
          normally RTP packets, that are output as <xref
          target="packet-stream">Source Packet Streams</xref>. This
          step includes generating both RTP payloads and RTP
          packets.</t>

          <t>The Media Packetizer can use multiple inputs when
          producing a single Packet Stream. One such example is the
          packetization when using Scalable Video Coding (SVC): in
          the Single Stream Transport (SST) usage of the payload
          format, an Encoded Stream and its Dependent Streams are
          packetized in a single Source Packet Stream using a single
          SSRC.</t>

          <t>The Media Packetizer can also produce multiple Packet
          Streams, for example when Encoded and/or Dependent Streams
          are distributed over multiple Packet Streams, possibly in
          different RTP sessions.</t>

          <section title="Alternate Usages">
            <t>An RTP sender is part of the Media Packetizer.</t>
          </section>

          <section title="Characteristics">
            <t><list style="symbols">
                <t>The Media Packetizer selects which Synchronization
                source(s) (SSRC) <xref target="RFC3550"/> are used in
                which RTP sessions.</t>

                <t>The Media Packetizer can combine multiple Encoded or
                Dependent Streams into one or more Packet Streams.</t>
              </list></t>
          </section>
        </section>

        <section anchor="packet-stream" title="Packet Stream">
          <t>A stream of RTP packets containing media data, source or
          redundant. The Packet Stream is identified by an SSRC
          belonging to a particular RTP session. The RTP session is
          identified as discussed in <xref target="rtp-session"/>.</t>

          <t>A Source Packet Stream is a packet stream containing at
          least some content from an Encoded Stream. Source material
          is any media material that is produced for transport over
          RTP without any additional redundancy applied to cope with
          network transport losses. Compare this with the <xref
          target="redundancy-packet-stream">Redundancy Packet
          Stream</xref>.</t>
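
          <t>To illustrate how a Packet Stream is identified, the
          following sketch reads the SSRC from the 12-octet fixed RTP
          header defined in <xref target="RFC3550"/> and groups
          packets by the pair (RTP session, SSRC). It is a minimal
          sketch only: the session labels, the function names, and the
          make_packet helper are hypothetical, and header extensions,
          CSRC lists, and padding are ignored.</t>

          <figure>
            <artwork><![CDATA[import struct
from collections import defaultdict


def parse_rtp_header(packet):
    """Parse the 12-octet fixed RTP header."""
    if len(packet) < 12:
        raise ValueError("shorter than the fixed RTP header")
    fields = struct.unpack("!BBHII", packet[:12])
    b0, b1, seq, timestamp, ssrc = fields
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    return {"payload_type": b1 & 0x7F, "seq": seq,
            "timestamp": timestamp, "ssrc": ssrc}


def group_by_packet_stream(packets):
    """Group (session, packet) pairs by (session, SSRC) and
    collect the sequence numbers seen for each Packet Stream."""
    streams = defaultdict(list)
    for session, packet in packets:
        header = parse_rtp_header(packet)
        streams[(session, header["ssrc"])].append(header["seq"])
    return dict(streams)


def make_packet(seq, ssrc, payload_type=96):
    """Build a minimal RTP packet with an empty payload."""
    return struct.pack("!BBHII", 0x80, payload_type, seq, 0, ssrc)


packets = [("audio", make_packet(1, 0x1111)),
           ("audio", make_packet(2, 0x1111)),
           ("video", make_packet(1, 0x1111))]
print(group_by_packet_stream(packets))
]]></artwork>
          </figure>

          <t>Because the SSRC is scoped by its RTP session, the same
          SSRC value seen in two sessions yields two separate Packet
          Streams in this sketch.</t>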

          <section title="Alternate Usages">
            <t>The term "Stream" is used by the CLUE WG to define an
            encoded Media Source sent via RTP. "Capture Encoding",
            "Encoding Groups" are defined to capture specific details
            of the encoding scheme.</t>

            <t>RFC 3550 <xref target="RFC3550"/> uses the terms media
            stream, audio stream, video stream, and streams of (RTP)
            packets interchangeably. It defines the SSRC as "The
            source of a stream of RTP packets, ..."</t>

            <t>The equivalent mapping of a Packet Stream in SDP <xref
            target="RFC4566"/> is defined per usage. For example, each
            Media Description (m=line) and its associated attributes
            can describe one Packet Stream, properties for multiple
            Packet Streams, or properties for an RTP session (via
            <xref target="RFC5576"/> mechanisms, for example).</t>
          </section>

          <section title="Characteristics">
            <t><list style="symbols">
                <t>Each Packet Stream is identified by a unique
                Synchronization source (SSRC) <xref target="RFC3550"/>
                that is carried in every RTP and RTP Control Protocol
                (RTCP) packet header in a specific RTP session
                context.</t>

                <t>At any given point in time, a Packet Stream can
                have one and only one SSRC.</t>

                <t>Each Packet Stream defines a unique RTP sequence
                numbering and timing space.</t>

                <t>Several Packet Streams may map to a single Media
                Source via the source transformations.</t>

                <t>Several Packet Streams can be carried over a single
                RTP Session.</t>
              </list></t>
          </section>
        </section>

        <section anchor="media-redundancy" title="Media Redundancy">
          <t>Media redundancy is a transformation that generates
          redundant or repair packets sent out as a Redundancy Packet
          Stream to mitigate network transport impairments, like
          packet loss and delay.</t>

          <t>Media Redundancy exists in many flavors. It may generate
          independent Redundancy Packet Streams that are used in
          addition to the Source Packet Stream (<xref
          target="RFC4588">RTP Retransmission</xref> and some <xref
          target="RFC5109">FEC</xref>), it may generate a new Source
          Packet Stream by combining redundancy information with
          source information (using <xref target="RFC5109">XOR
          FEC</xref> as a <xref target="RFC2198">redundancy
          payload</xref>), or it may completely replace the source
          information with only redundancy packets.</t>
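
          <t>The general idea behind XOR-based redundancy can be
          shown with a short sketch in which one parity payload
          protects a group of equal-length source payloads. This is a
          simplified illustration and not the packet format of <xref
          target="RFC5109"/>: the function names are hypothetical,
          and header protection, padding of unequal payloads, and
          signaling of which packets are protected are all left
          out.</t>

          <figure>
            <artwork><![CDATA[def xor_parity(payloads):
    """XOR equal-length payloads into one redundancy payload."""
    parity = bytearray(len(payloads[0]))
    for payload in payloads:
        if len(payload) != len(parity):
            raise ValueError("payloads must have equal length")
        for i, byte in enumerate(payload):
            parity[i] ^= byte
    return bytes(parity)


def recover(received, parity):
    """Rebuild the single missing payload from the others."""
    return xor_parity(list(received) + [parity])


source = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(source)
# Suppose the second payload is lost in transport.
print(recover([source[0], source[2]], parity))  # b'BBBB'
]]></artwork>
          </figure>

          <t>A single parity payload of this kind can repair at most
          one loss within the protected group.</t>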
        </section>

        <section anchor="redundancy-packet-stream"
                 title="Redundancy Packet Stream">
          <t>A <xref target="packet-stream">Packet Stream</xref> that
          contains no original source data, only redundant data that
          may be combined with one or more <xref
          target="received-packet-stream">Received Packet
          Streams</xref> to produce <xref
          target="repaired-packet-stream">Repaired Packet
          Streams</xref>.</t>
        </section>

        <section anchor="media-transport" title="Media Transport">
          <t>A Media Transport defines the transformation that the
          <xref target="packet-stream">Packet Streams</xref> are
          subjected to by the end-to-end transport from one RTP sender
          to one specific RTP receiver (an RTP session may contain
          multiple RTP receivers per sender). Each Media Transport is
          defined by a transport association that is identified by a
          5-tuple (source address, source port, destination address,
          destination port, transport protocol). Each transport
          association normally contains only a single RTP session,
          although a proposal exists for sending <xref
          target="I-D.westerlund-avtcore-transport-multiplexing">multiple
          RTP sessions over one transport association</xref>.</t>
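
          <t>The 5-tuple identification can be sketched as a simple
          lookup key. This is a minimal sketch under stated
          assumptions: the TransportAssociation type, the bind
          function, the session labels, and the addresses (taken from
          a documentation address range) are hypothetical. Rejecting a
          second RTP session on the same association reflects only the
          normal case described above; the multiplexing proposal would
          relax it.</t>

          <figure>
            <artwork><![CDATA[from typing import NamedTuple


class TransportAssociation(NamedTuple):
    """The 5-tuple that identifies one Media Transport."""
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int
    protocol: str


# Hypothetical table: transport association to RTP session label.
sessions = {}


def bind(assoc, session):
    """Record which RTP session an association carries."""
    current = sessions.setdefault(assoc, session)
    if current != session:
        raise ValueError("association already carries " + current)


# One sender reaching two receivers uses two Media Transports,
# even though both belong to the same RTP session.
to_b = TransportAssociation("192.0.2.1", 5004,
                            "192.0.2.2", 5004, "UDP")
to_c = TransportAssociation("192.0.2.1", 5004,
                            "192.0.2.3", 5004, "UDP")
bind(to_b, "session-1")
bind(to_c, "session-1")
]]></artwork>
          </figure>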

          <section title="Characteristics">
            <t><list style="symbols">
                <t>A Media Transport transmits Packet Streams of RTP
                packets from a source transport address to a
                destination transport address.</t>
              </list></t>
          </section>

          <section anchor="media-stream-decomposition"
                   title="Media Stream Decomposition">
            <t>The Media Transport concept sometimes needs to be
            decomposed into more steps to enable discussion of what a
            sender emits that gets transformed by the network before
            it is received by the receiver. We therefore also provide
            this <xref target="fig-media-transport">Media Transport
            decomposition</xref>.</t>

            <figure align="center" anchor="fig-media-transport"
                    title="Decomposition of Media Transport">
              <artwork><![CDATA[      Packet Stream
             |
             V
+--------------------------+
|  Media Transport Sender  |
+--------------------------+
             |
      Sent Packet Stream
             V
+--------------------------+
|    Network Transport     |
+--------------------------+
             |
 Transported Packet Stream
             V
+--------------------------+
| Media Transport Receiver |
+--------------------------+
             |
             V
    Received Packet Stream
]]></artwork>
            </figure>


            <section anchor="media-transport-sender"
                     title="Media Transport Sender">
              <t>The first transformation within the <xref
              target="media-transport">Media Transport</xref> is the
              Media Transport Sender, where the sending <xref
              target="end-point">End-Point</xref> takes a Packet
              Stream and emits the packets onto the network using the
              transport association established for this Media
              Transport, thus creating a <xref
              target="sent-packet-stream">Sent Packet Stream</xref>.
              In this process, it transforms the Packet Stream in
              several ways. First, it adds the necessary protocol
              headers for the transport association, for example IP
              and UDP headers, thus forming IP/UDP/RTP packets. In
              addition, the Media Transport Sender may queue, pace, or
              otherwise affect how the packets are emitted onto the
              network, thus adding the delay, jitter, and inter-packet
              spacings that characterize the Sent Packet Stream.</t>
            </section>

            <section anchor="sent-packet-stream"
                     title="Sent Packet Stream">
              <t>The Sent Packet Stream is the Packet Stream as it
              enters the first hop of the network path to its
              destination. The Sent Packet Stream is identified using
              network transport addresses; for IP/UDP, for example,
              this is the 5-tuple (source IP address, source port,
              destination IP address, destination port, and protocol
              (UDP)).</t>
            </section>

            <section anchor="network-transport"
                     title="Network Transport">
              <t>Network Transport is the transformation that the
              <xref target="sent-packet-stream">Sent Packet
              Stream</xref> is subjected to by traveling from the
              source to the destination through the network. These
              transformations include loss of some packets, varying
              delay on a per-packet basis, packet duplication, and
              packet header or data corruption. These transformations
              produce a <xref
              target="transported-packet-stream">Transported Packet
              Stream</xref> at the exit of the network path.</t>
            </section>

            <section anchor="transported-packet-stream"
                     title="Transported Packet Stream">
              <t>The Packet Stream that is emitted out of the network
              path at the destination, subjected to the <xref
              target="network-transport">Network Transport's
              transformation</xref>.</t>
            </section>

            <section title="Media Transport Receiver">
              <t>The receiver <xref
              target="end-point">End-Point's</xref> transformation of
              the <xref target="transported-packet-stream">Transported
              Packet Stream</xref> by its reception process, which
              results in the <xref
              target="received-packet-stream">Received Packet
              Stream</xref>. This transformation includes verifying
              transport checksums and discarding any corrupted packet
              whose checksum does not match. Other transformations
              can include delay variations in receiving a packet on
              the network interface and providing it to the
              application.</t>
            </section>
          </section>
        </section>

        <section anchor="received-packet-stream"
                 title="Received Packet Stream">
          <t>The <xref target="packet-stream">Packet Stream</xref>
          resulting from the Media Transport's transformation, i.e.
          subjected to packet loss, packet corruption, packet
          duplication and varying transmission delay from sender to
          receiver.</t>
        </section>

        <section anchor="received-redundancy-ps"
                 title="Received Redundancy Packet Stream">
          <t>The <xref target="redundancy-packet-stream">Redundancy
          Packet Stream</xref> resulting from the Media Transport's
          transformation, i.e. subjected to packet loss, packet
          corruption, and varying transmission delay from sender to
          receiver.</t>
        </section>

        <section title="Media Repair">
          <t>A transformation that takes as input one or more <xref
          target="packet-stream">Source Packet Streams</xref> as well
          as <xref target="redundancy-packet-stream">Redundancy Packet
          Streams</xref> and attempts to combine them to counter the
          transformations introduced by the <xref
          target="media-transport">Media Transport</xref>, in order to
          minimize the difference between the <xref
          target="source-stream">Source Stream</xref> and the <xref
          target="received-source-stream">Received Source
          Stream</xref> after the <xref target="media-decoder">Media
          Decoder</xref>. The output is a <xref
          target="repaired-packet-stream">Repaired Packet
          Stream</xref>.</t>
        </section>

        <section anchor="repaired-packet-stream"
                 title="Repaired Packet Stream">
          <t>A <xref target="received-packet-stream">Received Packet
          Stream</xref> for which <xref
          target="received-redundancy-ps">Received Redundancy Packet
          Stream</xref> information has been used to try to re-create
          the <xref target="packet-stream">Packet Stream</xref> as it
          was before <xref target="media-transport">Media
          Transport</xref>.</t>
        </section>

        <section title="Media Depacketizer">
          <t>A Media Depacketizer takes one or more <xref
          target="packet-stream">Packet Streams</xref>, depacketizes
          them, and attempts to reconstitute the <xref
          target="encoded-stream">Encoded Streams</xref> or <xref
          target="dependent-stream">Dependent Streams</xref> present
          in those Packet Streams.</t>
        </section>

        <section anchor="received-encoded-stream"
                 title="Received Encoded Stream">
          <t>The received version of an <xref
          target="encoded-stream">Encoded Stream</xref>.</t>
        </section>

        <section anchor="media-decoder" title="Media Decoder">
          <t>A Media Decoder is a transformation that is responsible
          for decoding <xref target="encoded-stream">Encoded
          Streams</xref> and any <xref
          target="dependent-stream">Dependent Streams</xref> into a
          <xref target="source-stream">Source Stream</xref>.</t>

          <section title="Alternate Usages">
            <t>Within the context of SDP, an m=line describes the
            necessary configuration and identification (RTP Payload
            Types) required to decode one or more incoming Media
            Streams.</t>
          </section>

          <section title="Characteristics">
            <t><list style="symbols">
                <t>A Media Decoder is the entity that will have to
                deal with any errors in the Encoded Streams that
                result from corruption or from failures to repair
                packet losses. This is because a Media Decoder is
                generally forced to produce some output periodically.
                It thus commonly includes concealment methods.</t>
              </list></t>
          </section>
        </section>

        <section anchor="received-source-stream"
                 title="Received Source Stream">
          <t>The received version of a <xref
          target="source-stream">Source Stream</xref>.</t>
        </section>

        <section anchor="media-sink" title="Media Sink">
          <t>The Media Sink receives a <xref
          target="source-stream">Source Stream</xref> that contains,
          usually periodically, sampled media data together with
          associated synchronization information. Depending on the
          application, this Source Stream then needs to be transformed
          into a <xref target="raw-stream">Raw Stream</xref> that is
          sent in synchronization with the output from other Media
          Sinks to a <xref target="media-render">Media Render</xref>.
          The media sink may also be connected with a <xref
          target="media-source">Media Source</xref> and be used as
          part of a conceptual Media Source.</t>

          <section title="Characteristics">
            <t><list style="symbols">
                <t>The media sink can further transform the source
                stream into a representation that is suitable for
                rendering on the Media Render as defined by the
                application or system-wide configuration. This
                includes sample scaling, level adjustments, etc.</t>
              </list></t>
          </section>
        </section>

        <section title="Received Raw Stream">
          <t>The received version of a <xref target="raw-stream">Raw
          Stream</xref>.</t>
        </section>

        <section anchor="media-render" title="Media Render">
          <t>A Media Render takes a <xref target="raw-stream">Raw
          Stream</xref> and converts it into <xref
          target="physical-stimulus">Physical Stimulus</xref> that a
          human user can perceive. Examples of such devices are
          screens, and D/A converters connected to amplifiers and
          loudspeakers.</t>

          <section title="Characteristics">
            <t><list style="symbols">
                <t>An End Point can potentially have multiple Media
                Renders for each media type.</t>
              </list></t>
          </section>
        </section>
      </section>

      <section anchor="communication-entities"
               title="Communication Entities">
        <t>This section contains concepts for the entities involved in
        the communication.</t>

        <section anchor="end-point" title="End Point">
          <t>A single addressable entity sending or receiving RTP
          packets. It may be decomposed into several functional
          blocks, but as long as it behaves as a single RTP stack
          entity it is classified as a single "End Point".</t>

          <section title="Alternate Usages">
            <t>The CLUE Working Group (WG) uses the terms "Media
            Provider" and "Media Consumer" to describe the aspects of
            an End Point pertaining to sending and receiving
            functionality, respectively.</t>
          </section>

          <section title="Characteristics">
            <t>End Points can be identified in several different ways.
            While RTCP Canonical Names (CNAMEs) <xref
            target="RFC3550"/> provide a globally unique and stable
            identification mechanism for the duration of the
            Communication Session (see <xref target="comm-session"/>),
            their validity applies exclusively within a <xref
            target="syncontext">Synchronization Context</xref>. Thus
            one End Point can have multiple CNAMEs. Therefore,
            mechanisms outside the scope of RTP, such as application
            defined mechanisms, must be used to ensure End Point
            identification when outside this Synchronization
            Context.</t>
          </section>
        </section>

        <section anchor="rtp-session" title="RTP Session">
          <t>An RTP session is an association among a group of
          participants communicating with RTP. It is a group
          communications channel which can potentially carry a number
          of Packet Streams. Within an RTP session, every participant
          can find meta-data and control information (over RTCP) about
          all the Packet Streams in the RTP session. The bandwidth of
          the RTCP control channel is shared between all participants
          within an RTP Session.</t>

          <section title="Alternate Usages">
            <t>Within the context of SDP, a single m=line can map to a
            single RTP Session, or multiple m=lines can map to a
            single RTP Session. The latter is enabled by multiplexing
            schemes such as BUNDLE <xref
            target="I-D.ietf-mmusic-sdp-bundle-negotiation"/>.</t>
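
            <t>As a non-normative illustration of multiple m=lines
            mapping to one RTP Session, the following minimal SDP
            sketch groups an audio and a video m=line with BUNDLE. The
            addresses, ports, payload types and "mid" values are
            assumptions chosen for this example only. The shared port
            merely illustrates a negotiated bundle; the offer/answer
            rules are specified in <xref
            target="I-D.ietf-mmusic-sdp-bundle-negotiation"/>.</t>

            <figure>
              <artwork><![CDATA[
v=0
o=- 0 0 IN IP4 192.0.2.1
s=-
c=IN IP4 192.0.2.1
t=0 0
a=group:BUNDLE audio video
m=audio 10000 RTP/AVP 0
a=mid:audio
m=video 10000 RTP/AVP 31
a=mid:video
]]></artwork>
            </figure>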
          </section>

          <section title="Characteristics">
            <t><list style="symbols">
                <t>Typically, an RTP Session can carry one or more
                Packet Streams.</t>

                <t>An RTP Session shares a single SSRC space as
                defined in RFC3550 <xref target="RFC3550"/>. That is,
                the End Points participating in an RTP Session can see
                an SSRC identifier transmitted by any of the other End
                Points. An End Point can receive an SSRC either as
                SSRC or as a Contributing source (CSRC) in RTP and
                RTCP packets, as defined by the endpoints' network
                interconnection topology.</t>

                <t>An RTP Session uses at least two <xref
                target="media-transport">Media Transports</xref>, one
                for sending and one for receiving. Commonly, the
                receiving Media Transport is the reverse direction of
                the one used for sending. An RTP Session may use many
                Media Transports, and these define the session's
                network interconnection topology. A single Media
                Transport normally cannot carry more than one RTP
                Session, unless a solution for multiplexing multiple
                RTP Sessions over a single Media Transport is used.
                One example of such a scheme is <xref
                target="I-D.westerlund-avtcore-transport-multiplexing">Multiple
                RTP Sessions on a Single Lower-Layer
                Transport</xref>.</t>

                <t>Multiple RTP Sessions can be related.</t>
              </list></t>
          </section>
        </section>

        <section anchor="participant" title="Participant">
          <t>A participant is an entity reachable by a single
          signaling address, and is thus related more to the signaling
          context than to the media context.</t>

          <section title="Characteristics">
            <t><list style="symbols">
                <t>A single signaling-addressable entity, using an
                application-specific signaling address space, for
                example a SIP URI.</t>

                <t>A participant can have several <xref
                target="multimedia-session">Multimedia
                Sessions</xref>.</t>

                <t>A participant can have several associated transport
                flows, including several separate local transport
                addresses for those transport flows.</t>

                <!--MW: I can't understand what the purpose is of the last bullet regarding many
transport flows. It needs to be aligned with the rest of the concept language.
But I am unable to change it because I don't understand what one attempts
to say.
BoB: Speculatively, it is just trying to prohibit definig a Participant as
being one end of a single Media Transport. This bullet is then not needed,
as a single Multimedia Session can already have multiple Media Transports.
-->
              </list></t>
          </section>
        </section>

        <section anchor="multimedia-session"
                 title="Multimedia Session">
          <t>A multimedia session is an association among a group of
          participants engaged in the communication via one or more
          <xref target="rtp-session">RTP Sessions</xref>. It defines
          logical relationships among <xref
          target="media-source">Media Sources</xref> that appear in
          multiple RTP Sessions.</t>

          <section title="Alternate Usages">
            <t>RFC4566 <xref target="RFC4566"/> defines a multimedia
            session as a set of multimedia senders and receivers and
            the data streams flowing from senders to receivers.</t>

            <t>RFC3550 <xref target="RFC3550"/> defines it as a set of
            concurrent RTP sessions among a common group of
            participants. For example, a video conference (which is a
            multimedia session) may contain an audio RTP session and a
            video RTP session.</t>
          </section>

          <section title="Characteristics">
            <t><list style="symbols">
                <t>A Multimedia Session can be composed of several
                parallel RTP Sessions with potentially multiple Packet
                Streams per RTP Session.</t>

                <t>Each participant in a Multimedia Session can have a
                multitude of Media Captures and Media Rendering
                devices.</t>
              </list></t>
          </section>
        </section>

        <section anchor="comm-session" title="Communication Session">
          <t>A Communication Session is an association among a group
          of participants communicating with each other via a set of
          Multimedia Sessions.</t>

          <section title="Alternate Usages">
            <t>The <xref target="RFC4566">Session Description Protocol
            (SDP)</xref> defines a multimedia session as a set of
            multimedia senders and receivers and the data streams
            flowing from senders to receivers. In that definition it
            is however not clear if a multimedia session includes both
            the sender's and the receiver's view of the same RTP
            Packet Stream.</t>
          </section>

          <section title="Characteristics">
            <t><list style="symbols">
                <t>Each participant in a Communication Session is
                identified via an application-specific signaling
                address.</t>

                <t>A Communication Session is composed of at least one
                Multimedia Session per participant, involving one or
                more parallel RTP Sessions with potentially multiple
                Packet Streams per RTP Session.</t>
              </list> For example, in a full mesh communication, the
            Communication Session consists of a set of separate
            Multimedia Sessions between each pair of Participants.
            Another example is a centralized conference, where the
            Communication Session consists of a set of Multimedia
            Sessions between each Participant and the conference
            handler.</t>
          </section>
        </section>
      </section>
    </section>

    <section title="Relations at Different Levels">
      <t>This section uses the concepts from the previous section and
      looks at different types of relationships among them. These
      relationships occur at different levels and for different
      purposes. The section is organized by the level at which a
      relation is required. The reason for the relationship may exist
      at another step in the media handling chain. For example, using
      Simulcast (discussed in <xref target="simulcast"/>) requires
      determining relations at the Packet Stream level; however, the
      reason to relate Packet Streams is that multiple Media Encoders
      use the same Media Source, i.e. to be able to identify a common
      Media Source.</t>

      <section title="Media Source Relations">
        <t><xref target="media-source">Media Sources</xref> are
        commonly grouped and related to an <xref
        target="end-point">End Point</xref> or a <xref
        target="participant">Participant</xref>. This occurs for
        several reasons, covering both application logic and media
        handling purposes. These cases are further discussed
        below.</t>

        <section anchor="syncontext" title="Synchronization Context">
          <t>A Synchronization Context defines a requirement on a
          strong timing relationship between the Media Sources,
          typically requiring alignment of clock sources. Such a
          relationship can be identified in multiple ways, as listed
          below. A single Media Source can only belong to a single
          Synchronization Context, since it is assumed that a single
          Media Source can only have a single media clock and
          requiring alignment to several Synchronization Contexts (and
          thus reference clocks) will effectively merge those into a
          single Synchronization Context.</t>

          <!--MW: The following paragraph may be quite misplaced. Should be reconsidered when improving
text for the relations between RTP Sessions, Multimedia Sessions and Communication
Sessions.-->

          <t>A single Multimedia Session can contain media from one or
          more Synchronization Contexts. An example of that is a
          Multimedia Session containing one set of audio and video for
          communication purposes belonging to one Synchronization
          Context, and another set of audio and video for presentation
          purposes (like playing a video file) with a separate
          Synchronization Context that has no strong timing
          relationship and need not be strictly synchronized with the
          audio and video used for communication.</t>

          <section title="RTCP CNAME">
            <t>RFC3550 <xref target="RFC3550"/> describes Inter-media
            synchronization between RTP Sessions based on RTCP CNAME,
            RTP and Network Time Protocol (NTP) <xref
            target="RFC5905"/> formatted timestamps of a reference
            clock. As indicated in <xref
            target="I-D.ietf-avtcore-clksrc"/>, despite using NTP
            format timestamps, it is not required that the clock be
            synchronized to an NTP source.</t>
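
            <t>As an informal sketch of how this mapping is commonly
            applied (the symbol names are assumptions introduced here
            for illustration only): an RTCP Sender Report pairs an NTP
            format timestamp, NTP_sr, with the RTP timestamp, RTP_sr,
            that corresponds to the same instant for that Packet
            Stream. A receiver can then place any RTP timestamp,
            RTP_ts, of that Packet Stream on the reference clock
            timeline as shown below, where clock_rate is the RTP
            timestamp rate in Hz and the timestamp difference is
            computed modulo 2^32 and interpreted as a signed value.
            Packet Streams whose source descriptions carry the same
            CNAME share the reference clock and can therefore be
            aligned on this common timeline.</t>

            <figure>
              <artwork><![CDATA[
T(RTP_ts) = NTP_sr + (RTP_ts - RTP_sr) / clock_rate
]]></artwork>
            </figure>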
          </section>

          <section title="Clock Source Signaling">
            <t><xref target="I-D.ietf-avtcore-clksrc"/> provides a
            mechanism to signal the clock source in SDP both for the
            reference clock as well as the media clock, thus allowing
            a Synchronization Context to be defined beyond the one
            defined by the usage of CNAME source descriptions.</t>
          </section>

          <section title="CLUE Scenes">
            <t>In CLUE, "Capture Scene", "Capture Scene Entry" and
            "Captures" define an implied Synchronization Context.</t>
          </section>

          <section title="Implicitly via RtcMediaStream">
            <t>The WebRTC WG defines "RtcMediaStream" with one or more
            "RtcMediaStreamTracks". All tracks in an "RtcMediaStream"
            are intended to be synchronizable when rendered.</t>
          </section>

          <section title="Explicitly via SDP Mechanisms">
            <t>RFC5888 <xref target="RFC5888"/> defines an m=line
            grouping mechanism called "Lip Synchronization (LS)" for
            establishing the synchronization requirement across
            m=lines when they map to individual sources.</t>

            <t>RFC5576 <xref target="RFC5576"/> extends the above
            mechanism when multiple media sources are described by a
            single m=line.</t>
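
            <t>As a non-normative sketch of the m=line-level "LS"
            grouping from RFC5888, the SDP below states that one audio
            and one video m=line are to be played out in
            synchronization. The addresses, ports, payload types and
            "mid" values are assumptions chosen for this example
            only.</t>

            <figure>
              <artwork><![CDATA[
v=0
o=- 0 0 IN IP4 192.0.2.1
s=-
c=IN IP4 192.0.2.1
t=0 0
a=group:LS 1 2
m=audio 30000 RTP/AVP 0
a=mid:1
m=video 30002 RTP/AVP 31
a=mid:2
]]></artwork>
            </figure>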
          </section>
        </section>

        <section title="End Point">
          <t>Some applications require knowledge of which Media
          Sources originate from a particular <xref
          target="end-point">End Point</xref>. Such knowledge can
          inform decisions such as packet routing between parts of the
          topology, which depends on knowing the End Point origin of
          the Packet Streams.</t>

          <t>In RTP, this identification has been overloaded with the
          Synchronization Context through the usage of the source
          description CNAME item. This works for some usages, but
          sometimes it breaks down. For example, an End Point may have
          two sets of Media Sources with different Synchronization
          Contexts, such as the audio and video of the human
          participant and a set of Media Sources of audio and video
          for a shared movie. Thus, an End Point may have multiple
          CNAMEs. The CNAMEs or the Media Sources themselves can be
          related to the End Point.</t>
        </section>

        <section title="Participant">
          <t>In communication scenarios, it is commonly necessary to
          know which Media Sources originate from which <xref
          target="participant">Participant</xref>. This enables the
          application to, for example, display Participant identity
          information correctly associated with the Media Sources.
          This association is currently handled through the signaling
          solution pointing at a specific Multimedia Session, where
          the Media Sources may be explicitly or implicitly tied to a
          particular End Point.</t>

          <t>Participant information becomes more problematic for
          Media Sources that are generated through mixing or other
          conceptual processing of Raw Streams or Source Streams that
          originate from different Participants. This type of Media
          Source can thus have a dynamically varying set of origins
          and Participants. RTP contains the concept of Contributing
          Sources (CSRC), which carry such information about the
          previous-step origin of the included media content at the
          RTP level.</t>
        </section>

        <section title="WebRTC MediaStream">
          <t>An RtcMediaStream, in addition to requiring a single
          Synchronization Context as discussed above, is also an
          explicit grouping of a set of Media Sources, as identified
          by RtcMediaStreamTracks, within the RtcMediaStream.</t>
        </section>
      </section>

      <section title="Packetization Time Relations">
        <t>At RTP Packetization time, there exists a possibility for a
        number of different types of relationships between <xref
        target="encoded-stream">Encoded Streams</xref>, <xref
        target="dependent-stream">Dependent Streams</xref> and <xref
        target="packet-stream">Packet Streams</xref>. These are caused
        by grouping together or distributing these different types of
        streams into Packet Streams. This section will look at such
        relationships.</t>

        <section title="Single Stream Transport of SVC">
          <t><xref target="RFC6190">Scalable Video Coding</xref> has a
          mode of operation where Encoded Streams and Dependent
          Streams from the SVC Media Encoder are grouped together in a
          single Source Packet Stream using the SVC RTP Payload
          format.</t>
        </section>

        <section title="Multi-Channel Audio">
          <t>There exist a number of RTP payload formats that can
          carry multi-channel audio, despite the codec being a mono
          encoder. Multi-channel audio can be viewed as multiple Media
          Sources sharing a common Synchronization Context. These are
          then independently encoded by a Media Encoder and the
          different Encoded Streams are then packetized together in a
          time-synchronized way into a single Source Packet Stream
          using the codec's RTP Payload format. Examples of such
          codecs are <xref target="RFC3551">PCMA and PCMU</xref>,
          <xref target="RFC4867">AMR</xref>, and <xref
          target="RFC5404">G.719</xref>.</t>
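
          <t>As a non-normative illustration, the number of channels
          carried in such a Packet Stream is typically signaled as the
          final parameter of the SDP "a=rtpmap" attribute. In the
          sketch below, the port and the dynamic payload type 96 are
          assumptions chosen for this example only; the trailing "/2"
          indicates two PCMU channels packetized together.</t>

          <figure>
            <artwork><![CDATA[
m=audio 49170 RTP/AVP 96
a=rtpmap:96 PCMU/8000/2
]]></artwork>
          </figure>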
        </section>

        <section title="Redundancy Format">
          <t>The <xref target="RFC2198">RTP Payload for Redundant
          Audio Data</xref> defines how one can transport redundant
          audio data together with primary data in the same RTP
          payload. The redundant data can be a time-delayed version of
          the primary or another time-delayed Encoded Stream using a
          different Media Encoder to encode the same Media Source as
          the primary, as depicted below in <xref
          target="fig-red-rfc2198"/>.</t>

          <figure align="center" anchor="fig-red-rfc2198"
                  title="Concept for usage of Audio Redundancy with different Media Encoders">
            <artwork><![CDATA[+--------------------+
|    Media Source    |
+--------------------+
          |
     Source Stream
          |
          +------------------------+
          |                        |
          V                        V
+--------------------+   +--------------------+
|   Media Encoder    |   |   Media Encoder    |
+--------------------+   +--------------------+
          |                        |
          |                 +------------+
    Encoded Stream          | Time Delay |
          |                 +------------+
          |                        |
          |     +------------------+
          V     V
+--------------------+
|  Media Packetizer  |
+--------------------+
          |
          V
   Packet Stream ]]></artwork>
          </figure>

          <t>The Redundancy format is thus providing the necessary
          meta information to correctly relate different parts of the
          same Encoded Stream, or in the case <xref
          target="fig-red-rfc2198">depicted above</xref> relate the
          Received Source Stream fragments coming out of different
          Media Decoders to be able to combine them together into a
          less erroneous Source Stream.</t>
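
          <t>As a non-normative illustration, a redundant audio
          payload of this kind can be signaled in SDP along the
          following lines. The port and the dynamic payload type 121
          are example values: payload type 121 is bound to the "red"
          format, and the "a=fmtp" line lists PCMU (payload type 0) as
          the primary encoding and DVI4 (payload type 5) as the
          redundant encoding.</t>

          <figure>
            <artwork><![CDATA[
m=audio 12345 RTP/AVP 121 0 5
a=rtpmap:121 red/8000/1
a=fmtp:121 0/5
]]></artwork>
          </figure>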
        </section>
      </section>

      <section title="Packet Stream Relations">
        <t>This section discusses various cases of relationships among
        Packet Streams. This is a common relation to handle in RTP
        because Packet Streams are separate and each has its own SSRC,
        implying independent sequence number and timestamp spaces.
        The underlying reasons for the Packet Stream relationships are
        different, as can be seen in the cases below. The different
        Packet Streams can be handled within the same RTP Session or
        different RTP Sessions to accomplish different transport
        goals. This separation of Packet Streams is further discussed
        in <xref target="packet-stream-separation"/>.</t>

        <section anchor="simulcast" title="Simulcast">
          <t>A Media Source represented as multiple independent
          Encoded Streams constitutes a simulcast of that Media
          Source. <xref target="fig-simulcast"/> below represents an
          example of a Media Source that is encoded into three
          separate and different Simulcast streams, which are in turn
          sent on the same Media Transport flow. When using Simulcast,
          the Packet Streams may share an RTP Session and Media
          Transport, be separated onto different RTP Sessions and
          Media Transports, or use any combination of the two. Other
          considerations determine which usage is desirable, as
          discussed in <xref
          target="packet-stream-separation"/>.</t>

          <figure anchor="fig-simulcast"
                  title="Example of Media Source Simulcast">
            <artwork align="center"><![CDATA[                        +----------------+
                        |  Media Source  |
                        +----------------+
                 Source Stream  |
         +----------------------+----------------------+
         |                      |                      |
         v                      v                      v
+------------------+   +------------------+   +------------------+
|  Media Encoder   |   |  Media Encoder   |   |  Media Encoder   |
+------------------+   +------------------+   +------------------+
         | Encoded              | Encoded              | Encoded
         | Stream               | Stream               | Stream
         v                      v                      v
+------------------+   +------------------+   +------------------+
| Media Packetizer |   | Media Packetizer |   | Media Packetizer |
+------------------+   +------------------+   +------------------+
         | Source               | Source               | Source
         | Packet               | Packet               | Packet
         | Stream               | Stream               | Stream
         +-----------------+    |    +-----------------+
                           |    |    |
                           V    V    V
                      +-------------------+
                      |  Media Transport  |
                      +-------------------+
]]></artwork>
          </figure>

          <t>The simulcast relation between the Packet Streams is the
          common Media Source. In addition to being able to identify
          the common Media Source, a receiver of the Packet Stream may
          need to know which configuration or encoding goals lie
          behind the produced Encoded Stream and its properties. This
          enables selection of the stream that is most useful in the
          application at that moment.</t>
        </section>

        <section anchor="svc"
                 title="Layered Multi-Stream Transmission">
          <t>Multi-stream transmission (MST) is a mechanism by which
          different portions of a layered encoding of a Source Stream
          are sent using separate Packet Streams (sometimes in
          separate RTP sessions). MSTs are useful for receiver control
          of layered media.</t>

          <t>A Media Source represented as an Encoded Stream and
          multiple Dependent Streams constitutes a Media Source that
          has layered dependency. The figure below represents an
          example of a Media Source that is encoded into three
          dependent layers, where two layers are sent on the same
          Media Transport using different Packet Streams, i.e. SSRCs,
          and the third layer is sent on a separate Media Transport,
          i.e. a different RTP Session.</t>

          <figure align="center" anchor="fig-ddp"
                  title="Example of Media Source Layered Dependency">
            <artwork align="center"><![CDATA[                     +----------------+
                     |  Media Source  |
                     +----------------+
                             |
                             |
                             V
+---------------------------------------------------------+
|                      Media Encoder                      |
+---------------------------------------------------------+
        |                    |                     |
 Encoded Stream       Dependent Stream     Dependent Stream
        |                    |                     |
        V                    V                     V
+----------------+   +----------------+   +----------------+
|Media Packetizer|   |Media Packetizer|   |Media Packetizer|
+----------------+   +----------------+   +----------------+
        |                    |                     |
  Packet Stream         Packet Stream        Packet Stream
        |                    |                     |
        +------+      +------+                     |
               |      |                            |
               V      V                            V
         +-----------------+              +-----------------+
         | Media Transport |              | Media Transport |
         +-----------------+              +-----------------+
]]></artwork>
          </figure>

          <t>The SVC MST relation needs to identify the common Media
          Encoder origin for the Encoded and Dependent Streams. The
          <xref target="RFC6190">SVC RTP Payload RFC</xref> is not
          particularly explicit about how this relation is to be
          implemented. When using different RTP Sessions, and thus
          different Media Transports, and as long as there is only one
          Packet Stream per Media Encoder and a single Media Source in
          each RTP Session, a common SSRC and CNAME can be used to
          identify the common Media Source. When multiple Packet
          Streams are sent from one Media Encoder in the same RTP
          Session, CNAME is the only currently specified RTP
          identifier that can be used. In cases where multiple Media
          Encoders use multiple Media Sources sharing a
          Synchronization Context, and thus having a common CNAME,
          additional heuristics need to be applied to create the MST
          relationship between the Packet Streams.</t>
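
          <t>As a non-normative sketch, the layered dependency between
          Packet Streams carried in different RTP Sessions could be
          expressed at the signalling level as in the SDP fragment
          below. The fragment assumes the "DDP" grouping semantics and
          the "depend" attribute of RFC 5583, which this section does
          not otherwise rely on; the ports, payload type numbers and
          mid values are invented for illustration.</t>

          <figure align="center"
                  title="Illustrative SDP for Layers in Separate RTP Sessions">
            <artwork><![CDATA[
a=group:DDP L1 L2
m=video 20000 RTP/AVP 96
a=rtpmap:96 H264/90000
a=mid:L1
m=video 20002 RTP/AVP 97
a=rtpmap:97 H264-SVC/90000
a=mid:L2
a=depend:97 lay L1:96
]]></artwork>
          </figure>

          <t>In this sketch the second media description is marked as
          a layer that depends on payload type 96 of the media
          description identified as L1.</t>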
        </section>

        <section anchor="repair" title="Robustness and Repair">
          <t>Packet Streams may be protected by Redundancy Packet
          Streams during transport. Several approaches, listed below,
          can achieve this protection: <list style="symbols">
              <t>Duplication of the original Packet Stream,</t>

              <t>Duplication of the original Packet Stream with a time
              offset,</t>

              <t>Forward Error Correction (FEC) techniques, and</t>

              <t>Retransmission of lost packets (either globally or
              selectively).</t>
            </list></t>


          <section title="RTP Retransmission">
            <t>The <xref target="fig-rtx">figure below</xref>
            represents an example where a Media Source's Source Packet
            Stream is protected by a <xref
            target="RFC4588">retransmission (RTX) flow</xref>. In this
            example the Source Packet Stream and the Redundancy Packet
            Stream share the same Media Transport.</t>

            <figure align="center" anchor="fig-rtx"
                    title="Example of Media Source Retransmission Flows">
              <artwork align="center"><![CDATA[+--------------------+
|    Media Source    |
+--------------------+
          |
          V
+--------------------+
|   Media Encoder    |
+--------------------+
          |                              Retransmission
    Encoded Stream     +--------+     +---- Request
          V            |        V     V
+--------------------+ | +--------------------+
|  Media Packetizer  | | | RTP Retransmission |
+--------------------+ | +--------------------+
          |            |           |
          +------------+  Redundancy Packet Stream
   Source Packet Stream            |
          |                        |
          +---------+    +---------+
                    |    |
                    V    V
             +-----------------+
             | Media Transport |
             +-----------------+
]]></artwork>
            </figure>

            <t>The <xref target="fig-rtx">RTP Retransmission
            example</xref> illustrates that this mechanism works
            purely on the Source Packet Stream. The RTP Retransmission
            transform buffers the sent Source Packet Stream and, upon
            request, emits a retransmitted packet with an extra
            payload header as part of a Redundancy Packet Stream. The
            <xref target="RFC4588">RTP Retransmission mechanism</xref>
            is specified so that there is a one-to-one relation
            between the Source Packet Stream and the Redundancy Packet
            Stream. Thus a Redundancy Packet Stream needs to be
            associated with its Source Packet Stream upon being
            received. This is done based on CNAME selectors and
            heuristics that match requested packets for a given Source
            Packet Stream with the original sequence number in the
            payload of any new Redundancy Packet Stream using the RTX
            payload format. In cases where the Redundancy Packet
            Stream is sent in a separate RTP Session from the Source
            Packet Stream, these sessions are related, e.g. using the
            <xref target="RFC5888">SDP Media Grouping's</xref> FID
            semantics.</t>
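
            <t>As a non-normative sketch of the separate RTP Session
            case, the SDP below groups a source video session and its
            retransmission session using FID semantics. The syntax
            follows <xref target="RFC4588"/> and <xref
            target="RFC5888"/>; the user name, host name, address,
            ports and payload type numbers are invented for
            illustration.</t>

            <figure align="center"
                    title="Illustrative SDP for RTX in a Separate RTP Session">
              <artwork><![CDATA[
v=0
o=alice 2890844526 2890844526 IN IP4 host.example.com
s=-
c=IN IP4 192.0.2.1
t=0 0
a=group:FID 1 2
m=video 49170 RTP/AVPF 96
a=rtpmap:96 H264/90000
a=mid:1
m=video 49172 RTP/AVPF 97
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96;rtx-time=3000
a=mid:2
]]></artwork>
            </figure>

            <t>The "apt" parameter ties the RTX payload type to the
            payload type it repairs, while the FID group relates the
            two media descriptions, and thereby the two RTP
            Sessions.</t>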
          </section>

          <section title="Forward Error Correction">
            <t>The <xref target="fig-fec">figure below</xref>
            represents an example where two Media Sources' Source
            Packet Streams are protected by FEC. Source Packet Stream
            A has a Media Redundancy transformation in FEC Encoder 1.
            This produces Redundancy Packet Stream 1, which is related
            only to Source Packet Stream A. FEC Encoder 2, however,
            takes two Source Packet Streams (A and B) and produces
            Redundancy Packet Stream 2, which protects them together,
            i.e. Redundancy Packet Stream 2 relates to two Source
            Packet Streams (a FEC group). FEC decoding, when needed
            due to packet loss or packet corruption at the receiver,
            requires knowledge about which Source Packet Streams the
            FEC encoding was based on.</t>

            <t>In <xref target="fig-fec"/> all Packet Streams are sent
            on the same Media Transport. This is however not the only
            possible choice. Numerous combinations exist for spreading
            these Packet Streams over different Media Transports to
            achieve the communication application's goal.</t>

            <figure align="center" anchor="fig-fec"
                    title="Example of FEC Flows">
              <artwork align="center"><![CDATA[+--------------------+                +--------------------+
|   Media Source A   |                |   Media Source B   |
+--------------------+                +--------------------+
          |                                     |
          V                                     V
+--------------------+                +--------------------+
|   Media Encoder A  |                |   Media Encoder B  |
+--------------------+                +--------------------+
          |                                     |
    Encoded Stream                        Encoded Stream
          V                                     V
+--------------------+                +--------------------+
| Media Packetizer A |                | Media Packetizer B |
+--------------------+                +--------------------+
          |                                     |
Source Packet Stream A                Source Packet Stream B
          |                                     |
    +-----+-------+-------------+       +-------+------+
    |             V             V       V              |
    |    +---------------+  +---------------+          |
    |    | FEC Encoder 1 |  | FEC Encoder 2 |          |
    |    +---------------+  +---------------+          |
    |             |                 |                  |
    |     Redundancy PS 1    Redundancy PS 2           |
    V             V                 V                  V
+----------------------------------------------------------+
|                    Media Transport                       |
+----------------------------------------------------------+
]]></artwork>
            </figure>

            <t>As FEC Encoding exists in various forms, there are many
            methods for relating FEC Redundancy Packet Streams to
            their source information in Source Packet Streams. The
            <xref target="RFC5109">XOR based RTP FEC Payload
            format</xref> is defined in such a way that a Redundancy
            Packet Stream has a one-to-one relation with a Source
            Packet Stream. In fact, the RFC requires the Redundancy
            Packet Stream to use the same SSRC as the Source Packet
            Stream. This requires either using a separate RTP Session
            or using the <xref target="RFC2198">Redundancy RTP Payload
            format</xref>. The underlying relation requirement for
            this FEC format and a particular Redundancy Packet Stream
            is to know the related Source Packet Stream, including its
            SSRC.</t>
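
            <t>As a non-normative sketch of the separate RTP Session
            option, the SDP below places the FEC Redundancy Packet
            Stream in its own media description and groups it with the
            protected one. The "ulpfec" payload name comes from <xref
            target="RFC5109"/>; the "FEC" grouping semantics are an
            assumption taken from outside this document, and the user
            name, host name, address, ports and payload type numbers
            are invented for illustration.</t>

            <figure align="center"
                    title="Illustrative SDP for FEC in a Separate RTP Session">
              <artwork><![CDATA[
v=0
o=alice 2890844526 2890844526 IN IP4 host.example.com
s=-
c=IN IP4 192.0.2.1
t=0 0
a=group:FEC 1 2
m=video 30000 RTP/AVP 96
a=rtpmap:96 H264/90000
a=mid:1
m=application 30002 RTP/AVP 100
a=rtpmap:100 ulpfec/90000
a=mid:2
]]></artwork>
            </figure>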

            <t><!--MW: Here we could add something about FECFRAME and generalized block FEC that can
protect multiple Packet Streams with one Redundancy Packet Stream. However, that does require
usage of explicit Source Packet Information. --></t>
          </section>
        </section>

        <section anchor="packet-stream-separation"
                 title="Packet Stream Separation">
          <t>Packet Streams can be separated based exclusively on
          their SSRCs, at the RTP Session level, or at the Multi-Media
          Session level, as explained below.</t>

          <t>When the Packet Streams that have a relationship are all
          sent in the same RTP Session and are uniquely identified
          based on their SSRC only, it is termed an SSRC-Only Based
          Separation. Such streams can be related via RTCP CNAME to
          identify that the streams belong to the same End Point.
          <xref target="RFC5576"/>-based approaches, when used, can
          explicitly relate various such Packet Streams.</t>
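
          <t>As a non-normative sketch of such an explicit relation,
          the SDP fragment below uses the source-level attributes of
          <xref target="RFC5576"/> to relate two SSRCs within a single
          RTP Session, here a source stream and its retransmission
          stream. The SSRC values, CNAME, port and payload type
          numbers are invented for illustration.</t>

          <figure align="center"
                  title="Illustrative SDP for SSRC-Only Based Separation">
            <artwork><![CDATA[
m=video 49174 RTP/AVPF 96 98
a=rtpmap:96 H264/90000
a=rtpmap:98 rtx/90000
a=fmtp:98 apt=96;rtx-time=3000
a=ssrc-group:FID 11111 22222
a=ssrc:11111 cname:user3@example.com
a=ssrc:22222 cname:user3@example.com
]]></artwork>
          </figure>

          <t>Both SSRCs carry the same CNAME, which identifies the
          common End Point, while the ssrc-group line states the
          specific relation between the two Packet Streams.</t>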

          <t>On the other hand, when related Packet Streams are sent
          in the context of different RTP Sessions to achieve
          separation, this is known as RTP Session-based Separation.
          This is commonly used when the different Packet Streams are
          intended for different Media Transports.</t>

          <t>Several mechanisms that use RTP Session-based separation
          rely on it to enable an implicit grouping mechanism
          expressing the relationship. The solutions have been based
          on using the same SSRC value in the different RTP Sessions
          to implicitly indicate their relation. That way, no explicit
          RTP level mechanism has been needed, only signalling level
          relations have been established using semantics from <xref
          target="RFC5888">Grouping of Media lines framework</xref>.
          Examples of this are <xref target="RFC4588">RTP
          Retransmission</xref>, <xref target="RFC6190">SVC Multi
          Stream Transmission</xref> and <xref target="RFC5109">XOR
          Based FEC</xref>. RTCP CNAME explicitly relates Packet
          Streams across different RTP Sessions, as explained in the
          previous section. Such a relationship can be used to perform
          inter-media synchronization.</t>

          <t>Packet Streams that are related and need to be associated
          can be part of different Multi-Media Sessions, rather than
          just different RTP Sessions within the same Multi-Media
          Session context. This puts further demands on the scope of
          the mechanisms and their handling of the identifiers used
          for expressing the relationships.</t>
        </section>
      </section>

      <section title="Multiple RTP Sessions over one Media Transport">
        <t><xref
        target="I-D.westerlund-avtcore-transport-multiplexing"/>
        describes a mechanism that allows several RTP Sessions to be
        carried over a single underlying Media Transport. The main
        reasons for doing this are related to the impact of using one
        or more Media Transports, i.e. whether the RTP Sessions share
        a common network path or potentially use different ones. With
        a single Media Transport there is reduced need for NAT/FW
        traversal resources and no need for flow-based QoS.</t>

        <t>However, Multiple RTP Sessions over one Media Transport
        makes it clear that a single Media Transport 5-tuple is not
        sufficient to express which RTP Session context a particular
        Packet Stream exists in. Complexities in the relationship
        between Media Transports and RTP Sessions already exist, as
        one RTP Session can contain multiple Media Transports; e.g.
        even a Peer-to-Peer RTP Session with RTP/RTCP Multiplexing
        requires two Media Transports, one in each direction. The
        relationship between Media Transports and RTP Sessions, as
        well as additional levels of identifiers, need to be
        considered both in signalling design and when defining
        terminology.</t>
      </section>
    </section>

    <section anchor="topologies"
             title="Topologies and Communication Entities">
      <t>This section reviews some communication topologies and looks
      at the relationship among the communication entities that are
      defined in <xref target="communication-entities"/>. This section
      does not discuss the streams and their relation to the
      transport. Instead, it covers the aspects that enable the
      transport of those streams. For example, the <xref
      target="media-transport">Media Transports</xref> that exist
      between the <xref target="end-point">End Points</xref> that are
      part of an <xref target="rtp-session">RTP session</xref> and
      their relationship to the <xref
      target="multimedia-session">Multi-Media Session</xref> between
      <xref target="participant">Participants</xref> and the
      established <xref target="comm-session">Communication
      session</xref> are explained.</t>

      <section title="Point-to-Point Communication">
        <t><xref target="fig-p2p-basic"/> shows a high level
        representation of a very basic point-to-point communication
        session between A and B. It uses two RTP sessions, one for
        audio and one for video, between A's and B's End Points.
        Assume that the Multi-Media Session shared by the participants
        is established using SIP (i.e., there is a SIP Dialog between
        A and B).</t>

        <figure align="center" anchor="fig-p2p-basic"
                title="Point to Point Communication">
          <artwork><![CDATA[
+---+         +---+
| A |<------->| B |
+---+         +---+
]]></artwork>
        </figure>

        <t>However, this picture gets slightly more complex when
        redrawn using the communication entities concepts defined
        earlier in this document.</t>

        <figure align="center" anchor="fig-p2p"
                title="Point to Point Communication Session with two RTP Sessions">
          <artwork><![CDATA[
+-----------------------------------------------------------+
| Communication Session                                     |
|                                                           |
| +----------------+                     +----------------+ |
| | Participant A  |   +-------------+   | Participant B  | |
| |                |   | Multi-Media |   |                | |
| | +-------------+|<=>| Session     |<=>|+-------------+ | |
| | | End Point A ||   |(SIP Dialog) |   || End Point B | | |
| | |             ||   +-------------+   ||             | | |
| | | +-----------++---------------------++-----------+ | | |
| | | | RTP Session|                     |            | | | |
| | | | Audio      |---Media Transport-->|            | | | |
| | | |            |<--Media Transport---|            | | | |
| | | +-----------++---------------------++-----------+ | | |
| | |             ||                     ||             | | |
| | | +-----------++---------------------++-----------+ | | |
| | | | RTP Session|                     |            | | | |
| | | | Video      |---Media Transport-->|            | | | |
| | | |            |<--Media Transport---|            | | | |
| | | +-----------++---------------------++-----------+ | | |
| | +-------------+|                     |+-------------+ | |
| +----------------+                     +----------------+ |
+-----------------------------------------------------------+
]]></artwork>
        </figure>

        <t><xref target="fig-p2p"/> shows that the two RTP Sessions
        exist only between the two End Points A and B and over their
        respective Media Transports. The Multi-Media Session
        establishes the association between the two Participants and
        configures these RTP sessions and the Media Transports that
        are used.</t>
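
        <t>As a non-normative sketch, an SDP description that
        Participant A might offer within the SIP Dialog to configure
        the two RTP Sessions could look as follows. Each media
        description corresponds to one RTP Session; the user name,
        host name, address, ports and payload type numbers are
        invented for illustration.</t>

        <figure align="center"
                title="Illustrative SDP for two RTP Sessions">
          <artwork><![CDATA[
v=0
o=alice 2890844526 2890844526 IN IP4 a.example.com
s=-
c=IN IP4 192.0.2.1
t=0 0
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000
m=video 49172 RTP/AVP 96
a=rtpmap:96 H264/90000
]]></artwork>
        </figure>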
      </section>

      <section anchor="central-conferencing"
               title="Central Conferencing">
        <t>This section looks at the central conferencing
        communication topology, where a number of participants, like
        A, B, C, and D in <xref target="fig-central-conf-basic"/>,
        communicate using an RTP mixer.</t>

        <figure anchor="fig-central-conf-basic"
                title="Centralized Conferencing using an RTP Mixer">
          <artwork><![CDATA[+---+      +------------+      +---+
| A |<---->|            |<---->| B |
+---+      |            |      +---+
           |   Mixer    |
+---+      |            |      +---+
| C |<---->|            |<---->| D |
+---+      +------------+      +---+
]]></artwork>
        </figure>

        <t>In this case each of the Participants establishes its
        Multi-Media Session with the Conference Bridge. Thus,
        negotiation for the establishment of the used RTP sessions and
        their configuration happens between these entities. The
        participants have their End Points (A, B, C, D) and the
        Conference Bridge has the host running the RTP mixer, referred
        to as End Point M in <xref target="fig-central-conf"/>.
        However, despite the individual establishment of four
        Multi-Media Sessions and the corresponding Media Transports
        for each of the RTP sessions between the respective End Points
        and the Conference Bridge, there are actually only two RTP
        sessions: one for audio and one for video. In this topology,
        these RTP sessions are shared between all the
        Participants.</t>

        <figure anchor="fig-central-conf"
                title="Central Conferencing with Two Participants A and B communicating over a Conference Bridge">
          <artwork><![CDATA[+-------------------------------------------------------------------+
| Communication Session                                             |
|                                                                   |
| +----------------+                             +----------------+ |
| | Participant A  |       +-------------+       | Conference     | |
| |                |       | Multi-Media |       | Bridge         | |
| | +-------------+|<=====>| Session A   |<=====>|+-------------+ | |
| | | End Point A ||       |(SIP Dialog) |       || End Point M | | |
| | |             ||       +-------------+       ||             | | |
| | | +-----------++-----------------------------++-----------+ | | |
| | | | RTP Session|                             |            | | | |
| | | | Audio      |-------Media Transport------>|            | | | |
| | | |            |<------Media Transport-------|            | | | |
| | | +-----------++-----------------------------++------+    | | | |
| | |             ||                             ||      |    | | | |
| | | +-----------++-----------------------------++----+ |    | | | |
| | | | RTP Session|                             |     | |    | | | |
| | | | Video      |-------Media Transport------>|     | |    | | | |
| | | |            |<------Media Transport-------|     | |    | | | |
| | | +-----------++-----------------------------++    | |    | | | |
| | +-------------+|                             ||    | |    | | | |
| +----------------+                             ||    | |    | | | |
|                                                ||    | |    | | | |
| +----------------+                             ||    | |    | | | |
| | Participant B  |       +-------------+       ||    | |    | | | |
| |                |       | Multi-Media |       ||    | |    | | | |
| | +-------------+|<=====>| Session B   |<=====>||    | |    | | | |
| | | End Point B ||       |(SIP Dialog) |       ||    | |    | | | |
| | |             ||       +-------------+       ||    | |    | | | |
| | | +-----------++-----------------------------++    | |    | | | |
| | | | RTP Session|                             |     | |    | | | |
| | | | Video      |-------Media Transport------>|     | |    | | | |
| | | |            |<------Media Transport-------|     | |    | | | |
| | | +-----------++-----------------------------++----+ |    | | | |
| | |             ||                             ||      |    | | | |
| | | +-----------++-----------------------------++------+    | | | |
| | | | RTP Session|                             |            | | | |
| | | | Audio      |-------Media Transport------>|            | | | |
| | | |            |<------Media Transport-------|            | | | |
| | | +-----------++-----------------------------++-----------+ | | |
| | +-------------+|                             |+-------------+ | |
| +----------------+                             +----------------+ |
+-------------------------------------------------------------------+
]]></artwork>
        </figure>

        <t>It is important to stress that in the case of <xref
        target="fig-central-conf"/>, it might appear that the
        Multi-Media Sessions' context is scoped between A and B over
        M. This is not always true, and the sessions can have contexts
        that extend further. In this case the RTP session and its
        common SSRC space go beyond what occurs between A and M and
        between B and M, respectively.</t>
      </section>

      <section title="Full Mesh Conferencing">
        <t>This section looks at the case where three Participants (A,
        B and C) wish to communicate. They establish individual
        Multi-Media Sessions and RTP sessions between themselves and
        the other two peers. Thus, each Participant provides two
        copies of its media, one to each of the other participants.
        <xref target="fig-full-mesh-basic"/> shows a high level
        representation of such a topology.</t>

        <figure align="center" anchor="fig-full-mesh-basic"
                title="Full Mesh Conferencing with three Participants A, B and C">
          <artwork><![CDATA[+---+      +---+
| A |<---->| B |
+---+      +---+
  ^         ^
   \       /
    \     /
     v   v
     +---+
     | C |
     +---+
]]></artwork>
        </figure>

        <t>In this particular case there are two aspects worth noting.
        The first is that there will be multiple Multi-Media Sessions
        per Communication Session between the participants. This was
        not the case in the earlier examples, with the exception of
        the Centralized Conferencing in <xref
        target="central-conferencing"/>. The second aspect is
        consideration of whether one needs to maintain relationships
        between entities and concepts, for example Media Sources,
        between these different Multi-Media Sessions and between
        Packet Streams in the independent RTP sessions configured by
        those Multi-Media Sessions.</t>

        <figure align="center" anchor="fig-full-mesh"
                title="Full Mesh Conferencing between three Participants A, B and C">
          <artwork><![CDATA[                       +-----------------------------------------+
                       | Participant A                           |
   +----------+        | +--------------------------------------+|
   | Multi-   |        | | End Point A                          ||
   | Media    |<======>| |                                      ||
   | Session  |        | |+-------+     +-------+     +-------+ ||
   | 1        |        | || RTP 1 |<----| MS A1 |---->| RTP 2 | ||
   +----------+        | ||       |     +-------+     |       | ||
       ^^              | +|-------|-------------------|-------|-+|
       ||              +--|-------|-------------------|-------|--+
       ||                 |       |          ^^       |       |
       VV                 |       |          ||       |       |
+-------------------------|-------|----+     ||       |       |
| Participant B           |       |    |     VV       |       |
| +-----------------------|-------|---+| +----------+ |       |
| | End Point B    +----->|       |   || | Multi-   | |       |
| |                |      +-------+   || | Media    | |       |
| | +-------+      |      +-------+   || | Session  | |       |
| | | MS B1 |------+----->| RTP 3 |   || | 2        | |       |
| | +-------+             |       |   || +----------+ |       |
| +-----------------------|-------|---+|     ^^       |       |
+-------------------------|-------|----+     ||       |       |
       ^^                 |       |          ||       |       |
       ||                 |       |          VV       |       |
       ||              +--|-------|-------------------|-------|--+
       VV              |  |       | Participant C     |       |  |
   +----------+        | +|-------|-------------------|-------|-+|
   | Multi-   |        | ||       | End Point C       |       | ||
   | Media    |<======>| |+-------+                   +-------+ ||
   | Session  |        | |    ^         +-------+         ^     ||
   | 3        |        | |    +---------| MS C1 |---------+     ||
   +----------+        | |              +-------+               ||
                       | +--------------------------------------+|
                       +-----------------------------------------+
]]></artwork>
        </figure>

        <t>For the sake of clarity, <xref target="fig-full-mesh"/>
        above does not include all these concepts. The Media Sources
        (MS) from a given End Point are sent to the two peers. This
        requires encoding and Media Packetization to enable the Packet
        Streams to be sent over Media Transports in the context of the
        RTP sessions depicted. The RTP sessions 1, 2, and 3 are
        independent, and established in the context of each of the
        Multi-Media Sessions 1, 2 and 3. The joint communication
        session that the full figure represents (not shown here, as it
        was in <xref target="fig-central-conf"/>, in order to save
        space), however, combines the received representations of the
        peers' Media Sources and plays them back.</t>

        <t>It is noteworthy that the full mesh conferencing topologies
        described here have the potential for creating loops. For
        example, compare the above full mesh with a mixing three-party
        communication session as depicted in <xref
        target="fig-three-relay"/>. In this example A's Media Source
        A1 is sent to B over a Multi-Media Session (A-B). In B the
        Media Source A1 is mixed with Media Source B1 and the
        resulting Media Source (MS AB) is sent to C over a Multi-Media
        Session (B-C). If C and A were to establish a Multi-Media
        Session (A-C) and C were to act in the same role as B, then A
        would receive a Media Source from C that contains a mix of A,
        B and C's individual Media Sources. This would result in A
        playing out a time-delayed version of its own signal (i.e.,
        the system has created an echo path).</t>

        <figure anchor="fig-three-relay"
                title="Mixing Three Party Communication Session">
          <artwork><![CDATA[+--------------+    +--------------+    +--------------+
| A            |    | B +-------+  |    | C            |
|              |    |   | MS B1 |  |    |              |
|              |    |   +-------+  |    |              |
| +-------+    |    |     |        |    |              |
| | MS A1 |----|--->|-----+ MS AB -|--->|              |
| +-------+    |    |              |    |              |
+--------------+    +--------------+    +--------------+
]]></artwork>
        </figure>

        <t>The looping issue can be avoided, detected or prevented
        using two general methods. The first method is to use great
        care when setting up and establishing the communication
        session if participants have any mixing or forwarding
        capacity, so that one doesn't end up getting back a partial or
        full representation of one's own media believing it is someone
        else's. The other method is to maintain some unique
        identifiers at the communication session level for all Media
        Sources and ensure that any Packet Streams received identify
        those Media Sources that contributed to the content of the
        Packet Stream.</t>
      </section>

      <section title="Source-Specific Multicast">
        <t>In one-to-many media distribution cases (e.g., IPTV), where
        one Media Sender or a set of Media Senders is allowed to send
        Packet Streams on a particular Source-Specific Multicast (SSM)
        group to many receivers (R), there are some different aspects
        to consider. <xref target="fig-ssm-basic"/> presents a high
        level SSM system for RTP/RTCP as defined in <xref
        target="RFC5760"/>. In this case, several Media Senders send
        their Packet Streams to the Distribution Source, which is the
        only one allowed to send to the SSM group. The Receivers
        joining the SSM group can provide RTCP feedback on their
        reception by sending unicast feedback to a Feedback Target
        (FT).</t>

        <figure anchor="fig-ssm-basic"
                title="Source-Specific Multicast Communication Topology">
          <artwork><![CDATA[+--------+       +-----+
|Media   |       |     |       Source-Specific
|Sender 1|<----->| D S |       Multicast (SSM)
+--------+       | I O |  +--+----------------> R(1)
                 | S U |  |  |                    |
+--------+       | T R |  |  +-----------> R(2)   |
|Media   |<----->| R C |->+  |           :   |    |
|Sender 2|       | I E |  |  +------> R(n-1) |    |
+--------+       | B   |  |  |          |    |    |
    :            | U   |  +--+--> R(n)  |    |    |
    :            | T +-|          |     |    |    |
    :            | I | |<---------+     |    |    |
+--------+       | O |F|<---------------+    |    |
|Media   |       | N |T|<--------------------+    |
|Sender M|<----->|   | |<-------------------------+
+--------+       +-----+       RTCP Unicast

FT = Feedback Target
]]></artwork>
        </figure>

        <t>Here the Media Transport from the Distribution Source to
        all the SSM receivers (R) has the same 5-tuple, but in
        reality follows different paths. Also, the Multi-Media Sessions
        between the Distribution Source and the individual receivers
        are normally identical. This is because configuration
        information flows one way, from the Distribution Source to
        the receivers. This information is typically embedded in
        Electronic Program Guides (EPGs), distributed by the Session
        Announcement Protocol (SAP) <xref target="RFC2974"/> or other
        one-way protocols. In some cases load balancing is applied,
        for example, by providing the receiver with a set of Feedback
        Targets from which it randomly selects one.</t>

        <t>This scenario differs significantly from the previously
        described communication topologies due to the asymmetric
        nature of the RTP Session context across the Distribution
        Source. The Distribution Source forms a focal point,
        collecting the unicast RTCP feedback from the receivers and
        then re-distributing it to the Media Senders. Each Media
        Sender and the Distribution Source establish their own
        Multi-Media Session Context for the underlying RTP Sessions
        but with shared RTCP context across all the receivers.</t>

        <t>To improve readability, <xref target="fig-ssm-basic"/>
        intentionally hides the details of the various entities.
        Expanding on this, one can think of Media Senders
        being part of one or more Multi-Media Sessions grouped under a
        Communication Session. The Media Sender in this scenario
        refers to the Media Packetizer transformation <xref
        target="media_packetizer"/>. The Packet Stream generated by
        such a Media Sender can be part of its own RTP Session or can
        be multiplexed with other Packet Streams within an End Point.
        The latter case requires careful consideration since the
        re-distributed RTCP packets now correspond to a single RTP
        Session Context across all the Media Senders.</t>
      </section>
    </section>

    <section anchor="security" title="Security Considerations">
      <t>This document simply tries to clarify the confusion prevalent
      in RTP taxonomy caused by inconsistent usage across the multiple
      technologies and protocols that make use of RTP. It
      does not introduce any new security considerations beyond those
      already well documented in the RTP specification <xref
      target="RFC3550"/> and in the respective
      specifications of the various protocols making use of it.</t>

      <t>A well-defined common terminology and a shared
      understanding of the complexities of the RTP architecture should
      help lead to better standards and avoid security
      problems.</t>
    </section>

    <section title="Acknowledgement">
      <t>This document borrows many concepts from several
      documents, such as WebRTC <xref
      target="I-D.ietf-rtcweb-overview"/>, CLUE <xref
      target="I-D.ietf-clue-framework"/>, and Multiplexing Architecture
      <xref target="I-D.westerlund-avtcore-transport-multiplexing"/>.
      The authors would like to thank the authors of each of those
      documents.</t>

      <t>The authors would also like to acknowledge the insights,
      guidance and contributions of Magnus Westerlund, Roni Even, Paul
      Kyzivat, Colin Perkins, Keith Drage, and Harald Alvestrand.</t>
    </section>

    <section title="Contributors">
      <t>Magnus Westerlund contributed the concept model for the
      media chain, based on transformations and streams, including
      rewriting pre-existing concepts into this model and adding
      missing concepts. The first proposal for updating the
      relationships and the topologies based on this concept was also
      made by Magnus.</t>
    </section>

    <section anchor="iana" title="IANA Considerations">
      <t>This document makes no request of IANA.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.3550"?>

      <reference anchor="UML">
        <front>
          <title>OMG Unified Modeling Language (OMG UML),
          Superstructure, V2.2</title>

          <author>
            <organization abbrev="OMG">Object Management
            Group</organization>
          </author>

          <date month="February" year="2009"/>
        </front>

        <seriesInfo name="OMG" value="formal/2009-02-02"/>

        <format target="http://www.omg.org/spec/UML/2.2/Superstructure/PDF/"
                type="PDF"/>
      </reference>
    </references>

    <references title="Informative References">
      <?rfc include='reference.RFC.2198'?>

      <?rfc include='reference.RFC.2974'?>

      <?rfc include="reference.RFC.3264"?>

      <?rfc include='reference.RFC.3551'?>

      <?rfc include="reference.RFC.4566"?>

      <?rfc include='reference.RFC.4588'?>

      <?rfc include='reference.RFC.4867'?>

      <?rfc include='reference.RFC.5109'?>

      <?rfc include='reference.RFC.5404'?>

      <?rfc include="reference.RFC.5576"?>

      <?rfc include='reference.RFC.5760'?>

      <?rfc include="reference.RFC.5888"?>

      <?rfc include="reference.RFC.5905"?>

      <?rfc include='reference.RFC.6190'?>

      <?rfc include="reference.RFC.6222"?>

      <?rfc include="reference.I-D.ietf-clue-framework"?>

      <?rfc include="reference.I-D.ietf-rtcweb-overview"?>

      <?rfc include="reference.I-D.ietf-mmusic-sdp-bundle-negotiation"?>

      <?rfc include="reference.I-D.ietf-avtcore-clksrc"?>

      <?rfc include="reference.I-D.westerlund-avtcore-transport-multiplexing"?>
    </references>

    <section title="Changes From Earlier Versions">
      <t>NOTE TO RFC EDITOR: Please remove this section prior to
      publication.</t>

      <section title="Modifications Between Version -02 and -03">
        <t><list style="symbols">
            <t>Section 4 rewritten (and new communication topologies
            added) to reflect the major updates to Sections 1-3</t>

            <t>Section 8 removed (carryover from initial -00
            draft)</t>

            <t>General cleanup of text, grammar, and nits</t>
          </list></t>
      </section>

      <section title="Modifications Between Version -01 and -02">
        <t><list style="symbols">
            <t>Section 2 rewritten to add both streams and
            transformations in the media chain.</t>

            <t>Section 3 rewritten to focus on exposing
            relationships.</t>
          </list></t>
      </section>

      <section title="Modifications Between Version -00 and -01">
        <t><list style="symbols">
            <t>Too many to list</t>

            <t>Added new authors</t>

            <t>Updated content organization and presentation</t>
          </list></t>
      </section>
    </section>
  </back>
</rfc>
