CLUE WG                                                     M. Duckworth
Internet-Draft                                                   Polycom
Intended status: Informational                             June 14, 2013
Expires: December 16, 2013


                      CLUE Switching Mixer Example
               draft-duckworth-clue-switching-example-00

Abstract

   This document presents an example multipoint use case scenario for
   CLUE.  This example uses the media switching variety of the Topo-
   Mixer RTP topology.  This example is intended to promote discussion
   about how to implement it using the CLUE Framework, and whether or
   not the framework as currently defined is sufficient to enable this
   use case.

   This first version is incomplete, and is intended to raise questions
   and prompt discussion.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 16, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect



Duckworth               Expires December 16, 2013               [Page 1]

Internet-Draft        CLUE Switching Mixer Example             June 2013


   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . . . 3
   2.  Scenario from user's point of view  . . . . . . . . . . . . . . 4
   3.  Mixer Advertisement . . . . . . . . . . . . . . . . . . . . . . 5
     3.1.  Advertising one big scene . . . . . . . . . . . . . . . . . 5
     3.2.  Advertising multiple scenes . . . . . . . . . . . . . . . . 5
     3.3.  Other ways of advertising . . . . . . . . . . . . . . . . . 6
   4.  Endpoint Selecting from Advertisement . . . . . . . . . . . . . 6
     4.1.  One big scene . . . . . . . . . . . . . . . . . . . . . . . 6
     4.2.  Multiple scenes . . . . . . . . . . . . . . . . . . . . . . 6
   5.  Endpoint Rendering - proper spatial relationships . . . . . . . 6
     5.1.  One big scene . . . . . . . . . . . . . . . . . . . . . . . 6
     5.2.  Multiple scenes . . . . . . . . . . . . . . . . . . . . . . 7
   6.  Open issues . . . . . . . . . . . . . . . . . . . . . . . . . . 7
   7.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . . . 7
   8.  Informative References  . . . . . . . . . . . . . . . . . . . . 7
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . . . 8





























1.  Introduction

   This document presents an example multipoint use case scenario for
   CLUE.  This example uses the media switching variety of the Topo-
   Mixer RTP topology.  This example is intended to promote discussion
   about how to implement it using the CLUE Framework
   [I-D.ietf-clue-framework], and whether or not the framework as
   currently defined is sufficient to enable this use case.

   From the requirements document
   [I-D.ietf-clue-telepresence-requirements]:

      "REQMT-13: The solution MUST support both transcoding and
      switching approaches to providing multipoint conferences."

   This example uses the switching approach.

   [I-D.ietf-clue-rtp-mapping] identifies the media-switching mixer as
   one of the RTP topologies relevant for CLUE.  The media switching
   variety of Topo-Mixer is described in section 3.6.2 of
   [I-D.ietf-avtcore-rtp-topologies-update].  In this topology, the
   mixer provides one or more conceptual sources selecting one source at
   a time from the original sources.  The mixer creates a conference-
   wide RTP session by sharing remote SSRC values as CSRCs to all
   conference participants.
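
   As a rough illustration of the CSRC usage described above, the
   following Python sketch packs an RTP fixed header (using the RFC 3550
   field layout) for a packet the mixer forwards.  The function name,
   the payload type, and the SSRC values are hypothetical and chosen
   only for this example.

```python
import struct


def rtp_header(payload_type, seq, timestamp, ssrc, csrcs=()):
    """Pack an RTP fixed header followed by an optional CSRC list."""
    if len(csrcs) > 15:
        raise ValueError("the CC field is 4 bits, so at most 15 CSRCs")
    byte0 = (2 << 6) | len(csrcs)  # version 2, no padding/extension, CC
    byte1 = payload_type & 0x7F    # marker bit left cleared
    header = struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)
    for csrc in csrcs:
        header += struct.pack("!I", csrc)
    return header


# Hypothetical values: the mixer's conceptual source uses its own SSRC
# and carries the original endpoint's SSRC as a CSRC.
MIXER_SSRC = 0x11111111
ENDPOINT_B_SSRC = 0x22222222
header = rtp_header(96, seq=1, timestamp=0, ssrc=MIXER_SSRC,
                    csrcs=[ENDPOINT_B_SSRC])
```

   When the mixer switches to a different original source, only the
   CSRC value in this sketch would change; the mixer's own SSRC stays
   the same.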

   The basic scenario for this example is a multipoint conference
   consisting of some traditional single-camera single-screen endpoints
   and some 3-camera multi-screen endpoints.  Each endpoint receives
   multiple Capture Encodings that originated from several other
   endpoints.  The multi-screen endpoints show the currently speaking
   endpoint's video using a large area of the display screens, and also
   show other recent speakers in smaller size using less screen space.

   Since the middlebox (the mixer) is of the switching variety, it does
   not do any video composition.  Each endpoint is responsible for
   composing the video streams it receives for rendering on its own
   display screens.  The mixer sends several Capture Encodings to each
   endpoint, and those Capture Encodings represent Media Captures that
   originate at several other endpoints.  The multi-camera endpoints
   send multiple Media Captures, while the single-camera endpoints send
   just one Media Capture.  Each Media Capture could have multiple
   Capture Encodings, however.

   The mixer selects which original sources it sends to the endpoints
   based on speech activity, using a policy defined by the mixer.
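
   The policy itself is left to the mixer.  One possible policy,
   sketched here purely as an assumption, is a list of endpoints
   ordered by most recent speech, from which the mixer picks the
   sources to switch to each receiver.  The class name RecentTalkerList
   and its methods are hypothetical.

```python
from collections import OrderedDict


class RecentTalkerList:
    """Hypothetical mixer policy: order endpoints by most recent speech."""

    def __init__(self):
        # Keys are endpoint names; the last key is the current talker.
        self._order = OrderedDict()

    def on_speech(self, endpoint):
        """Record that `endpoint` is now the current talker."""
        self._order[endpoint] = True
        self._order.move_to_end(endpoint)

    def sources_for(self, receiver, limit):
        """Return up to `limit` endpoints to switch to `receiver`,
        most recent first, never including the receiver itself."""
        recent = [e for e in reversed(self._order) if e != receiver]
        return recent[:limit]


talkers = RecentTalkerList()
for name in ["B", "E", "C", "B"]:
    talkers.on_speech(name)

for_a = talkers.sources_for("A", limit=2)  # B is current, then C
for_b = talkers.sources_for("B", limit=2)  # B never receives itself
```

   Excluding the receiver from its own list is what lets the current
   talker see the previous talker instead of themselves.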






   When completed, this example should be added to the examples in the
   Framework.


2.  Scenario from user's point of view

   From the human user's point of view, this example is a more specific
   case of the general multipoint scenario in [ref use cases].  Consider
   a conference with these endpoints:

   Endpoint A - 4 screens, 3 cameras
   Endpoint B - 3 screens, 3 cameras
   Endpoint C - 3 screens, 3 cameras
   Endpoint D - 3 screens, 3 cameras
   Endpoint E - 1 screen, 1 camera
   Endpoint F - 2 screens, 1 camera
   Endpoint G - 1 screen, 1 camera

   This example focuses on what the user in one of the 3-camera multi-
   screen endpoints sees.  Call this person User A, at Endpoint A.  There
   are 4 large display screens at Endpoint A.  Whenever somebody at
   another site is speaking, all the video captures from that endpoint
   are shown on the large screens.  If the talker is at a 3-camera site,
   then the video from those 3 cameras fills 3 of the screens.  If the
   talker is at a single-camera site, then video from that camera fills
   one of the screens, while the other screens show video from other
   single-camera endpoints.

   User A can also see video from other endpoints, in addition to the
   current talker, although much smaller in size.  Endpoint A has 4
   screens, so one of those screens shows up to 9 other Media Captures
   in a tiled fashion.

   User B at Endpoint B sees a similar arrangement, except there are
   only 3 screens, so the 9 other Media Captures are spread out across
   the bottom of the 3 displays, in a picture-in-picture (PIP) format.
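
   The arithmetic behind these two arrangements can be sketched as
   follows.  The function name, the dictionary keys, and the assumption
   that any extra screen comes last are hypothetical; the counts of 3
   large and 9 small captures come from the description above.

```python
def plan_layout(num_screens, large=3, small=9):
    """Describe what each screen shows, in screen order.

    Assumes 3 large captures and up to 9 small ones, as in this
    example, and (hypothetically) that extra screens come last.
    """
    if num_screens < large:
        raise ValueError("sketch only covers endpoints with 3+ screens")
    spare = num_screens - large
    layout = []
    for index in range(num_screens):
        if index < large:
            # With no spare screen, the small captures become PIPs on
            # the large screens; integer division drops any remainder.
            pips = small // large if spare == 0 else 0
            layout.append({"large": 1, "pips": pips, "tiles": 0})
        else:
            # Spare screens share the small captures as tiles.
            layout.append({"large": 0, "pips": 0, "tiles": small // spare})
    return layout


endpoint_a = plan_layout(4)  # 3 large screens plus one tiled screen
endpoint_b = plan_layout(3)  # 3 large screens, each with PIPs
```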

   When somebody at a different endpoint becomes the current talker,
   then User A and User B both see the video from the new talker appear
   on their large screen area, while the previous talker takes one of
   the smaller tiled or PIP areas.  The person who is the current talker
   does not see themselves; instead, they see the previous talker in
   their large screen area.

   TBD - Diagrams go here








3.  Mixer Advertisement

   The Media Producer in the mixer sends a CLUE Advertisement to each
   endpoint in the conference.  There are different possibilities for
   how the mixer might construct advertisements.

3.1.  Advertising one big scene

   The Producer in the mixer can advertise one Capture Scene, with many
   Capture Scene Entries (CSE), each with a different number of Media
   Captures.  If the Producer wants to send up to 12 Media Captures, it
   could advertise one CSE with 12 switched captures, one with 11, one
   with 10, etc.  These switched Media Captures are distinct from the
   Media Captures sent from the endpoints, but they get their media from
   those endpoint Media Captures (really from their Capture Encodings).

   A Consumer could then pick the CSE that had the number of Media
   Captures the Consumer wants to receive.
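
   As a minimal sketch of this approach, the following Python fragment
   builds a stand-in for such an advertisement and shows a Consumer
   picking a CSE by capture count.  The dictionary layout, the
   identifiers (Scene1, CSEn, VCn), and the choice to have the CSE with
   n captures list VC1 through VCn are all assumptions made for
   illustration; this is not CLUE message syntax.

```python
def one_big_scene(max_captures=12):
    """Build a dict standing in for an advertisement with one scene."""
    captures = [f"VC{i}" for i in range(1, max_captures + 1)]
    # One CSE per capture count: 12 captures, then 11, ... down to 1.
    entries = [
        {"id": f"CSE{n}", "captures": captures[:n]}
        for n in range(max_captures, 0, -1)
    ]
    return {"scenes": [{"id": "Scene1", "entries": entries}]}


def pick_entry(advertisement, wanted):
    """Consumer side: return the CSE with exactly `wanted` captures."""
    for entry in advertisement["scenes"][0]["entries"]:
        if len(entry["captures"]) == wanted:
            return entry
    return None  # no CSE offers that many captures


advertisement = one_big_scene()
multi_screen_choice = pick_entry(advertisement, 12)
single_screen_choice = pick_entry(advertisement, 3)
```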

   Questions:

   1.  What does it mean about spatial relationships, when there is a
       CSE with 12 switched captures?  Does the Producer include any
       spatial information for this type of CSE?

3.2.  Advertising multiple scenes

   The Producer could advertise multiple scenes, each one representing a
   different level in the recent talker list.  Each Scene could have a
   CSE with 3 Media Captures.

   Scene 1: current and most recent talkers
   Scene 2: next most recent talkers
   Scene 3: next most recent talkers
   Scene 4: next most recent talkers

   The grouping of Media Captures, 3 at a time, into the CSEs in each
   Scene indicates the mixer is responsible for maintaining a useful
   spatial relationship between the original source Media Captures it
   switches into these conceptual Media Captures.  The mixer provides
   spatial information, probably using "no scale" coordinates.

   The mixer should use the priority attribute to indicate the Media
   Captures in Scene 1 are highest priority, Scene 2 is next highest,
   and so on.
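
   A sketch of such an advertisement, again using plain Python
   dictionaries rather than CLUE syntax, might look like the following.
   The identifiers, the "slot" index standing in for "no scale"
   coordinates, and the convention that priority 1 is the highest are
   assumptions of this sketch, not definitions from the Framework.

```python
def multi_scene_advertisement(num_scenes=4, captures_per_scene=3):
    """Build a stand-in advertisement: one CSE of 3 captures per scene."""
    scenes = []
    for s in range(1, num_scenes + 1):
        captures = [
            {
                "id": f"VC{s}.{c + 1}",
                "priority": s,  # assumption: 1 is the highest priority
                "slot": c,      # left-to-right index, not real coordinates
            }
            for c in range(captures_per_scene)
        ]
        entry = {"id": f"CSE{s}", "captures": captures}
        scenes.append({"id": f"Scene{s}", "entries": [entry]})
    return {"scenes": scenes}


advertisement = multi_scene_advertisement()
scene1_captures = advertisement["scenes"][0]["entries"][0]["captures"]
```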

   Questions:





   1.  How does the Provider know to use CSEs with 3 captures?  Why not
       some other number?  Or could it also work if it included multiple
       CSEs, say with 1 to 4 captures?

3.3.  Other ways of advertising

   What other ways should be considered?


4.  Endpoint Selecting from Advertisement

   This section describes how the Endpoint Consumer selects Media
   Captures from the advertisement.

4.1.  One big scene

   The multi-screen Consumer knows it wants to receive 12 captures, so
   it picks the Media Captures in the CSE that has 12 captures.  A
   single-screen endpoint might choose to receive only 1 capture, or
   maybe 3 or 4, depending on how it wants to render video for showing
   to the user.

4.2.  Multiple scenes

   The multi-screen Consumer knows it wants to receive 12 captures, so
   it picks the Media Captures with the highest priority first (one CSE
   of 3 captures), then the next highest (another CSE in another Scene
   with 3 captures), and so on until it has picked 12 captures.
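
   The selection loop described above can be sketched as follows.  The
   data layout, the helper names, and the convention that priority 1 is
   the highest are assumptions of this sketch.

```python
def make_scene(n):
    """Hypothetical scene n with one CSE of 3 captures at priority n."""
    captures = [{"id": f"VC{n}.{c}", "priority": n} for c in (1, 2, 3)]
    entry = {"id": f"CSE{n}", "captures": captures}
    return {"id": f"Scene{n}", "entries": [entry]}


def select_by_priority(scenes, wanted):
    """Pick whole CSEs, best priority first, up to `wanted` captures."""

    def scene_priority(scene):
        return min(c["priority"] for c in scene["entries"][0]["captures"])

    chosen = []
    for scene in sorted(scenes, key=scene_priority):
        captures = scene["entries"][0]["captures"]
        if len(chosen) + len(captures) > wanted:
            break  # taking this CSE would exceed what the Consumer wants
        chosen.extend(c["id"] for c in captures)
    return chosen


# Scenes are listed out of order to show that priority drives selection.
advertised = [make_scene(n) for n in (3, 1, 4, 2)]
picked_12 = select_by_priority(advertised, 12)
picked_3 = select_by_priority(advertised, 3)
```

   A Consumer that wants fewer captures simply stops earlier; in this
   sketch a Consumer asking for 3 captures receives only the CSE from
   the highest priority scene.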


5.  Endpoint Rendering - proper spatial relationships

5.1.  One big scene

   Questions:

   1.  How does the Consumer know the actual spatial relationships
       between all the captures it receives?  All 12 captures don't
       really have a combined relationship with each other.
   2.  Does the Consumer need to have access to all the spatial
       relationships advertised by the Media Producers in all the other
       endpoints?  If so, how is this information distributed throughout
       the conference?
       A.  In CLUE messages?
       B.  Extension to SIP Event Package [RFC4575] and XCON Data Model
           [RFC6501]?







5.2.  Multiple scenes

   For the multiple scene approach, the spatial relationships can be
   handled in a straightforward manner by the spatial attributes the
   mixer puts in its advertisement.  The mixer can ensure that when it
   switches media captures from a multi-camera source into its outgoing
   captures, it puts them together in the correct order that it
   described in the advertisement.  And when it switches captures from
   single-camera sources, it could also pick multiple single-camera
   sources and assign them to a consistent conceptual spatial relation,
   even though they don't have a real physical relationship.
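
   One way the mixer might fill the three captures of a scene is
   sketched below.  The function name, the capture labels, and the rule
   of skipping multi-camera endpoints when grouping single-camera
   sources are assumptions of this sketch rather than required
   behavior.

```python
def fill_slots(candidates, slots=3):
    """Choose the original captures to switch into one scene.

    `candidates` is a list of (endpoint, captures) pairs, most recent
    talker first, with each capture list in left-to-right order.
    """
    _, first_captures = candidates[0]
    if len(first_captures) >= slots:
        # Multi-camera talker: keep its captures together, in order.
        return first_captures[:slots]
    # Otherwise group single-camera sources into a conceptual row.
    filled = []
    for _, captures in candidates:
        if len(captures) == 1:
            filled.append(captures[0])
        if len(filled) == slots:
            break
    return filled


room_b = ("B", ["B.left", "B.center", "B.right"])
singles = [("E", ["E.1"]), ("F", ["F.1"]), ("G", ["G.1"])]

multi_camera_talker = fill_slots([room_b] + singles)
single_camera_talker = fill_slots([singles[0], room_b] + singles[1:])
```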

   The Consumer renderer assigns the scene with highest priority
   captures to the largest areas on its display screens, and it assigns
   each other scene to the smaller areas on its screens.  These
   assignments can remain static; they don't need to change when the
   mixer switches between sources for these media captures.
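
   The static nature of this assignment can be illustrated with a short
   sketch.  The area names and the endpoints switched into each scene
   are hypothetical.

```python
def assign_areas(scenes_by_priority, areas_by_size):
    """Map each scene to a display area once: best scene, largest area."""
    return dict(zip(scenes_by_priority, areas_by_size))


def rendered(areas, sources):
    """Show which original source currently appears in each area."""
    return {areas[scene]: source for scene, source in sources.items()}


areas = assign_areas(
    ["Scene1", "Scene2", "Scene3", "Scene4"],
    ["large screens", "tile row 1", "tile row 2", "tile row 3"],
)

# What the mixer switches into each scene before and after a new
# talker takes over (hypothetical endpoints).
before = {"Scene1": "Endpoint B", "Scene2": "Endpoint C"}
after = {"Scene1": "Endpoint D", "Scene2": "Endpoint B"}

view_before = rendered(areas, before)
view_after = rendered(areas, after)
```

   The `areas` mapping is computed once and never changes; the previous
   talker moves to a smaller area only because the mixer switched it
   into a lower priority scene.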


6.  Open issues

   1.  Add audio considerations - how to switch and render audio
       consistent with video.  Add audio to the example.
   2.  Consider how the scene-switch-policy attribute can be used with
       this scenario.


7.  Acknowledgements

   Thanks to Stephan Wenger, Rob Hansen, and Andy Pepperell for
   contributing to the ideas in this example.


8.  Informative References

   [I-D.ietf-clue-framework]
              Duckworth, M., Pepperell, A., and S. Wenger, "Framework
              for Telepresence Multi-Streams",
              draft-ietf-clue-framework-10 (work in progress), May 2013.

   [I-D.ietf-clue-telepresence-requirements]
              Romanow, A. and S. Botzko, "Requirements for Telepresence
              Multi-Streams",
              draft-ietf-clue-telepresence-requirements-03 (work in
              progress), January 2013.

   [I-D.ietf-clue-rtp-mapping]
              Even, R. and J. Lennox, "Mapping RTP streams to CLUE media
              captures", draft-ietf-clue-rtp-mapping-00 (work in
              progress), February 2013.

   [I-D.ietf-avtcore-rtp-topologies-update]
              Westerlund, M. and S. Wenger, "RTP Topologies",
              draft-ietf-avtcore-rtp-topologies-update-00 (work in
              progress), April 2013.

   [RFC4575]  Rosenberg, J., Schulzrinne, H., and O. Levin, "A Session
              Initiation Protocol (SIP) Event Package for Conference
              State", RFC 4575, August 2006.

   [RFC6501]  Novo, O., Camarillo, G., Morgan, D., and J. Urpalainen,
              "Conference Information Data Model for Centralized
              Conferencing (XCON)", RFC 6501, March 2012.


Author's Address

   Mark Duckworth
   Polycom

   Email: mark.duckworth@polycom.com



























