Network Working Group                                            N. Cook
Request for Comments: 5616                                     Cloudmark
Category: Informational                                      August 2009

Streaming Internet Messaging Attachments

Abstract

This document describes a method for streaming multimedia attachments received by a resource- and/or network-constrained device from an IMAP server. It allows such clients, which often have limits in storage space and bandwidth, to play video and audio email content.

The document describes a profile for making use of the URLAUTH-authorized IMAP URLs (RFC 5092), the Network Announcement SIP Media Service (RFC 4240), and the Media Server Control Markup Language (RFC 5022).

Status of This Memo

This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

Copyright Notice

Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document (http://trustee.ietf.org/license-info). Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

This document may contain material from IETF Documents or IETF Contributions published or made publicly available before November 10, 2008. The person(s) controlling the copyright in some of this material may not have granted the IETF Trust the right to allow modifications of such material outside the IETF Standards Process. Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English.

Cook Informational [Page 1]
RFC 5616 Streaming Internet Messaging Attachments August 2009

Table of Contents

   1. Introduction
   2. Conventions Used in This Document
   3. Mechanism
      3.1. Overview of Mechanism
      3.2. Media Server Discovery
      3.3. Client Use of GENURLAUTH Command
      3.4. Client Determination of Media Server Capabilities
      3.5. Client Use of the Media Server Announcement Service
      3.6. Media Negotiation and Transcoding
      3.7. Client Use of the Media Server MSCML IVR Service
      3.8. Media Server Use of IMAP Server
      3.9. Protocol Diagrams
         3.9.1. Announcement Service Protocol Diagram
         3.9.2. IVR Service Protocol Diagram
   4. Security Considerations
   5. IANA Considerations
   6. Digital Rights Management (DRM) Issues
   7. Deployment Considerations
   8. Formal Syntax
   9. Contributors
   10. References
      10.1. Normative References
      10.2. Informative References

1. Introduction

Email clients on resource- and/or network-constrained devices, such as mobile phones, may have difficulties in retrieving and/or storing large attachments received in a message. For example, on a poor network link, the latency required to download the entire attachment before displaying any of it may not be acceptable to the user.
Conversely, even on a high-speed network, the device may not have enough storage space to store the attachment once retrieved.

For certain media, such as audio and video, there is a solution: the media can be streamed to the device, using protocols such as RTP [RTP]. Streaming can be initiated and controlled using protocols such as SIP [SIP] and particularly the media server profiles as specified in RFC 4240 [NETANN] or MSCML [MSCML]. Streaming the media to the device addresses both the latency issue, since the client can start playing the media relatively quickly, and the storage issue, since the client does not need to store the media locally. A tradeoff is that the media cannot be viewed/played when the device is offline.

Examples of the types of media that would benefit from the ability to stream to the device include:

o  Voice or video mail messages received as an attachment

o  Audio clips such as ring tones received as an attachment

o  Video clips, such as movie trailers, received as an attachment

The client may wish to present the user with the ability to use simple "VCR-style" controls such as pause, fast-forward, and rewind. In consideration of this, the document presents two alternatives for streaming media: a simple mechanism that makes use of the announcement service of RFC 4240, and a more complex mechanism that allows VCR controls, based on MSCML (RFC 5022) [MSCML]. The choice of which mechanism to use is up to the client; for example, it may be based on limitations of the client or the configured media server. This document presents suggestions for determining which of these streaming services are available.

2. Conventions Used in This Document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [KEYWORDS].

In examples, "C:" and "S:" indicate lines sent by the client and server, respectively. If a single "C:" or "S:" label applies to multiple lines, then some of the line breaks between those lines are for editorial clarity only and may not be part of the actual protocol exchange.

3. Mechanism

3.1. Overview of Mechanism

The proposed mechanism for streaming media to messaging clients is a profile for making use of several existing mechanisms, namely:

o  IMAP URLAUTH Extension [URLAUTH] - Providing the ability to generate an IMAP URL that allows access by external entities to specific message parts, e.g., an audio clip.

o  URLFETCH Binary Extension [URLFETCH_BINARY] - Providing the ability to specify BINARY and BODYPARTSTRUCTURE arguments to the URLFETCH command.

o  Media Server Announcement Service (RFC 4240) [NETANN] - Providing the ability for a media server to stream media using a reference provided by the media server client in a URL.

o  Media Server Interactive Voice Response (IVR) Service (RFC 5022) [MSCML] - Providing the ability to stream media as above, but with VCR-style controls.

The approach is shown in the following figure:

   +--------------+
   |              |
   | Email Client |^
   |              | \
   +--------------+  \
      ^          ^    \
      |           \    \ (5)
      | (1),       \    \
      | (2)         \    \
      |          (3),\    \
      |          (6)  \    \
      |                \    \
      v                 v    v
   +--------------+       +----------------+
   |              |  (4)  |                |
   | IMAP Server  |<----->|  Media Server  |
   |              |       |                |
   +--------------+       +----------------+

                 Figure 1: Proposed Mechanism

The proposed mechanism has the following steps:

(1) The client determines from MIME headers of a particular message that a particular message part (attachment) should be streamed to the user. Note that no assumptions are made about how/when/if the client contacts the user of the client about this decision. User input may be required in order to initiate the proposed mechanism.

(2) The client constructs an IMAP URL referencing the message part, and uses the GENURLAUTH [URLAUTH] command to generate a URLAUTH-authorized IMAP URL.

(3) The client connects to a SIP Media Server using the announcement service as specified in RFC 4240 [NETANN], or the IVR service as specified in RFC 5022 [MSCML], and passes the URLAUTH-authorized URL to the media server.

(4) The media server connects to the IMAP server specified in the referenced URL, and uses the IMAP URLFETCH [URLAUTH] command to retrieve the message part.

(5) The media server streams the retrieved message part to the client using RTP [RTP].

(6) The media server or the client terminates the media streaming, or the streaming ends naturally. The SIP session is terminated by either client or server.

It should be noted that the proposed mechanism makes several assumptions about the mobile device, as well as available network services, namely:

o  The mobile device is provisioned with, or obtains via some dynamic mechanism (see Section 3.2), the location of a media server that supports RFC 4240 [NETANN] and/or RFC 5022 [MSCML].

o  The media server(s) used by the mobile device support the IMAP URL [IMAPURL] scheme for the announcement and/or IVR services.

o  The IMAP server used by the mobile device supports generating anonymous IMAP URLs using the URLAUTH mechanism as well as the IMAP URLFETCH BINARY [URLFETCH_BINARY] extension.

3.2. Media Server Discovery

This section discusses possibilities for the automatic discovery of suitable media servers to perform streaming operations, and provides for such a mechanism using the IMAP METADATA [METADATA] extension.

There are two possibilities for clients with regard to determining the hostname and port number information of a suitable media server:

1. No discovery of media servers is required: clients are configured with suitable media server information in an out-of-band manner.

2. Discovery of media servers is required: clients use a discovery mechanism to determine a suitable media server that will be used for streaming multimedia message parts.

There are several scenarios where media server discovery would be a requirement for streaming to be successful:

o  Client is not configured with the address of any media servers.

o  Client is configured with the address of one or more media servers, but the IMAP server is configured to only accept URLFETCH requests from specific media servers (for security or site policy reasons), and thus streaming would fail due to the media server not being able to retrieve the media from the IMAP server.

There is also a scenario where media server discovery would improve the security of the streaming mechanism, by avoiding the use of completely anonymous URLs. For example, the client could discover a media server address that was an authorized user of the IMAP server for streaming purposes, allowing the client to generate a URL that is secure in that it can *only* be accessed by an entity that is trusted by the IMAP server to retrieve content. The issue of trust in media servers is discussed more fully in Section 4.

This document describes using the IMAP METADATA [METADATA] extension, via a server entry that provides the contact information for suitable media servers for use with the IMAP server. Media Server discovery is optional: clients are free to use pre-configured information about media servers, or to fall back to pre-configured information if they encounter IMAP servers that do not support either the METADATA extension or the proposed entry, or that do not provide a value for the entry.

A METADATA entry with the name of "/shared/mediaServers" is used to store the locations of suitable media servers known to the IMAP server. The entry is formatted according to the formal syntax specified in Section 8. This consists of a tuple of a URI and optional "stream" string, where the URI is surrounded by <> symbols, the URI and "stream" are separated using a colon ":", and tuples are separated using a ";". The "stream" string (cf. the "stream" access identifier from [ACCESSID]) is used to identify media servers capable of connecting to the IMAP server as users authorized to retrieve URLs constructed using the "stream" access identifier. It indicates that the client MUST create the content URI using the "stream" access identifier. See Section 3.3 for a description of how the client should make use of the access identifier when generating IMAP URLs.

Example values of the /shared/mediaServers METADATA entry (N.B. Any line-wrapping below is for the purpose of clarity):

   ":stream;;"

   ";;:stream"

It should be noted that the URI specified in the ABNF (in Section 8) is generic, i.e., not restricted to SIP URIs; however, this document only specifies how to make use of SIP URIs. Additionally, the "userinfo" (known as the "service indicator" in RFC 4240 and RFC 4722) component of the URI is optional; if specified, it gives the client additional information about the media server capabilities. For example, a "userinfo" component of "annc" indicates that the media server supports RFC 4240, and "ivr" indicates support for RFC 4722. Section 3.4 further describes how clients should behave if the "userinfo" component is not present.

Clients SHOULD parse the value of the /shared/mediaServers entry, and contact a media server using one of the returned URIs.
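
As an illustration of the entry format described above, the following is a minimal sketch of how a client might parse a /shared/mediaServers value. It is not part of this specification: the names MediaServer, parse_media_servers, and service_indicator are hypothetical, the SIP URIs in the usage note are invented sample values, and the authoritative grammar remains the one in Section 8. The sketch scans tuple by tuple rather than splitting on ";", on the assumption that a URI inside the <> symbols may itself contain ";" (as SIP URI parameters do).

```python
import re
from typing import List, NamedTuple, Optional


class MediaServer(NamedTuple):
    uri: str      # URI text found between "<" and ">"
    stream: bool  # True if the tuple carried the ":stream" suffix


# One tuple: "<" URI ">", an optional ":stream", then ";" or end of value.
_TUPLE_RE = re.compile(r"<([^<>]+)>(:stream)?(?:;|$)")


def parse_media_servers(value: str) -> List[MediaServer]:
    """Return the tuples of a /shared/mediaServers value, in server order."""
    servers = []
    pos = 0
    while pos < len(value):
        match = _TUPLE_RE.match(value, pos)
        if match is None:
            raise ValueError(f"malformed entry at offset {pos}: {value[pos:]!r}")
        servers.append(MediaServer(match.group(1), match.group(2) is not None))
        pos = match.end()
    return servers


def service_indicator(uri: str) -> Optional[str]:
    """Return the "userinfo" component (e.g. "annc" or "ivr"), or None."""
    rest = uri.split(":", 1)[1] if ":" in uri else uri
    return rest.split("@", 1)[0] if "@" in rest else None
```

For instance, with the hypothetical value "<sip:ivr@ms.example.net:5060>:stream;<sip:annc@ms.example.net;transport=tcp>", the sketch yields two tuples, the first flagged as requiring the "stream" access identifier and carrying the "ivr" service indicator.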
The servers are returned in order of preference as suggested by the server; however, it is left to the client to decide if a different order is more appropriate when selecting the media server(s) to contact, as well as the selection of alternates under failure conditions.

Administrators configuring the values of the /shared/mediaServers entry, who do not know the capabilities of the media servers being configured, SHOULD NOT include a "userinfo" component as part of the URI. In that case, the client will determine which service to use as specified in Section 3.4. Note that if a media server supports multiple services, a URI with the appropriate userinfo component SHOULD be configured for each service.

Note that even though the media server address can be discovered dynamically, it is assumed that the necessary security arrangements between the client and the media server already exist. For example, the media server could use SIP digest authentication to provide access only to authenticated clients; in this case, it is assumed the username and password have already been set up. Likewise, if the client wants to authenticate the media server using, e.g., TLS and certificates, it is assumed the necessary arrangements (trust anchors and so on) already exist. In some deployments, the clients and media servers may even be willing to rely on the security of the underlying network, and omit authentication between the client and the media server entirely. See Section 4 for more details.

3.3. Client Use of GENURLAUTH Command

The decision to make use of streaming services for a message part will usually be predicated on the content type of the message part. Using the capabilities of the IMAP FETCH command, clients determine the MIME [MIME] Content-Type of particular message parts, and based on local policies or heuristics, they decide whether streaming for that message part will be attempted.
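
The local policy mentioned above is left entirely to the client. Purely as an illustration, one possible heuristic is sketched below; the function name should_stream, its parameters, and the 256 KiB default threshold are all assumptions of this sketch rather than anything this document requires. It streams only audio and video parts, and only when the part either would not fit in free storage or is large enough that downloading it first would be noticeably slow.

```python
# Top-level MIME types this hypothetical policy is willing to stream.
STREAMABLE_TYPES = {"audio", "video"}


def should_stream(media_type: str, encoded_size: int, free_storage: int,
                  min_stream_size: int = 256 * 1024) -> bool:
    """Decide whether to attempt streaming for one message part.

    media_type   -- top-level MIME type from BODYSTRUCTURE, e.g. "VIDEO"
    encoded_size -- size in octets reported by BODYSTRUCTURE
    free_storage -- octets of local storage available (client-specific)
    """
    if media_type.lower() not in STREAMABLE_TYPES:
        return False  # only audio and video can be streamed over RTP here
    if encoded_size > free_storage:
        return True   # the part cannot be stored locally at all
    # Otherwise stream only parts large enough to be worth the SIP setup.
    return encoded_size >= min_stream_size
```

A real client would likely also weigh link quality, user preference, and whether a media server is known at all.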

Once the client has determined that a particular message part requires streaming, the client generates an IMAP URL that refers to the message part according to the method described in RFC 5092 [IMAPURL]. The client then begins the process of generating a URLAUTH URL by appending ";EXPIRE=<datetime>" and ";URLAUTH=<access>" to the initial URL.

The ";EXPIRE=<datetime>" parameter is optional; however, it SHOULD be used, since the use of anonymous URLAUTH-authorized URLs is a security risk (see Section 4), and it ensures that at some point in the future, permission to access that URL will cease. IMAP server implementors may choose to reject anonymous URLs that are considered insecure (for example, with an EXPIRE date too far in the future), as a matter of local security policy. To prevent this from causing interoperability problems, IMAP servers that implement this profile MUST NOT reject GENURLAUTH commands for anonymous URLs on the basis of the EXPIRE time, if that time is equal to, or less than, 1 hour in the future.

The <access> portion of the URLAUTH URL MUST be 'stream' (see [ACCESSID]) if an out-of-band mechanism or the media server discovery mechanism discussed in Section 3.2 specifies that the media server is an authorized user of the IMAP server for the purposes of retrieving content via URLFETCH. Without specific prior knowledge of such a configuration (either through the discovery mechanism described in this document, or by an out-of-band mechanism), the client SHOULD use the 'stream' access identifier, which will cause streaming to fail if the media server is not an authorized user of the IMAP server for the purposes of streaming. However, if the client wishes to take the risk associated with generating a URL that can be used by any media server (see Section 4), it MAY use 'anonymous' as the <access> portion of the URLAUTH URL passed to the GENURLAUTH command.
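
The URL construction described above can be sketched as follows. This is a non-normative illustration: the function name build_urlauth_request_url and its parameters are hypothetical, the mailbox name is assumed to be already URL-safe, and the expiry is written in the UTC "Z" form of RFC 3339. The sketch clamps the lifetime to one hour, which is a conservative choice made here because a server implementing this profile may not reject such a URL on the basis of its EXPIRE time.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Lifetimes up to this value must be accepted by servers of this profile.
MAX_LIFETIME = timedelta(hours=1)


def build_urlauth_request_url(user: str, host: str, mailbox: str, uid: int,
                              section: str, access: str = "stream",
                              lifetime: timedelta = timedelta(minutes=30),
                              now: Optional[datetime] = None) -> str:
    """Build the URL that is passed as the argument to GENURLAUTH.

    "access" defaults to 'stream'; 'anonymous' must be chosen explicitly.
    "now" must be timezone-aware when supplied (it exists for testing).
    """
    if access not in ("stream", "anonymous"):
        raise ValueError(f"unsupported access identifier: {access!r}")
    if now is None:
        now = datetime.now(timezone.utc)
    expire = now.astimezone(timezone.utc) + min(lifetime, MAX_LIFETIME)
    stamp = expire.strftime("%Y-%m-%dT%H:%M:%SZ")
    return (f"imap://{user}@{host}/{mailbox}/;uid={uid}/"
            f";section={section};expire={stamp};urlauth={access}")
```

The returned string would then be quoted and sent in a GENURLAUTH command with the INTERNAL mechanism.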
A client might accept this risk when, for example, it has been pre-configured with the address of media servers in the local administrative domain (thus implying a level of trust in those media servers), without knowing whether those media servers have a pre-existing trust relationship with the IMAP server to be used (which may well be in a different administrative domain). See Section 4 for a full discussion of the security issues.

The client uses the URL generated as a parameter to the GENURLAUTH command, using the INTERNAL authorization mechanism. The URL returned by a successful response to this command will then be passed to the media server. If no successful response to the GENURLAUTH command is received, then no further action will be possible with respect to streaming media to the client.

Examples:

   C: a122 UID FETCH 24356 (BODYSTRUCTURE)
   S: * 26 FETCH (BODYSTRUCTURE (("TEXT" "PLAIN"
   S: ("CHARSET" "US-ASCII") NIL
   S: NIL "7BIT" 1152 23)("VIDEO" "MPEG" NIL NIL "BASE64" 655350))
      UID 24356)
   S: a122 OK FETCH completed.
   C: a123 GENURLAUTH "imap://joe@example.com/INBOX/;uid=24356/;
      section=1.2;expire=2006-12-19T16:39:57-08:00;
      urlauth=anonymous" INTERNAL
   S: * GENURLAUTH "imap://joe@example.com/INBOX/;uid=24356/;
      section=1.2;expire=2006-12-19T16:39:57-08:00;
      urlauth=anonymous:
      internal:238234982398239898a9898998798b987s87920"
   S: a123 OK GENURLAUTH completed

   C: a122 UID FETCH 24359 (BODYSTRUCTURE)
   S: * 27 FETCH (BODYSTRUCTURE (("TEXT" "PLAIN"
   S: ("CHARSET" "US-ASCII") NIL
   S: NIL "7BIT" 1152 23)("AUDIO" "G729" NIL NIL "BASE64" 87256))
      UID 24359)
   S: a122 OK FETCH completed.
   C: a123 GENURLAUTH "imap://joe@example.com/INBOX/;uid=24359/;
      section=1.3;expire=2006-12-19T16:39:57-08:00;
      urlauth=stream" INTERNAL
   S: * GENURLAUTH "imap://joe@example.com/INBOX/;uid=24359/;
      section=1.3;expire=2006-12-20T18:31:45-08:00;
      urlauth=stream:
      internal:098230923409284092384092840293480239482"
   S: a123 OK GENURLAUTH completed

3.4. Client Determination of Media Server Capabilities

Once an authorized IMAP URL has been generated, it is up to the client to pass that URL to a suitable media server that is capable of retrieving the URL via IMAP, and streaming the content to the client using RTP [RTP].

This section specifies the behavior of clients that have not determined (either statically through configuration, or dynamically through a discovery process as discussed in Section 3.2) the capabilities of the media server with respect to the services (i.e., RFC 4240 or 5022) supported by that media server. Clients that have determined those capabilities should use the mechanisms described in Section 3.5 or 3.7, as appropriate.

If the client supports the MSCML IVR service, then it SHOULD attempt to contact the media server using the MSCML protocol by sending a SIP INVITE that has the service indicator "ivr".

Assuming the media server responds to the INVITE without error, the client can carry on using the MSCML IVR service as specified in Section 3.7. If the media server responds with an error indicating that the "ivr" service is not supported, then if the client supports it, the client SHOULD attempt to contact the media server using the announcement service, as described in Section 3.5.

The following shows an example SIP INVITE using the "ivr" service indicator:

   C: INVITE sip:ivr@ms2.example.com SIP/2.0
   < SIP Header fields omitted for reasons of brevity >

3.5. Client Use of the Media Server Announcement Service

If the client or media server does not support the MSCML protocol, the media server announcement service is used, as described in RFC 4240 [NETANN]. This service allows the client to send a SIP INVITE to a special username ('annc') at the media server (the "announcement" user), supplying the URL obtained as per Section 3.3.

The SIP INVITE is constructed as shown in the examples below; note that as per RFC 4240, the play parameter is mandatory and specifies the authorized IMAP URL to be played.

Examples of valid SIP INVITE URIs sent to the media server announcement service:

   C: sip:annc@ms2.example.net;
      play=imap:%2F%2Fjoe@example.com%2FINBOX%2F%3Buid%3D24356%2F%3Bsection
      %3D1.2%3Bexpire%3D2006-12-19T16:39:57-08:00%3Burlauth%3Danonymous:
      internal:238234982398239898a9898998798b987s87920

   C: sip:annc@ms1.example.net;
      play=imap:%2F%2Ffred@
      example.com%2FINBOX%2F%3Buid%3D24359%2F%3Bsection
      %3D1.3%3Bexpire%3D2006-12-20T18:31:45-08:00%3Burlauth%3Dstream:
      internal:098230923409284092384092840293480239482

Notice that many of the characters that are used as parameters of the IMAP URI are escaped, as otherwise they would change the meaning of the enclosing SIP URI, by being regarded as SIP URI parameters instead of IMAP URL parameters.

If the client receives a 200 (OK) response, the media server has successfully retrieved the content from the IMAP server and the negotiated RTP stream will shortly begin.

There are many possible response codes; however, a response code of 404 received from the media server indicates that the content could not be found or could not be retrieved for some reason. For example, the media server may not support the use of IMAP URLs. At this point, the client has several options, such as using alternate media servers, or giving up on attempting to stream the required message part.

3.6. Media Negotiation and Transcoding

This document uses standards and protocols from two traditionally separate application areas: Mobile Email (primarily IMAP) and Internet Telephony/Streaming (e.g., SIP/RTP). Since the document primarily addresses enhancing the capabilities of mobile email, it is felt worthwhile to give some examples of simple SIP/SDP exchanges and to discuss capabilities such as media negotiation (using SDP) and media transcoding.

In the example below, the client contacts the media server using the SIP INVITE command to contact the announcement service (see Section 3.5), advertising support for a range of audio and video codecs (using SDP [SDP]), and in response the media server advertises only a set of audio codecs. This process is identical for the IVR service, except that the IVR service does not use the SIP Request-URI to indicate the content to be played; instead, this is carried in a subsequent SIP INFO request.

The client and server now know from the SDP session descriptions advertised by both sides that communication must use the subset of audio codecs supported by both client and server (in the example SDP session description below, it is clear that the server does not support any video codecs). The media server may perform transcoding (i.e., converting between codecs) on the media received from the IMAP server in order to satisfy the codecs supported by the client. For example, the media server may downgrade the video retrieved from the IMAP server to the audio component only.

For clients using the announcement service, the media server MUST return an error to the INVITE if it cannot find a common codec between the client, server, and media, or it cannot transcode to a suitable codec. Similarly, for clients using the MSCML IVR service, the media server MUST return a suitable error response to the request.
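
To make the notion of a "common codec" concrete, the following non-normative sketch computes, per media type, the encodings that appear in both an SDP offer and an SDP answer. The function names offered_codecs and common_codecs are hypothetical, only a handful of static RTP payload types are tabulated, and full SDP offer/answer processing involves considerably more than this comparison. Encodings are compared by name and clock rate rather than by payload number, since dynamic payload numbers need not agree between the two sides.

```python
from typing import Dict, Set

# A few static payload types; dynamic ones are resolved through a=rtpmap.
STATIC_PAYLOADS = {0: "PCMU/8000", 3: "GSM/8000", 8: "PCMA/8000",
                   18: "G729/8000", 34: "H263/90000"}


def offered_codecs(sdp: str) -> Dict[str, Set[str]]:
    """Map each media type ("audio", "video") to its lower-cased encodings."""
    sections = []  # (media type, payload numbers, rtpmap overrides)
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            fields = line[2:].split()
            sections.append((fields[0], [int(pt) for pt in fields[3:]], {}))
        elif line.startswith("a=rtpmap:") and sections:
            pt, name = line[len("a=rtpmap:"):].split(None, 1)
            sections[-1][2][int(pt)] = name
    codecs: Dict[str, Set[str]] = {}
    for media, payloads, rtpmap in sections:
        for pt in payloads:
            name = rtpmap.get(pt, STATIC_PAYLOADS.get(pt, str(pt)))
            codecs.setdefault(media, set()).add(name.lower())
    return codecs


def common_codecs(offer: str, answer: str) -> Dict[str, Set[str]]:
    """Return the encodings present on both sides, omitting empty media."""
    ours, theirs = offered_codecs(offer), offered_codecs(answer)
    shared = {m: ours[m] & theirs[m] for m in ours if m in theirs}
    return {m: names for m, names in shared.items() if names}
```

Applied to the m= and a=rtpmap lines of the exchange in this section, the result contains only an "audio" entry, reflecting that the server offers no video. Note that "telephone-event" also appears in the result; it is an RTP payload format for DTMF rather than an audio codec.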

Example SIP INVITE and SDP Media Negotiation

   C: INVITE sip:annc@ms2.example.com;
      play=imap:%2F%2Fjoe@example.com%2FINBOX%2F%3Buid%3D24356%2F%3B
      section%3D1.2%3Bexpire%3D2006-12-19T16:39:57-08:00%3Burlauth%3D
      anonymous:internal:238234982398239898a9898998798b987s87920 SIP/2.0
   C: From: UserA
   C: To: NetAnn
   C: Call-ID: 8204589102@example.com
   C: CSeq: 1 INVITE
   C: Contact:
   C: Content-Type: application/sdp
   C: Content-Length: 481
   C:
   C: v=0
   C: o=UserA 2890844526 2890844526 IN IP4 192.0.2.40
   C: s=Session SDP
   C: c=IN IP4 192.0.2.40
   C: t=3034423619 0
   C: m=audio 9224 RTP/AVP 0 8 3 98 101
   C: a=alt:1 1 : 01BB7F04 6CBC7A28 192.0.2.40 9224
   C: a=fmtp:101 0-15
   C: a=rtpmap:98 ilbc/8000
   C: a=rtpmap:101 telephone-event/8000
   C: a=recvonly
   C: m=video 9226 RTP/AVP 105 34 120
   C: a=alt:1 1 : 01BCADB3 95DFFD80 192.0.2.40 9226
   C: a=fmtp:105 profile=3;level=20
   C: a=fmtp:34 CIF=2 QCIF=2 MAXBR=5120
   C: a=rtpmap:105 h263-2000/90000
   C: a=rtpmap:120 h263/90000
   C: a=recvonly

   S: SIP/2.0 200 OK
   S: From: UserA
   S: To: NetAnn
   S: Call-ID: 8204589102@example.com
   S: CSeq: 1 INVITE
   S: Contact:
   S: Content-Type: application/sdp
   S: Content-Length: 317
   S:
   S: v=0
   S: o=NetAnn 2890844527 2890844527 IN IP4 192.0.2.41
   S: s=Session SDP
   S: c=IN IP4 192.0.2.41
   S: t=3034423619 0
   S: m=audio 17684 RTP/AVP 0 8 3 18 98 101
   S: a=rtpmap:0 PCMU/8000
   S: a=rtpmap:8 PCMA/8000
   S: a=rtpmap:3 GSM/8000
   S: a=rtpmap:18 G729/8000
   S: a=fmtp:18 annexb=no
   S: a=rtpmap:98 iLBC/8000
   S: a=rtpmap:101 telephone-event/8000
   S: a=fmtp:101 0-16

   C: ACK sip:netann@192.0.2.41 SIP/2.0
   C: From: UserA
   C: To: NetAnn
   C: Call-ID: 8204589102@example.com
   C: CSeq: 1 ACK
   C: Content-Length: 0

3.7. Client Use of the Media Server MSCML IVR Service

Once the client has determined that the media server supports the IVR service, it is up to the client to generate a suitable MSCML request to initiate streaming of the required media.

When using the IVR service, the initial SIP INVITE is used only to establish that the media server supports the MSCML IVR service, and to negotiate suitable media codecs. Once the initial SIP INVITE and response to that INVITE have been completed successfully, the client must generate a SIP INFO request with MSCML in the body of the request to initiate streaming.

The <playcollect> request is used, as this allows the use of dual tone multi-frequency (DTMF) digits to control playback of the media, such as fast-forward or rewind.

Since the <playcollect> request is used purely for its VCR-like capabilities, there is no need for the media server to perform DTMF collection. Therefore, the playcollect attributes "firstdigittimer", "interdigittimer", and "extradigittimer" SHOULD all be set to "0ms", which will have the effect of causing digit collection to cease immediately after the media has finished playing.

The "ffkey" and "rwkey" attributes of <playcollect> are used to control fast-forward and rewind behavior, with the "skipinterval" attribute being used to control the 'speed' of these actions.

The <prompt> tag is used to specify the media to be played, and SHOULD have a single