N. Charlton
Millpark
M. Gasson
Koru Solutions
G. Gybels
M. Spanner
RNID
A. van Wijk
Ericsson
August 2002
User Requirements for the Session Initiation Protocol (SIP)
in Support of Deaf, Hard of Hearing
and Speech-impaired Individuals
Status of this Memo
This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.
Copyright Notice
Copyright © The Internet Society (2002). All Rights Reserved.
Abstract
This document presents a set of Session Initiation Protocol (SIP) user requirements that support communications for deaf, hard of hearing and speech-impaired individuals. These user requirements address the current difficulties of deaf, hard of hearing and speech-impaired individuals in using communications facilities, while acknowledging the multi-functional potential of SIP-based communications.
A number of issues related to these user requirements are further raised in this document.
Also included are some real world scenarios and technical requirements that show the robustness of these requirements at a conceptual level.
Table of Contents
1.  Terminology and Conventions Used in this Document
2.  Introduction
3.  Purpose and Scope
4.  Background
5.  Deaf, Hard of Hearing and Speech-impaired Requirements for SIP
    5.1  Connection without Difficulty
    5.2  User Profile
    5.3  Intelligent Gateways
    5.4  Inclusive Design
    5.5  Resource Management
    5.6  Confidentiality and Security
6.  Some Real World Scenarios
    6.1  Transcoding Service
    6.2  Media Service Provider
    6.3  Sign Language Interface
    6.4  Synthetic Lip-speaking Support for Voice Calls
    6.5  Voice-Activated Menu Systems
    6.6  Conference Call
7.  Some Suggestions for Service Providers and User Agent Manufacturers
8.  Acknowledgements
    Security Considerations
    Normative References
    Informational References
    Authors' Addresses
    Full Copyright Statement
1. Terminology and Conventions Used in this Document
In this document, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in BCP 14, RFC 2119[1] and indicate requirement levels for compliant SIP implementations.
For the purposes of this document, the following terms are considered to have these meanings:
Abilities: A person's capacity for communicating, which may or may not include a hearing or speech impairment.

Preferences: A person's choice of communication mode. This could include any combination of media streams, e.g., text, audio, video.

The terms Abilities and Preferences apply to both caller and call-recipient.

Relay Service: A third party or intermediary that enables communications between deaf, hard of hearing and speech-impaired people, and people without a hearing or speech impairment. Relay Services form a subset of the activities of Transcoding Services (see definition).

Transcoding Services: A human or automated third party acting as an intermediary in any session between two other User Agents (being a User Agent itself), and transcoding one stream into another (e.g., voice to text or vice versa).

Textphone: Sometimes called a TTY (teletypewriter), TDD (telecommunications device for the deaf) or a minicom, a textphone enables a deaf, hard of hearing or speech-impaired person to place a call to a telephone or another textphone. Some textphones use the V.18[3] protocol as a standard for communication with other textphone communication protocols world-wide.

User: A deaf, hard of hearing or speech-impaired individual. A user is otherwise referred to as a person or individual, and users are referred to as people.

Note: For the purposes of this document, a deaf, hard of hearing, or speech-impaired person is an individual who chooses to use SIP because it can minimize or eliminate constraints in using common communication devices. As SIP promises a total communication solution for any kind of person, regardless of ability and preference, there is no attempt to specifically define deaf, hard of hearing or speech-impaired in this document.
2. Introduction
The background for this document is the recent development of SIP[2] and SIP-based communications, and a growing awareness of deaf, hard of hearing and speech-impaired issues in the technical community.
The SIP capacity to simplify setting up, managing and tearing down communication sessions between all kinds of User Agents has specific implications for deaf, hard of hearing and speech-impaired individuals.
As SIP enables multiple sessions with translation between multiple types of media, these requirements aim to provide the standard for recognizing and enabling these interactions, and for a communications model that includes any and all types of SIP-networking abilities and preferences.
3. Purpose and Scope
The scope of this document is firstly to present a current set of user requirements for deaf, hard of hearing and speech-impaired individuals through SIP-enabled communications. These are then followed by some real world scenarios in SIP-communications that could be used in a test environment, and some concepts of how these requirements can be developed by service providers and User Agent manufacturers.
These recommendations make explicit the needs of a currently often disadvantaged user-group and attempt to match them with the capacity of SIP. It is not the intention here to prioritize the needs of deaf, hard of hearing and speech-impaired people in a way that would penalize other individuals.
These requirements aim to encourage developers and manufacturers world-wide to consider the specific needs of deaf, hard of hearing and speech-impaired individuals. This document presents a world-vision where deafness, hard of hearing or speech impairment are no longer a barrier to communication.
4. Background
Deaf, hard of hearing and speech-impaired people are currently often unable to use commonly available communication devices. Although this is documented[4], developers and manufacturers are not always aware of it. Communication devices for deaf, hard of hearing and speech-impaired people are currently often primitive in design, expensive, and incompatible with the progressively designed, cheaper and more adaptable communication devices available to other individuals. For example, many models of textphone are unable to communicate with other models.
Additionally, non-technical human communications, for example sign languages or lip-reading, are non-standard around the world.
There are intermediary or third-party relay services (e.g., transcoding services) that facilitate communications, uni- or bi-directional, for deaf, hard of hearing and speech-impaired people. Currently, relay services are mostly operator-assisted (manual), although methods of partial automation are being implemented in some areas. These services enable full access to modern facilities and conveniences for deaf, hard of hearing and speech-impaired people. Although these services are somewhat limited, their value is undeniable compared to their previous complete unavailability.
Yet communication methods in recent decades have proliferated: email, mobile phones, video streaming, etc. These methods are an advance in the development of data transfer technologies between devices.
Developers and advocates of SIP agree that it is a protocol that not only anticipates the growth in real-time communications between convergent networks, but also fulfills the potential of the Internet as a communications and information forum. Further, they agree that these developments allow a standard of communication that can be applied throughout all networking communities, regardless of abilities and preferences.
5. Deaf, Hard of Hearing and Speech-impaired Requirements for SIP
Introduction
The user requirements in this section are provided for the benefit of service providers, User Agent manufacturers and any other interested parties in the development of products and services for deaf, hard of hearing and speech-impaired people.
The user requirements are as follows:
5.1 Connection without Difficulty
This requirement states:
Whatever the preferences and abilities of the user and User Agent, there SHOULD be no difficulty in setting up SIP sessions. These sessions could include multiple proxies, call routing decisions, transcoding services, e.g., the relay service Typetalk[5] or other media processing, and could include multiple simultaneous or alternative media streams.
This means that any User Agent in the conversation (including transcoding services) MUST be able to add or remove a media stream from the call without having to tear it down and re-establish it.
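As a purely illustrative sketch (not a normative message flow), adding a real-time text stream to an established voice session could be done with a mid-dialog re-INVITE carrying an updated SDP offer. The addresses, tags, branch value and payload number below are hypothetical, and the text format shown is the T.140-based RTP text payload (as in RFC 2793):

   INVITE sip:bob@bobpc.example.net SIP/2.0
   Via: SIP/2.0/UDP alicepc.example.com;branch=z9hG4bKnashds7
   Max-Forwards: 70
   From: <sip:alice@example.com>;tag=9fxced76sl
   To: <sip:bob@example.net>;tag=314159
   Call-ID: 3848276298220188511@alicepc.example.com
   CSeq: 2 INVITE
   Contact: <sip:alice@alicepc.example.com>
   Content-Type: application/sdp

   v=0
   o=alice 2890844526 2890844527 IN IP4 alicepc.example.com
   s=-
   c=IN IP4 192.0.2.101
   t=0 0
   m=audio 49170 RTP/AVP 0
   a=rtpmap:0 PCMU/8000
   m=text 49172 RTP/AVP 98
   a=rtpmap:98 t140/1000

The peer accepts or declines the new stream in the SDP answer of its 200 OK; a stream can later be dropped with another re-INVITE in which the port of the corresponding media line is set to zero, so the session itself never has to be torn down.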
5.2 User Profile
This requirement states:
Deaf, hard of hearing and speech-impaired user abilities and preferences (i.e., user profile) MUST be communicable by SIP, and these abilities and preferences MUST determine the handling of the session.
The User Profile for a deaf, hard of hearing or speech-impaired person might include details about:
- How media streams are received and transmitted (text, voice, video, or any combination, uni- or bi-directional).
- Redirecting specific media streams through a transcoding service (e.g., the relay service Typetalk)
- Roaming (e.g., a deaf person accessing their User Profile from a web-interface at an Internet cafe)
- Anonymity: i.e., not revealing that a deaf person is calling, even through a transcoding service (e.g., some relay services inform the call-recipient that there is an incoming text call without saying that a deaf person is calling).
Part of this requirement is to ensure that deaf, hard of hearing and speech-impaired people can keep their preferences and abilities confidential from others, to avoid possible discrimination or prejudice, while still being able to establish a SIP session.
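How a profile is stored and signalled is left open by this requirement. As one hypothetical illustration only, the profile of a deaf caller who sends and receives text and additionally receives sign-language video might simply be reflected in the SDP offer her User Agent generates (all names, addresses and payload numbers are invented for the example):

   v=0
   o=carol 2890844730 2890844730 IN IP4 carolpc.example.org
   s=-
   c=IN IP4 192.0.2.55
   t=0 0
   m=text 50000 RTP/AVP 98
   a=rtpmap:98 t140/1000
   a=sendrecv
   m=video 50002 RTP/AVP 31
   a=rtpmap:31 H261/90000
   a=recvonly

A proxy acting on the same profile could route the request through a transcoding service, or withhold the profile details from the far end so that only the resulting media lines, and not the reason for them, are visible to the call-recipient.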
5.3 Intelligent Gateways
This requirement states:
SIP SHOULD support a class of User Agents to perform as gateways for legacy systems designed for deaf, hard of hearing and speech-impaired people.
For example, an individual could have a SIP User Agent acting as a gateway to a PSTN legacy textphone.
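As a hypothetical sketch, a call placed to a PSTN textphone through such a gateway might simply be addressed with a telephone-number URI, leaving the gateway to terminate the SIP and RTP text leg and re-originate the call as V.18 textphone signalling on the PSTN side (the number and domain are fictitious):

   INVITE sip:+441632960123;user=phone@textgw.example.net SIP/2.0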
5.4 Inclusive Design
This requirement states:
Where applicable, design concepts for communications (devices, applications, etc.) MUST include the abilities and preferences of deaf, hard of hearing and speech-impaired people.
Transcoding services and User Agents MUST be able to connect with each other regardless of the provider or manufacturer. This means that new User Agents MUST be able to support legacy protocols through appropriate gateways.
5.5 Resource Management
This requirement states:
User Agents SHOULD be able to identify the content of a media stream in order to obtain such information as the cost of the media stream, if a transcoding service can support it, etc.
User Agents SHOULD be able to choose among transcoding services and similar services based on their capabilities (e.g., whether a transcoding service carries a particular media stream), and any policy constraints they impose (e.g., charging for use). It SHOULD be possible for User Agents to discover the availability of alternative media streams and to choose from them.
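One possible, non-normative way for a User Agent to discover what a transcoding service can handle is a SIP OPTIONS request; the service's 200 OK can carry headers such as Allow and Accept and an SDP body describing the media types it supports, which the User Agent can weigh against any cost or policy information before involving the service. The addresses and identifiers below are hypothetical:

   OPTIONS sip:relay@transcoder.example.net SIP/2.0
   Via: SIP/2.0/UDP alicepc.example.com;branch=z9hG4bKhjhs8ass877
   Max-Forwards: 70
   From: <sip:alice@example.com>;tag=1928301774
   To: <sip:relay@transcoder.example.net>
   Call-ID: a84b4c76e66710@alicepc.example.com
   CSeq: 63104 OPTIONS
   Accept: application/sdp
   Content-Length: 0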
5.6 Confidentiality and Security
This requirement states:
All third parties or intermediaries (transcoding services) employed in a session for deaf, hard of hearing and speech-impaired people MUST offer a confidentiality policy. All information exchanged in this type of session SHOULD be secure, that is, erased before confidentiality is breached, unless otherwise required.
This means that transcoding services (e.g., interpretation, translation) MUST publish their confidentiality and security policies.
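At the signalling level, one way (among others) to support this expectation is to request secure transport with a SIPS URI so that each hop uses TLS; protection of the media streams themselves, and the retention practices of any human or automated transcoder, are separate matters that the published policy would need to cover. A hypothetical request line and transport indication:

   INVITE sips:relay@transcoder.example.net SIP/2.0
   Via: SIP/2.0/TLS alicepc.example.com;branch=z9hG4bK74bf9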
6. Some Real World Scenarios
These scenarios are intended to show some of the various types of media streams that would be initiated, managed, directed, and terminated in a SIP-enabled network, and to show how some resources might be managed between SIP-enabled networks, transcoding services and service providers.
To illustrate the communications dynamic of these kinds of scenarios, each one specifically mentions the kind of media streams transmitted, and whether User Agents and Transcoding Services are involved.
6.1 Transcoding Service
In this scenario, a hearing person calls the household of a deaf person and a hearing person.
- A voice conversation is initiated between the hearing participants:
( Person A) <-----Voice ---> ( Person B)
- During the conversation, the hearing person asks to talk with the deaf person, while keeping the voice connection open so that voice to voice communications can continue if required.
- A Relay Service is invited into the conversation.
- The Relay Service transcodes the hearing person's words into text.
- Text from the hearing person's voice appears on the display of the deaf person's User Agent.
- The deaf person types a response.
- The Relay Service receives the text and reads it to the hearing person:
   (        ) <-----------------Voice----------------->  (        )
   (Person A) -----Voice--->  ( Voice To Text  ) -Text->  (Person B)
   (        ) <----Voice----  (Service Provider) <-Text-  (        )
- The hearing person asks to talk with the hearing person in the deaf person's household.
- The Relay Service withdraws from the call.
6.2 Media Service Provider
In this scenario, a deaf person wishes to receive the content of a radio program through a text stream transcoded from the program's audio stream.
1. The deaf person attempts to establish a connection to the radio broadcast, with User Agent preferences set to receiving the audio stream as text.

2. The User Agent of the deaf person queries the radio station User Agent on whether a text stream is available in addition to the audio stream (a possible message exchange is sketched at the end of this scenario).

3. However, the radio station has no text stream available for a deaf listener, and responds in the negative.

4. As no text stream is available, the deaf person's User Agent requests a voice-to-text transcoding service (e.g., a real-time captioning service) to come into the conversation space.

5. The transcoding service User Agent identifies the audio stream as a radio broadcast. However, the policy of the transcoding service is that it does not accept radio broadcasts, because doing so would overload its resources far too quickly.

6. In this case, the connection fails.
Alternatively, continuing from 2 above:
3. The radio station does provide text with their audio streams.

4. The deaf person receives a text stream of the radio program.
Note: To support deaf, hard of hearing and speech-impaired people, service providers are encouraged to provide text with audio streams.
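A possible message exchange for steps 2 and 3, purely as a sketch (non-essential header fields omitted; all names and addresses are hypothetical): the deaf listener's User Agent offers only a text stream, and the radio station, having no text available, rejects the offer.

   INVITE sip:broadcast@radio.example.com SIP/2.0
   Content-Type: application/sdp

   v=0
   o=dan 2890844942 2890844942 IN IP4 danpc.example.org
   s=-
   c=IN IP4 192.0.2.77
   t=0 0
   m=text 51000 RTP/AVP 98
   a=rtpmap:98 t140/1000

   SIP/2.0 488 Not Acceptable Here
   Warning: 304 radio.example.com "Media type not available"

On receiving the 488 response, the User Agent proceeds as in steps 4 and 5 and invites a voice-to-text transcoding service into the conversation space instead.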
6.3 Sign Language Interface
In this scenario, a deaf person enables a signing avatar (e.g., ViSiCAST[6]) by setting up a User Agent to receive audio streams as XML data that will operate an avatar for sign-language. For outgoing communications, the deaf person types text that is transcoded into an audio stream for the other conversation participant.
For example:
   (        )-Voice->(Voice To Avatar Commands) ----XMLData-->(        )
   ( hearing)        (                        )               (  deaf  )
   (Person A)<-Voice-(      Text To Voice     ) <----Text---- (Person B)
   (        )        (    Service Provider    )               (        )
6.4 Synthetic Lip-speaking Support for Voice Calls
In order to receive voice calls, a hard of hearing person uses lip-speaking avatar software (e.g., Synface[7]) on a PC. The lip-speaking software processes voice (audio) stream data and displays a synthetic animated face that a hard of hearing person may be able to lip-read. During a conversation, the hard of hearing person uses the lip-speaking software as support for understanding the audio stream.
For example:
   (        ) <------------------Voice---------------->  (          )
   ( hearing)                    (   PC with    )        ( hard of  )
   (Person A) -------Voice-----> ( lip-speaking )------> ( hearing  )
   (        )                    (  software    )        ( Person B )
6.5 Voice-Activated Menu Systems
In this scenario, a deaf person wishing to book cinema tickets with a credit card uses a textphone to place the call. The cinema employs a voice-activated menu system for film titles and showing times.
- The deaf person places a call to the cinema with a textphone:
(Textphone) <-----Text ---> (Voice-activated System)
- The cinema's voice-activated menu requests an auditory response to continue.
- A Relay Service is invited into the conversation.
- The Relay Service transcodes the prompts of the voice-activated menu into text.
- Text from the voice-activated menu appears on the display of the deaf person's textphone.
- The deaf person types a response.
- The Relay Service receives the text and reads it to the voice-activated system:
   (          )          (Relay Service )          (               )
   (   deaf   ) -Text->  (Provider      ) -Voice-> (Voice-Activated)
   ( Person A ) <-Text-  (Text To Voice ) <-Voice- (System         )
- The transaction is finalized with a confirmed booking time.
- The Relay Service withdraws from the call.
6.6 Conference Call
A conference call is scheduled between five people:
- Person A listens and types text (hearing, no speech)
- Person B recognizes sign language and signs back (deaf, no speech)
- Person C reads text and speaks (deaf or hearing impaired)
- Person D listens and speaks
- Person E recognizes sign language and reads text and signs

A conference call server calls the five people and, based on their preferences, sets up the different transcoding services required. Assuming English is the base language for the call, the following intermediate transcoding services are invoked:
- A transcoding service (English speech to English text)
- An English text to sign language service
- A sign language to English text service
- An English text to English speech service
Note: In order to translate from English speech to sign language, a chain of intermediate transcoding services was used (transcoding and English text to sign language) because there was no speech-to-sign language service available for direct translation. Accordingly, the same applied for the translation from sign language to English speech.

   (Person A) ----- Text ----> ( Text-to-SL  ) --- Video ---> (Person B)
              --------------------- Text -------------------> (Person C)
              ----- Text ----> (Text-to-Speech) --- Voice ---> (Person D)
              --------------------- Text -------------------> (Person E)
              ----- Text ----> ( Text-to-SL  ) --- Video ---> (Person E)

   (Person B) -Video-> (SL-to-Text) -Text-> (Text-to-Speech) -> (Person A)
              ---- Video ----> ( SL-to-Text ) ---- Text ----> (Person C)
              -Video-> (SL-to-Text) -Text-> (Text-to-Speech) -> (Person D)
              --------------------- Video ------------------> (Person E)
              ---- Video ----> ( SL-to-Text ) ---- Text ----> (Person E)

   (Person C) --------------------- Voice ------------------> (Person A)
              -Voice-> (Speech-to-Text) -Text-> (Text-to-SL) -Video-> (Person B)
              --------------------- Voice ------------------> (Person D)
              ---- Voice ----> (Speech-to-Text) ---- Text ---> (Person E)
              -Voice-> (Speech-to-Text) -Text-> (Text-to-SL) -Video-> (Person E)

   (Person D) --------------------- Voice ------------------> (Person A)
              -Voice-> (Speech-to-Text) -Text-> (Text-to-SL) -Video-> (Person B)
              ---- Voice ----> (Speech-to-Text) ---- Text ---> (Person C)
              ---- Voice ----> (Speech-to-Text) ---- Text ---> (Person E)
              -Voice-> (Speech-to-Text) -Text-> (Text-to-SL) -Video-> (Person E)

   (Person E) -Video-> (SL-to-Text) -Text-> (Text-to-Speech) -> (Person A)
              --------------------- Video ------------------> (Person B)
              ---- Video ----> ( SL-to-Text ) ---- Text ----> (Person C)
              -Video-> (SL-to-Text) -Text-> (Text-to-Speech) -> (Person D)
Remarks:

- Some services might be shared by users and/or other services.

- Person E uses two parallel streams (SL and English Text). The User Agent might perform time synchronisation when displaying the streams. However, this would require synchronisation information to be present on the streams.

- The session protocols might support optional buffering of media streams, so that users and/or intermediate services could go back to previous content or invoke a transcoding service for content they just missed.

- Hearing impaired users might still receive audio as well, which they can use to drive visual indicators so that they can better see where, for instance, the pauses are in the conversation.
7. Some Suggestions for Service Providers and User Agent Manufacturers
This section is included to encourage service providers and user agent manufacturers in developing products and services that can be used by as wide a range of individuals as possible, including deaf, hard of hearing and speech-impaired people.
- Service providers and User Agent manufacturers can offer deaf, hard of hearing and speech-impaired people the option of preventing their specific abilities and preferences from being made public in any transaction.
- If a User Agent performs auditory signalling, for example a pager, it could also provide another signalling method: visual (e.g., a flashing light) or tactile (e.g., vibration).
- Service providers who allow the user to store specific abilities and preferences or settings (i.e., a user profile) might consider storing these settings in a central repository, accessible no matter what the location of the user and regardless of the User Agent used at that time or location.
- If there are several transcoding services available, the User Agent can be set to select the most economical/highest quality service.
- The service provider can show the cost per minute and any minimum charge of a transcoding service call before a session starts, allowing the user a choice of engaging in the service or not.
- Service providers are encouraged to offer an alternative stream to an audio stream, for example, text or data streams that operate avatars, etc.
- Service providers are encouraged to provide a text alternative to voice-activated menus, e.g., answering and voice mail systems.
- Manufacturers of voice-activated software are encouraged to provide an alternative visual format for software prompts, menus, messages, and status information.
- Manufacturers of mobile phones are encouraged to design equipment that avoids electro-magnetic interference with hearing aids.
- All services for interpreting, transliterating, or facilitating communications for deaf, hard of hearing and speech-impaired people are required to:
- Keep information exchanged during the transaction strictly confidential
- Enable information exchange literally and simply, without deviating and compromising the content
- Facilitate communication without bias, prejudice or opinion
- Match skill-sets to the requirements of the users of the service
- Behave in a professional and appropriate manner
- Be fair in pricing of services
- Strive to improve the skill-sets used for their services.
- Conference call services might consider ways to allow users who employ transcoding services (which usually introduce a delay) to have enough real-time information to identify gaps in the conversation and inject comments, as well as ways to raise their hand, vote, and carry out other activities where the timing of their response relative to the real-time conversation is important.
8. Acknowledgements
The authors would like to thank the following individuals for their contributions to this document:
David R. Oran, Cisco
Mark Watson, Nortel Networks
Brian Grover, RNID
Anthony Rabin, RNID
Michael Hammer, Cisco
Henry Sinnreich, Worldcom
Rohan Mahy, Cisco
Julian Branston, Cedalion Hosting Services
Judy Harkins, Gallaudet University, Washington, D.C.
Cary Barbin, Gallaudet University, Washington, D.C.
Gregg Vanderheiden, Trace R&D Center, University of Wisconsin-Madison
Gottfried Zimmerman, Trace R&D Center, University of Wisconsin-Madison
Security Considerations
This document presents some privacy and security considerations. They are treated in Section 5.6 Confidentiality and Security.
Normative References
[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[2] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.
Informational References
[3] International Telecommunication Union (ITU), "Operational and interworking requirements for DCEs operating in the text telephone mode", ITU-T Recommendation V.18, November 2000.

[4] Moore, Matthew, et al., "For Hearing People Only: Answers to Some of the Most Commonly Asked Questions About the Deaf Community, Its Culture, and the Deaf Reality", MSM Productions Ltd., 2nd Edition, September 1993.
[5] http://www.typetalk.org.
[6] http://www.visicast.co.uk.
[7] http://www.speech.kth.se/teleface.
Authors' Addresses
Nathan Charlton
Millpark Limited
52 Coborn Road
London E3 2DG
Tel: +44-7050 803628
Fax: +44-7050 803628
EMail: nathan@millpark.com

Mick Gasson
Koru Solutions
30 Howland Way
London SE16 6HN
Tel: +44-20 7237 3488
Fax: +44-20 7237 3488
EMail: michael.gasson@korusolutions.com

Guido Gybels
RNID
19-23 Featherstone Street
London EC1Y 8SL
Tel: +44-20 7296 8000
Textphone: +44-20 7296 8001
Fax: +44-20 7296 8199
EMail: Guido.Gybels@rnid.org.uk

Mike Spanner
RNID
19-23 Featherstone Street
London EC1Y 8SL
Tel: +44-20 7296 8000
Textphone: +44-20 7296 8001
Fax: +44-20 7296 8199
EMail: mike.spanner@rnid.org.uk

Arnoud van Wijk
Ericsson EuroLab Netherlands BV
P.O. Box 8
5120 AA Rijen
The Netherlands
Fax: +31-161-247569
EMail: Arnoud.van.Wijk@eln.ericsson.se
Comments can be sent to the SIPPING mailing list.
Full Copyright Statement
Copyright © The Internet Society (2002). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.
The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgement
Funding for the RFC Editor function is currently provided by the Internet Society.