Microsoft
K. Evans
J. Klein
Tandem Computers
July 1998
Transaction Internet Protocol
Version 3.0
Status of this Memo
-
This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.
Copyright Notice
-
Copyright © The Internet Society (1998). All Rights Reserved.
Abstract
-
In many applications where different nodes cooperate on some work, there is a need to guarantee that the work happens atomically. That is, each node must reach the same conclusion as to whether the work is to be completed, even in the face of failures. This document proposes a simple, easily-implemented protocol for achieving this end.
Table of Contents
-
1. Introduction 2 2. Example Usage 3 3. Transactions 4 4. Connections 4 5. Transaction Identifiers 5 6. Pushing vs. Pulling Transactions 5 7. TIP Transaction Manager Identification & Connection Establishment 6 8. TIP Uniform Resource Locators 8 9. States of a Connection 10 10. Protocol Versioning 12 11. Commands and Responses 12 12. Command Pipelining 13 13. TIP Commands 13 14. Error Handling 20 15. Connection Failure and Recovery 20 16. Security Considerations 22 17. References 25 18. Authors' Addresses 26 19. Comments 26 Appendix A. The TIP Multiplexing Protocol Version 2.0. 27 Fully Copyright Statement 31
1. Introduction
-
The standard method for achieving atomic commitment is the two-phase commit protocol; see [1] for an introduction to atomic commitment and two-phase commit protocols.
Numerous two-phase commit protocols have been implemented over the years. However, none of them has become widely used in the Internet, due mainly to their complexity. Most of that complexity comes from the fact that the two-phase commit protocol is bundled together with a specific program-to-program communication protocol, and that protocol lives on top of a very large infrastructure.
This memo proposes a very simple two-phase commit protocol. It achieves its simplicity by specifying only how different nodes agree on the outcome of a transaction; it allows (even requires) that the subject matter on which the nodes are agreeing be communicated via other protocols. By doing so, we avoid all of the issues related to application communication semantics and data representation (to name just a few). Independent of the application communication protocol a transaction manager may use the Transport Layer Security protocol [3] to authenticate other transaction managers and encrypt messages.
It is envisioned that this protocol will be used mainly for a transaction manager on one Internet node to communicate with a transaction manager on another node. While it is possible to use this protocol for application programs and/or resource managers to speak to transaction managers, this communication is usually intra-node, and most transaction managers already have more-than-adequate interfaces for the task.
While we do not expect this protocol to replace existing ones, we do expect that it will be relatively easy for many existing heterogeneous transaction managers to implement this protocol for communication with each other.
Further supplemental information regarding the TIP protocol can be found in [5].
2. Example Usage
-
Today the electronic shopping basket is a common metaphor at many electronic store-fronts. Customers browse through an electronic catalog, select goods and place them into an electronic shopping basket. HTTP servers [2] provide various means ranging from URL encoding to context cookies to keep track of client context (e.g. the shopping basket of a customer) and resume it on subsequent customer requests.
Once a customer has finished shopping they may decide to commit their selection and place the associated orders. Most orders may have no relationship with each other except being executed as part of the same shopping transaction; others may be dependent on each other (for example, if made as part of a special offering). Irrespective of these details a customer will expect that all orders have been successfully placed upon receipt of a positive acknowledgment. Today's electronic store-fronts must implement their own special protocols to coordinate such placement of all orders. This programming is especially complex when orders are placed through multiple electronic store-fronts. This complexity limits the potential utility of internet applications, and constrains growth. The protocol described in this document intends to provide a standard for Internet servers to achieve agreement on a unit of shared work (e.g. placement of orders in an electronic shopping basket). The server (e.g. a CGI program) placing the orders may want to start a transaction calling its local transaction manager, and ask other servers participating in the work to join the transaction. The server placing the orders passes a reference to the transaction as user data on HTTP requests to the other servers. The other servers call their transaction managers to start a local transaction and ask them to join the remote transaction using the protocol defined in this document. Once all orders have been placed, execution of the two-phase-commit protocol is delegated to the involved transaction managers. If the transaction commits, all orders have been successfully placed and the customer gets a positive acknowledgment. If the transaction aborts no orders will be placed and the customer will be informed of the problem.
Transaction support greatly simplifies programming of these applications as exception handling and failure recovery are delegated to a special component. End users are also not left having to deal with the consequences of only partial success. While this example shows how the protocol can be used by HTTP servers, applications may use the protocol when accessing a remote database (e.g. via ODBC), or invoking remote services using other already existing protocols (e.g.
RPC). The protocol makes it easy for applications in a heterogeneous network to participate in the same transaction, even if using different communication protocols.
3. Transactions
-
"Transaction" is the term given to the programming model whereby computational work performed has atomic semantics. That is, either all work completes successfully and changes are made permanent (the transaction commits), or if any work is unsuccessful, changes are undone (the transaction aborts). The work comprising a transaction (unit of work), is defined by the application.
4. Connections
-
The Transaction Internet Protocol (TIP) requires a reliable ordered stream transport with low connection setup costs. In an Internet (IP) environment, TIP operates over TCP, optionally using TLS to provide a secured and authenticated connection, and optionally using a protocol to multiplex light-weight connections over the same TCP or TLS connection.
Transaction managers that share transactions establish a TCP (and optionally a TLS) connection. The protocol uses a different connection for each simultaneous transaction shared betwween two transaction managers. After a transaction has ended, the connection can be reused for a different transaction.
Optionally, instead of associating a TCP or TLS connection with only a single transaction, two transaction managers may agree on a protocol to multiplex light-weight connections over the same TCP or TLS connection, and associate each simultaneous transaction with a separate light-weight connection. Using light-weight connections reduces latency and resource consumption associated with executing simultaneous transactions. Similar techniques as described here are widely used by existing transaction processing systems. See Appendix A for an example of one such protocol.
Note that although the TIP protocol itself is described only in terms of TCP and TLS, there is nothing to preclude the use of TIP with other transport protocols. However, it is up to the implementor to ensure the chosen transport provides equivalent semantics to TCP, and to map the TIP protocol appropriately.
In this document the terms "connection" or "TCP connection" can refer to a TIP TCP connection, a TIP TLS connection, or a TIP multiplexing connection (over either TCP or TLS). It makes no difference which, the behavior is the same in each case. Where there are differences in behavior between the connection types, these are stated explicitly.
5. Transaction Identifiers
-
Unfortunately, there is no single globally-accepted standard for the format of a transaction identifier; there are various standard and proprietary formats. Allowed formats for a TIP transaction identifier are described below in the section "TIP Uniform Resource Locators". A transaction manager may map its internal transaction identifiers into this TIP format in any manner it sees fit. Furthermore, each party in a superior/subordinate relationship gets to assign its own identifier to the transaction; these identifiers are exchanged when the relationship is first established. Thus, a transaction manager gets to use its own format of transaction identifier internally, but it must remember a foreign transaction identifier for each superior/subordinate relationship in which it is involved.
6. Pushing vs. Pulling Transactions
-
Suppose that some program on node "A" has created a transaction, and wants some program on node "B" to do some work as part of the transaction. There are two classical ways that he does this, referred to as the "push" model and the "pull" model.
In the "push" model, the program on A first asks his transaction manager to export the transaction to node B. A's transaction manager sends a message to B's TM asking it to instantiate the transaction as a subordinate of A, and return its name for the transaction. The program on A then sends a message to its counterpart on B on the order of "Do some work, and make it part of the transaction that your transaction manager already knows of by the name ...". Because A's TM knows that it sent the transaction to B's TM, A's TM knows to involve B's TM in the two-phase commit process.
In the "pull" model, the program on A merely sends a message to B on the order of "Do some work, and make it part of the transaction that my TM knows by the name ...". The program on B asks its TM to enlist in the transaction. At that time, B's TM will "pull" the transaction over from A. As a result of this pull, A's TM knows to involve B's TM in the two-phase commit process.
The protocol described here supports both the "push" and "pull" models.
7. TIP Transaction Manager Identification and Connection Establishment
-
In order for TIP transaction managers to connect they must be able to identify and locate each other. The information necessary to do this is described by the TIP transaction manager address.
[This specification does not prescribe how TIP transaction managers initially obtain the transaction manager address (which will probably be via some implementation-specific configuration mechanism).]
TIP transaction manager addresses take the form:
<hostport><path>
The <hostport> component comprises:
<host>[:<port>]
where <host> is either a <dns name> or an <ip address>; and <port> is a decimal number specifying the port at which the transaction manager (or proxy) is listening for requests to establish TIP connections. If the port number is omitted, the standard TIP port number (3372) is used.
A <dns name> is a standard name, acceptable to the domain name service. It must be sufficiently qualified to be useful to the receiver of the command.
An <ip address> is an IP address, in the usual form: four decimal numbers separated by period characters.
The <hostport> component defines the scope (locale) of the <path> component.
The <path> component of the transaction manager address contains data identifying the specific TIP transaction manager, at the location defined by <hostport>.
The <path> component takes the form:
"/" [path_segments]
-
path_segments = segment *( "/" segment )
segment = *pchar *( ";" param )
param = *pcharpchar = unreserved | escaped | ":" | "@" | "&" | "=" | "+" unreserved = ASCII character octets with values in the range
(inclusive): 48-57, 65-90, 97-122 | "$" | "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")" | "," escaped = "%" hex hex hex = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" | "A" | "B" | "C" | "D" | "E" | "F" | "a" | "b" | "c" | "d" | "e" | "f"
The <path> component may consist of a sequence of path segments separated by a single slash "/" character. Within a path segment, the characters "/", ";", "=", and "?" are reserved. Each path segment may include a sequence of parameters, indicated by the semicolon ";" character. The parameters are not significant to the parsing of relative references.
[It is intended that the form of the transaction manager address follow the proposed scheme for Uniform Resource Identifiers (URI) [8].]
The TIP transaction manager address therefore provides to the connection initiator (the primary) the endpoint identifier to be used for the TCP connection (<hostport>), and to the connection receiver (the secondary) the path to be used to locate the specific TIP transaction manager (<path>). This is all the information required for the connection between the primary and secondary TIP transaction managers to be established.
After a connection has been established, the primary party issues an IDENTIFY command. This command includes as parameters two transaction manager addresses: the primary transaction manager address, and the secondary transaction manager address.
The primary transaction manager address identifies the TIP transaction manager that initiated the connection. This information is required in certain cases after connection failures, when one of the parties of the connection must re-establish a new connection to the other party in order to complete the two-phase-commit protocol. If the primary party needs to re-establish the connection, the job is easy: a connection is established in the same way as was the original connection. However, if the secondary party needs to re-establish the connection, it must be known how to contact the initiator of the original connection. This information is supplied to the secondary via the primary transaction manager address on the IDENTIFY command. If a primary transaction manager address is not supplied, the primary party must not perform any action which would require a connection to be re-established (e.g. to perform recovery actions).
The secondary transaction manager address identifies the receiving TIP transaction manager. In the case of TIP communication via intermediate proxy servers, this URL may be used by the proxy servers to correctly identify and connect to the required TIP transaction manager.
-
8. TIP Uniform Resource Locators
-
Transactions and transaction managers are resources associated with the TIP protocol. Transaction managers and transactions are located using the transaction manager address scheme. Once a connection has been established, TIP commands may be sent to operate on transactions associated with the respective transaction managers.
Applications which want to pull a transaction from a remote node must supply a reference to the remote transaction which allows the local transaction manager (i.e. the transaction manager pulling the transaction) to connect to the remote transaction manager and identify the particular transaction. Applications which want to push a transaction to a remote node must supply a reference to the remote transaction manager (i.e. the transaction manager to which the transaction is to be pushed), which allows the local transaction manager to locate the remote transaction manager. The TIP protocol defines a URL scheme [4] which allows applications and transaction managers to exchange references to transaction managers and transactions.
A TIP URL takes the form:
-
tip://<transaction manager address>?<transaction string>
where <transaction manager address> identifies the TIP transaction manager (as defined in Section 7 above); and <transaction string> specifies a transaction identifier, which may take one of two forms (standard or non-standard):
i. "urn:" <NID> ":" <NSS>
-
A standard transaction identifier, conforming to the proposed Internet Standard for Uniform Resource Names (URNs), as specified by RFC2141; where <NID> is the Namespace Identifier, and <NSS> is the Namespace Specific String. The Namespace ID determines the syntactic interpretation of the Namespace Specific String. The Namespace Specific String is a sequence of characters representing a transaction identifier (as defined by <NID>). The rules for the contents of these fields are specified by [6] (valid characters, encoding, etc.).
This format of <transaction string> may be used to express global transaction identifiers in terms of standard representations. Examples for <NID> might be <iso> or <xopen>. e.g.
tip://123.123.123.123/?urn:xopen:xid
-
Note that Namespace Ids require registration. See [7] for details on how to do this.
ii. <transaction identifier>
-
A sequence of printable ASCII characters (octets with values in the range 32 through 126 inclusive (excluding ":") representing a transaction identifier. In this non-standard case, it is the combination of <transaction manager address> and <transaction identifier> which ensures global uniqueness. e.g.
tip://123.123.123.123/?transid1
-
To create a non-standard TIP URL from a transaction identifier, first replace any reserved characters in the transaction identifier with their equivalent escape sequences, then insert the appropriate transaction manager address. If the transaction identifier is one that you created, insert your own transaction manager address. If the transaction identifier is one that you received on a TIP connection that you initiated, use the secondary transaction manager address that was sent in the IDENTIFY command. If the transaction identifier is one that you received on a TIP connection that you did not initiate, use the primary transaction manager address that was received in the IDENTIFY command.
TIP URLs must be guaranteed globally unique for all time. This uniqueness constraint ensures TIP URLs are never duplicated, thereby preventing possible non-deterministic behaviour. How uniqueness is achieved is implementation specific. For example, the Universally Unique Identifier (UUID, also known as a Globally Unique Identifier, or GUID (see [9])) could be used as part of the <transaction string>. Note also that some standard transaction identifiers may define their own rules for ensuring global uniqueness (e.g. OSI CCR atomic action identifiers).
Except as otherwise described above, the TIP URL scheme follows the rules for reserved characters as defined in [4], and uses escape sequences as defined in [4] Section 5.
Note that the TIP protocol itself does not use the TIP URL scheme (it does use the transaction manager address scheme). The TIP URL scheme is proposed as a standard way to pass transaction identification information through other protocols. e.g. between cooperating application processes. The TIP URL may then be used to communicate to the local transaction manager the information necessary to associate the application with a particular TIP transaction. e.g. to PULL the transaction from a remote transaction manager. It is anticipated that each TIP implementation will provide some set of APIs for this purpose ([5] includes examples of such APIs).
-
9. States of a Connection
-
At any instant, only one party on a connection is allowed to send commands, while the other party is only allowed to respond to commands that he receives. Throughout this document, the party that is allowed to send commands is called "primary"; the other party is called "secondary". Initially, the party that initiated the connection is primary; however, a few commands cause the roles to switch. A connection returns to its original polarity whenever the Idle state is reached.
When multiplexing is being used, these rules apply independently to each "virtual" connection, regardless of the polarity of the underlying connection (which will be in the Multiplexing state).
Note that commands may be sent "out of band" by the secondary via the use of pipelining. This does not affect the polarity of the connection (i.e. the roles of primary and secondary do not switch). See section 12 for details.
In the normal case, TIP connections should only be closed by the primary, when in Initial state. It is generally undesirable for a connection to be closed by the secondary, although this may be necessary in certain error cases.
At any instant, a connection is in one of the following states. From the point of view of the secondary party, the state changes when he sends a reply; from the point of view of the primary party, the state changes when he receives a reply.
Initial: The initial connection starts out in the Initial state.
-
Upon entry into this state, the party that initiated the connection becomes primary, and the other party becomes secondary. There is no transaction associated with the connection in this state. From this state, the primary can send an IDENTIFY or a TLS command.
Idle: In this state, the primary and the secondary have agreed on a
-
protocol version, and the primary supplied an identifier to the secondary party to reconnect after a failure. There is no transaction associated with the connection in this state. Upon entry to this state, the party that initiated the connection becomes primary, and the other party becomes secondary. From this state, the primary can send any of the following commands: BEGIN, MULTIPLEX, PUSH, PULL, QUERY and RECONNECT.
Begun: In this state, a connection is associated with an active
-
transaction, which can only be completed by a one-phase protocol. A BEGUN response to a BEGIN command places a connection into this state. Failure of a connection in Begun state implies that the transaction will be aborted. From this state, the primary can send an ABORT, or COMMIT command.
Enlisted: In this state, the connection is associated with an active
-
transaction, which can be completed by a one-phase or, two-phase protocol. A PUSHED response to a PUSH command, or a PULLED response to a PULL command, places the connection into this state. Failure of the connection in Enlisted state implies that the transaction will be aborted. From this state, the primary can send an ABORT, COMMIT, or PREPARE command.
Prepared: In this state, a connection is associated with a
-
transaction that has been prepared. A PREPARED response to a PREPARE command, or a RECONNECTED response to a RECONNECT command places a connection into this state. Unlike other states, failure of a connection in this state does not cause the transaction to automatically abort. From this state, the primary can send an ABORT, or COMMIT command.
Multiplexing: In this state, the connection is being used by a
-
multiplexing protocol, which provides its own set of connections. In this state, no TIP commands are possible on the connection. (Of course, TIP commands are possible on the connections supplied by the multiplexing protocol.) The connection can never leave this state.
Tls: In this state, the connection is being used by the TLS
-
protocol, which provides its own secured connection. In this state, no TIP commands are possible on the connection. (Of course, TIP commands are possible on the connection supplied by the TLS protocol.) The connection can never leave this state.
Error: In this state, a protocol error has occurred, and the
-
connection is no longer useful. The connection can never leave this state.
10. Protocol Versioning
-
This document describes version 3 of the protocol. In order to accommodate future versions, the primary party sends a message indicating the lowest and the highest version number it understands. The secondary responds with the highest version number it understands.
After such an exchange, communication can occur using the smaller of the highest version numbers (i.e., the highest version number that both understand). This exchange is mandatory and occurs using the IDENTIFY command (and IDENTIFIED response).
If the highest version supported by one party is considered obsolete and no longer supported by the other party, no useful communication can occur. In this case, the newer party should merely drop the connection.
11. Commands and Responses
-
All commands and responses consist of one line of ASCII text, using only octets with values in the range 32 through 126 inclusive, followed by either a CR (an octet with value 13) or an LR (an octet with value 10). Each line can be split up into one or more "words", where successive words are separated by one or more space octets (value 32).
Arbitrary numbers of spaces at the beginning and/or end of each line are allowed, and ignored.
Lines that are empty, or consist entirely of spaces are ignored. (One implication of this is that you can terminate lines with both a CR and an LF if desired; the LF will be treated as terminating an empty line, and ignored.)
In all cases, the first word of each line indicates the type of command or response; all defined commands and responses consist of upper-case letters only.
For some commands and responses, subsequent words convey parameters for the command or response; each command and response takes a fixed number of parameters.
All words on a command or response line after (and including) the first undefined word are totally ignored. These can be used to pass human-readable information for debugging or other purposes.
12. Command Pipelining
-
In order to reduce communication latency and improve efficiency, it is possible for multiple TIP "lines" (commands or responses) to be pipelined (concatenated) together and sent as a single message. Lines may also be sent "ahead" (by the secondary, for later procesing by the primary). Examples are an ABORT command immediately followed by a BEGIN command, or a COMMITTED response immediately followed by a PULL command.
The sending of pipelined lines is an implementation option. Likewise which lines are pipelined together. Generally, it must be certain that the pipelined line will be valid for the state of the connection at the time it is processed by the receiver. It is the responsibility of the sender to determine this.
All implementations must support the receipt of pipelined lines - the rules for processing of which are described by the following paragraph:
-
When the connection state is such that a line should be read (either command or response), then that line (when received) is processed. No more lines are read from the connection until processing again reaches such a state. If a line is received on a connection when it is not the turn of the other party to send, that line is _not_ rejected. Instead, the line is held and processed when the connection state again requires it. The receiving party must process lines and issue responses in the order of lines received. If a line causes an error the connection enters the Error state, and all subsequent lines on the connection are discarded.
-
13. TIP Commands
-
Commands pertain either to connections or transactions. Commands which pertain to connections are: IDENTIFY, MULTIPLEX and TLS. Commands which pertain to transactions are: ABORT, BEGIN, COMMIT, PREPARE, PULL, PUSH, QUERY, and RECONNECT.
Following is a list of all valid commands, and all possible responses to each:
ABORT
-
This command is valid in the Begun, Enlisted, and Prepared states. It informs the secondary that the current transaction of the connection will abort. Possible responses are:
ABORTED
The transaction has aborted; the connection enters Idle state.
ERROR
The command was issued in the wrong state, or was malformed. The connection enters the Error state.
BEGIN
-
This command is valid only in the Idle state. It asks the secondary to create a new transaction and associate it with the connection. The newly created transaction will be completed with a one-phase protocol. Possible responses are:
BEGUN <transaction identifier>
A new transaction has been successfully begun, and that transaction is now the current transaction of the connection. The connection enters Begun state.
NOTBEGUN
A new transaction could not be begun; the connection remains in Idle state.
ERROR
The command was issued in the wrong state, or was malformed. The connection enters the Error state.
COMMIT
-
This command is valid in the Begun, Enlisted or Prepared states. In the Begun or Enlisted state, it asks the secondary to attempt to commit the transaction; in the Prepared state, it informs the secondary that the transaction has committed. Note that in the Enlisted state this command represents a one-phase protocol, and should only be done when the sender has 1) no local recoverable resources involved in the transaction, and 2) only one subordinate (the sender will not be involved in any transaction recovery process). Possible responses are:
ABORTED
This response is possible only from the Begun and Enlisted states. It indicates that some party has vetoed the commitment of the transaction, so it has been aborted instead of committing. The connection enters the Idle state.
COMMITTED
This response indicates that the transaction has been committed, and that the primary no longer has any responsibilities to the secondary with respect to the transaction. The connection enters the Idle state.
ERROR
The command was issued in the wrong state, or was malformed. The connection enters the Error state.
ERROR
-
This command is valid in any state; it informs the secondary that a previous response was not recognized or was badly formed. A secondary should not respond to this command. The connection enters Error state.
IDENTIFY <lowest protocol version> <highest protocol version> <primary transaction manager address> | "-" <secondary transaction manager address>
-
This command is valid only in the Initial state. The primary party informs the secondary party of: 1) the lowest and highest protocol version supported (all versions between the lowest and highest must be supported; 2) optionally, an identifier for the primary party at which the secondary party can re-establish a connection if ever needed (the primary transaction manager address); and 3) an identifier which may be used by intermediate proxy servers to connect to the required TIP transaction manager (the secondary transaction manager address). If a primary transaction manager address is not supplied, the secondary party will respond with ABORTED or READONLY to any PREPARE commands. Possible responses are:
IDENTIFIED <protocol version>
The secondary party has been successfully contacted and has saved the primary transaction manager address. The response contains the highest protocol version supported by the secondary party. All future communication is assumed to take place using the smaller of the protocol versions in the IDENTIFY command and the IDENTIFIED response. The connection enters the Idle state.
NEEDTLS
The secondary party is only willing to communicate over TLS secured connections. The connection enters the Tls state, and all subsequent communication is as defined by the TLS protocol. This protocol will begin with the first octet after the line terminator of the IDENTIFY command (for data sent by the primary party), and the first byte after the line terminator of the NEEDTLS response (for data sent by the secondary party). This implies that an implementation must not send both a CR and a LF octet after either the IDENTIFY command or the NEEDTLS response, lest the LF octet be mistaken for the first byte of the TLS protocol. The connection provided by the TLS protocol starts out in the Initial state. After TLS has been negotiated, the primary party must resend the IDENTIFY command. If the primary party cannot support (or refuses to use) the TLS protocol, it closes the connection.
ERROR
The command was issued in the wrong state, or was malformed. This response also occurs if the secondary party does not support any version of the protocol in the range supported by the primary party. The connection enters the Error state. The primary party should close the connection.
MULTIPLEX <protocol-identifier>
-
This command is only valid in the Idle state. The command seeks agreement to use the connection for a multiplexing protocol that will supply a large number of connections on the existing connection. The primary suggests a particular multiplexing protocol. The secondary party can either accept or reject use of this protocol.
At the present, the only defined protocol identifier is "TMP2.0", which refers to the TIP Multiplexing Protocol, version 2.0. See Appendix A for details of this protocol. Other protocol identifiers may be defined in the future.
If the MULTIPLEX command is accepted, the specified multiplexing protocol will totally control the underlying connection. This protocol will begin with the first octet after the line terminator of the MULTIPLEX command (for data sent by the initiator), and the first byte after the line terminator of the MULTIPLEXING response (for data received by the initiator). This implies that an implementation must not send both a CR and a LF octet after either the MULTIPLEX command or the MULTIPLEXING response, lest the LF octet be mistaken for the first byte of the multiplexing protocol.
Note that when using TMP V2.0, a single TIP command (TMP application message) must be wholly contained within a single TMP packet (the TMP PUSH flag is not used by TIP). Possible responses to the MULTIPLEX command are:
MULTIPLEXING
The secondary party agrees to use the specified multiplexing protocol. The connection enters the Multiplexing state, and all subsequent communication is as defined by that protocol. All connections created by the multiplexing protocol start out in the Idle state.
CANTMULTIPLEX
The secondary party cannot support (or refuses to use) the specified multiplexing protocol. The connection remains in the Idle state.
ERROR
The command was issued in the wrong state, or was malformed. The connection enters the Error state.
PREPARE
-
This command is valid only in the Enlisted state; it requests the secondary to prepare the transaction for commitment (phase one of two-phase commit). Possible responses are:
PREPARED
The subordinate has prepared the transaction; the connection enters PREPARED state.
ABORTED
The subordinate has vetoed committing the transaction. The connection enters the Idle state. After this response, the superior has no responsibilities to the subordinate with respect to the transaction.
READONLY
The subordinate no longer cares whether the transaction commits or aborts. The connection enters the Idle state. After this response, the superior has no responsibilities to the subordinate with respect to the transaction.
ERROR
The command was issued in the wrong state, or was malformed. The connection enters the Error state.
PULL <superior's transaction identifier> <subordinate's transaction identifier>
-
This command is only valid in Idle state. This command seeks to establish a superior/subordinate relationship in a transaction, with the primary party of the connection as the subordinate (i.e., he is pulling a transaction from the secondary party). Note that the entire value of <transaction string> (as defined in the section "TIP Uniform Resource Locators") must be specified as the transaction identifier. Possible responses are:
PULLED
The relationship has been established. Upon receipt of this response, the specified transaction becomes the current transaction of the connection, and the connection enters Enlisted state. Additionally, the roles of primary and secondary become reversed. (That is, the superior becomes the primary for the connection.)
NOTPULLED
The relationship has not been established (possibly, because the secondary party no longer has the requested transaction). The connection remains in Idle state.
ERROR
The command was issued in the wrong state, or was malformed. The connection enters the Error state.
PUSH <superior's transaction identifier>
-
This command is valid only in the Idle state. It seeks to establish a superior/subordinate relationship in a transaction with the primary as the superior. Note that the entire value of <transaction string> (as defined in the section "TIP Uniform Resource Locators") must be specified as the transaction identifier. Possible responses are:
PUSHED <subordinate's transaction identifier>
The relationship has been established, and the identifier by which the subordinate knows the transaction is returned. The transaction becomes the current transaction for the connection, and the connection enters Enlisted state.
ALREADYPUSHED <subordinate's transaction identifier>
The relationship has been established, and the identifier by which the subordinate knows the transaction is returned. However, the subordinate already knows about the transaction, and is expecting the two-phase commit protocol to arrive via a different connection. In this case, the connection remains in the Idle state.
NOTPUSHED
The relationship could not be established. The connection remains in the Idle state.
ERROR
The command was issued in the wrong state, or was malformed. The connection enters Error state.
QUERY <superior's transaction identifier>
-
This command is valid only in the Idle state. A subordinate uses this command to determine whether a specific transaction still exists at the superior. Possible responses are:
QUERIEDEXISTS
The transaction still exists. The connection remains in the Idle state.
QUERIEDNOTFOUND
The transaction no longer exists. The connection remains in the Idle state.
ERROR
The command was issued in the wrong state, or was malformed. The connection enters Error state.
RECONNECT <subordinate's transaction identifier>
-
This command is valid only in the Idle state. A superior uses the command to re-establish a connection for a transaction, when the previous connection was lost during Prepared state. Possible responses are:
RECONNECTED
The subordinate accepts the reconnection. The connection enters Prepared state.
NOTRECONNECTED
The subordinate no longer knows about the transaction. The connection remains in Idle state.
ERROR
The command was issued in the wrong state, or was malformed. The connection enters Error state.
TLS
-
This command is valid only in the Initial state. A primary uses this command to attempt to establish a secured connection using TLS.
If the TLS command is accepted, the TLS protocol will totally control the underlying connection. This protocol will begin with the first octet after the line terminator of the TLS command (for data sent by the primary), and the first byte after the line terminator of the TLSING response (for data received by the primary). This implies that an implementation must not send both a CR and a LF octet after either the TLS command or the TLSING response, lest the LF octet be mistaken for the first byte of the TLS protocol.
Possible responses to the TLS command are:
TLSING
The secondary party agrees to use the TLS protocol [3]. The connection enters the Tls state, and all subsequent communication is as defined by the TLS protocol. The connection provided by the TLS protocol starts out in the Initial state.
CANTTLS
The secondary party cannot support (or refuses to use) the TLS protocol. The connection remains in the Initial state.
ERROR
The command was issued in the wrong state, or was malformed. The connection enters the Error state.
-
14. Error Handling
-
If either party receives a line that it cannot understand it closes the connection. If either party (either a command or a response), receives an ERROR indication or an ERROR response on a connection the connection enters the Error state and no further communication is possible on that connection. An implementation may decide to close the connection. Closing of the connection is treated by the other party as a communication failure.
Receipt of an ERROR indication or an ERROR response indicates that the other party believes that you have not properly implemented the protocol.
15. Connection Failure and Recovery
-
A connection failure may be caused by a communication failure, or by any party closing the connection. It is assumed TIP implementations will use some private mechanism to detect TIP connection failure (e.g. socket keepalive, or a timeout scheme).
Depending on the state of a connection, transaction managers will need to take various actions when a connection fails.
If the connection fails in Initial or Idle state, the connection does not refer to a transaction. No action is necessary.
If the connection fails in the Multiplexing state, all connections provided by the multiplexing protocol are assumed to have failed. Each of them will be treated independently.
If the connection fails in Begun or Enlisted state and COMMIT has been sent, then transaction completion has been delegated to the subordinate (the superior is not involved); the outcome of the transaction is unknown by the superior (it is known at the subordinate). The superior uses application-specific means to determine the outcome of the transaction (note that transaction integrity is not compromised in this case since the superior has no recoverable resources involved in the transaction). If the connection fails in Begun or Enlisted state and COMMIT has not been sent, the transaction will be aborted.
If the connection fails in Prepared state, then the appropriate action is different for the superior and subordinate in the transaction.
If the superior determines that the transaction commits, then it must eventually establish a new connection to the subordinate, and send a RECONNECT command for the transaction. If it receives a NOTRECONNECTED response, it need do nothing else. However, if it receives a RECONNECTED response, it must send a COMMIT request and receive a COMMITTED response.
If the superior determines that the transaction aborts, it is allowed to (but not required to) establish a new connection and send a RECONNECT command for the transaction. If it receives a RECONNECTED response, it should send an ABORT command.
The above definition allows the superior to reestablish the connection before it knows the outcome of the transaction, if it finds that convenient. Having succeeded in a RECONNECT command, the connection is back in Prepared state, and the superior can send a COMMIT or ABORT command as appropriate when it knows the transaction outcome.
Note that it is possible for a RECONNECT command to be received by the subordinate before it is aware that the previous connection has failed. In this case the subordinate treats the RECONNECT command as a failure indication and cleans-up any resources associated with the connection, and associates the transaction state with the new connection.
If a subordinate notices a connection failure in Prepared state, then it should periodically attempt to create a new connection to the superior and send a QUERY command for the transaction. It should continue doing this until one of the following two events occurs:
- It receives a QUERIEDNOTFOUND response from the superior. In this case, the subordinate should abort the transaction.
- The superior, on some connection that it initiated, sends a RECONNECT command for the transaction to the subordinate. In this case, the subordinate can expect to learn the outcome of the transaction on this new connection. If this new connection should fail before the subordinate learns the outcome of the transaction, it should again start sending QUERY commands.
Note that if a TIP system receives either a QUERY or a RECONNECT command, and for some reason is unable to satisfy the request (e.g. the necessary recovery information is not currently available), then the connection should be dropped.
16. Security Considerations
-
This section is meant to inform application developers, transaction manager developers, and users of the security implications of TIP as described by this document. The discussion does not include definitive solutions to the issues described, though it does make some suggestions for reducing security risks.
As with all two phase-commit protocols, any security mechanisms applied to the application communication protocol are liable to be subverted unless corresponding mechanisms are applied to the commitment protocol. For example, any authentication between the parties using the application protocol must be supported by security of the TIP exchanges to at least the same level of certainty.
16.1. TLS, Mutual Authentication and Authorization
-
TLS provides optional client-side authentication, optional server- side authentication, and optional packet encryption.
A TIP implementation may refuse to provide service unless TLS is being used. It may refuse to provide service if packet encryption is not being used. It may refuse to provide service unless the remote party has been authenticated (via TLS).
A TIP implementation should be willing to be authenticated itself (via TLS). This is true regardless of whether the implementation is acting as a client or a server.
Once a remote party has been authenticated, a TIP transaction manager may use that remote party's identity to decide what operations to allow.
Whether TLS is to be used on a connection, and if so, how TLS is to be used, and what operations are to subsequently be allowed, is determined by the security policies of the connecting TIP transaction managers towards each other. How these security policies are defined, and how a TIP transaction manager learns of them is totally private to the implementation and beyond the scope of this document.
16.2. PULL-Based Denial-of-Service Attack
-
Assume that a malicious user knows the identity of a transaction that is currently active in some transaction manager. If the malefactor opens a TIP connection to the transaction manager, sends a PULL command, then closes the connection, he can cause that transaction to be aborted. This results in a denial of service to the legitimate owner of the transaction.
An implementation may avoid this attack by refusing PULL commands unless TLS is being used, the remote party has been authenticated, and the remote party is trusted.
16.3. PUSH-Based Denial-of-Service Attack
-
When the connection between two transaction managers is closed while a transaction is in the Prepared state, each transaction manager needs to remember information about the transaction until a connection can be re-established.
If a malicious user exploits this fact to repeatedly create transactions, get them into Prepared state and drop the connection, he may cause a transaction manager to suffer resource exhaustion, thus denying service to all legitimate users of that transaction manager.
An implementation may avoid this attack by refusing PUSH commands unless TLS is being used, the remote party has been authenticated, and the remote party is trusted.
16.4. Transaction Corruption Attack
-
If a subordinate transaction manager has lost its connection for a particular prepared transaction, a malicious user can initiate a TIP connection to the transaction manager, and send it a RECONNECT command followed by either a COMMIT or an ABORT command for the transaction. The malicious user could thus cause part of a transaction to be committed when it should have been aborted, or vice versa.
An implementation may avoid this attack by recording the authenticated identity of its superior in a transaction, and by refusing RECONNECT commands unless TLS is being used and the authenticated identity of the remote party is the same as the identity of the original superior.
16.5. Packet-Sniffing Attacks
-
If a malicious user can intercept traffic on a TIP connection, he may be able to deduce information useful in planning other attacks. For example, if comment fields include the product name and version number of a transaction manager, a malicious user might be able to use this information to determine what security bugs exist in the implementation.
An implementation may avoid this attack by always using TLS to provide session encryption, and by not putting any personalizing information on the TLS/TLSING command/response pair.
16.6. Man-in-the-Middle Attack
-
If a malicious user can intercept and alter traffic on a TIP connection, he can wreak havoc in a number of ways. For example, he could replace a COMMIT command with an ABORT command.
An implementation may avoid this attack by always using TLS to provide session encryption and authentication of the remote party.
17. References
-
[1] Gray, J. and A. Reuter (1993), Transaction Processing: Concepts and Techniques. San Francisco, CA: Morgan Kaufmann Publishers. (ISBN 1-55860-190-2). [2] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2068, January 1997. [3] Dierks, T., et. al., "The TLS Protocol Version 1.0", Work in Progress. [4] Berners-Lee, T., Masinter, L., and M. McCahill, "Uniform Resource Locators (URL)", RFC 1738, December 1994. [5] Evans, K., Klein, J., and J. Lyon, "Transaction Internet Protocol - Requirements and Supplemental Information", RFC 2372, July 1998. [6] Moats, R., "URN Syntax", RFC 2141, May 1997. [7] Faltstrom, P., et. al., "Namespace Identifier Requirements for URN Services", Work in Progress. [8] Berners-Lee, T., et. at., "Uniform Resource Identifiers (URI): Generic Syntax and Semantics", Work in Progress. [9] Leach, P., and R. Salz, "UUIDs and GUIDs", Work in Progress.
18. Authors' Addresses
-
Jim Lyon Microsoft Corporation One Microsoft Way Redmond, WA 98052-6399, USA Phone: +1 (206) 936 0867 Fax: +1 (206) 936 7329 EMail: JimLyon@Microsoft.Com
Keith Evans
Tandem Computers, Inc.
5425 Stevens Creek Blvd
Santa Clara, CA 95051-7200, USAPhone: +1 (408) 285 5314 Fax: +1 (408) 285 5245 EMail: Keith.Evans@Tandem.Com
Johannes Klein
Tandem Computers Inc.
10555 Ridgeview Court
Cupertino, CA 95014-0789, USAPhone: +1 (408) 285 0453 Fax: +1 (408) 285 9818 EMail: Johannes.Klein@Tandem.Com
19. Comments
-
Please send comments on this document to the authors at <JimLyon@Microsoft.Com>, <Keith.Evans@Tandem.Com>, <Johannes.Klein@Tandem.Com>, or to the TIP mailing list at <Tip@Lists.Tandem.Com>. You can subscribe to the TIP mailing list by sending mail to <Listserv@Lists.Tandem.Com> with the line "subscribe tip <full name>" somewhere in the body of the message.
Appendix A. The TIP Multiplexing Protocol Version 2.0.
-
This appendix describes version 2.0 of the TIP Multiplexing Protocol (TMP). TMP is intended solely for use with the TIP protocol, and forms part of the TIP protocol specification (although its implementation is optional). TMP V2.0 is the only multiplexing protocol supported by TIP V3.0.
Abstract
-
TMP provides a simple mechanism for creating multiple lightweight connections over a single TCP connection. Several such lightweight connections can be active simultaneously. TMP provides a byte oriented service, but allows message boundaries to be marked.
A.1. Introduction
-
There are several protocols in widespread use on the Internet which create a single TCP connection for each transaction. Unfortunately, because these transactions are short lived, the cost of setting up and tearing down these TCP connections becomes significant, both in terms of resources used and in the delays associated with TCP's congestion control mechanisms.
The TIP Multiplexing Protocol (TMP) is a simple protocol running on top of TCP that can be used to create multiple lightweight connections over a single transport connection. TMP therefore provides for more efficient use of TCP connections. Data from several different TMP connections can be interleaved, and both message boundaries and end of stream markers can be provided.
Because TMP runs on top of a reliable byte ordered transport service it can avoid most of the extra work TCP must go through in order to ensure reliability. For example, TMP connections do not need to be confirmed, so there is no need to wait for handshaking to complete before data can be sent.
Note: TMP is not intended as a generalized multiplexing protocol. If you are designing a different protocol that needs multiplexing, TMP may or may not be appropriate. Protocols with large messages can exceed the buffering capabilities of the receiver, and under certain conditions this can cause deadlock. TMP when used with TIP does not suffer from this problem since TIP is a request-response protocol, and all messages are short.
A.2. Protocol Model
-
The basic protocol model is that of multiple lightweight connections operating over a reliable stream of bytes. The party which initiated the connection is referred to as the primary, and the party which accepted the connection is referred to as the secondary.
Connections may be unidirectional or bi-directional; each end of a bi-directional connection may be closed separately. Connections may be closed normally, or reset to indicate an abortive release. Aborting a connection closes both data streams.
Once a connection has been opened, applications can send messages over it, and signal the end of application level messages. Application messages are encapsulated in TMP packets and transferred over the byte stream. A single TIP command (TMP application message) must be wholly contained within a single TMP packet.
A.3. TMP Packet Format
-
A TMP packet consists of a 64 bit header followed by zero or more octets of data. The header contains three fields; a flag byte, the connection identifier, and the packet length. Both integers, the connection identifier and the packet length must be sent in network byte order.
FLAGS +--------+--------+--------+--------+ |SFPR0000| Connection ID | +--------+--------+--------+--------+ | | Length | +--------+--------+--------+--------+
A.3.1. Flag Details
-
+-------+-----------+-----------------------------------------+ | Name | Mask | Description | +-------+-----------+ ----------------------------------------+ | SYN | 1xxx|0000 | Open a new connection | | FIN | x1xx|0000 | Close an existing connection | | PUSH | xx1x|0000 | Mark application level message boundary | | RESET | xxx1|0000 | Abort the connection | +-------+-----------+-----------------------------------------+
A.4. Connection Identifiers
-
Each TMP connection is identified by a 24 bit integer. TMP connections created by the party which initiated the underlying TCP connection must have even identifiers; those created by the other party must have odd identifiers.
A.5. TMP Connection States
-
TMP connections can exist in several different states; Closed, OpenWrite, OpenSynRead, OpenSynReset, OpenReadWrite, CloseWrite, and CloseRead. A connection can change its state in response to receiving a packet with the SYN, FIN, or RESET bits set, or in response to an API call by the application. The available API calls are open, close, and abort.
The meaning of most states is obvious (e.g. OpenWrite means that a connection has been opened for writing). The meaning of the states OpenSynRead and OpenResetRead need more explanation.
In the OpenSynRead state a primary opened and immediately closed the output data stream of a connection, and is now waiting for a SYN response from the secondary to open the input data stream for reading.
In the OpenResetRead state a primary opened and immediately aborted a connection, and is now waiting for a SYN response from the secondary to finally close the connection.
A.6. Event Priorities and State Transitions
-
The state table shown below describes the actions and state transitions that occur in response to a given event. The events accepted by each state are listed in priority order with highest priority first. If multiple events are present in a message, those events matching the list are processed. If multiple events match, the event with the highest priority is accepted and processed first. Any remaining events are processed in the resultant successor state.
For example, if a TMP connection at the secondary is in the Closed state, and the secondary receives a packet containing a SYN event, a FIN event and an input data event (i.e. DATA-IN), the secondary first accepts the SYN event (because it is the only match in Closed state). The secondary accepts the connection, sends a SYN event and enters the ReadWrite state. The SYN event is removed from the list of pending events. The remaining events are FIN and DATA-IN. In the ReadWrite state the secondary reads the input data (i.e. the DATA-IN event is processed first because it has higher priority than the FIN event). Once the data has been read and the DATA-IN event has been removed from the list of pending events, the FIN event is processed and the secondary enters the CloseWrite state.
If the secondary receives a packet containing a SYN event, and is for some reason unable to accept the connection (e.g. insufficient resources), it should reject the request by sending a SYN event followed by a RESET event. Note that both events can be sent as part of the same TMP packet.
If either party receives a TMP packet that it does not understand, or an event in an incorrect state, it closes the TCP connection.
+==============+=========+==========+==============+ | Entry State | Event | Action | Exit State | +==============+=========+==========+==============+ | Closed | SYN | SYN | ReadWrite | | | OPEN | SYN | OpenWrite | +--------------+---------+----------+--------------+ | OpenWrite | SYN | Accept | ReadWrite | | | WRITE | DATA-OUT | OpenWrite | | | CLOSE | FIN | OpenSynRead | | | ABORT | RESET | OpenSynReset | +--------------+---------+----------+--------------+ | OpenSynRead | SYN | Accept | CloseRead | +--------------+---------+----------+--------------+ | OpenSynReset | SYN | Accept | Closed | +--------------+---------+----------+--------------+ | ReadWrite | DATA-IN | Accept | ReadWrite | | | FIN | Accept | CloseWrite | | | RESET | Accept | Closed | | | WRITE | DATA-OUT | ReadWrite | | | CLOSE | FIN | CloseRead | | | ABORT | RESET | Closed | +--------------+---------+----------+--------------+ | CloseWrite | RESET | Accept | Closed | | | WRITE | DATA-OUT | CloseWrite | | | CLOSE | FIN | Closed | | | ABORT | RESET | Closed | +--------------+---------+----------+--------------+ | CloseRead | DATA-IN | Accept | CloseRead | | | FIN | Accept | Closed | | | RESET | Accept | Closed | | | ABORT | RESET | Closed | +--------------+---------+----------+--------------+
-
TMP Event Priorities and State Transitions
-
Full Copyright Statement
-
Copyright © The Internet Society (1998). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.
The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.