Behave Working Group P. Srisuresh Internet-Draft Caymas Systems, Inc. Expires: September 28, 2005 S. Sivakumar K. Biswas Cisco Systems, Inc. B. Ford M.I.T. March 28, 2005 NAT Behavioral Requirements for TCP Status of this Memo This document is an Internet-Draft and is subject to all provisions of Section 3 of RFC 3667. By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she become aware will be disclosed, in accordance with RFC 3668. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on August 1, 2005. Copyright Notice Copyright (C) The Internet Society (2005). Abstract NAT devices are available from a number of vendors and are in use by several residential and enterprise users. Yet, there is much Srisuresh, et al. [Page 1] Internet-Draft NAT Behavioral Requirements for TCP March 2005 variation in how the NAT devices work. Application developers, network administrators and users of NAT devices seek some level of uniformity and predictability in how various of the NAT devices operate. The objective of this document is to specify the operational and behavioral requirements on the NAT devices while processing TCP packets. A NAT device that conforms to the requirements listed in the document will bring predictability in how NATs operate with regard to TCP packet processing. A NAT device is said to be IETF behave compliant when it complies with the requirements outlined in this document and two other companion documents ([BEH-GEN], [BEH-UDP]) which outline the requirements for processing IP, ICMP & UDP. Table of Contents 1. Introduction & Scope . . . . . . . . . . . . . . . . . . . . . 2 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 3. TCP requirements discussion . . . . . . . . . . . . . . . . . 3 3.1 Address Binding and/or TCP Port Binding . . . . . . . . . 3 3.2 Timeouts for TCP Sessions . . . . . . . . . . . . . . . . 4 3.3 SYN packets during Connecting and Closing phases . . . . . 5 3.4 Denial of Service (DoS) attacks . . . . . . . . . . . . . 5 3.5 NAT initiated TCP keep-alives . . . . . . . . . . . . . . 6 3.6 NAT initiated RST packets . . . . . . . . . . . . . . . 7 4. Hints to implementers . . . . . . . . . . . . . . . . . . . . 7 4.1 Light weight TCP state machine is a common practice . . . 7 4.2 TCP segment processing in NATs supporting ALGs . . . . . . 8 4.3 Adjusting Sequence and Acknowledgement Numbers . . . . . . 9 5. TCP behavioral requirements summary . . . . . . . . . . . . . 10 6. Security considerations . . . . . . . . . . . . . . . . . . . 10 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11 8.1 Normative References . . . . . . . . . . . . . . . . . . . 11 8.2 Informative References . . . . . . . . . . . . . . . . . . 11 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 12 Intellectual Property and Copyright Statements . . . . . . . . . . 13 1. Introduction & Scope NAT implementations vary amongst vendors in how they handle TCP packets. This document defines the operational and behavioral requirements that the NAT devices should comply with while processing TCP packets. In addition, a section is devoted to describing hints to implementers in deciphering some of the requirements. Srisuresh, et al. [Page 2] Internet-Draft NAT Behavioral Requirements for TCP March 2005 The requirements outlined here are applicable across all NAT types identified in [RFC2663], most importantly the Traditional Nat, as described in [RFC3022]. This document does not mandate a specific implementation choice. However, this does require NAT devices to adhere to the basic design principles and general behavioral requirements outlined in [BEH-GEN]. Behavioral requirements for UDP are covered in [BEH-UDP]. Application Layer Gateways (ALGs) are out of scope for this document. However, hints on how a NAT could be extended to support ALGs are discussed under the hints section. 2. Terminology Definitions for the NAT terms used throughout the document may be found in [BEH-GEN] and/or [RFC2663]. TCP terms used in the document are as per the definitions given in [TCP]. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. 3. TCP requirements discussion This section lists the behavioral requirements of a NAT device when processing TCP packets. Associated with each requirement, the rationale behind the requirement is discussed in detail. 3.1 Address Binding and/or TCP Port Binding NAT provides transparent routing between address realms by assigning realm-specific endpoint locator(s), as packets pertaining to a session cross realm boundaries. Several applications use the same endpoint within a realm to establish multiple simultaneous sessions. Many peer-to-peer applications use the public endpoint registration of peering hosts to initiate sessions into. In order to support peer-to-peer applications and applications that entertain multiple simultaneous session using the same TCP endpoint, NAT MUST retain the association it assigned to an endpoint between realms and reuse the same endpoint association when multiple sessions using the same endpoint are routed through the NAT device. Such a binding between endpoints can occur when a NAT device maintains Address Bindings and/or TCP Port Bindings. REQ-1: A NAT device MUST maintain Address and/or TCP Port Bindings. Srisuresh, et al. [Page 3] Internet-Draft NAT Behavioral Requirements for TCP March 2005 3.2 Timeouts for TCP NAT Sessions As may be noted from [TCP], an end-to-end TCP session in its lifetime goes through three phases, namely Connecting, Established, and Closing. Each end-to-end TCP session is managed through a separate NAT Session within NAT. The NAT Session must be capable of identifying the current phase of the end-to-end TCP session it represents and use an idle timeout period that is appropriate for the current phase. Connecting Phase: An end-to-end TCP session is said to enter the Connecting Phase when either of the endpoints sends the first SYN for the TCP session and exit the phase upon completion of 3-way SYN handshake. The idle timeout used by the NAT Session during this phase is called the SYN timeout. SYN timeout needs to be relatively short, so NAT can protect itself (and, potentially, the hosts behind it) from SYN flood attacks. A NAT session is freed when the SYN timeout expires. Established Phase: An end-to-end TCP session is said to enter the Established Phase upon completion of 3-way handshake and exit the phase upon seeing the first FIN or RST for the session. The idle timeout used by the NAT during this phase of the end-to-end TCP session is called the Session timeout. Session timeout needs to be relatively long, so the NAT Session can retain state of the end-to-end TCP Session within itself even after long periods of inactivity in the session. Long periods of inactivity is not uncommon with applications such as telnet and ftp. When Session timer expires, the corresponding NAT Session may be freed (or) the NAT Session may assume the TCP connection to have transitioned into the Closing phase. Closing Phase: An end-to-end TCP session is said to enter the Closing Phase when either of the endpoitns sends the first FIN or RST for the session. Alternately, the NAT Session may deem the TCP session to have entered this phase when the TCP Session timer expires. The idle timeout used by the NAT Session during this phase is called the Close timeout. Close timeout is relatively short to ensure that the ACKs for the final FINs on a gracefully-closed TCP session had a chance to propagate in both directions, and to allow time for either endpoint to re-open a recently closed or reset TCP session if desired. A NAT device MAY opt to have different Close timeouts depending upon whether the Closing phase is triggered by FIN or not. Once the Close timer expires, the NAT Session will be freed. Srisuresh, et al. [Page 4] Internet-Draft NAT Behavioral Requirements for TCP March 2005 The following requirements apply to the NAT's timeouts: REQ-2: A NAT device MUST be capable of identifying the current phase of an end-to-end TCP session and use different idle timeout periods for each phase of the TCP Session. The timeouts used for each phase SHOULD be admin configurable. The recommended value for SYN timeout is 30 seconds. The recommended value for TCP session timeout is 30 minutes. Lastly, the recommended value for close timeout is 2 x MSL (Maximum Segment Lifetime) or 4 minutes. 3.3 SYN packets during Connecting and Closing phases A NAT device might allow sessions to be initiated in just one direction and not the other. However, once a NAT session is created for a permitted TCP session, and the TCP session is in Connecting phase, the NAT device MUST let the SYN packets through in either direction. Likewise, if the TCP session is in Closing phase and a new SYN packet arrives from either endpoint before the close timer expires, the NAT device should assume that the TCP session has re-entered the Connecting phase and initiate SYN timer as described above. This is because TCP protocol fundamentally permits simultaneous TCP Open from either end. A number of TCP based Peer-to-peer applications utilize the simultaneous TCP open technique to establish peer to peer connections. The following requirement applies to SYN packets arriving during Connecting and Closing phases of a TCP connection. REQ-3: A NAT device MUST let the SYN packets through when the SYN Packets are received on a TCP connection which is in one of Connecting or Closing phase. 3.4 Denial of Service (DoS) attacks Since NAT devices are Internet hosts, they can be the target of a number of different DoS attacks, such as SYN floods and RST attacks. NAT devices SHOULD employ the same sort of protection techniques as Internet-based servers do. Let us examine two types of Dos attacks that are well known with regard to TCP connections. A SYN flood attack is a DoS attack in which one or more external entities initiate a number of simultaneous TCP connections using a SYN packet, but donot complete the 3-way handshake. Naturally, a NAT device is prone to this type Srisuresh, et al. [Page 5] Internet-Draft NAT Behavioral Requirements for TCP March 2005 of attack when the NAT device is in the traversal path of the SYN attacks. One technique to defend against this type of attack is to ensure that the NAT device employs a short SYN timeout and reduce the timeout even further when it determines it is under SYN flood attack. RST attack is another well known DoS attack. An attacker could simply forge a number of RST packets for a variety of Established TCP connections and cause the NAT sessions to be reset and freed. One technique to defend against this type of attack is to validate the RST packet and not let the packet through unless the sequence number used in the RST packet is within the expected TCP window size of the TCP Session. REQ-4: A NAT device SHOULD employ necessary techniques to defend against well known DoS attacks. 3.5 NAT initiated TCP keep-alives When session timer expires for a NAT session, that indicates that the associated TCP session has been idle with no activity for the period matching the TCP Session timeout. Sensing no activity, NAT could free up the NAT Session and remove the state associated with the TCP connection within the NAT device. However, doing so would violate the end-to-end reliability of the IP network. Ideally speaking, IP network is not supposed to retain any hard state. Unfortunately, a NAT device retains session state within itself (via the NAT Sessions) and this information should not be dropped without confirming that one or both halfs of the TCP session are alive. A NAT device may validate the liveness of a TCP client by sending keep-alive packets to the TCP client using the technique described in section 4.2.3.6 of [HOST]. If the NAT device receives an ACK or other traffic from the internal endpoint, it resets the session timer and assumes the connection to be in ôEstablished" phase. If the NAT device receives a RST from the TCP client, the NAT device transitions the TCP connection into the "closing" phase and initiates Close timeout for the session. If the NAT device receives no response from the internal endpoint after sending several keep-alive packets, the NAT assumes that the internal endpoint is dead and again assumes that the TCP connection has entered the "closing" phase. Below is the keep-alive requirement on NAT devices REQ-5: When session timer expires for an established TCP connection, the NAT device MAY initiate sending TCP keep-alives to the clients Srisuresh, et al. [Page 6] Internet-Draft NAT Behavioral Requirements for TCP March 2005 prior to freeing up the Session state within NAT. 3.6. NAT initiated RST packets When session timer expires for a NAT session, it is an indication that the associated TCP connection has been idle with no activity for the duration matching the TCP Session timeout. Sensing no activity, NAT could free up the NAT Session and remove the state associated with the TCP connection. When this happens, the two TCP endpoints in the network, which might potentially be alive, may be unable to resume activity on the connection because the NAT device enroute no longer has the state information pertaining to the end-to-end TCP connection. This can be problematic for application servers that impose limits on the number of connections a user might be allowed to setup in a given period of time. After a few zombie sessions, the server might deny access to its clients, when the connection count on the server exceeds the set limit. The Server has no way to know that some of the client sessions it retains are zombies and shouldnt be counted as real. In such a situation, the NAT device sending a RST packet to both parties will alert the two parties of the connection going away. And, the application servers are not fooled with zombie sessions. Note, a NAT device may choose to send RST packets after it probed the TCP client with TCP Keep-alive packets. Below is the requirement on sending TCP RST packet. REQ-6: When Session timer expires on an Established TCP connection, the NAT device SHOULD send a RST packet to both halfs of a TCP connection and enter Closing state on the connection prior to freeing up the NAT Session. 4. Hints to implementers 4.1 Light weight TCP state machine is a common practice Unlike UDP, TCP sessions are fundamentally unicast in nature and multiple NAT Sessions cannot be aggregated. NAT devices maintain a separate NAT Session to track each end-to-end TCP connection that traverses the NAT device. A NAT device needs to be able to track the current phase of a TCP session at any given time so an idle timer for a duration appropriate for the phase is initiated. Further, a NAT device defending against even the most trivial type of DoS attack will require the knowledge of TCP sequence number and window Size to defend itself against such attacks. As such, many vendors use a light-weight state machine within the NAT Session to track the current state of a TCP connection. Items tracked Srisuresh, et al. [Page 7] Internet-Draft NAT Behavioral Requirements for TCP March 2005 within the state machine would include the last acknowledged sequence number from either half of the TCP session, TCP window size, and the TCP connection phase. The State machine within a NAT Session enters the Connecting state when NAT sees the first SYN packet for that session. The state machine transitions from the Connecting to Established state once the 3-way handshake is completed. The state machine transitions from the Established state to the Closing state when the NAT observes a FIN/FIN ACK sequence, representing graceful shutdown reached cooperatively by both endpoints, or when the NAT observes a RST from either endpoint, representing a non-graceful connection reset forced by one endpoint. The NAT device deletes the NAT session after the Close timer expires while the TCP connection is in the Closing state. In addition to this basic state information, many NATs also record information about the TCP sequence numbers and the acknowledgment numbers they observe in the TCP packets flowing across the NAT. If the NAT contains built-in ALGs that can change the payload length of TCP packets, effectively inserting or removing bytes from the TCP stream in one or both directions, then the NAT MUST adjust the sequence numbers in all subsequent packets exchanged in either direction to reflect these inserted or removed bytes. 4.2 TCP segment processing in NATs supporting ALGs The following discussion on TCP segment processing is relevant only when a NAT device includes support for one or more embedded ALGs. Many NAT devices have the ALG for FTP enabled by default. A NAT device may receive payload relevant to an ALG in multiple TCP segments. Consider the following diagram where the MSS is set to 536 bytes in each endpoint of the TCP connection. +-------------------+ +-------------------+ | Application-Layer | | Application-Layer | +-------------------+ +-------------------+ | TCP [MSS = 536] | | TCP [MSS = 536] | +-------------------+ +-------------------+ | IP | | IP | +-------------------+ +-------------------+ | Lower-Layer | | Lower-Layer | | (MTU = 1500 | | (MTU = 1500 | +-------------------+ +-------------------+ End-host-1 End-host-2 | +--------+ | Srisuresh, et al. [Page 8] Internet-Draft NAT Behavioral Requirements for TCP March 2005 +-------------------| NAT |----------------+ +--------+ Say the application layer on end-host-1 is sending a payload of size 600 bytes. Even though the MTU is 1500 bytes, the payload is sent to the recipient in 2 TCP segments as follows, because the MSS is set to 536 bytes. TCP Segment 1: +-------+------------------------+----------+ |IP hdr |TCP hdr[Payload-Len=536]|Appl-data1| +-------+------------------------+----------+ TCP Segment 2: +-------+------------------------+----------+ |IP hdr |TCP hdr[Payload-Len=64] |Appl-data2| +-------+------------------------+----------+ A NAT device enroute may receive the TCP segments either in order or out of order. In either case, the NAT device needs to assemble the individual segments into a contiguous payload and make the complete payload available for the ALG to process prior to forwarding the segments transparently to another realm. In order to do this, a NAT device is required to enforce some type of queuing mechanism such that when all relevant segments of a payload are received, it is able to reassemble the TCP segments and make the contiguous payload available for ALG processing. For in-order segments, the NAT device needs to send a TCP ACK for the initial segments it received, but didnt forward to the recipient enpoint. This is done so the NAT device can prompt the sending endpoint to continue to send the remaining TCP segments. 4.3 Adjusting Sequence and Acknowledgement Numbers The following discussion on adjusting Sequence and Acknowledgement numbers is relevant only when a NAT device includes support for one or more embedded ALGs. When the embedded ALG on a NAT device modifies the TCP payload, the corresponding payload may increase or decrease in size. As a result, the NAT device is expected to remember the delta change and adjust sequence/acknowledgement numbers in all subsequent TCP packets within the session. Implementors of NAT devices often keep the delta changes in payload due to ALG processing within the NAT Srisuresh, et al. [Page 9] Internet-Draft NAT Behavioral Requirements for TCP March 2005 Session as an extension of the state information the NAT device keeps. 5. TCP behavioral requirements summary Below is a summary of all the TCP behavioral requirements. REQ-1: A NAT device MUST maintain Address and/or TCP Port Bindings. REQ-2: A NAT device MUST be capable of identifying the current phase of an end-to-end TCP session and use different idle timeout periods for each phase of the TCP Session. The timeouts used for each phase SHOULD be admin configurable. The recommended value for SYN timeout is 30 seconds. The recommended value for TCP session timeout is 30 minutes. Lastly, the recommended value for close timeout is 2 x MSL (Maximum Segment Lifetime) or 4 minutes. REQ-3: A NAT device MUST let the SYN packets through when the SYN Packets are received on a TCP connection which is in one of Connecting or Closing phase. REQ-4: A NAT device SHOULD employ necessary techniques to defend against well known DoS attacks. REQ-5: When session timer expires for an Established TCP connection, the NAT device MAY initiate sending TCP keep-alives to the clients prior to freeing up the Session state within NAT. REQ-6: When session timer expires for an Established TCP connection, the NAT device SHOULD send a RST packet to both halfs of a TCP connection and enter Closing state on the connection prior to freeing Up the NAT Session. 6. Security considerations The security considerations described in [RFC2663] for all variations of NATs are applicable here. The recommendations and requirements in this document do not effect the security properties of the NAT devices adversely. 7. Acknowledgements The authors would like to thank Nagendra Modadugu and Cullen Srisuresh, et al. [Page 10] Internet-Draft NAT Behavioral Requirements for TCP March 2005 Jennings for their feedback and comments. Srisuresh, et al. [Page 11] Internet-Draft NAT Behavioral Requirements for TCP March 2005 8. References 8.1 Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", RFC 2119, BCP 14, March 1997. [TCP] Postel, J., "Transmission Control Protocol (TCP) Specification", STD 7, RFC 793, September 1981. [HOST] Braden, R., "Requirements for Internet Hosts -- Communication Layers", RFC 1122, October 1989. [RFC2663] Srisuresh, P. and M. Holdrege, "IP Network Address Translator (NAT) Terminology and Considerations", RFC 2663, August 1999. [RFC3022] Srisuresh, P. and K. Egevang, "Traditional IP Network Address Translator (Traditional NAT)", RFC 3022, January 2001. [BEH-GEN] Ford, B., Srisuresh, P., and S. Sivakumar, ôDesign Principles and General Behavioral Requirements for NATsö, draft-ford-behave-gen-01.txt (work in progress), March 2005. 8.2 Informative References [ASND] Reynolds, J. and J. Postel, "Assigned numbers", RFC 923, October 1984. [ICMP] Postel, J., "Internet Control Message Protocol", RFC 792, September 1981. [NAT-CMPL] Holdrege, M. and P. Srisuresh, "Protocol Complications with the IP Network Address Translator", RFC 3027, January 2001. [NAT-CHK] Ford, B. and D. Andersen, "Nat Check Web Site: http://midcom-p2p.sourceforge.net", June 2004. [V4-REQ] Baker, F., "Requirements for IP Version 4 Routers", RFC 1812, June 1995. [UNSAF] Daigle, L. and IAB, "IAB Considerations for Unilateral Srisuresh, et al. [Page 12] Internet-Draft NAT Behavioral Requirements for TCP March 2005 Self-Address Fixing (UNSAF) Across Network Address Translation", RFC 3424, November 2002. [BEH-UDP] Audet, F. and C. Jennings, "NAT Behavioral Requirements for Unicast UDPö, draft-ietf-behave-nat-00.txt (work in progress), January 2005. Authors' Addresses: Pyda Srisuresh Caymas Systems, Inc. 1179-A North McDowell Blvd. Petaluma, CA 94954 USA Phone: (707) 283-5063 E-mail: srisuresh@yahoo.com Senthil Sivakumar Cisco Systems, Inc. 170 West Tasman Dr. San Jose, CA 95134 USA Phone: Email: ssenthil@cisco.com Kaushik Biswas Cisco Systems, Inc. 170 West Tasman Dr. San Jose, CA 95134 USA Phone: +1 408 525 5134 Email: kbiswas@cisco.com Bryan Ford M.I.T. Laboratory for Computer Science 77 Massachusetts Ave. Cambridge, MA 02139 USA Phone: 1-617-253-5261 Email: baford@mit.edu Srisuresh, et al. [Page 13] Internet-Draft NAT Behavioral Requirements for TCP March 2005 Intellectual Property Statement The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79. Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr. The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org. Disclaimer of Validity This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Copyright Statement Copyright (C) The Internet Society (2005). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights. Acknowledgment Funding for the RFC Editor function is currently provided by the Srisuresh, et al. [Page 14] Internet-Draft NAT Behavioral Requirements for TCP March 2005 Internet Society. Srisuresh, et al. [Page 15]