July 11, 2007

Seattle Conference on Scalability: SCTP's Reliability and Fault Tolerance

Google Tech Talks
June 23, 2007


Low-cost clusters are usually built from commodity parts and use standard transport protocols like TCP/IP. Once systems become large enough, reliability and fault tolerance become important issues, and TCP/IP often requires additional mechanisms to ensure the reliability of the application. The Stream Control Transmission Protocol (SCTP) is a newly standardized transport protocol that provides reliability mechanisms beyond those of TCP. The added reliability and fault tolerance of SCTP may make it a better fit for MapReduce-like distributed applications on large commodity clusters.

SCTP has the following features that provide additional levels of reliability and fault tolerance. Selective acknowledgment (SACK) is built into the protocol and can express larger gaps than TCP's; as a result, SCTP outperforms TCP under loss. For cluster nodes with multiple interfaces, SCTP supports multihoming, which transparently provides failover in the event of a network path failure. SCTP uses the stronger CRC32c checksum, which becomes necessary at high data rates and in large-scale systems. SCTP also allows multiple streams within a single connection, providing a solution to the head-of-line blocking problem present in TCP-based farming applications like Google's MapReduce. Like TCP, SCTP provides a reliable data stream by default, but unlike TCP, messages can optionally age (expire after a set lifetime), or reliability can be disabled altogether. The SCTP API provides both a one-to-one (like TCP) and a one-to-many (like UDP) socket style; using a one-to-many style socket can reduce the number of file descriptors an application requires, making it more scalable.
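
The head-of-line blocking point can be illustrated with a small simulation. This is only a sketch, not SCTP itself and not code from the talk: the names Message, deliver, single_ordered_stream, and multi_stream are hypothetical, and the model ignores acknowledgments, retransmission timers, and congestion control. It assumes each message carries a connection-wide sequence number (tsn) plus a stream id and a per-stream sequence number (ssn), and compares a receiver that enforces one global order (TCP-like) with one that enforces order only within each stream (SCTP-like).

```python
from collections import namedtuple

# Hypothetical message record for this sketch:
#   tsn    - connection-wide sequence number
#   stream - stream identifier
#   ssn    - sequence number within that stream
Message = namedtuple("Message", ["tsn", "stream", "ssn", "payload"])


def deliver(arrivals, key, seq):
    """Replay arrivals and return, for each arrival, the payloads released
    to the application at that moment.

    key(msg) selects the ordering domain; seq(msg) is the position of the
    message inside that domain. A message is released only when every
    earlier message in the same domain has already been released.
    """
    next_expected = {}  # ordering domain -> next sequence number to release
    pending = {}        # (domain, sequence number) -> buffered message
    timeline = []
    for msg in arrivals:
        domain = key(msg)
        pending[(domain, seq(msg))] = msg
        released = []
        n = next_expected.get(domain, 0)
        while (domain, n) in pending:
            released.append(pending.pop((domain, n)).payload)
            n += 1
        next_expected[domain] = n
        timeline.append(released)
    return timeline


def single_ordered_stream(arrivals):
    # TCP-like: one ordering domain for the whole connection.
    return deliver(arrivals, key=lambda m: 0, seq=lambda m: m.tsn)


def multi_stream(arrivals):
    # SCTP-like: ordering is enforced independently per stream.
    return deliver(arrivals, key=lambda m: m.stream, seq=lambda m: m.ssn)


# Two logical flows (A on stream 0, B on stream 1) interleaved on one connection.
sent = [
    Message(0, 0, 0, "A0"),
    Message(1, 1, 0, "B0"),
    Message(2, 0, 1, "A1"),
    Message(3, 1, 1, "B1"),
]
# Assume A0 is lost and its retransmission arrives last.
arrivals = [sent[1], sent[2], sent[3], sent[0]]

print(single_ordered_stream(arrivals))
print(multi_stream(arrivals))
```

In the single-ordered case nothing reaches the application until the retransmitted A0 arrives, even though B0 and B1 are unrelated to it. In the multi-stream case B0 and B1 are released as soon as they arrive, and only A1 waits for A0.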

Speakers: Brad Penoff, Mike Tsai, Alan Wagner

