RTMP vs SRT for Video Ingest

Screenshot of Norsk Studio showing an SRT input, a preview, and output to YouTube Live

Video streaming relies on various network protocols to transport content across devices. Among these, RTMP and SRT have emerged as two prominent solutions for video ingest, the process of sending media from a source (such as a live event or TV studio) to a streaming service or network. This article looks at RTMP vs. SRT for video ingest, exploring their technical foundations, unique properties, and usage considerations.

Video streaming foundations: IP, TCP, and UDP

Before we get into the specifics of SRT and RTMP, it’s important to understand how video is transmitted from one device to another on the internet. In the diagram below, both SRT and RTMP are stacked on top of other protocols in a structure known as layering, where protocols build upon other protocols to deliver data across networks.

Diagram showing internet layers from the physical at the bottom to the application layer at the top

Each layer handles a particular responsibility. At the base, we have the physical and data link layers, which deal with network interfaces and the structure of data for delivery over local networks (e.g., Ethernet or Wi-Fi). Above that is the Internet Protocol (IP), responsible for addressing and routing packets across networks. For example, you might be streaming video from a laptop on a Wi-Fi network at home, while the video service runs on servers connected via fiber in a data center. Despite the physical layer differences, both systems can exchange data reliably because they speak the same language at the IP layer, which uses its addressing system to ensure that packets know where they’re going, no matter the underlying network. 

Above the IP layer, we have the transport layer, where protocols such as TCP and UDP are the key players. These transport protocols determine how data is delivered from sender to receiver.

TCP (Transmission Control Protocol) is a transport protocol that is both reliable and connection-oriented; it guarantees that all data is delivered in the correct order and without error, and that a dedicated connection is established between the client and server before any data exchange can begin.  The connection request starts with a handshake that establishes two-way communication and confirms that the host on the other end is ready to receive data.

The sender transmits messages as a stream of bytes, and if it sends two messages in sequence, the receiving application is guaranteed to get the first one before the second. When data segments arrive out of order (something the underlying network is allowed to do!), TCP buffers the out-of-order data until all data can be re-ordered appropriately and delivered to the application. While these mechanisms ensure reliability, they can lead to what’s known as head-of-line blocking, where lost or delayed packets stall the delivery of subsequent packets, incurring long delays.
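The guarantees above can be seen directly in Python's standard socket API. This is a minimal sketch on localhost (the OS picks the port): the `connect()` call triggers TCP's handshake, and two separate writes arrive at the receiver as one ordered byte stream.

```python
import socket

# A minimal sketch of TCP's connection-oriented, ordered delivery.
# Everything runs on localhost; the port is chosen by the OS.

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))  # triggers the SYN / SYN-ACK / ACK handshake

conn, _addr = server.accept()        # server side of the established connection
client.sendall(b"hello, ")           # two writes to the same byte stream...
client.sendall(b"world")
client.close()                       # closing lets the server see end-of-stream

data = b""
while True:
    chunk = conn.recv(1024)
    if not chunk:                    # empty read: the client closed the stream
        break
    data += chunk

conn.close()
server.close()
print(data)                          # b'hello, world' -- bytes arrive in order
```

Note that TCP may split or coalesce the two writes into any number of segments on the wire; the ordering guarantee applies to the byte stream, not to message boundaries.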

As a result, TCP on “poor” networks (i.e., those with packet drops, high latency, and/or packet re-ordering) is not always suitable for real-time applications, such as live streaming or video conferencing. This is where UDP-based protocols can excel.

Unlike TCP, UDP (User Datagram Protocol) is a connectionless protocol, meaning the sender transmits messages without establishing a connection or providing any delivery guarantees. UDP sends data as individual packets called datagrams, and there’s no guarantee that they’ll arrive in order, or even arrive at all; this is unreliable, but, counterintuitively, it’s precisely what makes it potentially attractive to real-time applications.

Imagine you’re in a live video call. If a single video frame is dropped or delayed, it’s usually better to skip it and move on than to pause the stream while waiting for the dropped frame to be re-transmitted. UDP enables this type of communication, making it ideal for scenarios such as live streaming, VoIP, and gaming, where low latency is more important than perfect delivery. 
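The contrast with the TCP example is visible in the API itself. In this sketch there is no `connect()`/`accept()` pair: each `sendto()` is an independent, fire-and-forget datagram that either arrives whole or not at all (on localhost it will arrive, but UDP makes no such promise in general).

```python
import socket

# A minimal sketch of UDP's connectionless model: no handshake, no delivery
# guarantee -- each sendto() is an independent datagram.

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))            # port 0: OS picks a free port
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"video-frame-42", addr)     # fire and forget: no connection setup

datagram, _src = receiver.recvfrom(2048)   # one whole datagram (or nothing at all)
print(datagram)                            # b'video-frame-42'

sender.close()
receiver.close()
```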

Now that we have a primer on the foundations, let’s move on to examine their applications in RTMP and SRT.

RTMP

Real-Time Messaging Protocol (RTMP) is an application-level protocol designed for the transmission of multimedia transport streams (such as audio, video, or any other data) over a suitable transport protocol (usually TCP). An RTMP connection begins with a handshake process. The handshake consists of three fixed-size packets (also referred to as chunks in official documentation), facilitating the exchange of essential information such as the RTMP version, timestamps, and random data used for peer validation.

After handshaking, RTMP organizes data into chunk streams, and multiple chunk streams can be multiplexed into a single TCP connection. Unlike the statically sized handshake chunks, the protocol fragments the transport streams into smaller variable-sized chunks. Each chunk stream is assigned a unique ID and carries audio, video, control messages (such as chunk size), or metadata. Chunking allows large messages at the higher protocol level to be broken into smaller messages, which is essential, as it enables audio and video to be interspersed so that smaller packets (such as those for audio) aren’t stuck behind potentially very large payloads (such as a video I-frame). The receiver uses the chunk stream IDs to sort the incoming chunks back into their proper streams and reassemble the original messages.
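To make the handshake concrete, here is a sketch that constructs the first two client packets, C0 and C1, following the field layout in the RTMP specification: C0 is a single version byte, and C1 is a fixed 1536-byte chunk containing a 4-byte timestamp, 4 zero bytes, and random fill. The helper names are ours; a real client would write these bytes to a TCP socket and then read S0/S1/S2 back.

```python
import os
import struct

# A sketch of the client's opening RTMP handshake packets (C0 and C1).
# This only builds the bytes; it does not perform any network I/O.

RTMP_VERSION = 3
HANDSHAKE_SIZE = 1536  # C1/C2 (and S1/S2) are fixed at 1536 bytes

def build_c0() -> bytes:
    """C0: a single byte carrying the requested RTMP version."""
    return bytes([RTMP_VERSION])

def build_c1(timestamp_ms: int) -> bytes:
    """C1: 4-byte timestamp + 4 zero bytes + 1528 bytes of random data."""
    header = struct.pack(">II", timestamp_ms, 0)
    random_fill = os.urandom(HANDSHAKE_SIZE - len(header))
    return header + random_fill

c0 = build_c0()
c1 = build_c1(0)
print(len(c0), len(c1))  # 1 1536
```

The random fill in C1 is echoed back by the peer (in S2/C2), which is how each side validates that it is talking to a live RTMP endpoint rather than an arbitrary TCP service.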

Because RTMP has been around for more than two decades, it lacks some features expected of modern streaming, such as built-in encryption and support for recent codecs like H.265, AV1, and Opus. For this reason, newer variations of the RTMP protocol have emerged, including the following:

  1. E-RTMP, or Enhanced RTMP, which enhances RTMP by adding features such as expanded codec support, advanced timestamp precision, multitrack capabilities, and a reconnect request feature
  2. RTMPS (Real-Time Messaging Protocol Secure), which is RTMP over a Transport Layer Security (TLS/SSL) connection
  3. RTMFP (Real-Time Media Flow Protocol), which is RTMP over User Datagram Protocol (UDP) instead of TCP

SRT

SRT (Secure Reliable Transport) enables secure and reliable data transmission across networks. While it can transfer any type of data, its primary use is for delivering audio and video streams. SRT builds on the UDP-based Data Transfer (UDT) project, which uses UDP for high-performance, high-volume data transfer. Although UDT didn’t specifically target live streaming, its packet loss recovery mechanism gave SRT a strong foundation on which to build.

UDT’s original design assumes applications can consistently fill input buffers to maintain optimal transmission rates. However, real-time video generates packets at relatively predictable intervals that are often much slower than the rapid data availability seen in file transfers. This mismatch causes buffer depletion, triggering resets in UDT’s transmission algorithm that disrupt the steady flow required for live streaming. More critically, when network congestion occurs, UDT prioritizes retransmitting packets from its loss list over accepting new data from the application. This behavior can block the UDT API (in much the same way as TCP suffers from head-of-line blocking), preventing new video frames from being processed, even though the video source cannot pause its real-time generation. This causes frame drops and compromises reliability, defeating the purpose of reliable transmission.

SRT reimagines this approach by sharing available bandwidth intelligently between original transmissions and retransmissions. Rather than allowing retransmissions to monopolize the connection, SRT implements a packet pacing system that reserves bandwidth overhead for recovery traffic while maintaining the primary data flow. When timing conflicts arise, SRT employs a “too-late packet drop” mechanism that discards older packets whose delivery window has expired, prioritizing newer, more relevant data. 
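A toy illustration of the “too-late packet drop” idea (not SRT’s actual implementation): each buffered packet has a delivery deadline derived from its send time plus the configured latency budget, and anything past its deadline is discarded rather than delivered late.

```python
# A toy model of "too-late packet drop": packets whose delivery window has
# expired are discarded instead of stalling newer data.

def deliverable(packets, now_ms, latency_ms):
    """Keep only packets still inside their delivery window.

    Each packet is (sequence_number, send_time_ms); its deadline is the
    send time plus the configured latency budget.
    """
    return [(seq, sent) for (seq, sent) in packets if sent + latency_ms >= now_ms]

buffered = [(1, 0), (2, 30), (3, 60)]       # (seq, send time in ms)
kept = deliverable(buffered, now_ms=100, latency_ms=80)
print(kept)  # [(2, 30), (3, 60)] -- packet 1 expired (0 + 80 < 100)
```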

As a connection-oriented protocol, SRT embraces the concepts of connection establishment and session management. An SRT connection lifecycle is characterized by three distinct phases: initial engagement through a handshake process, active maintenance through continuous packet exchange, and termination either through explicit close commands or timeout conditions when communication ceases. During the initial handshake, SRT negotiates a minimum buffer delay, which allows it to absorb network variations and allows packet retransmission when needed. The negotiation itself is straightforward: the connection adopts the larger of the delays suggested by the sender and the receiver.
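Reduced to its essence, that negotiation is a one-liner (function name ours):

```python
# The SRT handshake's latency negotiation, reduced to its essence:
# each side proposes a delay, and the larger one wins.

def negotiate_latency(sender_proposed_ms: int, receiver_proposed_ms: int) -> int:
    return max(sender_proposed_ms, receiver_proposed_ms)

print(negotiate_latency(120, 200))  # 200 -- the receiver's larger delay wins
```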

Like its UDT predecessor, SRT supports two fundamental connection configurations:

  • Caller-listener mode, where one endpoint (listener) waits passively for connection requests while the other (caller) initiates contact.
  • Rendezvous mode, where both endpoints simultaneously attempt to establish a connection.

SRT’s security is based on its implementation of the Advanced Encryption Standard (AES) algorithm, which encrypts the payload of SRT streams while leaving packet headers unencrypted to maintain essential routing and synchronization information.

Considerations when selecting a protocol for ingest

Here are a few factors you should consider when choosing whether to use SRT or RTMP as your video ingest protocol: 

Platform Support

The server you’re ingesting media to needs to support the protocol you intend to use. RTMP is a popular ingest protocol, and most media servers support it. The more recent SRT does not yet enjoy such ubiquitous support, although it’s rapidly catching up.

Our platform, Norsk, supports both. Creating workflows is as simple as dragging and dropping components and connecting them to define how the media should flow. In the example below, we are using SRT to acquire a source before repackaging it as RTMP for publication on YouTube Live.

Screenshot of Norsk Studio showing an SRT input, a preview, and output to YouTube Live, showing an example of RTMP vs. SRT for video ingest.

Interoperability with codecs 

RTMP has significant codec limitations, supporting only a restricted set of audio and video formats, including H.264, AAC, and MP3. Modern codecs such as H.265/HEVC, VP9, AV1, and Opus are not supported by the original RTMP specification, which limits streaming quality and efficiency options. SRT, being payload-agnostic, can transport any type of video format, codec, resolution, or frame rate.

This was a core motivation for Enhanced RTMP (which Norsk supports), allowing you to stream formats such as H.265, AV1, and Opus over RTMP.  

Latency

SRT prioritizes predictable latency over minimal delay through its Timestamp-Based Packet Delivery (TSBPD) mechanism, which ensures consistent end-to-end timing regardless of network conditions. This is achieved by buffering packets based on sender timestamps, allowing SRT to absorb network jitter and reordering. A latency buffer of 3-4 times the RTT (the time it takes to send a packet and receive a response) is recommended. For low-latency workflows, we have seen people go as low as 10ms — enough (on a local network) to recover from occasional packet collisions, while adding minimal latency.
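Applying that rule of thumb is simple arithmetic; this sketch (with an illustrative RTT) just makes the sizing explicit:

```python
# Sizing the SRT latency buffer from a measured RTT, following the
# 3-4x RTT rule of thumb above. Values are illustrative.

def recommended_latency_ms(rtt_ms: float, multiplier: float = 4.0) -> float:
    return rtt_ms * multiplier

print(recommended_latency_ms(20))  # 80.0 -- a 20 ms RTT suggests an 80 ms buffer
```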

RTMP, by contrast, can achieve even lower latency on reliable networks, but this comes at the expense of predictability. As long as everything is perfect, RTMP might be adding only a handful of milliseconds. Since it runs over TCP, it benefits from built-in retransmission and congestion control, eliminating the need for additional buffering at the application level. However, TCP’s head-of-line blocking and congestion control algorithms can cause delays to spike when packets are lost or the network becomes unstable. Ultimately, for poor networks, TCP can become completely unusable, whereas SRT (with a suitably chosen latency) might still deliver an acceptable stream.

Ultimately, the choice depends on your operating environment: SRT is better suited when you need consistent, predictable timing despite network variations, while RTMP may provide better latency on stable connections.

Security and Encryption

SRT includes AES encryption as a standard feature, protecting content with minimal overhead and configuration complexity. In contrast, RTMP relies on RTMPS (RTMP over TLS/SSL) for encryption, although not all implementations of RTMP support RTMPS.

So Which Protocol is for You?

RTMP and SRT are popular and relevant protocols for ingest, each serving distinct use cases in the streaming ecosystem. RTMP remains the go-to choice due to its broad compatibility and seamless integration with established platforms. At the same time, SRT excels in scenarios that demand predictable latency, robust network resilience, and broad codec support. The optimal choice depends on your specific requirements for latency, network conditions, security needs, and target platform compatibility. Norsk supports both RTMP and SRT, along with several other protocols for ingest — get in touch to find out more about the wide range of protocols, codecs, and formats Norsk supports.

Author

  • Kelvin Kirima is a developer at id3as, proficient in Rust, JavaScript/Typescript, PostgreSQL, and web technologies such as HTML and CSS.