How Norsk Ensures Your Live Streams Are in Sync

Syncing audio and video in a live stream in Norsk

Given the chaotic nature of incoming video streams, a live streaming system like Norsk must create order to deliver properly synchronized outputs. Norsk relies on a deliberate and robust strategy for mastering time from the point of ingestion, so that you can be confident your live streams are in sync.

[For an in-depth look at the challenges of dealing with time in live streaming workflows, see the white paper “The Hidden Complexities of Time in Live Streaming.”]

Which clock serves as your source of truth? At Norsk, we made the deliberate choice to use local system time as our reference. Specifically, we use the monotonic clock provided by the operating system, not the wall clock displayed in the corner of your screen. The wall clock can be adjusted whenever the Network Time Protocol (NTP) determines that your computer has drifted from its reference time servers. The monotonic clock ticks steadily forward, never backward, and provides a stable reference.
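To make the distinction concrete, here is a minimal Python sketch that reads both clocks. It is an illustration only, not Norsk code; the helper names `elapsed_ns` and `sample_clocks` are made up for this example. Python's `time.monotonic_ns()` exposes the operating system's monotonic clock, while `time.time_ns()` reads the adjustable wall clock.

```python
import time


def elapsed_ns(work) -> int:
    """Measure how long `work()` takes using the OS monotonic clock."""
    start = time.monotonic_ns()
    work()
    return time.monotonic_ns() - start


def sample_clocks(n: int = 5) -> list:
    """Collect (monotonic_ns, wall_clock_ns) pairs for comparison."""
    return [(time.monotonic_ns(), time.time_ns()) for _ in range(n)]


samples = sample_clocks()
monotonic_readings = [m for m, _ in samples]

# Monotonic readings never decrease, even if NTP adjusts the wall clock mid-run.
# The wall clock readings carry no such guarantee.
is_non_decreasing = all(
    a <= b for a, b in zip(monotonic_readings, monotonic_readings[1:])
)
```

Durations measured with the monotonic clock stay meaningful across NTP corrections, which is why it suits a media timeline better than the wall clock.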

When any stream, whether SRT, NDI, or SDI, enters Norsk, it is immediately re-timestamped into this local clock domain. This puts all sources on the same time base and lets Norsk compare timestamps across different inputs directly. Audio and video from the same source are always adjusted together, so they remain in sync as they are mapped onto the unified timeline.
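One simple way to picture re-timestamping is as a single offset per source, fixed when the first packet arrives and then applied to every track from that source. The sketch below assumes exactly that model; the `Retimestamper` class is hypothetical and is not taken from the Norsk SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Retimestamper:
    """Maps one source's timestamps onto the local timeline (hypothetical helper)."""

    offset_ns: Optional[int] = None

    def retimestamp(self, source_ts_ns: int, local_now_ns: int) -> int:
        # On the first packet, pin the source's timeline to the local clock.
        if self.offset_ns is None:
            self.offset_ns = local_now_ns - source_ts_ns
        # Every later packet, audio or video, gets the same offset,
        # so the spacing between tracks is preserved.
        return source_ts_ns + self.offset_ns


srt = Retimestamper()
video_local = srt.retimestamp(source_ts_ns=90_000, local_now_ns=5_000_000)
audio_local = srt.retimestamp(source_ts_ns=90_500, local_now_ns=5_000_900)
```

Here the audio sample was 500 ns after the video frame in the source's own timeline, and it is still 500 ns after it on the local timeline, even though it arrived 900 ns later.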

Once all incoming sources share the same clock domain, Norsk can mix, switch, and composite them predictably. The internal timeline becomes the reference against which every frame, sample, and output is measured.

Handling Embedded Timecodes and Media Timestamps

Many formats carry their own timing information, such as Presentation Timestamp (PTS) values in MPEG transport streams or SEI timecodes in H.264. Norsk uses these timestamps, when available, to help align related sources, but it doesn’t treat them as absolute truth. If two sources share matching embedded timecodes, Norsk can apply identical offsets when re-timestamping, preserving their original synchronization while still normalizing them to the local clock.
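The idea of identical offsets for sources with matching timecodes can be sketched as follows. This is an assumption-laden illustration: the `TimecodeGroup` class is invented for this example, and it assumes the sources' embedded timecodes really do come from the same reference.

```python
from typing import Optional


class TimecodeGroup:
    """Sources whose embedded timecodes share one reference (hypothetical)."""

    def __init__(self) -> None:
        self.offset_ns: Optional[int] = None

    def to_local(self, timecode_ns: int, local_now_ns: int) -> int:
        # The first frame seen from any source in the group fixes the offset.
        if self.offset_ns is None:
            self.offset_ns = local_now_ns - timecode_ns
        # Later frames reuse that offset regardless of when they arrive.
        return timecode_ns + self.offset_ns


group = TimecodeGroup()
cam_a = group.to_local(timecode_ns=1_000_000_000, local_now_ns=50_000_000_000)
# Camera B's frame carries the same timecode but arrives 120 ms later.
cam_b = group.to_local(timecode_ns=1_000_000_000, local_now_ns=50_120_000_000)
```

Both frames land on the same local timestamp, so the relationship encoded in the timecodes survives the move to the local clock. Had each camera been given its own arrival-based offset, the two frames would end up 120 ms apart.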

That balance of honoring incoming timing relationships without depending on them is key. Real-world encoders often output incorrect or drifting timestamps, so Norsk treats external time as a hint rather than a guarantee.

Dealing with Drift and Jitter

Even after re-timestamping, clocks can drift. Hardware devices like SDI encoders are notorious for this: They might claim 25 frames per second, but actually output slightly more or less than that. Over hours, that difference adds up. Norsk continuously monitors and compensates for such drift, keeping the effective frame and sample rates correct over time.
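To get a feel for the scale, consider a device that claims 25 fps but actually delivers 25.001 fps. The figures below are illustrative, not measurements, and the two helper functions are hypothetical rather than part of Norsk.

```python
def drift_ms(nominal_fps: float, actual_fps: float, hours: float) -> float:
    """Timing error accumulated when a device runs at actual_fps but claims nominal_fps."""
    seconds = hours * 3600
    extra_frames = (actual_fps - nominal_fps) * seconds
    # Each extra frame represents one nominal frame duration of error.
    return extra_frames / nominal_fps * 1000


def measured_fps(first_arrival_s: float, last_arrival_s: float, frame_count: int) -> float:
    """Estimate the real frame rate from local arrival times over a long window."""
    # frame_count frames span (frame_count - 1) frame intervals.
    return (frame_count - 1) / (last_arrival_s - first_arrival_s)


one_hour_error = drift_ms(nominal_fps=25.0, actual_fps=25.001, hours=1.0)
observed_rate = measured_fps(first_arrival_s=0.0, last_arrival_s=3600.0, frame_count=90_001)
```

A rate error of 0.001 fps produces 3.6 extra frames per hour, or roughly 144 ms of error, which is well beyond what viewers tolerate for lip sync. Comparing the frame count against elapsed local time over a long window, as `measured_fps` does, is one straightforward way to detect such a mismatch.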

Then there’s clock jitter: the slight, unpredictable delays that occur because Norsk runs as software in containers on a non-real-time operating system, where other processes can interrupt it. If Norsk reads the time, is interrupted, and acts on that reading a few milliseconds later, precision suffers. Norsk accounts for this as well, using smoothing and feedback mechanisms to maintain stable synchronization even under load.
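The article does not describe which smoothing mechanism Norsk uses, so the sketch below substitutes a generic technique, an exponential moving average, to show how smoothing tames jitter. The `SmoothedOffset` class, the `alpha` value, and the jitter figures are all assumptions made for illustration.

```python
class SmoothedOffset:
    """Exponentially smoothed estimate of (local arrival time - source timestamp)."""

    def __init__(self, alpha: float = 0.05) -> None:
        self.alpha = alpha      # smaller alpha means heavier smoothing
        self.value_ms = None    # no estimate until the first observation

    def update(self, source_ts_ms: float, arrival_ms: float) -> float:
        observed = arrival_ms - source_ts_ms
        if self.value_ms is None:
            self.value_ms = observed
        else:
            # Move only a fraction of the way toward each noisy observation.
            self.value_ms += self.alpha * (observed - self.value_ms)
        return self.value_ms


est = SmoothedOffset(alpha=0.05)
jitter_ms = [0, 3, -2, 4, -1, 2, -3, 1]  # made-up scheduling delays
for i, jitter in enumerate(jitter_ms):
    ts = i * 40.0                        # 25 fps source timestamps
    est.update(ts, ts + 1000.0 + jitter) # true offset is 1000 ms
```

Individual observations swing by up to 4 ms, but the smoothed estimate stays within a fraction of a millisecond of the true 1000 ms offset. The trade-off is responsiveness: heavier smoothing reacts more slowly to genuine changes such as drift.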

Fine-Tuning with Nudges

Sometimes, two sources need to be manually aligned when they arrive out of sync. Norsk provides nudge controls that let operators adjust the relative timing between sources or between audio and video tracks within a single source. There are two types of nudges within Norsk:

  • Intra-source nudge lets you adjust the relative timing of streams within a single source. For example, if you have a video track and three audio tracks coming in via SRT, and one audio track is slightly ahead of or behind the others, you can nudge each track independently to bring them back into alignment.
  • Inter-source nudge enables you to align different sources with one another. If you have two cameras covering the same 100-meter race, you can adjust one relative to the other to ensure the starter gun’s smoke appears simultaneously in both shots.
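The two kinds of nudge can be modeled as two layers of offset applied on top of the local timeline. The sketch below is a hypothetical model only: `NudgeConfig`, its field names, and the source and track identifiers are invented here and are not Norsk's API.

```python
from dataclasses import dataclass, field


@dataclass
class NudgeConfig:
    """Hypothetical nudge settings; names are illustrative, not Norsk's API."""

    # Inter-source nudges, keyed by source name.
    source_nudge_ms: dict = field(default_factory=dict)
    # Intra-source nudges, keyed by (source name, track name).
    track_nudge_ms: dict = field(default_factory=dict)

    def apply(self, source: str, track: str, local_ts_ms: float) -> float:
        inter = self.source_nudge_ms.get(source, 0.0)
        intra = self.track_nudge_ms.get((source, track), 0.0)
        # Positive values delay the media; negative values bring it forward.
        return local_ts_ms + inter + intra


nudges = NudgeConfig()
nudges.track_nudge_ms[("srt-1", "audio-3")] = -40.0  # bring one audio track forward 40 ms
nudges.source_nudge_ms["camera-2"] = 80.0            # delay the whole second camera 80 ms

audio_3 = nudges.apply("srt-1", "audio-3", 1000.0)
video_1 = nudges.apply("srt-1", "video", 1000.0)
camera_2 = nudges.apply("camera-2", "video", 1000.0)
```

The intra-source nudge moves only the one audio track, leaving the rest of the SRT source untouched, while the inter-source nudge shifts every track of the second camera by the same amount.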

The traditional way to check this alignment is with a clapper board, the classic film production tool. Have someone hold a clapper board where both cameras can see it. When they clap it shut, both the visual (the board closing) and the audio (the clap) are instantly recognizable in the recording. An operator can use this reference to nudge sources into alignment before an event begins, and they’ll stay synchronized for the duration. In Norsk Studio, you can control this using a sync component, shown below.

 

A complex workflow in Norsk Studio showing audio sync capability
A media control room in the cloud workflow in Norsk Studio. The audio commentary link is highlighted in the blue circle.
A zoomed-in portion of a Norsk Studio workflow showing audio time sync
A closer look at the audio commentary sync node in Norsk Studio

Get in touch or set up a demo to learn more about how Norsk makes sure your streams are in sync.

Author

  • Kelvin Kirima is a developer at id3as, proficient in Rust, JavaScript/TypeScript, PostgreSQL, and web technologies such as HTML and CSS.
