Video Encoding and Transcoding: Complete Guide with Norsk Solutions

Video encoding in the Norsk Studio interface

If you are streaming video, you have likely encountered the terms encoding and transcoding. In fact, they’re the fundamental building blocks of the entire streaming media universe. Video encoding refers to compressing raw video into a smaller digital format suitable for transmission. Transcoding converts already encoded video into a different format or bitrate to ensure compatibility across various devices and platforms. This article examines the fundamentals of video encoding and transcoding, highlighting why Norsk is well-positioned to meet your encoding and transcoding needs.

Why does video need to be encoded and transcoded?

Raw video files are enormous; a single minute of uncompressed 4K video can consume over 12GB of storage. The primary purpose of encoding is to reduce the size of the data for easier storage or transmission. Video streaming engineers utilize specialized algorithms, known as codecs, for this reduction. The chosen video codec serves as a template that describes how the resulting data should be interpreted and ultimately decoded for playback. There are numerous video codecs to choose from, with the most popular being H.264 (also known as AVC), H.265 (HEVC), VP9, and AV1. As with most technology decisions, the particular codec you choose depends on your specific needs, so ensure that the streaming technologies and services you select are compatible with your requirements.

Transcoding takes this process further by converting already-encoded video into different formats, resolutions, or bitrates. This is essential for adaptive bitrate streaming (ABR), where you deliver the same video stream at multiple quality levels to accommodate various devices and network conditions. For example, a viewer on a smartphone with limited bandwidth receives a lower resolution and lower bitrate stream. In contrast, someone with a high-speed connection and a 4K display receives the premium-quality version.
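The rendition-selection logic a player performs can be sketched in a few lines. The ladder below is purely illustrative (not Norsk defaults), and real players also factor in buffer health and screen size:

```python
# Illustrative ABR ladder, sorted highest bitrate first: (height, bitrate_kbps)
LADDER = [
    (2160, 16000),
    (1080, 6000),
    (720, 3000),
    (480, 1200),
    (360, 700),
]

def pick_rendition(bandwidth_kbps, headroom=0.8):
    """Choose the highest-bitrate rendition that fits within the
    measured bandwidth, keeping some headroom for network jitter."""
    budget = bandwidth_kbps * headroom
    for height, kbps in LADDER:
        if kbps <= budget:
            return (height, kbps)
    return LADDER[-1]  # fall back to the lowest rung

print(pick_rendition(4000))   # a 4 Mbps connection gets the 720p rendition
print(pick_rendition(50000))  # a fast connection gets the 4K rendition
```

Real ABR players re-run a decision like this continuously, switching renditions as measured throughput changes.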

How is video encoded?

Uncompressed video stores the color and brightness value of every pixel in every frame, ensuring perfect image quality for each frame. This results in extremely large file sizes. 
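The arithmetic behind those file sizes is straightforward. The sketch below assumes 8-bit 4:2:0 chroma subsampling (1.5 bytes per pixel); fully uncompressed RGB would be even larger at 3 bytes per pixel:

```python
def raw_size_bytes(width, height, fps, seconds, bytes_per_pixel=1.5):
    """Uncompressed size: every pixel of every frame is stored.
    bytes_per_pixel=1.5 assumes 8-bit 4:2:0 chroma subsampling."""
    return int(width * height * bytes_per_pixel * fps * seconds)

# One minute of 4K (3840x2160) at 30 fps
one_minute_4k = raw_size_bytes(3840, 2160, 30, 60)
print(one_minute_4k / 1e9)  # ~22.4 GB -- comfortably over the 12 GB cited
```

Even at lower frame rates or with deeper subsampling, a minute of raw 4K stays in the tens of gigabytes, which is why compression is non-negotiable for delivery.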

Encoding achieves compression by analyzing input files for patterns and redundancies and then storing these patterns more efficiently. Instead of storing the color value of every single pixel in every frame, encoders look for similarities across frames and represent them more compactly.

Different video codecs employ increasingly advanced techniques to identify and encode these patterns. When the encoding process retains every detail and the original file can be perfectly reconstructed, the result is called lossless compression. If some data is discarded to achieve better compression and the original can’t be fully recovered, it’s referred to as lossy.

Codecs employ what’s known as temporal compression to analyze how frames change over time. Rather than storing complete information for every frame, encoders create different types of frames:

  • I-frames (intra-coded frames) contain all the data needed to display a complete image.
  • P-frames (predicted frames) store only the differences from previously decoded frames, so those frames must be decoded first.
  • B-frames (bi-directional predicted frames) can reference both earlier and later frames, both of which must be decoded before the B-frame can be displayed.

The gap between I-frames is known as the keyframe interval or GOP (group of pictures) size. For instance, with a keyframe interval of 2 seconds and a frame rate of 30 frames per second (fps), an I-frame appears every 60 frames. As a rule of thumb, longer intervals improve compression efficiency (i.e., better quality at a given bitrate), but mean that players can experience delays at playback start or when switching between qualities in an ABR setup, as they need an I-frame to start the decoding process.
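The keyframe-interval arithmetic above can be written out explicitly:

```python
def frames_per_gop(keyframe_interval_s, fps):
    """Number of frames between successive I-frames (the GOP size)."""
    return int(keyframe_interval_s * fps)

def iframe_positions(total_frames, keyframe_interval_s, fps):
    """Frame indices where I-frames land, assuming a fixed GOP size."""
    gop = frames_per_gop(keyframe_interval_s, fps)
    return list(range(0, total_frames, gop))

print(frames_per_gop(2, 30))         # 60, matching the example in the text
print(iframe_positions(300, 2, 30))  # [0, 60, 120, 180, 240]
```

A player joining mid-stream (or switching ABR renditions) must wait for the next of these positions before it can begin decoding, which is the trade-off against the efficiency gains of longer intervals.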

Choosing the right encoder

Selecting the appropriate encoder requires evaluating multiple factors to ensure optimal performance, quality, and compatibility. Here are a few key considerations:

  • Codec support: Ensure the encoder supports all the video codecs you intend to use, such as H.264, H.265, VP9, or AV1. While H.264 offers broad compatibility, newer codecs like H.265 (HEVC) and AV1 provide superior compression efficiency for a given quality but require more processing power and may have varying levels of device support. Some encoders are limited to a narrow set of codecs, while others offer broad support with advanced configuration options.
  • Resolution and frame rate support: Does the encoder support the resolutions you need (e.g., 1080p, 4K, 8K)? Can it handle the desired frame rates (e.g., 30 fps, 60 fps)? Higher resolutions and frame rates require more computational resources from the encoder and result in larger file sizes. Therefore, ensure the encoder can efficiently handle your target specifications.
  • Bitrate control and quality: The bitrate directly influences output quality and file size. A robust encoder offers flexible bitrate control options, including constant bitrate (CBR) for consistent bandwidth usage (ideal for live streams) and variable bitrate (VBR) for optimizing quality within a target file size.
  • Encoding speed and performance (hardware vs. software): Encoders can leverage either software (CPU-based) or hardware (GPU or dedicated chip-based) for processing. Hardware encoding can provide significantly faster performance and lower power consumption. That makes it ideal for high-volume or real-time applications, such as live streaming. Software encoding, while typically slower, often provides greater flexibility and higher quality at very low bitrates. Some codecs, such as AV1, are so computationally intensive that for live content, they may only be practical with hardware encoding.
  • Input/output compatibility and streaming protocols: An encoder must integrate with your existing infrastructure. Consider its supported input sources (e.g., professional cameras, IP feeds), output container formats (e.g., MP4, TS, WebM), and essential streaming protocols (e.g., RTMP, HLS, MPEG-DASH, SRT). Compatibility here ensures your streaming platform can ingest your encoded content and deliver it to your audience.
  • Scalability and management features: As your streaming needs grow, your encoder solution should scale with you. Consider features like API integration for automated workflows, robust monitoring tools, and the ability to handle multiple simultaneous encoding jobs. 

Why choose Norsk for your video encoding and transcoding?

Advanced codec support

Norsk offers broad codec support, ranging from popular codecs like H.264 to newer standards such as H.265 (HEVC) and AV1. It also supports advanced audio formats, including AAC, MP3, Dolby Digital (AC3), Dolby Digital Plus (E-AC3), and Opus. This comprehensive audio support ensures broadcast-quality sound across all delivery formats.

Comprehensive input and output support

Norsk supports a wide range of input sources, including RTMP, SRT, WebRTC, Blackmagic DeckLink, SDI, HDMI, UDP TS, MP4 files, and even signal generators for testing. This flexibility enables you to ingest content from professional cameras, IP feeds, file-based workflows, or live streaming sources without compatibility concerns.

For outputs, Norsk supports major streaming protocols, including HLS, MPEG-DASH, CMAF, WebRTC, SRT, RTMP, and UDP TS, as well as file outputs in MP4 and transport stream formats. This comprehensive protocol support ensures your content reaches audiences on a wide range of platforms and devices.

Hardware acceleration

In addition to software encoding via x264 and x265, Norsk supports hardware acceleration on GPUs and specialized video processing units (VPUs) to maximize throughput and offload tasks from the CPU, thereby reducing both costs and energy consumption. Norsk currently supports both the NETINT Quadra VPU and the NVIDIA family of GPUs.

Infrastructure agnostic

Norsk runs on cloud platforms such as Akamai, Google Cloud Platform, and AWS, but you can also run it on-premises, which opens up additional options, including leveraging various high-grade VPUs.

Simple to use

One great way to take advantage of Norsk is through Norsk Studio, which offers a simple drag-and-drop interface for creating live media workflows.

In the above screen capture, we take a live SRT stream and package it into a six-rendition ABR ladder before outputting it as low latency HLS. You’ll also notice an explicit audio encode, which handles the audio processing. You can do all this by simply dragging the component boxes to the canvas, connecting them, and hitting play. 

Video encoding in the Norsk Studio interface

You can configure or change various encode settings both from the Studio UI and via simple YAML files. This gives you full control over all of the options each codec implementation makes available.

Configuring encoding and transcoding settings in Norsk Studio

Video encoding and transcoding are critical pieces of the streaming infrastructure, transforming raw video into optimized formats that can reach audiences anywhere, on any device. A platform like Norsk allows you to encode and transcode live video with speed, ease, and reliability.


Author

  • Kelvin Kirima is a developer at id3as, proficient in Rust, JavaScript/TypeScript, PostgreSQL, and web technologies such as HTML and CSS.
