HLS – Bitmovin
https://bitmovin.com

Providing a Premium Audio Experience in HLS with the Bitmovin Encoder
https://bitmovin.com/premium-hls-audio/ – Mon, 01 Jul 2024

Introduction

Many streaming providers are looking for ways to offer a more premium, high-quality experience to their users. One often overlooked component in streaming quality is audio – more specifically, which audio bitrates, channel layouts, and even audio languages are available and how these options can be delivered to viewers on a range of devices. While there are many ways of improving video streaming quality and experience, such as Per-Title Encoding, Multi-Bitrate Video, High Dynamic Range (HDR), and high resolutions, there are also some great ways of enhancing a user’s experience with premium HLS audio. Some of the most important considerations for audio streaming are:

  • Adaptive Streaming: serving multiple audio bitrates for various streaming conditions
  • Reduced Bandwidth & Device Compatibility: multi-codec audio for better compression at reduced bitrates
  • Improved User Experience: 5.1 (or greater) surround sound or even lossless audio
  • Accessibility and Localization: such as multi-language or descriptive audio

You can learn even more about how audio encoding affects the streaming experience in this blog.

In Bitmovin’s 2023-24 Video Developer Report, we saw that immersive audio ranked in the top 15 areas for innovation, while audio transcription was the #1 ranked use-case for AI and ML. Furthermore, though AAC remains the most widely used audio codec – mostly due to its wide device support – we see that Dolby Digital/+ and Dolby Atmos are the #2 and #3 ranked audio codecs that streaming companies are either currently supporting or planning to support in the near future.

Audio codec usage – source: Bitmovin Video Developer Report

With HLS and its multivariant approach, this is all possible, but understanding just how to construct and organize your HLS multivariant playlist can be tricky at first. In this tutorial we will take a look at some best practices in HLS for serving alternate audio renditions, as well as an example at the end of this article showcasing how to do this simply using the Bitmovin Encoder.

Basic audio stream packaging

The most basic way to package audio for HLS is to mux the audio track with each video track. This works for very simple configurations where you are only outputting a single AAC Stereo audio track at a single given bitrate. While the benefit of this approach is simplicity, it has many limitations, such as not being able to support multi-channel surround sound, advanced codecs, or multi-language audio. Keeping audio and video muxed together also leads to inefficient storage and delivery, since each video variant duplicates the audio. Demuxing audio and video additionally allows the use of containers like fragmented MP4 or CMAF, which are more performant for client devices since they don’t have to demux or transmux the segments in real time.

A multivariant playlist output for this would look something like:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-STREAM-INF:BANDWIDTH=4255267,AVERAGE-BANDWIDTH=4255267,CODECS="avc1.4d4032,mp4a.40.2",RESOLUTION=2560x1440
manifest_1.m3u8

#EXT-X-STREAM-INF:BANDWIDTH=3062896,AVERAGE-BANDWIDTH=3062896,CODECS="avc1.4d4028,mp4a.40.2",RESOLUTION=1920x1080
manifest_2.m3u8

#EXT-X-STREAM-INF:BANDWIDTH=1591232,AVERAGE-BANDWIDTH=1591232,CODECS="avc1.4d4028,mp4a.40.2",RESOLUTION=1600x900
manifest_3.m3u8

#EXT-X-STREAM-INF:BANDWIDTH=1365632,AVERAGE-BANDWIDTH=1365632,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=1280x720
manifest_4.m3u8

#EXT-X-STREAM-INF:BANDWIDTH=862995,AVERAGE-BANDWIDTH=862995,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=960x540
manifest_5.m3u8

Audio/Video demuxing

A better approach is to demux the Audio and Video tracks – luckily, HLS makes this simple through the use of EXT-X-MEDIA playlists, which are the standard way of declaring alternate content renditions for audio, subtitles, closed captions, or video (mostly used for alternative viewing angles, such as in live sports). With the use of EXT-X-MEDIA to decouple audio from video, we can add many great audio features such as alternate/dubbed language tracks, surround sound tracks, multiple audio qualities, and multi-codec audio.

By supplying audio tracks with EXT-X-MEDIA tags, we can explicitly add each audio track that we want to output as well as group them together – then we can correlate each Video Variant (EXT-X-STREAM-INF) to one of the grouped Audio Media Playlists.

Using the previous example of a single AAC Stereo Audio track, a demuxed audio/video output would look like:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-INDEPENDENT-SEGMENTS

#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="AAC_Stereo",LANGUAGE="en",NAME="English - Stereo",AUTOSELECT=YES,DEFAULT=YES,URI="audio_aac.m3u8"

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4032,mp4a.40.2",RESOLUTION=2560x1440,AUDIO="AAC_Stereo"
manifest_1.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4028,mp4a.40.2",RESOLUTION=1920x1080,AUDIO="AAC_Stereo"
manifest_2.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4028,mp4a.40.2",RESOLUTION=1600x900,AUDIO="AAC_Stereo"
manifest_3.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=1280x720,AUDIO="AAC_Stereo"
manifest_4.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=960x540,AUDIO="AAC_Stereo"
manifest_5.m3u8

Here, you can first see that we declare a single Audio Media (EXT-X-MEDIA) playlist for our audio track and give it a GROUP-ID attribute value of "AAC_Stereo". Then each Video Variant EXT-X-STREAM-INF tag uses the AUDIO attribute to associate its video track with the Audio Media group "AAC_Stereo".

Multiple audio bitrates

But now let’s imagine we want to better optimize our Adaptive Streaming by delivering our AAC Stereo audio in multiple bitrates, such as high (196kbps) and low (64kbps), so that the higher resolution Video Variants can take advantage of higher quality+bitrate audio, given the increase in bandwidth when streaming those variants. We can accomplish this by encoding our audio with both low and high bitrate outputs and grouping them separately – then deciding which Video Variant gets which audio bitrate/quality. For example, our 720p or below variants get the lower quality audio by default, and our full HD or above variants get the higher quality audio by default. Think of these as defaults though, because most modern Players that stream HLS will allow for independently picking which audio quality to play based on Adaptive-Bitrate streaming conditions.

An example utilizing a low and a high AAC Stereo track would look like:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-INDEPENDENT-SEGMENTS

#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac-stereo-64",LANGUAGE="en",NAME="English - Stereo",AUTOSELECT=YES,DEFAULT=YES,URI="audio_aac_64k.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac-stereo-196",LANGUAGE="en",NAME="English - Stereo",AUTOSELECT=YES,DEFAULT=NO,URI="audio_aac_196k.m3u8"

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4032,mp4a.40.2",RESOLUTION=2560x1440,AUDIO="aac-stereo-196"
manifest_1.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4028,mp4a.40.2",RESOLUTION=1920x1080,AUDIO="aac-stereo-196"
manifest_2.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4028,mp4a.40.2",RESOLUTION=1600x900,AUDIO="aac-stereo-196"
manifest_3.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=1280x720,AUDIO="aac-stereo-64"
manifest_4.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=960x540,AUDIO="aac-stereo-64"
manifest_5.m3u8

In this example, we now have two audio tracks, one for each bitrate, and therefore two Audio Media (EXT-X-MEDIA) playlists defined, each having a unique GROUP-ID attribute but the same NAME attribute. This is a good way of declaring that the audio tracks are the same language, channel configuration, and codec, but at different qualities. Now, we can declare that each Video Variant (EXT-X-STREAM-INF) that is 720p or less sets the AUDIO group for that variant to the low bitrate Audio Track (GROUP-ID="aac-stereo-64") and those variants above 720p get the higher bitrate AUDIO group (GROUP-ID="aac-stereo-196") by default (but again, most Players can manage the audio tracks independently for optimal adaptive streaming).

This is at least an improvement on the previous single-bitrate audio packaging – but still, there are plenty of enhancements we can make!

More efficient AAC

The previous examples all rely on Low Complexity AAC (AAC-LC) because this basic audio codec is supported by every playback device. It is necessary to always have at least one AAC-LC track to be able to support older devices. However, most devices these days can support more efficient versions of AAC such as High Efficiency AAC (AAC-HE), which comes in two main versions: v2, which is used for bitrates up to 48kbps, and v1, which is used for bitrates up to 96kbps.

So let’s adapt our previous example to not rely on 2 (or more) different AAC-LC audio tracks, and instead output one AAC-HE v1, one AAC-HE v2, and one AAC-LC rendition. The tricky part here is that we will want to group each of the above into a different GROUP-ID so that the Player client can decide which to use based on which codecs it supports – but we also will want each Video Variant to be able to use any of those audio tracks. To accomplish this, all we need to do is duplicate each Video Variant for each of the 3 unique Audio Media GROUP-IDs.

A note on grouping audio renditions

The Apple authoring spec recommends creating one audio group for each pair of codec and channel count.

We now have 3 different versions of the AAC codec, so we will have 3 different audio groups.

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-INDEPENDENT-SEGMENTS

#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac_lc-stereo-128k",LANGUAGE="en",NAME="English - Stereo",AUTOSELECT=YES,DEFAULT=YES,URI="audio_aaclc_128k.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac_he1-stereo-64k",LANGUAGE="en",NAME="English - Stereo",AUTOSELECT=YES,DEFAULT=NO,URI="audio_aache1_64k.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac_he2-stereo-32k",LANGUAGE="en",NAME="English - Stereo",AUTOSELECT=YES,DEFAULT=NO,URI="audio_aache2_32k.m3u8"

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4032,mp4a.40.2",RESOLUTION=2560x1440,AUDIO="aac_lc-stereo-128k"
manifest_1.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4032,mp4a.40.5",RESOLUTION=2560x1440,AUDIO="aac_he1-stereo-64k"
manifest_1.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4032,mp4a.40.29",RESOLUTION=2560x1440,AUDIO="aac_he2-stereo-32k"
manifest_1.m3u8

## Repeat above approach for each additional Video Variant

In this example, you can see that we replicated the 1440p variant 3 times – one for each Audio Media GROUP-ID – which would then be repeated for each additional Video Variant. This allows the client Player to decide, for a given Video Variant, which audio track group to use based upon codec support and streaming conditions. Also take note how each Video Variant’s CODECS attribute is updated to include the corresponding audio codec identifier.

Surround sound audio

Now, let’s say we also want to be able to support 5.1 surround sound for those clients that can benefit from it. For this we can decide which surround sound codec we want to support. Let’s use Dolby Digital AC-3 for this example. Since we are now relying on a more advanced audio codec for the optimal surround experience, it is also important to consider devices that may have 5.1 or greater speaker setups but can NOT support Dolby Digital. For these we will also include a secondary 5.1 track using the basic AAC-LC codec. Now, we will create 2 new Audio Media playlists with unique GROUP-ID and NAME attributes.

A note on downmixing from 5.1 audio sources

In this example, we will assume the source has a Dolby Digital surround audio track. From that single audio source, we will create our AC-3 surround track, implicitly convert to our AAC surround track, and automatically downmix the source 5.1 to our various AAC 2.0 Stereo outputs using the Bitmovin Encoder, which is shown in sample code at the bottom of this article. Alternatively, you can do all sorts of mixing and channel-swapping, as well as work with distinct audio input files, such as separate files for each channel. You can learn more about that here.

Don’t forget about grouping audio renditions

As previously mentioned, the Apple authoring spec recommends creating one audio group for each pair of codec and channel count.

We now have 5 different unique combinations of codec and channel count, so we will have 5 different audio groups.

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-INDEPENDENT-SEGMENTS

#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac_lc-stereo-128k",LANGUAGE="en",NAME="English - Stereo",AUTOSELECT=YES,DEFAULT=YES,URI="audio_aac_128k.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac_he1-stereo-64k",LANGUAGE="en",NAME="English - Stereo",AUTOSELECT=YES,DEFAULT=NO,URI="audio_aache1_64k.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac_he2-stereo-32k",LANGUAGE="en",NAME="English - Stereo",AUTOSELECT=YES,DEFAULT=NO,URI="audio_aache2_32k.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac_lc-5_1-320k",LANGUAGE="en",NAME="English - 5.1",AUTOSELECT=YES,DEFAULT=NO,URI="audio_aac_lc_5_1_320k.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="dolby",LANGUAGE="en",NAME="English - Dolby",CHANNELS="6",URI="audio_dolbydigital.m3u8"

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4032,mp4a.40.2",RESOLUTION=2560x1440,AUDIO="aac_lc-stereo-128k"
manifest_1.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4032,mp4a.40.5",RESOLUTION=2560x1440,AUDIO="aac_he1-stereo-64k"
manifest_1.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4032,mp4a.40.29",RESOLUTION=2560x1440,AUDIO="aac_he2-stereo-32k"
manifest_1.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4032,mp4a.40.29",RESOLUTION=2560x1440,AUDIO="aac_lc-5_1-320k"
manifest_1.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4032,ac-3",RESOLUTION=2560x1440,AUDIO="dolby"
manifest_1.m3u8


## Repeat above approach for each additional Video Variant

Here you can see that now we have the 1440p variant replicated a total of 5 times, once for each Audio Media GROUP-ID which allows the client Player to select the most appropriate audio and video track combination.

Again, note how each duplicated Video Variant has an updated CODECS attribute to represent the appropriate audio codec associated with it. One major reason we duplicate each Video Variant for each Audio Media GROUP-ID is that most devices cannot handle switching between audio codecs during playback; so as the Adaptive-Bitrate logic on the Player switches between different Video Variants, it will pick the variant that has the same audio codec it has been using. Additionally, in HLS, we cannot simply list the Video Variant once and add all of the various audio codecs to the CODECS attribute. This is because, per HLS, the client device MUST be able to support all of the CODECS mentioned on a given Video Variant (EXT-X-STREAM-INF) to avoid possible playback failures. So instead, we separate out the Video Variants per codec + channel count combination.

Multi-language audio

This is all great, but what if I want to support additional dubbed audio language tracks or even Descriptive Audio tracks? Luckily, that is rather simple to do. We can just create additional Audio Media playlists for each language and add them to the existing GROUP-IDs – which are already logically grouped by codec and channel pairing per the Apple authoring spec – depending on which codecs and formats we want to support.

#EXTM3U
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-VERSION:6
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="AAC-HE-V1-Stereo",NAME="English-Stereo",LANGUAGE="en",DEFAULT=NO,URI="audio_aache1_stereo.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="AAC-HE-V1-Stereo",NAME="Spanish-Stereo",LANGUAGE="es",DEFAULT=NO,URI="audio_aache1_stereo_es.m3u8"

#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="AAC-HE-V2-Stereo",NAME="English-Stereo",LANGUAGE="en",DEFAULT=NO,URI="audio_aache2_stereo.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="AAC-HE-V2-Stereo",NAME="Spanish-Stereo",LANGUAGE="es",DEFAULT=NO,URI="audio_aache2_stereo_es.m3u8"

#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="AAC-LC-5.1",NAME="English-5.1",LANGUAGE="en",DEFAULT=NO,URI="audio_aaclc-5_1.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="AAC-LC-5.1",NAME="Spanish-5.1",LANGUAGE="es",DEFAULT=NO,URI="audio_aaclc-5_1_es.m3u8"

#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="AAC-LC-Stereo",NAME="English-Stereo",LANGUAGE="en",DEFAULT=NO,URI="audio_aaclc_stereo.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="AAC-LC-Stereo",NAME="Spanish-Stereo",LANGUAGE="es",DEFAULT=NO,URI="audio_aaclc_stereo_es.m3u8"

#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="AC-3-5.1",NAME="English-Dolby",LANGUAGE="en",CHANNELS="6",DEFAULT=NO,URI="dolby-ac3-5_1.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="AC-3-5.1",NAME="Spanish-Dolby",LANGUAGE="es",CHANNELS="6",DEFAULT=NO,URI="dolby-ac3-5_1_es.m3u8"

#EXT-X-STREAM-INF:...,CODECS="avc1.4D401F,ac-3",RESOLUTION=1280x720,AUDIO="AC-3-5.1".0
video_720_3000000.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4D401F,mp4a.40.29",RESOLUTION=1280x720,AUDIO="AAC-HE-V2-Stereo".0
video_720_3000000.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4D401F,mp4a.40.2",RESOLUTION=1280x720,AUDIO="AAC-LC-Stereo".0
video_720_3000000.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4D401F,mp4a.40.2",RESOLUTION=1280x720,AUDIO="AAC-LC-5.1".0
video_720_3000000.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4D401F,mp4a.40.5",RESOLUTION=1280x720,AUDIO="AAC-HE-V1-Stereo".0
video_720_3000000.m3u8

How does this differ from DASH?

In DASH, demuxed Audio and Video tracks are grouped into separate AdaptationSets for a given Period. This means each Video AdaptationSet is not directly linked to one specific Audio track; rather, the client Player independently picks a Video Representation from the Video AdaptationSet and an Audio Representation from the Audio AdaptationSet. So with DASH, we don’t have to worry about re-stating Video tracks for each group of Audio tracks, as they are managed independently of each other.
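
To make the contrast concrete, below is a minimal, illustrative DASH Period with demuxed audio and video. The IDs, codecs, bitrates, and duration are placeholder values only, not taken from the HLS examples above.

<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static" mediaPresentationDuration="PT10M">
  <Period>
    <!-- Video AdaptationSet: the player picks one Representation via its ABR logic -->
    <AdaptationSet contentType="video" mimeType="video/mp4">
      <Representation id="video_1080p" codecs="avc1.4d4028" bandwidth="3000000" width="1920" height="1080"/>
      <Representation id="video_720p" codecs="avc1.4d401f" bandwidth="1500000" width="1280" height="720"/>
    </AdaptationSet>
    <!-- Audio AdaptationSet: selected independently of the chosen video Representation -->
    <AdaptationSet contentType="audio" mimeType="audio/mp4" lang="en">
      <Representation id="audio_aac_stereo_64k" codecs="mp4a.40.2" bandwidth="64000" audioSamplingRate="48000"/>
      <Representation id="audio_aac_stereo_128k" codecs="mp4a.40.2" bandwidth="128000" audioSamplingRate="48000"/>
    </AdaptationSet>
  </Period>
</MPD>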

Additional notes

The video codecs you choose to support may also determine which audio codecs and container formats you use. For example, if you encode video to VP9 you may want to consider using the Vorbis or Opus audio codecs.

In this example, we used AC-3 for Dolby Digital 5.1, but you may consider using Enhanced AC-3, more commonly referred to as E-AC-3, for additional channel support (such as 7.1 or more) or spatial audio support like Dolby Atmos. Other premium surround sound codec options are DTS-HD and DTS:X.

Premium HLS audio example with the Bitmovin Encoder & Manifest Generator

The GitHub sample linked below is a pseudo-code example using the Bitmovin JavaScript/TypeScript SDK that demonstrates outputting multi-bitrate, multi-codec, multi-channel, and multi-language audio tracks. This can greatly enhance the user’s experience, as it allows for streaming the best quality and most appropriate audio for each device’s codec support and speaker channel configuration.

With the Bitmovin Encoder, we can use one master (Dolby Digital surround in this example) audio file/stream for each language and easily downmix it to 2.0 stereo or implicitly convert it to AAC 5.1. Then, once we have created each desired audio track, we use the Bitmovin Manifest Generator to create our HLS multivariant playlists.

Encoding Example For HLS With Multiple Audio Layers
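
For orientation only, the snippet below is a minimal sketch of how a couple of the audio codec configurations could be created with the Bitmovin JavaScript/TypeScript API SDK. The class, enum, and endpoint names follow the Bitmovin Open API SDK naming conventions but should be treated as assumptions here; the linked GitHub sample is the authoritative reference.

// Minimal sketch – names and bitrates below are assumptions; see the linked sample for the real code.
import BitmovinApi, {
  AacAudioConfiguration,
  AacChannelLayout,
  Ac3AudioConfiguration
} from '@bitmovin/api-sdk';

const bitmovinApi = new BitmovinApi({ apiKey: '<YOUR_BITMOVIN_API_KEY>' });

async function createAudioConfigurations() {
  // 2.0 stereo AAC-LC at 128 kbps – the encoder downmixes this from the 5.1 source
  const aacLcStereo = await bitmovinApi.encoding.configurations.audio.aac.create(
    new AacAudioConfiguration({
      name: 'AAC-LC Stereo 128k',
      bitrate: 128000,
      channelLayout: AacChannelLayout.CL_STEREO // enum value assumed
    })
  );

  // 5.1 Dolby Digital (AC-3) rendition at 448 kbps, taken from the surround source
  const ac3Surround = await bitmovinApi.encoding.configurations.audio.ac3.create(
    new Ac3AudioConfiguration({
      name: 'AC-3 5.1 448k',
      bitrate: 448000
    })
  );

  return { aacLcStereo, ac3Surround };
}

Each configuration would then be attached to a stream and muxing in the encoding, and the Manifest Generator groups the resulting renditions into the EXT-X-MEDIA entries shown earlier.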

WWDC 2024 HLS Updates for Video Developers
https://bitmovin.com/hls-updates-wwdc-2024/ – Mon, 24 Jun 2024

Apple’s Worldwide Developer Conference is an annual event used to showcase new software and technologies in the Apple ecosystem. It was created with developers in mind, but sometimes new hardware and devices are announced and its keynote presentations have become must-see events for a much wider audience. There is also usually news about changes and additions to the HTTP Live Streaming (HLS) spec and associated video playback APIs. These HLS updates are often necessary to support new features and capabilities of the announced OS and hardware updates. This post will expand on Apple’s “What’s new in HTTP Live Streaming” document, with additional context for the latest developments that content creators, developers, and streaming services should be aware of.

The latest HLS updates for 2024

The first draft of the HLS spec (draft-pantos-http-live-streaming) was posted in 2009, then superseded by RFC 8216 in 2017. There are usually draft updates published once or twice per year with significant updates and enhancements. A draft proposal was shared on June 7 that details proposed changes to the spec to be added later this year. Let’s look at some of the highlights below.

Updated Interstitial attributes

In May 2021, Apple introduced HLS Interstitials to make it easier to create and deliver interstitial content like branding bumpers and mid-roll ads. Now, new attributes have been introduced for Interstitial EXT-X-DATERANGE tags, aimed at enhancing viewer experience and operational flexibility. 

  1. X-CONTENT-MAY-VARY: This attribute provides a hint regarding coordinated playback across multiple players. It can be set to “YES” or “NO”, indicating whether all players receive the same interstitial content or not. If X-CONTENT-MAY-VARY is missing, it will be considered to have a value of “YES”.
  2. X-TIMELINE-OCCUPIES: Determines if the interstitial should appear as a single point “POINT” or a range “RANGE” on the playback timeline. If X-TIMELINE-OCCUPIES is missing, it will be considered to have a value of “POINT”. “RANGE” is expected to be used for ads in live content.
  3. X-TIMELINE-STYLE: Specifies the presentation style of the interstitial – either as a “HIGHLIGHT” separate from the content or as “PRIMARY”, integrated with the main media. If X-TIMELINE-STYLE is missing, it is considered to have a value of “HIGHLIGHT”. The “PRIMARY” value is expected to be used for content like ratings bumpers and post-roll dub cards.
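
Putting these together, a hypothetical interstitial tag for a live mid-roll ad break might look like the following (the ID, start date, duration, and asset list URL are made-up values):

#EXT-X-DATERANGE:ID="ad-break-1",CLASS="com.apple.hls.interstitial",START-DATE="2024-06-10T20:00:00.000Z",DURATION=60.0,X-ASSET-LIST="https://example.com/ad-break-1.json",X-CONTENT-MAY-VARY="YES",X-TIMELINE-OCCUPIES="RANGE",X-TIMELINE-STYLE="HIGHLIGHT"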

More detail is available in the WWDC Session “Enhance ad experiences with HLS interstitials“.

Example timeline for using HLS Interstitials with new RANGE attribute – source: WWDC 2024

Signal enhancements for High Dynamic Range (HDR) and timed metadata

HDR10+

Previously, the specification had not defined how to signal HDR10+ content in a multi-variant HLS playlist. Now you can use the SUPPLEMENTAL-CODECS attribute with the appropriate format, followed by a slash and then the brand ('cdm4' for HDR10+). The example Apple provided shows the expected syntax: SUPPLEMENTAL-CODECS="hvc1.2.20000000.L123.B0/cdm4". For a long time, HDR10+ was only supported on Samsung and some Panasonic TVs, but in recent years it has been added by other TV brands and dedicated streaming devices like Apple TV 4K and a few Roku models.
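
For context, a variant entry signaling HDR10+ could then look something like the lines below; the bandwidth, resolution, base HEVC codec string, and playlist URI are placeholder values:

#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080,CODECS="hvc1.2.4.L123.B0",SUPPLEMENTAL-CODECS="hvc1.2.20000000.L123.B0/cdm4",VIDEO-RANGE=PQ
hdr10plus_1080p.m3u8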

Dolby Vision with AV1

Dolby Vision has been the more popular and widespread dynamic HDR format (compared to HDR10+) and now, with Apple adding AV1 decoders in their latest generation of processors, they’ve defined how to signal that content within HLS playlists. They are using Dolby Vision Profile 10, which is Dolby’s 10-bit AV1-aware profile. HLS will now support 3 different Dolby Vision profiles: 10, 10.1 and 10.4. Profile 10 is “true” Dolby Vision, 10.1 is their backward compatible version of HDR10 and 10.4 their backward compatible version of Hybrid Log Gamma (HLG). For profiles 10.1 and 10.4, you need to use a SUPPLEMENTAL-CODECS brand attribute and the correct VIDEO-RANGE. For these, 10.1 should use 'db1p' and PQ, and 10.4 should use 'db4h' and HLG. The full example codec string they provided is: CODECS="av01.0.13M.10.0.112",SUPPLEMENTAL-CODECS="dav1.10.09/db4h",VIDEO-RANGE=HLG.

If you’re interested in Apple’s overall AV1 Support, you can find more details in this blog post.

Enhanced timed metadata support

HLS now supports multiple concurrent metadata tracks within Fragmented MP4 files, enabling richer media experiences with timed metadata ('mebx') tracks. This will enable new opportunities for integrating interactive elements and dynamic content within HLS streams.

Metrics and logging advancements

The introduction of the AVMetrics API to AVFoundation will allow developers to monitor performance and playback events. This opt-in interface lets you select which subsets of events to monitor and provides detailed insights into media playback, allowing you to optimize streaming experiences further.

More details are available in the AVFoundation documentation and the WWDC 2024 session “Discover media performance metrics in AVFoundation”.

Common Media Client Data (CMCD) standard integration

HLS now supports the CMCD standard, enhancing Quality of Service (QoS) monitoring and delivery optimization through player and CDN interactions. AVPlayer only implemented the preferred mode of transmitting data via HTTP request headers. It does not include support for all of the defined keys, and for now CMCD is only supported in iOS and tvOS 18 and above. There was no mention of support in Safari.

Bitmovin and Akamai debuted our joint CMCD solution at NAB 2023. You can learn more in our blog post or check out our demo.

FairPlay content decryption key management

As part of ongoing improvements, HLS is deprecating AVAssetResourceLoader for key loading in favor of AVContentKeySession. AVContentKeySession was first introduced at WWDC 2018 and until now, Apple had been supporting both methods of key loading for content protection in parallel. Using AVContentKeySession promises more flexibility and reliability in content key management, aligning with evolving security and operational requirements. This move means any existing use of AVAssetResourceLoader must be transitioned to AVContentKeySession. 

Conclusion

The recent HLS updates show Apple’s commitment to enhancing media streaming capabilities across diverse platforms and scenarios. For developers and content providers, staying updated with these advancements not only ensures compliance with the latest standards but also unlocks new opportunities to deliver compelling streaming experiences to audiences worldwide. 

If you’re interested in being notified about all of the latest HLS updates or you want to request features or provide feedback, you can subscribe to the IETF hls-interest group.

The Essential Guide to SCTE-35
https://bitmovin.com/scte-35-guide/ – Sat, 20 Jan 2024

Everything you need to know about SCTE-35, the popular event signaling standard that powers dynamic ad insertion, digital program insertion, blackouts and more for TV, live streams and on-demand video.

What is SCTE?

The acronym SCTE is short for The Society of Cable and Telecommunications Engineers. SCTE is a non-profit professional organization that creates technical standards and educational resources for the advancement of cable telecommunications engineering and the wider video industry. When talking about it, you may hear people abbreviate SCTE with the shorthand slang “Scutty”. 

SCTE was founded in 1969 as The Society of Cable Television Engineers, but changed its name in 1995 to reflect a broader scope as fiber optics and high-speed data applications began playing a bigger role in the cable TV industry and became the responsibility of its engineers. Currently there are over 19,000 individual SCTE members and nearly 300 technical standards in their catalog, including SCTE-35, which will be the focus of this post.

What is SCTE-35?

SCTE-35 was first published in 2001 and is the core signaling standard for advertising and program control for content providers and content distributors. It was initially titled “Digital Program Insertion Cueing Message for Cable” but recent revisions have dropped “for Cable” as it has proven useful and versatile enough to be extended to OTT workflows and streaming applications. There have been several revisions and updates published to incorporate member feedback and adapt to advancements in the industry, most recently on Nov 30, 2023.

SCTE-35 signals are used to identify national and local ad breaks as well as program content like intro/outro credits, chapters, blackouts, and extensions when a live program like a sporting event runs long. Initially, these messages were embedded as cue tones that dedicated cable TV hardware or equipment could pick up and enable downstream systems to act on. For modern streaming applications, they are usually included within an MPEG-2 transport stream PID and then converted into metadata that is embedded in HLS and MPEG-DASH manifests.

SCTE-35 markers and their applications for streaming video

While SCTE-35 markers are primarily used for ad insertion in OTT workflows, they can also signal many other events that allow an automation system to tailor the program output for compliance with local restrictions or to improve the viewing experience. Let’s take a look at some common use cases and benefits of using SCTE-35 markers.

Use cases and benefits of SCTE-35

Ad Insertion – As mentioned above, inserting advertisements into a video stream is the main use case for SCTE-35 markers. They provide seamless splice points for national, local and individually targeted dynamic ad replacement. This allows for increased monetization opportunities for broadcasters and content providers by enabling segmenting of viewers into specific demographics and geographic locations. When ad content can be tailored for a particular audience, advertisers are willing to pay more, leading to higher revenue for content providers and distributors. 

Ad Break Example – source: SCTE-35 specification

Program boundary markers – Another common use case is to signal a variety of program boundaries. This includes the start and end of programs, chapters, ad breaks and unexpected interruptions or extensions. Many of these become particularly useful in Live-to-VOD scenarios. Ad break start/end markers can be used as edit points in a post-production workflow to automate the removal of ads for viewers with ad-free subscriptions. A program end marker can be used to trigger the next episode in a series for binge viewing sessions if so desired. All of these markers open new possibilities for improving the user experience and keeping your audience happy and engaged.  

Blackouts and alternate content – Another less common, but important use case is to signal blackouts, when a piece of content should be replaced or omitted from a broadcast. This often applies to regional blackouts for sporting events. Respecting blackout restrictions is crucial for avoiding fines and loss of access to future events. Using SCTE-35 allows your automation system to take control and ensure you are compliant.

Workflow example with Program Boundaries and Blackouts – source: SCTE-35 specification

Types of SCTE-35 markers

SCTE-35 markers are delivered in-band, meaning they are embedded or interleaved with the audio and video signals. There are five different command types defined in the specification. The first 3 are legacy commands: splice_null(), splice_schedule() and splice_insert(), but splice_insert() is still used quite often. The bandwidth_reservation() command may be needed in some satellite transmissions, but the most commonly used command with modern workflows is time_signal(). Let’s take a closer look at the 2 most important command types, splice_insert and time_signal. 

splice_insert commands

Splice_insert commands are used to mark splice events, when some new piece of content like an ad should be inserted in place of the program, or when switching from an ad break back into the main program. Presentation time stamps are used to note the exact timing of the splice, enabling seamless, frame-accurate switching.

time_signal commands

Time_signal commands can also be used to insert new content at a splice point, but together with segmentation descriptors, they can handle other use cases like the program boundary markers mentioned above. This enables the segmenting and labeling of content sections for use by downstream systems.

Using SCTE-35 markers in streaming workflows

MPEG-2 transport streams

In MPEG-2 transport streams, SCTE markers are carried in-band on their own PID within the transport stream mux. These streams are usually used as contribution or backhaul feeds and in most cases are not directly played by the consumer. They may be delivered over dedicated satellite or fiber paths or via the public internet through the use of streaming protocols like SRT or proprietary solutions like Zixi.

HLS 

The Bitmovin Live Encoder supports a range of different HLS tags that are written when SCTE-35 triggers are parsed from the MPEG-TS input stream. Multiple marker types can be enabled for each HLS manifest. Which marker types to use depends on the consumer of the HLS manifest. An example consumer would be a Server Side Ad Insertion (SSAI) service. They usually state in their documentation which HLS tags they support for signaling SCTE-35 triggers.

  • EXT_X_CUE_OUT_IN: Ad markers will be inserted using #EXT-X-CUE-OUT and #EXT-X-CUE-IN tags.
  • EXT_OATCLS_SCTE35: Ad markers will be inserted using #EXT-OATCLS-SCTE35 tags. They contain the base64 encoded raw bytes of the original SCTE-35 trigger.
  • EXT_X_SPLICEPOINT_SCTE35: Ad markers will be inserted using #EXT-X-SPLICEPOINT-SCTE35 tags. They contain the base64 encoded raw bytes of the original SCTE-35 trigger.
  • EXT_X_SCTE35: Ad markers will be inserted using #EXT-X-SCTE35 tags. They contain the base64 encoded raw bytes of the original SCTE-35 trigger.
  • EXT_X_DATERANGE: Ad markers will be inserted using #EXT-X-DATERANGE tags as specified in the HLS specification. They contain the ID, start timestamp, and hex-encoded raw bytes of the original SCTE-35 trigger.

Example HLS manifest with Cue Out, duration and Cue In tags:

#EXTINF:4.0,
2021-07/video/hls/360/seg_18188.ts
#EXT-X-CUE-OUT:120.000
…
#EXTINF:4.0,
2021-07/video/hls/360/seg18218.ts
#EXT-X-CUE-IN

Example HLS manifest using EXT-OATCLS-SCTE35 tag with base64 encoded marker:

#EXTINF:4.0,
2021-07/video/hls/360/seg_18190.ts
#EXT-OATCLS-SCTE35:/DBcAAAAAAAAAP/wBQb//ciI8QBGAh1DVUVJXQk9EX+fAQ5FUDAxODAzODQwMDY2NiEEZAIZQ1VFSV0JPRF/3wABLit7AQVDMTQ2NDABAQEKQ1VFSQCAMTUwKnPhdcU=

Note: You can copy the base64 encoded marker above, (beginning with the first / after SCTE35: ) and paste it into this payload parser to see the full message structure.

MPEG-DASH 

In MPEG-DASH streams, SCTE-35 defined breaks and segments are added as new periods to the .mpd file.

<MPD>
<Period start="PT0S" id="1">
   <!-- Content Period -->
</Period>

<Period start="PT32S" id="2">
    <!-- Ad Break Period -->
   <EventStream timescale="90000"
    schemeIdUri="urn:scte:scte35:2014:xml+bin">
     <Event duration="2520000" id="1">
       <Signal xmlns="urn:scte:scte35:2013:xml">
         <Binary>/DAlAAAAAAAAAP/wFAUAAAAEf+/+kybGyP4BSvaQAAEBAQAArky/3g==</Binary>
       </Signal>
      </Event>
    </EventStream> 
</Period>

<Period start="PT60S" id="3"> 
   <!-- Content Period -->

With SCTE messages embedded in the stream, various forms of automation can be triggered, whether it’s server or client-side ad insertion, content switching, interactive elements in the application or post-production processing.

Bitmovin Live Encoding SCTE Support

SCTE message pass-through and processing 

Bitmovin supports the parsing of SCTE-35 triggers from MPEG-TS input streams for Live Encodings; the triggers are shown below as splice decisions. The triggers are then mapped to HLS manifest tags.

Splice Decisions

Certain SCTE-35 triggers signal that an advertisement or break (to or from the original content) starts or ends. The following table describes how the Bitmovin Live Encoder treats SCTE-35 trigger types and SCTE-35 Segmentation Descriptor types as splice decision points, and the compatibility of those types with the different command types, Splice Insert and Time Signal.

✓= Supported

✖ = Not currently supported

Segmentation UPID Type (Start/End) | Descriptor Type Name               | SPLICE_INSERT | TIME_SIGNAL
(none)                             | (none)                             | ✓             | ✖
0x10, 0x11                         | PROGRAM                            | ✖             | ✖
0x20, 0x21                         | CHAPTER                            | ✖             | ✖
0x22, 0x23                         | BREAK                              | ✖             | ✓
0x30, 0x31                         | PROVIDER_ADVERTISEMENT             | ✖             | ✓
0x32, 0x33                         | DISTRIBUTOR_ADVERTISEMENT          | ✖             | ✓
0x34, 0x35                         | PROVIDER_PLACEMENT_OPPORTUNITY     | ✖             | ✓
0x36, 0x37                         | DISTRIBUTOR_PLACEMENT_OPPORTUNITY  | ✖             | ✓
0x40, 0x41                         | UNSCHEDULED_EVENT                  | ✖             | ✖
0x42, 0x43                         | ALTERNATE_CONTENT_OPPORTUNITY      | ✖             | ✖
0x44, 0x45                         | PROVIDER_AD_BLOCK                  | ✖             | ✖
0x46, 0x47                         | DISTRIBUTOR_AD_BLOCK               | ✖             | ✖
0x50, 0x51                         | NETWORK                            | ✖             | ✖

Live cue point insertion API

In addition to the SCTE-35 pass-through mode, Bitmovin customers can insert new ad break cue points in real-time, using live controls in the user dashboard or via API. These can be inserted independently of existing SCTE-35 markers in the input stream and may be useful for live events when the time between ads is variable depending on breaks in the action. This allows streamers that don’t have SCTE-35 markers embedded in their source to take advantage of the same downstream ad insertion systems for increased monetization.

API Call:

POST /encoding/encodings/{encoding_id}/live/scte-35-cue
    {
      "cueDuration": 60, // duration in seconds between cue tags (ad break length)
    }

The #EXT-X-CUE-OUT tag will be inserted into the HLS playlist, signaling the start and duration of a placement opportunity to the DAI provider. Based on the cueDuration and the segment length, the #EXT-X-CUE-IN tag will be inserted after the configured duration and the ad opportunity will end, continuing the live stream.

HLS manifest with Cue Out, duration and Cue In tags inserted via the API call above:

#EXTINF:4.0,
    2021-07-09-13-18-34/video/hls/360_500/segment_18188.ts
    #EXT-X-CUE-OUT:60.000
    #EXTINF:4.0,
    2021-07-09-13-18-34/video/hls/360_500/segment_18189.ts
    ...
    #EXTINF:4.0,
    2021-07-09-13-18-34/video/hls/360_500/segment_18203.ts
    #EXT-X-CUE-IN
    #EXTINF:4.0,
    2021-07-09-13-18-34/video/hls/360_500/segment_18204.ts


Want to get started using SCTE-35 in your streaming workflow? Get in touch to let us know how we can help.

Resources

Tutorial: Bitmovin Live Encoding with SCTE-35, HLS and SSAI

Guide: Bitmovin Live Encoding and AWS MediaTailor for SSAI

Guide: Bitmovin Live Encoding with Broadpeak.io for SSAI

SCTE website 

SCTE-35 specification

SCTE-35 payload parser

Bitmovin Live Encoding data sheet

Streamlining Video Playback: Unveiling Bitmovin’s Player SDK for Flutter
https://bitmovin.com/flutter-video-streaming-player-sdk/ – Mon, 09 Oct 2023

Developing mobile applications, particularly those that include video streaming, can present significant challenges depending on how it’s done. Smaller development teams or those without extensive video technical expertise may find it a strain that affects their deployment and launch timelines. Traditionally, creating professional applications for both iOS and Android requires experienced developers proficient in the native code languages. However, with Flutter, development processes are streamlined, taking some of the heavy lifting off of the team and eliminating the need for device- and platform-specific experts. Along with the Flutter framework, dedicated player Software Development Kits (SDKs) are essential in helping to optimize deployment and enable a consistent user experience across a wide range of devices.

In this blog, we will do a deep dive into Flutter, showcasing what it is, its benefits and drawbacks, its use cases, and more, along with how Bitmovin’s dedicated Player SDK for Flutter plays a significant role in making it easier to stream video in applications.

What is Flutter?

Flutter is a UI software toolkit created by Google that has gained much traction with its user-friendly functionality. It is an open-source framework made for cross-platform development, so developers can use it to build apps with a native-like experience on different devices, such as Android, iOS, and Web. You can read more about it in our other blog on the 5 Ways React Native & Flutter Can Simplify Video Streaming Workflows.

Flutter Development UI workflow

What are the benefits and drawbacks of Flutter for app development and video streaming?

Like any technology, Flutter has its pros and cons. Understanding these can help developers make informed decisions when choosing Flutter for their video streaming workflows.

Benefits

  • Cross-Platform Development
    • Flutter was literally built for this purpose, allowing developers to write code once and use it across multiple platforms. This helps drive a faster time-to-market as teams can reduce development time and effort.
  • App Performance 
    • Known to be highly performant, applications built with Flutter are compiled directly into native code, offering better performance than other hybrid solutions.
  • Hot Reload 
    • To check out app updates in real-time, developers can utilize Flutter’s hot reload feature, which enables developers to see changes without losing the current application state. This helps speed up the development process and make it more dynamic, as modifications can be made as needed.
  • Customizable UI
    • With many widgets and extensive UI customization options, Flutter provides developers the tools to create better, visually appealing interfaces.
  • Strong Community Support
    • Since it has a robust and active community thanks to it being developed by Google, Flutter provides developers with access to numerous resources, libraries, tools, and the ability to ask questions directly to the community if/when needed.

Drawbacks

  • Limited Libraries
    • Although Flutter’s library support is growing, it’s still not as extensive as that of older frameworks such as React Native. Also, some solutions and tools might not support Flutter, limiting developers and forcing them to spend additional time and effort on implementation.
  • Large File Size
    • Flutter apps tend to have a larger file size than native apps, which could affect the download and installation process, especially for users with limited device storage.
  • Learning Curve
    • While Dart is potentially easier for developers familiar with JavaScript or Java to pick up, it’s less widely used than other programming languages, which could make the learning curve higher and the language harder to grasp.

How Flutter compares to other Frameworks

Regarding cross-platform mobile app development, React Native is another popular choice for developers that you can read more about in Bitmovin Launches Support for React Native. It, along with the native Android and iOS frameworks, have advantages and potential drawbacks.

React Native

React Native was developed by Facebook and allows developers to build mobile apps using JavaScript and React. Like Flutter, it provides features such as hot reloading and offers access to plenty of plugins created and used by its large community and third-party providers. However, compared to Flutter, React Native may fall short on performance, as it uses a JavaScript bridge to communicate with native modules, which can slow down an app.

Traditional Native Framework Development

To develop apps natively on specific platforms, teams will need expertise in that specific native development language. For Android, this would mean Java or Kotlin, and for iOS, Objective-C or Swift. Native apps perform better as they are built specifically for that platform in its code, have a more natural user experience, and have access to all device features. However, the apps will only be able to be used for that platform and have no cross-platform capabilities, which can increase development time and cost. Additionally, teams must maintain both codebases, which can further strain development resources.

In comparison, Flutter offers a balanced mix of high performance, rapid development, and cost efficiency, making it a viable choice for many developers and businesses.

Which industries and use cases does Flutter fit well with?

Depending on a company’s specific needs, Flutter can be utilized across any industry, especially when it involves streaming video. These industries and use cases include:

  • E-commerce 
    • Develop engaging shopping experiences showcasing products with video, customizable widgets, and animations.
  • Social Media
    • With its cross-platform development and rich UI components, Flutter is ideal for building interactive social media apps.
  • Education & eLearning
    • Create interactive and user-friendly eLearning apps with high-quality video, enhancing the learning experience for users.
  • Entertainment & OTT 
    • Build out high-performance applications for video playback and a seamless user experience across devices.
  • Health & Fitness 
    • Fitness apps can be created with a range of features such as video workouts, live sessions, health tracking, and more.
  • Religion and House of Worship
    • Enables apps to connect congregations, facilitate donations, and offer seamless video streaming experiences across devices.
  • News and Publishing
    • Flutter fits well with news organizations as it can facilitate real-time updates, multimedia integration, and seamless video streaming capabilities.
  • Online Events
    • With user-friendly and interactive interfaces and the ability to stream high-quality video content directly to users, online event apps benefit from Flutter.
       
  • Esports and Gaming
    • Flutter enables gaming communities to connect with real-time updates and seamless video streaming, enhancing the gaming experience.

Community and Support

Flutter has a growing developer community, and as it’s affiliated with Google, it gets a good amount of attention. With access to plenty of tutorials, libraries, tools, and other resources, development teams can leverage these materials as well as tap into community knowledge to overcome app and video streaming challenges. The community support also ensures ongoing platform updates, bug fixes, and performance optimizations. Additionally, Google and others host regular events that include topics on Flutter or are focused solely on it, giving developers opportunities to learn, network, and stay up to date on the latest Flutter trends and updates.

For more on Flutter and video streaming, check out Bitmovin’s developer community, which focuses on video workflow aspects and questions on deploying Bitmovin’s solutions across devices.

Getting Started with Bitmovin’s Flutter SDK

Bitmovin’s dedicated Flutter SDK is an open-source wrapper for our native mobile SDKs, making integrating our Player into Flutter apps built for iOS and Android devices easier. We’re focused on simplifying the streaming process and making our existing developer-friendly APIs available for Flutter. The SDK offers a range of features that concentrate on streamlining deployment for developers and delivering the highest quality of experience for viewers during video playback. 

Bitmovin’s dedicated Flutter Player SDK

These features include live and on-demand video playback, UI customization, adaptive streaming, content protection with DRM integration, and more, reducing the time it takes to get to market and helping development teams focus on other items for their apps. You can access it all on our dedicated Flutter GitHub repository.

Conclusion

Flutter has emerged as a robust and efficient framework for building high-quality video streaming apps in the ever-evolving landscape of video streaming. Its cross-platform capabilities, performance, and customizable UI make it an ideal choice for developers. Moreover, with dedicated SDKs like Bitmovin’s Flutter SDK, developers can further optimize their video streaming workflows, deliver a superior viewing experience, and bring their apps to market faster.

Whether you’re a seasoned developer or just starting, test out the Bitmovin Player across all the devices you want to cover, especially on Flutter, by signing up for our 30-day free trial. Trial users also get complete access to our other solutions, such as VOD and Live Encoding, Analytics, and Streams. 

Unlocking the Highest Quality of Experience with Common-Media-Client-Data (CMCD) – What Is It and What Are the Benefits
https://bitmovin.com/cmcd-video-streaming-optimization/ – Thu, 14 Sep 2023

As video workflows get more complex, companies face numerous challenges in delivering a seamless viewing experience to their audiences. One of the biggest hurdles is the ability to make sense of disjointed sets of information from different points in the video delivery workflow. When a client experiences buffering or other playback issues, it can be difficult to pinpoint the root cause within a workflow. Do you rack your brain wondering if it’s a problem with the manifest, the client’s Adaptive Bitrate (ABR) algorithm, or the Content Delivery Network (CDN)? To create a clearer picture for streaming platforms and the CDNs delivering the content, this is where Common-Media-Client-Data (CMCD) comes into play.

What is CMCD and Why is it Important?

CMCD is an open specification and tool developed by the Web Application Video Ecosystem (WAVE) project launched by the Consumer Technology Association (CTA). Its focus is to allow media players to communicate data back to CDNs during video streaming sessions. It provides a standardized protocol for exchanging information between the client and the CDN, bridging the gap between client-side quality of experience (QOE) metrics and server-side quality of service (QOS) data. By providing the transmission of this detailed data and information, CMCD-enabled video streaming services can facilitate better troubleshooting, optimization, and dynamic delivery adjustments by CDNs.

With CMCD, media clients can send key-value pairs of data to CDNs, providing valuable insights into the streaming session. This data includes information such as encoded bitrate, buffer length, content ID, measured throughput, session ID, playback rate, and more. By capturing and analyzing this data, CDNs can gain a deeper understanding of the client’s streaming experience and make informed decisions to improve performance and address any issues.

What data is tracked and how is data sent and processed with CMCD?

The data points for CMCD are thorough, giving you the detailed metrics you need to verify your viewer’s experience along with how to optimize it. The metrics include:

  • Encoded bitrate
  • Buffer length
  • Buffer starvation
  • Content ID
  • Object duration
  • Deadline
  • Measured throughput
  • Next object request
  • Next range request
  • Object type
  • Playback rate
  • Requested maximum throughput
  • Streaming format
  • Session ID
  • Stream type
  • Startup
  • Top bitrate

There are three common methods for sending CMCD data from the client to the CDN: custom HTTP request headers, HTTP query arguments, or JSON objects independent of the HTTP request. The choice of method depends on the player’s capabilities and the CDN’s processing requirements, and could also differ by platform. In browsers, HTTP query arguments are preferred over HTTP request headers, as custom headers would trigger additional OPTIONS preflight requests to check whether the CDN allows those headers, adding extra round-trip time. Other platforms like Android don’t have this limitation.

It is recommended to sequence the key-value pairs in alphabetical order to reduce the fingerprinting surface exposed by the player. Additionally, including a session ID (sid) and content ID (cid) with each request can aid in parsing and filtering through CDN logs for specific session and content combinations.
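
As an illustration, a segment request using the query-argument method could look like the line below; the host, path, content ID, and session ID are made-up values. Decoded, the CMCD payload is bl=12800,br=3200,cid="movie-42",mtp=25400,ot=v,sf=h,sid="6e2fb550-c457-11e9-bb97-0800200c9a66" – buffer length, encoded bitrate, content ID, measured throughput, object type (video), streaming format (HLS), and session ID, with the keys in alphabetical order and the whole value URL-encoded:

https://cdn.example.com/video/1080p/segment_104.m4s?CMCD=bl%3D12800%2Cbr%3D3200%2Ccid%3D%22movie-42%22%2Cmtp%3D25400%2Cot%3Dv%2Csf%3Dh%2Csid%3D%226e2fb550-c457-11e9-bb97-0800200c9a66%22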

The Role of CMCD in Video Streaming Optimization

CMCD plays a crucial role in optimizing video streaming by enabling comprehensive data analysis and real-time adjustments. Combining client-side data with CDN logs, CMCD allows for the correlation of metrics and the identification of issues that affect streaming performance. This holistic view empowers CDNs to take proactive measures to address buffering, playback stalls, or other quality issues.

With CMCD, CDNs can segment data based on Live and Video on Demand (VOD) content, monitor CDN performance, identify specific subscriber sessions, and track the journey of media objects from the CDN to the player and screen. This level of insight enables CDNs to optimize content delivery, manage bandwidth allocation, and ensure a smooth and consistent streaming experience for viewers.

Adoption of CMCD in the Industry

Akamai and Bitmovin CMCD Workflow

The adoption and implementation of CMCD in video workflows are still developing. Many in the video streaming industry are evaluating it at the moment but haven’t made significant moves. However, there are notable players in the market who have taken the lead in incorporating CMCD into their platforms. One such example is Akamai, a prominent CDN provider. Akamai has been actively working on CMCD in collaboration with the Bitmovin Player.

Live Demo

Together, Akamai and Bitmovin have developed a demo presenting the capabilities and benefits of CMCD. The demo shows how CMCD data can be sent by the Bitmovin Player to the CDN.

What are the benefits of CMCD and how can it be implemented on the Bitmovin Player?

As listed above, there are clear benefits to implementing CMCD for video playback. Some that can be achieved with the Bitmovin Player are:

  • Troubleshooting errors and finding root causes faster
    • CMCD makes Player sessions visible in CDN logs so you can trace error sessions through the Player and CDN to quickly find the root cause, reducing the cost associated with users experiencing errors on your platform.
  • Combine Playback sessions and CDN logs with common session & content identifiers 
    • Improve your operational monitoring by giving a clearer view of content requests from Player and how those are handled by the CDN.
  • Improve the quality of experience and reduce rebuffering by enabling pre-fetching 
    • Through CMCD, the CDN is aware of the Player’s current state and the content it most likely needs next. This allows the CDN to prepare and deliver the next packet the Player needs faster, reducing the time your viewers are waiting.
  • Integration with Bitmovin’s Analytics
    • Monitor every single user session and gain granular data on audience, quality, and ad metrics that ensure a high quality of experience for viewers while helping you pinpoint error sessions rapidly with CMCD data.

As Bitmovin continues to explore CMCD’s capabilities, we’ve made it easy to set up and deploy into video workflows through our GitHub. If you’re wondering how it works or want to see it in action before implementing it, you can check out our Bitmovin Web Player Samples.


Additionally, if you have any questions or feedback on your experience using it, join our Bitmovin Developer community and comment on the running dialog around our CMCD implementation.

Future Implications and Industry Outlook

While CMCD is still in its early stages of adoption, its potential impact on the video streaming industry is significant. As more companies embrace CMCD, the ability to gather and analyze comprehensive data will become standard practice and its benefits will become increasingly evident. This data-driven approach will enable continuous improvements in streaming performance and video workflows. This was a major reason we at Bitmovin took this project on: transparency is key, and CMCD makes issues easier to find and address, increasing viewer and client satisfaction.

Interest in CMCD will continue to grow with new implementations and use cases, leading the industry to realize the gains from reducing buffering and delivering better streams to viewers. Our partnership with Akamai is just one step in our commitment to advancing video streaming technology for content providers and providing a seamless viewing experience for audiences worldwide.

The post Unlocking the Highest Quality of Experience with Common-Media-Client-Data (CMCD) – What Is It and What Are the Benefits appeared first on Bitmovin.

]]>
https://bitmovin.com/cmcd-video-streaming-optimization/feed/ 0
Completing the WebRTC Playback Experience – Enabling Rewind During Real-Time Live Streams https://bitmovin.com/webrtc-rewinding-real-time-streams/ https://bitmovin.com/webrtc-rewinding-real-time-streams/#respond Tue, 12 Sep 2023 17:18:27 +0000 https://bitmovin.com/?p=267378 Live streaming has solidified its role as a pivotal component of modern video workflows, enabling platforms and media companies to captivate audiences with that sense of witnessing events as they happen. This trend has gained even greater momentum during and after the pandemic, as users craved live experiences that spanned a variety of interests –...

The post Completing the WebRTC Playback Experience – Enabling Rewind During Real-Time Live Streams appeared first on Bitmovin.

]]>

Live streaming has solidified its role as a pivotal component of modern video workflows, enabling platforms and media companies to captivate audiences with that sense of witnessing events as they happen. This trend has gained even greater momentum during and after the pandemic, as users craved live experiences that spanned a variety of interests – from sports enthusiasts catching their favorite games to at-home yoga classes on fitness platforms or students enrolling in online courses. To meet this demand, users sought the closest thing to real-time immersion, and this is where WebRTC came into play (pun intended).

What is WebRTC?

WebRTC is an open-source streaming technology that enables real-time data transport, whether that’s video, audio, or other data channels. Initially developed by Google in 2011, it has found widespread adoption in various industries that benefit from real-time communication, such as video conferencing, education, and gaming. The open-source, peer-to-peer technology allows for end-to-end encryption over an HTTPS connection and is compatible with all major browsers and platforms. The use of WebRTC skyrocketed through the use of video conferencing tools that have continued to grow in popularity since the pandemic, as well as in online gaming, where thousands of avid viewers could engage with their favorite content at near real-time latency.
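At the API level, the viewer side of a WebRTC session boils down to an RTCPeerConnection whose incoming tracks are attached to a video element. The following is a minimal sketch of that receiving side; the offer/answer signaling is intentionally left out because it is service-specific (a WHEP-style exchange is sketched later in this post):

// Minimal viewer-side setup: receive audio/video and render it in a <video> tag.
function createViewerPeerConnection(videoElement) {
    const pc = new RTCPeerConnection();

    // Attach whatever media the remote peer sends to the video element.
    pc.ontrack = (event) => {
        videoElement.srcObject = event.streams[0];
    };

    // Viewer role: receive only, don't send any media.
    pc.addTransceiver("video", { direction: "recvonly" });
    pc.addTransceiver("audio", { direction: "recvonly" });

    // The offer/answer exchange still has to happen over a signaling channel.
    return pc;
}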

Where does WebRTC fit in the OTT streaming industry?

The OTT streaming industry is currently dominated by two streaming protocols: DASH and HLS. However, DASH and HLS are not ideal for achieving lower-latency live streaming. Typically, viewers experience between 8 and 30 seconds of latency due to the need to download segments before playback, meaning the closer to the live edge, the more potential for issues with video buffers and ABR (adaptive bitrate) decisions.

WebRTC takes streaming services a step further by enabling real-time (sub-second latency) streaming experiences. For live events, such as sports or education, it allows services to provide an opportunity for interactivity and contribution without fear of introducing latency in data transfer. Unlike DASH and HLS, WebRTC does not buffer; it prioritizes low latency so viewers can be assured that what they are seeing is happening in real-time.

To summarize the key benefits of WebRTC:

  • Ultra-Low Latency – WebRTC enables sub-second playback, ideal for live events, online gaming, and other interactive applications.
  • Cross-Platform Compatibility – WebRTC is supported by all major web browsers and platforms, ensuring broad compatibility and ease of adoption.
  • End-to-end Encryption –  WebRTC incorporates robust security features, including end-to-end encryption, which ensures the privacy and security of communications.
  • Open Source – WebRTC benefits from a growing developer community that collaborates and innovates to bring continuous improvement to the technology.

However, WebRTC’s benefits come with a drawback: the inability to rewind or start the event from the beginning. This limitation affects many industries and their applications that require content review or replay, particularly in sports, where users want the ability to review and relive key moments.

What industries does WebRTC affect with this issue?

As streaming technology evolves and viewer expectations shift, low and real-time latency become more important, along with the ability to rewind and see what happened a few seconds earlier. This major playback feature affects many of the industries where real-time streaming is already crucial to the viewer experience, including:

  • Sports Broadcasting and Betting – Viewers often want to rewatch critical moments, goals, or plays during a live event, which can also affect micro-betting and in-game wagering. 
  • Live selling and auctions – Buyers may want to check what was said about the product or previous items that were listed, requiring the need to browse back through the stream.
  • Webinars and Conferences – Webinars and virtual conferences may involve important presentations and discussions that can’t be revisited.
  • Gaming – Fans like to watch gameplay, or players can strategize by rewinding and analyzing previous actions.
  • Live Events and Performances – Live events, such as concerts or theater performances, need to provide instant replays of key moments or highlights.
  • Online Education – Students may need to rewind and review parts of a lecture or lesson for better understanding.
  • Emergency Services and Video Surveillance – Being able to analyze real-time video footage is crucial for making informed decisions and investigations.
  • Telemedicine – Medical professionals may need to go back to previous portions of a patient’s session to make accurate diagnoses or treatment recommendations.

This list highlights the importance of considering the specific requirements of an application when choosing a streaming technology. To address the replay/rewind issue, Bitmovin and Dolby.io collaborated to build a solution to enable these industries and use cases to dramatically improve the playback experience their viewers want and demand.

How we developed it – Dolby.io x Bitmovin Hackathon Project

During Bitmovin’s quarterly Hackathon in August 2023, Bitmovin engineers partnered with the team at Dolby.io to achieve the following objective:

Create a single live video player experience with real-time streaming and full rewind/review capabilities.

What tools did we use?

Bitmovin’s Player enables countless viewers to experience top-quality playback on all devices across the globe. With its rich feature set, streaming services can deliver their unique experience without compromising on quality.  

Bitmovin’s Live Encoder is a resilient live streaming software platform that takes RTMP, SRT, or Zixi inputs and outputs HLS and DASH for delivery to digital streaming services, paired with the Bitmovin CDN for delivery and storage.

Dolby.io’s Real-time Streaming (formerly Millicast) delivers a WebRTC-based CDN for large-scale streaming that is fast, easy, and reliable for delivering real-time video.

Videon EdgeCaster EZ Encoder is a portable appliance that brings cloud functionality on premises with LiveEdge. In this way, it combines the flexibility of software encoders with the power and reliability of hardware solutions. Regular software updates ensure support for the most advanced features and the latest industry standards.

What did we do?

Workflow diagram showing the source journey from Videon Edgecaster, to Dolby.io & Bitmovin Live Encoder, to Bitmovin Player

Using a Videon Edgecaster to create a dual RTMP output of a live source input, one RTMP output was delivered to Dolby.io’s service to create a real-time WebRTC stream, while the other was delivered to Bitmovin’s Live Encoder to create a standard Live HLS stream.

Dolby.io’s Real-time Streaming service accepts SRT, RTMP, and WHIP/WebRTC, making it easy to convert broadcast-grade streams into WebRTC for sub-second distribution around the globe and at scale.

The stream URLs from both Dolby.io and Bitmovin Live Encoder are now available to the demo page hosting the Bitmovin Player. From here, the player can then choose to load the Dolby.io stream as a WHEP/WebRTC source or the Bitmovin Live Encoder stream as a Live HLS source. 
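For reference, loading a WHEP source is a single HTTP exchange: the client POSTs its SDP offer to the WHEP endpoint and applies the SDP answer from the response. A rough sketch, reusing the peer connection helper sketched earlier in this post (the endpoint URL is a placeholder and any Dolby.io-specific authentication is omitted):

async function connectWhep(pc, whepEndpointUrl) {
    const offer = await pc.createOffer();
    await pc.setLocalDescription(offer);

    // WHEP in a nutshell: POST the SDP offer, get the SDP answer back.
    const response = await fetch(whepEndpointUrl, {
        method: "POST",
        headers: { "Content-Type": "application/sdp" },
        body: offer.sdp,
    });
    await pc.setRemoteDescription({ type: "answer", sdp: await response.text() });
}

// Usage:
// const pc = createViewerPeerConnection(document.querySelector("video"));
// await connectWhep(pc, "https://example.com/whep/endpoint"); // placeholder URL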

The Bitmovin Player’s open-source UI framework and extensive developer-friendly APIs allow development teams to create unique experiences. So, for the viewer experience, when the user selects the ‘LIVE’ control in the player UI and moves playback to the live edge, they would be viewing the WHEP/WebRTC source from Dolby.io. The user could then drag the timeline marker backward or use the custom “skip” control configured to timeshift back 30 seconds, in which case they would be viewing the live HLS source from the Bitmovin Live Encoder.

This gives the viewer the option to view their content in real-time with full review capability right back to the beginning of the live session. Additionally, by using Dolby.io’s Simulcasting solution, the viewer experience is always at the highest available quality, with advanced ABR logic working for both sources.

Example of how playback on the Bitmovin Player works with Dolby.io

What’s Next?

At Bitmovin, we are currently evaluating official support for WebRTC in the Bitmovin Player. While we’ve been able to address key playback issues, there is room for improvement and clear steps to elaborate on this very successful skunk-works project with Dolby.io. For example, we did not extend the project to use accurate timing information from the segments (like `prft` boxes) or playlists, so the solution could be more accurate and adaptive in understanding where the live edge of the live HLS stream was in comparison to the live encoding time to correctly synchronize with the real-time WebRTC stream. Using the Bitmovin Live Encoder, we could also extend the solution to include live-to-VOD workflows to allow users to watch the replay of a live event after it has ended or even reuse the content while a live event is still running.

Bitmovin and Dolby.io will continue the alliance to address market needs for live workflows where real-time streaming can provide an opportunity for services to enhance their viewers’ experience.

The post Completing the WebRTC Playback Experience – Enabling Rewind During Real-Time Live Streams appeared first on Bitmovin.

]]>
https://bitmovin.com/webrtc-rewinding-real-time-streams/feed/ 0
Everything you need to know about Apple’s new Managed Media Source https://bitmovin.com/managed-media-source/ https://bitmovin.com/managed-media-source/#respond Tue, 20 Jun 2023 20:15:26 +0000 https://bitmovin.com/?p=263117 At their 2023 Worldwide Developer conference, Apple announced a new Managed Media Source API. This post will explain the new functionality and improvements over prior methods that will enable more efficient video streaming and longer battery life for iOS devices. Keep reading to learn more. Background and the “old” MSE The first internet videos of...

The post Everything you need to know about Apple’s new Managed Media Source appeared first on Bitmovin.

]]>
At their 2023 Worldwide Developer conference, Apple announced a new Managed Media Source API. This post will explain the new functionality and improvements over prior methods that will enable more efficient video streaming and longer battery life for iOS devices. Keep reading to learn more.

Background and the “old” MSE

The first internet videos of the early 2000s were powered by plugins like Flash and QuickTime, separate software that needed to be installed and maintained in addition to the web browser. In 2010, HTML5 was introduced, with its <video> tag that made it possible to embed video without plugins. This was a much simpler and more flexible approach to adding video to websites, but it had some limitations. Apple’s HTTP Live Streaming (HLS) made adaptive streaming possible, but developers wanted more control and flexibility than native HLS offered, like the ability to select media or play DRM-protected content. In 2013, the Media Source Extensions (MSE) specification was published by the W3C, providing a low-level toolkit that gave more control for managing buffering and resolution for adaptive streaming. MSE was quickly adopted by all major browsers and is now the most widely used web video technology…except for on iPhones. MSE has some inefficiencies that lead to greater power use than native HLS, and Apple’s testing found that adding MSE support would have meant reducing battery life, so all the benefits of MSE have been unavailable on iPhone…until now.

New Managed Media Source in Safari 17

With MSE, it can be difficult to achieve the same quality of playback possible with HLS, especially with lower power devices and spotty network conditions. This is partly because MSE transfers most control over the streaming of media data from the User Agent to the application running in the page. But the page doesn’t have the same level of knowledge or even goals as the User Agent, and may request media data at any time, often at the expense of higher power usage. To address those drawbacks and combine the flexibility provided by MSE with the efficiency of HLS, Apple created a new Managed Media Source API (MMS).

Advantages of Managed Media Source over MSE: lower power usage, better memory handling, less buffer management, access to 5G connectivity, and you are still in control.
Image source: WWDC23 presentation

The new “managed” MediaSource gives the browser more control over the MediaSource and its associated objects. It makes it easier to support streaming media playback on mobile devices, while allowing User Agents to react to changes in memory usage and networking capabilities. MMS can reduce power usage by telling the webpage when it’s a good time to load more media data from the network. When nothing is requested, the cellular modem can go into a low power state for longer periods of time, increasing battery life. When the system gets into a low memory state, MMS may clear out buffered data as needed to reduce memory consumption and keep operations of the system and the app stable. MMS also tracks when buffering should start and stop, so the browser can detect low buffer and full buffer states for you. Using MMS will save your viewers bandwidth and battery life, allowing them to enjoy your videos for even longer. 

Airplay with MMS

One of the great things about native HLS support in Safari is the automatic support for AirPlay that lets viewers stream video from their phone to compatible Smart TVs and set top boxes. Airplay requires a URL that you can send, but that doesn’t exist in MSE, making them incompatible. But now with MMS, you can add an HLS playlist to a child source element for the video, and when the user AirPlays your content, Safari will switch away from your Managed Media Source and play the HLS stream on the AirPlay device. It’s a slick way to get the best of both worlds.

Code snippet for adding AirPlay support with Managed Media Source. Image source: WWDC23 presentation
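In JavaScript terms, the idea from the session amounts to giving the video element a child source element that points at an HLS playlist; a small sketch (the playlist URL is a placeholder):

const video = document.querySelector("video");

// Fallback source for AirPlay: Safari switches from the ManagedMediaSource to
// this HLS playlist when the user sends playback to an AirPlay device.
const airplaySource = document.createElement("source");
airplaySource.type = "application/x-mpegURL";
airplaySource.src = "https://example.com/stream/master.m3u8"; // placeholder URL
video.appendChild(airplaySource);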

Migration from MSE to MMS

The Managed Media Source is designed in a backwards-compatible way. The first step is therefore to change the code so that it creates a ManagedMediaSource object instead of a MediaSource object, after checking whether the API is available:

// Prefer the new ManagedMediaSource and fall back to the classic MediaSource.
function getMediaSource() {
    if (window.ManagedMediaSource) {
        return new window.ManagedMediaSource();
    }
    if (window.MediaSource) {
        return new window.MediaSource();
    }

    throw new Error("No MediaSource API available");
}
const mediaSource = getMediaSource();

As the MMS supports all methods the “old” MSE does, this is all you need to get started, but it doesn’t unleash the full power of this new API. For that, you need to handle several new events:

mediaSource.addEventListener("startstreaming", onStartStreamingHandler);

The startstreaming event indicates that more media data should now be loaded from the network.

mediaSource.addEventListener("endstreaming", onStopStreamingHandler);

The endstreaming event is the counterpart of startstreaming and signals that, for now, no more media data should be requested from the network. This status can also be checked via the streaming attribute on the MMS instance. On devices like iPhones (once fully available) and iPads, requests that follow these two hints benefit from the fast 5G network and allow the device to get into low power mode in between request batches.
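As a rough sketch of how these two events can gate segment loading (the handler names match the listeners above; the download loop itself is hypothetical):

let allowedToStream = false;

function onStartStreamingHandler() {
    allowedToStream = true;
    maybeFetchNextSegment(); // hypothetical: resume the app's download loop
}

function onStopStreamingHandler() {
    allowedToStream = false; // pause requests until the next startstreaming event
}

function maybeFetchNextSegment() {
    // The streaming attribute mirrors the same state and can be checked directly.
    if (!allowedToStream || !mediaSource.streaming) {
        return;
    }
    // ...fetch the next segment and append it to the SourceBuffer...
}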

In addition, the current implementation also offers hints about a preferred, suggested quality to download. The browser suggests whether a high, medium, or low quality should be requested. The user agent may base this on facts like network speed, but also on additional details like user settings for enabled data saver modes. This can be read from the MMS instance’s quality property, and any change is signaled via a qualitychange event:

mediaSource.addEventListener("qualitychange", onQualityChangeHandler);
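A handler for this hint might look like the following sketch, assuming the quality property exposes the three values described above:

function onQualityChangeHandler() {
    // The hint is advisory: an ABR algorithm could cap or relax its choices.
    switch (mediaSource.quality) {
        case "low":
            // restrict selection to the lower renditions
            break;
        case "medium":
            // default behaviour
            break;
        case "high":
            // allow the top renditions
            break;
    }
}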

It remains to be seen if the quality hint will still be available in the future as it offers some risk of fingerprinting.

As the MMS may remove any buffered data range at any given time (as opposed to the MSE’s behavior, where this could only happen during the process of appending data), it is strongly recommended to check whether the data needed next is still present or needs to be re-downloaded.
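One way to do that check is against the SourceBuffer’s buffered time ranges; a small sketch (the sourceBuffer variable is assumed to be the one created via addSourceBuffer):

// Returns true if the given time span is still buffered, false if it has to be
// re-downloaded because the user agent evicted it in the meantime.
function isRangeBuffered(sourceBuffer, start, end) {
    const ranges = sourceBuffer.buffered;
    for (let i = 0; i < ranges.length; i++) {
        if (ranges.start(i) <= start && ranges.end(i) >= end) {
            return true;
        }
    }
    return false;
}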

Next Steps

Managed Media Source is already available in the current Safari Tech Preview on macOS and Safari 17 on iPadOS 17 beta and can be enabled as an experimental feature on iOS 17 beta. Once generally available on iOS, without being an experimental feature, this will finally bring lots of flexibility and choices to Safari, other browsers, and Apps with WebViews on iOS. It would even be possible to finally support DASH streams on iOS, while keeping web apps power efficient. 

Apple has already submitted the proposal of the Managed Media Source API to the World Wide Web Consortium (W3C), which is under discussion and might lead to an open standard other browser vendors could adopt.

Bitmovin will be running technical evaluations to fully explore and understand the benefits of MMS, including how it performs in various real-world environments. We will closely follow the progress from Apple and consider introducing support for MMS into our Web Player SDK once it advances from being an experimental feature on iOS. Stay tuned!

If you’re interested in more detail, you can watch the replay of the media formats section from WWDC23 here and read the release notes for Safari 17 beta and iOS & iPadOS 17 beta. You can also check out our MSE demo code and our blog about developing a video player with structured concurrency.

The post Everything you need to know about Apple’s new Managed Media Source appeared first on Bitmovin.

]]>
https://bitmovin.com/managed-media-source/feed/ 0
Low Latency vs. Target Latency: Why there isn’t always a need for speed https://bitmovin.com/low-latency-vs-target-latency/ https://bitmovin.com/low-latency-vs-target-latency/#respond Wed, 04 Jan 2023 10:14:59 +0000 https://bitmovin.com/?p=248805 Low latency has been a hot topic in video streaming for a while now. It’s the trendy keyword you hear at every trade show throughout the year; it even ranks high in our annual Video Developer Report as one of the biggest headaches for brands to achieve or the one they are very interested in...

The post Low Latency vs. Target Latency: Why there isn’t always a need for speed appeared first on Bitmovin.

]]>


Low latency has been a hot topic in video streaming for a while now. It’s the trendy keyword you hear at every trade show throughout the year; it even ranks high in our annual Video Developer Report as one of the biggest headaches for brands to achieve or the one they are very interested in deploying.

However, despite the huge amount of conversation low latency generates, it’s also one of the most difficult terms to define. This is because, depending on the use case, the required playback delay can range from a few hundred milliseconds (ultra-low latency) to 1-5 seconds (low latency), meaning your perception of what low latency is can differ significantly from another’s point of view. Additionally, low latency is limiting because there is a high probability you’re sacrificing quality for a fast video startup time. You are also likely to pay higher prices and work with multiple vendors to get the specific hardware or software you need to facilitate each step of the video workflow.

Furthermore, not every video streaming service needs low latency, even though it’s constantly requested by startups to enterprise-level businesses. A better question may not be “How do I minimize my live stream’s delay?” but “What is the target latency I want my audience to have?” Target latency is a feature not mentioned often and one we will explore in this blog, as it can make a world of difference to the playback experience you’re offering your viewers.

Back to basics – What are Low and Target Latency, and how are they achieved?

If you are unfamiliar with low latency, it essentially refers to minimizing the delay between a live on-site production of an event and a specific viewer watching it over the Internet. Standard HLS and DASH streams have a delay of 8 to 30 seconds, depending on stream settings and a particular viewer’s streaming environment (e.g., the protocol used, buffer size, bandwidth connection, device, and location). For a stream to be considered low latency, it can’t have more than 5 seconds of broadcast delay, with some workflows needing as low as a few hundred milliseconds for ultra-low latency, as stated above. There are several ways to achieve this very low broadcast delay, each with its benefits and costs. However, all methods available in the market today are not standardized, and they all require each piece of your video supply chain to support a chosen low-latency streaming technology, from the live encoder and packager to the CDN and player. This is important as it drives costs and limits your flexibility in selecting a best-of-breed technology stack.

On the other hand, target latency is a predefined time delay so the entire audience can watch the same stream simultaneously. The stream is not affected by the likely differences between individual viewers’ circumstances, meaning that everyone in that group can experience the same live event at the same time or very close to it. This stream synchronization can be achieved by choosing a specific buffer size across the target audience and managing playback to a target delay while attempting to cater to viewers who represent the lowest common denominator (e.g., slowest to fill buffer). You can set the target latency directly in the Bitmovin Player using the targetLatency property, enabling you to design the user experience as you want.
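Conceptually, a player holds a target latency by measuring its distance to the live edge and nudging the playback rate until that distance matches the configured value. The following control-loop sketch is purely illustrative and independent of any specific player API; all names and thresholds are made up:

const TARGET_LATENCY = 20; // seconds behind the live edge, shared by all viewers

function adjustForTargetLatency(video, liveEdgeTime) {
    const currentLatency = liveEdgeTime - video.currentTime;
    const drift = currentLatency - TARGET_LATENCY;

    if (Math.abs(drift) < 0.5) {
        video.playbackRate = 1.0;  // close enough: play at normal speed
    } else if (drift > 0) {
        video.playbackRate = 1.05; // too far behind: catch up slightly faster
    } else {
        video.playbackRate = 0.95; // too close to the edge: drop back slowly
    }
}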

How do both affect the viewer experience?

The benefits of low latency revolve around getting viewers their content fast, similar to broadcast speeds, which helps make them feel more connected with the live event. Live sports is an excellent example of where low latency plays a prominent role in the viewing experience. It helps combat the “Noisy Neighbor Effect,” where your audience can be negatively affected when seeing notifications or hearing cheers from neighbors when something happens before they see it on their screen. This also applies to real-time betting, which requires a stream to be available in ultra-low latency to see real-time results. Low latency is also critical for live seminars, esports, fitness classes, and many other live interactive use cases to help keep your audience engaged and up-to-date with what’s happening at that moment.

The biggest downside of the available low latency solutions is that they do not permit players to buffer enough content, which leads to playback interruptions when streaming conditions are less than ideal (e.g., poor wifi, an ISP problem, device performance). This alone can quickly lead to slow video starts, rebuffering, decreased stream quality, and other performance issues, creating a terrible experience for the user.

Any video streaming service can use target latency in a way that minimizes any downside to the viewer experience. This is because you can set the delay for a consistent experience for the entire audience or for a predefined audience, ensuring your viewers will have a better quality of experience due to increased stream stability and control during playback. For example, if you offer a second screen experience like a chat feature within the live event, target latency will keep everyone at the same live point so that it feels more like a live event. The only potential downside of the target latency solution is for viewers who may be using different video streaming services, which may cause them to be at different live points relative to their neighbors.

What does this mean for a business’s bottom line?

Pricing concerns are one of the top priorities when evaluating what is best for your business. From each part of your setup to the encoding and bandwidth requirements, low-latency workflows have the potential to be more expensive. This is because each component of your video supply chain must support the low-latency streaming technology you choose and can potentially expand to multiple ones if you’re offering low-latency streaming across different platforms (e.g., iOS and Android). Due to the complexity, it can take numerous vendors and a lot of integration for you to achieve low latency needs. This is a fundamental challenge as high costs inevitably limit realizing these capabilities, especially in tough economic times.

Target latency, on the other hand, requires only client-side software changes, so implementation and operational costs are relatively low, as you won’t need to buy and integrate specialized components.

Wrapping up

Reduced latency of 8-10 seconds is already achievable for most video streaming services today using standardized HLS and DASH protocols which already support a broad range of devices compared to (ultra) low latency solutions. Video streaming services should carefully consider the real-world pros and cons of (ultra) low latency vs. target latency solutions as they continue to push the limits in delivering the best viewer experience to their audiences.

The post Low Latency vs. Target Latency: Why there isn’t always a need for speed appeared first on Bitmovin.

]]>
https://bitmovin.com/low-latency-vs-target-latency/feed/ 0
Player Version V7 & V8 is Chromecast HLS Compatible with Enhanced DRM Support https://bitmovin.com/chromecast-hls-drm-support/ Tue, 01 Sep 2020 09:10:14 +0000 http://bitmovin.com/?p=10545 Since the  release of player version v5.2, Bitmovin has improved support for HLS streams, including playback on Chromecast and enhanced DRM handling. This support carries across to the latest implementation –  Web SDK v8 Chromecast HLS Playback After introducing HLS streaming to our HTML5 based player, we took the next step and ported it to...

The post Player Version V7 & V8 is Chromecast HLS Compatible with Enhanced DRM Support appeared first on Bitmovin.

]]>

Since the release of player version v5.2, Bitmovin has improved support for HLS streams, including playback on Chromecast and enhanced DRM handling. This support carries across to the latest implementation, Web SDK v8.

Chromecast HLS Playback

After introducing HLS streaming to our HTML5-based player, we took the next step and ported it to Chromecast, making our HLS streams Chromecast compatible. We have been supporting MPEG-DASH on Chromecast for a while and it has been very well adopted by our customers. Why did we go the extra mile to port our own HTML5/JS-based implementation of an HLS player to Chromecast instead of using the existing Media Player Library (MPL)? Features! Using the HLS support of Chromecast’s MPL might be sufficient for some use cases, but we wanted to empower our customers to make use of all of the great features from the desktop and mobile player, such as support for separate audio and video tracks, subtitles, and a comprehensive API, just to mention a few.

Enhanced Configuration Options for DRM

Player version v5.2 also features some important improvements to our DRM support. The player is now capable of interpreting DRM initialization information, usually present in the PSSH box of a segment, when it is given in the manifest file instead. This makes our support for DRM-protected content even more versatile and increases our encoder compliance further.
In addition, we introduced a configuration object that allows the specification of advanced options of the DRM key system, such as distinctiveIdentifier or persistentState. More information about possible configuration options can also be found in our HTML5 Player configuration documentation.
We also introduced support for HLS segments that don’t start with key-frames, improved startup performance, and added additional events and API calls.
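For context, distinctiveIdentifier and persistentState are fields of the standard Encrypted Media Extensions MediaKeySystemConfiguration dictionary, which a player-level DRM configuration object roughly maps onto. A bare EME-level sketch (key system, capabilities, and license handling trimmed for brevity):

const keySystemConfig = [{
    initDataTypes: ["cenc"],
    videoCapabilities: [{ contentType: 'video/mp4; codecs="avc1.42E01E"' }],
    distinctiveIdentifier: "optional", // "not-allowed" | "optional" | "required"
    persistentState: "required",       // e.g. required for persistent licenses
}];

navigator
    .requestMediaKeySystemAccess("com.widevine.alpha", keySystemConfig)
    .then((keySystemAccess) => keySystemAccess.createMediaKeys())
    .then((mediaKeys) => {
        // Attach the MediaKeys to the video element and handle license requests...
    });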

What’s Next?

For the next player version, we have planned another major step: increased subtitle support with full WebVTT enablement. The latest improvement to the v8 Web Player includes better Edge browser support. As we see advertising capabilities getting more and more important for many of our customers, we will also extend our VAST support and introduce VPAID to the Bitmovin Video Player. So stay tuned.
To learn more about the Bitmovin Video Player, please check out the following pages.

The post Player Version V7 & V8 is Chromecast HLS Compatible with Enhanced DRM Support appeared first on Bitmovin.

]]>
Optimal Adaptive Streaming Formats MPEG-DASH & HLS Segment Length https://bitmovin.com/mpeg-dash-hls-segment-length/ Thu, 09 Apr 2020 08:35:06 +0000 http://bitmovin.com/?p=8137 One of the first questions when starting with adaptive streaming formats such as MPEG-DASH or HLS is how long do you generate the used media segments of the content. The segmentation of the content is necessary, as this enables the switching between the different video/audio qualities during the streaming session. The following figure gives a short...

The post Optimal Adaptive Streaming Formats MPEG-DASH & HLS Segment Length appeared first on Bitmovin.

]]>
One of the first questions when starting with adaptive streaming formats such as MPEG-DASH or HLS is how long the media segments of your content should be. Segmenting the content is necessary, as this enables switching between the different video/audio qualities during the streaming session. The following figure gives a short overview of that process, where multiple qualities of a video are encoded, chunked into segments, and requested by the streaming client/player.
HLS segment length and chunk size
However, the question of the optimal segment length is not easy to answer. It depends on the environment (fixed access vs. mobile users) and the content (premium vs. non-premium/UGC): short segments are good for adapting quickly to bandwidth changes and preventing stalls, but longer segments may have better encoding efficiency and quality. Last but not least, it also depends on web server/CDN configurations, such as enabled/disabled HTTP 1.1/persistent connections.
So, let’s have a look at this topic in more detail: We did a detailed analysis of this topic based on different evaluations and datasets, which helps you to understand the influencing factors of the segment length decision and which provides you an indication of optimal segment lengths for your content and use case.

Typical DASH and HLS Chunk Sizes

For the following detailed evaluation of segment sizes, we created a dataset encoded and multiplexed with different segment sizes, ranging from 2 seconds (as used by Microsoft Smooth Streaming) to 10 seconds per segment (recommended by Apple HTTP Streaming), with some steps in between and at the lower and higher end, resulting in segment lengths of 1, 2, 4, 6, 10, and 15 seconds, which we took as the basis for the following evaluations.

Segment Length Decision: Encoding Efficiency and Quality?

To enable seamless switching between the different quality representations of adaptive streaming formats such as HLS or DASH, it is required to maintain fixed I-frame positions in the video: e.g., with a 24 frames-per-second (FPS) video and a segment length of two seconds, an I-frame has to be set every 48 frames. This is necessary to guarantee I-frames at the beginning of each segment, which is needed to be able to switch representations between segments. By doing so at the beginning of a new segment, the decoder does not need any references to previous frames or segments, and therefore the new segment can have frames in different resolutions, bitrates, or framerates. Fixed I-frame positions can be achieved by restricting the group-of-pictures (GOP) size of the encoder to the desired segment size of the content. As a consequence, from the encoding point of view, smaller segment sizes have a disadvantage because of the higher number of segments in the final encoding, and due to this, more I-frames are needed to guarantee representation switching at the segment boundaries. This leads to a lower encoding efficiency because I-frames, which cannot leverage temporal prediction, need more bits for encoding than predicted (P-) frames, and so the overall quality of the content gets worse in comparison to conventional encoding at the same bitrate, such as used for HTTP progressive download, or to segments with longer segment sizes. This problem is well-known and needs to be considered in content generation for adaptive HTTP streaming.
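To make the relationship concrete, the required GOP size is simply the frame rate multiplied by the segment duration; a quick sketch:

// Fixed GOP size needed so that every segment starts with an I-frame.
function gopSize(framesPerSecond, segmentLengthSeconds) {
    return framesPerSecond * segmentLengthSeconds;
}

console.log(gopSize(24, 2)); // 48 frames per GOP for 2-second segments at 24 FPS
console.log(gopSize(24, 4)); // 96 frames per GOP for 4-second segments at 24 FPS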
As a consequence of this lower encoding performance introduced by the fixed GOP sizes, the following evaluation demonstrates the effect of the different segment sizes on the encoding quality in terms of PSNR. This table shows the PSNR values for different segment sizes and provides evidence that this needs to be considered in the evaluation process for the segment sizes of adaptive HTTP streaming systems.

Segment length (GOP frames) | 1 sec. (24) | 2 sec. (48) | 4 sec. (96) | 6 sec. (144) | 10 sec. (240) | 15 sec. (360)
PSNR at 300 kbit/s (dB) | 35.83 | 36.51 | 36.98 | 37.14 | 37.31 | 37.31
PSNR at 1200 kbit/s (dB) | 38.24 | 39.78 | 40.02 | 40.02 | 40.10 | 40.17

As shown, small segment sizes can reduce the overall quality of the content by up to 1.5 dB in PSNR. However, this effect diminishes significantly as the segment size increases. As shown in the following figure, segment sizes shorter than two seconds perform very poorly. In combination with other factors such as the network characteristics shown in the following evaluation, such small segments (e.g. a 1-second segment length) should generally be avoided.
HLS segment length graph

Segment Length Decision: Avoiding Stalls, Streaming Performance and Web Server/CDN Configuration

From a network/internet perspective, there are also a lot of influencing factors that have to be considered. For example, longer segment lengths may cause stalls on wireless internet connections with strong bandwidth fluctuations, while short segment lengths may result in poor streaming performance due to the overhead produced by requests and the influence of network delay. To investigate this, we built an evaluation environment to emulate standard Internet connections, in order to show the impact of the segment size of adaptive streaming content, as well as other factors such as HTTP server configuration (e.g. allowing persistent connections). For this purpose, a standard HTTP web server was used to enable persistent HTTP 1.1-compliant connections as well as non-persistent HTTP 1.0-compliant connections. We also emulated the network characteristics of a last-mile (e.g., ADSL) Internet connection and added a network delay of 150 ms for this evaluation.
The optimal segment size of the given network configuration scenario for both cases, with and without the usage of HTTP1.1/persistent connections, was evaluated. For this purpose, the performance results of the 1, 2, 4, 6, 10, and 15-second segment length versions of Big Buck Bunny of the dataset were analyzed and interpolated to a graph showing the performance of the segment sizes in terms of effective media throughput. As shown in the following figure, the optimal segment size for this network setting would be between 2 and 3 seconds if one uses web servers/CDNs using HTTP 1.1 persistent connections, and between 5 and 8 seconds without using them (e.g. using HTTP 1.0). The effective media throughput of the optimal segment lengths of both configurations differs only by about 50 kbit/s.
The reason why the effective media throughput does not improve when increasing the segment size is that the available bandwidth in the evaluation setting changes over time. When longer segments are used, the client is not able to adjust as flexibly and quickly as it would be possible with shorter segments and therefore the overall bitrate deteriorates for longer segment lengths. On the other hand, the influence of the network delay (RTT) increases when using smaller segment lengths. This especially affects the non-persistent/HTTP1.0 connection results, because in this case there is one round-trip-time (RTT) required for establishing the TCP connection to the server after each segment. But also the persistent connection/HTTP1.1 results suffer from the influence of the delay when using very small segments, which is visible in the version with a segment length of one second in the following figure. In this case, half of the RTT necessary for requesting the segment becomes significant and the average throughput decreases.
DASH and HLS chunk size graph

Conclusions

Based on the results of these evaluations, as well as our experience from customer deployments, Bitmovin recommends using DASH or HLS chunk sizes of around 2 to 4 seconds, which is a good compromise between encoding efficiency and flexibility for stream adaptation to bandwidth changes. It is also recommended to use web servers and CDNs that enable persistent HTTP connections, as this is an easy and cost-effective way to increase streaming performance. In doing so, the effective media throughput and QoS can be increased without any changes to the client’s implementation, simply by choosing the right segment length.
We hope this blog post helps you when creating your content with the optimal segment length for your use case. If you have further questions on this, please do not hesitate to contact us. You can also have a look at our support section including tips on encoding, the Bitmovin Player in general and analytics.

Encode MPEG-DASH & HLS Content

Encode your content with the same technology as Netflix and YouTube so that it plays everywhere with low startup delay and no buffering, using the Bitmovin Cloud Encoding Service.
Best regards,
Stefan from the Bitmovin Team!
[Free Download: Video Developer Report 2020 – Key insights into the evolving technology trends of the digital video industry]
Follow me on Twitter: @slederer
Follow Bitmovin on Twitter: @bitmovin


The post Optimal Adaptive Streaming Formats MPEG-DASH & HLS Segment Length appeared first on Bitmovin.

]]>