Andy Francis – Bitmovin
https://bitmovin.com

WWDC 2024 HLS Updates for Video Developers
https://bitmovin.com/hls-updates-wwdc-2024/ – June 24, 2024


Apple’s Worldwide Developer Conference is an annual event used to showcase new software and technologies in the Apple ecosystem. It was created with developers in mind, but new hardware and devices are sometimes announced as well, and its keynote presentations have become must-see events for a much wider audience. There is also usually news about changes and additions to the HTTP Live Streaming (HLS) spec and associated video playback APIs. These HLS updates are often necessary to support new features and capabilities of the announced OS and hardware updates. This post will expand on Apple’s “What’s new in HTTP Live Streaming” document, with additional context for the latest developments that content creators, developers, and streaming services should be aware of.

The latest HLS updates for 2024

The first draft of the HLS spec (draft-pantos-http-live-streaming) was posted in 2009, then superseded by RFC 8216 in 2017. Draft updates with significant additions and enhancements are usually published once or twice per year. A draft proposal shared on June 7 details the changes expected to be added to the spec later this year. Let’s look at some of the highlights below.

Updated Interstitial attributes

In May 2021, Apple introduced HLS Interstitials to make it easier to create and deliver interstitial content like branding bumpers and mid-roll ads. Now, new attributes have been introduced for Interstitial EXT-X-DATERANGE tags, aimed at enhancing viewer experience and operational flexibility. 

  1. X-CONTENT-MAY-VARY: This attribute provides a hint regarding coordinated playback across multiple players. It can be set to “YES” or “NO”, indicating whether all players receive the same interstitial content or not. If X-CONTENT-MAY-VARY is missing, it will be considered to have a value of “YES”.
  2. X-TIMELINE-OCCUPIES: Determines if the interstitial should appear as a single point (“POINT”) or a range (“RANGE”) on the playback timeline. If X-TIMELINE-OCCUPIES is missing, it will be considered to have a value of “POINT”. “RANGE” is expected to be used for ads in live content.
  3. X-TIMELINE-STYLE: Specifies the presentation style of the interstitial, either as a “HIGHLIGHT” separate from the content or as “PRIMARY”, integrated with the main media. If X-TIMELINE-STYLE is missing, it is considered to have a value of “HIGHLIGHT”. The “PRIMARY” value is expected to be used for content like ratings bumpers and post-roll dub cards. (An example tag using all three attributes follows below.)
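
For illustration, here is a sketch of an interstitial EXT-X-DATERANGE tag carrying the new attributes; the ID, timing values and asset-list URL are hypothetical, and the attribute values follow the defaults and use cases described above:

    #EXT-X-DATERANGE:ID="ad-break-1",CLASS="com.apple.hls.interstitial",START-DATE="2024-06-24T10:00:00Z",DURATION=30.0,X-ASSET-LIST="https://example.com/ad-break-1.json",X-TIMELINE-OCCUPIES="RANGE",X-TIMELINE-STYLE="HIGHLIGHT",X-CONTENT-MAY-VARY="YES"

In this sketch, a 30-second live ad break occupies a range on the timeline, is presented as a highlight separate from the primary content, and signals that different players may receive different ads.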

More detail is available in the WWDC Session “Enhance ad experiences with HLS interstitials”.

Example timeline for using HLS Interstitials with new RANGE attribute – source: WWDC 2024

Signal enhancements for High Dynamic Range (HDR) and timed metadata

HDR10+

Previously, the specification had not defined how to signal HDR10+ content in a multi-variant HLS playlist. Now you can use the SUPPLEMENTAL-CODECS attribute with the appropriate format, followed by a slash and then the brand ('cdm4' for HDR10+). The example Apple provided shows the expected syntax: SUPPLEMENTAL-CODECS="hvc1.2.20000000.L123.B0/cdm4". For a long time, HDR10+ was only supported on Samsung and some Panasonic TVs, but in recent years it has been added by other TV brands and dedicated streaming devices like Apple TV 4K and a few Roku models.

Dolby Vision with AV1

Dolby Vision has been the more popular and widespread dynamic HDR format (compared to HDR10+), and now that Apple has added AV1 decoders to their latest generation of processors, they’ve defined how to signal that content within HLS playlists. They are using Dolby Vision Profile 10, which is Dolby’s 10-bit AV1-aware profile. HLS will now support 3 different Dolby Vision profiles: 10, 10.1 and 10.4. Profile 10 is “true” Dolby Vision, 10.1 is their backward-compatible version of HDR10 and 10.4 their backward-compatible version of Hybrid Log Gamma (HLG). For profiles 10.1 and 10.4, you need to use a SUPPLEMENTAL-CODECS brand attribute and the correct VIDEO-RANGE: 10.1 should use 'db1p' and PQ, and 10.4 should use 'db4h' and HLG. The full example codec string they provided is: CODECS="av01.0.13M.10.0.112",SUPPLEMENTAL-CODECS="dav1.10.09/db4h",VIDEO-RANGE=HLG.
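
Putting the HDR10+ and Dolby Vision examples above into context, the corresponding entries in a multi-variant playlist could look roughly like this; the bandwidth, resolution and URI values are hypothetical, while the codec strings come from Apple’s examples:

    #EXT-X-STREAM-INF:BANDWIDTH=8000000,RESOLUTION=1920x1080,CODECS="hvc1.2.20000000.L123.B0",SUPPLEMENTAL-CODECS="hvc1.2.20000000.L123.B0/cdm4",VIDEO-RANGE=PQ
    hdr10plus/hevc_1080p.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=9000000,RESOLUTION=3840x2160,CODECS="av01.0.13M.10.0.112",SUPPLEMENTAL-CODECS="dav1.10.09/db4h",VIDEO-RANGE=HLG
    dovi/av1_2160p.m3u8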

If you’re interested in Apple’s overall AV1 Support, you can find more details in this blog post.

Enhanced timed metadata support

HLS now supports multiple concurrent metadata tracks within Fragmented MP4 files, enabling richer media experiences with timed metadata ('mebx') tracks. This opens new opportunities for integrating interactive elements and dynamic content within HLS streams.
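
On the playback side, one way an app might observe timed metadata is AVFoundation’s AVPlayerItemMetadataOutput. The sketch below uses existing AVFoundation symbols, but whether this path surfaces the new concurrent 'mebx' tracks is an assumption on our part:

    import AVFoundation

    // Sketch: receive timed metadata groups as they occur during playback.
    final class MetadataReader: NSObject, AVPlayerItemMetadataOutputPushDelegate {
        func attach(to item: AVPlayerItem) {
            let output = AVPlayerItemMetadataOutput(identifiers: nil)
            output.setDelegate(self, queue: .main)
            item.add(output)
        }

        func metadataOutput(_ output: AVPlayerItemMetadataOutput,
                            didOutputTimedMetadataGroups groups: [AVTimedMetadataGroup],
                            from track: AVPlayerItemTrack?) {
            // Each group carries the metadata items active for a time range.
            for group in groups {
                print(group.timeRange.start.seconds, group.items)
            }
        }
    }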

Metrics and logging advancements

The introduction of the AVMetrics API to AVFoundation will allow developers to monitor performance and playback events. This opt-in interface lets you select which subsets of events to monitor and provides detailed insights into media playback, allowing you to optimize streaming experiences further.
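
As a rough sketch of the opt-in model, an app could subscribe to just one subset of events, stalls in this case, using the metrics(forType:) entry point shown in the WWDC session (the exact event type names here are our assumption):

    import AVFoundation

    // Subscribe to a single category of metric events as an AsyncSequence.
    func logStalls(for item: AVPlayerItem) {
        Task {
            for await event in item.metrics(forType: AVMetricPlayerItemStallEvent.self) {
                print("Playback stalled at \(event.date)")
            }
        }
    }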

More details are available in the AVFoundation documentation and the WWDC 2024 session “Discover media performance metrics in AVFoundation”.

Common Media Client Data (CMCD) standard integration

HLS now supports the CMCD standard, enhancing Quality of Service (QoS) monitoring and delivery optimization through player and CDN interactions. AVPlayer implements only the preferred mode of transmitting data via HTTP request headers. Support does not yet cover all of the defined keys, and for now it is only available in iOS and tvOS 18 and above. There was no mention of support in Safari.
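
To make the header mode concrete, a CMCD-carrying segment request as defined by the CTA-5004 spec could look like the following; all of the key values here are hypothetical:

    GET /video/segment_0042.m4s HTTP/1.1
    Host: cdn.example.com
    CMCD-Object: br=6000,d=4004,ot=v,tb=10000
    CMCD-Request: bl=21000,mtp=25400
    CMCD-Session: cid="movie-1234",sid="6e2fb550-c457-11e9-bb97-0800200c9a66",sf=h,st=v
    CMCD-Status: rtp=12000

The keys carry, for example, the encoded bitrate (br, kbps), object duration (d, ms), buffer length (bl, ms) and measured throughput (mtp, kbps), giving the CDN real-time playback context for each request.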

Bitmovin and Akamai debuted our joint CMCD solution at NAB 2023. You can learn more in our blog post or check out our demo.

FairPlay content decryption key management

As part of ongoing improvements, Apple is deprecating AVAssetResourceLoader for key loading in favor of AVContentKeySession. AVContentKeySession has been available for several years, and until now Apple had been supporting both methods of key loading for content protection in parallel. Using AVContentKeySession promises more flexibility and reliability in content key management, aligning with evolving security and operational requirements. This move means any existing use of AVAssetResourceLoader for key delivery must be transitioned to AVContentKeySession.
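
For teams planning that migration, the basic shape of AVContentKeySession-based key loading looks roughly like the sketch below; the application certificate and the license server round trip are app-specific and stubbed out here:

    import AVFoundation

    // Hypothetical stand-in for your license server exchange (SPC in, CKC out).
    func fetchCKC(spc: Data) -> Data { Data() }

    final class FairPlayKeyManager: NSObject, AVContentKeySessionDelegate {
        let session = AVContentKeySession(keySystem: .fairPlayStreaming)

        override init() {
            super.init()
            session.setDelegate(self, queue: DispatchQueue(label: "fairplay.keys"))
        }

        func prepare(asset: AVURLAsset) {
            // The session, not an AVAssetResourceLoader delegate, now receives key requests.
            session.addContentKeyRecipient(asset)
        }

        func contentKeySession(_ session: AVContentKeySession,
                               didProvide keyRequest: AVContentKeyRequest) {
            let appCertificate = Data() // placeholder: your FairPlay application certificate
            keyRequest.makeStreamingContentKeyRequestData(forApp: appCertificate,
                                                          contentIdentifier: nil,
                                                          options: nil) { spcData, _ in
                guard let spcData else { return }
                // Exchange the SPC for a CKC and hand it back to the key request.
                let response = AVContentKeyResponse(
                    fairPlayStreamingKeyResponseData: fetchCKC(spc: spcData))
                keyRequest.processContentKeyResponse(response)
            }
        }
    }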

Conclusion

The recent HLS updates show Apple’s commitment to enhancing media streaming capabilities across diverse platforms and scenarios. For developers and content providers, staying updated with these advancements not only ensures compliance with the latest standards but also unlocks new opportunities to deliver compelling streaming experiences to audiences worldwide. 

If you’re interested in being notified about all of the latest HLS updates or you want to request features or provide feedback, you can subscribe to the IETF hls-interest group.

Everything you need to know about Apple AV1 Support
https://bitmovin.com/apple-av1-support/ – June 13, 2024

This post was originally published in September 2023. It has been updated several times with the latest news and developments, most recently on June 13, 2024 with information about Apple’s AV1 Dolby Vision support.

Apple made waves across the video encoding and streaming communities when they announced the iPhone 15 Pro and 15 Pro Max would have a dedicated AV1 hardware decoder, making them the first Apple devices with official AV1 codec support. We’ve compiled all the details from their announcement, the HLS interest group, and product release notes to bring you everything you need to know about Apple AV1 codec support. If you’re looking for more information about AV1 playback on Android, Smart TVs and set-top boxes, you can find it at https://bitmovin.com/av1-playback-support/. Otherwise, keep reading to learn more!

Hints that Apple AV1 support was coming

Prior to the iPhone 15 announcement in September 2023, there were several indications that Apple would eventually support AV1. Back in 2018, Apple joined the Alliance for Open Media, the organization responsible for creating and promoting AV1 encoding, and many took it as a sign that Apple would eventually support AV1. More recently, updates to Apple’s AVFoundation core media framework showed the addition of a new global variable "kCMVideoCodecType_AV1", and earlier in 2023, the Safari 16.4 Beta release notes actually showed AV1 support was coming, but it was removed without comment shortly after and never added to Safari 16. AV1 WebCodecs support did eventually become available as an experimental option in the Safari Technology Preview, but enabling it didn’t seem to have any effect.

Still, with all of these hints being dropped, the announcements of Apple’s M series of processors and the most recent update to the HLS draft specification in May 2023 all came and went with no mention of AV1. Everyone who was paying close attention and anticipating Apple AV1 support was left disappointed, especially knowing how much weight Apple’s decision carried for the rest of the streaming ecosystem. Overall AV1 adoption has been slower than many had hoped and expected, and Apple’s lack of support was often cited as a reason to wait and avoid updating video encoding stacks.

iPhone 15 Pro announcement

This all changed on September 12, 2023, when Apple announced their new A17 Pro mobile processor would include support for AV1 hardware decoding. You can watch the full replay here, with the section about the 15 Pro’s new processor beginning at 1:01:20. VP of the Apple Silicon Engineering Group Sribalan Santhanam presented the new A-series processor and shared details about the industry’s first 3 nm chip, including a 6-core CPU and a new Pro-class, 6-core GPU. It has a 16-core neural engine that can process up to 35 trillion operations per second and run machine learning models on the device, without sending personal data to the cloud. There is also a dedicated engine for Apple’s own ProRes codec, in addition to the big one for video streaming services: the AV1 hardware decoder.

Block diagram of Apple’s A17 Pro chip, highlighting dedicated AV1 decoder – Image source: Apple iPhone 15 Pro announcement

“We also included a dedicated AV1 decoder, enabling more efficient and high-quality video experiences for streaming services.”

Sribalan Santhanam – VP, Apple Silicon Engineering Group

More details about HDR, DRM, HLS and Safari support for AV1

After the presentation, co-author of the HLS specification Roger Pantos shared more details via the hls-interest mailing list. He confirmed that both the iPhone 15 Pro and 15 Pro Max would be the first Apple devices with hardware decoding support for AV1 video content. The dedicated hardware meant that in addition to Standard Dynamic Range (SDR) content, it would also support High Dynamic Range (HDR10) as well as content protected by FairPlay Streaming DRM, things that software decoders typically cannot handle well or securely. Playback would be supported in Apple’s native AVPlayer or AVSampleBufferDisplayLayer, including using Media Source Extensions (MSE), or Managed Media Source (MMS) as Apple calls their new version, under an experimental setting on iOS Safari.

HLS playback of AV1 will work without any new signaling requirements, just the regular CODECS and VIDEO-RANGE attributes. The SCORE attribute can also be used to make the playback client prefer AV1 over other encodings, but renditions encoded with AVC and/or HEVC should still be included for older devices and AirPlay support. The WebKit blog provided more information about Safari 17.0, confirming support for the AV1 video codec was added on devices with hardware decoding support. They also shared an html code snippet for presenting single-file progressive video that has been encoded with AV1, HEVC and VP9, which allows the browser to choose the best option for playback (a reconstruction appears below). It should be noted that outside of very short clips, adaptive streaming with HLS is preferred over progressive streaming in order to provide the best quality of experience and bandwidth efficiency.
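
For instance, a multi-variant playlist that steers capable clients toward AV1 might pair equivalent renditions like this; the bandwidth, SCORE values, codec strings and URIs are illustrative:

    #EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080,CODECS="av01.0.08M.10",SCORE=2.0
    av1/1080p.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=7500000,RESOLUTION=1920x1080,CODECS="hvc1.2.4.L123.B0",SCORE=1.0
    hevc/1080p.m3u8

Clients that support AV1 and honor SCORE will prefer the higher-scored rendition, while older devices and AirPlay targets can still fall back to the HEVC variant.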

html snippet for multi-codec progressive video with AV1, HEVC and VP9 – Image source: webkit.org blog
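
The snippet itself was shared as an image that isn’t reproduced here, but based on the description it would look roughly like this; the file names and exact codec strings are illustrative:

    <video controls>
      <!-- The browser plays the first source whose type/codecs value it can decode. -->
      <source src="video-av1.mp4" type='video/mp4; codecs="av01.0.08M.10"'>
      <source src="video-hevc.mp4" type='video/mp4; codecs="hvc1.2.4.L123.B0"'>
      <source src="video-vp9.webm" type='video/webm; codecs="vp09.02.41.10"'>
    </video>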

The 'type' attribute signals the type of container being used, and the 'codecs' parameter string lets the browser know which codec was used and other characteristics like profile, level, color space, bit depth and dynamic range. This informs the browser and lets it decide whether it supports those attributes or needs to fall back on an older codec. It’s also possible to use a simpler codecs="av01", but it’s best to provide as much detail as possible if you can. More information on the AV1 codecs parameter string from the Alliance for Open Media can be found here, and details about codec and profile parameters are available in this IETF doc.

While not directly related to the Apple AV1 news, Safari 17.0 also added a new media player stats overlay similar to YouTube’s “stats for nerds”. This is a nice addition for video developers doing any troubleshooting and will be very helpful as people begin experimenting with adding AV1 encoding. It’s available to anyone who checks the “Show features for web developers” box in the advanced settings of Safari.  

New Media stats overlay feature available in Safari 17.0 – Image source: webkit.org blog

Apple M3 processor announcement

In late October 2023, Apple announced their newest generation of desktop processors would include AV1 hardware decoders. This includes the M3, M3 Pro and M3 Max chips, meaning all new models of Macbooks, iMacs and desktop computers with an M3 processor will support AV1 video playback. Some were disappointed that the M3 did not also include support for AV1 encoding, but for video playback, the decoding is all that really matters, so this will be another nice wave of new devices that streaming services can target with AV1 encoded video. 

Apple’s new M3 family of processors with AV1 decoding support (Source: Apple)

Apple M4 processor iPad announcement

Announced in May 2024, the new iPad Pro is powered by Apple’s latest system on a chip, the M4. The media engine of the M4 supports multiple codecs, including H.264, HEVC, ProRes and now AV1, making it the most advanced media processor ever in an iPad. With this, Apple continues their march toward full AV1 support. Will the Vision Pro 2 be next?

Apple AV1 Dolby Vision Support

Usually around the time of Apple’s Worldwide Developer Conference there are some new updates or features around HLS and AVPlayer. During WWDC24, Apple shared a “What’s new in HTTP Live Streaming 2024” doc with several interesting new additions. For AV1 specifically, they called out support for using Dolby Vision Profile 10, which is Dolby’s 10-bit AV1-aware profile. Apple now supports 3 different Dolby Vision profiles: 10, 10.1 and 10.4. Profile 10 is “true” Dolby Vision, 10.1 is their backward-compatible version of HDR10 and 10.4 their backward-compatible version of Hybrid Log Gamma (HLG). For profiles 10.1 and 10.4, you need to use a SUPPLEMENTAL-CODECS attribute and the correct VIDEO-RANGE: 10.1 should use 'db1p' and PQ, and 10.4 should use 'db4h' and HLG. The full example codec string they provided is: CODECS="av01.0.13M.10.0.112",SUPPLEMENTAL-CODECS="dav1.10.09/db4h",VIDEO-RANGE=HLG.


AV1 Software Decoding Support?

When Apple released the iPhone 6s with the A9 chip, it became the first iOS device to support HEVC (H.265) hardware decoding, which included support for FairPlay Streaming with HEVC. When this happened, they also included an HEVC software decoder as part of the next iOS and macOS updates for older devices without hardware support. While the software decoding didn’t support FairPlay Streaming, it was still a big boost for HEVC support, and it was one of the first things we wondered about after seeing the AV1 decoder announcement.

Unfortunately when asked, Roger Pantos shared that Apple would not be shipping an AV1 video software decoder at this time. He did confirm that iOS 17 does include some AV1 codec support, but only for still images using the Alliance for Open Media’s AVIF format. For now, we can only hope that AV1 video software decoding (like Meta is already using in their iOS apps) will be coming soon.

Screenshot comparing H.264, VP9 and AV1 video codec quality for low bandwidth streams. Source: Meta Engineering Blog

Ready to take advantage of AV1 Encoding?

Bitmovin has been ready for AV1 adoption to spread for some time now, dating back to 2017 when we partnered with Mozilla to enable AV1 playback in the Firefox browser using the Bitmovin Player. We’ve added AV1 codec support to our Per-Title and 3-pass encoding optimizations and just recently made AV1 encoding available in our dashboard UI, so now you can perform your first AV1 encode without any code, API calls, or configuration necessary! Bitmovin’s AV1 encoding has supported DASH streaming together with Widevine content protection for a long time, but we’ve now also added support for fMP4 in HLS playlists together with FairPlay content protection to take advantage of Apple AV1 support for premium content. It’s also available in our free trial, so there’s never been a better time to check it out and begin taking advantage of the bandwidth savings and quality improvements that AV1 can provide. 

Bitmovin Dashboard Encoding Configuration with new AV1 video codec support

Click here to start your free trial today!

  • Read the latest info about our AV1 playback support and device testing here.
  • Learn how using Bitmovin’s Per-Title Encoding together with AV1 can let you stream 4K video at bitrates that had been limited to Standard Definition with older codecs. 
  • Check out our AV1 hub and download our datasheet to learn all about the codec’s development, performance and how it can lower your CDN costs.

New Firefox AV1 support for Encrypted Media Extensions
https://bitmovin.com/firefox-av1-support/ – May 30, 2024

This post covers some recent updates, focusing on the new Firefox AV1 support in Encrypted Media Extensions. Bitmovin has been supporting and advocating for use of the AV1 codec for several years, even though there have been gaps in playback support preventing adoption for some workflows. Slowly but surely, those gaps are being filled and the reasons not to use AV1 are going away. Keep reading to learn more.

Firefox 125 adds support for encrypted AV1

A couple of years ago, Bitmovin began testing several different combinations of AV1 encoding, muxing and DRM support across browsers and playback devices. We were somewhat surprised to learn that even though Firefox was the first major browser to support AV1 playback, they had not implemented support for encrypted AV1 as they had for other codecs. We found there was actually an open bug/request filed 5 years ago. 

Shortly after we began watching closely, there was an update…

Screenshot of update to bug report about lack of AV1 Widevine support in Firefox. Since then, Firefox AV1 support has improved with support for encrypted media extensions in version 125.

Ouch. Once the ticket got reassigned, Bitmovin got involved and gave our feedback that for premium/studio content, this support would be needed soon. We also provided a Widevine-protected sample for them to use in testing. Fast-forward to this spring: we saw some action on the ticket, and support for AV1 with Encrypted Media Extensions was officially added in Firefox 125!

This means premium content workflows can now use AV1 on all of the major desktop browsers. Apple added support to Safari last fall, including with FairPlay Streaming, but for now it’s limited to devices with AV1 hardware decoders (iPhone 15 Pro, iPad Pro, new Macs with M3 processors).

Previous Bitmovin and Firefox AV1 collaboration

Way back in 2017, before the AV1 spec was finalized, Bitmovin and Firefox collaborated on the first HTML5 AV1 playback. Because the bitstream was still under development and subject to change, Bitmovin and Mozilla agreed on a common codec string to ensure compatibility between the version in the Bitmovin encoder and the decoder in Mozilla Firefox. It was made available in Mozilla’s experimental development version, Firefox Nightly, for users to manually enable. 

Even earlier in 2017, Bitmovin demonstrated the first broadcast quality AV1 live stream at NAB, winning a Best of Show award from Streaming Media Magazine. 

Other recent AV1 playback updates

Android adds dav1d decoder

In March 2024, VideoLAN’s “dav1d” became available to all Android devices running Android 12 or higher. Apps need to opt in to using AV1 for now, but according to Google, most devices can at least keep up with software decoding of 720p 30fps video. YouTube initially opted to begin using dav1d on devices without a hardware decoder, but may have reverted that decision, likely due to battery concerns on phones. For plugged-in Android devices like set-top boxes, dav1d is still a great option and a welcome addition to the ecosystem.

iPad Pro gets AV1 playback support with M4 processor

In early May 2024, Apple continued their march toward full AV1 support with the announcement of their new M4 chip, which will power the new iPad Pro. The Media Engine of M4 is the most advanced to come to iPad, supporting several popular video codecs, like H.264, HEVC, and ProRes, in addition to AV1.

Ready to get started with AV1?

Bitmovin has added AV1 codec support to our Per-Title and 3-pass encoding optimizations and made AV1 encoding available in our dashboard UI, so now you can perform your first AV1 encode without any code, API calls, or configuration necessary! Bitmovin’s AV1 encoding has supported DASH streaming together with Widevine content protection for a long time, but we’ve now also added support for fMP4 in HLS playlists together with FairPlay content protection to take advantage of Apple AV1 support for premium content. It’s also available in our free trial, so there’s never been a better time to check it out and begin taking advantage of the bandwidth savings and quality improvements that AV1 can provide.


Website: Bitmovin’s AV1 hub   

Blog: State of AV1 Playback Support

Blog: Everything you need to know about Apple’s AV1 Support

Blog: 4K video at SD bitrates with AV1

The State of AV1 Playback Support: 2024
https://bitmovin.com/av1-playback-support/ – May 16, 2024

This post was originally published in October 2022. It has been updated with new developments, most recently on May 16, 2024 with news about Apple’s iPad AV1 decoder and Firefox encrypted media extensions support.

In this post, I’ll be taking a look at the current state of AV1 playback support, covering which browsers, mobile devices, smart TVs, consoles and streaming sticks are compatible with the AV1 codec right now.  I’ll also touch on some of the incredible bandwidth savings companies like Netflix are seeing with AV1 and detail the latest announcements, rumors and speculation around future AV1 playback support.

AV1: The Story So Far (2017-2023)

Back in 2017, Bitmovin debuted the world’s first AV1 live encoding at the NAB Show in Las Vegas, earning a Best of NAB award. While it was an exciting proof of concept at the time, AV1 playback support was extremely limited and large-scale production usage wouldn’t come until years later. In 2020, YouTube and Netflix began delivering AV1 to the first compatible Android devices, and last year Netflix shared details about their expanded use of AV1 for 4K streams.

Netflix also published a report that showed over the course of one month in early 2022, 21% of their streamed content benefited from the most recent improvements in codec efficiency, like Per-Title optimized AV1 and HEVC. They estimated that without those improvements, total Netflix traffic globally would have been around 24% higher, proving that you can see massive bandwidth and overall cost savings by encoding just a portion of your most popular content with AV1.

Apple adds AV1 hardware decoding support to iPhone 15 Pro and new Macbooks

Many of us who have been tracking the adoption and progress of AV1 were disappointed when the announcements for Apple’s M-series processors over the past couple years did not include AV1 hardware decoding support. But on September 12, 2023, the big moment we’ve been waiting for finally arrived when Apple announced that the A17 Pro chip in their new iPhone 15 Pro would include a dedicated AV1 decoder. This is a big line in the sand for Apple and for the wider industry and will hopefully prove to be the day that revitalized interest and momentum for AV1 adoption across the industry.

Apple A17 Pro chip in iPhone 15 Pro with dedicated AV1 decoder

“We also included a dedicated AV1 decoder, enabling more efficient and high-quality video experiences for streaming services.”

Sribalan Santhanam – VP, Apple Silicon Engineering Group

After the presentation, co-author of the HLS spec Roger Pantos shared more details via the hls-interest mailing list: 

The iPhone 15 Pro (both screen sizes) will be the first Apple product to support hardware decode of AV1 content. This includes SDR, HDR10, and content protected by FairPlay Streaming, played back through either AVPlayer or AVSampleBufferDisplayLayer (including MSE on Safari).

There is no new signaling necessary for HLS, just the regular content-specific values for the CODECS and VIDEO-RANGE attributes in the MVP. If you wish, you can use the SCORE attribute to make the client prefer AV1 over other encodings (but please continue to provide renditions encoded with AVC and/or HEVC for compatibility with earlier devices and AirPlay).

A month later in October 2023, Apple announced their newest generation of desktop processors would include AV1 hardware decoders. This includes the M3, M3 Pro and M3 Max chips, meaning all new models of Macbooks, iMacs and desktop computers with an M3 processor will also support AV1 video playback.

Earlier in 2023, while everyone was waiting for Apple to officially support AV1, Meta took matters into their own hands, sharing how they brought AV1 to their Reels videos for Facebook and Instagram, including on iOS devices. This became possible through ongoing open source software decoding efficiency improvements, in particular with the dav1d decoder, developed by VideoLAN. Meta also said they believe for their video products, AV1 is the most viable codec for the coming years. The image below shows how they significantly improved visual quality with AV1 over VP9 and H.264, while keeping the bitrate constant.

Screenshot comparing video codec quality for low bandwidth streams. Source: Meta Engineering Blog

At Bitmovin we also believe in the potential of AV1 and have explored the possibilities of software decoding on mobile devices. At a recent internal hackathon, one of our senior software engineers, Roland Kákonyi, built a custom iOS player using the dav1d decoder that was able to decode and smoothly play 1080p AV1 content. We’ll continue exploring this further as a way to fill gaps in playback coverage for devices lacking hardware support.

AV1 Playback Support News in 2024

Following 2023’s big announcements from Apple, 2024 got off to a strong start with Android, Firefox and (again) Apple adding new AV1 playback support. The barriers and arguments against adopting AV1 continue falling, slowly, but surely.

Android adds dav1d decoder

In March 2024, VideoLAN’s “dav1d” became available to all Android devices running Android 12 or higher. Apps need to opt in to using AV1 for now, but according to Google, most devices can at least keep up with software decoding of 720p 30fps video. YouTube initially opted to begin using dav1d on devices without a hardware decoder, but may have reverted that decision, likely due to battery concerns on phones. For plugged-in Android devices like set-top boxes, dav1d is still a great option and a welcome addition to the ecosystem.

Firefox adds AV1 support in Encrypted Media Extensions

While Firefox was the first major browser to support AV1 playback, a long-standing bug (or lack of implementation) prevented DRM-protected AV1 from playing. When Apple added support to Safari for HLS + FairPlay streaming, it meant Firefox was the only major browser that still did not support premium, secure content. That changed in April 2024, when Firefox 125 added AV1 support in Encrypted Media Extensions, meaning Widevine-protected AV1 is now supported.

iPad Pro gets AV1 playback support with M4 processor

In early May 2024, Apple continued their march toward full AV1 support with the announcement of their new M4 chip, which will power the new iPad Pro. The Media Engine of M4 is the most advanced to come to iPad, supporting several popular video codecs, like H.264, HEVC, and ProRes, in addition to AV1.

Current State of AV1 Playback support

To answer the question of current playback support as thoroughly as possible, we created several sample streams with different combinations of containers, muxings and DRM. While there will be some exceptions and omissions, especially when you go back to the 2021 and 2020 models, I’ll use the emojis below to show the general level of support you can expect from these platforms and brands right now, and give the full results of our direct testing in the table at the end.

  • ✅💯 Fully Supported – Successful AV1 playback with all test streams, including DRM
  • ✅ Partial or Documented Support – Successfully played at least one, but not all of our test streams OR the product documentation claims AV1 playback support, but has not yet been verified by Bitmovin
  • ❌ Not Supported – AV1 playback not supported here currently

Browsers and Operating Systems

✅💯 Chrome

✅💯 Edge

✅ Firefox

✅ Safari*

✅💯 Android 

✅ Windows

✅ iOS / macOS **

*Safari 17 or later, when a hardware decoder is present

**AV1 is also supported in Chrome and Firefox on macOS

Generally speaking, the Chrome browser and Android ecosystem handle AV1 well across phones, tablets, smart TVs and set-top boxes/streaming sticks. Unfortunately, the same cannot be said for Safari and iOS where support had been lacking until the iPhone 15 Pro announcement.

Firefox was the first major browser to support AV1, and recently Firefox 125 added support for AV1 in Encrypted Media Extensions, meaning Widevine-protected content is now playable.

The Edge browser on Windows 10 and later supports AV1, but you may need to install the free AV1 Video Extension from the Microsoft Store. 

For more details about the specific versions and less common browsers that support AV1, check out the table from CanIUse.com here.

Smart TVs

✅  Android TV

✅  Google TV

✅  Samsung

✅  Sony

✅  LG

✅  Amazon Fire TV

As mentioned, Android handles AV1 quite nicely, which also applies to the Smart TVs running Android TV and Google TV operating systems. These include Sony Google TV models from 2021 on and many Amazon Fire TV models as far back as 2020. (FireOS is based on Android)

Samsung TVs (and phones) from late 2020 onward have AV1 hardware decoders and were mentioned by Netflix as some of the first outlets for their 4K AV1 content. 

LG has developer documentation stating AV1 is supported for their UHD TVs and projectors running WebOS 5.0 and above, although our testing on some 2020 models was unsuccessful.

Consoles and Streaming Sticks

✅💯 Amazon Fire TV Stick 4K Max

✅ Playstation 4 Pro

✅ Xbox One

✅ Roku Streaming Stick 4K

Playstation 4 Pro was also called out by Netflix as one of the targets for their 4K AV1 streams and it takes advantage of GPU-accelerated decoding. Netflix didn’t publicly mention delivering AV1 to Xbox One, but the same decode libraries that the PS4 Pro uses were first made available for Xbox One, so it should be possible.

The Amazon Fire TV Stick 4K Max has AV1 + DRM support, making it one of the cheapest and best options for giving older 4K TVs an AV1 upgrade. 

Roku is a little bit of a gray area at the moment. Officially, they still do not support AV1 as an adaptive streaming video codec, but newer models like the Roku Ultra that have a USB port do support AV1 playback via USB media. There does appear to be some level of support for AV1 adaptive streaming, as the YouTube “stats for nerds” overlay reveals a combination of AV1 video and opus audio playing on many of the popular recommended videos. Hopefully wider support is coming, but in the meantime, we did confirm successful playback of our single file “progressive” AV1 MP4 files on the Streaming Stick 4K.

YouTube “Stats for nerds” showing AV1 video playing on Roku Streaming Stick 4K

Looking Ahead: Future AV1 Playback Support

Even with gaps in support on some platforms, there is plenty of opportunity to see tangible bandwidth savings and quality improvements from AV1 right now and thankfully, the future looks even brighter. Intel, AMD, Samsung and Qualcomm have all announced additional AV1 support coming at the chip level.

Will Apple add AV1 software decoding support for older devices? 

There have been several indications that Apple would eventually support AV1. Apple joined the Alliance for Open Media, the organization responsible for creating and promoting AV1 encoding, back in 2018, which many took as a sign that Apple would eventually support it. We’re hopeful that with the addition of AV1 hardware decoding support to the iPhone 15 Pro, iPad Pro and Macbooks, Apple will also add official HLS support and fallback software decoding for older devices that are capable.

Conclusion

While AV1 support and adoption has been on the rise and we’ve seen some encouraging announcements, universal support like we have with H.264 is just not there yet. That means AV1 will need to be part of a multi-codec approach for the foreseeable future, but that’s ok! Not that long ago, it took millions of views to offset the higher encoding costs of AV1, but with recent improvements, we’ve seen the break-even point drop to as low as 4,000 views! So for a whole lot of content, encoding with AV1 can already save you money right now and those savings will only increase as more supporting devices become available. 

Ready to get started with AV1 encoding? You can try it for free with a Bitmovin Trial, sign up here!

✅: Playback supported

❌: Playback not supported

➖: Not yet tested

| Platform | fMP4 (DASH) | fMP4 + Widevine & PlayReady (DASH) | Single-file “progressive” MP4 (.mp4) | Single-file “progressive” MP4 + Widevine (DASH) | WebM (DASH) | WebM + Widevine (DASH) | Single-file “progressive” WebM (DASH) | Single-file “progressive” WebM + Widevine (DASH) | fMP4 (HLS) | fMP4 + FairPlay (HLS) |
|---|---|---|---|---|---|---|---|---|---|---|
| Chrome | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Edge | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Firefox | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Safari | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ |
| Android Native | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ➖ | ➖ |
| Android Web | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ➖ | ➖ |
| iOS | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ |
| Fire TV Max | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ➖ | ➖ |
| Fire TV Max Web (Silk Browser) | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ➖ | ➖ |
| Roku Streaming Stick 4K | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ➖ | ➖ |
| Samsung Tizen (2020 and 2021) | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ➖ | ➖ |

The AI Video Research Powering a Higher Quality Future
https://bitmovin.com/ai-video-research/ – May 5, 2024

This post was originally published in June 2023. It was updated in May 2024 with more recent research publications and updates.

This post will summarize the current state of Artificial Intelligence (AI) applications for video in 2024, including recent progress and announcements. We’ll also take a closer look at AI video research and collaboration between Bitmovin and the ATHENA laboratory that has the potential to deliver huge leaps in quality improvements and bring an end to playback stalls and buffering. This includes ATHENA’s FaRes-ML, which was recently granted a US Patent. Keep reading to learn more!

AI for video at NAB 2024

At NAB 2024, the AI hype train continued gaining momentum and we saw more practical applications of AI for video than ever before. We saw various uses of AI-powered encoding optimization, Super Resolution upscaling, automatic subtitling and translations, and generative AI video descriptions and summarizations. Bitmovin also presented some new AI-powered solutions, including our Analytics Session Interpreter, which won a Best of Show award from TV Technology. It uses machine learning and large language models to generate a summary, analysis and recommendations for every viewer session. The early feedback has been positive and we’ll continue to refine and add more capabilities that will help companies better understand and improve their viewers’ experience.

L to R: Product Manager Jacob Arends, CEO Stefan Lederer and Engineer Peter Eder accepting the award for Bitmovin’s AI-powered Analytics Session Interpreter

Other AI highlights from NAB included Jan Ozer’s “Beyond the Hype: A Critical look at AI in Video Streaming” presentation, NETINT and Ampere’s live subtitling demo using OpenAI Whisper, and Microsoft and Mediakind sharing AI applications for media and entertainment workflows. You can find more detail about these sessions and other notable AI solutions from the exhibition floor in this post.

FaRes-ML granted US Patent

For a few years before this recent wave of interest, Bitmovin and our ATHENA project colleagues have been researching the practical applications of AI for video streaming services. It’s something we’re exploring from several angles, from boosting visual quality and upscaling older content to more intelligent video processing for adaptive bitrate (ABR) switching. One of the projects that was first published in 2021 (and covered below in this post) is Fast Multi-Resolution and Multi-Rate Encoding for HTTP Adaptive Streaming Using Machine Learning (FaRes-ML). We’re happy to share that FaRes-ML was recently granted a US Patent! Congrats to the authors, Christian Timmerer, Hadi Amirpour, Ekrem Çetinkaya and the late Prof. Mohammad Ghanbari, who sadly passed away earlier this year.

Recent Bitmovin and ATHENA AI Research

In this section, I’ll give a short summary of projects that were shared and published since the original publication of this blog, and link to details for anyone interested in learning more. 

Generative AI for Adaptive Video Streaming

Presented at the 2024 ACM Multimedia Systems Conference, this research proposal outlines the opportunities at the intersection of advanced AI algorithms and digital entertainment for elevating quality, increasing user interactivity and improving the overall streaming experience. Research topics that will be investigated include AI generated recommendations for user engagement and AI techniques for reducing video data transmission. You can learn more here.

DeepVCA: Deep Video Complexity Analyzer

The ATHENA lab developed and released the open-source Video Complexity Analyzer (VCA) to extract and predict video complexity faster than existing methods like ITU-T’s Spatial Information (SI) and Temporal Information (TI). DeepVCA extends VCA using deep neural networks to accurately predict video encoding parameters, like bitrate, and the encoding time of video sequences. The spatial complexity of the current frame and previous frame are used to rapidly predict the temporal complexity of a sequence, and the results show significant improvements over unsupervised methods. You can learn more and access the source code and dataset here.

DeepVCA’s spatial and temporal complexity prediction process

DIGITWISE: Digital Twin-based Modeling of Adaptive Video Streaming Engagement

DIGITWISE leverages the concept of a digital twin, a digital replica of an actual viewer, to model user engagement based on past viewing sessions. The digital twin receives input about streaming events and utilizes supervised machine learning to predict user engagement for a given session. The system model consists of a data processing pipeline, machine learning models acting as digital twins, and a unified model to predict engagement (XGBoost). The DIGITWISE system architecture demonstrates the importance of personal user sensitivities, reducing user engagement prediction error by up to 5.8% compared to non-user-aware models. It can also be used to optimize content provisioning and delivery by identifying the features that maximize engagement, providing an average engagement increase of up to 8.6%. You can learn more here.

System overview of DIGITWISE user engagement prediction

Previous Bitmovin and ATHENA AI Research

Better quality with neural network-driven Super Resolution upscaling

The first group of ATHENA publications we’re looking at all involve the use of neural networks to drive visual quality improvements using Super Resolution upscaling techniques. 

DeepStream: Video streaming enhancements using compressed deep neural networks

Deep learning-based approaches keep getting better at enhancing and compressing video, but the quality of experience (QoE) improvements they offer are usually only available to devices with GPUs. This paper introduces DeepStream, a scalable, content-aware per-title encoding approach to support both CPU-only and GPU-available end-users. To support backward compatibility, DeepStream constructs a bitrate ladder based on any existing per-title encoding approach, with an enhancement layer for GPU-available devices. The added layer contains lightweight video super-resolution deep neural networks (DNNs) for each bitrate-resolution pair of the bitrate ladder. For GPU-available end-users, this means ~35% bitrate savings while maintaining equivalent PSNR and VMAF quality scores, while CPU-only users receive the video as usual. You can learn more here.

DeepStream system architecture

LiDeR: Lightweight video Super Resolution for mobile devices

Although DNN-based Super Resolution methods like DeepStream show huge improvements over traditional methods, their computational complexity makes it hard to use them on devices with limited power, like smartphones. Recent improvements in mobile hardware, especially GPUs, made it possible to use DNN-based techniques, but existing DNN-based Super Resolution solutions are still too complex. This paper proposes LiDeR, a lightweight video Super Resolution network specifically tailored toward mobile devices. Experimental results show that LiDeR can achieve competitive Super Resolution performance with state-of-the-art networks while improving the execution speed significantly. You can learn more here or watch the video presentation from an IEEE workshop.

Quantitative results comparing Super Resolution methods. LiDeR achieves near equivalent PSNR and SSIM quality scores while running ~3 times faster than its closest competition.

Super Resolution-based ABR for mobile devices

This paper introduces another new lightweight Super Resolution network, SR-ABR Net, that can be deployed on mobile devices to upgrade low-resolution/low-quality videos while running in real-time. It also introduces a novel ABR algorithm, WISH-SR, that leverages Super Resolution networks at the client to improve the video quality depending on the client’s context. By taking into account device properties, video characteristics, and user preferences, it can significantly boost the visual quality of the delivered content while reducing both bandwidth consumption and the number of stalling events. You can learn more here or watch the video presentation from Mile High Video.

System architecture for proposed Super Resolution based adaptive bitrate algorithm

Less buffering and higher QoE with applied machine learning

The next group of research papers involve applying machine learning at different stages of the video workflow to improve QoE for the end user.

FaRes-ML: Fast multi-resolution, multi-rate encoding

Fast multi-rate encoding approaches aim to address the challenge of encoding multiple representations from a single video by re-using information from already encoded representations. In this paper, a convolutional neural network is used to speed up both multi-rate and multi-resolution encoding for ABR streaming. Experimental results show that the proposed method for multi-rate encoding can reduce the overall encoding time by 15.08% and parallel encoding time by 41.26%. Simultaneously, the proposed method for multi-resolution encoding can reduce the encoding time by 46.27% for the overall encoding and 27.71% for the parallel encoding on average. You can learn more here.

FaRes-ML flowchart

ECAS-ML: Edge assisted adaptive bitrate switching

As video streaming traffic in mobile networks increases, utilizing edge computing support is a key way to improve the content delivery process. At an edge node, we can deploy ABR algorithms with a better understanding of network behavior and access to radio and player metrics. This project introduces ECAS-ML, Edge Assisted Adaptation Scheme for HTTP Adaptive Streaming with Machine Learning. It uses machine learning techniques to analyze radio throughput traces and balance the tradeoffs between bitrate, segment switches and stalls to deliver a higher QoE, outperforming other client-based and edge-based ABR algorithms. You can learn more here.

ECAS-ML system architecture

Challenges ahead

The road from research to practical implementation is not always quick or direct or even possible in some cases, but fortunately that’s an area where Bitmovin and ATHENA have been working together closely for several years now. Going back to our initial implementation of HEVC encoding in the cloud, we’ve had success using small trials and experiments with Bitmovin’s clients and partners to provide real-world feedback for the ATHENA team, informing the next round of research and experimentation toward creating viable, game-changing solutions. This innovation-to-product cycle is already in progress for the research mentioned above, with promising early quality and efficiency improvements.  

Many of the advancements we’re seeing in AI are the result of aggregating lots and lots of processing power, which in turn means lots of energy use. Even with processors becoming more energy efficient, the sheer volume involved in large-scale AI applications means energy consumption can be a concern, especially with increasing focus on sustainability and energy efficiency.  From that perspective, for some use cases (like Super Resolution) it will be worth considering the tradeoffs between doing server-side upscaling during the encoding process and client-side upscaling, where every viewing device will consume more power.  

Learn more

Want to learn more about Bitmovin’s AI video research and development? Check out the links below. 

Analytics Session Interpreter webinar

AI-powered video Super Resolution and Remastering

Super Resolution blog series

Super Resolution with Machine Learning webinar

Athena research

MPEG Meeting Updates 

GAIA project blogs

AI Video Glossary

Machine Learning – Machine learning is a subfield of artificial intelligence that deals with developing algorithms and models capable of learning and making predictions or decisions based on data. It involves training these algorithms on large datasets to recognize patterns and extract valuable insights. Machine learning has diverse applications, such as image and speech recognition, natural language processing, and predictive analytics.

Neural Networks – Neural networks are sophisticated algorithms designed to replicate the behavior of the human brain. They are composed of layers of artificial neurons that analyze and process data. In the context of video streaming, neural networks can be leveraged to optimize video quality, enhance compression techniques, and improve video annotation and content recommendation systems, resulting in a more immersive and personalized streaming experience for users.

Super Resolution – Super Resolution upscaling is an advanced technique used to enhance the quality and resolution of images or videos. It involves using complex algorithms and computations to analyze the available data and generate additional details. By doing this, the image or video appears sharper, clearer, and more detailed, creating a better viewing experience, especially on 4K and larger displays. 

Graphics Processing Unit (GPU) – A GPU is a specialized hardware component that focuses on handling and accelerating graphics-related computations. Unlike the central processing unit (CPU), which handles general-purpose tasks, the GPU is specifically designed for parallel processing and rendering complex graphics, such as images and videos. GPUs are widely used in various industries, including gaming, visual effects, scientific research, and artificial intelligence, due to their immense computational power.

Video Understanding – Video understanding is the ability to analyze and comprehend the information present in a video. It involves breaking down the visual content, movements, and actions within the video to make sense of what is happening.

NAB Video AI Highlights
https://bitmovin.com/nab-video-ai/ – April 26, 2024


For the past few years, AI has been one of the top buzzwords at the NAB Show. While other hot topics like “web3” seem to have peaked and faded, interest in video AI has continued to grow and this year there were more practical solutions being showcased than ever before. A personal highlight for Bitmovin was winning a TV Technology Best of Show award for our AI-powered Analytics session interpreter. Keep reading to learn more about other interesting and useful applications of AI that we saw at NAB 2024.

NAB Video AI Highlights: 2024

While there was some variation in implementation and features, the majority of the AI solutions I encountered at NAB fell into one of these categories:

  • Generative AI (genAI) for video creation, post-production, or summaries and descriptions
  • Automatic subtitling and captioning with multi-language translations
  • Object or event detection and indexing
  • Video quality enhancement

This summary is definitely not exhaustive, but highlights some of the things that stood out to me on the show floor and in the conference sessions. Please let us know in the comments if you saw anything else noteworthy.

Booths and Exhibits

Adobe

Adobe has been showing AI-powered editing and post-production tools as part of their creative suite for a couple of years now, and they seem to be continuously improving. They teased a new Firefly video model coming to Premiere Pro later this year that will enable a few new Photoshop-like tools for video. Generative Extend will allow you to extend clips with AI-generated frames for perfectly timed edits, and the new Firefly model will also enable object removal, addition, and replacement. They’ve also implemented content credentials into the platform that will signal when generative AI was used in the creation process and which models were used, as they prepare to support 3rd-party genAI models like OpenAI’s Sora.

Amazon Web Services (AWS)

AWS had one of the busiest booths in the West hall and were showcasing several AI-powered solutions, including using genAI for creating personalized ads and Intel’s Video Super Resolution upscaling. But they also had the most eye-catching and fun application of AI in the South Hall, a genAI golf simulator where you could design and play your own course.

AWS GenAI-powered golf simulator

axle.ai

Axle.ai was sharing their face, object, and logo recognition technology that can index recognized objects and search for matching objects in other videos or clips. Their software also has automatic voice transcription and translation capabilities. It can run either on-premises or in the cloud and integrates with Adobe Premiere, Final Cut Pro and other editing suites. While other companies offer similar capabilities, they stood out as being particularly focused on these use cases.

BLUEDOT

BLUEDOT was showcasing a few different solutions for improving QoE in the encoding and processing stage. Their DeepField-SR video super resolution product uses a proprietary deep neural network to upscale video up to 4K resolution, leveraging FPGAs. They were also showing AI-driven perceptual quality optimized video encoding.

BLUEDOT’s AI-driven perceptual quality optimization – image source: blue-dot.io

Twelve Labs

Twelve Labs was featuring their multimodal AI for Media & Entertainment workflows, aiming to bring human-like understanding to video content. They use both video and audio information to inform object and event detection and indexing.  This enables you to easily find moments in a video, like when a certain player scores or when a product is mentioned. They also power generative text descriptions of videos and clips. Their solution seemed more flexible than others I saw and can be integrated into media asset management systems, editing software or OTT streaming workflows.

Conference Sessions and Presentations

Beyond the Hype: A Critical look at AI in Video Streaming

In this session, as the title suggests, Jan Ozer took a close look at the current state of AI applications for video streaming workflows. He conducted several interviews with executives and product leaders ahead of NAB and shared his notes and links to the full interviews. He also called out a few times that many of the companies featured, including Bitmovin, have been researching and working on AI-powered video solutions for several years now, even before the current wave of hype. He shared Bitmovin’s new Analytics session interpreter and our Super Resolution capabilities, which you can hear more about in his interview with our VP of Product, Reinhard Grandl.

Jan Ozer’s interview with Bitmovin’s Reinhard Grandl for his Beyond the Hype NAB presentation

Some other things that stood out for me included Interra Systems’ BATON Captions, which uses natural language processing to break caption text at more natural, human-readable points. It’s a small, subtle feature that can make a big difference for accessibility and the viewer experience, and one I haven’t heard anyone else focus on. DeepRender also caught my attention with their claim of an AI-based video codec that will deliver 45% better compression than VVC by the end of 2024. That’s a bold claim, and I’ll be watching to see if they live up to the hype. Video of the session is available here, thanks to Dan Rayburn and the Streaming Summit.

Running OpenAI’s Whisper Automatic Speech Recognition on a Live Video Transcoding Server

This was a joint presentation led by NETINT’s COO Alex Liu and Ampere’s Chief Evangelist Sean Varley. They presented a practical demo of real-time live transcoding and subtitling using NETINT’s T1U Video Processing Unit (VPU) together with Ampere’s Altra Max CPU running OpenAI Whisper. The NETINT VPU is capable of creating dozens of simultaneous adaptive bitrate outputs with H.264, H.265 and AV1 codecs. The Ampere processor was being positioned as a more environmentally-friendly option for AI inference workflows, consuming less power than similarly capable GPUs. While there were some hiccups with the in-room A/V system, the live captioning demo was impressive and worked very well. Video of the session is available here, again thanks to Dan Rayburn and the Streaming Summit.
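
For readers who want to experiment with the ASR side of this demo, below is a minimal sketch using the open-source Whisper package (pip install openai-whisper). The file name and model size are placeholders, and the real-time NETINT/Ampere pipeline from the session is not reproduced here.

```python
# Minimal sketch: transcribing an audio file with OpenAI's open-source
# Whisper package. The file name and model size are placeholders; the
# real-time VPU/CPU pipeline from the NAB demo is not reproduced here.
import whisper

model = whisper.load_model("base")              # small, fast general model
result = model.transcribe("program_audio.mp3")  # returns text plus timed segments

# Print caption-style output with start/end timestamps per segment.
for seg in result["segments"]:
    print(f"[{seg['start']:7.2f}s -> {seg['end']:7.2f}s] {seg['text'].strip()}")
```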

Sean Varley and Alex Liu presenting NETINT and Ampere’s live transcoding and subtitling workflow at NAB 2024

Leveraging Azure AI for Media Production and Content Monetization Workflows

Microsoft’s Andy Beach and MediaKind’s Amit Tank led this discussion and showcase of using genAI in media and entertainment workflows. They discussed how AI can help with each part of the production and delivery workflow to boost monetization, including brand detection, contextual ad placement, metadata automation, translations, captioning and personalization. One area they discussed that I hadn’t heard anyone else talk about was using AI for content localization: not just language translation via captions and dubbing, but compliance with local and regional norms and, in some cases, regulations. For example, some countries may prefer or even require the removal or censorship of things like alcohol and drug use, guns, or excessive violence, so AI can help automate preparing content in different ways for a global audience. They also shared their own personal “most-used” AI applications, which included Microsoft’s Copilot and related AI add-ons to Teams and other Microsoft products.

Video AI use cases across the media supply chain, presented by Microsoft and MediaKind at NAB 2024

Did you see an interesting or innovative use of AI at NAB that wasn’t mentioned here? Please let us know in the comments!

AI-powered Video Super Resolution and Remastering
https://bitmovin.com/ai-video-super-resolution/
Fri, 12 Apr 2024 15:18:37 +0000

AI has been the hot buzzword in tech for the past couple of years, and we’re starting to see more and more practical applications for video emerge from the hype, like automatic closed-captioning and language translation, automated descriptions and summaries, and AI video Super Resolution upscaling. Bitmovin has focused especially on how AI can provide value for our customers, releasing our AI Analytics Session Interpreter earlier this year, and we’re looking closely at several other areas of the end-to-end video workflow.

We’re very proud of how our encoder maintains the visual quality of source files while significantly reducing the amount of data used, but now we’re exploring how we can actually improve on the quality of the source file for older and standard-definition content. Super Resolution implementations have come a long way in the past few years and have the potential to give older content new life and make it look amazing on Ultra-High Definition screens. Keep reading to learn about Bitmovin’s progress and results.

What is video Super Resolution and how does it work? 

Super Resolution refers to the process of enhancing the quality or increasing the resolution of an image or video beyond its original resolution. The original methods of upscaling images and video involved upsampling by using mathematical functions like bilinear and bicubic interpolation to predict new data points in between sampled data points. Some techniques used multiple lower-resolution images or video frames to create a composite higher resolution image or frame. Now AI and machine learning (ML) based methods involve training deep neural networks (DNNs) with large libraries of low and high-resolution image pairs. The networks learn to map the differences between the pairs, and after enough training they are able to accurately generate a high-resolution image from a lower-resolution one. 
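
As a concrete illustration of the difference between interpolation and learned upscaling, here is a minimal sketch using OpenCV’s contrib dnn_superres module. It assumes opencv-contrib-python is installed and that a pretrained model file (for example, EDSR_x4.pb from the public OpenCV model zoo) has been downloaded; the file names are placeholders.

```python
# Minimal sketch: classic bicubic interpolation vs. a DNN-based Super
# Resolution model, via OpenCV's contrib dnn_superres module. Assumes
# opencv-contrib-python and a downloaded pretrained model (EDSR_x4.pb).
import cv2

frame = cv2.imread("lowres_frame.png")  # a single decoded video frame

# Interpolation predicts in-between pixels from a fixed mathematical kernel.
bicubic = cv2.resize(frame, None, fx=4, fy=4, interpolation=cv2.INTER_CUBIC)

# The DNN was trained on low/high-resolution pairs and generates detail
# that interpolation cannot recover.
sr = cv2.dnn_superres.DnnSuperResImpl_create()
sr.readModel("EDSR_x4.pb")   # pretrained 4x EDSR weights
sr.setModel("edsr", 4)       # model name and upscaling factor
learned = sr.upsample(frame)

cv2.imwrite("bicubic_x4.png", bicubic)
cv2.imwrite("edsr_x4.png", learned)
```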

Bitmovin’s AI video Super Resolution exploration and testing

Super Resolution upscaling is something that Bitmovin has been investigating and testing with customers for several years now. We published a 3-part deep dive back in 2020 that goes into detail about the principles behind Super Resolution, how it can be incorporated into video workflows and the practical applications and results. We won’t fully rehash those posts here, so check them out if you’re interested in the details. But one of the conclusions we came to back then, was that Super Resolution was an especially well-suited application for machine learning techniques. This is even more true now, as GPUs have gotten exponentially more powerful over the past 4 years, while becoming more affordable and accessible as cloud resources. 

NVIDIA’s GPU computation capabilities over the last 8 years – source: NVIDIA GTC 2024 keynote

ATHENA Super Resolution research

Bitmovin’s ATHENA research lab partner has also been looking into various AI video Super Resolution approaches. In a proposed method called DeepStream, they demonstrated how a DNN enhancement-layer could be included with a stream to perform Super Resolution upscaling on playback devices with capable GPUs. The results showed this method could save ~35% bitrate while delivering equivalent quality. See this link for more detail. 

Other Super Resolution techniques the ATHENA team has looked at involve upscaling on mobile devices that typically can’t take advantage of DNNs due to lack of processing power and power consumption/battery concerns. Lightweight Super Resolution networks specifically tailored for mobile devices like LiDeR and SR-ABR Net have shown positive early outcomes and performance. 

AI-powered video enhancement with Bitmovin partner Pixop

Bitmovin partner Pixop specializes in AI and ML video enhancement and upscaling. They’re also cloud native and fellow members of NVIDIA’s Inception Startup Program. They offer several AI-powered services and filters, including restoration, Super Resolution upscaling, denoising, deinterlacing, film grain and frame rate conversion, automating processes that used to be painstaking and time-consuming. We’ve found them to be very complementary to Bitmovin’s VOD Encoding and have begun trials with Bitmovin customers.

One application we’re exploring is digital remastering of historic content. We’ve been able to run lower-resolution, grainy and (by today’s standards) generally lower-quality content through Pixop’s upscaling and restoration, with promising results. The encoded output was not only higher resolution; the applied cropping, re-graining and color correction also produced a visually more appealing result, allowing our customer to re-monetize their aged content. The image below shows a side-by-side comparison of remastered content with finer details.

Side-by-side comparison of AI remastered content

Interested in giving your older content new life with the power of AI video Super Resolution? Get in touch here.

Related Links

Blog: Super Resolution Tech Deep Dive Part 1

Blog: Super Resolution Tech Deep Dive Part 2

Blog: Super Resolution Tech Deep Dive Part 3

Blog: AI Video Research

ATHENA research lab – Super Resolution projects and publications

pixop.com

Globo, Google Cloud and Bitmovin: Taking Quality to New Heights
https://bitmovin.com/globo-google-cloud/
Wed, 10 Apr 2024 17:28:53 +0000

Globo’s content and reach

When it comes to content scale and audience reach, Globo is on par with Hollywood and the big US broadcasters with over 3,000 hours of entertainment content being produced each year. The viewership numbers are equally impressive with forty-nine million Brazilians watching the daily, one-hour newscast and Globo’s Digital Hub attracting eight out of ten Brazilians with internet access. The Digital Hub hosts a variety of content categories, from news, sports, and entertainment to live events such as the Olympics, Carnival, and the FIFA World Cup. Globo also runs a subscription video on demand (SVOD) service called Globoplay that streams live sports, licensed content, as well as movies and television series produced by Estúdios Globo, the largest content production studio in Latin America.

Globo standard of quality

Globo has worked hard to build and become known for the “Globo Standard of Quality”. This means creating the optimal viewing experience together with award-winning content, delivered in stunning visual quality. To develop that reputation, Globo became one of the first mainstream broadcasters outside the US to offer content in 4K, adopting it as a new standard across its platforms and devices. It has already produced hundreds of hours of 4K content (including HDR), with over a thousand hours of encoding output for its telenovelas and original series. The early adoption of 4K is even more impressive given that Brazil ranks 79th on the list of countries by Internet connection speed. To deliver high-quality video under those conditions, operators cannot simply throw higher bitrates at the problem; they have to find an encoder that achieves quality, speed, and cost-efficiency at the same time. In the past, 4K encoding was accomplished with on-premises hardware encoders. As the next update cycle of the appliances was fast approaching, Igor Macaubas, Head of Online Video Platform, and Lucas Stephanou, Video Platform Product Owner at Globo, decided to conduct a thorough evaluation of vendors, and ultimately chose Bitmovin.

“We are not willing to compromise the visual integrity of our content and we hold ourselves to strict perception-quality standards. Bitmovin’s renowned 3-Pass Encoding exceeded our expectations and ensures that high perceptual quality can still be delivered while streaming at optimal bandwidth levels.”

– Lucas Stephanou (Video Platform Product Owner, Globo)

Globoplay, powered by Bitmovin VOD Encoding on Google Cloud

Globo manages a massive VOD library of over a million titles, and with 12 variants in their HEVC bitrate stack, encoding demands are high. Bitmovin’s VOD encoding service running on Google Cloud gave Globo the capability to encode a 90-minute video asset across the entire HEVC ladder in 14 minutes, a real-time factor of roughly 6.4x (90 ÷ 14 ≈ 6.4), with a quantifiable impact on time-to-market. Globo saw the business need for fast encoding turnaround and found Bitmovin the clear front runner in this regard.

Bitmovin VOD Encoding on Google Cloud is an easy-to-use, fully-managed video transcoding software-as-a-service (SaaS). Bitmovin VOD Encoding allows customers to efficiently stream any type of on-demand content to any viewing device. Customers use Bitmovin VOD Encoding for a wide range of on-demand streaming use cases, including Subscription Video on Demand (SVOD), Transactional VOD (TVOD), and Ad-supported VOD (AVOD) services, online training, and other use cases. Bitmovin’s Emmy Award® winning multi-codec outputs and per-scene and per-title content-aware transcoding produce higher visual quality video outputs at lower bit rates than other file-based transcoding SaaS to optimize content delivery and reduce streaming cost. Bitmovin VOD Encoding is available for purchase on Google Cloud Marketplace.

Bitmovin’s 3-Pass Encoding algorithm uses machine learning and AI to examine the video on a scene-by-scene basis. It analyzes the content’s complexity multiple times to optimize intra-frame and inter-frame compression, which helps determine the ideal resolution and bitrate combinations that maximize quality and efficiency. Altogether, this ensures the visual elements of the video are not degraded in the encoding process and prevents unnecessary overhead data that might impact the viewing experience.
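
For those curious what this looks like in practice, the sketch below shows how 3-Pass Encoding can be requested through the Bitmovin Python SDK (bitmovin-api-sdk). The API key, name, width and bitrate are placeholders, and the surrounding input/stream/muxing setup is omitted; this is not a complete encoding workflow.

```python
# Minimal sketch: requesting 3-Pass Encoding for an H.264 rendition via the
# Bitmovin Python SDK (bitmovin-api-sdk). API key and bitrate are
# placeholders; input, stream and muxing setup are omitted.
from bitmovin_api_sdk import BitmovinApi, EncodingMode, H264VideoConfiguration

bitmovin_api = BitmovinApi(api_key="YOUR_API_KEY")

codec_config = bitmovin_api.encoding.configurations.video.h264.create(
    h264_video_configuration=H264VideoConfiguration(
        name="H.264 1080p, 3-pass",
        width=1920,
        bitrate=4_500_000,                      # target bitrate in bps
        encoding_mode=EncodingMode.THREE_PASS,  # multi-pass complexity analysis
    )
)
print(codec_config.id)  # reference this configuration when creating streams
```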

Processing HD and 4K video at Globo’s volume requires computing resources that would exceed the CapEx budgets of most companies. This is where Google Cloud’s flexibility and on-demand compute power really shine. Together with Bitmovin’s split-and-stitch technology, single encoding jobs run significantly faster with parallel processing, and spikes in demand are handled with ease and throughput that is simply not possible with on-premises encoding. Customers also have the option to deploy Bitmovin VOD Encoding as a managed service running in the Bitmovin account or as a single tenancy running in the customer’s own Google Cloud account, which allows encoding costs to be applied toward any annual spending commitments.

“Globo is known to set quality standards. We want our viewers to experience our great content in stunning video quality. Our 4K workflows have been relying on hardware encoders, but we wanted to test the power of the cloud and conducted a thorough vendor evaluation based on video quality. Bitmovin’s encoding quality and speed convinced us across the board. And, since using Bitmovin’s encoding service running on Google Cloud, we are spending a fraction of the cost by bringing our capital cost down without spending more on operational cost.”

– Igor Macaubas (Head of Online Video Platform, Globo)

Olympics in 8K

One prime example of this collaboration innovating and pushing the boundaries of video quality is from the Tokyo Olympics in 2021, where 8K VOD content from the Olympics was delivered to viewers at home via Globoplay. This marked the first time that the Olympics were viewable in 8K resolution outside of Japan. 8K video has 16x the resolution of HD and 4x that of 4K, so it requires an enormous amount of processing power and advanced compression to lower the data rates for delivery to end users. 4K and 8K content is also referred to as Ultra High Definition (UHD) and is usually mastered in a High Dynamic Range (HDR) format that allows for brighter highlights, more contrast and a wider color palette. Hybrid-Log Gamma (HLG) is an HDR format that was developed for broadcast applications and backward compatibility with Standard Dynamic Range (SDR) television sets.    

After receiving the HLG mastered content from Intel in Japan, Globo utilized Bitmovin VOD Encoding on Google Cloud’s compute instances for efficient parallel processing with Bitmovin’s VOD Encoding API. 8K/60p transcoding was performed using the High Efficiency Video Coding (HEVC) codec, creating an optimized adaptive bitrate ladder. At this stage, Bitmovin’s 3-pass encoding was key for transforming the content into a compatible size for transport over broadband internet connections, without sacrificing the stunning 8K visual quality. The 8K content was then delivered via Globo’s own Content Delivery Network (CDN) infrastructure to subscribers of Globoplay with 8K Samsung TVs.
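
To make the HLG handling concrete, here is a hedged FFmpeg sketch that transcodes HLG-mastered content to HEVC while tagging the correct HDR metadata. The encoder, bitrate and file names are placeholders, and this does not reflect Globo’s or Bitmovin’s production pipeline.

```python
# Minimal sketch: transcoding HLG-mastered UHD content to HEVC with FFmpeg's
# libx265 while tagging the HLG (arib-std-b67) transfer characteristics so
# players interpret the HDR signal correctly. Values are placeholders.
import subprocess

subprocess.run([
    "ffmpeg", "-y", "-i", "hlg_master.mov",
    "-c:v", "libx265",
    "-b:v", "40M",
    "-pix_fmt", "yuv420p10le",     # 10-bit pixels, required for HLG delivery
    "-color_primaries", "bt2020",  # wide color gamut primaries
    "-color_trc", "arib-std-b67",  # HLG transfer function
    "-colorspace", "bt2020nc",
    "hlg_hevc.mp4",
], check=True)
```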

“Our 3-Pass Encoding proved to be the right encoding mode. It ensured high perceptual quality could still be delivered while streaming at optimal bandwidth levels. With our split-and-stitch technology running on Google Cloud’s scalable infrastructure, we were able to deliver both speed and quality for this time-sensitive content.”

– Stefan Lederer (CEO, Bitmovin)

Learn more about Bitmovin’s VOD Encoding SaaS here.

Related Links

Google Cloud Media & Entertainment Blog

Bitmovin on Google Cloud Marketplace

Globo – Bitmovin Customer Showcase and Case Study

NVIDIA GTC24: Highlights for Video Streaming Workflows
https://bitmovin.com/nvidia-gtc-video-streaming-highlights/
Fri, 05 Apr 2024 20:38:00 +0000


NVIDIA GTC Video Streaming Workflow Highlights

NVIDIA GTC (GPU Technology Conference) is an annual conference with training and exhibitions covering all aspects of GPU (Graphics Processing Unit) accelerated computing. GTC 2024 was held in March with the tagline “The Conference for the Era of AI” and, as expected, generative AI was a huge focus this year. Several other emerging applications of AI were also on display, including advanced robotics, autonomous vehicles, climate modeling and new drug discovery.

Selfie outside NVIDIA GTC24

When GPUs were first introduced, they were mainly used for rendering graphics in video game systems. In the mid-to-late ‘90s, NVIDIA’s programmable GPUs opened up new possibilities for accelerated video decoding and transcoding workflows. Even though GPUs may now be more associated with powering AI solutions, they still play an important role in many video applications, and several sessions and announcements at GTC24 covered the latest video-related updates. Keep reading to learn more about the highlights.

Video technology updates

In a session titled NVIDIA GPU Video Technologies: New Features, Improvements, and Cloud APIs, Abhijit Patait, Sr. Director of Multimedia and AI at NVIDIA, shared the latest updates and new features available for processing video with their GPUs. Some highlights that are now available in NVIDIA’s Video Codec SDK 12.2:

  • 15% quality improvement for HEVC encoding, thanks to several enhancements:
    • UHQ (Ultra-high quality) tuning info for latency-tolerant use cases
    • Increased lookahead analysis
    • Temporal filtering for noise reduction
    • Unidirectional B-frames for latency-sensitive use cases
  • Encode 8-bit content as 10-bit for higher quality (HEVC and AV1) (see the sketch below)
Comparison of HEVC encodings with equivalent quality using 18Mbps with HQ tuning, but only 10Mbps with the new UHQ tuning – source: GTC24
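
As an example of the 8-bit-to-10-bit item in the list above, here is a hedged FFmpeg sketch using the hevc_nvenc encoder. It assumes an FFmpeg build with NVENC support; the preset, bitrate and file names are placeholders rather than recommendations.

```python
# Minimal sketch: encoding an 8-bit source as 10-bit HEVC with NVENC through
# FFmpeg (assumes an FFmpeg build with NVENC enabled; values are placeholders).
import subprocess

subprocess.run([
    "ffmpeg", "-y", "-i", "source_8bit.mp4",
    "-c:v", "hevc_nvenc",
    "-preset", "p7",        # slowest, highest-quality NVENC preset
    "-pix_fmt", "p010le",   # encode on a 10-bit surface to reduce banding
    "-b:v", "10M",
    "out_10bit_hevc.mp4",
], check=True)
```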

There were also several “Connect with Experts” sessions held where attendees could meet and ask questions of various NVIDIA subject matter experts. In the Building Efficient Video Transcoding Pipelines Enabling 8K session, they shared how multiple NVENC instances can be used in parallel for split-frame encoding to speed up 8K transcoding workflows. This topic is also covered in detail in their developer blog here

Split-frame encoding with NVIDIA GPUs – source: NVIDIA developer blog

VMAF-CUDA: Faster video quality analysis

Snap and NVIDIA gave a joint presentation about a collaborative project (with participation from Netflix) to optimize and implement VMAF (Video Multi-Method Assessment Fusion) quality calculations on NVIDIA CUDA cores. CUDA (Compute Unified Device Architecture) cores are general-purpose processing units on NVIDIA GPUs that enable parallel processing for applications complementary to the GPU’s dedicated video circuits.

NVIDIA GPU video capabilities and components – source: nvidia.com

During the talk, they explained how implementing VMAF-CUDA enabled Snap to run their video quality assessments in parallel to the transcoding being done on NVIDIA GPUs. The new method runs several times faster and more efficiently than running VMAF on CPU instances. It was so successful that Snap is now planning to transition all VMAF calculations to GPUs, even for transcoding workflows that are CPU-based. They also published the technical details in this blog post for those interested in learning more. 
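
Following the pattern from NVIDIA’s developer blog, a GPU-side VMAF comparison can be run with FFmpeg’s libvmaf_cuda filter (FFmpeg 6.1+ built with libvmaf and CUDA support). The sketch below wraps the command from Python; the file names are placeholders.

```python
# Minimal sketch: scoring a distorted encode against its reference on the GPU
# with FFmpeg's libvmaf_cuda filter (requires FFmpeg 6.1+ built with libvmaf
# and CUDA support). File names are placeholders.
import subprocess

subprocess.run([
    "ffmpeg",
    # Decode both inputs on the GPU and keep the frames in CUDA memory.
    "-hwaccel", "cuda", "-hwaccel_output_format", "cuda", "-i", "distorted.mp4",
    "-hwaccel", "cuda", "-hwaccel_output_format", "cuda", "-i", "reference.mp4",
    # Compare the two GPU-resident streams; the VMAF score lands in the log.
    "-filter_complex", "[0:v][1:v]libvmaf_cuda",
    "-f", "null", "-",
], check=True)
```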

VMAF calculation speed comparison, GPU vs CPU – source: NVIDIA developer blog

Netflix Vision AI workflows

In a joint presentation by Netflix and NVIDIA, Streamed Video Processing for Cloud-Scale Vision AI Services, they shared how Netflix is using computer vision and AI at scale throughout their stack. Netflix is a bit unique, not only in its massive scale, but also in being vertically integrated, with people working on every part of the chain from content creation through distribution. This opens up a lot of opportunities for using AI, along with the challenge of deploying solutions at scale.

They shared examples from: 

  • Pre-production: Storyboarding, Pre-visualization
  • Post-production: QC, Compositing and visual fx, Video search
  • Promotional media: Generating multi-format artwork, posters, trailers and synopses
  • Globalization/localization of content: Multi-language subtitling and dubbing

They also discussed the pros and cons of using an off-the-shelf framework like NVIDIA’s DeepStream SDK for computer vision workflows (ease of use, efficiency of setup) versus building your own modular workflow (customization, efficiency of use) with components like CV-CUDA Operators for pre- and post-processing of images and TensorRT for deep-learning inference.

They also went into some detail on one application of computer vision in the post-production process, where they used object detection to identify when the clapperboard appeared in footage and sync the audio with the moment it closed, with sub-frame precision. This is something that has been a tedious, manual process for editors for decades in the motion picture industry and now they are able to automate it with consistent, precise results. While this is really more on the content creation side, it’s not hard to imagine how this same method could be used for automating some QA/QC processes for those on the content processing and distribution side. 

Ready to try GPU encoding in the cloud?

Bitmovin VOD Encoding now supports the use of NVIDIA GPUs for accelerated video transcoding. Specifically, we use NVIDIA T4 GPUs on AWS EC2 G4dn instances, which are now available to our customers simply by using our VOD_HARDWARE_SHORTFORM preset. This enables incredibly fast turnaround times using both H.264 and H.265 codecs. For time-critical short form content like sports highlights and news clips, it can make a huge difference. You can get started today with a Bitmovin trial and see the results for yourself. 

Related Links

Blog: GPU Acceleration for cloud video encoding

Guide: How to create an encoding using hardware acceleration 

Data Sheet: Raise the bar for short form content 

Split-and-Stitch Encoding with incredible speed, quality and scale
https://bitmovin.com/split-and-stitch-encoding/
Wed, 13 Mar 2024 17:09:44 +0000

Introduction

In the early days of digital video, encoding a full-length movie could take several hours or even days to complete, depending on the settings and techniques that were used. Over time, as processor speeds increased and specialized hardware was introduced, encoding turnaround times decreased, but it was usually an incremental, linear response to the advancements in technology. Once cloud computing resources became readily available and opened new possibilities, cloud-native encoding services like Bitmovin disrupted the status quo with massive gains for encoding speed and turnaround times. This potential was unlocked by developing an innovative new technique known as split-and-stitch encoding. 

What is split-and-stitch encoding? 

As the name suggests, split-and-stitch encoding is a method that involves splitting a file into smaller chunks, encoding those chunks separately, and then stitching them back together. Encoding these smaller chunks in parallel on separate cloud computing resources led to huge leaps in shortening turnaround times. Prior to that, digital videos were processed linearly, an unnecessary limitation carried over from film and tape processing workflows, where the physical medium actually was a limiting factor. The sketch after the diagram below illustrates the basic flow.

Bitmovin's split-and-stitch encoding process
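
To make the concept concrete, here is a toy single-machine sketch of the split/encode/stitch flow using FFmpeg from Python. It is an illustration of the idea only, not Bitmovin’s production implementation, and the chunk duration, codec settings and file names are placeholders.

```python
# Toy sketch of split-and-stitch on one machine (illustration only, not
# Bitmovin's implementation): split the source into chunks, encode them in
# parallel, then rejoin them losslessly with FFmpeg's concat demuxer.
import glob
import subprocess
from concurrent.futures import ProcessPoolExecutor

def encode_chunk(src: str) -> str:
    out = src.replace("split_", "enc_")
    subprocess.run(["ffmpeg", "-y", "-i", src,
                    "-c:v", "libx264", "-b:v", "4M", out], check=True)
    return out

if __name__ == "__main__":
    # 1) Split without re-encoding into ~60-second chunks at keyframes.
    subprocess.run(["ffmpeg", "-y", "-i", "source.mp4", "-c", "copy",
                    "-f", "segment", "-segment_time", "60",
                    "split_%03d.mp4"], check=True)

    # 2) Encode all chunks in parallel; in the cloud this step fans out
    #    across many instances instead of local processes.
    with ProcessPoolExecutor() as pool:
        outputs = list(pool.map(encode_chunk, sorted(glob.glob("split_*.mp4"))))

    # 3) Stitch the encoded chunks back together without another encode.
    with open("concat.txt", "w") as f:
        f.writelines(f"file '{name}'\n" for name in outputs)
    subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0",
                    "-i", "concat.txt", "-c", "copy", "stitched.mp4"], check=True)
```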

How fast is split-and-stitch encoding?

Back in 2015 when Bitmovin first implemented our encoder on the Google Compute Engine (now Google Cloud Platform) we were able to achieve encoding speeds of 66x real-time running in their cloud, as mentioned here. With some further optimization, we became the first to reach 100x real-time encoding speeds. 

The actual turnaround times for your encoding jobs will depend on many factors, including source format, codec(s), resolution, duration and advanced features like Dolby Vision, but even with very complex 4K HDR workflows, your encodes will run faster than real-time using split-and-stitch. Below is a real-world example of an H.264/AAC encoding that ran faster than 92x real-time.

Bitmovin dashboard showing a split-and-stitch encoding job that ran 92.38x faster than real-time

Running split-and-stitch encoding in the cloud means your individual encoding jobs run faster than real-time, but it also means that you can scale to run many jobs in parallel which allows large backlogs to be cleared in hours instead of weeks. You also have the capacity to handle spikes of content with no impact on queue time.

What are the advantages of Bitmovin’s split-and-stitch encoding?

Bitmovin has over a decade of experience developing and refining our split-and-stitch implementation. We built our system to take advantage of spot and preemptible instances to keep costs down, while surpassing the quality of single-instance encodes with innovations like 3-pass encoding and Smart Chunking. Our intelligent workload orchestration lets you manage priority and resource scheduling, with capacity for thousands of jobs per hour.

Bitmovin also supports multiple codecs and packaging formats together with split-and-stitch, including H.264 (AVC), H.265 (HEVC), VP9 and AV1 with both HLS and DASH, where other platforms may be limited to H.264 and HLS. We’ve also implemented fast-decode enhancements for large J2K and ProRes mezzanine source files that reduce overall turnaround time even further.

What is Smart Chunking?

In 2023, Bitmovin made some key changes to our VOD Encoder with a new feature called Smart Chunking, which further improved the visual quality and turnaround times possible with split-and-stitch by decoupling the split-and-stitch chunk duration from the user-defined segment duration. This allows chunk size to vary with the codec and the complexity of the encode, enabling many immediate improvements and future optimizations. With Smart Chunking we can split chunks at the optimal points with better bitrate distribution, providing more consistent quality without any noticeable dips.


In the graph below, you can see a comparison of an encoding job run with and without Smart Chunking. While the overall quality is similar, in the blue version (without Smart Chunking) there are several lower quality outlier frames. By using Smart Chunking (orange version) the lowest 1% of frames in terms of quality were improved by an average of 6 VMAF points, which is a noticeable difference. The lowest 0.1% improved by 22 VMAF points and the single worst frame gained a massive 60 VMAF points.

Is split-and-stitch always the best approach? 

The steps of analyzing, splitting and reassembling chunks of video do add some overhead processing time to the encoding process. For longer episodic content or movies, the added time is negligible compared to the time saved by using split-and-stitch. But, for shorter videos like ads and news clips that are time-sensitive, the pre-processing can make using split-and-stitch less advantageous. 

For these cases, Bitmovin has two solutions. First, we’ve added support for hardware encoding with NVIDIA T4 GPUs, which can deliver the same quality of video encoding up to four times faster than CPUs, with H.264 (AVC) and H.265 (HEVC) codec support. We also have a new “accelerated mode” that uses pre-warmed cloud compute resources, so you no longer have to wait for new instances to start. This has made a huge impact on overall encoding job turnaround, lowering queuing times from minutes to under 10 seconds.

Ready to get started with split-and-stitch encoding?

Bitmovin’s split-and-stitch encoding with Smart Chunking is enabled by default and doesn’t require any special configuration. You can get started quickly with our dashboard encoding wizard without any coding required. Get going today with our free trial and see the results for yourself by clicking here!
