Innovation – Bitmovin

The AI Video Research Powering a Higher Quality Future

Sun, 05 May 2024 22:06:17 +0000

*This post was originally published in June 2023. It was updated in May 2024 with more recent research publications and updates.*

This post will summarize the current state of Artificial Intelligence (AI) applications for video in 2024, including recent progress and announcements. We’ll also take a closer look at AI video research and collaboration between Bitmovin and the ATHENA laboratory that has the potential to deliver huge leaps in quality improvements and bring an end to playback stalls and buffering. This includes ATHENA’s FaRes-ML, which was recently granted a US Patent. Keep reading to learn more!

AI for video at NAB 2024
FaRes-ML granted US Patent
Recent Bitmovin and ATHENA AI Research
Generative AI for Adaptive Video Streaming
DeepVCA: Deep Video Complexity Analyzer
DIGITWISE: Digital Twin-based Modeling of Adaptive Video Streaming Engagement
Previous Bitmovin and ATHENA AI Research
Better quality with neural network-driven Super Resolution upscaling
Less buffering and higher QoE with applied machine learning
Challenges ahead
Learn more
AI Video Glossary

AI for video at NAB 2024

At NAB 2024, the AI hype train continued gaining momentum and we saw more practical applications of AI for video than ever before. We saw various uses of AI-powered encoding optimization, Super Resolution upscaling, automatic subtitling and translations, and generative AI video descriptions and summarizations. Bitmovin also presented some new AI-powered solutions, including our Analytics Session Interpreter, which won a Best of Show award from TV Technology. It uses machine learning and large language models to generate a summary, analysis and recommendations for every viewer session. The early feedback has been positive and we’ll continue to refine and add more capabilities that will help companies better understand and improve their viewers’ experience.

L to R: Product Manager Jacob Arends, CEO Stefan Lederer and Engineer Peter Eder accepting the award for Bitmovin’s AI-powered Analytics Session Interpreter

Other AI highlights from NAB included Jan Ozer’s “Beyond the Hype: A Critical look at AI in Video Streaming” presentation, NETINT and Ampere’s live subtitling demo using OpenAI Whisper, and Microsoft and Mediakind sharing AI applications for media and entertainment workflows. You can find more detail about these sessions and other notable AI solutions from the exhibition floor in this post.

FaRes-ML granted US Patent

For a few years before this recent wave of interest, Bitmovin and our ATHENA project colleagues have been researching the practical applications of AI for video streaming services. It’s something we’re exploring from several angles, from boosting visual quality and upscaling older content to more intelligent video processing for adaptive bitrate (ABR) switching. One of the projects that was first published in 2021 (and covered below in this post) is Fast Multi-Resolution and Multi-Rate Encoding for HTTP Adaptive Streaming Using Machine Learning (FaRes-ML). We’re happy to share that FaRes-ML was recently granted a US Patent! Congrats to the authors, Christian Timmerer, Hadi Amirpour, Ekrem Çetinkaya and the late Prof. Mohammad Ghanbari, who sadly passed away earlier this year.

Recent Bitmovin and ATHENA AI Research

In this section, I’ll give a short summary of projects that were shared and published since the original publication of this blog, and link to details for anyone interested in learning more.

Generative AI for Adaptive Video Streaming

Presented at the 2024 ACM Multimedia Systems Conference, this research proposal outlines the opportunities at the intersection of advanced AI algorithms and digital entertainment for elevating quality, increasing user interactivity and improving the overall streaming experience. Research topics that will be investigated include AI generated recommendations for user engagement and AI techniques for reducing video data transmission. You can learn more here.

DeepVCA: Deep Video Complexity Analyzer

The ATHENA lab developed and released the open-source Video Complexity Analyzer (VCA) to extract and predict video complexity faster than existing method’s like ITU-T’s Spatial Information (SI) and Temporal Information (TI). DeepVCA extends VCA using deep neural networks to accurately predict video encoding parameters, like bitrate, and the encoding time of video sequences. The spatial complexity of the current frame and previous frame are used to rapidly predict the temporal complexity of a sequence, and the results show significant improvements over unsupervised methods. You can learn more and access the source code and dataset here.

DeepVCA’s spatial and temporal complexity prediction process

DIGITWISE: Digital Twin-based Modeling of Adaptive Video Streaming Engagement

DIGITWISE leverages the concept of a digital twin, a digital replica of an actual viewer, to model user engagement based on past viewing sessions. The digital twin receives input about streaming events and utilizes supervised machine learning to predict user engagement for a given session. The system model consists of a data processing pipeline, machine learning models acting as digital twins, and a unified model to predict engagement (XGBoost). The DIGITWISE system architecture demonstrates the importance of personal user sensitivities, reducing user engagement prediction error by up to 5.8% compared to non-user-aware models. It can also be used to optimize content provisioning and delivery by identifying the features that maximize engagement, providing an average engagement increase of up to 8.6 %.You can learn more here.

System overview of DIGITWISE user engagement prediction

Previous Bitmovin and ATHENA AI Research

Better quality with neural network-driven Super Resolution upscaling

The first group of ATHENA publications we’re looking at all involve the use of neural networks to drive visual quality improvements using Super Resolution upscaling techniques.

DeepStream: Video streaming enhancements using compressed deep neural networks

Deep learning-based approaches keep getting better at enhancing and compressing video, but the quality of experience (QoE) improvements they offer are usually only available to devices with GPUs. This paper introduces DeepStream, a scalable, content-aware per-title encoding approach to support both CPU-only and GPU-available end-users. To support backward compatibility, DeepStream constructs a bitrate ladder based on any existing per-title encoding approach, with an enhancement layer for GPU-available devices. The added layer contains lightweight video super-resolution deep neural networks (DNNs) for each bitrate-resolution pair of the bitrate ladder. For GPU-available end-users, this means ~35% bitrate savings while maintaining equivalent PSNR and VMAF quality scores, while CPU-only users receive the video as usual. You can learn more here.

DeepStream system architecture

LiDeR: Lightweight video Super Resolution for mobile devices

Although DNN-based Super Resolution methods like DeepStream show huge improvements over traditional methods, their computational complexity makes it hard to use them on devices with limited power, like smartphones. Recent improvements in mobile hardware, especially GPUs, made it possible to use DNN-based techniques, but existing DNN-based Super Resolution solutions are still too complex. This paper proposes LiDeR, a lightweight video Super Resolution network specifically tailored toward mobile devices. Experimental results show that LiDeR can achieve competitive Super Resolution performance with state-of-the-art networks while improving the execution speed significantly. You can learn more here or watch the video presentation from an IEEE workshop.

Quantitative results comparing Super Resolution methods. LiDeR achieves near equivalent PSNR and SSIM quality scores while running ~3 times faster than its closest competition.

Super Resolution-based ABR for mobile devices

This paper introduces another new lightweight Super Resolution network, SR-ABR Net, that can be deployed on mobile devices to upgrade low-resolution/low-quality videos while running in real-time. It also introduces a novel ABR algorithm, WISH-SR, that leverages Super Resolution networks at the client to improve the video quality depending on the client’s context. By taking into account device properties, video characteristics, and user preferences, it can significantly boost the visual quality of the delivered content while reducing both bandwidth consumption and the number of stalling events. You can learn more here or watch the video presentation from Mile High Video.

System architecture for proposed Super Resolution based adaptive bitrate algorithm

Less buffering and higher QoE with applied machine learning

The next group of research papers involve applying machine learning at different stages of the video workflow to improve QoE for the end user.

FaRes-ML: Fast multi-resolution, multi-rate encoding

Fast multi-rate encoding approaches aim to address the challenge of encoding multiple representations from a single video by re-using information from already encoded representations. In this paper, a convolutional neural network is used to speed up both multi-rate and multi-resolution encoding for ABR streaming. Experimental results show that the proposed method for multi-rate encoding can reduce the overall encoding time by 15.08% and parallel encoding time by 41.26%. Simultaneously, the proposed method for multi-resolution encoding can reduce the encoding time by 46.27% for the overall encoding and 27.71% for the parallel encoding on average. You can learn more here.

FaRes-ML flowchart

ECAS-ML: Edge assisted adaptive bitrate switching

As video streaming traffic in mobile networks increases, utilizing edge computing support is a key way to improve the content delivery process. At an edge node, we can deploy ABR algorithms with a better understanding of network behavior and access to radio and player metrics. This project introduces ECAS-ML, Edge Assisted Adaptation Scheme for HTTP Adaptive Streaming with Machine Learning. It uses machine learning techniques to analyze radio throughput traces and balance the tradeoffs between bitrate, segment switches and stalls to deliver a higher QoE, outperforming other client-based and edge-based ABR algorithms. You can learn more here.

ECAS-ML system architecture

Challenges ahead

The road from research to practical implementation is not always quick or direct or even possible in some cases, but fortunately that’s an area where Bitmovin and ATHENA have been working together closely for several years now. Going back to our initial implementation of HEVC encoding in the cloud, we’ve had success using small trials and experiments with Bitmovin’s clients and partners to provide real-world feedback for the ATHENA team, informing the next round of research and experimentation toward creating viable, game-changing solutions. This innovation-to-product cycle is already in progress for the research mentioned above, with promising early quality and efficiency improvements.

Many of the advancements we’re seeing in AI are the result of aggregating lots and lots of processing power, which in turn means lots of energy use. Even with processors becoming more energy efficient, the sheer volume involved in large-scale AI applications means energy consumption can be a concern, especially with increasing focus on sustainability and energy efficiency. From that perspective, for some use cases (like Super Resolution) it will be worth considering the tradeoffs between doing server-side upscaling during the encoding process and client-side upscaling, where every viewing device will consume more power.

Learn more

Want to learn more about Bitmovin’s AI video research and development? Check out the links below.

Analytics Session Interpreter webinar

AI-powered video Super Resolution and Remastering

Super Resolution blog series

Super Resolution with Machine Learning webinar

Athena research

MPEG Meeting Updates

GAIA project blogs

AI Video Glossary

Machine Learning – Machine learning is a subfield of artificial intelligence that deals with developing algorithms and models capable of learning and making predictions or decisions based on data. It involves training these algorithms on large datasets to recognize patterns and extract valuable insights. Machine learning has diverse applications, such as image and speech recognition, natural language processing, and predictive analytics.

Neural Networks – Neural networks are sophisticated algorithms designed to replicate the behavior of the human brain. They are composed of layers of artificial neurons that analyze and process data. In the context of video streaming, neural networks can be leveraged to optimize video quality, enhance compression techniques, and improve video annotation and content recommendation systems, resulting in a more immersive and personalized streaming experience for users.

Super Resolution – Super Resolution upscaling is an advanced technique used to enhance the quality and resolution of images or videos. It involves using complex algorithms and computations to analyze the available data and generate additional details. By doing this, the image or video appears sharper, clearer, and more detailed, creating a better viewing experience, especially on 4K and larger displays.

Graphics Processing Unit (GPU) – A GPU is a specialized hardware component that focuses on handling and accelerating graphics-related computations. Unlike the central processing unit (CPU), which handles general-purpose tasks, the GPU is specifically designed for parallel processing and rendering complex graphics, such as images and videos. GPUs are widely used in various industries, including gaming, visual effects, scientific research, and artificial intelligence, due to their immense computational power.

Video Understanding – Video understanding is the ability to analyze and comprehend the information present in a video. It involves breaking down the visual content, movements, and actions within the video to make sense of what is happening.

The post The AI Video Research Powering a Higher Quality Future appeared first on Bitmovin.

]]>

AI-powered Video Super Resolution and Remastering

Fri, 12 Apr 2024 15:18:37 +0000

AI has been the hot buzz word in tech the past couple of years and we’re starting to see more and more practical applications for video emerging from the hype, like automatic closed-captioning and language translation, automated descriptions and summaries, and AI video Super Resolution upscaling. Bitmovin has especially focused on how AI can provide value for our customers, releasing our AI Analytics Session Interpreter earlier this year and we’re looking closer at several other areas of the end-to-end video workflow.

We’re very proud of how our encoder maintains the visual quality of the source files, while significantly reducing the amount of data used, but now we’re exploring how we can actually improve on the quality of the source file for older and standard definition content. Super Resolution implementations have come a long way in the past few years and have the potential to give older content new life and make it look amazing on Ultra-High Definition screens. Keep reading to learn about Bitmovin’s progress and results.

What is video Super Resolution and how does it work?

Super Resolution refers to the process of enhancing the quality or increasing the resolution of an image or video beyond its original resolution. The original methods of upscaling images and video involved upsampling by using mathematical functions like bilinear and bicubic interpolation to predict new data points in between sampled data points. Some techniques used multiple lower-resolution images or video frames to create a composite higher resolution image or frame. Now AI and machine learning (ML) based methods involve training deep neural networks (DNNs) with large libraries of low and high-resolution image pairs. The networks learn to map the differences between the pairs, and after enough training they are able to accurately generate a high-resolution image from a lower-resolution one.

Bitmovin’s AI video Super Resolution exploration and testing

Super Resolution upscaling is something that Bitmovin has been investigating and testing with customers for several years now. We published a 3-part deep dive back in 2020 that goes into detail about the principles behind Super Resolution, how it can be incorporated into video workflows and the practical applications and results. We won’t fully rehash those posts here, so check them out if you’re interested in the details. But one of the conclusions we came to back then, was that Super Resolution was an especially well-suited application for machine learning techniques. This is even more true now, as GPUs have gotten exponentially more powerful over the past 4 years, while becoming more affordable and accessible as cloud resources.

Nvidia’s GPU computation capabilities over the last 8 years – source: Nvidia GTC 2024 keynote

ATHENA Super Resolution research

Bitmovin’s ATHENA research lab partner has also been looking into various AI video Super Resolution approaches. In a proposed method called DeepStream, they demonstrated how a DNN enhancement-layer could be included with a stream to perform Super Resolution upscaling on playback devices with capable GPUs. The results showed this method could save ~35% bitrate while delivering equivalent quality. See this link for more detail.

Other Super Resolution techniques the ATHENA team has looked at involve upscaling on mobile devices that typically can’t take advantage of DNNs due to lack of processing power and power consumption/battery concerns. Lightweight Super Resolution networks specifically tailored for mobile devices like LiDeR and SR-ABR Net have shown positive early outcomes and performance.

AI-powered video enhancement with Bitmovin partner Pixop

Bitmovin partner Pixop specializes in AI and ML video enhancement and upscaling. They’re also cloud native and fellow members of NVIDIA’s Inception Startup Program. They offer several AI-powered services and filters including restoration, Super Resolution upscaling, denoising, deinterlacing, film grain and frame rate conversion that automate tedious processes that used to be painstaking and time consuming. We’ve found them to be very complementary to Bitmovin’s VOD Encoding and have begun trials with Bitmovin customers.

One application we’re exploring is digital remastering of historic content. We’ve been able to take lower resolution, grainy and generally lower quality content (by today’s standards) through Pixop’s upscaling and restoration, with promising results. The encoded output was not only a higher resolution, but also the application of cropping, graining and color correction resulted in a visually more appealing result, allowing our customer to re-monetize their aged content. The image below shows a side-by-side comparison of remastered content with finer details.

Side-by-side comparison of AI remastered content

Interested in giving your older content new life with the power of AI video Super Resolution? Get in touch here.

Globo, Google Cloud and Bitmovin: Taking Quality to New Heights

Wed, 10 Apr 2024 17:28:53 +0000

Globo’s content and reach

When it comes to content scale and audience reach, Globo is on par with Hollywood and the big US broadcasters with over 3,000 hours of entertainment content being produced each year. The viewership numbers are equally impressive with forty-nine million Brazilians watching the daily, one-hour newscast and Globo’s Digital Hub attracting eight out of ten Brazilians with internet access. The Digital Hub hosts a variety of content categories, from news, sports, and entertainment to live events such as the Olympics, Carnival, and the FIFA World Cup. Globo also runs a subscription video on demand (SVOD) service called Globoplay that streams live sports, licensed content, as well as movies and television series produced by Estúdios Globo, the largest content production studio in Latin America.

Globo standard of quality

Globo has worked hard to build and become known for the “Globo Standard of Quality”. This included creating the optimal viewing experience together with award-winning content, delivered in stunning visual quality. To develop that reputation, Globo became one of the first mainstream broadcasters outside of the US to offer content in 4K, adopting it as a new standard across its platforms and devices. It has already produced hundreds of hours of 4K content (including HDR) with over a thousand hours of encoding output with its telenovelas and original series. The early adoption of 4K is even more impressive for Globo as Brazil is ranking 79th on the list of countries by Internet connection speed. In order to deliver high-quality video, operators cannot just work with higher bitrates but rather have to find the optimal encoder that achieves both quality, speed, and cost-efficiency at the same time. In the past, 4K encoding was accomplished with on-premises hardware encoders. As the next update cycle of the appliances was fast approaching, Igor Macaubas, Head of Online Video Platform, and Lucas Stephanou, Video Platform Product Owner at Globo, decided to conduct a thorough evaluation of vendors, and ultimately chose Bitmovin.

“We are not willing to compromise the visual integrity of our content and we hold ourselves to strict perception-quality standards. Bitmovin’s renowned 3-Pass Encoding exceeded our expectations and ensures that high perceptual quality can still be delivered while streaming at optimal bandwidth levels.”

– Lucas Stephanou (Video Platform Product Owner, Globo)

Globoplay, powered by Bitmovin VOD Encoding on Google Cloud

Globo handles a massive VOD library of over a million titles, and with 12 variants in their HEVC bitrate stack — encoding demands are high. Bitmovin’s VOD encoding service running on Google Cloud gave Globo the capability to encode a 90-minute video asset in 14 minutes across the entire HEVC ladder. This is a realtime factor of 6.4 times, which resulted in a quantifiable impact on time-to-market. Globo saw the business need for fast turnaround time in encodes and chose Bitmovin as the clear front runner in this regard.

Bitmovin VOD Encoding on Google Cloud is an easy-to-use, fully-managed video transcoding software-as-a-service (SaaS). Bitmovin VOD Encoding allows customers to efficiently stream any type of on-demand content to any viewing device. Customers use Bitmovin VOD Encoding for a wide range of on-demand streaming use cases, including Subscription Video on Demand (SVOD), Transactional VOD (TVOD), and Ad-supported VOD (AVOD) services, online training, and other use cases. Bitmovin’s Emmy Award® winning multi-codec outputs and per-scene and per-title content-aware transcoding produce higher visual quality video outputs at lower bit rates than other file-based transcoding SaaS to optimize content delivery and reduce streaming cost. Bitmovin VOD Encoding is available for purchase on Google Cloud Marketplace.

Bitmovin’s 3-Pass Encoding algorithm uses machine learning and AI to examine the video on a scene-by-scene basis. It analyzes the content’s complexity multiple times to optimize intra-frame and inter-frame compression. This helps determine the ideal resolution and bitrate combinations that maximize the quality and efficiency. All together, this ensures the visual elements of the video are not degraded in the encoding process and prevents unnecessary overhead data that might impact the viewing experience.

Processing HD and 4K video with Globo’s volume requires computing resources that would exceed the CapEx budgets of most companies. This is where the Google Cloud’s flexibility and on-demand compute power really shine. Together with Bitmovin’s split-and-stitch technology, single encoding jobs run significantly faster with parallel processing and spikes in demand are handled with ease and throughput that is just not possible with on-premises encoding. Customers also have the option to deploy Bitmovin VOD Encoding as a managed service running in the Bitmovin account or as a single tenancy running in the customer’s Google Cloud account. This allows encoding costs to be applied toward any annual spending commitments.

“Globo is known to set quality standards. We want our viewers to experience our great content in stunning video quality. Our 4K workflows have been relying on hardware encoders, but we wanted to test the power of the cloud and conducted a thorough vendor evaluation based on video quality. Bitmovin’s encoding quality and speed convinced us across the board. And, since using Bitmovin’s encoding service running on Google Cloud, we are spending a fraction of the cost by bringing our capital cost down without spending more on operational cost.”

– Igor Macaubas (Head of Online Video Platform, Globo)

Olympics in 8K

One prime example of this collaboration innovating and pushing the boundaries of video quality is from the Tokyo Olympics in 2021, where 8K VOD content from the Olympics was delivered to viewers at home via Globoplay. This marked the first time that the Olympics were viewable in 8K resolution outside of Japan. 8K video has 16x the resolution of HD and 4x that of 4K, so it requires an enormous amount of processing power and advanced compression to lower the data rates for delivery to end users. 4K and 8K content is also referred to as Ultra High Definition (UHD) and is usually mastered in a High Dynamic Range (HDR) format that allows for brighter highlights, more contrast and a wider color palette. Hybrid-Log Gamma (HLG) is an HDR format that was developed for broadcast applications and backward compatibility with Standard Dynamic Range (SDR) television sets.

After receiving the HLG mastered content from Intel in Japan, Globo utilized Bitmovin VOD Encoding on Google Cloud’s compute instances for efficient parallel processing with Bitmovin’s VOD Encoding API. 8K/60p transcoding was performed using the High Efficiency Video Coding (HEVC) codec, creating an optimized adaptive bitrate ladder. At this stage, Bitmovin’s 3-pass encoding was key for transforming the content into a compatible size for transport over broadband internet connections, without sacrificing the stunning 8K visual quality. The 8K content was then delivered via Globo’s own Content Delivery Network (CDN) infrastructure to subscribers of Globoplay with 8K Samsung TVs.

“Our 3-Pass Encoding proved to be the right encoding mode. It ensured high perceptual quality could still be delivered while streaming at optimal bandwidth levels. With our split-and-stitch technology running on Google Cloud’s scalable infrastructure, we were able to deliver both speed and quality for this time-sensitive content.”

– Stefan Lederer (CEO, Bitmovin)

Learn more about Bitmovin’s VOD Encoding SaaS here.

Split-and-Stitch Encoding with incredible speed, quality and scale

Wed, 13 Mar 2024 17:09:44 +0000

Introduction

In the early days of digital video, encoding a full-length movie could take several hours or even days to complete, depending on the settings and techniques that were used. Over time, as processor speeds increased and specialized hardware was introduced, encoding turnaround times decreased, but it was usually an incremental, linear response to the advancements in technology. Once cloud computing resources became readily available and opened new possibilities, cloud-native encoding services like Bitmovin disrupted the status quo with massive gains for encoding speed and turnaround times. This potential was unlocked by developing an innovative new technique known as split-and-stitch encoding.

What is split-and-stitch encoding?

As the name suggests, split-and-stitch encoding is a method of encoding that involves splitting a file into smaller chunks, encoding those chunks separately, and then stitching them back together. These smaller chunks being encoded in parallel with separate cloud computing resources led to huge leaps in shortening turnaround times. Prior to that, digital videos were processed linearly, which was an unnecessary limitation carried over from film and tape processing workflows, where the physical medium was actually a limiting factor.

How fast is split-and-stitch encoding?

Back in 2015 when Bitmovin first implemented our encoder on the Google Compute Engine (now Google Cloud Platform) we were able to achieve encoding speeds of 66x real-time running in their cloud, as mentioned here. With some further optimization, we became the first to reach 100x real-time encoding speeds.

The actual turn-around times for your encoding jobs will depend on a lot of factors including source format, codec(s), resolution, duration and advanced features like Dolby Vision, but even with very complex 4K HDR workflows, your encodes will run faster than real-time using split and stitch. Below is a real-world example of an H.264/AAC encoding that ran faster than 92x real-time.

Running split-and-stitch encoding in the cloud means your individual encoding jobs run faster than real-time, but it also means that you can scale to run many jobs in parallel which allows large backlogs to be cleared in hours instead of weeks. You also have the capacity to handle spikes of content with no impact on queue time.

What are the advantages of Bitmovin’s split-and-stitch encoding?

Bitmovin has over a decade of experience developing and refining our split-and-stitch implementation. We built our system to take advantage of spot and preemptible instances to keep costs down, while surpassing the quality of single instance encodes with innovations like 3-pass encoding and Smart Chunking. Our intelligent workload orchestration allows you to manage priority and resource scheduling with capacity for thousands of jobs per hour.

Bitmovin also supports using multiple codecs and packaging formats together with split-and-stitch, including H.264 (AVC), H265 (HEVC), VP9 and AV1 with both HLS and DASH, where other platforms may be limited to H.264 and HLS. We’ve also implemented fast decode enhancements for large J2K and ProRes mezzanine source files that reduce the overall turnaround time even further.

What is Smart Chunking?

In 2023, Bitmovin made some key changes and updates to our VOD Encoder with a new feature called Smart Chunking. This further increased the potential visual quality and turnaround times that were possible with split and stitch by decoupling the split-and-stitch chunk duration from the user-defined segment duration. This allows for variable chunk size depending on the type of codec and the complexity of encoding, enabling many immediate improvements and future optimizations. Using Smart Chunking means we can segment chunks at the optimal points with better bitrate distribution, providing more consistent quality without any noticeable dips.

In the graph below, you can see a comparison of an encoding job run with and without Smart Chunking. While the overall quality is similar, in the blue version (without Smart Chunking) there are several lower quality outlier frames. By using Smart Chunking (orange version) the lowest 1% of frames in terms of quality were improved by an average of 6 VMAF points, which is a noticeable difference. The lowest 0.1% improved by 22 VMAF points and the single worst frame gained a massive 60 VMAF points.

Is split-and-stitch always the best approach?

The steps of analyzing, splitting and reassembling chunks of video do add some overhead processing time to the encoding process. For longer episodic content or movies, the added time is negligible compared to the time saved by using split-and-stitch. But, for shorter videos like ads and news clips that are time-sensitive, the pre-processing can make using split-and-stitch less advantageous.

For these cases, Bitmovin has 2 solutions. First, we’ve added support for hardware encoding with Nvidia T4 GPUs. They can deliver the same quality of video encoding, up to four times faster than CPUs, with H.264 (AVC) and H.265 (HEVC) codec support. We also have a new “accelerated mode” that uses pre-warmed cloud compute resources, so you no longer have to wait for new instances to be started. This has made a huge impact on overall encoding job turnaround time, lowering queuing times from minutes to <10 seconds.

Ready to get started with split-and-stitch encoding?

Bitmovin’s split-and-stitch encoding with Smart Chunking is enabled by default and doesn’t require any special configuration. You can get started quickly with our dashboard encoding wizard without any coding required. Get going today with our free trial and see the results for yourself by clicking here!

The post Split-and-Stitch Encoding with incredible speed, quality and scale appeared first on Bitmovin.

]]>

Supercharging Data Insights with AI for Video Analytics

Sat, 24 Feb 2024 02:39:37 +0000

Introduction

At a recent internal hackathon, two of Bitmovin’s software engineers, Myriam Gantner and Thomas Sablattnig, explored whether AI could be used to process the large volume of data captured by Bitmovin Analytics into concise summaries and recommendations. The project was a success and is now being developed into a feature that is now available to Bitmovin customers. Keep reading to learn more about the new Analytics AI Session Interpreter.

Background and motivation

Bitmovin Analytics allows video developers and technicians to track, monitor and analyze their video streams in real-time. It provides insights into user behavior, video player performance and much more. While it’s a valuable companion for Bitmovin’s Encoding and Player products, it can also stand alone and be used with several open source and commercial video players. It has a dedicated dashboard for visual interpretation, but can also export data for your own custom dashboards in products like Grafana or Looker Studio.

Bitmovin Analytics collects a ton of data about the behavior and experience your customers have when watching videos, from simple metrics like play and pause duration to more technical information like video bitrate, DRM license exchange, adaptive bitrate switching and detailed logs around errors. There is a lot of information provided for both individual viewers and overall performance, so analysis can be time-consuming and sometimes overwhelming, especially if you don’t have a lot of technical or background knowledge about the entire video streaming workflow.

What if we could use AI to process all the data and cut through the noise and make it easier to get to the source of streaming problems sooner? Could we make the insights provided more accessible to a wider audience that didn’t have technical expertise or streaming experience? Is it possible to build an analytics AI session interpreter with existing cloud resources? These are the questions we hoped to answer with this project. Keep reading to learn more about the process and results.

Using Google’s Vertex AI + PaLM 2 models

Google’s Vertex AI

Vertex AI is a managed machine learning (ML) platform that leverages Google Cloud’s infrastructure and tools for building and training ML models. It has AutoML capabilities for quick and easy deployment, but also supports custom training for more advanced users and workflows. It integrates with other Google Cloud services like BigQuery and Pub/Sub for end-to-end intelligent data processing and analytics AI workflows.

PaLM 2

PaLM 2 is Google’s next generation large language model that was released mid-2023. It excels at reasoning tasks and can decompose a complex task into simpler subtasks. It understands nuances of human language better than previous models and can perform multilingual translations, even when idioms or ambiguous and figurative language are used.

Initial results

Once the initial workflow and connections were established, the next step was to see how smart it was out of the box. We began by feeding it the analytics data from a streaming session.

The question: “Can you please analyze this video streaming session?”

The answer:

The first event is a play event, which indicates that the user has started playing the video.

The second event is a buffering event, which indicates that the video is buffering.

The third event is a play event, which indicates that the video has finished playing.

Not so insightful. Obviously some more work was needed to get the results we were hoping for, so we began the process of improving and fine tuning the prompt/question and the model’s parameters.

Crafting the “perfect” question

Improving the prompt to get more insightful responses was a multi-step iterative process. We asked questions and verified the accuracy of the results, leading us toward better phrasing of questions for the best outputs. This involved more clearly defining what aspects we wanted the AI to summarize and also asking it to provide recommendations for improvements. We also provided explanations of the properties that were part of the analytics session data and added context about certain metrics, including thresholds and ideal values and ranges for specific metrics (i.e. what is considered a good video startup time). Part of this included descriptions of how these various metrics would impact the viewer experience. We also learned it was better to structure the analytics data in JSON format to make it easier for the AI to interpret.

In the end, our “perfect” question grew to ~150 lines long!

Summary, analysis, and recommend solutions

Here is an example of Analytics session data input and the more useful output we were seeing after refining the question and contextual data:

Session Data

Summarization

The user watched a live stream for 6281 seconds (1.7 hours).

The duration of the live event is unknown.

The video was not paused and there were no buffering events. However, there were 53 quality changes during the session. This could indicate that the network connection was unstable or that the ABR logic was not working properly. The average bitrate of the video was 2179753 bps. This is a relatively high bitrate, which could indicate that the video was of high quality.

Analysis

The video quality changed 53 times, which is a lot. This high number of quality changes in this session could be due to a number of factors. First, the network connection of the user may have been unstable, which could have caused the video to switch between different bitrates in order to maintain a smooth playback experience. Second, the ABR logic of the video player may not have been working properly, which could have also caused the video to switch between different bitrates unnecessarily. The lack of buffer events and buffer time in this session is a good sign. This means that the video did not experience any buffering issues, which would have been a major inconvenience for the user.

Recommendation

To improve the user experience, it would be helpful to investigate the cause of the quality changes. If the network connection is unstable, the user could try using a different network or connecting to a different location. If the ABR logic is set too aggressively, the user could try adjusting the settings. If the video is encoded inefficiently, the video could be re-encoded using a more efficient codec.

User-centric insights

Beyond the text analysis and recommendations, we experimented with other potential options for providing more accessible and concise summaries of viewing sessions. This included engagement, quality of service and quality of experience scores and a “5 star” system that created an aggregate of all metrics into a single rating.

JSON response with scores

{

‘engagement’: 7,

‘qualityOfServiceScore’: 9,

‘qualityOfExperienceScore’: 9

}

5 star rating examples

RATING: 1

Explanation: The video session has a low rating because the buffering time is high. This could be due to a number of factors, such as a slow internet connection or a lack of available bandwidth. The session also has a high buffer event count, which indicates that the user experienced multiple buffering interruptions. This can be a very disruptive experience for users, as it can break the flow of the video and lead to frustration. Finally, the video startup time is also high, which could indicate that the video was not optimized for the user’s device. Overall, this video session has a number of issues that could negatively impact the user experience.

RATING: 5

Explanation: This session had a good video startup time and low buffering time. The quality change rate was also good. The only downside is that the user paused the video for a long time, which may have impacted their engagement with the content.

Audience-specific session analysis

Another potential option for responses is to cater the information for specific personas with a variety of technical backgrounds, objectives and motivations when it comes to analytics data. You can imagine how a software developer, operations technician, product manager and sales or finance executive might want different levels of summarization and areas of focus when it comes to playback issues and how their own goals may be impacted. Just for fun here’s an example session summary that was generated using the voice of a teenager (which may be more accurate for a teenager from the early 2000s rather than today).

“Teenager” session analysis:

“Yo, check it out. This dude watched a live stream on his phone for over an hour. That’s crazy! He must have been really into it. The video quality was pretty good, and there were no buffering issues. But then, after about 40 minutes, the video just stopped working. The dude was pissed. He tried to restart the app, but it didn’t work. He even tried to delete and reinstall the app, but that didn’t work either. So he gave up and just went to bed.

I don’t know what happened, but it’s a bummer that the video stopped working. I’m sure the dude was really enjoying it. But at least he got to watch it for a while.“

Bitmovin Analytics AI Session Interpreter available now

Since the initial hackathon project, we’ve continued fine-tuning the process while doing the background work for connecting our APIs to make the Analytics AI Session Interpreter a reality for Bitmovin Analytics customers.

We know that many companies are sensitive to having their data used by AI models, so we are ensuring:

Customers will need to explicitly enable this feature in their dashboard. Without that permission, no data will ever be sent to AI services.
No customer or user identifiable information will be sent to AI services.
Only encrypted data will be sent to AI services.
AI services will only retain data for the time needed to generate the output.

Sign up today to get started with a free trial of Bitmovin and supercharge your data and insights with our Analytics AI Session Interpreter!

Adaptive Bitrate Streaming Evolved: WISH ABR and Bitmovin’s Player Integration

Fri, 01 Dec 2023 18:15:07 +0000

Video streaming has grown rapidly over the past few years and is the prominent content people engage with online. This puts a lot of pressure on streaming companies as they now have to support a wide range of devices to maximize their viewer reach. Additionally, depending on the device type and the user’s network connectivity, viewers could experience playback issues when streaming. This is where adaptive bitrate streaming (ABR) and the platform’s video player they use on that device come into play, as they help ensure a better viewer experience. However, with WISH ABR, there is now a way to customize the ABR to the user’s device configuration and further improve the quality of experience (QoE).

In this blog, we will go into the essentials of ABR streaming, how WISH ABR is changing that methodology, and what Bitmovin has done to make it available on the Bitmovin Player.

What is adaptive bitrate streaming, how does it work, and what are the benefits?

Adaptive bitrate (ABR) streaming refers to the logic that a video player uses to dynamically adjust the quality of the video stream based on the available bandwidth. It ensures that users receive the best possible viewing experience by continuously adapting the video bitrate to match the network conditions. For example, if the user has a poor connection, it will request lower-quality packets from the edge, and when the connection is healthy again, it will request the highest quality available.

It works by encoding the video content into multiple renditions, each with different bitrates and quality levels. These renditions are further divided into small segments or packets. The video player then evaluates and monitors the network conditions and selects the appropriate rendition for each segment, optimizing for quality and smooth playback. You can read more on the general aspects of Adaptive bitrate streaming in our recent blog.

Diagram of how ABR works per connection type

The Benefits of ABR

Seamless playback:
- ABR ensures viewers experience minimal buffering and interruptions, even in challenging network conditions.
The highest quality possible:
- By dynamically adjusting the video bitrate, ABR delivers the best possible quality while avoiding buffering issues.
Bandwidth efficiency and cost savings:
- ABR optimizes bandwidth usage by adapting the video quality to match the available network capacity, reducing data consumption.
Device compatibility:
- ABR can be utilized across a wide range of devices, including mobile phones, tablets, smart TVs, game consoles, set-top boxes, and browsers.

What is WISH ABR and how is it different?

First developed by Minh Nguyen and the Athena Team and focused on mobile device playback, WISH stands for Weighted Sum model for HTTP Adaptive Streaming and takes the ABR logic one step further. Instead of just adapting it to network conditions, WISH ABR enables the personalization of the ABR to fit specific use cases. This essentially gives platforms the tools to improve QoE by customizing their ABR logic to specific device settings, configurations, types, and other variables that may be common to a streaming platform’s audience. Proving this concept works, in WISH’s testing evaluation, it was able to enhance QoE by up to 17.6% and reduce data usage by 36.4%.

WISH’s logic is based on a mathematical model consisting of three distinct components /cost factors:

Bandwidth cost
- “How much data will it be used for the download?”
Buffer cost
- “How much will the buffer level decrease?”
Quality cost –
- “How much will the video quality decrease?”

The mathematical model that WISH is based on.

The algorithm evaluates each video rendition, judging each value from the “costs” listed above, and selects the one that balances them all the best with the lowest overall cost. WISH lets users adjust this balance based on their preference, like choosing between better video quality or less buffering, depending on their settings.

An example of WISH selecting the perfect rendition at that moment as it evaluates the total cost of the variables (image from the Athena publication)

How did we implement it for the Bitmovin Player?

Seeing how well it performed with mobile devices, we wanted to expand its capabilities and leveraged our collaboration with the Athena team to implement it for our Web SDK. This is important as it could then be used across smart TVs, game consoles, browsers, set-top boxes, and more. After some minor adjustments and refinements, we deployed it successfully. Now, anyone using the Bitmovin Player can test it out for themselves and apply it to their workflow by accessing our API documentation for AdaptationConfig, AdaptationLogicType, and TweaksConfig.

Ongoing Improvements and Testing

We are still defining specific attributes around the new logic before making it a default functionality. However, from our initial testing, we’ve already seen pretty good results, specifically regarding rebuffering (Stall time) and overall QoE (Mean ITU score). In the future, we are looking to list “presets” that enable specific behaviors by device, giving streaming companies an expectation of what they can achieve with each setting, easily enhancing the QoE for their audience. Additionally, the WISH ABR can be utilized with the Bitmovin Player for any industry and will also be the main ABR logic for our latest video player, Player Web X.

Conclusion

ABR functionality has revolutionized the video streaming landscape by making it possible for streams to adapt the video quality to match network conditions dynamically. WISH ABR takes this concept a step further by introducing a user-centric approach to adaptive bitrate selection and optimizing the streaming experience to align with the preferences of individual viewers. The integration of WISH ABR into the Bitmovin Player further enhances the capabilities of our powerful solution, empowering content providers to deliver a superior quality of experience. Now, we can definitely say the future of video streaming is poised to be more personalized, efficient, and immersive.

The post Adaptive Bitrate Streaming Evolved: WISH ABR and Bitmovin’s Player Integration appeared first on Bitmovin.

]]>

Game-Changing Savings with Per-Title Encoding

Mon, 27 Nov 2023 06:09:54 +0000

Introduction

The post will explain how Per-Title Encoding works and the advantages of using Per-Title Encoding compared to using the same bitrate ladder for all your content. Per-Title often requires fewer ABR ladder renditions and lower bitrates that translate into storage, egress and CDN cost savings. It also improves QoE with less buffering and quality drops for viewers, along with better visual quality. On top of that, it can make 4K streaming viable, turning it from a loss leader and financial burden into a revenue generator. Keep reading to learn more.

Per-Title Encoding is key for cutting streaming costs

For the past couple of years, “controlling costs” has been among the top challenges for video streaming, according to the results of Bitmovin’s annual video developer survey. While the pandemic years created a boom for streaming services and content creation, things have now shifted toward cost-cutting in a few different ways. Several platforms have cut back their budgets for original content and are removing shows and films from their libraries to save on licensing and other operational costs.

Another trend highlighted by industry analyst Dan Rayburn, has been the lowering of bitrates, including removal of 4K streaming in some cases. Services that do still offer 4K often restrict it to their highest-priced subscription tier. Back in 2014, Dan called out the cost and QoS challenges services would face when delivering 4K video, and many are still struggling with that reality, especially those using a fixed bitrate ladder for their encoding.

Per-Title Encoding can have a huge impact on 4K content, something that can be seen in the recommended internet connection speeds for 4K streaming:

Netflix: 15 Mbps (they use their own version of per-title encoding)

Disney+: 25 Mbps

Paramount+: 25 Mbps

Max: 50+ Mbps

For long form content that gets even in the tens of thousands views, the difference between 15 Mbps and 25 or 50 Mbps will add up quickly in the form of excess network egress and CDN costs. With non-optimized encoding at those high bitrates, a viral hit that ends up getting hundreds of thousands or millions of views can end up being a financial burden. Using Per-Title Encoding ensures each video uses only the bits needed for its content and complexity and when combined with more advanced codecs like HEVC and AV1, it can make a game-changing difference. When Bitmovin added support for using Per-Title Encoding with the AV1 codec, I was shocked to see just how low the bitrate could go (often under 2 Mbps).

Per-Title Encoding with AV1 can deliver mind-blowing low bitrates

How does Per-Title Encoding work?

In 2012, Bitmovin’s co-founders published a research paper titled “Dynamic Adaptive Streaming over HTTP Dataset” that, among other things, provided data for per-genre encoding that would further evolve into Bitmovin’s Per-Title Encoding. Per-Title Encoding is an optimization of adaptive bitrate encoding that analyzes the complexity of a video file and determines the encoding settings needed to maintain the highest level of visual quality together with the most efficient adaptive bitrate ladder.

Bitmovin’s Per-Title Encoding process

In 2015, Netflix published a tech blog that detailed their research and development of their own per-title encoding. Through brute force encoding of content at different resolutions and quality levels, they found that the ideal adaptive bitrate ladder for each video would form a smooth convex hull when plotting quality vs bitrate. When the bitrate and resolution pairs in their ABR ladder fell along the convex hull, it maximized quality for the viewer and meant that data was being distributed efficiently. Bitmovin’s Per-Title complexity analysis spares you the excessive testing and experimentation and automatically determines the ideal ABR ladder and convex hull for each file.

Per-Title Encoding ABR ladder vs fixed bitrate ladder

The graph below shows how Per-Title Encoding provides better QoE with lower bitrates than the competition’s static bitrate ladder for a 4K source. Per-Title Encoding matches the source at 3840x2160p, with a bitrate of 6050 kbps and a VMAF score of 95.5. The static ladder is capped at 1080p and requires 7830 kbps for a lower VMAF score of 90.9. That’s 22.7% bitrate savings with better quality by using Per-Title.

Per-Title Encoding provides higher quality 4K with a lower bitrate than our customer’s previous 1080p using fixed ABR ladder

The next example uses the HEVC codec for the customer’s UHD ladder vs Bitmovin Per-Title Encoding. The highest rendition on the Per-Title ladder only needs 1.9 Mbps to hit a VMAF score of 94.9, while the static ladder uses 15 Mbps, an increase of 13.1 Mbps in bandwidth for an undetectable VMAF difference. This equates to 87% savings on the CDN bill for viewers of the top rendition, without sacrificing quality.

With a duration of 44:39, the top rendition for Per-Title would mean 0.622 GB in data transfer while the top rendition for the fixed ladder would require 5.023 GB. For popular content tens of thousands of views (or more) those savings add up quickly. At a time when some services are removing 4K renditions, these optimizations make it feasible to provide UHD and improve margins on premium subscription tiers.

For lower complexity content, Per-Title Encoding only needs 2 Mpbs for 4K video, 87% lower than our customer’s previous encoding ladder.

Next we have some medium-complexity 1080p content where using Bitmovin Per-Title with a more advanced codec like HEVC can make a huge difference. Throughout the ladder, using Bitmovin Per-Title with H.264 provides some quality gains and bitrate savings compared to the customer’s static ladder with ffmpeg, but the results from Per-Title with HEVC highlight the impact of using a newer generation codec. HEVC delivers 1080p in the 90+ VMAF range with only 2 Mbps while ffmpeg with H.264 needs over 6.5 Mbps for the same quality. That’s around 70% bandwidth savings for viewers of the top rendition. At the lower end of the spectrum, a viewer with 1 Mbps available bandwidth would be limited to 432p with the static H.264 ladder, but would still receive 1080p with Per-Title HEVC.

For medium-high complexity content, using Per-Title Encoding with HEVC can deliver the same quality with 70% lower bitrate than AVC/H.264.

Storage savings with Per-Title Encoding

Bitmovin’s Per-Title Encoding can deliver massive storage savings when compared to fixed bitrate ladders, by removing unnecessary renditions from the ABR ladder and ensuring the most efficient bitrate is used for each piece of content. The chart below shows the potential savings on your storage bill from using Per-Title Encoding over a fixed ladder with AVC encoding.

Improve quality without increasing bitrates

Per-Title Encoding can also improve quality without needing to use additional data. The chart below references our customer’s fixed ABR ladder using the AVC codec and shows the quality improvements (% VMAF score) that Bitmovin’s Per-Title provided with different codecs at the same bitrate.

Bitmovin Smart Chunking prevents lower quality outlier frames

The graphs below plot the VMAF quality scores of every frame in our customer’s sample content. Bitmovin’s smart chunking virtually eliminates all of the lower quality outlier frames that are present in our competitor’s encoding and would be noticeable by viewers. Smart Chunking is now active for all Bitmovin VOD Encoding without any additional configuration or cost to the user.

Conclusion

In the past, balancing cost and quality has always been a tradeoff, but using Per-Title Encoding may be the single most effective way for streaming services to reduce their total cost of ownership without sacrificing their viewers’ quality of experience. With consumers having an abundance of options, the QoE improvements Per-Title provides can mean the difference between renewal and churn and its cost savings can tip the scales toward profitability. With streaming firmly in a cost conscious era, using Per-Title Encoding makes more sense than ever before.

Ready to see what difference Per-Title Encoding can make with your content? Anyone can test it out for free with no coding required using our VOD encoding wizard. We also have a comparison tool in our dashboard where you can input your own content or use common test videos. Try it out today!

Bitmovin’s VOD Encoding UI allows anyone to use Per-Title encoding with no coding necessary

Choosing the Best Per-Title Encoding Technology

What is Per-Title Encoding and how does it work

How to Create a Per-Title Encoding

Advanced Per-Title configuration

The post Game-Changing Savings with Per-Title Encoding appeared first on Bitmovin.

]]>

Building Better Together with the Bitmovin Innovators Network

Tue, 21 Nov 2023 08:30:22 +0000

Over the past 3-4 years, each industry that streams video has changed significantly. Depending on the company’s size, cloud adoption for streaming workloads before 2020 was driven by leading-edge “builders” with the technical expertise and in-house resources to experiment. Now, we’re in an era of early mainstream adoption led by “buyers” looking for comprehensive, reliable, and best-of-breed solutions that can replace or integrate easily into existing workflows. The need for cost-effective strategies, subscriber retention, and revenue generation have been the main drivers in this.

I saw this shift firsthand throughout each industry show and even more recently at our semi-annual Bitmovin Innovators Network event. There, clients and partners presented trending industry topics and successful use cases from projects we’re working on with them that have helped create and define these “best-of-breed solutions.” In this blog, I will highlight why partner relationships are essential for streaming platforms, recent successes, and what we mean by ‘building better together’ when we speak about Bitmovin’s Innovators Network.

Managing costs and exploring vendor solutions with Paramount+

“Innovate with us,” stated Tony McNamara as he started off the first presentation at the event. He emphasized that vendors should approach their customer relationships as a real-time collaboration rather than viewing the relationship from a transactional customer-vendor perspective. This is because customers are often closer to the actual problem than the vendor, which can incentivize vendors to work together with them to solve complex problems that otherwise would be beyond reach. Some customers will want turn-key solutions, but the innovation and tomorrow’s advances are in solving today’s issues in new ways, which the right “client” can be a partner in coordinating. This can, in turn, help solution providers expand their product features to tackle unique challenges and give them a small but noticeable advantage for at least the next 6-12 months.

Continuing from talking about innovating with vendors, the topic of profitability came up. Economic pressure has been taking its toll on the industry, and as the economy tightens, companies are forced to explore different ways of generating revenue. “It is no longer just growth at any cost.” stated Tony Mcnamara, “It’s about getting to profitability, generating new sources of revenue (FAST, Ad supported tiers), and even looking within the business for ways to optimize processes and reduce spend.”

This had become an objective for their dev team as they evaluated their internal processes, specifically for on-demand (VOD) encoding. They had built an in-house solution to handle their encoding needs, which excels for a rapid turnaround time but doesn’t handle some of the more complex (e.g., HDR) encodes. They decided to split the processes – quick turn work using the internal encoder but for HDR/etc. needs, using Bitmovin – thus meeting their needs, minimizing trade-offs and costs, and allowing the internal development team to focus on their specific domain issues and innovations.

With this, the presentation had come full circle and back to why the “Innovate with us” statement rang true to every partner and streaming service in the room. Paramount had realized it was great to own the process internally. Still, the right vendor or “Partner” in this case, like Bitmovin, made it more accessible and helped avoid a good amount of headaches when it came to cost, functionality, and planning product features for future needs.

Powering Television New Zealand’s sports streaming hub sustainably

As sustainability has become a major talking point in the industry, streaming services worldwide have made optimizing their video workflows for the environment in the best way possible a core focus. This was the main topic for the second presentation by Accedo on TV New Zealand, a public broadcaster that had just recently won a selection of premium sports rights and was focused on deploying the best possible experience for its end users and a greener video workflow for their entire service including the newly acquired sports rights.

Accedo’s local Pro Services team has worked with TVNZ for multiple years and was tasked to test and validate various video players. The choice was obvious: the Bitmovin Player was the only one who delivered on all the criteria. With TVNZ’s goals in mind, they wanted partners to help them achieve their initiatives while giving their users the best viewing experience possible. This is where Accedo and Bitmovin’s “Better Together” collaboration came into play.

Video processing and streaming are responsible for a significant portion of carbon emissions, so Accedo’s launch of their sustainable marketplace and Bitmovin’s ECOMode for the Player were the perfect fit for this project. By collaborating closely, they solved any issues that arose and got the correct stream configuration to help TVNZ reduce carbon emissions, stream content more sustainably across all devices, and provide data on the carbon footprint for active streams to their viewers. Other partners complimented this group solution, namely EZDRM with their multi-DRM, which integrated seamlessly with Bitmovin’s Player and delivered content securely on smart TVs and various other devices, making it easy for TVNZ and streaming services like them to protect revenue from high-value content like live sports.

With TVNZ and other streaming services, Accedo has made sustainability a top business priority. It encompasses both the solutions they offer and the partners they choose to work with, ensuring a positive impact on customers and the world. The multi-vendor “Better Together” collaboration helps OTT streaming services like TVNZ deliver the highest quality content while maintaining their commitment to sustainability.

“We needed to launch a comprehensive streaming platform in a short time frame after securing a selection of premium sports rights. Partnering with Bitmovin and Accedo ensured we were able to put in place a high-quality video streaming solution with the appropriate content protections. Importantly, our viewers were able to watch their favourite sports uninterrupted and with ease from day one. The feedback we’ve received has been fantastic and we’re looking forward to a big summer of cricket ahead of us”

Kym Niblock, TVNZ’s Chief Product and Information Officer.

Enabling a major US basketball league to stream and captivate their viewers globally

In the third and final presentation, Mediakind and Microsoft presented how MediaKind’s live and on-demand streaming platform running on Microsoft Azure infrastructure with the Bitmovin Player helped a major US basketball league stream billions of views to millions of viewers globally. This major US basketball league wanted to revitalize its streaming application, driven by its goals to assert greater control over its platform, harness user data for an improved experience, and, most importantly, continuously captivate and engage its fanbase.

This endeavor represented a significant challenge that involved meticulously piecing together various technical partners to provide a best-of-breed solution. From their perspective, the streaming application project was not only about delivering personalized content but also about laying a foundation for the future of how they would engage their audience.

Bitmovin played a vital role in this complex venture, collaborating closely with Microsoft and MediaKind. This partnership was essential to achieving remarkable results, particularly regarding uptime, reliability, and quality.

This solid foundation allowed the major US basketball league to realize its vision—a globally scaled, secure cloud solution built with dependable infrastructure and cutting-edge data and AI capabilities. With this in place, the league had the tools to innovate, adapt, and provide its fanbase with an extraordinary streaming experience. The success of this solution underscored the importance of the approach in addressing the various requirements of modern streaming applications and utilizing the right partners to build better together and set a new standard for reliability and user satisfaction in the industry.

Why the Bitmovin Innovators Network Matters to OTT Providers

Through all of the presentations, the main item that was portrayed was how building “better together” was the only possible way any of these successful outcomes could have been achieved. The Bitmovin Innovators Network transcends conventional industry collaboration, evolving into a vibrant community that unites top-tier technology vendors, systems integrators, resellers, consultants, and leading-edge research institutions. This collaborative ecosystem is dedicated to streamlining the complexities of live and on-demand media workloads. Our overarching mission is to democratize streaming, making it accessible and efficient for both media and non-media organizations.

In this interconnected network, knowledge-sharing and resource synergy are the driving forces. We harness collective expertise to craft innovative solutions that empower media companies to deliver outstanding content experiences. Simultaneously, we introduce non-media entities to the transformative potential of streaming technology. This network stands as a testament to the power of innovation, simplifying the world of streaming video for all involved parties.

Conclusion

Cooperation, innovation, and sustainability are essential for every streaming service and technology vendor, and finding the right balance between in-house solutions and external offerings is crucial for cost-effective growth. With how increasingly competitive the space is getting and economic pressure taking its toll, it’s clear that multi-vendor collaboration is needed more than ever.

Partnerships like those between Paramount+ and Bitmovin, TVNZ with Bitmovin and Accedo, and the major US basketball league with Microsoft, MediaKind, and Bitmovin demonstrate the power of what collaboration and building better together can do in this industry. The Bitmovin Innovators Network further highlights the importance of this in simplifying streaming challenges and making a more accessible and sustainable future.

You can check out the many partners in the Bitmovin Innovators Network on our website and enquire about joining by contacting our partner team.

The post Building Better Together with the Bitmovin Innovators Network appeared first on Bitmovin.

]]>

PhD video research: From the ATHENA lab to Bitmovin products

Fri, 10 Nov 2023 18:16:47 +0000

Introduction

The story of Bitmovin began with video research and innovation back in 2012, when our co-founders Stefan Lederer and Christopher Mueller were students at Alpen-Adria-Universität (AAU) Klagenfurt. Together with their professor Dr. Christian Timmerer, the three co-founded Bitmovin in 2013, with their research providing the foundation for Bitmovin’s groundbreaking MPEG-DASH player and Per-Title Encoding. Five years later in 2018, a joint project between Bitmovin and AAU called ATHENA was formed, with a new laboratory and research program that would be led by Dr. Timmerer. The aim of ATHENA was to research and develop new approaches, tools and evaluations for all areas of HTTP adaptive streaming, including encoding, delivery, playback and end-to-end quality of experience (QoE). Bitmovin could then take advantage of the knowledge gained to further innovate and enhance its products and services. In the late spring and summer of 2023, the first cohort of ATHENA PhD students completed their projects and successfully defended their dissertations. This post will highlight their work and its potential applications.

Bitmovin co-founders Stefan Lederer, Christopher Mueller, and Christian Timmerer celebrating the opening of the Christian Doppler ATHENA Laboratory with Martin Gerzabek and Ulrike Unterer from the Christian Doppler Research Association. (Photo: Daniel Waschnig)

Video Research Projects

Optimizing QoE and Latency of Live Video Streaming Using Edge Computing and In-Network Intelligence

Dr. Alireza Erfanian

The work of Dr. Erfanian focused on leveraging edge computing and in-network intelligence to enhance the QoE and reduce end-to-end latency in live ABR streaming. The research also addresses improving transcoding performance and optimizing costs associated with running live streaming services and network backhaul utilization.

Optimizing resource utilization – Two new methods ORAVA and OSCAR, utilize edge computing, network function virtualization, and software-defined networking (SDN). At the network’s edge, virtual reverse proxies collect clients’ requests and send them to an SDN controller, which creates a multicast tree to deliver the highest requested bitrate efficiently. This approach minimizes streaming cost and resource utilization while considering delay constraints. ORAVA, a cost-aware approach, and OSCAR, an SDN-based live video streaming method, collectively save up to 65% bandwidth compared to state-of-the-art approaches, reducing OpenFlow commands by up to 78% and 82%, respectively.
Light-Weight Transcoding – These three new approaches utilize edge computing and network function virtualization to significantly improve transcoding efficiency. LwTE is a novel light-weight transcoding approach at the edge that saves time and computational resources by storing optimal results as metadata during the encoding process. It employs store and transcode policies based on popularity, caching popular segments at the edge. CD-LwTE extends LwTE by proposing Cost- and Delay-aware Light-weight Transcoding at the Edge, considering resource constraints, introducing a fetch policy, and minimizing total cost and serving delay for each segment/bitrate. LwTE-Live investigates the cost efficiency of LwTE in live streaming, leveraging the approach to save bandwidth in the backhaul network. Evaluation results demonstrate LwTE processes transcoding at least 80% faster, while CD-LwTE reduces transcoding time by up to 97%, decreases streaming costs by up to 75%, and reduces delay by up to 48% compared to state-of-the-art approaches.

Slides and more detail

Video Coding Enhancements for HTTP Adaptive Streaming using Machine Learning

Dr. Ekrem Çetinkaya

The research of Dr. Çetinkaya involved several applications of machine learning techniques for improving the video coding process across 4 categories:

Fast Multi-Rate Encoding with Machine Learning – These two techniques address the challenge of encoding multiple representations of a video for ABR streaming. FaME-ML utilizes convolutional neural networks to guide encoding decisions, reducing parallel encoding time by 41%. FaRes-ML extends this approach to multi-resolution scenarios, achieving a 46% reduction in overall encoding time while preserving visual quality.
Enhancing Visual Quality on Mobile Devices – These three methods focused on improving visual quality on mobile devices with limited hardware. SR-ABR integrates super-resolution into adaptive bitrate selection, saving up to 43% bandwidth. LiDeR addresses computational complexity, achieving a 428% increase in execution speed while maintaining visual quality. MoViDNN facilitates the evaluation of machine learning solutions for enhanced visual quality on mobile devices.
Light-Field Image Coding with Super-Resolution – This new approach addresses the data size challenge of light field images in emerging media formats. LFC-SASR utilizes super-resolution to reduce data size by 54%, ensuring a more immersive experience while preserving visual quality.
Blind Visual Quality Assessment Using Vision Transformers – A new technique, BQ-ViT, tackles the blind visual quality assessment problem for videos. Leveraging the vision transformer architecture, BQ-ViT achieves a high correlation (0.895 PCC) in predicting video visual quality using only the encoded frames.

Slides and more detail

Policy-driven Dynamic HTTP Adaptive Streaming Player Environment

Dr. Minh Nguyen

The work of Dr. Ngyuen addressed critical issues impacting QoE in adaptive bitrate (ABR) streaming, with four main contributions:

Days of Future Past Plus (DoFP+) – This approach uses HTTP/3 features to enhance QoE by upgrading low-quality segments during streaming sessions, resulting in a 33% QoE improvement and a 16% reduction in downloaded data.
WISH ABR – This is a weighted sum model that allows users to customize their ABR switching algorithm by specifying preferences for parameters like data usage, stall events, and video quality. WISH considers throughput, buffer, and quality costs, enhancing QoE by up to 17.6% and reducing data usage by 36.4%.
WISH-SR – This is an ABR scheme that extends WISH by incorporating a lightweight Convolutional Neural Network (CNN) to improve video quality on high-end mobile devices. It can reduce downloaded data by up to 43% and enhance visual quality with client-side Super Resolution upscaling.
New CMCD Approach – This new method for determining Common Media Client Data (CMCD) parameters, enables the server to generate suitable bitrate ladders based on clients’ device types and network conditions. This approach reduces downloaded data while improving QoE by up to 2.6 times

Slides and more detail

Multi-access Edge Computing for Adaptive Video Streaming

Dr. Jesús Aguilar Armijo

The network plays a crucial role for video streaming QoE and one of the key technologies available on the network side is Multi-access Edge Computing (MEC). It has several key characteristics: computing power, storage, proximity to the clients and access to network and player metrics, that make it possible to deploy mechanisms at the MEC node to assist video streaming.

This thesis of Dr. Aguilar Armijo investigates how MEC capabilities can be leveraged to support video streaming delivery, specifically to improve the QoE, reduce latency or increase savings on storage and bandwidth.

ANGELA Simulator – A new simulator is designed to test mechanisms supporting video streaming at the edge node. ANGELA addresses issues in state-of-the-art simulators by providing access to radio and player metrics, various multimedia content configurations, Adaptive Bitrate (ABR) algorithms at different network locations, and a range of evaluation metrics. Real 4G/5G network traces are used for radio layer simulation, offering realistic results. ANGELA demonstrates a significant simulation time reduction of 99.76% compared to the ns-3 simulator in a simple MEC mechanism scenario.
Dynamic Segment Repackaging at the Edge – The proposal suggests using the Common Media Application Format (CMAF) in the network’s backhaul, performing dynamic repackaging of content at the MEC node to match clients’ requested delivery formats. This approach aims to achieve bandwidth savings in the network’s backhaul and reduce storage costs at the server and edge side. Measurements indicate potential reductions in delivery latency under certain expected conditions.
Edge-Assisted Adaptation Schemes – Leveraging radio network and player metrics at the MEC node, two edge-assisted adaptation schemes are proposed. EADAS improves ABR decisions on-the-fly to enhance clients’ Quality of Experience (QoE) and fairness. ECAS-ML shifts the entire ABR algorithm logic to the edge, managing the tradeoff among bitrate, segment switches, and stalls through machine learning techniques. Evaluations show significant improvements in QoE and fairness for both schemes compared to various ABR algorithms.
Segment Prefetching and Caching at the Edge – Segment prefetching, a technique transmitting future video segments closer to the client before being requested, is explored at the MEC node. Different prefetching policies, utilizing resources and techniques such as Markov prediction, machine learning, transrating, and super-resolution, are proposed and evaluated. Results indicate that machine learning-based prefetching increases average bitrate while reducing stalls and extra bandwidth consumption, offering a promising approach to enhance overall performance.

Slides and more detail

Potential applications for Bitmovin products

The WISH ABR algorithm presented by Dr. Nguyen is already available in the Bitmovin Web Player SDK as of version 8.136.0, which was released in early October 2023. It can be enabled via AdaptationConfig.logic. Use of CMCD metadata is still gaining momentum throughout the industry, but Bitmovin and Akamai have already demonstrated a joint solution and the research above will help improve our implementation.

Bitmovin has experimented with server-side Super Resolution upscaling with some customers, mainly focusing on upscaling SD content to HD for viewing on TVs and larger monitors, but the techniques investigated by Dr. Çetinkaya take advantage of newer models that can extend Super Resolution to the client side on mobile devices. These have the potential to reduce data usage which is especially important to users with limited data plans and bandwidth. They can also improve QoE and visual quality while saving service providers on delivery costs.

Controlling costs has been at or near the top of the list of challenges video developers and streaming service providers have faced over the past couple of years according to Bitmovin’s annual Video Developer Report. This trend will likely continue into 2024 and the resource management and transcoding efficiency improvements developed by Dr. Erfanian will help optimize and reduce operational costs for Bitmovin and its services.

Edge computing is becoming more mainstream, with companies like Bitmovin partners Videon and Edgio delivering new applications that take advantage of available compute resources closer to the end user. The contributions developed by Dr. Aguilar Armijo address different facets of content delivery and provide a comprehensive approach to optimizing video streaming in edge computing environments. This has the potential to provide more actionable analytics data and enable more intelligent and robust adaptation during challenging network conditions.

Conclusion

Bitmovin was born from research and innovation and 10 years later is still breaking new ground. We were honored to receive a Technology & Engineering Emmy Award for our efforts and remain committed to improving every part of the streaming experience. Whether it’s taking advantage of the latest machine learning capabilities or developing novel approaches for controlling costs, we’re excited for what the future holds. We’re also grateful for all of the researchers, engineers, technology partners and customers who have contributed along the way and look forward to the next 10 years of progress and innovation.

The post PhD video research: From the ATHENA lab to Bitmovin products appeared first on Bitmovin.

]]>

Unlocking the Highest Quality of Experience with Common-Media-Client-Data (CMCD) – What Is It and What Are the Benefits

Thu, 14 Sep 2023 15:23:14 +0000

As video workflows get more detailed, companies face numerous challenges in delivering a seamless viewing experience to their audiences. One of the biggest hurdles is the ability to make sense of disjointed sets of information from different points in the video delivery workflow. When a client experiences buffering or other playback issues, it can be difficult to pinpoint the root cause within a workflow. Do You rack your brain wondering if it’s a problem with the manifest, the client’s Adaptive Bitrate (ABR) algorithm, or the Content Delivery Network (CDN)? To create a clearer picture for streaming platforms and the CDNs delivering the content, this is where Common-Media-Client-Data (CMCD) comes into play.

What is CMCD and Why is it Important?

CMCD is an open specification and tool developed by the Web Application Video Ecosystem (WAVE) project launched by the Consumer Technology Association (CTA). Its focus is to allow media players to communicate data back to CDNs during video streaming sessions. It provides a standardized protocol for exchanging information between the client and the CDN, bridging the gap between client-side quality of experience (QOE) metrics and server-side quality of service (QOS) data. By providing the transmission of this detailed data and information, CMCD-enabled video streaming services can facilitate better troubleshooting, optimization, and dynamic delivery adjustments by CDNs.

With CMCD, media clients can send key-value pairs of data to CDNs, providing valuable insights into the streaming session. This data includes information such as encoded bitrate, buffer length, content ID, measured throughput, session ID, playback rate, and more. By capturing and analyzing this data, CDNs can gain a deeper understanding of the client’s streaming experience and make informed decisions to improve performance and address any issues.

What data is tracked and how is data sent and processed with CMCD?

The data points for CMCD are thorough, giving you the detailed metrics you need to verify your viewer’s experience along with how to optimize it. The metrics include:

Encoded bitrate
Buffer length
Buffer starvation
Content ID
Object duration
Deadline
Measured throughput
Next object request
Next range request
Object type
Playback rate
Requested maximum throughput
Streaming format
Session ID
Stream type
Startup
Top bitrate

There are three common methods for sending CMCD data from the client to the CDN: custom HTTP request headers, HTTP query arguments, or JSON objects independent of the HTTP request. The choice of method depends on the player’s capabilities and the CDN’s processing requirements and could also differ by platform. In browsers, HTTP query arguments are preferred over HTTP request headers as headers would cause OPTIONS requests in addition to see if the CDN allows the usage of these headers, adding additional round-trip times. Other platforms like Android don’t have this limitation.

It is recommended to sequence the key-value pairs in alphabetical order to reduce the fingerprinting surface exposed by the player. Additionally, including a session ID (sid) and content ID (cid) with each request can aid in parsing and filtering through CDN logs for specific session and content combinations.

The Role of CMCD in Video Streaming Optimization

CMCD plays a crucial role in optimizing video streaming by enabling comprehensive data analysis and real-time adjustments. Combining client-side data with CDN logs, CMCD allows for the correlation of metrics and the identification of issues that affect streaming performance. This holistic view empowers CDNs to take proactive measures to address buffering, playback stalls, or other quality issues.

With CMCD, CDNs can segment data based on Live and Video on Demand (VOD) content, monitor CDN performance, identify specific subscriber sessions, and track the journey of media objects from the CDN to the player and screen. This level of insight enables CDNs to optimize content delivery, manage bandwidth allocation, and ensure a smooth and consistent streaming experience for viewers.

Adoption of CMCD in the Industry

Akamai and Bitmovin CMCD Workflow

The adoption and implementation of CMCD in video workflows are still developing. Many in the video streaming industry are evaluating it at the moment but haven’t made significant moves. However, there are notable players in the market who have taken the lead in incorporating CMCD into their platforms. One such example is Akamai, a prominent CDN provider. Akamai has been actively working on CMCD in collaboration with the Bitmovin Player.

Live Demo

Together, Akamai and Bitmovin have developed a demo presenting the capabilities and benefits of CMCD. The demo shows how CMCD data can be sent by the Bitmovin Player to the CDN.

What are the benefits of CMCD and how can it be implemented on the Bitmovin Player?

As listed above, there are clear benefits to implementing CMCD for video playback. Some of the benefits of CMCD that can be achieved with the Bitmovin player are:

Troubleshooting errors and finding root causes faster
- CMCD makes Player sessions visible in CDN logs so you can trace error sessions through the Player and CDN to quickly find the root cause, reducing the cost associated with users experiencing errors on your platform.
Combine Playback sessions and CDN logs with common session & content identifiers
- Improve your operational monitoring by giving a clearer view of content requests from Player and how those are handled by the CDN.
Improve the quality of experience and reduce rebuffering by enabling pre-fetching
- Through CMCD, the CDN is aware of the Player’s current state and the content it most likely needs next. This allows the CDN to prepare and deliver the next packet the Player needs faster, reducing the time your viewers are waiting.
Integration with Bitmovin’s Analytics
- Monitor every single user session and gain granular data on audience, quality, and ad metrics that ensure a high quality of experience for viewers while helping you pinpoint error sessions rapidly with CMCD data.

As Bitmovin is continuing to explore CMCD’s capabilities, we’ve made it easy to set up and deploy into video workflows through our Github. If you’re wondering how it should be working or want to see it before taking the steps to implement it, you can check out our Bitmovin Web Player Samples.

Additionally, if you have any questions or have any feedback on our experience using it, join our Bitmovin Developer community and comment on the running dialog around our CMCD implementation.

Future Implications and Industry Outlook

While CMCD is still in its early stages of adoption, its potential impact on the video streaming industry is significant. As more embrace CMCD, the ability to gather and analyze comprehensive data will become a standard practice and its benefits will become increasingly evident. This data-driven approach will enable continuous improvements in streaming performance and video workflows. This was a major reason that we at Bitmovin took this project on as transparency is key and CMCD makes the issues easier to find and address, increasing viewer and client satisfaction.

Interest in CMCD will continue to grow with new implementations and use cases, leading the industry to realize the gains from reducing buffering and delivering better, streams to viewers. Our partnership with Akamai is just one step in how we are committed to advancing video streaming technology for content providers and providing a seamless viewing experience for audiences worldwide.

The post Unlocking the Highest Quality of Experience with Common-Media-Client-Data (CMCD) – What Is It and What Are the Benefits appeared first on Bitmovin.

]]>

Innovation – Bitmovin

The AI Video Research Powering a Higher Quality Future

AI for video at NAB 2024

FaRes-ML granted US Patent

Recent Bitmovin and ATHENA AI Research

Generative AI for Adaptive Video Streaming

DeepVCA: Deep Video Complexity Analyzer

DIGITWISE: Digital Twin-based Modeling of Adaptive Video Streaming Engagement

Previous Bitmovin and ATHENA AI Research

Better quality with neural network-driven Super Resolution upscaling

Less buffering and higher QoE with applied machine learning

Challenges ahead

Learn more

AI Video Glossary

AI-powered Video Super Resolution and Remastering

What is video Super Resolution and how does it work?

Bitmovin’s AI video Super Resolution exploration and testing

ATHENA Super Resolution research

AI-powered video enhancement with Bitmovin partner Pixop

Related Links

Globo, Google Cloud and Bitmovin: Taking Quality to New Heights

Globo standard of quality

Globoplay, powered by Bitmovin VOD Encoding on Google Cloud

Olympics in 8K

Related Links

Split-and-Stitch Encoding with incredible speed, quality and scale

What is split-and-stitch encoding?

How fast is split-and-stitch encoding?

What are the advantages of Bitmovin’s split-and-stitch encoding?

What is Smart Chunking?

Is split-and-stitch always the best approach?

Ready to get started with split-and-stitch encoding?

Supercharging Data Insights with AI for Video Analytics

Background and motivation

Using Google’s Vertex AI + PaLM 2 models

Google’s Vertex AI

PaLM 2

Initial results

Crafting the “perfect” question

Summary, analysis, and recommend solutions

Session Data

Summarization

Analysis

Recommendation

User-centric insights

JSON response with scores

5 star rating examples

Audience-specific session analysis

Bitmovin Analytics AI Session Interpreter available now

Related resources and links

Adaptive Bitrate Streaming Evolved: WISH ABR and Bitmovin’s Player Integration

What is adaptive bitrate streaming, how does it work, and what are the benefits?

The Benefits of ABR

What is WISH ABR and how is it different?

How did we implement it for the Bitmovin Player?

Ongoing Improvements and Testing

Conclusion

Game-Changing Savings with Per-Title Encoding

Table of Contents

Introduction

Per-Title Encoding is key for cutting streaming costs

How does Per-Title Encoding work?

Per-Title Encoding ABR ladder vs fixed bitrate ladder

Storage savings with Per-Title Encoding

Improve quality without increasing bitrates

Bitmovin Smart Chunking prevents lower quality outlier frames

Conclusion

Related Links

Building Better Together with the Bitmovin Innovators Network

Managing costs and exploring vendor solutions with Paramount+

Powering Television New Zealand’s sports streaming hub sustainably

Enabling a major US basketball league to stream and captivate their viewers globally

Why the Bitmovin Innovators Network Matters to OTT Providers

Conclusion

PhD video research: From the ATHENA lab to Bitmovin products

Table of Contents

Introduction

Video Research Projects

Optimizing QoE and Latency of Live Video Streaming Using Edge Computing and In-Network Intelligence

Dr. Alireza Erfanian