Wednesday, July 27, 2016

MPEG Survey on Virtual Reality

As mentioned in my previous blog post, virtual reality is becoming a hot topic across the industry (and also academia) and is now also reaching standards developing organizations like MPEG. MPEG established an Ad-hoc Group on MPEG-VR (open to everyone) which has published a survey on virtual reality. The survey is open until August 18, 2016 and available here...



At Bitmovin we're working on this topic in a Web context and a demo is available here.

Tuesday, July 19, 2016

MPEG news: a report from the 115th meeting, Geneva, Switzerland


The original blog post can be found at the Bitmovin Techblog and has been updated here to focus on and highlight research aspects. Additionally, this version of the blog post will also be posted at ACM SIGMM Records.
MPEG News Archive
The 115th MPEG meeting was held in Geneva, Switzerland and its press release highlights the following aspects:
  • MPEG issues Genomic Information Compression and Storage joint Call for Proposals in conjunction with ISO/TC 276/WG 5
  • Plug-in free decoding of 3D objects within Web browsers
  • MPEG-H 3D Audio AMD 3 reaches FDAM status
  • Common Media Application Format for Dynamic Adaptive Streaming Applications
  • 4th edition of AVC/HEVC file format
In this blog post, however, I will cover topics specifically relevant for adaptive media streaming, namely:
  • Recent developments in MPEG-DASH
  • Common media application format (CMAF)
  • MPEG-VR (virtual reality)
  • The MPEG roadmap/vision for the future.

MPEG-DASH Server and Network assisted DASH (SAND): ISO/IEC 23009-5

Part 5 of MPEG-DASH, referred to as SAND (server and network-assisted DASH), has reached FDIS. This work item started some time ago at a public MPEG workshop during the 105th MPEG meeting in Vienna. The goal of this part of MPEG-DASH is to enhance the delivery of DASH content by introducing messages between DASH clients and network elements, or between various network elements, for the purpose of improving the efficiency of streaming sessions by providing information about real-time operational characteristics of networks, servers, proxies, caches, and CDNs, as well as DASH clients' performance and status. In particular, it defines the following:
  1. The SAND architecture which identifies the SAND network elements and the nature of SAND messages exchanged among them.
  2. The semantics of SAND messages exchanged between the network elements present in the SAND architecture.
  3. An encoding scheme for the SAND messages.
  4. The minimum requirements to implement a SAND message delivery protocol.
The way this information is to be utilized is deliberately not defined within the standard and is left open for (industry) competition (or other standards developing organizations); a sketch of what such a message exchange could look like follows below. In any case, there's plenty of room for research activities around the topic of SAND, specifically:
  • A main issue is the evaluation of MPEG-DASH SAND in terms of qualitative and quantitative improvements with respect to QoS/QoE. Some papers are already available, having been published at ACM MMSys 2016.
  • Another topic of interest is an analysis of scalability and possible overhead; in other words, whether the improvements SAND enables actually justify the additional messages it introduces.
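
For illustration, here is a minimal sketch (TypeScript, browser context) of how a DASH client could report a throughput measurement to a DASH-aware network element (DANE). The XML element names, the MIME type, and the endpoint are my own assumptions for this sketch; consult ISO/IEC 23009-5 for the normative message syntax and transport bindings.

```typescript
// Minimal sketch of a DASH client reporting a SAND-style status message
// to a DASH-aware network element (DANE). Element names, MIME type, and
// endpoint URL are illustrative assumptions, not the normative syntax.

interface ThroughputSample {
  timestamp: string;   // when the measurement was taken (ISO 8601)
  bytes: number;       // bytes received for the last segment
  durationMs: number;  // download duration in milliseconds
}

function buildStatusMessage(senderId: string, s: ThroughputSample): string {
  const kbps = Math.round((s.bytes * 8) / s.durationMs); // bits/ms = kbit/s
  return `<?xml version="1.0" encoding="UTF-8"?>
<SANDMessage senderId="${senderId}" generationTime="${s.timestamp}">
  <MeasuredThroughput bandwidth="${kbps}"/>
</SANDMessage>`;
}

async function reportToDane(daneUrl: string, xml: string): Promise<void> {
  // Status messages travel from the client to a DANE; a plain HTTP POST
  // is one plausible transport for this sketch.
  await fetch(daneUrl, {
    method: "POST",
    headers: { "Content-Type": "application/sand+xml" },
    body: xml,
  });
}

// Usage: report the throughput observed for the most recent segment.
reportToDane(
  "https://dane.example.com/sand", // hypothetical DANE endpoint
  buildStatusMessage("client-42", {
    timestamp: new Date().toISOString(),
    bytes: 512_000,
    durationMs: 800,
  }),
).catch(console.error);
```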

MPEG-DASH with Server Push and WebSockets: ISO/IEC 23009-6

Part 6 of MPEG-DASH has reached DIS stage and deals with server push and WebSockets, i.e., it specifies the carriage of MPEG-DASH media presentations over full-duplex HTTP-compatible protocols, particularly HTTP/2 and WebSocket. The specification comes with a set of generic definitions for which protocol bindings are defined; currently, bindings exist for HTTP/2 and WebSocket.

For the former, the push policy needs to be defined as an HTTP header extension, whereas the latter requires the definition of a DASH subprotocol. Luckily, these are the preferred extension mechanisms for HTTP/2 and WebSocket, respectively, and, thus, interoperability is provided. Whether or not the industry will adopt these extensions cannot be answered right now, but I would recommend keeping an eye on this, and there are certainly multiple research topics worth exploring in the future.
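
As a sketch of what the HTTP/2 binding might look like from the client side, consider the following request carrying a push policy header. The header names and the push-type URN follow the pattern discussed around ISO/IEC 23009-6, but treat the exact strings as assumptions rather than normative values.

```typescript
// Sketch of an MPD request signalling a desired push policy via an HTTP
// header extension. The header names and the push-type URN are assumptions
// based on drafts around ISO/IEC 23009-6; verify against the final text.

async function requestMpdWithPushPolicy(mpdUrl: string): Promise<string> {
  const response = await fetch(mpdUrl, {
    headers: {
      // Ask the server to proactively push the first K segments
      // ("fast start") alongside the MPD via HTTP/2 server push.
      "Accept-Push-Policy": "urn:mpeg:dash:fdh:2016:push-fast-start; K=3",
    },
  });
  // A compliant server would echo the policy it applied, e.g. in a
  // "Push-Policy" response header (again an assumption in this sketch).
  console.log("applied policy:", response.headers.get("Push-Policy"));
  return response.text(); // the MPD; pushed segments land in the cache
}
```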

An interesting aspect for the research community would be to quantify the utility of push methods within dynamic adaptive streaming environments in terms of QoE and start-up delay. Some papers provide preliminary answers, but a comprehensive evaluation is missing.
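
On the WebSocket side, a minimal client sketch could look as follows; the subprotocol name and the text-based segment request are illustrative assumptions, as the actual DASH subprotocol is defined in ISO/IEC 23009-6.

```typescript
// Sketch of a DASH client receiving segments over a WebSocket. The
// subprotocol name and the text-based request format are illustrative
// assumptions; ISO/IEC 23009-6 defines the actual DASH subprotocol.

function openSegmentChannel(
  url: string,
  onSegment: (data: ArrayBuffer) => void,
): WebSocket {
  const ws = new WebSocket(url, "mpeg-dash"); // hypothetical subprotocol id
  ws.binaryType = "arraybuffer"; // media segments arrive as binary frames

  ws.onopen = () => {
    // Request the next media segment; in a full-duplex setup the server
    // could also push segments unprompted.
    ws.send("GET segment-1.m4s");
  };
  ws.onmessage = (event: MessageEvent) => {
    if (event.data instanceof ArrayBuffer) {
      onSegment(event.data); // hand the segment to the media pipeline
    }
  };
  return ws;
}

// Usage: log received segments (a real client would append them to MSE).
openSegmentChannel("wss://streaming.example.com/dash", (seg) => {
  console.log(`received segment of ${seg.byteLength} bytes`);
});
```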

To conclude the recent MPEG-DASH developments: the DASH-IF recently established the Excellence in DASH Award at ACM MMSys'16 and the winners are presented here (including some of the recent developments described in this blog post).

Common Media Application Format (CMAF): ISO/IEC 23000-19

The goal of CMAF is to enable application consortia to reference a single MPEG specification (i.e., a "common media format") that would allow a single media encoding to be used across many applications and devices. Therefore, CMAF defines the encoding and packaging of segmented media objects for delivery and decoding on end user devices in adaptive multimedia presentations. This sounds very familiar and reminds us a bit of what the DASH-IF is doing with its interoperability points. One of the goals of CMAF is to align HLS and MPEG-DASH on a common segment format, which is backed by this WWDC video in which Apple announces support for fragmented MP4 in HLS. The stream of this announcement is only available in Safari and through the WWDC app, but Bitmovin has shown that it also works on iOS 10 and above as well as on Mac, and for PC users in all recent browser versions including Edge, Firefox, Chrome, and (of course) Safari.
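
Since CMAF builds on fragmented MP4, such segments can be fed directly into a browser via the HTML5 Media Source Extensions. A minimal sketch follows (the segment URLs and the codec string are illustrative assumptions):

```typescript
// Minimal sketch of playing CMAF-style fragmented MP4 via HTML5 MSE.
// The segment URLs and the codec string are illustrative assumptions.

const MIME = 'video/mp4; codecs="avc1.64001f"';

async function playFragmentedMp4(video: HTMLVideoElement): Promise<void> {
  if (typeof MediaSource === "undefined" || !MediaSource.isTypeSupported(MIME)) {
    throw new Error("fragmented MP4 is not supported via MSE here");
  }
  const mediaSource = new MediaSource();
  video.src = URL.createObjectURL(mediaSource);
  await new Promise<void>((resolve) =>
    mediaSource.addEventListener("sourceopen", () => resolve(), { once: true }),
  );

  const buffer = mediaSource.addSourceBuffer(MIME);
  // The same init + media segments could be referenced from an HLS
  // playlist or a DASH MPD; that is the point of a common format.
  for (const url of ["init.mp4", "seg-1.m4s", "seg-2.m4s"]) {
    const data = await (await fetch(url)).arrayBuffer();
    await new Promise<void>((resolve) => {
      buffer.addEventListener("updateend", () => resolve(), { once: true });
      buffer.appendBuffer(data);
    });
  }
  mediaSource.endOfStream();
}
```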

MPEG Virtual Reality

Virtual reality is becoming a hot topic across the industry (and also academia) and is now also reaching standards developing organizations like MPEG. Therefore, MPEG established an ad-hoc group (with an email reflector) to develop a roadmap for MPEG-VR. Others have also started working on this topic, such as DVB, DASH-IF, and QUALINET (and maybe many others: W3C, 3GPP). In any case, it shows that there's massive interest in this topic, and Bitmovin has already shown what can be done in this area within today's Web environments. Obviously, adaptive streaming is an important aspect of VR applications, including many research questions to be addressed in the (near) future. A first step towards a concrete solution is the Omnidirectional Media Application Format (OMAF), which is currently at working draft stage (details to be provided in a future blog post).

The research aspects cover a wide range of activities including - but not limited to - content capturing, content representation, streaming/network optimization, consumption, and QoE.

MPEG roadmap/vision

At its 115th meeting, MPEG published a document that lays out its medium-term strategic standardization roadmap. The goal of this document is to collect feedback from anyone in professional and B2B industries dealing with media, specifically but not limited to broadcasting, content and service provision, media equipment manufacturing, and the telecommunication industry. The roadmap is depicted below and further described in the document available here. Please note that “360 AV” in the figure also covers VR, although unfortunately this is not (yet) explicitly reflected there. In any case, the roadmap points out the aspects to be addressed by MPEG in the future, which are relevant for both industry and academia.

[Figure: MPEG medium-term standardization roadmap]


The next MPEG meeting will be held in Chengdu, October 17-21, 2016.

Wednesday, December 9, 2015

Real-Time Entertainment now accounts for >70% of the Internet Traffic

Sandvine's Global Internet Phenomena Report (December 2015 edition) reveals that real-time entertainment (i.e., streaming video and audio) traffic now accounts for more than 70% of North American downstream traffic in the peak evening hours on fixed access networks (see Figure 1). Interestingly, five years ago it accounted for less than 35%.

Netflix is mainly responsible for this with a share of more than 37% (i.e., more than the total share of real-time entertainment five years ago), but it already had a big share in 2011 (~32%) and hasn't "improved" that much since. The second biggest share comes from YouTube with roughly 18%.

I'm using these figures within my slides to motivate that streaming video and audio is a huge market - opening a lot of opportunities for research and innovation - and it's interesting to see how the Internet is being used. In most of these cases, the Internet is used as is, without any bandwidth guarantees, and clients adapt themselves to the bandwidth that is available. Service providers offer the content in multiple versions (e.g., different bitrates, resolutions, etc.) and each version is segmented so that clients can adapt both at the beginning of and during the session. This principle is known as over-the-top adaptive video streaming, and a standardized representation format is available, known as Dynamic Adaptive Streaming over HTTP (DASH), ISO/IEC 23009. Note that the adaptation logic is not part of the standard, which opens up a bunch of possibilities in terms of research and engineering (see the sketch below).
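
To make the principle concrete, here is a minimal sketch of a purely throughput-based adaptation logic. Real players additionally consider buffer fill level and many other heuristics, and the bitrate ladder below is a made-up example.

```typescript
// Minimal sketch of a throughput-based adaptation logic: pick the highest
// representation whose bitrate fits within a safety margin of the measured
// throughput. Real DASH clients also consider buffer fill level, throughput
// variance, etc.; the bitrate ladder below is a made-up example.

interface Representation {
  id: string;
  bandwidth: number; // bits per second, as advertised in the MPD
}

const ladder: Representation[] = [
  { id: "240p", bandwidth: 400_000 },
  { id: "480p", bandwidth: 1_200_000 },
  { id: "720p", bandwidth: 2_500_000 },
  { id: "1080p", bandwidth: 5_000_000 },
];

function selectRepresentation(
  reps: Representation[],
  measuredBps: number,
  safetyFactor = 0.8, // keep 20% headroom against throughput dips
): Representation {
  const affordable = reps
    .filter((r) => r.bandwidth <= measuredBps * safetyFactor)
    .sort((a, b) => b.bandwidth - a.bandwidth);
  // Fall back to the lowest representation if nothing fits.
  return (
    affordable[0] ??
    reps.reduce((lo, r) => (r.bandwidth < lo.bandwidth ? r : lo))
  );
}

// Usage: after each segment download, re-estimate throughput and adapt.
console.log(selectRepresentation(ladder, 3_500_000).id); // "720p"
```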

Both Netflix and YouTube adopted the DASH format, which is now natively supported by modern Web browsers thanks to the HTML5 Media Source Extensions (MSE), and even digital rights management is possible thanks to the Encrypted Media Extensions (EME). All one needs is a client implementation that is compliant with the standard - the easy part; the standard is freely available - and that adapts to the dynamically changing usage context while maximizing the Quality of Experience (QoE) - the difficult part. That's why we at bitmovin decided to set up a grand challenge at IEEE ICME 2016 in Seattle, USA with the aim to solicit contributions addressing end-to-end delivery aspects that improve the QoE while optimally utilising the available network infrastructures and their associated costs. This includes the content preparation for DASH, the content delivery within existing networks, and the client implementations. Please feel free to contribute to this exciting problem and if you have further questions or comments, please contact us here.

Thursday, November 12, 2015

Final Call for Papers: ACM MMSys 2016 Full Papers


Autumn is showing itself from its best side here in Klagenfurt, and this is the final call for papers for ACM MMSys 2016 full papers, with YouTube as gold sponsor and featuring the Excellence in DASH Award sponsored by the DASH-IF.

ACM MMSys 2016
May 10-13, 2016
Klagenfurt am Wörthersee, Austria

The ACM Multimedia Systems Conference (MMSys) provides a forum for researchers to present and share their latest research findings in multimedia systems. While research on specific aspects of multimedia systems is regularly published in the various proceedings and transactions of the networking, operating system, real-time system, and database communities, MMSys aims to cut across these domains in the context of multimedia data types. This provides a unique opportunity to view the intersections and the interplay of the various approaches and solutions developed across these domains to deal with multimedia data types.

MMSys is a venue for researchers who explore:
  • Complete multimedia systems that provide a new kind of multimedia experience, or systems whose overall performance improves the state-of-the-art through new research results in one or more components, or
  • Enhancements to one or more system components that provide a documented improvement over the state-of-the-art for handling continuous media or time-dependent services.
Such individual system components include:
  • Operating systems
  • Distributed architectures and protocol enhancements
  • Domain languages, development tools and abstraction layers
  • Using new architectures or computing resources for multimedia
  • New or improved I/O architectures or I/O devices, innovative uses and algorithms for their operation
  • Representation of continuous or time-dependent media
  • Metrics, measures and measurement tools to assess performance
This touches on aspects of many hot topics including but not limited to: adaptive streaming, games, virtual environments, augmented reality, 3D video, Ultra-HD, HDR, immersive systems, plenoptics, 360° video, multimedia IoT, multi- and many-core, GPGPUs, mobile streaming, P2P, clouds, and cyber-physical systems.

Submission Guidelines
Papers should be between 6 and 12 pages long (in PDF format) prepared in the ACM style and written in English. The submission site is open and papers can be submitted using the following URL: http://mmsys2016.itec.aau.at/online-paper-submission/

Important dates:
  • Submission Deadline: December 11, 2015 (extended from November 27, 2015)
  • Reviews available to Authors: January 15, 2016
  • Rebuttal Deadline: January 22, 2016
  • Acceptance Notification: January 29, 2016
  • Camera-ready Deadline: March 11, 2016
DASH Industry Forum Excellence in DASH Award
This award offers a financial prize for those papers which best meet the following requirements:
  1. Paper must substantially address MPEG-DASH as the presentation format
  2. Paper must be selected for presentation at ACM MMSys 2016
  3. Preference given to practical enhancements and developments which can sustain future commercial usefulness of DASH
  4. DASH format used should conform to the DASH IF Interoperability Points as defined by http://dashif.org/guidelines/
Further details about the Excellence in DASH Award can be found here.



Friday, September 4, 2015

HEVC, AOMedia, MPEG, and DASH

Ultra-high definition (UHD) displays have been available for quite some time and, in terms of video coding, the MPEG-HEVC/H.265 standard was designed to support these high resolutions in an efficient way. And it does, with a performance gain of more than a factor of two over its predecessor MPEG-AVC/H.264. But it all comes at a cost - not only in terms of coding complexity at both encoder and decoder - especially when it comes to licensing. The MPEG-AVC/H.264 licenses are managed by MPEG LA, but for HEVC/H.265 there are two patent pools available, which makes its industry adoption more difficult than it was for AVC.

HEVC was published by ISO in early 2015 and in the meantime MPEG has started discussing future video coding, using its usual approach of open workshops that invite experts from companies inside and outside of MPEG. However, now there’s the Alliance for Open Media (AOMedia), promising to provide "open, royalty-free and interoperable solutions for the next generation of video delivery” (press release). A good overview and summary is available here, which even mentions that a third HEVC patent pool is shaping up (OMG!).

Anyway, even if AOMedia’s "media codecs, media formats, and related technologies” are free as in “free beer”, it’s still not clear whether they will taste any good. Also, many big players are not part of this alliance and could (easily) come up with patent claims at a later stage, jeopardising the whole process (cf. what happened with VP9). In any case, AOMedia is certainly disruptive and, together with other disruptive media technologies (e.g., PERSEUS, although I have some doubts here), might change the media coding landscape; whether that will be a turn for the better is not clear though...

Finally, I was wondering how all this impacts DASH, specifically as MPEG LA recently announced that they want to establish a patent pool for DASH, although major players stated some time ago that they would not charge anything for DASH (wrt licensing). In terms of media codecs, please note that DASH is codec agnostic: it can work with any codec, including those not specified within MPEG, as we have shown some time ago already (using WebM). The main problem, however, is which codecs are supported on which end user devices and through which API they can be accessed (like HTML5 & MSE). For example, some Android devices support HEVC but not through HTML5 & MSE, which makes it more difficult to integrate with DASH (see the feature-detection sketch below).
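
A quick way to probe what a given device actually exposes through HTML5 & MSE is runtime feature detection; a small sketch follows (the codec strings are common examples, not an authoritative list):

```typescript
// Sketch of runtime codec detection via HTML5 & MSE: a device may ship an
// HEVC decoder in hardware and still return false here, which is exactly
// the DASH integration problem described above. The codec strings are
// common examples, not an authoritative list.

const candidates: Record<string, string> = {
  "H.264/AVC": 'video/mp4; codecs="avc1.64001f"',
  "H.265/HEVC": 'video/mp4; codecs="hvc1.1.6.L93.B0"',
  "VP9 (WebM)": 'video/webm; codecs="vp9"',
};

for (const [name, mime] of Object.entries(candidates)) {
  const supported =
    typeof MediaSource !== "undefined" && MediaSource.isTypeSupported(mime);
  console.log(`${name}: ${supported ? "usable via MSE" : "not exposed via MSE"}`);
}
```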

Using MPEG-DASH with HTML5 & MSE is currently the preferred way to deploy DASH; even the DASH-IF’s reference player (dash.js) assumes HTML5 & MSE, and companies like bitmovin are offering bitdash following the same principles. Integrating new codecs on the DASH encoding side, like on bitmovin’s bitcodin cloud-based transcoding-as-a-service, isn’t a big deal and can be done very quickly as soon as software implementations are available. Thus, the problem lies more with the plethora of heterogeneous end user devices like smartphones, tablets, laptops, computers, set-top boxes, TV sets, media gateways, gaming consoles, etc. and their variety of platforms and operating systems.

Therefore, I’m wondering whether AOMedia (or whatever will come in the future) is a real effort to change the media landscape for the better or just another competing standard to choose from... but then again, as Andrew S. Tanenbaum wrote in his book on computer networks, “the nice thing about standards is that you have so many to choose from.”

Monday, August 31, 2015

Over-the-Top Content Delivery: State of the Art and Challenges Ahead at ICME 2015

As stated in my MPEG report from Warsaw, I attended ICME'15 in Torino to give a tutorial -- together with Ali Begen -- about over-the-top content delivery. The slides are available as usual and embedded here...


If you have any questions or comments, please let us know. The goal of this tutorial is to give an overview of MPEG-DASH as well as selected informative aspects (e.g., workflows, adaptation, quality, evaluation) not covered in the standard. However, it should not be seen as a tutorial on the standard, as many approaches presented here can also be applied to other formats, although MPEG-DASH seems to be the most promising of those available. During the tutorial we ran into interesting questions and discussions with the audience, and I could also show some live demos from bitmovin using bitcodin and bitdash. Attendees were impressed by the maturity of the technology behind MPEG-DASH and how research results find their way into actual products available on the market.

If this got you interested, I'll give a similar tutorial -- with Tobias Hoßfeld -- about "Adaptive Media Streaming and Quality of Experience Evaluations using Crowdsourcing" at ITC27 (Sep 7, 2015, Ghent, Belgium), and bitmovin will be at IBC2015 in Amsterdam.


Friday, August 28, 2015

One Year of MPEG

In my last MPEG report (index) I mentioned that the 112th MPEG meeting in Warsaw was my 50th MPEG meeting, which roughly adds up to one year of MPEG meetings. That is, one year of my life I've spent in MPEG meetings - scary, isn't it? Thus, I thought it’s time to recap what I have done in MPEG so far, featuring the following topics/standards to which I contributed significantly:
  • MPEG-21 - The Multimedia Framework 
  • MPEG-M - MPEG extensible middleware (MXM), later renamed to multimedia service platform technologies 
  • MPEG-V - Information exchange with Virtual Worlds, later renamed to media context and control
  • MPEG-DASH - Dynamic Adaptive Streaming over HTTP

MPEG-21 - The Multimedia Framework

I started my work on standards, specifically MPEG, with Part 7 of MPEG-21, referred to as Digital Item Adaptation (DIA), and developed the generic Bitstream Syntax Description (gBSD) in collaboration with SIEMENS, which allows for a coding-format-independent (generic) adaptation of scalable multimedia content towards the actual usage environment (e.g., different devices, resolutions, bitrates); a toy sketch of this idea follows below. The main goal of DIA was to enable Universal Media Access (UMA) -- any content, anytime, anywhere, on any device -- and it also motivated me to start this blog. I wrote a series of blog entries on this topic, O Universal Multimedia Access, Where Art Thou?, which gives an overview of the topic and is basically also what I did in my Ph.D. thesis. Later I contributed to various other MPEG-21 parts, including their dissemination, and documented where they have been used. Over the years I saw many forms of Digital Items (e.g., iTunes LP was one of the first) but unfortunately the demand for a standardised format is very low. Instead, proprietary formats are used, and I realised that developers are more into APIs than formats. The format comes with the API, but it’s the availability of an API that attracts developers and makes them adopt a certain technology.
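
To illustrate the gBSD idea: an adaptation engine sees only a generic description of byte ranges annotated with scalability layers and drops units above a target layer, without ever parsing the bitstream itself. The sketch below is a deliberate simplification of that principle, not the normative MPEG-21 DIA schema.

```typescript
// Toy sketch of gBSD-style, coding-format-independent adaptation: the
// engine sees only a generic description of byte ranges annotated with a
// scalability layer and drops units above the target layer without parsing
// the bitstream itself. A deliberate simplification, not the MPEG-21 DIA
// schema.

interface GbsdUnit {
  start: number;  // byte offset into the bitstream
  length: number; // unit length in bytes
  layer: number;  // annotated scalability layer (0 = base layer)
}

function adapt(
  bitstream: Uint8Array,
  units: GbsdUnit[],
  maxLayer: number,
): Uint8Array {
  // Keep only units at or below the target layer, preserving their order.
  const kept = units.filter((u) => u.layer <= maxLayer);
  const out = new Uint8Array(kept.reduce((n, u) => n + u.length, 0));
  let offset = 0;
  for (const u of kept) {
    out.set(bitstream.subarray(u.start, u.start + u.length), offset);
    offset += u.length;
  }
  return out;
}

// Usage: drop all enhancement layers for a low-end device.
const description: GbsdUnit[] = [
  { start: 0, length: 100, layer: 0 },  // base layer unit
  { start: 100, length: 80, layer: 1 }, // enhancement layer unit
];
const adapted = adapt(new Uint8Array(180), description, 0); // 100 bytes kept
```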

MPEG-M

The lessons learned from MPEG-21 were one reason why I joined the MPEG-M project, as its purpose was exactly this: to create an API to various MPEG technologies, providing developers with a tool that makes it easy for them to adopt new technologies and, thus, new formats/standards. We created an entire architecture, APIs, and reference software to make it easy for external people to adopt MPEG technologies. The goal was to hide the complexity of the technology behind simple-to-use APIs, which should enable the accelerated development of components, solutions, and applications utilising digital media content. A good overview of MPEG-M can be found on this poster.

MPEG-V

When MPEG started working on MPEG-V (it was not called that in the beginning), I saw it as an extension of UMA and MPEG-21 DIA to go beyond audio-visual experiences by stimulating potentially all human senses. We created and standardised an XML-based language that enables the annotation of multimedia content with sensory effects. Later the scope was extended to include virtual worlds, which resulted in the acronym MPEG-V. It also brought me to start working on Quality of Experience (QoE), and we coined the term Quality of Sensory Experience (QuASE) as part of the (virtual) SELab at Alpen-Adria-Universität Klagenfurt, which offers a rich set of open-source software tools and datasets around this topic on top of off-the-shelf hardware (still in use in my office).

MPEG-DASH

The latest project I’m working on is MPEG-DASH, in the context of which I also co-founded bitmovin, now a successful startup offering the fastest transcoding in the cloud (bitcodin) and high-quality MPEG-DASH players (bitdash). It all started when MPEG asked me to chair the evaluation of the call for proposals on HTTP streaming of MPEG media. We then created dash.itec.aau.at, which offers a huge set of open source tools and datasets used by both academia and industry worldwide (e.g., listed on DASH-IF). I think I can proudly state that this is the most successful MPEG activity I've been involved in so far... (note: a live deployment can be found here, which shows 24/7 music videos over the Internet using bitcodin and bitdash).

DASH and QuASE are also part of my habilitation, which brought me into my current position at Alpen-Adria-Universität Klagenfurt as Associate Professor. Finally, one might ask whether it was all worth spending so much time on MPEG and at MPEG meetings. I would say YES, and there are many reasons that could easily fill another blog post (or more), but it’s better to discuss this face to face; I'm sure there will be plenty of opportunities in the (near) future, or you can come to Klagenfurt, e.g., for ACM MMSys 2016...