Saturday, April 1, 2017

VR/360 Streaming Standardization-Related Activities

April 1 (1st version), September 14 (2nd version)

Universal media access (UMA), as proposed in the late 1990s and early 2000s, is now reality. It is very easy to generate, distribute, share, and consume any media content, anywhere, anytime, and with/on any device. These kinds of real-time entertainment services -- specifically, streaming audio and video -- are typically deployed over the open, unmanaged Internet and now account for more than 70% of the evening traffic in North American fixed access networks. It is assumed that this number will reach 80% by the end of 2020. A major technical breakthrough and enabler was certainly adaptive streaming over HTTP, which resulted in the standardization of MPEG-DASH.

One of the next big things in adaptive media streaming is most likely related to virtual reality (VR) applications and, specifically, omnidirectional (360-degree) media streaming, which is currently built on top of existing adaptive streaming ecosystems. The major interfaces of such an ecosystem were described in a Bitmovin blog post some time ago (note: this activity has since evolved into Immersive Media, referred to as MPEG-I).

Omnidirectional video (ODV) content allows users to change their viewing direction while consuming the video, resulting in a more immersive experience than traditional video content with a fixed viewing direction. Such content can be consumed on different devices, ranging from smartphones and desktop computers to dedicated head-mounted displays (HMDs) like Oculus Rift, Samsung Gear VR, and HTC Vive. When using an HMD, the viewing direction is changed by head movements. On smartphones and tablets, the viewing direction can be changed by touch interaction or by moving the device around, thanks to built-in sensors. On a desktop computer, the mouse or keyboard can be used to interact with the omnidirectional video.

The streaming of ODV content is currently deployed in a naive way: the entire 360-degree scene is simply streamed at constant quality, without exploiting knowledge of the user's viewport to optimize quality where the user is actually looking.
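To see why this is wasteful, consider how little of the sphere a typical viewport actually covers. The following back-of-the-envelope sketch (my own illustration, not part of any standard; the 90x90-degree field of view is an assumed, typical HMD value) computes the solid-angle fraction of the full sphere covered by a viewport centered at the equator:

```python
import math

def viewport_fraction(yaw_fov_deg: float, pitch_fov_deg: float) -> float:
    """Fraction of the full sphere covered by a viewport centered at the
    equator, with the given horizontal and vertical fields of view."""
    dlon = math.radians(yaw_fov_deg)                 # longitude span
    half_pitch = math.radians(pitch_fov_deg) / 2.0   # latitude half-span
    # Solid angle of a lon/lat rectangle: dlon * (sin(lat2) - sin(lat1))
    omega = dlon * (math.sin(half_pitch) - math.sin(-half_pitch))
    return omega / (4 * math.pi)                     # full sphere = 4*pi sr

print(round(viewport_fraction(90, 90), 2))  # -> 0.18
```

In other words, roughly four fifths of the transmitted pixels fall outside such a viewport at any given moment, which is exactly what viewport-adaptive (e.g., tile-based) streaming tries to exploit.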

There are several standardization-related activities ongoing which I'd like to highlight in this blog post.

The VR Industry Forum (VR-IF) has been established with the aim "to further the widespread availability of high quality audiovisual VR experiences, for the benefit of consumers", comprising working groups related to requirements, guidelines, interoperability, communications, and liaison. The VR-IF has only just started, but it may find itself in a role similar to that of the DASH-IF for DASH. So far, VR-IF has published a lexicon of terms related to virtual reality (VR), augmented reality (AR), mixed reality (MR), and 360-degree video. Additionally, draft guidelines are available which cover all aspects of the distribution ecosystem, including compression, storage, and delivery, in order to ensure high-quality, comfortable consumer VR experiences.

QUALINET is a European network concerned with Quality of Experience (QoE) in multimedia systems and services. In terms of VR/360, it runs a task force on "Immersive Media Experiences (IMEx)" to which everyone is invited to contribute. QUALINET also coordinates standardization activities in this area and can help organize and conduct formal QoE assessments in various domains; for example, it conducted various experiments during the development of MPEG-H High Efficiency Video Coding (HEVC). It recently established a Joint Qualinet-VQEG team on Immersive Media (JQVIM) -- together with VQEG (see also below) -- and everyone is welcome to join (details can be found here).

JPEG started an initiative called Pleno, focusing on images. At the 76th JPEG meeting in Turin, Italy, responses to the call for proposals for JPEG Pleno light field image coding were evaluated using subjective and objective evaluation metrics, and a Generic JPEG Pleno Light Field Architecture was created. The JPEG committee defined three initial core experiments to be performed before the 77th JPEG meeting in Macau, China. Additionally, the JPEG XS requirements document references VR applications, and JPEG recently created an AhG on JPEG360 with the mandates to collect and define use cases for 360-degree image capture applications, develop requirements for such use cases, solicit industry engagement, collect evidence of existing solutions, and update the description of needed metadata.

In terms of MPEG, I've previously reported about MPEG-I as part of my MPEG report (also see above), which currently includes five parts. The first part will be a technical report describing the scope of this new standard and a set of use cases and applications from which actual requirements can be derived. Technical reports are usually publicly available for free. The second part specifies the omnidirectional media application format (OMAF), addressing the urgent need of the industry for a standard in this area. Part three will address immersive video and part four defines immersive audio. Finally, part five will contain a specification for point cloud compression, for which a call for proposals is currently available. OMAF is part of a first phase of standards related to immersive media and should finally become available by the end of 2017 or beginning of 2018, while the other parts are scheduled for a later stage, around 2020. The current OMAF committee draft comprises a specification of the i) equirectangular projection format (note that others might be added in the future), ii) metadata for interoperable rendering of 360-degree monoscopic and stereoscopic audio-visual data, iii) storage format adopting the ISO base media file format (ISOBMFF/mp4), and iv) the following codecs: MPEG-H High Efficiency Video Coding (HEVC) and MPEG-H 3D audio.
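As a rough illustration of what the equirectangular projection format does, the sketch below maps a spherical direction (longitude, latitude) to normalized picture coordinates. Note that this is only the textbook equirectangular mapping; the exact sample positions, axis conventions, and signaling are defined by the OMAF specification itself, so treat the formulas as an assumption for illustration:

```python
import math

def equirect_uv(lon_rad: float, lat_rad: float) -> tuple:
    """Map longitude in [-pi, pi] and latitude in [-pi/2, pi/2] to
    normalized (u, v) coordinates in [0, 1] on an equirectangular frame."""
    u = 0.5 + lon_rad / (2 * math.pi)   # left-right position on the frame
    v = 0.5 - lat_rad / math.pi         # v = 0 at the north pole
    return (u, v)

print(equirect_uv(0.0, 0.0))  # looking straight ahead -> (0.5, 0.5)
```

Multiplying (u, v) by the frame's width and height yields the pixel the renderer samples for a given viewing direction, which is why the projection format must be signaled for interoperable rendering.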

The Spatial Relationship Descriptor (SRD) of the MPEG-DASH standard provides means to describe how the media content is organized in the spatial domain. In particular, the SRD is fully integrated in the media presentation description (MPD) of MPEG-DASH and is used to describe a grid of rectangular tiles, which allows a client implementation to request only a given region of interest -- typically associated with a contiguous set of tiles. Interestingly, the SRD was developed before OMAF, and how SRD is used with OMAF is currently subject to standardization.
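For illustration, the SRD is carried as a property in the MPD whose comma-separated value string places a tile within the overall grid (source id, tile x/y position and size, and optionally the total dimensions). The hypothetical client-side sketch below -- the field order reflects my reading of the SRD scheme and should be double-checked against the standard -- parses such a value and tests whether a tile overlaps the current region of interest:

```python
from dataclasses import dataclass

@dataclass
class SRD:
    source_id: int
    x: int
    y: int
    w: int
    h: int
    total_w: int
    total_h: int

def parse_srd(value: str) -> SRD:
    # Value string as carried in a SupplementalProperty/EssentialProperty
    # with schemeIdUri "urn:mpeg:dash:srd:2014" (assumed field order).
    return SRD(*(int(f) for f in value.split(",")))

def overlaps(tile: SRD, rx: int, ry: int, rw: int, rh: int) -> bool:
    # Axis-aligned rectangle intersection: does the tile intersect the
    # region of interest (rx, ry, rw, rh) on the same coordinate grid?
    return not (tile.x + tile.w <= rx or rx + rw <= tile.x or
                tile.y + tile.h <= ry or ry + rh <= tile.y)

tile = parse_srd("1, 1920, 0, 1920, 1080, 5760, 2880")
print(overlaps(tile, 1000, 0, 1920, 1080))  # -> True
```

A tile-based client would run such a test per tile and request only the Representations whose tiles intersect the predicted viewport, ideally fetching the rest at a lower quality.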

MPEG established an AhG on Immersive Media Quality Evaluation with the goal of documenting requirements for VR QoE, collecting test material, studying existing methods for QoE assessment, and developing a test methodology. The current mandates of this AhG are to (i) review and document existing methods to assess human perception of and reaction to immersive media stimuli, (ii) develop immersive media quality metrics and investigate their measurability in immersive media services, and (iii) develop guidelines for evaluating the quality of experience of immersive media services.

3GPP is working on a technical report on Virtual Reality (VR) media services over 3GPP, which provides an introduction to VR, various use cases, media formats, interface aspects, and -- finally -- latency and synchronization aspects. Version 2.0 of this document is already available for download and also covers audio/video quality evaluation as well as a gap analysis, recommended objectives, and candidate solutions for VR use cases. Specifically, the audio/video quality evaluation is very much within the scope of JQVIM (of QUALINET and VQEG) and the MPEG AhG on immersive media quality evaluation. Additionally, 3GPP has started the following new study and work items: (i) 3GPP_VRStream: Virtual Reality Profiles for Streaming Media S4-170751, (ii) IVAS_Codec: EVS Codec Extension for Immersive Voice and Audio Services S4-170745, (iii) LiQuIMas: Test Methodologies for the Evaluation of Perceived Listening Quality in Immersive Audio Systems S4-170746, (iv) FS_QoE_VR: Study Item on QoE metrics for VR S4-170724, and (v) FS_CODVRA: Study on 3GPP codecs for VR audio S4-170739.

IEEE has started IEEE P2048, specifically "P2048.2 Standard for Virtual Reality and Augmented Reality: Immersive Video Taxonomy and Quality Metrics" -- to define different categories and levels of immersive video -- and "P2048.3 Standard for Virtual Reality and Augmented Reality: Immersive Video File and Stream Formats" -- to define formats of immersive video files and streams, and the functions and interactions enabled by those formats -- but not much material is available right now. However, P2048.2 seems to be related to QUALINET, and P2048.3 could definitely benefit from what MPEG has done and is still doing (incl. also, e.g., MPEG-V). Additionally, there's IEEE P3333.3, which defines a standard for motion-sickness-reducing technology for HMD-based 3D content, addressing VR sickness caused by the visual mechanisms of such content through the study of i) visual response to focal distortion, ii) visual response to lens materials, iii) visual response to lens refraction ratio, and iv) visual response to frame rate.

The ITU-T started a new work program referred to as "G.QoE-VR" after successfully finalizing P.NATS, which is now called P.1203. However, there are no details about "G.QoE-VR" publicly available yet; I just found this here. According to @slhck, G.QoE-VR will generally focus on HMD-based VR streaming, the investigation of subjective test methodologies and, later, instrumental QoE models. This is also confirmed here by the expected deliverables of this study group, namely recommendations on QoE factors and requirements for VR, subjective test methodologies for assessing VR quality, and objective quality estimation model(s) for VR services. In this context, it's worth mentioning the Video Quality Experts Group (VQEG), which has an Immersive Media Group (IMG) with the mission of "quality assessment of immersive media, including virtual reality, augmented reality, stereoscopic 3DTV, multiview". IMG is also involved in JQVIM, introduced above.

Finally, the Khronos Group announced a VR standards initiative which resulted in OpenXR (Cross-Platform, Portable, Virtual Reality), defining an API for VR and AR applications. It, too, could benefit from MPEG standards in terms of codecs, file formats, and streaming formats. In this context, WebVR already defines an API which provides support for accessing virtual reality devices, including sensors and head-mounted displays, on the web.

DVB started a CM Study Mission Group on Virtual Reality, which released an executive summary comprising mission statements of individuals/companies. The topic has also been discussed at DVB World. The group has since been promoted to an official group (CM-VR) whose goal is to deliver commercial requirements to the relevant DVB technical module (TM) groups, which will then work on developing technical specifications targeting the delivery of VR content over DVB networks.

Most of these standards activities are currently in their infancy but are definitely worth following. If you think I missed something, please let me know and I'll be happy to include it / update this blog post.