Metadata captures multimedia diversity
By Eric Rehm, Chief Technology Officer, Singingfish Inc., Seattle, EE Times
November 12, 2001 (1:06 p.m. EST)
URL: http://www.eetimes.com/story/OEG20011112S0050
In 1974, Chicago writer, poet and pianist Gil Scott-Heron proclaimed, "The Revolution Will Not Be Televised." In 2001, we sadly know better.
But how will the essence (audio, video, images, animations) be accompanied over all of the transports, such as terrestrial, cable, satellite, Internet and wireless? And exactly how do we attach adequate descriptions: prose, verse, keywords, key frames, audiovisual summaries, semantic concepts, color histograms, shapes and recognized speech? We will need to find this essence efficiently in our burgeoning digital asset-management systems, search indexes, electronic program guides, topic maps and browsable directories.
The answer is that at every point in the content chain that "televises the revolution," essence will be accompanied by metadata (data about the data) that describes the essence at that point. This accompaniment will make sense amid the alphabet soup of audiovisual production tools, portal servers and client applications because they will all speak a new XML-based lingua franca: MPEG-7, the ISO/IEC 15938 Multimedia Content Description Interface, to be released this December.
The systems portion of the MPEG-7 specification (ISO/IEC 15938-1) addresses how metadata can be coded and transported efficiently in both textual and binary XML formats, and describes a terminal architecture employing an encoder and a decoder. While MPEG-7 descriptions can be simply exchanged as text or binary XML files, special attention has been paid to supporting XML in a dynamic environment where:
- Descriptions are in perpetual evolution;
- Fragments of a transmitted description can be received out-of-order;
- Descriptions can be synchronized with the essence;
- Updated or changed information can be sent on demand, for example based on a user request or a user profile known only to the receiver;
- Partial updates (add, delete, replace) can be applied, or the entire description can be reset, as the sketch after this list illustrates.
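To make the partial-update model concrete, here is a minimal Python sketch that applies add, replace and delete operations to an in-memory XML description tree. The element names are invented for illustration; a real description would use the ISO/IEC 15938 schema types.

    import xml.etree.ElementTree as ET

    # A hypothetical, simplified description; real MPEG-7 descriptions
    # use the ISO/IEC 15938 schema types, not these element names.
    doc = ET.fromstring(
        "<Description>"
        "<Title>Evening News</Title>"
        "<Keyword>politics</Keyword>"
        "</Description>"
    )

    # Add: append a new fragment to the description tree.
    doc.append(ET.fromstring("<Keyword>economy</Keyword>"))

    # Replace: update an existing node in place.
    doc.find("Title").text = "Evening News, Late Edition"

    # Delete: remove a fragment that is no longer wanted.
    doc.remove(doc.find("Keyword"))

    print(ET.tostring(doc, encoding="unicode"))

Resetting the entire description simply amounts to discarding the current tree and starting over from a newly received initial description.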
MPEG-7 has employed other standards or harmonized with them wherever possible.
MPEG-7 uses XML Schema as the language of choice for content description. XML Schema is a means for defining the structure, content and semantics of XML documents and includes the ability to express structure and content data types. As such, it's very much like using the object and data-type facilities of a modern programming language, such as C structs or C++/Java classes.
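As a rough illustration of that analogy, the sketch below restates a hypothetical XML Schema complex type as a Python class definition. The type is invented for this example and is not taken from the MPEG-7 schema.

    from dataclasses import dataclass, field
    from typing import List

    # Roughly what an XML Schema complexType such as
    #   <complexType name="SegmentType">
    #     <sequence>
    #       <element name="Title" type="string"/>
    #       <element name="Keyword" type="string" maxOccurs="unbounded"/>
    #     </sequence>
    #     <attribute name="start" type="duration"/>
    #   </complexType>
    # expresses, restated as a class with typed fields.
    @dataclass
    class Segment:
        title: str
        keywords: List[str] = field(default_factory=list)
        start: str = "PT0S"  # an ISO 8601 duration, like XML Schema's duration type

    seg = Segment(title="Opening shot", keywords=["news", "anchor"], start="PT5S")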
The adoption of XML Schema allows MPEG-7 applications to leverage a large body of existing tools, APIs and server technology built around World Wide Web Consortium-based XML standards. This fact ensures that the uptake of MPEG-7 will be rapid. Moreover, support for an XML textual format eases initial development and debugging, while a binary encoding ensures efficient transport.
MPEG-7 will be interoperable with other leading standards such as SMPTE Metadata Dictionary, Dublin Core, EBU P/Meta and TV-Anytime.
TV-Anytime is the first metadata application to use MPEG-7 descriptors and description schemes as part of an open specification designed to allow consumer electronics manufacturers, content creators, telcos, broadcasters and service providers to exploit high-volume digital storage in consumer platforms. Earlier this year, an Advanced Television Systems Committee (ATSC) request for proposal stated that "it is highly desirable that any ATSC standard for enhanced metadata support advanced EPG [electronic program guide] features." It should also "be harmonized with other standards efforts, such as MPEG-7 . . . [and] . . . TV-Anytime," the ATSC said. Finally, the MPEG-21 Multimedia Framework can use MPEG-7 to describe multimedia as part of a Digital Item Declaration.
Like other MPEG standards, MPEG-7 has a systems component that allows an MPEG-7 encoder and MPEG-7 decoder to interoperate over a variety of transports including MPEG-4 systems and MPEG-2 transport streams (private data, DSM-CC sections, MPEG-2 transport ancillary data such as ATSC PSIP).
This interoperation is supported in textual XML format or a compressed binary format. Using the 15938-1 binary format, an XML textual description can be compressed, partitioned, streamed and reconstructed at the terminal side. The reconstructed XML description will not be byte-for-byte equivalent to the original description, but it will conform to World Wide Web Consortium (W3C) Canonical XML, making it compatible with all XML tools.
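The practical consequence is that equivalence is checked at the canonical-XML level rather than the byte level. As a sketch of that check, using the W3C C14N support in Python's standard library (available since Python 3.8):

    from xml.etree.ElementTree import canonicalize

    # Two serializations that differ in attribute order and quoting
    # but describe the same XML infoset.
    original = '<Description id="d1" lang="en"><Title>News</Title></Description>'
    round_tripped = "<Description lang='en' id='d1'><Title>News</Title></Description>"

    # After a binary encode/decode round trip, byte equality is not
    # guaranteed, but canonical (C14N) equality is.
    assert canonicalize(original) == canonicalize(round_tripped)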
MPEG-7-based applications, such as TV-Anytime or an advanced ATSC electronic program guide, can transmit descriptions and, optionally, can incrementally update portions (or fragments) of an MPEG-7 description.
Four layers
There are four main layers of the MPEG-7 terminal architecture: the application, the systems layer, the delivery layer and the storage and/or transmission medium.
MPEG-7 standardizes only the systems layer. However, the specification does make specific assumptions about the delivery layer; for instance, for synchronized delivery, the delivery layer must provide a method of time-stamping the metadata access units.
The systems layer defines a decoder whose architecture is described here to provide an overview and to establish common terms of reference. A compliant decoder need not implement these architectural components literally; rather, it must implement the decoding process that the standard defines.
The major portions of the decoding process are as follows:
- Decoder initialization signals textual or binary mode and hints at the location of the MPEG-7 schema used for binary decoding. Decoder initialization information may be delivered separately from the description stream, through its own initialization extractor.
- The initial description is a constrained access unit that may be used to initialize the current description tree. The initial description is further updated over time by the access units that constitute the description stream. The current description tree must be valid with respect to the MPEG-7 XML Schema after the first access unit is received.
- The terminal processes a description stream only after decoder initialization.
- An access unit is composed of any number of fragment update units, each of which is extracted in sequence by the fragment update component extractor.
Each fragment update unit consists of:
- A fragment update command, specifying the type of update to be executed (add, replace or delete);
- A fragment update context, which points to a specific XML element in the current description tree where the fragment update command applies; and
- A fragment update payload, which provides the value of the fragment to be added or replaced. The payload is passed to the fragment update payload decoder, which decodes the textual or binary form to yield a description fragment.
Using the corresponding fragment update command and fragment update context, the description composer application (not standardized by MPEG-7) places the description fragment at the appropriate node of the current description tree at composition time, or it may apply application logic, for example to skip unwanted elements.
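A minimal sketch of that composition step follows. The command names and the XPath-style context used here are simplified stand-ins for the compact coded forms the standard actually defines, and the element names are hypothetical.

    import xml.etree.ElementTree as ET

    # Current description tree (hypothetical element names).
    tree = ET.fromstring(
        "<Description><Event><Title>Match</Title></Event></Description>"
    )

    def apply_update(root, command, context, payload=None):
        """Apply one decoded fragment update unit. 'context' is a
        simplified path to the parent node the command applies to."""
        node = root.find(context) if context else root
        if command == "add":
            node.append(ET.fromstring(payload))
        elif command == "replace":
            fragment = ET.fromstring(payload)
            node.remove(node.find(fragment.tag))
            node.append(fragment)
        elif command == "delete":
            node.remove(node.find(payload))  # here, payload names the target

    apply_update(tree, "add", "Event", "<Score>1-0</Score>")
    apply_update(tree, "replace", "Event", "<Title>Match, Second Half</Title>")
    apply_update(tree, "delete", "Event", "Score")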
A very simple terminal can handle a Web-based client-server application. The Web server, as an MPEG-7 encoder, composes a textual XML MPEG-7 document based on parameters in an HTTP request from a client. The HTTP response consists of decoder initialization containing an initial description, with a single fragment representing an entire description, and no subsequent access units.
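A sketch of such a server using Python's standard http.server module follows; the description body is a placeholder rather than a complete, schema-valid MPEG-7 document.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Placeholder textual description; a real encoder would compose it
    # from the request parameters against the MPEG-7 schema.
    DESCRIPTION = b"""<?xml version="1.0"?>
    <Mpeg7 xmlns="urn:mpeg:mpeg7:schema:2001">
      <!-- initial description: a single fragment, no further access units -->
    </Mpeg7>"""

    class DescriptionServer(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type", "text/xml")
            self.send_header("Content-Length", str(len(DESCRIPTION)))
            self.end_headers()
            self.wfile.write(DESCRIPTION)

    if __name__ == "__main__":
        HTTPServer(("localhost", 8080), DescriptionServer).serve_forever()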
MPEG-7, like other MPEG standards, is all about interoperation, providing ample opportunity for innovation and market differentiation among compliant products.
The chain of processes that pipes essence from its creation to its delivery to an end user is glued together with metadata. For example, a video logging application is really a metadata creation tool. Video key frame metadata (typically an image at an automatically detected shot boundary, represented by a single time code, perhaps with metadata about the type of shot boundary, such as pan, zoom or wipe) is extracted from the content by the video-logging application and made accessible to other applications via a proprietary interface. Any secondary application wishing to gain access to the key frame metadata must implement the proprietary interface to the proprietary key frame metadata encoding.
The alternative is to use MPEG-7 to represent the interfaces between essence tools and metadata tools, and between the metadata tools themselves. Applications are free to keep their proprietary internal representations; however, the applications' input/output interfaces, such as key frames, should be standardized using MPEG-7.
In light of the MPEG-7 terminal and the XML technology upon which MPEG-7 is based, implementing such I/O interfaces need be no more difficult than building the simplest of MPEG-7 codecs (a sketch follows the list below):
- Input decoding. Parse XML and transcode MPEG-7 description schemes into a proprietary internal metadata format.
- Output encoding. Transcode proprietary metadata formats into MPEG-7-compliant XML output.
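Here is a sketch of both directions for the key frame example; the element names, attributes and internal record format are invented for illustration.

    import xml.etree.ElementTree as ET
    from xml.etree.ElementTree import canonicalize

    def decode_keyframes(mpeg7_xml):
        """Input decoding: parse an MPEG-7-style description into the
        application's internal metadata format (here, plain dicts)."""
        root = ET.fromstring(mpeg7_xml)
        return [{"timecode": kf.get("timecode"), "transition": kf.get("transition")}
                for kf in root.iter("KeyFrame")]

    def encode_keyframes(keyframes):
        """Output encoding: transcode internal records back into XML."""
        root = ET.Element("Description")
        for kf in keyframes:
            ET.SubElement(root, "KeyFrame",
                          timecode=kf["timecode"], transition=kf["transition"])
        return ET.tostring(root, encoding="unicode")

    doc = '<Description><KeyFrame timecode="00:01:23:10" transition="wipe"/></Description>'
    # A round trip preserves the description up to canonical-XML equivalence.
    assert canonicalize(encode_keyframes(decode_keyframes(doc))) == canonicalize(doc)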
Which MPEG-7 description schemes should you use? Having finalized the MPEG-7 V1.0 specification, MPEG-7 participants are looking to group MPEG-7 tools into profiles and levels, as in other MPEG standards. An MPEG-7 profile will define a set of descriptors and description schemes that can be used in a certain interoperability context, such as a specific MPEG-7 terminal.
Upcoming MPEG-7 "awareness events" in the United States, Europe and Asia are the main activities of the MPEG-7 Industry Forum (MP7IF). Visit the MP7IF at www.mpeg-industry.com.