The Resource Description Framework (RDF) is certainly the
most ambitious of all the metadata efforts from the W3C Metadata Activity. It
became a W3C Recommendation on 22 February 1999. RDF provides a model and a
syntax for describing resources.
Resources are defined as anything that can be designated by a URI. RDF does not
specify a vocabulary for describing resources. Rather, it provides the means
for vocabulary authors to build up descriptions and facts about some topic of
interest. It was influenced by the W3C experience with PICS, but it attempts to
break out of the narrow model of PICS by providing a generalized model for
describing resources.
RDF is a model for defining statements about resources. Each
resource possesses one or more properties, each of which has a value. The model
provides a means of defining classes of resources and properties. These classes
are used to build statements which assert facts about the resource. RDF defines
a syntax for writing a schema for a resource. A schema is analogous to a DTD,
but is much more expressive. The schema uses the model defined for some
vocabulary to express the structure of a document in the vocabulary. The
statements in the model place constraints on the statements that can be made in
a document conforming to the schema.
RDF Model
The basic RDF model is built from three types of objects:
Resources: anything that can be named with a URI
Properties: a specific, meaningful attribute of a resource
Statements: a combination of a resource, a property of the resource, and the value of the property
Resources
Resources can be almost anything: a document, a collection
of documents, a site, even a specific portion of a document. This allows RDF to
describe almost anything that can be placed online.
Properties
Properties have well-defined meanings. This means that
constraints are placed on a property to define the types of resources to which
it can be applied, the range and types of values it can take on, and how it
relates to other properties. These constraints are a major reason why RDF is so
expressive: the constraints give meaning to the properties, and hence to the
resources they describe.
Statements
Statements are triples consisting of a subject resource, a predicate
property, and an object value. Objects can be literal values or resources,
making complex statements possible. Consider the natural language statement:
The topic of urn:this-book is designing distributed applications.
The subject resource is urn:this-book. The property is topic, and
the object is designing distributed applications.
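As a rough illustration, the statement above can be held in a three-field record. This is only a sketch in Python: the Statement type and the describe helper are names invented here, not part of RDF or of any RDF library.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One RDF-style statement (illustrative type, not an RDF API)."""
    subject: str    # the resource being described
    predicate: str  # the property of that resource
    obj: str        # a literal value or another resource


# The example statement from the text, split into its three parts.
stmt = Statement(
    subject="urn:this-book",
    predicate="topic",
    obj="designing distributed applications",
)


def describe(s: Statement) -> str:
    """Render a statement back into the natural language form."""
    return f"The {s.predicate} of {s.subject} is {s.obj}."
```

Because the object field may itself name a resource, one statement's object can serve as another statement's subject, which is how more complex descriptions are assembled.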
Strictly speaking, properties are a subtype of resources.
This is important from a theoretical perspective, but it is simpler for our
introductory purposes to think of them as entirely separate entities. Our common
sense view of them as separate items will make it easier to conceptualize the
RDF model.
One property defined in the basic RDF model is type. This gives RDF a way to assign types to
resources. Resources and properties use a class typing mechanism, so a given
class may be said to be a subtype of another class. The RDF namespace
has names for the class of resources and for the subClassOf
property. By successively defining new classes of resources and properties, a
vocabulary builder can develop RDF statements of arbitrary complexity and
meaning.
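One way to picture the typing mechanism is to treat type and subClassOf as ordinary statements and follow them. The sketch below assumes a tiny, hypothetical set of triples (the Book and Document classes are invented for illustration) and is not how any particular RDF processor works.

```python
# Hypothetical statements; the class names are illustrative only.
TRIPLES = {
    ("urn:this-book", "type", "Book"),
    ("Book", "subClassOf", "Document"),
    ("Document", "subClassOf", "Resource"),
}


def superclasses(cls, triples):
    """Return every class reachable from cls by following subClassOf."""
    found = set()
    pending = [cls]
    while pending:
        current = pending.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                pending.append(o)
    return found


def has_type(resource, cls, triples):
    """True if resource is typed as cls or as any subclass of cls."""
    direct = {o for s, p, o in triples if s == resource and p == "type"}
    return any(d == cls or cls in superclasses(d, triples) for d in direct)
```

Here has_type("urn:this-book", "Document", TRIPLES) holds even though the book is only declared to be a Book, because the subclass chain is followed.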
Constraints are a specialized type of property. They are
further refined in the range and domain of the property. Where typing gives us
specialized properties, constraints bound a property, thereby giving it definition
and meaning.
RDF also defines a variety of containers and collection
classes. As we have seen in the previous chapter, it is often necessary to
discuss collections of objects. RDF's container classes are much more
sophisticated than ours. They define a variety of ordering and containment
models.
An examination of RDF container classes is outside the scope of this book. The
full W3C RDF Recommendation can be found at http://www.w3.org/TR/REC-rdf-syntax/
RDF Schema
RDF would be of little more than theoretical value if it did
not include a format for transmitting data models. The creators of RDF chose to
define an XML vocabulary for this task. This vocabulary defines resources and
properties in a typed system similar to object-oriented languages like C++ and
Java.
The terminology of RDF can be overly theoretical in places. A few words on terminology for those of us
who are not set theorists are therefore in order. RDF is a model for talking about things. Those things we can discuss, use, or otherwise
refer to in an RDF schema are called resources. Both classes and properties are kinds of resources in the RDF
model. Each property has a range (the
set of values it can take on) and a domain (the class to which the
property applies).
Let's illustrate these concepts with a very simple RDF
schema. Suppose we wish to talk about our retail customers. For generality,
we'd like to say that retail customers are a specialized type of some customer
class. This is done with the following lines:
<rdfs:Class rdf:ID="Customer">
  <rdfs:comment>Generic class for describing customers</rdfs:comment>
  <rdfs:subClassOf rdf:resource="http://www.w3.org/TR/WD-rdf-schema#Resource"/>
</rdfs:Class>
<rdfs:Class rdf:ID="RetailCustomer">
  <rdfs:comment>Derived class for describing retail customers</rdfs:comment>
  <rdfs:subClassOf rdf:resource="#Customer"/>
</rdfs:Class>
The rdf and rdfs namespaces are part of the RDF proposal
and are declared elsewhere in our schema document. Our class named Customer is a subclass of the RDF-defined
class Resource. RetailCustomer, then, is a subclass of Customer. Now let's give
our customer a way to pay for his purchases. RetailCustomer
should have a property that will take on one of the names of a set of credit
cards. That is accomplished with this property definition:
Our property is named paymentType.
It takes on a value from the class CreditCards,
which we shall define shortly. The property's domain (the class to which it
can apply) is the class RetailCustomer. We know
that the values for this property will be a limited number of strings naming
the major credit card types. First we define a class of literals.
Perhaps we are interested in keeping track of who referred
this customer to us. This should be a property whose value is a resource of the
type Customer. This allows us to have any sort of
customer derived class as a value for this property. That way, we could have
referrals from RetailCustomer instances
or as-yet undefined WholesaleCustomer
instances without having to enumerate these specific derived classes.
Similarly, if we derive more classes from Customer,
the referrer can participate in these relationships without modifying the range
declaration.
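The effect of choosing Customer as the range can be modeled in a few lines of Python. This is only a sketch of the reasoning, not RDF schema syntax: the dictionaries and the function names is_a and statement_allowed are assumptions made here, and WholesaleCustomer is included hypothetically because the text mentions it as a class that might be defined later.

```python
# Class hierarchy from the running example. WholesaleCustomer is the
# as-yet undefined class mentioned in the text, added hypothetically.
SUBCLASS_OF = {
    "Customer": "Resource",
    "RetailCustomer": "Customer",
    "WholesaleCustomer": "Customer",
}

# A property declaration: the domain is the class the property applies to,
# the range is the class its values must belong to.
REFERRER = {"name": "referrer", "domain": "RetailCustomer", "range": "Customer"}


def is_a(cls, ancestor):
    """True if cls is ancestor or derives from it through SUBCLASS_OF."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def statement_allowed(prop, subject_class, value_class):
    """Check a subject and value against the property's domain and range."""
    return is_a(subject_class, prop["domain"]) and is_a(value_class, prop["range"])
```

A WholesaleCustomer value passes the range check without any change to the declaration, while a bare Resource value does not.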
Our property is called referrer;
it can be applied to the RetailCustomer class, and
its value must be a resource of the Customer
class. Since we have previously defined that class, no further specification is
necessary. Here's the full text of our simple RDF schema:
RDF, quite simply, is far too ambitious for our purposes.
Many of its assignments are nothing more than names. A complicated system of
mappings between names and resources is needed to discern meaning. More
advanced features, e.g., ranges, domains, and container classes, are needed to
communicate metadata regarding the topic under discussion. These features,
however, are a bit too much for the simple kinds of automated metadata
applications we are likely to support in the immediate future. If RDF can be
supported, it is a powerful mechanism for communicating intellectual models.
Our needs, however, are somewhat simpler.
Indeed, both XML and our development philosophy share the
belief that simple features that can be readily implemented are more useful
than complex features that can be implemented only with great difficulty. Given
some XML vocabulary, we'd like to be able to discover the proper structure for
a document that conforms to that vocabulary. This is far simpler to implement.
We really need a better way of encoding a DTD. This is what the remaining
proposals aim to achieve.
Meta Content Framework Using XML
The Meta Content Framework (MCF) is similar to RDF, although
it doesn't seem to have influenced quite so many later efforts as has RDF. Like
RDF (and, indeed, most of the metadata proposals), the MCF uses a directed
graph model of nodes and edges to build conceptual models. Objects are the
nodes and property values are the edges. An XML vocabulary is provided for
encoding MCF models. Subclassing and inheritance are permitted. Like RDF, a core
set of property and object types is used to describe more complicated types,
and so forth until the complete metadata model is described. An interesting
property of MCF is that its authors anticipated using MCF to define
componentized blocks of metadata. These blocks would then be combined through
the XML linking specification to compose complete metadata models. In this way,
MCF blocks found to be useful for particular problems could be reused by other
vocabulary authors working on related problems. The following illustration
shows a simple MCF schema for this book. The book object is derived from the
category (MCF's term for class) Book, which
in turn derives from Document. The book has
chapters (i.e., the book is the domain of the Chapter
category), which take their values from the category English_Prose.
That category is derived from the category text.
Note that typeof, domain, and range
are properties of their respective objects.
Here's the XML document that captures the information in the
illustration above:
<xml-mcf>
  <Category id="Designing_Distributed_Applications">
    <name>Designing_Distributed_Applications</name>
    <superType unit="Book"/>
    <description>The category whose sole member is this book</description>
  </Category>
  <Category id="Book">
    <name>Book</name>
    <superType unit="Document"/>
    <description>The notion of a bound book</description>
  </Category>
  <!-- The supertype, Page, is a category from MCF itself. -->
  <Category id="Document">
    <name>Document</name>
    <superType unit="Page"/>
    <description>A generalized document</description>
  </Category>
  <Category id="Chapter">
    <name>Chapter</name>
    <superType unit="Page"/>
    <description>The notion of an organized sequence of pages</description>
    <domain unit="Designing_Distributed_Applications"/>
    <range unit="English_Prose"/>
  </Category>
  <Category id="English_Prose">
    <name>English_Prose</name>
    <superType unit="text"/>
    <description>The notion of prose written in English</description>
  </Category>
  <Category id="text">
    <name>text</name>
    <superType unit="Page"/>
    <description>The notion of some organized natural language</description>
  </Category>
</xml-mcf>
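Because the MCF model is itself XML, an ordinary parser can recover the derivation chain from it. The Python sketch below uses the standard library's ElementTree on an abbreviated copy of the listing above; the function name supertype_chain is invented here, and the code assumes each category names at most one superType and that the chain contains no cycles.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the MCF document above: ids and supertypes only.
MCF = """
<xml-mcf>
  <Category id="Designing_Distributed_Applications"><superType unit="Book"/></Category>
  <Category id="Book"><superType unit="Document"/></Category>
  <Category id="Document"><superType unit="Page"/></Category>
</xml-mcf>
"""


def supertype_chain(xml_text, start):
    """Follow superType links from start until a category with no entry."""
    root = ET.fromstring(xml_text.strip())
    parents = {}
    for category in root.findall("Category"):
        sup = category.find("superType")
        if sup is not None:
            parents[category.get("id")] = sup.get("unit")
    chain = [start]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    return chain
```

Starting from Designing_Distributed_Applications, the chain runs through Book and Document and stops at Page, which is not defined in the document because it comes from MCF itself.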
XML Data is an ambitious proposal for the definition of
schemas. Like RDF, it can express both conceptual and syntactic models. To
clarify, a DTD is an example of a syntactic model, since it specifies the allowable
syntax of some vocabulary, whereas a relational database schema is a conceptual
model, as it describes things and the relations between things in the model.
XML Data also uses an XML vocabulary as its documentation format. It can
express all the information of a conventional XML DTD, but it adds strong
typing of elements and attributes. In addition, constraints may be placed on
the value and use of an element. XML Data also supports inheritance of types,
which allows us to conveniently extend existing definitions. Further aiding
authors of schemas is the ability to use a defined element type as a complex
structure. Hence, our RetailCustomer from the
RDF discussion may be used as a basic type in later schemas.
Unlike a DTD, an XML Data schema allows you to declare a
model open. In an open model, the syntactic rules
laid down in the schema do not preclude the inclusion of content not covered in
the schema. This might be useful in cases when we wish to precisely define some
content but are indifferent to other content that might be added to documents.
If the model is declared closed, an XML Data schema specifies content in the
same formal manner as a DTD. In that case, all content must be explicitly
described in the schema to be permitted in a document conforming to the model.
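The difference between the two kinds of model can be sketched as a small check over an element's children. This Python fragment is an illustration of the idea only, not XML Data syntax or the behavior of any validator; check_content and the element names in the usage note are hypothetical.

```python
def check_content(children, declared, model):
    """Return a list of problems found in one element's child names.

    declared: child element names the schema describes (all treated as
    required in this simplified sketch).
    model: "open" tolerates undeclared children, "closed" rejects them.
    """
    if model not in ("open", "closed"):
        raise ValueError(f"unknown content model: {model}")
    problems = [f"missing {name}" for name in declared if name not in children]
    if model == "closed":
        problems += [f"unexpected {name}" for name in children if name not in declared]
    return problems
```

With declared content of NAME alone, children NAME and NICKNAME pass under an open model but yield "unexpected NICKNAME" under a closed one. In both cases the declared content must still be present.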
In order to embrace conceptual models such as relational database schemas, XML
Data introduces relations, a concept in which an element acts as a reference to
another. This is like the notion of primary and foreign keys in a database; an
element contained in one item of content establishes a relationship with
another item of content. The element in question is a key or index into the
other content. Aliases are also permitted. This allows us to establish subtle
concepts. An element can have an alias, or correlative in XML Data's terminology,
which establishes the context of a relationship. For example, we might have a STUDIED
element with the correlative STUDENT. This establishes that STUDIED
is an alias for STUDENT, in the context of the student's
relation to the topic she studies.
We will not discuss XML Data and the related proposal that
follows, XML Document Content Description, in great depth because a partial
implementation is included with the version of MSXML that ships with Internet
Explorer 5.0. This partial implementation, intended as a technology preview, is
termed XML Schema. We will discuss its implementation at length and develop
some prototype code using it later in this chapter.
The XML Document Content Description (DCD) proposal is an
attempt to extract the subset of XML Data's features that permit the encoding
of a DTD in XML. It is thus a simplification of XML Data that addresses a
pressing need in a valuable way. Its authors modified the syntax of XML Data so
that DCD would be more closely aligned with RDF.
DCD also offers a few features that cannot be expressed in
an XML 1.0 DTD. The first, and perhaps most important to the exchange of
business data using XML, is the ability to specify the data type of elements
and attributes. One criticism of XML is that it expresses all values as text,
leaving the native data type in question. DCD identifies a host of native types
drawn from common programming languages as well as the core tokenized types
defined in XML 1.0.
DCD explored two additional features in appendices to the
main submission. The first is the ability to nest element type definitions
within other definitions in order to declare an element type with scope local
to the containing element type definition. The second, of somewhat broader use,
is the inheritance and subclassing mechanism. This borrows a powerful technique
from the world of object-oriented programming. Element and attribute type
definitions can be extensions of simpler type definitions. When a type
definition includes the keyword element <Extends Type="some_type_definition"/>,
it inherits all the elements and properties previously defined for the class some_type_definition.
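The inheritance rule amounts to collecting the base definition's content before the derived definition's own. The Python sketch below models that with plain dictionaries; it is not DCD syntax, and the type names, element names, and the effective_elements function are all hypothetical. It assumes single inheritance with no cycles.

```python
# Hypothetical type definitions. "extends" plays the role of
# <Extends Type="..."/>; "elements" lists what the type itself declares.
TYPE_DEFS = {
    "Customer": {"extends": None, "elements": ["Name", "Address"]},
    "RetailCustomer": {"extends": "Customer", "elements": ["PaymentType"]},
}


def effective_elements(type_name, defs):
    """Return inherited elements first, followed by the type's own."""
    definition = defs[type_name]
    base = definition["extends"]
    inherited = effective_elements(base, defs) if base else []
    return inherited + definition["elements"]
```

The derived RetailCustomer type thus ends up with Name and Address from its base as well as its own PaymentType.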
Internet Explorer 5.0 supports metadata in several ways.
First, it uses the current draft of the namespaces specification. Second, it
uses namespaces to provide an approach to typing of elements. This is coupled
with Microsoft's extensions to the DOM so that a program can retrieve the value
of an element in either text (i.e., as it appears in the document) or native
binary data format (e.g., int, float). Finally, it offers a technology preview
termed XML Schema. This is based on the XML Data proposal, but only supports
the feature subset that is also part of the XML DCD proposal. These features
may be used to explore the metadata in XML and suggest ways we could use it in
our applications.
The various metadata efforts seen in this chapter cover a
spectrum from the highly ambitious to the narrowly focused. Each minimally
gives us a way to capture the same metadata about a vocabulary that a DTD
expresses. Each goes further, however, adding more expressive techniques for
describing data. That is what is interesting to us in terms of the third
principle of developing cooperative network applications:
3. Services shall be provided as self-describing data.
The more descriptive our data can be, the better. An automated consumer of service data such
as an agent may encounter an unfamiliar vocabulary. Unlike a human consumer, the robot needs a great deal of help in
exploring the data. When the thicket of metadata efforts is cleared, service programmers will have a very powerful tool
for providing that help. Since these efforts use XML for their own syntax, we have the added benefit of being able
to reuse MSXML and other XML parsers with which we may be familiar.
Defining Datatypes in XML
There are many occasions when the textual contents of an
element represent a typed value other than text in the domain we are
describing. This is most obvious in the case of numeric values. The integer
1234 requires two bytes of storage in its native form on a PC. In XML's default
character encoding, it consumes four bytes. Worse, before we can use it in calculations,
we must perform a conversion from the string to the numeric form. Beyond the
issues of storage and conversion, if we simply use unadorned text the type of
data is implicit knowledge. If we use the data type namespace, however, we can
make the type explicit. This might be useful to us if we wanted to examine a
document in an unknown format. For example, a graphing component might search a
document for collections of numeric types. If found, these could be presented
to the user for selection of what data to put in a graph. Use of the data type
namespace also allows us to manipulate data in native form. For example, if I
have this element:
<VELOCITY dt:type="r8">1.5E5</VELOCITY>
I can retrieve it as either the string 1.5E5 or as an eight-byte floating point
numeric value. The DOM extensions to support this consist of two properties of
the Node class:
Property          Description
nodeTypedValue    read-write; typed value of the node
dataType          read-write; the type of the node
Twenty-five types frequently encountered in programming
languages are supported. Additionally, the XML 1.0 recommendation defines ten
enumerated or tokenized types and these are supported as well. The definitive
list of types supported is found at http://www.microsoft.com/workshop/xml/schema/reference/datatypes.asp.
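The idea behind a typed value can be imitated outside MSXML. The Python sketch below is not the MSXML DOM: it reads the dt:type attribute with the standard library's ElementTree and converts the text accordingly. The namespace URI and the handful of type names in the table are assumptions about the data type namespace, and the typed_value function is a name invented here to mirror the nodeTypedValue property.

```python
import xml.etree.ElementTree as ET

# Assumed URI for the data type namespace bound to the dt prefix.
DT_NS = "urn:schemas-microsoft-com:datatypes"

# A small, assumed subset of type names mapped to Python converters.
CONVERTERS = {"r8": float, "r4": float, "int": int, "i4": int, "string": str}


def typed_value(element):
    """Convert element text using its dt:type, defaulting to string.

    Raises KeyError for a type name missing from CONVERTERS.
    """
    dt = element.get(f"{{{DT_NS}}}type", "string")
    return CONVERTERS[dt](element.text)


doc = ET.fromstring(
    '<VELOCITY xmlns:dt="urn:schemas-microsoft-com:datatypes" '
    'dt:type="r8">1.5E5</VELOCITY>'
)
```

Here doc.text is still the string 1.5E5, while typed_value(doc) gives a floating point number that can be used directly in calculations.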