
<html xmlns="http://www.w3.org/1999/xhtml" 
   xml:lang="en" >
<head>
<title>RDF in HTML: Approaches</title>
<link rel="stylesheet" type="text/css" href="style.css" />
<style type="text/css">
<!--
body { margin: 1em; font-family: Georgia, sans-serif; }
h1 { font-family: Tahoma; }
h2, h3, h4, h5, h6 { font-family: Arial, sans-serif; }
pre, textarea { margin-left: 1em; border: 2px solid #c0c0e0; 
   background-color: #fafdff; /* font-size: 0.7em; */ padding: 0.5em; }
-->
</style>
</head>
<body>
<h1>RDF in HTML: Approaches</h1>

<p>Since there is no one standardized approach for associating RDF compatible 
metadata with HTML, and since this is one of the most frequently asked questions 
on the RDF mailing lists, this document is provided as an outline of some 
RDF-in-HTML approaches that the author is aware of.</p>

<h2 id="toc">Table Of Contents</h2>

<ul>
<li><a href="#introduction">Introduction</a></li>
<li><a href="#approaches">The Approaches</a> 
(<a href="#approachestoc">approaches TOC</a>)</li>
<li><a href="#embeddingissues">Issues With Embedding</a></li>
<li><a href="#conclusion">Conclusion: Which Approach Is Best?</a></li>
<li><a href="#furtherreading">Further Reading</a></li>
</ul>

<p>Please direct feedback to <a 
href="mailto:sean&#64;mysterylights.com?cc=www-archive&#64;w3.org">the 
author</a>, preferably CCing the publicly archived <a 
href="http://lists.w3.org/Archives/Public/www-archive/">www-archive</a>.</p>

<h2 id="introduction">Introduction</h2>

<p>Ever since RDF's inception, people have been wanting to embed it in their 
HTML documents. In fact, ever since HTML was invented, people have been wanting 
to embed some sort of metadata for extraction and processing by user agents and 
crawlers. So, theoretically, HTML and RDF are a match made in heaven (a.k.a. the 
halls of the W3C's offices at MIT).</p>

<p>However, after many raging discussions within the W3C's RDF Interest Group 
and elsewhere, there is still no one standard method for associating RDF with 
HTML. This is an important thing for the Semantic Web community to resolve: 
even the author has quite recently found himself wanting to associate RDF with 
HTML for certain applications, but has had to put aside the application due to 
the lack of a standard approach.</p>

<p>The original <a href="http://www.w3.org/RDF/FAQ">RDF FAQ</a> contained 
a <a href="http://www.w3.org/RDF/FAQ#How" 
title="3. How do I put some RDF into my HTML pages?">piece of advice</a> 
telling people to simply embed the XML RDF into the XHTML (cf. <a 
href="#embedNoValidate">Embed XML RDF Part I</a>), but this was criticized 
since the approach means that the resultant XHTML/RDF soup does not validate. 
This issue has been noted by the RDF Core Working Group (as <a 
href="http://www.w3.org/2000/03/rdf-tracking/#faq-html-compliance">faq-
html-compliance</a>), and is currently "for discussion". My hope is that this 
document will be valuable input into the issue.</p>

<p>All of the approaches given in this note are suited to particular 
applications. What applications are there? In general, anything that combines 
human and machine readable data is fair game; for example: RDF Schemata/namespace 
documents, page accessibility evaluations, complex relationships between the 
document and related resources (for example, one could generate an SVG diagram 
showing how this document fits into the rest of the world), links to digital 
signatures, and possibly even advanced versioning data (cf. CVS). Only the first 
of the list is extant, to the best of the author's knowledge.</p>

<h2 id="approaches">The Approaches</h2>

<ul id="approachestoc">
<li><a href="#embedNoValidate">Embed XML RDF Part I: Eschew Validation</a></li>
<li><a href="#embedAndValidate">Embed XML RDF Part II: Embrace Validation</a>
</li>
<li><a href="#objectOrScript">Utilize the Object or Script Elements</a></li>
<li><a href="#link"><code>&lt;link></code> to the Metadata</a></li>
<li><a href="#hyperrdf">HyperRDF</a></li>
<li><a href="#augmeta">Augmented Metadata for XHTML</a></li>
<li><a href="#profile">Use the Profile Attribute</a></li>
<li><a href="#notations">Making use of XML Notations</a></li>
</ul>

<p>In no particular order...</p>

<h3 id="embedNoValidate"><a title="Anchor for this section" 
href="http://infomesh.net/2002/rdfinhtml/#embedNoValidate">>></a> 
Embed XML RDF Part I: Eschew Validation</h3>

<p>In the "validator.w3.org be damned" approach, one would generally use the 
abbreviated XML RDF syntax so as to hide the contents from older browsers (which 
usually render the contents of any element, but not attribute values).</p>

<pre>
&lt;head>
&lt;title>Some Page&lt;/title>
&lt;rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
&lt;rdf:Description rdf:about="http://www.w3.org/" dc:title="W3C Homepage"/>
&lt;/rdf:RDF>
&lt;/head>
</pre>

<p>This approach is convenient for authors that know how to author XML RDF (or 
have had some generated for them). It's quite easy for agents to extract, too. 
However, it also has many disadvantages: it does not validate, may still choke 
some older browsers, and the fragment identifiers may conflict. As the <a 
href="http://www.w3.org/2001/tag/">TAG</a> have put it:-</p>

<blockquote>
<div>[...] despite widely adopted specifications for XHTML and RDF, there is 
no specification for the interpretation of the mixture. The TAG felt that this 
lack, falling between the scopes of two working groups, was within its scope 
to fill or ask to be filled. [...] A futher problem is that the question of 
how to define the meaning of a URIref with fragement id wihtin such a 
document.</div>
<div>- <cite><a href="http://www.w3.org/2002/04/htmlrdf">Embedding HTML in 
RDF</a></cite>, TimBL for TAG, 2002</div>
</blockquote>

<p>Sidenote: Murray Altheim wrote an excellent <a 
href="http://lists.w3.org/Archives/Public/www-rdf-interest/2001Apr/0223" 
title="Re: RDF in XHTML [...]">summary of why validation is important</a>. 
Also: <a href="http://lists.w3.org/Archives/Public/www-validator/2001Sep/0126" 
title="Why Validate?">Nick Kew's essay</a> on the subject.</p>
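
<p>As a rough illustration of why extraction is considered easy, the following 
Python sketch pulls every <code>rdf:RDF</code> element out of a well-formed 
XHTML document using only the standard library. The function name 
<code>extract_rdf_blocks</code> is hypothetical, and the sketch assumes that the 
page parses as XML; tag-soup HTML would need a more forgiving parser.</p>

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def extract_rdf_blocks(xhtml_text):
    """Return every embedded rdf:RDF element found in a well-formed XHTML string."""
    root = ET.fromstring(xhtml_text)
    # iter() walks the whole tree, so RDF placed in the head or the body is found alike.
    return list(root.iter("{%s}RDF" % RDF_NS))
```

<p>Each returned element could then be handed to an RDF parser. Nothing in this 
sketch addresses validation or the fragment identifier question; it only shows 
that locating the embedded block is a small amount of work.</p>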

<h3 id="embedAndValidate"><a title="Anchor for this section" 
href="http://infomesh.net/2002/rdfinhtml/#embedAndValidate">>></a> 
Embed XML RDF Part II: Embrace Validation</h3>

<p>This "create a new XHTML family" approach basically involves hacking up a 
small DTD (document type definition) using <a 
href="http://www.w3.org/TR/xhtml-modularization/">XHTML Modularization</a> for 
a variant of XHTML, putting it on the Web, and then referencing it from your 
document. The main drawback is that the DTDs are large and relatively complex; 
this is not a viable approach for typical HTML authors.</p>

<textarea rows="8" cols="80">
&lt;!ENTITY % XHTML-datatypes.mod
         PUBLIC "-//W3C//ENTITIES XHTML Datatypes 1.0//EN"
         "http://www.w3.org/TR/xhtml-modularization/DTD/xhtml-datatypes-1.mod" >
%XHTML-datatypes.mod;

&lt;!-- Prefix Junk -->
&lt;!ENTITY % RDF.NS.prefixed "INCLUDE" >
&lt;!ENTITY % RDF.prefixed "%RDF.NS.prefixed;" >
&lt;!ENTITY % RDF.xmlns "http://www.w3.org/1999/02/22-rdf-syntax-ns#" >
&lt;!ENTITY % RDF.prefix "rdf" >

&lt;![%RDF.prefixed;[
  &lt;!ENTITY % RDF.pfx  "%RDF.prefix;:" >
]]&gt;
&lt;!ENTITY % RDF.pfx  "" >

&lt;!ENTITY % RDF.xmlns.extra.attrib "" >

&lt;![%RDF.prefixed;[
&lt;!ENTITY % RDF.xmlns.attrib
   "xmlns:%RDF.prefix;  %URI.datatype;  #FIXED '%RDF.xmlns;'
   %RDF.xmlns.extra.attrib;"
>
]]&gt;
&lt;!ENTITY % RDF.xmlns.attrib
   "xmlns   %URI.datatype;  #FIXED '%RDF.xmlns;'
    %RDF.xmlns.extra.attrib;"
>

&lt;![%RDF.prefixed;[
&lt;!ENTITY % XHTML.xmlns.extra.attrib
    "%RDF.xmlns.attrib;" >
]]&gt;

&lt;!-- Hack to cover a bug in modularization -->
&lt;!ENTITY % XLINK.xmlns.attrib "%RDF.xmlns.attrib;" >

&lt;!-- Now add the comment element -->
&lt;!ENTITY % RDF.RDF.qname  "%RDF.pfx;RDF" >
&lt;!ENTITY % Misc.extra "| %RDF.RDF.qname;" >
&lt;!ENTITY % RDF.Property.qname  "%RDF.pfx;Property" >

&lt;!-- Bring in the XHTML 1.1 DTD -->
&lt;!ENTITY % xhtml11.dtd
     PUBLIC "-//W3C//DTD XHTML 1.1//EN"
            "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" >
%xhtml11.dtd;

&lt;!ENTITY % RDF.RDF.content ANY >
     "( %RDF.Property.qname; )" 
&lt;
&lt;!ELEMENT %RDF.RDF.qname; %RDF.RDF.content; >
&lt;!ELEMENT %RDF.Property.qname; ANY >

&lt;!ENTITY % RDF.about.attribute "%RDF.pfx;about" >
&lt;!ATTLIST %RDF.Property.qname; 
   %RDF.about.attribute;   %URI.datatype;   #IMPLIED
   %RDF.xmlns.attrib; 
>
</textarea>

<pre>
&lt;!DOCTYPE html SYSTEM "http://infomesh.net/2002/m12n/test/rdf.txt" >

&lt;html xmlns="http://www.w3.org/1999/xhtml" 
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
   xml:lang="en" >
&lt;head>
&lt;title>Embedded RDF Test&lt;/title>
&lt;rdf:RDF>
&lt;rdf:Property rdf:about="http://purl.org/net/swn#homepage">
&lt;/rdf:Property>
&lt;/rdf:RDF>
&lt;/head>
</pre>

<p>XHTML Modularization is essentially oriented towards companies and skilled 
Web users that want to provide regular extensions to XHTML. It is not so good 
when it comes to unique extensions that need to be created on a whim.</p>

<p>This method has the same "what is the meaning of the fragment identifiers 
within such a document?" issue as embed-and-don't-validate.</p>

<h3 id="objectOrScript"><a title="Anchor for this section" 
href="http://infomesh.net/2002/rdfinhtml/#objectOrScript">>></a> 
Utilize the Object or Script Elements</h3>


<p>HTML has two elements for including non-HTML media: <code>&lt;object></code> 
and <code>&lt;script></code>. <code>&lt;object></code> is a generic element for 
including any external object, whereas <code>&lt;script></code> is available for 
embedding executable scripts.</p>

<h4 id="object">&lt;object&gt;</h4>

<p>The HTML 4.01 specification <a 
href="http://www.w3.org/TR/html401/struct/objects#edef-OBJECT">says</a> that 
inline data may be supplied from a base64 encoded "data:" URI. For example:-</p>

<pre>
&lt;head>
&lt;title>My Document&lt;/title>
&lt;object data="data:application/rdf+xml;base64,PHJkZjpSREYgeG1sbnM6cmRmPSJodHR
wOi8vd3d3LnczLm9yZy8xOTk5LzAyLzIyLXJkZi1zeW50YXgtbnMjIgogICAgICAgICAgICB4bWxu
czpkYz0iaHR0cDovL3B1cmwub3JnL2RjL2VsZW1lbnRzLzEuMS8iPgogIDxyZGY6RGVzY3JpcHRpb
24gcmRmOmFib3V0PSJodHRwOi8vd3d3LnczLm9yZy8iPgogICAgPGRjOnRpdGxlPldvcmxkIFdpZG
UgV2ViIENvbnNvcnRpdW08L2RjOnRpdGxlPiAKICA8L3JkZjpEZXNjcmlwdGlvbj4KPC9yZGY6UkR
GPg==">&lt;/object>
&lt;/head>
</pre>

<blockquote>
<div>congrats, you've found a syntax less workable thatn RDF/XML.</div>
<div><cite>- Edd Dumbill, 24 seconds after this method was proposed on 
#rdfig.</cite></div>
</blockquote>
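
<p>For a consuming agent, recovering the RDF from such an attribute is mostly a 
matter of splitting the "data:" URI and decoding its payload. The sketch below 
is one possible way to do that with the Python standard library; the function 
name <code>decode_data_uri</code> is hypothetical, and the handling of media 
type parameters is deliberately minimal.</p>

```python
import base64
from urllib.parse import unquote_to_bytes


def decode_data_uri(uri):
    """Split a data: URI into (media_type, payload_bytes)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    header, comma, payload = uri[len("data:"):].partition(",")
    if not comma:
        raise ValueError("data: URI has no comma separator")
    if header.endswith(";base64"):
        media_type = header[:-len(";base64")]
        # Line breaks introduced by wrapping the attribute value are not payload.
        data = base64.b64decode("".join(payload.split()))
    else:
        media_type = header
        data = unquote_to_bytes(payload)
    # An omitted media type defaults to plain US-ASCII text.
    return media_type or "text/plain;charset=US-ASCII", data
```

<p>An agent would check that the returned media type is 
<code>application/rdf+xml</code> before passing the bytes on to an RDF 
parser.</p>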

<p>Of course, one can also link to the RDF in an external file, although we 
shall be discussing using the &lt;link> element for this a little later. Note 
that object allows one to cascade the referenced media, thereby offering a 
provision for alternate serializations: perhaps offering XML RDF, Notation3, 
and NTriples versions of your RDF metadata.</p>

<h4 id="script">&lt;script&gt;</h4>

<p>On the other hand, we have the script element with which to wrap some 
embedded XML RDF. For example:-</p>

<pre>
&lt;head>
&lt;title>My Document&lt;/title>
&lt;script type="application/rdf+xml">
&lt;rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
&lt;rdf:Description rdf:about="http://www.w3.org/" dc:title="W3C Homepage"/>
&lt;/rdf:RDF>
&lt;/script>
&lt;/head>
</pre>

<p>All that the HTML 4.01 specification says about the contents of the 
script element is that:-</p>

<blockquote>
<div>Scripts are evaluated by script engines that must be known to a user agent. 
[...] The syntax of script data depends on the scripting language.</div>
<div>- <cite><a 
href="http://www.w3.org/TR/html401/interact/scripts#edef-SCRIPT">HTML 4.01: 
Definition of the script Element</a></cite></div>
</blockquote>

<p>This is suspiciously vague. Moreover, whilst I do not want to get engaged 
in an argument over the semantics of programming languages, I will note that 
people (possibly including Sandro Hawke; the attribution is unconfirmed) have 
estimated that the Notation3 superset of RDF has as much power as Prolog, a 
well-known, highly declarative programming language.</p>

<p>Since using the script element in this way is very similar to embedding 
the information (it's just embedding + giving the media type), one would 
presume that it has the same fragment-conflict problem looming over it.</p>

<h3 id="link"><a title="Anchor for this section" 
href="http://infomesh.net/2002/rdfinhtml/#link">>></a> 
<code>&lt;link></code> to the Metadata</h3>

<p>Arguably the purest solution from an architectural point of view, making use 
of the <code>&lt;link&gt;</code> element has been the object of criticism since 
maintaining the metadata externally to the HTML is seen as an inconvenience. 
Proponents of the solution contend that CSS, JavaScript, and images are already 
maintained externally without fuss, and that retrieving external files does not 
take much more programming than extraction (in fact, possibly less so).</p>

<p>Here's an example:-</p>

<pre>
&lt;head>
&lt;title>My Document&lt;/title>
&lt;link rel="meta" type="application/rdf+xml" href="meta.rdf"/>
&lt;/head>
</pre>

<p>or, if you want to mention it in the document body...</p>

<pre>
&lt;body>&lt;p>&lt;a rel="meta" type="application/rdf+xml" 
href="meta.rdf">blargh&lt;/a>[...]
</pre>

<p>Note that according to the HTML 4.01 specification:-</p>

<blockquote>
<div>Authors may wish to define additional link types not described in this 
specification. If they do so, they should use a <a 
href="http://www.w3.org/TR/html401/struct/global.html#profiles">profile</a> 
to cite the conventions used to define the link types.</div>
<div>- <cite><a href="http://www.w3.org/TR/html401/types#type-links">Link 
Types in HTML 4.01</a></cite></div>
</blockquote>

<p>Since this recommendation is a "should" and not a "must", and since 
the "meta" link relationship is not one where achieving a global consensus 
should be a difficulty, it is reasonable to use the link type without 
declaring a profile.</p>

<p>Another interesting point to note is that the link element does allow 
for a certain amount of cascading thanks to the "alternate" link 
relationship. For example:-</p>

<pre>
&lt;link rel="meta" type="application/rdf+xml" href="meta.rdf"/>
&lt;link rel="alternate meta" type="application/n3" href="meta.n3"/>
&lt;link rel="alternate meta" type="application/ntriples" href="meta.nt"/>
</pre>

<p>This means that the XML RDF version is preferred, but that user agents 
may use the Notation3 and/or NTriples files as alternatives.</p>
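
<p>To give a feel for how little programming the linking approach needs on the 
consuming side, here is a sketch that collects the "meta" links from a page and 
orders them so that the preferred version comes before its alternates. The 
names <code>MetaLinkFinder</code> and <code>find_metadata_links</code> are 
hypothetical, and treating a link without "alternate" as the preferred one is 
an assumption drawn from the example above, not from any specification.</p>

```python
from html.parser import HTMLParser


class MetaLinkFinder(HTMLParser):
    """Collect link and a elements whose rel attribute includes "meta"."""

    def __init__(self):
        super().__init__()
        self.found = []  # (href, media_type, is_alternate)

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        attr = dict(attrs)
        # rel holds a space-separated, case-insensitive list of link types.
        rels = (attr.get("rel") or "").lower().split()
        if "meta" in rels and attr.get("href"):
            self.found.append((attr["href"], attr.get("type"), "alternate" in rels))


def find_metadata_links(html_text):
    finder = MetaLinkFinder()
    finder.feed(html_text)
    finder.close()
    # sorted() is stable: non-alternate links first, document order otherwise.
    return sorted(finder.found, key=lambda item: item[2])
```

<p>The returned <code>href</code> values are left unresolved; an agent would 
resolve them against the document's base URI and then fetch whichever media 
type it understands.</p>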

<h3 id="hyperrdf"><a title="Anchor for this section" 
href="http://infomesh.net/2002/rdfinhtml/#hyperrdf">>></a> 
HyperRDF</h3>

<p>Dan Connolly of the W3C published a note that outlined an ingenious 
method for marking up HTML in such a way as to make it relatively easy to 
transform via XSLT into RDF. The method relies upon binding URIs to link 
relationship QName prefixes via a special profile and the &lt;link&gt; 
element, and closely resembles the XML Names binding mechanism.</p>

<p>Here's the basic example from DanC's proposal:-</p>

<pre>
&lt;html xmlns="http://www.w3.org/1999/xhtml">
  &lt;head id="rel" profile="http://www.w3.org/2000/07/hs78#">
    &lt;title>example&lt;/title>
    &lt;link id="c" rel="rel:classes" href="http://www.w3.org/2000/07/hs78#" />
  &lt;/head>
  [...]
&lt;/html>
</pre>

<p>(Excerpted from <a href="http://www.w3.org/2000/07/hs78/">HyperRDF: Using 
XHTML Authoring Tools with XSLT to produce RDF Schemas</a>, Dan Connolly, 
2000-08).</p>

<p>However, HyperRDF can never be valid XHTML 1.x since the head element 
does not allow an ID attribute. This can be "fixed" with modularization:-</p>

<pre>
&lt;!-- XHTML HyperRDF 1.0 DTD -->

&lt;!ENTITY % xhtml11.mod PUBLIC "-//W3C//DTD XHTML 1.1//EN" 
   "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" >
%xhtml11.mod;

&lt;!ATTLIST %head.qname; %id.attrib; >
</pre>

<h3 id="augmeta"><a title="Anchor for this section" 
href="http://infomesh.net/2002/rdfinhtml/#augmeta">>></a> 
Augmented Metadata for XHTML</h3>

<p><a href="http://infomesh.net/2002/augmeta/">Augmented Metadata in XHTML</a>, 
Murray Altheim and Sean B. Palmer eds. With this approach, the current 
metadata facilities of HTML are augmented; the content model is changed so that 
the &lt;meta&gt; element may appear within the body of the XHTML document. For 
example:-</p>

<pre>
  &lt;html>
    &lt;head>
      &lt;link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
    &lt;/head>
  &lt;body>
  &lt;p>
    [&lt;a href="inverts/grasshoppers.html">
      &lt;meta name="DC.type" scheme="HTML4" content="Prev" />
      &lt;meta name="DC.title" content="Previous Chapter" />
      &lt;meta name="DC.language" content="en" />
      &lt;img src="images/prev-arrow.gif" alt="Previous Chapter" />
    &lt;/a>] 
    [&lt;a href="inverts/scorpions.html">
      &lt;meta name="DC.type" scheme="HTML4" content="Next" />
      &lt;meta name="DC.title" content="Next Chapter" />
      &lt;meta name="DC.language" content="en" />
      &lt;img src="images/next-arrow.gif" alt="Next Chapter" />
    &lt;/a>]
  &lt;/p>
</pre>

<p>This is a powerful approach, and one that is different from the others in 
that it adapts current HTML elements (whilst preserving their basic 
semantics) <em>so that they don't necessarily have to refer to the current 
document</em>. Instead, they may refer to a linked file, or the source of some 
cited material. In other words, it is only predicate-object pairs that are 
being stated.</p>

<h3 id="profile"><a title="Anchor for this section" 
href="http://infomesh.net/2002/rdfinhtml/#profile">>></a> 
Use the Profile Attribute</h3>

<p>As outlined in <a 
href="http://lists.w3.org/Archives/Public/www-rdf-interest/2001Aug/0218">my 
proposal on www-rdf-interest</a>. The basic premise is that one can take the 
profile attribute to be a global namespace prefix for all of the rel/meta@name 
attributes throughout the document.</p>

<p>This approach is mainly for those authors that want to use a simple 
mechanism for producing RDF from their XHTML. It is ineffective from the 
point of view of anyone that wants to randomly extract RDF from XHTML, since 
one cannot tell whether the author wanted the assertions to be converted 
into the triples produced by the algorithm or not.</p>

<pre>
   &lt;head profile="http://example.org/#">
   &lt;meta name="myProp" content="My Object"/>
   &lt;link rel="myOtherProp" href="http://myuri.net/"/>
   &lt;/head>
</pre>
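
<p>The conversion implied by this approach can be sketched in a few lines of 
Python. The sketch assumes that the profile URI is simply concatenated with 
each meta name or link relationship to form the predicate, that every triple 
has the document itself as its subject, and that meta elements carry their 
value in the HTML 4.01 <code>content</code> attribute. The names 
<code>ProfileTriples</code> and <code>profile_triples</code> are hypothetical; 
this is an illustration, not the algorithm from the proposal.</p>

```python
from html.parser import HTMLParser


class ProfileTriples(HTMLParser):
    """Turn meta and link elements into triples, using head@profile as a prefix."""

    def __init__(self, document_uri):
        super().__init__()
        self.subject = document_uri  # assumption: all triples describe the document
        self.profile = None
        self.triples = []  # (subject, predicate, object, object_is_literal)

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "head":
            self.profile = attr.get("profile")
        elif self.profile is None:
            return  # no profile declared, so there is no prefix to build predicates from
        elif tag == "meta" and attr.get("name") and attr.get("content") is not None:
            self.triples.append(
                (self.subject, self.profile + attr["name"], attr["content"], True))
        elif tag == "link" and attr.get("rel") and attr.get("href"):
            for rel in attr["rel"].split():
                self.triples.append(
                    (self.subject, self.profile + rel, attr["href"], False))


def profile_triples(html_text, document_uri):
    parser = ProfileTriples(document_uri)
    parser.feed(html_text)
    parser.close()
    return parser.triples
```

<p>Note that the sketch converts every meta and link element it sees once a 
profile is present, which is exactly the weakness described above: it cannot 
know whether the author intended those triples.</p>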

<h3 id="notations"><a title="Anchor for this section" 
href="http://infomesh.net/2002/rdfinhtml/#notations">>></a>
Making use of XML Notations</h3>

<p>This idea <a 
href="http://lists.w3.org/Archives/Public/www-rdf-interest/2001Apr/0286">was 
propounded by Murray Altheim</a>, almost in passing, on www-rdf-interest. The 
approach involves using XML notations (and hence CDATA sections) and a 
custom &lt;metadata> element to wrap the metadata in. To quote Murray:-</p>

<pre>
In the DTD we'd have something akin to:

   &lt;!NOTATION dc PUBLIC 
       "-//DCMI//NOTATION Dublin Core Metadata Element Set V1.0//EN" 
       "http://dublincore.org/">
   &lt;!NOTATION rdf SYSTEM "http://www.w3.org/1999/02/22-rdf-syntax-ns#">  
   &lt;!NOTATION blat PUBLIC "-//doctypes.org//NOTATION Blat 1.0//EN"
       "http://www.doctypes.org/blat/1.0/">
   ...
   &lt;!ELEMENT  metadata  ( #PCDATA ) >  &lt;!-- really, a CDATA section -->
   &lt;!ATTLIST  metadata
       type  NOTATION  (dc|rdf|blat)
   >
   ]>&lt;!-- end of DTD -->
   ...
   &lt;head>
   &lt;metadata type="rdf">
   &lt;![CDATA[
     {rdf content}
   ]]>&lt;/metadata>
</pre>

<p>This means that there would be one (or a small set of) centralized and 
customized DTDs which could be referenced by authors all over the world. It's 
fairly language independent, although it does mean updating the DTD every time 
a new language comes along.</p>

<h2 id="embeddingissues">Issues With Embedding</h2>

<p>Here we consider the three main issues of the (rather issue-prone) embedding 
approach: how current implementations deal with it, whether to embed in the 
head or body sections, and whether fragment identifier conflicts are a 
problem.</p>

<h3>How Current Implementations Deal With Embedding</h3>
<p>Embedding is a popular approach and has already been implemented in 
numerous applications, including:-</p>

<ul>
<li><a href="http://rdfweb.org/people/damian/2001/10/RDFAuthor/">RDF Author</a> 
via <a href="http://www.hpl.hp.com/semweb/jena-top.html">Jena</a></li>
<li><a href="http://www.redland.opensource.ac.uk/raptor/">Raptor</a>. "Can 
extract RDF content embedded in XML (such as XHTML)"</li>
<li><a href="http://wilbur-rdf.sourceforge.net/docs/rdf-parser.html">Wilbur</a>. 
"(this is useful, for example, when scanning metadata from XHTML files, 
otherwise the parser will keep reading until the end of the file is 
reached)"</li>
<li><a href="http://139.91.183.30:9090/RDF/VRP/">Validating RDF Parser 
(VRP)</a>. "Supports: Embedded RDF in HTML or XML"</li>
<li>Jason Diamond's RepatCOM.RdfReader. Apparently. q.v. <a 
href="http://lists.w3.org/Archives/Public/www-rdf-interest/2001Apr/0216">Seth's 
note</a>, and <a href="http://www.injektilo.org/">Jason's homepage</a></li>
<li><a href="http://www-diglib.stanford.edu/diglib/ginf/sirpac.html">SiRPAC</a>. 
"Extraction of RDF content from arbitrary HTML pages"</li>
</ul>

<p>Note that all of these implementations simply extract the RDF from the 
XHTML, parse it, and then add it to a store: only RDF Author actually does 
anything with the triples that are returned. There are also a handful of 
HTML pages on the Web which have the XML RDF directly embedded within them, 
of which Dan Brickley's <a 
href="http://xmlns.com/foaf/0.1/">FOAF namespace/schema</a> is a notable 
example.</p>

<p>The latest RDF Syntax Working Draft includes wording that gives 
implementations of the embedding approach the basis of a solid algorithm for 
extracting RDF from arbitrary XML.</p>

<blockquote>
<div>If the content is known to be RDF/XML by context, such as when RDF/XML 
is embedded inside other XML content, then the grammar can either start at 
Element Node  RDF (only when an element is legal at that point in the XML) 
or at production nodeElementList (only when element content is legal, since 
this is a list of elements).</div>
<div>- <cite><a 
href="http://www.w3.org/TR/2002/WD-rdf-syntax-grammar-20020325#start">RDF/XML 
Syntax Specification (Revised)</a>, W3C Working Draft 25 March 2002, 
Dave Beckett</cite></div>
</blockquote>

<p>This is especially important in light of the fact that the SVG 
Recommendation allows one to embed external XML dialects within a particular 
element allocated as the metadata construct of SVG:-</p>

<blockquote>
<div>The contents of the 'metadata' [element] should be elements from other 
XML namespaces, with these elements from these namespaces expressed in a 
manner conforming with the "Namespaces in XML" Recommendation</div>
<div>- <cite><a href="http://www.w3.org/TR/SVG/metadata#MetadataElement">SVG 
1.0, 21.2 The 'metadata' element</a></cite></div>
</blockquote>

<h3>Embedding in the head vs. body</h3>
<p>The <code>&lt;head&gt;</code> of an HTML document is a reserved space to 
hold metadata about the document which contains it. However, in RDF, the 
subject of each triple is unlimited (except that it must be denoted with a 
URI-reference), so the RDF is independent of where it is placed within an 
HTML document.</p>

<p>In TimBL's <a href="http://www.w3.org/DesignIssues/Syntax">Strawman 
Syntax</a> and Altheim et al.'s <a href="http://infomesh.net/2002/augmeta/" 
>augmeta</a> proposals, however, the approach is different since only data 
which can be interpreted as predicate-object pairs are embedded within parts 
of the tree, and therefore are context sensitive. TimBL suggests using the 
current document as the subject in html:head, and the value of the href/cite 
attributes in any elements which have them.</p>
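
<p>One simplified reading of this context-sensitive rule can be sketched as 
follows: each meta element is attributed to the nearest enclosing element that 
has an href or cite attribute, falling back to the current document. The names 
<code>ContextualMeta</code> and <code>contextual_statements</code> are 
hypothetical, and neither proposal is claimed to specify exactly this 
behaviour.</p>

```python
from html.parser import HTMLParser

# Elements that never have content, so they are not tracked as open elements.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class ContextualMeta(HTMLParser):
    """Attach each meta element to the nearest enclosing href/cite, else the document."""

    def __init__(self, document_uri):
        super().__init__()
        self.document_uri = document_uri
        self.stack = []       # (tag, subject or None) for each open element
        self.statements = []  # (subject, name, content)

    def current_subject(self):
        for _tag, subject in reversed(self.stack):
            if subject:
                return subject
        return self.document_uri

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta":
            if attr.get("name") and attr.get("content") is not None:
                self.statements.append(
                    (self.current_subject(), attr["name"], attr["content"]))
        elif tag not in VOID:
            self.stack.append((tag, attr.get("href") or attr.get("cite")))

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        # Pop back to the matching open element, tolerating omitted end tags.
        for index in range(len(self.stack) - 1, -1, -1):
            if self.stack[index][0] == tag:
                del self.stack[index:]
                break


def contextual_statements(html_text, document_uri):
    parser = ContextualMeta(document_uri)
    parser.feed(html_text)
    parser.close()
    return parser.statements
```

<p>Relative href values are returned as written; resolving them against the 
document's base URI is left out to keep the sketch short.</p>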

<h3>MIME vs. MIME</h3>
<p><em>This section is slightly controversial, and consists of more of the 
author's opinion than the rest of this note</em>.</p>

<p>The semantics of a URI-reference with fragment identifier are defined by 
the specification of the media-type of the representation returned by a 
network retrieval action of the base URI. The text/html media type specification 
(<a href="http://www.ietf.org/rfc/rfc2854.txt">RFC 2854</a>) states:-</p>

<blockquote cite="http://www.ietf.org/rfc/rfc2854.txt">
<div>For documents labeled as text/html, the fragment identifier designates 
the correspondingly named element; any element may be named with the "id" 
attribute [...]</div>
</blockquote>

<p>The language here is not strict: since it applies to the SGML version, 
we do not know to which namespace(s) the words "any element" apply. Moreover, 
the HTML 4.0 specification defines some elements which <em>do not</em> have 
id attributes, e.g. &lt;head&gt;.</p>

<p>Notwithstanding (or perhaps because of) this ambiguity in the media-type 
specification for HTML, popular thinking amongst Web architecture experts is 
that IDs in XML RDF embedded in HTML have an unknown semantics.</p>

<p>XHTML is a language whose extensibility has been a major selling point: 
the enormous Modularization of XHTML specification is devoted to making it 
easier for people to create their own customized XHTML derivatives. Because 
of this, it would be sensible to defer the interpretation of XML IDs (and 
their synonyms, such as rdf:about in RDF) to the specification of the namespace 
of the embedded material.</p>

<p>TimBL has <a href="http://www.w3.org/2002/04/htmlrdf">said</a> that he thinks 
this solution "means that you can't use fragids to point to a generic bit of XML 
when just doing XML text processing". Substituting "can't necessarily" for 
"can't", I agree with this sentiment, but feel that it is unimportant. For 
example, <a href="http://www.w3.org/TR/XLink">XLink</a>-aware applications can 
still move to an element with an XML ID declared; whether or not they understand 
the semantics of the thing denoted by the ID is irrelevant since the position is 
still marked with the XML ID; i.e. it does not matter whether the element is the 
actual thing denoted by the ID, as in HTML, or whether it describes the thing 
denoted by the ID, as in RDF.</p>

<p>There have also been concerns raised about languages such as the W3C ERT's 
<a href="http://www.w3.org/2001/03/earl/">EARL</a>, a generic RDF-based 
evaluation language for which being able to identify explicit parts of XML 
trees is very important, and therefore for which the nature of the denotation 
of an XML ID <em>must</em> be known. However, EARL has already had to cope 
with this for a year or more now, and has room to overcome such problems. For 
example, ERT could decide to define a predicate that uses an XPath/XPointer 
style notation to point into the document tree.</p>

<p>Note that this exegesis only necessarily implies that the HTML media 
types be updated to make the semantics of an ID'd element depend upon the 
namespace of that element; it does not mean that this has to apply to every 
XML language, although that may be an option. Note that the HTML WG strongly 
recommends against serving XHTML with foreign namespaced elements as 
text/html:-</p>

<blockquote>
<div>In particular, 'text/html' is <strong>NOT</strong> suitable for XHTML 
Family document types that adds elements and attributes from foreign 
namespaces, such as XHTML+MathML</div>
<div>- <cite><a 
href="http://www.w3.org/TR/2002/NOTE-xhtml-media-types-20020430/#text-html" 
title="3.1. text/html">XHTML Media Types</a>, W3C Note 30 April 2002</cite></div>
</blockquote>

<h2 id="conclusion">Conclusion: Which Approach Is Best?</h2>

<p>This question is actually inappropriate: more appropriate may be "<em>which 
approaches, if any, are suitable for all applications of RDF associated with 
HTML?</em>" and "<em>which approach has the best ratio of implementability and 
architectural purity?</em>". In other words, in order to resolve the issue, 
we have to make our choice based upon the applications for associating RDF 
with HTML.</p>

<p>Each of the approaches listed above has its advantages and disadvantages. 
Zealous pragmatists will always be around to argue that embedding the RDF 
straight into the XHTML is the best approach: otherwise, what is the point 
of having XML and namespaces, constructs that are there to enable language 
mixing?</p>

<p>Since it is not viable for the average HTML author to create a new variant 
of XHTML every time they want to embed some RDF, we can discount this approach 
immediately. Since embedding (and embedding within a &lt;script> element) is an 
approach that does not validate, one can obviously not include a doctype 
declaration with the file. However, it is possible to specify an XSLT 
transformation which can be applied to the XHTML such that the result is 
validatable XHTML 1.x:-</p>

<pre>
&lt;stylesheet xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
   xmlns="http://www.w3.org/1999/XSL/Transform" version="1.0" >
&lt;template match="node()|@*">
   &lt;copy>&lt;apply-templates select="node()|@*"/>&lt;/copy>
&lt;/template>
&lt;template match="rdf:RDF"/>
&lt;/stylesheet>
</pre>
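
<p>For agents that do not have an XSLT processor to hand, roughly the same 
effect can be had with a few lines of Python. This is a sketch rather than an 
exact equivalent of the stylesheet: the function name 
<code>strip_embedded_rdf</code> is hypothetical, and the standard library 
parser used here drops comments and the doctype declaration.</p>

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
RDF_RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"


def strip_embedded_rdf(xhtml_text):
    """Return the document with every rdf:RDF subtree removed."""
    # Serialize XHTML elements without a prefix (note: this setting is process-wide).
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(xhtml_text)
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != RDF_RDF:
                continue
            # Keep any text that followed the removed element.
            index = list(parent).index(child)
            tail = child.tail or ""
            if index == 0:
                parent.text = (parent.text or "") + tail
            else:
                previous = parent[index - 1]
                previous.tail = (previous.tail or "") + tail
            parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```

<p>The result would still need a doctype declaration added before being passed 
to a validator.</p>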

<p>On the other hand, RDF is not only serializable as XML RDF; languages such as 
Notation3 and NTriples are popular. Given this situation, a language-independent 
metadata association mechanism would be preferable, especially if it allowed 
a cascade or provision of alternatives. The obvious counterargument is that 
having a single canonical format for associating with HTML makes sense since it 
minimizes diversity and therefore increases the chances for 
interoperability.</p>

<p>Neither the linking ("&lt;link> to the Metadata") nor the embedding ("Embed 
XML RDF Part I: Eschew Validation", and possibly "Embedding using &lt;script>") 
methods can be ruled out, in the author's opinion. Linking has the substantial 
advantage that it is serialization independent, may reduce file sizes when a 
single set of triples is often referenced (such as contact information), and 
provides a cascade. Embedding is useful because it is direct, there are existing 
implementations to deal with it, plus people will be embedding XML RDF and other 
languages like it into XHTML for a long time to come.</p>

<p>The HyperRDF, Augmeta, and generic profile attribute approaches are still 
valid. However, I recommend that authors of such documents combine them with 
the &lt;link> element method, for example pointing to the URI of <a 
href="http://www.w3.org/2001/05/xslt">an XSLT Web service</a> that converts the 
current document into XML RDF.</p>
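
<p>A sketch of that combination might look like the following. The profile URI, 
the service address, and its "doc" query parameter are all hypothetical 
placeholders; a real XSLT service will define its own address and 
parameters.</p>

<pre>
&lt;head profile="http://example.org/profiles/hypothetical-rdf-profile">
   &lt;title>A Document Using A Profile Based Approach&lt;/title>
   &lt;!-- hypothetical service returning this page converted to XML RDF -->
   &lt;link rel="meta" type="application/rdf+xml" 
      href="http://xslt.example.org/convert?doc=http://example.org/page.html" />
&lt;/head>
</pre>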

<p>In conclusion, and with the strong caveat that this is the author's 
opinion only, both the linking and embedding options should be supported by 
any new implementations that have to deal with extracting RDF from HTML. This is 
the path of least resistance since no one can <em>ban</em> anyone from linking 
or embedding, although it does make more work for the parser developers. It is 
important that the precise semantics of XML RDF embedded in HTML are made clear 
and published by the W3C, preferably as part of a generic language mixing 
note.</p>

<h2 id="furtherreading">Further Reading</h2>

<p>For anyone who is wondering what to do next.</p>

<ul>
<li><a 
href="http://lists.w3.org/Archives/Public/www-rdf-interest/2001Apr/0200">RE: 
Authors describing what their URIs mean</a>, Joshua Allen (April 2001)</li>
<li><a href="http://www.w3.org/2000/08/w3c-synd/">An XHTML profile for RDF Site 
Summaries</a>, Dan Connolly ($Revision: 1.14 $ of $Date: 2001/05/31 17:24:11 
$ by $Author: danbri $)</li>
<li><a href="http://www.w3.org/2000/06/dc-extract/form">Dublin Core 
Extraction Service</a>, Dan Connolly ($Revision: 1.4 $ of $Date: 2000/06/09 
18:52:10 $)</li>
<li><a 
href="http://lists.w3.org/Archives/Public/www-rdf-interest/2000Mar/0103">Using 
XSLT to extract RDF from real-world data</a>, Dan Connolly on 
www-rdf-interest</li>
<li><a href="http://www.w3.org/DesignIssues/Syntax">A strawman Unstriped syntax 
for RDF in XML</a> (see under "RDF in HTML - Transparent or not?"), 
Tim Berners-Lee, 1999</li>
<li><a href="http://www.mysterylights.com/xhtmltordf/">XSLT XHTML to RDF 
Extractor</a>, Sean B. Palmer (2000/2001)</li>
<li><a href="http://uwimp.com/">UWIMP (Uniform Web Index Maker Program)</a>, 
William Loughborough and Sean B. Palmer (26 February 2001)</li>
<li><a href="http://www.daml.org/2002/03/tutorial/slide38-0.html">Embedding 
RDF in HTML</a> in "DAML+OIL for Application Developers", Mike Dean, 2002-03</li>
</ul>

<h3>Peripherally Related</h3>
<ul>
<li><a href="http://www.openhealth.org/RDDL/tddl">Terminology Definition 
Description Language (TDDL) 1.0</a>, Jonathan Borden</li>
<li><a href="http://rdfweb.org/2001/01/swipe/">RDFWeb: SWIPE</a>, "SWIPE is a 
simple RDF vocabulary that provides some basic facilities to support the 
extraction of structured RDF data from arbitrary HTML, XHTML and pseudo-HTML 
textual content." DanBri, Damian, and Libby</li>
</ul>

<h2>Acknowledgements</h2>
<p>Many thanks to Dave Beckett, Dan Brickley, and Dan Connolly for their 
early reviews and feedback. Many thanks also to Murray Altheim for his 
discussion of many important principles and for the augmeta approach, and 
William Loughborough and Dan Brickley for providing the inspiration to 
write this note up. Credit is also due to the many contributors to the 
RDF-in-XHTML threads on the RDF mailing lists: Joshua Allen, Danny Ayers, 
Seth Russell, Aaron Swartz, Jonathan Borden, et al.</p>

<p>This note was first published on: 2002-05-31; 
most recent update: 2002-06-02.</p>

<address>
<a href="http://purl.org/net/sbp/" 
title="A Homepage Of Sean B. Palmer">Sean B. Palmer</a>
</address>
</body>
</html>