<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>HyperRDF: Using XHTML Authoring Tools with XSLT to produce RDF
  Schemas</title>
  <meta http-equiv="Content-Type" content="text/html" />
</head>

<body>
<p><a href="../../../">W3C</a> <a href="../../01/sw/">SW Dev</a></p>

<h2><a name="XSLT1" id="XSLT">HyperRDF: Using XHTML Authoring Tools with XSLT
to produce RDF Schemas</a></h2>

<p>Contents:</p>
<ul>
  <li><a href="#Introducti">Intro</a></li>
  <li><a href="#link">Grounding link relationships and classes</a></li>
  <li><a href="#Declaring">ClassTree</a></li>
  <li><a href="#Declaring1">Property</a></li>
  <li><a href="#Declaring2">Rule</a></li>
  <li><a href="#app-amaya">Appendix: using Amaya</a></li>
  <li><a href="#app-style">Appendix: Style sheet</a></li>
  <li><a href="#app-ex">Appendix: Examples</a></li>
  <li>Resources
    <ul>
      <li><a href="form.html">forms-based service</a> implementation</li>
      <li><a href="html2rdfs.html">xslt source</a>, <a
        href="Makefile">Makefile</a>, <a href="style.css">stylesheet</a></li>
    </ul>
  </li>
</ul>

<p>nearby:</p>
<ul>
  <li><a href="http://www.w3.org/2000/06/sw594.html">Toward Swell, the
    Semantic Web Logic Language</a> (formerly here on this page)</li>
  <li><a href="../scribe-stuff/">scribe stuff, with links to lots of earlier
    stuff</a></li>
</ul>

<h2><a name="Introducti" id="Introducti">Introduction</a></h2>

<p><a href="../../../XML/">XML</a> syntax is a little tedious, but lots of
people are evidently willing and able to edit it by hand. <a
href="../../../RDF/">RDF</a> adds another layer of tedium, but there are still
a few folks willing to write it by hand. I make heavy use of
reification/quoting in my representation of logical formulas in RDF. This adds
another layer of tedium that I find unmanageable, and I have been writing
XML/SGML/HTML by hand for 10 years.</p>

<p>I have had a lot of success lately using <a
href="http://www.w3.org/TR/xslt">XSLT</a> to screen-scrape RDF out of XHTML
pages, and I'm quite happy to use a hypertext editor (e.g. <a
href="#app-amaya">Amaya</a>) to record my knowledge. I make use of the
occasional <code>class</code> or <code>rel</code> attribute to distinguish the
information that a particular XSLT transformation is looking for from stuff
that just happens to be there for other reasons. For example, I can write a
typed link:</p>
<pre>&lt;a rel="interest" href="http://www.w3.org/XML/">XML&lt;/a></pre>

<p>on <a href="../../../People/Connolly/">my home page</a>, and convert it to
RDF like this:</p>
<pre>&lt;rdf:Description rdf:about="">
  &lt;interest>
    &lt;rdf:Description
         rdf:about="http://www.w3.org/XML/">
      &lt;rdfs:label>XML&lt;/rdfs:label>
    &lt;/rdf:Description>
  &lt;/interest>
&lt;/rdf:Description></pre>
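<p>The gist of that conversion can be sketched in Python. This is only an
illustration under stated assumptions, not the XSLT I actually use: the
function name <code>typed_link_triples</code> is hypothetical, the elements are
built without a namespace through the ElementTree API rather than parsed from
a page, and the output is a list of (subject, predicate, object) tuples
instead of RDF/XML.</p>
<pre>
```python
import xml.etree.ElementTree as ET


def typed_link_triples(root, page_uri=""):
    """Return triples for every a element that carries a rel attribute.

    page_uri plays the role of the empty about="" reference, i.e. the
    page itself. Links without rel are ignored as ordinary hypertext.
    """
    triples = []
    for a in root.iter("a"):
        rel = a.get("rel")
        href = a.get("href")
        if rel is None or href is None:
            continue
        triples.append((page_uri, rel, href))
        if a.text:
            # The link text becomes the label of the link target.
            triples.append((href, "rdfs:label", a.text))
    return triples


# The typed link from the example, next to an untyped link.
body = ET.Element("body")
typed = ET.SubElement(
    body, "a", {"rel": "interest", "href": "http://www.w3.org/XML/"})
typed.text = "XML"
plain = ET.SubElement(body, "a", {"href": "http://www.w3.org/"})
plain.text = "W3C"

for triple in typed_link_triples(body):
    print(triple)
```
</pre>

<p>Only the link with a <code>rel</code> attribute yields triples; the untyped
link is skipped, which is the point of using the attribute as a marker.</p>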

<p>But I want to go beyond the post-hoc/third-party style of screen-scraping
and make it clear that I, the author of the web pages, am making the very RDF
assertions that the XSLT transformation generates, when I write my web pages.
And I'm starting to think that this technique is sufficiently useful that it
will be deployed beyond the single-use transformations I have been doing, to a
scale where managing collisions among link relationship names and class names
is essential.</p>

<h3><a name="link" id="link">Grounding link relationships and class names in
the Web</a></h3>

<blockquote>
<strong>NOTE:</strong> This section is being reconsidered in light
of <a href="http://www.w3.org/2003/g/data-view">GRDDL</a>
</blockquote>

<p>The <a href="http://www.w3.org/TR/html4/">HTML 4.0 specification</a>, in <a
href="http://www.w3.org/TR/1999/REC-html401-19991224/types.html#h-6.12">section
6.12 Link types</a>, enumerates a few useful link relationships, and then
adds:</p>

<blockquote>
  <p>Authors may wish to define additional link types not described in this
  specification. If they do so, they should use a <a
  href="http://www.w3.org/TR/1999/REC-html401-19991224/struct/global.html#profiles">profile</a>
  to cite the conventions used to define the link types. Please see the <a
  href="http://www.w3.org/TR/1999/REC-html401-19991224/struct/global.html#adef-profile"><code>profile</code></a>
  attribute of the <a
  href="http://www.w3.org/TR/1999/REC-html401-19991224/struct/global.html#edef-HEAD"><code>HEAD</code></a>
  element for more details.</p>
</blockquote>

<p>We hereby establish the following conventions used to define some link
types:</p>
<p>First, a mechanism somewhat analogous to the binding of element and attribute
name prefixes to URIs in <a
href="http://www.w3.org/TR/1999/REC-xml-names-19990114/">Namespaces in
XML</a>: a link relationship name whose prefix matches the <code>id</code>
attribute of the <code>head</code> element denotes the URI resulting from the
concatenation of the profile URI (in absolute form) and the local part of the
link relationship name. For example:</p>
<pre>&lt;html xmlns="http://www.w3.org/1999/xhtml">
  &lt;head id='rel' profile="http://www.w3.org/2000/07/hs78#">
    &lt;title>example&lt;/title>
    &lt;link id='c' rel='rel:classes' href='http://www.w3.org/2000/07/hs78#' />
  &lt;/head>
  ...
&lt;/html></pre>

<p>A relationship name containing no colon (':') character has an empty ("")
prefix. The empty prefix should be declared explicitly, as in <code>&lt;head
id='' profile='...'></code>, rather than by omitting the <code>id</code>
attribute.</p>
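<p>The prefix rule can be sketched in a few lines of Python. This is an
illustration only: the function name <code>resolve_rel</code> and its
parameters are hypothetical, the profile URI is assumed to be in absolute form
already, and splitting at the first colon is an assumption, since the
convention above does not say what to do with names containing several
colons.</p>
<pre>
```python
def resolve_rel(rel_name, head_id, profile_uri):
    """Return the URI denoted by a link relationship name, or None.

    head_id and profile_uri stand for the id and profile attributes of
    the head element. None means the prefix is not bound by this head.
    """
    prefix, colon, local = rel_name.partition(":")
    if not colon:
        # No colon at all: the whole name is the local part and the
        # prefix is the empty string.
        prefix, local = "", rel_name
    if prefix != head_id:
        return None
    return profile_uri + local


PROFILE = "http://www.w3.org/2000/07/hs78#"

# Prefix 'rel' bound by head id='rel', as in the example above.
print(resolve_rel("rel:classes", "rel", PROFILE))
# Empty prefix bound explicitly by head id=''.
print(resolve_rel("classes", "", PROFILE))
# A prefix the head element does not bind.
print(resolve_rel("other:classes", "rel", PROFILE))
```
</pre>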

<p>Second, we define a link relationship called <dfn
id="classes">classes</dfn>  that allows class names to denote URIs. A link
element that uses this link relationship binds the prefix in its
<code>id</code> attribute to the URI denoted by its <code>href</code>
attribute. In the following example, the rel attribute refers to this <a
href="#classes">classes</a> link relationship, and the class attribute refers
to the <a href="#Rule">Rule</a> class, described below.</p>
<pre>&lt;html xmlns="http://www.w3.org/1999/xhtml">
  &lt;head id='rel' profile="http://www.w3.org/2000/07/hs78#">
    &lt;title>example&lt;/title>
    &lt;link id='c' rel='rel:classes' href='http://www.w3.org/2000/07/hs78#' />
  &lt;/head>
  &lt;body>
    &lt;dl class="c:Rule">
      ...
    &lt;/dl>
  &lt;/body>
&lt;/html></pre>

<p><em>@@hmm... I'm using the same URI for three mechanisms here: (a) link
relationship namespace mechanism, (b) a namespace for a link relationship, (c)
a namespace for three classes. I should probably provide separate URIs for
each of those, and define this one as implying all three.</em><a
name="target-nam1" id="target-nam">target-namespace
(<strong>obsolete</strong>)</a></p>

<h3 id="Declaring">Declaring a hierarchy of classes</h3>

<p>A <code>div</code> element bearing the global class name <dfn
id="ClassTree">ClassTree</dfn> declares a hierarchy of classes, one for each
li element in the <code>div</code> element. </p>

<p>Here's an example from <a href="lists">lists</a>; note that we refer to the
Seq class, but we declare the List class:</p>
<pre>&lt;div class="ClassTree">
&lt;h2>Class hierarchy&lt;/h2>
&lt;ul>
  &lt;li>&lt;a href="http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq">Seq&lt;/a>
   &lt;ul>
    &lt;li>&lt;b id="List">List&lt;/b> e.g. &lt;em id="empty">empty&lt;/em>&lt;/li>
   &lt;/ul>
  &lt;/li>
&lt;/ul>
&lt;/div></pre>

<p>Note that you can declare instances, as in <code>&lt;em
id="empty">empty&lt;/em></code> or <code>&lt;a
href="..ref...">thatThing&lt;/a></code>.  This markup is translated to the
following RDF (see the <a href="lists.rdf">whole file</a> for details such as
namespace declarations):</p>
<pre>  &lt;s:Class r:about="http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq"
  s:label="Seq" />
  &lt;s:Class r:ID="List" s:label="List">
    &lt;s:subClassOf
    r:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq" />
  &lt;/s:Class>
  &lt;r:Description r:ID="empty">
    &lt;r:type r:resource="#List" />
  &lt;/r:Description></pre>
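<p>The walk over the nested list can be sketched in Python as follows. This is
a sketch under assumptions, not the XSLT itself: the function names are
hypothetical, elements are built without a namespace through the ElementTree
API, the first child of each <code>li</code> is assumed to name the class, and
every other non-list child is assumed to be an instance. The output is a list
of (subject, predicate, object) tuples rather than RDF/XML.</p>
<pre>
```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def node_uri(el):
    """URI reference for an element: its href, else '#' plus its id."""
    href = el.get("href")
    return href if href is not None else "#" + el.get("id", "")


def walk(ul, parent, triples):
    """Collect triples for each li in ul; parent is the superclass or None."""
    for li in ul.findall("li"):
        children = list(li)
        if not children:
            continue
        head = children[0]  # assumed: the first child names the class
        cls = node_uri(head)
        triples.append((cls, "rdf:type", "rdfs:Class"))
        triples.append((cls, "rdfs:label", head.text))
        if parent is not None:
            triples.append((cls, "rdfs:subClassOf", parent))
        for child in children[1:]:
            if child.tag == "ul":
                walk(child, cls, triples)  # nested list: subclasses
            else:
                # assumed: any other child declares an instance
                triples.append((node_uri(child), "rdf:type", cls))


def class_tree_triples(div):
    """Translate a ClassTree div into a list of triples."""
    triples = []
    for ul in div.findall("ul"):
        walk(ul, None, triples)
    return triples


# Rebuild the lists example: Seq is referenced, List is declared.
div = ET.Element("div", {"class": "ClassTree"})
top = ET.SubElement(div, "ul")
seq_li = ET.SubElement(top, "li")
ET.SubElement(seq_li, "a", {"href": RDF_NS + "Seq"}).text = "Seq"
sub = ET.SubElement(seq_li, "ul")
list_li = ET.SubElement(sub, "li")
ET.SubElement(list_li, "b", {"id": "List"}).text = "List"
ET.SubElement(list_li, "em", {"id": "empty"}).text = "empty"

for triple in class_tree_triples(div):
    print(triple)
```
</pre>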

<p><strong>@@TODO: unions, enumerated sets.</strong></p>

<p>This direct translation of <code>id</code> attributes in HTML to id
attributes in RDF relies on an assumption that the RDF will be made available
at the same address as the HTML is available; i.e. they are variants of the
same generic resource (in the sense of section <a
href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.44">14.44
Vary</a> in the <a
href="http://www.w3.org/Protocols/rfc2616/rfc2616.html">HTTP
specification</a>; see also: <a
href="http://www.w3.org/DesignIssues/Generic">Generic Resources</a>)
<strong><a name="TODO" id="TODO">@@TODO</a>: model this generic/variant
relationship in RDF</strong>.</p>

<h3 id="Declaring1">Declaring a property</h3>

<p>An <code>li</code> element bearing the global class name <dfn
id="Property">Property</dfn> declares a property whose URI and label are taken
from the <code>id</code> attribute and content of the first element in the
<code>li</code> element. The domain and range of the property are taken from
the first and second <code>a</code> elements in the <code>li</code> element,
respectively, if present. A <code>p</code> element in the <code>li</code> is
taken as a comment. For example:</p>
<pre>  &lt;li class="Property">&lt;b id="first">first&lt;/b>:
    &lt;a href="#List">List&lt;/a> ->    anything
    &lt;p>first(l, x) = x is the first item in l&lt;/p>
  &lt;/li></pre>

<p>is transformed to:</p>
<pre>  &lt;r:Property r:ID="first" s:label="first" s:domain="#List">
    &lt;s:comment>first(l, x) = x is the first item in l&lt;/s:comment>
  &lt;/r:Property></pre>
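<p>The extraction rules for a property can be sketched in Python. As before,
this is an illustration under assumptions rather than the XSLT source: the
function name <code>property_description</code> and the dictionary keys are
hypothetical, elements are built without a namespace through the ElementTree
API, and only direct children of the <code>li</code> are considered.</p>
<pre>
```python
import xml.etree.ElementTree as ET


def property_description(li):
    """Summarize a Property li element as a dictionary.

    The first child supplies the id and label; the first and second a
    children supply domain and range; a p child supplies the comment.
    """
    head = li[0]
    desc = {"id": head.get("id"), "label": head.text}
    # zip stops at the shorter sequence, so a missing second link
    # simply leaves the range out.
    for key, link in zip(("domain", "range"), li.findall("a")):
        desc[key] = link.get("href")
    comment = li.find("p")
    if comment is not None:
        desc["comment"] = comment.text
    return desc


# Rebuild the 'first' property from the example above.
li = ET.Element("li", {"class": "Property"})
name = ET.SubElement(li, "b", {"id": "first"})
name.text = "first"
name.tail = ": "
domain = ET.SubElement(li, "a", {"href": "#List"})
domain.text = "List"
domain.tail = " -> anything"
note = ET.SubElement(li, "p")
note.text = "first(l, x) = x is the first item in l"

print(property_description(li))
```
</pre>

<p>Because the example has only one <code>a</code> element, the result has a
domain but no range, matching the RDF above.</p>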

<p>You can link to the property (using <code>&lt;a
href="...xyz">xyz&lt;/a></code>) as well as declaring it (using <code>&lt;b
id="xyz">xyz&lt;/a></code>).</p>

<p><strong><a name="TODO1" id="TODO1">@@TODO</a>: syntax for "facets",
i.e. properties of properties; stuff like inverse, transitive, subproperty,
etc.</strong></p>

<h3 id="Declaring2">Declaring an inference rule</h3>

<p>A <code>dl</code> element bearing the global class name <dfn
id="Rule">Rule</dfn> declares an <a
href="http://www.w3.org/2000/04shoe-swell/inference#Rule">inference rule</a>.
It should have just one <code>dt</code>/<code>dd</code> pair: the
<code>dt</code> is the conclusion of the rule, and the <code>dd</code>
contains a list (<code>ul</code>) of premises (<code>li</code> elements). Each
statement (i.e. premise or conclusion) is written as an element for the
predicate, an element for the subject, and an element for the object. Each of
the predicate, subject, and object elements is either an <code>a</code> or a
<code>var</code> element. var elements represent variables, and <code>a</code>
elements refer (by the URI reference in the <code>href</code> attribute) to
constants. <strong><a name="TODO3" id="TODO3">@@TODO</a>: support for RDF
literals, i.e. strings, using tt</strong>.</p>

<p>See the <a href="lists">lists schema</a> for examples.</p>
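<p>To make the shape of a rule concrete, here is a Python sketch that reads
one. Everything in it is an assumption made for illustration: the function
names are hypothetical, the sample rule and its <code>#parent</code> and
<code>#child</code> URIs are invented rather than taken from the lists schema,
elements are built without a namespace through the ElementTree API, and the
three elements of a statement are assumed to be direct children of the
<code>dt</code> or <code>li</code>.</p>
<pre>
```python
import xml.etree.ElementTree as ET


def term(el):
    """A var element is a variable; an a element is a constant."""
    if el.tag == "var":
        return ("var", el.text)
    return ("const", el.get("href"))


def statement(container):
    """Read predicate, subject, object (in that order) from container."""
    terms = [term(el) for el in container if el.tag in ("a", "var")]
    if len(terms) != 3:
        raise ValueError("expected predicate, subject and object")
    predicate, subject, obj = terms
    return {"predicate": predicate, "subject": subject, "object": obj}


def read_rule(dl):
    """The dt holds the conclusion; each li under dd/ul is a premise."""
    return {
        "conclusion": statement(dl.find("dt")),
        "premises": [statement(li) for li in dl.findall("dd/ul/li")],
    }


def add_terms(parent, terms):
    """Helper for the demo: append a or var children to parent."""
    for tag, value in terms:
        el = ET.SubElement(parent, tag)
        if tag == "a":
            el.set("href", value)
        el.text = value


# Hypothetical rule: child(y, x) follows from parent(x, y).
dl = ET.Element("dl", {"class": "Rule"})
add_terms(ET.SubElement(dl, "dt"),
          [("a", "#child"), ("var", "y"), ("var", "x")])
premises = ET.SubElement(ET.SubElement(dl, "dd"), "ul")
add_terms(ET.SubElement(premises, "li"),
          [("a", "#parent"), ("var", "x"), ("var", "y")])

print(read_rule(dl))
```
</pre>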

<p><strong><a name="TODO2" id="TODO2">@@TODO</a>: support for n-ary
relations.</strong></p>

<h2 id="Related">Related Work</h2>

<blockquote>
  <p>Currently, the only HTML structure interpreted as a knowledge
  representation by WebKB is the definition list. Its use is similar to the
  frame-oriented CG notation with strings as type names</p>
  <address>
    <a
    href="http://meganesia.int.gu.edu.au/%7Ephmartin/WebKB/doc/languages.html#HTMLstructures">2.1.6.1
    HTML structures</a> in <a
    href="http://meganesia.int.gu.edu.au/%7Ephmartin/WebKB/doc/index.html">The
    WebKB set of tools</a> by <a
    href="http://meganesia.int.gu.edu.au/%7Ephmartin/index.html">Philippe
    MARTIN</a> 
  </address>
</blockquote>

<h2 id="app-amaya">Appendix: Notes on <a href="../../../Amaya/">Amaya</a> as
an authoring tool:</h2>
<ul>
  <li>@@TODO: add support for &lt;strong>, since Amaya makes that easier to
    get at than &lt;b> (as it should);</li>
  <li>Be careful when adding id attributes: if you select some text and tell
    Amaya to make it bold, you then have to hit f2/esc to select the b element
    before setting the id attribute; otherwise, Amaya inserts a span element
    and puts the id on that.</li>
  <li>I thought Amaya/jigteam were set up to handle generic resources, but
    evidently not: I try to save inference and I lose. Hmm... maybe that
    makes sense, since there are two representations. But the etag should
    tell the server which one I'm overwriting. It seems to work, mostly, but
    I've seen some cases that surprised me.</li>
</ul>

<h2 id="app-style">Appendix: Style sheet</h2>

<p>I developed a <a href="style.css">style sheet</a> for use with these markup
conventions. It only works with the empty prefix (i.e.
<code>class="ClassTree"</code>, not <code>class="my:ClassTree"</code>).</p>

<p>A general convention in this stylesheet is: underlined stuff is significant
to the transformation to RDF (there are some exceptions: links in free text
don't get transformed to RDF).</p>

<h2 id="app-ex">Appendix: Examples</h2>

<p><a href="template">template</a></p>

<p>aka stuff to revisit when I upgrade this transformation...</p>
<ul>
  <li><a href="rdf-ms">RDF M&amp;S</a></li>
  <li><a href="lists">lists</a></li>
  <li><a href="http://www.w3.org/2000/04shoe-swell/inference">inference
    rules</a></li>
  <li><a href="foaf">foaf</a></li>
  <li><a href="http://www.w3.org/2000/04/maillog2rdf/email">email</a></li>
  <li><a href="FOPC">FOPC</a></li>
  <li><a href="algernon">algernon</a></li>
  <li><a href="http://www.w3.org/2000/07/document-maintenance/">document
    maintenance</a></li>
  <li><a href="http://www.w3.org/2000/07/DAML-0-5">DAML 0.5</a></li>
  <li><a href="KIF">KIF</a></li>
</ul>

<p></p>
<hr />
<address>
  <a href="http://www.w3.org/People/Connolly/">Dan Connolly<br />
  </a>originally prepared for <a href="../19-DAML">a meeting</a> of 19-20 Jul
  2000<br />
  <small>$Revision: 1.35 $ of $Date: 2005/08/09 04:51:36 $ by $Author:
  connolly $</small> 
</address>

<p></p>
</body>
</html>