Saturday, October 29, 2011

On the difference between Linked Data and Semantic Web

After being confused for some time about the difference between Linked Data and the Semantic Web, and after reading some resources about both concepts, I would like to share my interpretation of what I read.

The Semantic Web is a vision of (among other things) creating a Web of Data. Linked Data is a concrete means to achieve (a lightweight version of) that vision. I will explain later what I mean by the lightweight version. For now, Linked Data can be seen as a reference implementation of the Semantic Web, one of several possible implementations. The Semantic Web is the What and Linked Data is the How.

According to the Linked Data book: "Linked Data provides a publishing paradigm in which not only documents, but also data, can be a first class citizen of the Web, thereby enabling the extension of the Web with a global data space based on open standards - the Web of Data."

According to the W3C Linked Data page: "The Semantic Web is a Web of Data... to make the Web of Data a reality, it is important to have the huge amount of data on the Web available in a standard format, reachable and manageable by Semantic Web tools. Furthermore, not only does the Semantic Web need access to data, but relationships among data should be made available, too, to create a Web of Data (as opposed to a sheer collection of datasets). This collection of interrelated datasets on the Web can also be referred to as Linked Data.
Linked Data lies at the heart of what Semantic Web is all about: large scale integration of, and reasoning on, data on the Web."

According to the article by Chris Bizer, Tom Heath and TimBL, Linked Data - The Story So Far: "... while the Semantic Web, or Web of Data, is the goal or the end result of this process, Linked Data provides the means to reach that goal. ... Over time, with Linked Data as a foundation, some of the more sophisticated proposals associated with the Semantic Web vision, such as intelligent agents, may become a reality."

So Linked Data constitutes a paradigm for publishing data sets on the Web in order to achieve the goal of creating a Web of Data, which is part of the vision of the Semantic Web. The published interrelated data sets themselves are also referred to as Linked Data.

There are, however, at least two differences between the original vision of the Semantic Web and the vision that the Linked Data principles help to achieve.

The first difference concerns the use of URIs. According to the Linked Data principles, URIs have to be dereferenceable, while RDF has no such requirement. In the citations below, the bold font is applied by me.

RDF Primer:
In addition, sometimes an organization will use a vocabulary's namespace URIref as the URL of a Web resource that provides further information about that vocabulary... Accessing ... namespace URIref in a Web browser will retrieve additional information about the ... vocabulary... However, this is also just a convention. RDF does not assume that a namespace URI identifies a retrievable Web resource

The Linked Data book:
The primary means of publishing Linked Data on the Web is by making URIs dereferenceable, thereby enabling the follow-your-nose style of data discovery. This should be considered the minimal requirements for Linked Data publishing.
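
To make "dereferenceable" a bit more concrete, here is a minimal sketch of how a client might prepare the lookup of a URI while asking for RDF instead of HTML through HTTP content negotiation. The function name, the chosen media types and the example.org URI are my own assumptions for illustration and are not taken from the book. The request is only built, not sent.

```python
from urllib.request import Request

# Media types a Linked Data client might ask for (an illustrative choice).
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_lookup_request(uri: str) -> Request:
    """Build (but do not send) an HTTP GET request that asks for RDF."""
    # The fragment ('#me') is never sent to the server, so look up
    # the document part of the URI only.
    document_uri = uri.split("#", 1)[0]
    return Request(document_uri, headers={"Accept": RDF_ACCEPT})


# Hypothetical URI identifying a person.
request = build_lookup_request("http://example.org/people/alice#me")
print(request.full_url)
print(request.get_header("Accept"))
```

A publisher that follows the Linked Data principles would answer such a request with an RDF description of the resource. With plain RDF, nothing guarantees that the URI can be looked up at all.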

The second difference concerns ontological axioms. According to the Linked Data book, ontological axioms should be used sparingly:

"Only define things that matter – for example, defining domains and ranges helps clarify how properties should be used, but over-specifying a vocabulary can also produce unexpected inferences when the data is consumed. Thus you should not overload vocabularies with ontological axioms, but better define terms rather loosely (for instance, by using only the RDFS and OWL terms introduced above). "

The RDFS and OWL terms introduced in the Linked Data book are:
  • rdf:type
  • rdfs:Class
  • rdf:Property
  • rdfs:subClassOf
  • rdfs:subPropertyOf
  • rdfs:domain
  • rdfs:range
  • rdfs:label
  • rdfs:comment
  • owl:Ontology
  • owl:ObjectProperty
  • owl:inverseOf
  • owl:equivalentClass
  • owl:equivalentProperty
  • owl:InverseFunctionalProperty
This is what I meant by saying that Linked Data is a means to reach a lightweight version of the Semantic Web: a Web of Data with limited use of ontologies and knowledge representation.
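
The book's warning about unexpected inferences can be illustrated even with one of the lightweight terms. The following is a rough sketch, not a real RDFS reasoner: triples are plain Python tuples, all ex: names are made up, and only the rdfs:domain entailment rule is applied (whoever uses a property is inferred to be an instance of the property's domain).

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def infer_types_from_domains(triples):
    """Return the new rdf:type triples implied by rdfs:domain declarations."""
    # Collect (property, class) pairs declared with rdfs:domain.
    domains = [(s, o) for s, p, o in triples if p == RDFS_DOMAIN]
    inferred = set()
    for prop, cls in domains:
        for s, p, _ in triples:
            if p == prop:
                # The subject of the property is inferred to be of the domain class.
                inferred.add((s, RDF_TYPE, cls))
    return inferred - set(triples)


# Hypothetical data: the vocabulary says that only books have authors.
triples = {
    ("ex:author", RDFS_DOMAIN, "ex:Book"),
    ("ex:moby_dick", "ex:author", "ex:melville"),
    ("ex:blog_post_42", "ex:author", "ex:alice"),
}

for triple in sorted(infer_types_from_domains(triples)):
    print(triple)
```

The blog post is inferred to be a book, simply because someone reused ex:author for it. The tighter the axioms of a vocabulary, the more such surprises its consumers get, which is presumably why the book recommends defining terms rather loosely.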

One might ask where the use of RDFS and OWL appears in the Linked Data principles. It is actually in principle 3: "When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)"

Once you use URIs for RDF properties, looking up those properties should provide information about them, and that information is expressed in RDFS and OWL.

In this talk about Linked Data, TimBL mentions using Ontology bits for basic inference: "Inference - smarter query", and Ontology bits for Validation and Constraining input: "... mistakes to be spotted... user input menus to be constrained".

Note the words "basic" and "bits".

The seminal paper "The Semantic Web" in Scientific American from 2001 talked about inference rules in the ontologies, for example:
Inference rules in ontologies supply further power. An ontology may express the rule "If a city code is associated with a state code, and an address uses that city code, then that address has the associated state code." 
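
The quoted rule can be sketched in a few lines of code. This is only my illustration of what applying such a rule means, not how the article or any real rule engine implements it. Triples are plain tuples, and the property names and codes are invented.

```python
def apply_state_rule(triples):
    """If a city code is associated with a state code and an address uses
    that city code, conclude that the address has that state code."""
    # Map each city code to its associated state code.
    state_of_city = {
        s: o for s, p, o in triples if p == "ex:associatedStateCode"
    }
    inferred = set()
    for s, p, o in triples:
        if p == "ex:usesCityCode" and o in state_of_city:
            inferred.add((s, "ex:hasStateCode", state_of_city[o]))
    return inferred


# Hypothetical data: only city_A has a known state code.
triples = [
    ("ex:city_A", "ex:associatedStateCode", "ex:state_X"),
    ("ex:address_1", "ex:usesCityCode", "ex:city_A"),
    ("ex:address_2", "ex:usesCityCode", "ex:city_B"),
]

print(apply_state_rule(triples))
```

Only ex:address_1 gets a state code, because nothing is known about the state of ex:city_B. Rules of this kind go beyond what the short list of RDFS and OWL terms above can express.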

It seems that TimBL, too, now favors achieving (first) the limited version of the Semantic Web through the Linked Data principles: fewer ontological axioms, less knowledge management, less semantics. Maybe this attitude is aligned with the Rule of Least Power:

Principle: Powerful languages inhibit information reuse.
Good Practice: Use the least powerful language suitable for expressing information, constraints or programs on the World Wide Web.

So, using ontological axioms other than those mentioned above is probably not required for creating the Web of Data. This is why the Linked Data book discourages their use: they provide more power than needed. A more powerful use of ontologies might be labeled the Web of Knowledge, Linked Ontologies or Linked Knowledge, as opposed to the Web of Data and Linked Data. The original Semantic Web vision was probably to create both a Web of Data and a Web of Knowledge (Web of Ontologies). The goal of the Linked Data paradigm is to achieve the Web of Data only.