Thursday, 28 July 2011

Linked Data Books (1): Linked Data - Evolving the Web into a Global Data Space

Compared with computer science, the World Wide Web and artificial intelligence, linked data is a relatively new subject. So it is important to find books or articles that break the ice for beginners and bring them into the magical world of linked data. I think the first book to thoroughly describe linked data and linked data applications is Tom Heath and Christian Bizer's Linked Data: Evolving the Web into a Global Data Space.

Dr. Tom Heath is the lead researcher at Talis, a leading organisation in linked data and Semantic Web research. Christian Bizer is one of the creators of the well-known DBpedia. They wrote this book to present the state of the art of linked data. The book starts with how linked data came into being and why it is important for the current Web. Then it introduces the principles of linked data, which serve as the basic knowledge for publishing data on the Web.

Well, you have to know something about the Web first in order to understand why there are four linked data principles. If you know nothing about Web architecture or what a URI is, here are some links you might find helpful:

Architecture of the World Wide Web
What is a URI? (Wikipedia)
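To make the four principles a bit more concrete, here is a minimal sketch in Python (my own illustration, not taken from the book). It writes a few RDF triples by hand in N-Triples format: the thing is named with an HTTP URI, it is described with RDF and the FOAF vocabulary, and it links out to a DBpedia URI. The example.org URI and the person are hypothetical, and a real publisher would normally use an RDF library rather than string formatting.

```python
# Sketch: a few RDF triples following the four linked data principles,
# serialised by hand as N-Triples. The example.org URI is hypothetical.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

# Principles 1 and 2: name the thing with an HTTP URI that can be looked up.
ALICE = "http://example.org/people/alice#me"

TRIPLES = [
    # Principle 3: describe it using a standard (RDF) and a common vocabulary (FOAF).
    (ALICE, RDF_TYPE, FOAF + "Person"),
    (ALICE, FOAF + "name", '"Alice"'),
    # Principle 4: link to URIs in other datasets so clients can discover more.
    (ALICE, FOAF + "based_near", "http://dbpedia.org/resource/Southampton"),
]


def term(value: str) -> str:
    """Literals are passed in already quoted; anything else is written as an IRI."""
    return value if value.startswith('"') else f"<{value}>"


def to_ntriples(triples) -> str:
    """One 'subject predicate object .' statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
```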

This book covers nearly everything you would like to know about linked data today, such as how to choose URIs, dereferencing methods and how to choose vocabularies. It also mentions many tools (such as D2R Server, Sindice Inspector, etc.) that are useful for developers who want to publish data.
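One of the dereferencing methods is the 303 redirect: a URI that names a real-world thing answers with "303 See Other" and points the client to a document about that thing, choosing RDF or HTML by content negotiation. The sketch below models only that decision as a plain Python function, with no real web server. The /resource, /data and /page path layout is an assumption made for this sketch, and the Accept handling ignores q-values, so treat it as an illustration rather than a complete implementation.

```python
from typing import Dict, Tuple

# Media types treated as "the client wants RDF" in this sketch.
RDF_TYPES = ("text/turtle", "application/rdf+xml", "application/n-triples")


def prefers_rdf(accept: str) -> bool:
    """Rough content negotiation: ignores q-values, just looks for an RDF media type."""
    offered = [part.split(";")[0].strip().lower() for part in accept.split(",")]
    return any(media_type in RDF_TYPES for media_type in offered)


def dereference(path: str, accept: str) -> Tuple[int, Dict[str, str]]:
    """Status and headers a hypothetical server would return for a GET on `path`."""
    if path.startswith("/resource/"):
        name = path[len("/resource/"):]
        # The URI names a thing, not a document, so answer 303 See Other
        # and point the client at a document describing the thing.
        target = f"/data/{name}" if prefers_rdf(accept) else f"/page/{name}"
        return 303, {"Location": target, "Vary": "Accept"}
    if path.startswith("/data/"):
        return 200, {"Content-Type": "text/turtle"}
    if path.startswith("/page/"):
        return 200, {"Content-Type": "text/html"}
    return 404, {}


if __name__ == "__main__":
    print(dereference("/resource/alice", "text/turtle"))
    print(dereference("/resource/alice", "text/html,application/xhtml+xml"))
```

The alternative, hash URIs (for example http://example.org/about#alice), needs no redirect at all, because the client strips the fragment before sending the HTTP request and gets back the document that describes everything under that URI.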

One of the figures I really like is Figure 5.1, the linked data publishing workflow:

Linked data publishing options and workflow
This figure covers most of the routes leading from traditional structured data to linked data. More importantly, it shows that publishing linked data is not a one-step task; you have to be patient and carefully evaluate each step.
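As a rough idea of what the simplest route looks like, the sketch below takes a small table (as CSV), mints one HTTP URI per row from its primary key, and emits N-Triples. The data, base URI and vocabulary are all made up for illustration. D2R Server does something similar for relational databases using its own mapping language, but this code is not D2R's API, and the escaping only covers what this simple data needs.

```python
import csv
import io

# Hypothetical URI space and vocabulary for the published data.
BASE = "http://example.org/id/building/"
VOCAB = "http://example.org/vocab#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Made-up "traditional" structured data, e.g. an export from a spreadsheet.
SOURCE = "id,name,floors\n1,Main Library,4\n2,Physics Building,6\n"


def escape(text: str) -> str:
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_ntriples(text: str) -> list:
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        # Mint one stable HTTP URI per row from its primary key.
        subject = f"<{BASE}{row['id']}>"
        lines.append(f'{subject} <{RDFS_LABEL}> "{escape(row["name"])}" .')
        lines.append(f'{subject} <{VOCAB}floors> "{int(row["floors"])}"^^<{XSD_INTEGER}> .')
    return lines


if __name__ == "__main__":
    print("\n".join(rows_to_ntriples(SOURCE)))
```

Minting URIs from primary keys keeps them stable when the table is re-exported, which matters because other people's data will link to those URIs.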

This is a great book, at least from a technician's point of view. Anyone who wants to learn about linked data, or any developer who wants to move their systems to linked data, should not miss this book. But don't forget that modern technologies are usually driven by big companies. Only the market and users can finally decide which technology survives, and the same is true for linked data. I wonder, when I look back at this book in ten years' time, how much of its content will have become widely accepted and how much will simply have vanished.


Monday, 25 July 2011

A short tutorial on linked data: Southampton ECS Web Team › WTF is the Semantic Web?

The University of Southampton is one of the few pioneers around the world to publish its whole university data to the linked data cloud. You can find the data at:

Building, transport, curriculum, financial, classroom and other information is all published as linked data. Chris Gutteridge is the main developer of this open data. Here is his short tutorial on linked data:

Southampton ECS Web Team › WTF is the Semantic Web?

Saturday, 23 July 2011

Want a Semantic Web / linked data job?

Twenty years ago, if somebody had posted a job advert saying he or she wanted to employ a web developer, nobody would have understood what that was! But look at what is happening now. Not to mention big business, even the tiniest business needs a website to advertise itself. Everybody knows what a Web developer is, and there are millions of Web developers out there.

Will this be a similar story for linked data developers in 20 years' time? Today we know PHP, Java, XML, JavaScript, etc. Later there might be job requirements for SPARQL, OWL, RDF, etc. How far away are these Semantic Web technologies from our daily life?

It seems to me that, currently, jobs in linked data and the Semantic Web mostly stay within big research institutions and big companies that "live" on the Web, such as Google, Yahoo, Microsoft, Facebook, etc. There are several pioneer companies, such as Talis, Garlik, Seme4 and Ordnance Survey, that focus on Semantic Web and linked data solutions. These companies provide Semantic Web software and help you publish your data into the linked data cloud. If you go to, you will find more companies or organisations like these.

To make the Semantic Web and linked data more popular, there is a gap we have to fill: the gap between research and money. If we want to persuade Lord Sugar to invest in a company doing linked data, or to shift his current business to linked data, we have to give him a clear idea of the budget and where the profit lies. Every businessman will ask a similar question when it comes to linked data: "How can linked data make money for me? And what benefit can linked data bring me that is impossible with current technologies?"

As far as I know, it's not an easy question to answer. At least, it's pretty difficult for me right now to give a single example of something linked data can do (I mean can do now, not in the future) that Google, IBM or Microsoft cannot. However, I believe the big vision of linked data is so ambitious that it cannot be achieved in a couple of days. Businesspeople in the UK have already sat down together in the mydata initiative to talk about how open data can eventually bring great profit to business. We need to be patient.

I hope that after a while, when I come back to or, I will find some jobs under the category of Semantic Web or linked data publishing and bid on them to earn some pocket money :=)

Thursday, 21 July 2011

Why linked data works

Why does linked data work, and what is the difference between the Semantic Web and linked data? Here is a famous presentation given by Tim Berners-Lee at TED.

In my opinion, linked data is the way that leads to the big vision of the Semantic Web. Around ten years ago, when Tim first described the vision of the Semantic Web, we didn't have a clear idea of how the goal could be achieved. Many attempts have been made since then: RDF, RDFS, the Web Ontology Language, the SPARQL query language, RDFa, Jena, Sesame, triple stores, the AKT project, etc. However, the Semantic Web remained a mysterious "weapon" controlled only by a group of researchers. Ordinary Web users were not able to enjoy the benefits Tim described ten years ago.

Recently, the emergence of linked data has become the ice-breaker. More and more areas of everyday life have started to apply linked data. One famous example is the open government initiative. So, what is the difference between the Semantic Web and linked data?

The whole idea of the Semantic Web is based on a mutual understanding of knowledge. The old way of reaching this mutual understanding is for everybody to sit down and have a chat, in the hope that some day we can agree on certain vocabularies to describe the knowledge. I call this the top-down route to the Semantic Web. But, to be honest, is that possible? When I was doing my master's degree at the University of Southampton, the lecturer asked all the students to create an ontology to describe pizza. Some students categorized pizzas as vegetarian or non-vegetarian, while others categorized them by the thickness of the base. Well, if we cannot agree on the description of a simple pizza, how can we agree on more complex concepts like planes and cars?

OK, the top-down route is not that realistic. So the pioneers of the Semantic Web found another way of doing it, which is the idea of linked data. In his TED talk, Tim emphasized the importance of putting raw data online. This is the basic idea of linked data: I don't care what vocabulary you use to describe the things you want to publish, just publish it! Well, there ARE four linked data principles that publishers should follow, but at least we don't need to agree on a complex ontology before we can even publish data. I call this the bottom-up way.
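Here is a toy illustration of the bottom-up idea, using the pizza example: two hypothetical shops describe the same pizza with their own URIs and their own vocabularies (one cares about vegetarian or not, the other about crust thickness), and nobody agrees on a shared ontology first. Later, an owl:sameAs link states that the two URIs name the same thing, and a consumer can merge both descriptions. The shop URIs and vocabularies are made up, and treating owl:sameAs as a simple merge is a simplification of real OWL reasoning.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Two hypothetical publishers, each with its own URIs and its own vocabulary.
SHOP_A = [("http://a.example/pizza/margherita", "http://a.example/vocab#isVegetarian", "true")]
SHOP_B = [("http://b.example/menu/42", "http://b.example/terms#crustThickness", "thin")]

# Published later, by either shop or a third party: both URIs name the same pizza.
LINKS = [("http://a.example/pizza/margherita", OWL_SAME_AS, "http://b.example/menu/42")]


def same_as_resolver(triples):
    """Union-find over owl:sameAs links; returns a function mapping a URI to its group's representative."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving keeps chains short
            uri = parent[uri]
        return uri

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                # Deterministic choice: the alphabetically smaller URI becomes the representative.
                parent[max(root_s, root_o)] = min(root_s, root_o)
    return find


def describe(uri, *datasets):
    """All (predicate, object) pairs stated about `uri` or anything owl:sameAs it."""
    triples = [t for dataset in datasets for t in dataset]
    find = same_as_resolver(triples)
    target = find(uri)
    return sorted((p, o) for s, p, o in triples if p != OWL_SAME_AS and find(s) == target)


if __name__ == "__main__":
    print(describe("http://b.example/menu/42", SHOP_B))
    print(describe("http://b.example/menu/42", SHOP_A, SHOP_B, LINKS))
```

Without the link, asking about shop B's pizza only returns the crust thickness; once the link is published, the same question also returns shop A's vegetarian flag, even though the two shops never agreed on a vocabulary.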

So, why does the bottom-up way work? Here is a metaphor. Centuries ago, there were languages like English, Chinese, French, etc. The Chinese didn't speak English, and they were not even bothered to learn it, because they didn't know the British. With the growth of world trade, the Chinese, Japanese, French and British had to do business with each other, and here came the problem: we don't speak the same language! Somebody had a solution: let's create a completely new language that is "simple" enough for people from every country to understand, which is just like the top-down method. Unfortunately, so far all such efforts have been fruitless. The reality now is that English has become the de facto "world language". I don't speak Japanese, but I can talk to a Japanese person in English. English became a world language because world trade grew and communication among different cultures became necessary. So the idea of linked data is like putting products onto the world market first and letting the market choose how knowledge representations connect to each other. Just as with English, we should put data openly online first. We should admit that there are diverse understandings of concepts, and only then worry about how to reach a mutual understanding while still keeping that diversity.

Why do we need to apply linked data principles to multimedia?

There has been fast growth in multimedia sharing and annotation applications on the Web, which generate a great number of annotations for multimedia resources. However, indexing these annotations to improve searching for media fragments, rather than whole multimedia resources, is still not satisfactory. Many people have had the experience of going through a very long video just to find the one or two minutes that are actually useful.

Linked data describes a series of methods for publishing structured data in a machine-readable format using Semantic Web technologies. So the basic idea of linked data is to encourage people to put their data online in a generalised format, so that everybody, including machine agents, can make use of it.

As I said, current video search results are not satisfactory, because many annotations are still at the level of the WHOLE multimedia resource. In most applications, descriptions, tags and comments annotate only the whole multimedia resource. In addition, they are not connected with media fragments, and apart from traditional search engines there is no efficient mechanism to interlink media fragments and annotations across different repositories. That's why we need linked data to break down these barriers.
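As a sketch of what annotating fragments instead of whole videos could look like, the code below uses the temporal syntax of the W3C Media Fragments URI 1.0 specification (#t=start,end, in seconds) to give each fragment its own URI, attaches dcterms:subject annotations to those URIs, and then looks fragments up by topic. The video URL and the annotations are hypothetical, and the parser only handles plain-second values, not the full syntax in the spec (for example hh:mm:ss clock times).

```python
from urllib.parse import parse_qs, urldefrag

DCT_SUBJECT = "http://purl.org/dc/terms/subject"
VIDEO = "http://example.org/videos/talk.mp4"  # hypothetical video


def fragment_uri(video: str, start: float, end: float) -> str:
    """URI for the fragment from `start` to `end` seconds, e.g. ...#t=60,120."""
    return f"{video}#t={start:g},{end:g}"


def temporal_fragment(uri: str):
    """(start, end) in seconds from a '#t=...' fragment, or None if there is none.

    Simplified: only plain-second values such as 't=60,120', 't=npt:60,120',
    't=60' (open end) and 't=,120' (from the start) are handled.
    """
    _, fragment = urldefrag(uri)
    params = parse_qs(fragment)  # fragment is name=value pairs joined by '&'
    if "t" not in params:
        return None
    value = params["t"][0]
    if value.startswith("npt:"):
        value = value[len("npt:"):]
    start_text, _, end_text = value.partition(",")
    start = float(start_text) if start_text else 0.0
    end = float(end_text) if end_text else None
    return start, end


# Annotations attached to fragments rather than to the whole video.
ANNOTATIONS = [
    (fragment_uri(VIDEO, 60, 120), DCT_SUBJECT, "http://dbpedia.org/resource/Linked_data"),
    (fragment_uri(VIDEO, 300, 330), DCT_SUBJECT, "http://dbpedia.org/resource/Semantic_Web"),
]


def find_fragments(topic: str):
    """Fragment URIs (with their time ranges) annotated with the given topic URI."""
    return [(s, temporal_fragment(s)) for s, p, o in ANNOTATIONS if p == DCT_SUBJECT and o == topic]


if __name__ == "__main__":
    print(find_fragments("http://dbpedia.org/resource/Semantic_Web"))
```

Because each fragment has its own URI, a search can jump straight to the half-minute about the Semantic Web instead of returning the whole talk, and annotations in other repositories can link to the same fragment URI.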