Friday, 12 August 2011

HTML5 + Linked Data + Multimedia + TV experience = An HTML5 Leanback TV webapp from

This post really caught my attention:

An HTML5 Leanback TV webapp that brings SPARQL to your living room

"When you are sat on the sofa at the end of the day relaxing and watching TV, maybe eating food and not in the mood to have to keep constantly making decisions about what to watch you might not think that you are in a situation where Linked Data and SPARQL queries could be useful. Yet the flexibility of the data that can be obtained from data sources supporting these technologies makes them ideal candidates to power a Leanback TV experience."

"...By taking an existing template and an existing, very flexible, source of data we can create a whole new way for people to discover content on offer"

Well, people keep asking: "Where is linked data? How can I feel it?" Here is a good example. You could say it could be done with Web 2.0 mashups. Yes, it could. But this example uses the SPARQL endpoint from the Open University, which distinguishes it from the usual Web 2.0 approach. Web 2.0 enables you to publish your data using Web technologies such as RESTful Web services or Ajax. However, applications still don't understand each other, even though experienced developers can "understand" them. In Web 3.0, you publish your data not only with common Web technologies but also with Semantic Web technologies. You represent your data in RDF, publish it through a SPARQL endpoint, use 3XX redirects and content negotiation to dereference your RDF data, and so on. People with creative thinking can then build more powerful applications on top of it. That's linked data!
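To make the "SPARQL endpoint plus content negotiation" part a bit more concrete, here is a minimal Python sketch of the two kinds of HTTP request involved. The endpoint URL, the resource URI and the query are all made up for illustration (they are not the Open University's real addresses), and the code only builds the requests without sending them.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and resource URI, used only for illustration.
ENDPOINT = "http://example.org/sparql"
RESOURCE = "http://example.org/id/video/42"

# An illustrative query: list up to ten things that have a Dublin Core title.
QUERY = """
SELECT ?video ?title WHERE {
  ?video <http://purl.org/dc/terms/title> ?title .
} LIMIT 10
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a GET request in the style of the SPARQL protocol: the query
    travels in the 'query' parameter and the Accept header asks for JSON."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def build_dereference_request(uri: str) -> Request:
    """Build a request that asks for RDF (Turtle) instead of HTML when a
    resource URI is looked up; a server may answer with a 3XX redirect."""
    return Request(uri, headers={"Accept": "text/turtle"})


sparql_request = build_sparql_request(ENDPOINT, QUERY)
rdf_request = build_dereference_request(RESOURCE)
# Passing either request to urllib.request.urlopen would actually send it;
# urlopen follows 3XX redirects on its own.
```

The only difference between a browser and a linked data client here is the Accept header: the same URI can give a person an HTML page and give an application RDF.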

It seems to me that the roadmap described by Tim Berners-Lee is, step by step, coming true. You can never imagine all the ways linked data will be used. You can't! Just because people are so creative.

Monday, 1 August 2011

Is Rich Snippet Saving Linked Data as well as Multimedia?

As I said in my previous post Want a semantic web/ linked data job?, how linked data reaches end users will depend on big companies. The current "big companies" will not necessarily survive the next 10 years. When you are using the latest Google Chrome browser, can you still remember that it was Netscape that first showed a mass audience what a browser is? Anyway, as a researcher and developer of linked data, I have to court the favour of Google, Yahoo and Microsoft, i.e. I have to create what they think is valuable and publish data the way they like... poor me...

Here is an interesting presentation about Google's vision of using Linked data in its search:
How Google is using Linked Data Today and Vision For Tomorrow

Before I looked at the presentation, I expected Google to mention some giant projects in Linked Data, something like DBpedia, Freebase, RKBExplorer, Jena, Sesame, etc. But quite to my surprise, the presentation is more like an advertisement for RDFa and microformats (the markup behind Rich Snippets). Why? Google, you naughty boy! Google is a search engine that indexes web pages, so why would Google develop another big chunk of functionality, like Sindice, to crawl triple stores? Google is clever and already dominates most users' search experience. If you go to Sindice and you are not a semantic Web expert, you will have no idea how to search efficiently or how to understand the search results.

Google has adopted RDFa as the pattern for publishing linked data and uses it to provide better search results. Some vocabularies, such as Event, Reviews, Geo location, etc., are now machine-understandable (or, more exactly, "Google-understandable"). OK, what I really care about is the position of multimedia in Rich Snippets. If you copy the URL of a YouTube video playback page into Sindice's inspector tool, you won't find any RDFa. Is YouTube a company owned by Google? Yes! But where is the RDFa in YouTube?

Well, there are some resources showing that Google is trying to work on RDFa and videos:
Supporting Facebook Share and RDFa for Videos
It's a good step forward: at least we can see some basic metadata about a video in RDFa. And if you do embed these attributes in your page, you will find that your result in Google displays a thumbnail image instead of just a link. Isn't that great?

However, that is not the end. Where are the media fragments then? I am fed up with WHOLE multimedia search now. I need to find media fragments! Can you index media fragments in RDFa? This picture in Google's linked data presentation is quite amazing:

But unfortunately, this is only a mockup! I have recently been working on a demo of indexing UK Parliament debate videos. I have tried to embed RDFa into the debate replay page. See the screencast of the demo on YouTube. I used the Media Fragments URI 1.0 draft and some event ontologies such as Linking Open Descriptions of Events (LODE) to give a simple model of the debate.
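For readers who have not seen a media fragment before: the draft lets you address a time range of a video by adding something like #t=75,120 to its URL. Below is a small Python sketch of building and reading such temporal fragments. It is not the code of my demo; the function names and the example.org URL are made up for illustration, and it only handles plain seconds (optionally with the npt: prefix), not the hh:mm:ss forms the draft also allows.

```python
from typing import Optional, Tuple
from urllib.parse import urldefrag


def _format_seconds(value: float) -> str:
    """Write whole seconds without a trailing '.0' (75.0 -> '75')."""
    return str(int(value)) if float(value).is_integer() else str(value)


def make_temporal_fragment(media_url: str, start: float, end: float) -> str:
    """Return a media fragment URI addressing [start, end) in seconds."""
    if start < 0 or end <= start:
        raise ValueError("need 0 <= start < end")
    return f"{media_url}#t={_format_seconds(start)},{_format_seconds(end)}"


def parse_temporal_fragment(
    uri: str,
) -> Tuple[str, Optional[float], Optional[float]]:
    """Split a URI into (base URL, start, end).

    start and end are None when there is no 't' dimension; end is None
    when only a start time is given; a missing start means 0.
    """
    base, fragment = urldefrag(uri)
    for part in fragment.split("&"):
        name, _, value = part.partition("=")
        if name == "t":
            if value.startswith("npt:"):
                value = value[len("npt:"):]
            start_text, _, end_text = value.partition(",")
            start = float(start_text) if start_text else 0.0
            end = float(end_text) if end_text else None
            return base, start, end
    return base, None, None


# Hypothetical debate video URL, for illustration only.
clip = make_temporal_fragment("http://example.org/debate.mp4", 75, 120)
```

Once a fragment has its own URI like this, it can be the subject of RDFa statements in the same way a whole page or a whole video can.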

I think Google will index media fragments embedded in RDFa one day. I can't see any reason why not. The same will go for Bing, Yahoo, or even Baidu in China~~~

Thursday, 28 July 2011

Linked Data Books (1): Linked Data - Evolving the Web into a Global Data Space

Compared with Computer Science, the World Wide Web and Artificial Intelligence, linked data is a relatively new subject to learn. So it is important to find some books or articles that break the ice for beginners and bring them into the magic world of linked data. I think the very first book to thoroughly describe linked data and linked data applications is Tom Heath and Christian Bizer's Linked Data - Evolving the Web into a Global Data Space.

Dr. Tom Heath is the lead researcher at Talis, which is one of the leading organisations in linked data and the semantic Web. Christian Bizer is one of the creators of the well-known DBpedia. They wrote this book to demonstrate the state of the art of linked data. The book starts with how linked data came into being and why it is important in the current Web. Then they introduce the principles of linked data, which serve as the basic knowledge for publishing data on the Web.

Well, you have to know something about the Web first in order to understand why there are four linked data principles. If you know nothing about Web architecture or what a URI is, here are some links you might find helpful:

Architecture of World Wide Web
What is URI on wikipedia

This book covers nearly everything you would like to know about linked data today, such as how to choose URIs, dereferencing methods and choosing vocabularies. It also mentions many tools (such as D2R Server, the Sindice inspector, etc.) that are useful for developers publishing data.

One of the figures I really like is Figure 5.1, the linked data publishing workflow:

Linked data publishing options and workflow
This figure covers most of the routes leading from traditional structured data to linked data. More importantly, it shows that publishing linked data is not a one-step task: you have to be patient and carefully evaluate each step.

This is a great book, at least from a technician's point of view. Anyone who wants to learn about linked data, and any developer who wants to move their systems to linked data, should not miss it. But don't forget that modern technologies are usually driven by big companies. Only the market and users can finally decide which technology survives. The same is true of linked data. I wonder, when I look back at the content of this book in ten years, how much of it will have become well accepted and how much will have just vanished.

Monday, 25 July 2011

A short tutorial of Linked data: Southampton ECS Web Team › WTF is the Semantic Web?

The University of Southampton is one of several pioneers around the world in publishing university-wide data to the linked data cloud. You can find the data at:

Information about buildings, transport, curriculum, finances, classrooms, etc. is all published as linked data. Chris Gutteridge is the main developer of this open data. Here is his short tutorial on linked data:

Southampton ECS Web Team › WTF is the Semantic Web?

Saturday, 23 July 2011

Want a semantic web/ linked data job?

20 years ago, if somebody had posted a job advert saying he or she wanted to employ a web developer, nobody would have understood what that was! But see what is happening now. Never mind big business, every tiny business needs a web site to advertise itself. Everybody knows what a Web developer is, and there are thousands, even millions, of Web developers out there.

Will it be a similar story for linked data developers in 20 years' time? We know PHP, Java, XML, JavaScript, etc. Later there might be job requirements for SPARQL, OWL, RDF, etc. How far away are these semantic Web technologies from our daily life?

It seems to me that currently, jobs in linked data and the semantic Web usually stay in big research institutions and big companies that are "living" on the Web, such as Google, Yahoo, Microsoft, Facebook, etc. There are several pioneer companies, such as Talis, Garlik, Seme4 and Ordnance Survey, which focus on semantic Web and linked data solutions. These companies provide semantic Web related software and help you publish your data into the linked data cloud. If you go to, you will find more companies or organisations like those.

To make the semantic Web and linked data more popular, there is a gap we have to fill: the gap between research and money. Certainly, if we want to persuade Lord Sugar to invest in a company doing linked data, or to shift his current business to linked data, we have to give him a clear idea of how big the budget is and where the profit is. Every businessman will ask similar questions when it comes to linked data: "How can linked data make money for me? And what benefit can linked data bring me that is impossible with current technologies?"

It's not an easy question to answer, as far as I know. At least, it's pretty difficult for me now to give a single example of something linked data can do (I mean can do now, not in the future) that Google, IBM and Microsoft cannot. However, I believe that the big vision of linked data is so great that it cannot be achieved in a couple of days. Businesspeople in the UK have already sat down together in the mydata initiative to talk about how open data can finally bring great profit to business. We need to be patient.

I hope that after a while, when I come back to or, I could find some jobs under the category of semantic Web or linked data publisher and bid on those jobs to earn some pocket money :=)

Thursday, 21 July 2011

Why linked data works

Why does linked data work, and what is the difference between the semantic Web and Linked Data? Here is a famous presentation given by Tim Berners-Lee at TED.

In my opinion, linked data is the way that leads to the big vision of the semantic Web. Around 10 years ago, when Tim first described the vision of the semantic Web, we didn't have a clear idea of how the goal could be achieved. Many attempts have been made since then: RDF, RDFS, Web Ontology Language, the SPARQL query language, RDFa, Jena, Sesame, triple stores, the AKT project, etc. However, the semantic Web was still a mysterious "weapon" controlled only by a group of researchers. The mass of Web users were not able to enjoy the benefits Tim described ten years ago.

Only recently did the arrival of linked data become the ice-breaker. More and more areas of everyday life are starting to apply linked data. One famous example is the open government initiative. So, where is the difference between the semantic Web and linked data?

The whole idea of the semantic Web is based on a mutual understanding of knowledge. The old way of reaching this mutual understanding is for everybody to sit down and have a chat in the hope that some day we can agree on certain vocabularies to describe the knowledge. I call it the top-down route to the semantic Web. But, to be honest, is it possible? When I was doing my master's degree at the University of Southampton, the lecturer asked all the students to create an ontology to describe pizza. Some students categorized pizza as vegetarian or non-vegetarian, while others categorized it by the thickness of the pizza. Well, if we cannot agree on the description of a simple pizza, how can we agree on more complex concepts like planes and cars?

OK, the top-down route is not that realistic. So the pioneers of the semantic Web found another way of doing it, which is the idea of linked data. In his TED talk, Tim emphasized the importance of putting raw data online. This is the basic idea of linked data: I don't care what vocabulary you use to describe the things you want to publish, just publish it! Well, THERE ARE four linked data principles that publishers should follow, but at least we don't need to agree on a complex ontology before we can even publish data. I call this the bottom-up way.

So, next, why does the bottom-up way work? Here is a metaphor. Two thousand years ago, there were languages like English, Chinese, French, etc. But the Chinese didn't speak English, and they couldn't even be bothered to learn it, because the Chinese didn't know the British. With the development of world trade, the Chinese, Japanese, French and British had to do business with each other, and here came the problem: we don't speak the same language! Somebody had a solution: let's create a completely new language that is "simple" enough for everybody in every country to understand, which is just like the top-down method. Unfortunately, all such efforts have so far been fruitless. The real situation now is that English has become the de facto "world language". I don't speak Japanese, but I can speak to a Japanese person in English. English became the world language because world trade developed and communication among different cultures became necessary. So the idea of linked data is like putting products into the world market first and letting the market choose how the knowledge representations connect to each other. As with the English language, I expect that we should put data openly online first. We should admit that there are diverse understandings of concepts, and then worry about how to reach a mutual understanding even if we decide to keep that diversity.

Why do we need to apply linked data principles to multimedia?

There has been fast growth in multimedia sharing and annotation applications on the Web, which generate a great number of annotations for multimedia resources. However, the indexing of these annotations to improve search for media fragments, rather than whole multimedia resources, is still not satisfactory. Many people have had the experience of going through a long, long, long video just to find the one or two minutes that are useful.

Linked data describes a series of methods for publishing structured data in machine-readable formats using semantic Web technologies. So the basic idea of linked data is to encourage people to put their data online in a generalised format, so that everybody, including machine agents, can take a look at it.

As I said, current video search results are not satisfactory, because many annotations are still at the level of the WHOLE multimedia resource. In most applications, descriptions, tags and comments annotate only the whole resource. In addition, they are not connected with media fragments, and there is no efficient mechanism, other than traditional search engines, to interlink media fragments and annotations across different repositories. That's why we need linked data to break down these barriers.
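Here is a toy Python sketch of what "annotations connected with media fragments" could look like. Everything in it is an assumption for illustration: the two repositories, the video URLs, the annotation texts and the example.org predicate are all invented, and a real system would use a proper RDF library and an agreed vocabulary. The point is only that once each annotation hangs off a fragment URI, annotations from different repositories can sit in one list and a search can return the fragment, not the whole video.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    fragment_uri: str  # media fragment URI the annotation is about
    text: str          # the tag, comment or description
    source: str        # repository the annotation came from


# Made-up annotations from two hypothetical repositories.
ANNOTATIONS = [
    Annotation("http://example.org/debate.mp4#t=75,120",
               "Question on the education budget", "repository-a"),
    Annotation("http://example.org/debate.mp4#t=300,360",
               "Reply about transport funding", "repository-a"),
    Annotation("http://example.net/news.ogv#t=10,40",
               "Budget announcement summary", "repository-b"),
]

# Hypothetical predicate; not taken from any published vocabulary.
ANNOTATED_WITH = "http://example.org/vocab/annotatedWith"


def to_ntriples(annotation: Annotation, predicate: str = ANNOTATED_WITH) -> str:
    """Serialise one annotation as a single N-Triples statement whose
    subject is the fragment URI. Only backslashes and double quotes are
    escaped, which is enough for the single-line texts used here."""
    literal = annotation.text.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{annotation.fragment_uri}> <{predicate}> "{literal}" .'


def find_fragments(annotations: List[Annotation], keyword: str) -> List[str]:
    """Return the fragment URIs whose annotation text mentions the keyword,
    regardless of which repository the annotation came from."""
    keyword = keyword.lower()
    return [a.fragment_uri for a in annotations if keyword in a.text.lower()]


budget_fragments = find_fragments(ANNOTATIONS, "budget")
first_triple = to_ntriples(ANNOTATIONS[0])
```

Searching for "budget" here gives back two fragments from two different repositories, each pointing straight at the relevant seconds of the video.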