Semantic web: is it the big thing yet?

Mar 12, 2008

The ‘semantic web’ is not a new idea. In fact, the idea has been around almost as long as the web itself, and its relevance rests on a very simple notion: much of the information that we make publicly accessible via the web is not yet organised according to any particular scheme. As a result, searching it can be very difficult, hence the great rewards that flow to companies like Google; their search engine does the best job of mapping the chaos, turning a vast jumble of information into something that is usable by ordinary humans.

Traditionally, search engines work by looking for the frequency and location of words within a document. When you search for ‘chicken pasta recipe’, the search engine attempts to find pages which contain all of those words. The more often these words occur, the better. If these words appear in the page title, or within page header tags, or in links pointing to the page (the more of these the better) then the ranking of the page improves. Google’s great innovation was counting the incoming links from other sites, but this probably stands as the last truly great innovation in search.
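
To make the keyword approach concrete, here is a minimal Python sketch of that kind of scoring. It is an illustration only: the function names, the weights and the sample page are all assumptions of mine, and real search engines are far more sophisticated than this.

```python
import re

# Weights are arbitrary illustrative values, not taken from any real search engine.
TITLE_WEIGHT = 3.0
LINK_WEIGHT = 0.5


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_score(page, query, inbound_links=0):
    """Toy relevance score: word frequency, title matches and inbound links."""
    terms = tokenize(query)
    body = tokenize(page["body"])
    title = tokenize(page["title"])

    # The page only qualifies if every query word appears somewhere on it.
    if not all(term in body or term in title for term in terms):
        return 0.0

    score = 0.0
    for term in terms:
        score += body.count(term)                  # more occurrences, higher score
        score += TITLE_WEIGHT * title.count(term)  # words in the title count extra
    # Each link pointing at the page nudges the score upwards.
    score += LINK_WEIGHT * inbound_links
    return score


# Hypothetical sample page, invented for this example.
page = {
    "title": "Chicken pasta recipe",
    "body": "A quick chicken and pasta dish. This recipe uses chicken.",
}
print(keyword_score(page, "chicken pasta recipe", inbound_links=4))
```

Notice that nothing here knows what a recipe is. The score is built purely from matching strings of letters, which is exactly the limitation that semantic data is meant to address.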

Until now. Yahoo! is set to make a major step towards semantic searching in an effort to recover market share against Google (as an aside: am I the only one who has noticed the flurry of activity from Yahoo! since Microsoft began their takeover attempt?). For the first time, a major search company is openly talking about making use of the growing amount of semantic data now available on the web. It has been a long wait, but the age of the semantic web is finally here.

What is the semantic web anyway?

Here it is necessary to briefly recap just what the semantic web is. Semantic data is, essentially, data with meaning. In the semantic web, ‘Rob Knight’ isn’t just a combination of letters; ‘Rob Knight’ is a person and the search engine knows it. The search engine may have some contact details, social networking profiles or other pieces of relevant data to display to someone searching for me. This is only possible if I - either directly or via third-party sites that I am a member of (e.g. LinkedIn) - make this data available in a format that explains what the data means. So, instead of a jumble of information scattered all over a page (or several pages), data is made available in machine-readable formats. These are much simpler than HTML pages and are therefore comprehensible to computer systems like search engines. Many of the things that we describe in ordinary language in HTML pages can be re-described in some new semantic format. Microformats provide an easy stepping-stone for converting existing web page data into a semantic format.
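
As a rough illustration of what ‘machine-readable’ means here, the sketch below marks up a person using hCard class names (vcard, fn, org, url) and pulls the fields out with Python’s standard html.parser module. The markup, the URL and the company name are made up for the example, and the parser handles only this simple case; it is not a complete hCard implementation.

```python
from html.parser import HTMLParser

# A small subset of hCard property names; the full microformat defines many more.
HCARD_PROPS = {"fn", "org", "url"}
# Tags that never have a closing tag, so they must not be tracked on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class HCardParser(HTMLParser):
    """Collects hCard properties from HTML into a plain dictionary."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self._stack = []  # hCard properties declared by each currently open tag

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        props = [c for c in classes if c in HCARD_PROPS]
        self._stack.append(props)
        # For links, the value of "url" is the href rather than the link text.
        if "url" in props and attributes.get("href"):
            self.card["url"] = attributes["href"]

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._stack:
            return
        for prop in self._stack[-1]:
            if prop != "url":
                self.card[prop] = self.card.get(prop, "") + text


def parse_hcard(html):
    parser = HCardParser()
    parser.feed(html)
    return parser.card


# Hypothetical markup: the URL and organisation are placeholders.
SAMPLE = """
<div class="vcard">
  <a class="url fn" href="http://example.com/rob">Rob Knight</a>
  <span class="org">Example Ltd</span>
</div>
"""
print(parse_hcard(SAMPLE))
```

The point is that the page still looks like ordinary HTML to a human visitor, while a program can read off the name, organisation and URL without guessing which piece of text is which.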

Perhaps the biggest application will be in product search. Retailers will be able to provide far more detailed information about their products, information that means something to the computer systems responsible for searching and categorising this information. Consider a clothing retailer; each product can be described not just by name, but by colour, size, price, availability and more. When searching for ‘blue jeans’, the search engine is no longer looking for the words ‘blue’ and ‘jeans’ somewhere on a web page; it can recognise that ‘jeans’ are a type of product and that ‘blue’ is an attribute of the product, and can then list all blue jeans that it knows about. And if the user wants to sort by price, they can. And since the search engine will also have semantic information about the retailer - their geographical location, for example - the search engine may be able to find the nearest retailer with the lowest shipping costs.
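
The ‘blue jeans’ example can be sketched in a few lines of Python. Everything below is hypothetical: the Product fields, the catalogue entries and the search function are invented to show the idea of querying attributes rather than words, and do not describe how any real search engine stores its data.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    category: str   # what kind of thing this is, e.g. "jeans"
    colour: str     # an attribute of the product
    price: float
    retailer: str


# Made-up catalogue entries for illustration.
CATALOGUE = [
    Product("Slim-fit jeans", "jeans", "blue", 45.00, "Retailer A"),
    Product("Bootcut jeans", "jeans", "blue", 30.00, "Retailer B"),
    Product("Black jeans", "jeans", "black", 35.00, "Retailer A"),
    Product("Shirt to wear with blue jeans", "shirt", "white", 20.00, "Retailer B"),
]


def search(products, category, colour=None, sort_by_price=False):
    """Return products of the given category, optionally filtered by colour."""
    matches = [
        p for p in products
        if p.category == category and (colour is None or p.colour == colour)
    ]
    if sort_by_price:
        matches.sort(key=lambda p: p.price)  # cheapest first
    return matches


for product in search(CATALOGUE, "jeans", colour="blue", sort_by_price=True):
    print(product.name, product.price, product.retailer)
```

A keyword search for ‘blue jeans’ would happily match the white shirt, because both words appear in its name. The structured search leaves it out, since it is neither blue nor a pair of jeans, and sorting by price comes almost for free.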

The collection of data about a ‘thing’ (be it a person, company, product, place or any identifiable object) into discrete, searchable data files will also make it easier to detect connections between things. For example, a review of a product could link directly to the definition of that product on a manufacturer’s website, rather than linking to a simple HTML page. This enables the search engine to have considerable confidence that the review is about the product rather than, say, about the manufacturer, or the manufacturer’s website. This will enable much better dialogue between consumers, retailers and manufacturers, in a way which will, in the long run, produce changes in the way we do business that are at least as big as the changes made in the last decade.

The immediate impact

Having observed the technology industry for some time, I’m not convinced, despite the advantages of the semantic web, that there is an instant payoff. We’re only at the beginning of the change. And with the global economy looking shaky, investors and companies would be right to be wary of promises of technological salvation. The real work will happen over the next few years, just as the real work of building web 2.0 happened in the shadows of the dot-com crash. Those who innovate sensibly, delivering lower costs and better return on investment, will find themselves well-placed for the upswing, just as those lonely pioneers who carried on inventing after the bubble burst found that they had struck gold once more as the web 2.0 trend appeared.

I will be following this up with a series of posts about semantic web technologies, going into more detail about the specific technologies and their implementations.