I worked with IBM folks back in the ’90s who did some of the early work on latent semantic analysis, which was later put into practice in the better-known latent semantic indexing. The original concepts were used to identify the meanings of words: words with multiple meanings can be “disambiguated” (as the techies say) by looking at the surrounding words. But most people are thinking of something far more complicated than that when they talk about semantic search. Recently, I came across a company that makes it simple again, and I think they are onto something.
Herb Roitblat, Ph.D., co-founder of OrcaTec, found a post I wrote a while back on semantic search and complained to me that, although the post laid out some of the problems with semantic search, it missed the biggest one: the folks behind the semantic Web have appropriated the term “semantic search.” Herb put it concisely:
In my opinion, semantic web tools are not much good for search because they force everything into fixed categories and then try to force people to conform to those categories. Just tonight I had the experience of trying to put my blog into a category and had a hard time figuring out which one to choose.
Herb went on to outline how OrcaTec’s new search engine, Truevert, which focuses on green search (environment and sustainability), delivers true semantic search based on some of the same latent semantic analysis ideas I saw at IBM ten years ago. As usual in this field, though, the implementation is worth a lot more than anyone’s idea. Once again, here is how Herb explains the idea behind Truevert:
A search for clothing returns pages about eco-friendly clothing, not the location of the nearest Gap store. A search for meat returns pages that talk about organic meat, wholesome baby food, and the environmental impact of meat, just what you would expect from a green point of view.
Truevert performs this magic without building its own search engine. It is actually a semantic search layer built atop Yahoo! Search. Herb uses an example to illustrate how Truevert works:
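Herb doesn’t say exactly how the layer sits on top of Yahoo! Search. One arrangement that fits the description, assumed here purely for illustration, is a re-ranker: pass the query through to the underlying engine, then reorder what comes back by how well each result fits the green topic. In this minimal Python sketch, `fake_engine` is a hypothetical stand-in for the real engine (no actual Yahoo! API is used), and the fixed `GREEN_TERMS` word list is a crude placeholder for the learned word-usage patterns Truevert relies on.

```python
from typing import Callable, List, Tuple

Result = Tuple[str, str]  # (url, snippet)

# Placeholder vocabulary standing in for a learned "green" topic model (assumption).
GREEN_TERMS = {"organic", "eco-friendly", "sustainable", "recycled",
               "fair-trade", "solar", "environmental"}


def green_score(snippet: str) -> int:
    """Count distinct green-vocabulary words in a result snippet."""
    words = {w.strip(".,;:!?").lower() for w in snippet.split()}
    return len(words & GREEN_TERMS)


def semantic_layer(query: str, base_search: Callable[[str], List[Result]]) -> List[Result]:
    """Fetch results from an underlying engine, then reorder them by topical fit."""
    results = base_search(query)
    # sorted() is stable even with reverse=True, so ties keep the engine's order.
    return sorted(results, key=lambda r: green_score(r[1]), reverse=True)


def fake_engine(query: str) -> List[Result]:
    """Hypothetical stand-in for a general-purpose engine; ignores the query."""
    return [
        ("https://example.com/gap", "Find the nearest Gap store and shop clothing deals"),
        ("https://example.com/eco", "Organic cotton and recycled fabrics for eco-friendly clothing"),
        ("https://example.com/mall", "Clothing stores at the mall"),
    ]


ranked = semantic_layer("clothing", fake_engine)
for url, _ in ranked:
    print(url)
```

Because Python’s `sorted()` is stable, results with equal green scores keep the base engine’s original order, so the layer only overrides the underlying ranking where the topic gives it a reason to.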
If the word “lawyer” appears in a document, then it is likely that “attorney,” “judge,” “case,” and “court,” will also occur. Conversely, if one or more of these related words occurs, then the document is likely to be about a “lawyer” even if that word is not present. Similarly, the word “court” in the company of other words like “ball,” “player,” and “basket” is more likely to be about basketball than about litigation. The model computes the patterns of word usage from a population of documents and uses those patterns to predict what the documents are about. Similarly, when people understand a sentence, each word in the sentence helps to disambiguate the other words in the sentence. For example, consider the sentence, “the tree surgeon examined the young man’s palm.” By the time you get to the word “palm,” you have a pretty good idea what that word means.
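Herb’s description amounts to a word co-occurrence model. The toy Python sketch below is one minimal way to express that idea, not OrcaTec’s actual model: it counts which words appear together in the same documents, estimates how strongly each word predicts a topic word, and scores a new text against that topic even when the topic word itself is missing. The six-document corpus and the helper names (`build_cooccurrence`, `association`, `topic_score`) are made up for illustration; a real system would learn from a far larger population of documents and would likely use a more robust statistic than raw conditional frequencies.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase a text and return its set of distinct words, minus simple punctuation."""
    return {w.strip('.,;:!?"\'').lower() for w in text.split()} - {""}


def build_cooccurrence(docs):
    """Count how many documents contain each word and each unordered word pair."""
    doc_freq = Counter()
    pair_freq = Counter()
    for doc in docs:
        words = tokenize(doc)
        doc_freq.update(words)
        # sorted() gives each pair a canonical (a, b) order with a < b
        for a, b in combinations(sorted(words), 2):
            pair_freq[(a, b)] += 1
    return doc_freq, pair_freq


def association(word, topic, doc_freq, pair_freq):
    """Estimate P(topic in doc | word in doc) from document-level counts."""
    if doc_freq[word] == 0:
        return 0.0  # unseen word: no evidence either way
    key = (word, topic) if word < topic else (topic, word)
    return pair_freq[key] / doc_freq[word]


def topic_score(text, topic, doc_freq, pair_freq):
    """Average association between the words of `text` and a topic word.

    The topic word does not need to appear in `text` at all."""
    words = tokenize(text) - {topic}
    if not words:
        return 0.0
    return sum(association(w, topic, doc_freq, pair_freq) for w in words) / len(words)


# Toy training corpus (made up for illustration).
corpus = [
    "the lawyer argued the case before the judge in court",
    "an attorney and a lawyer met the judge at court",
    "the attorney filed the case with the court",
    "the player took the ball to the basket on the court",
    "a basketball player dribbled the ball down the court",
    "the basketball team practiced on the court with a new ball",
]
doc_freq, pair_freq = build_cooccurrence(corpus)

legal_text = "the attorney asked the judge to postpone the case"  # never says "lawyer"
sports_text = "court ball player basket"  # "court" in a sporting context

for text in (legal_text, sports_text):
    scores = {t: topic_score(text, t, doc_freq, pair_freq) for t in ("lawyer", "basketball")}
    print(text, "->", max(scores, key=scores.get))
```

With only six training documents the numbers are fragile, but the pattern matches Herb’s examples: the first text never mentions “lawyer” yet scores higher for it than for basketball, and “court” in the company of “ball,” “player,” and “basket” tips the second text toward basketball.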
This approach is what distinguishes Truevert from the mainstream search engines, which can’t tell which sense of a word a person intends in a search. Herb explained how that helps in a green vertical search:
For example, a search for “CFL” returns documents about compact fluorescent light bulbs, not the Canadian Football League. A search for refrigerators returns pages about solar and high-efficiency refrigerators. A search for coffee returns pages about fair-trade and organic coffee. It knows what words mean, not just in a dictionary sense, but also in the sense of what’s important to this community. People don’t have to work so hard to find the information that suits their interests.
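The “CFL” example suggests that the community’s own documents decide which sense of a word wins. The Python sketch below illustrates that idea under stated assumptions: each sense of an ambiguous term gets a hand-written list of signature words (the `SENSES` table is hypothetical, not Truevert’s data), and the sense whose signature words occur most often in a community’s corpus is chosen. The two tiny corpora are invented stand-ins for a green community and a sports community.

```python
from collections import Counter

# Hypothetical sense inventory: each sense of an ambiguous term is described by a
# few signature words. These lists are illustrative, not taken from Truevert.
SENSES = {
    "cfl": {
        "compact fluorescent light bulb": {"bulb", "light", "lighting", "energy", "efficient", "watt"},
        "canadian football league": {"football", "team", "season", "game", "quarterback", "grey"},
    },
}


def community_profile(corpus):
    """Count word frequencies across a community's documents."""
    counts = Counter()
    for doc in corpus:
        counts.update(doc.lower().split())
    return counts


def pick_sense(term, profile):
    """Return the sense whose signature words occur most often in the profile."""
    senses = SENSES[term.lower()]
    return max(senses, key=lambda sense: sum(profile[w] for w in senses[sense]))


# Two tiny made-up corpora standing in for different communities.
green_corpus = [
    "switching every bulb to an efficient light saves energy",
    "led and fluorescent lighting cut watt usage at home",
    "solar panels and efficient refrigerators reduce energy bills",
]
sports_corpus = [
    "the football team opened the season with a home game",
    "the quarterback led the team to the grey cup",
]

print(pick_sense("CFL", community_profile(green_corpus)))   # compact fluorescent light bulb
print(pick_sense("CFL", community_profile(sports_corpus)))  # canadian football league
```

The same one-word query resolves differently depending on which community profile it is scored against, which is one concrete reading of knowing “what’s important to this community.”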
Time will tell if Truevert is successful, and whether the technology behind Truevert will spawn a string of other subject-oriented vertical search engines. But the idea that word combinations reveal the subject of documents is a good one, and Truevert appears to be a strong implementation of that idea. I don’t expect vertical search engines to suddenly undermine Google’s dominance of the search business, but I do believe that a smart approach like Truevert’s might finally make some vertical search engines profitable.