A conversation with PODZINGER

Wondering how podcasts and other audio files are found by search engines? Find out in my June Biznology newsletter in an interview with two of the people behind the hot audio search engine PODZINGER.

If you’re a subscriber to Revenue Magazine, you’ll see my column in the July-August issue on how podcasts can help your search marketing efforts. In that column, I touched on a new audio search engine called PODZINGER, which can find the actual spoken words in a podcast in response to a search.

I conducted an interview with two of the folks behind PODZINGER: Alex Laats, the President of the Delta Division of BBN (which owns PODZINGER), and Marie Mateer, the Vice President of Commercial Speech Solutions for BBN.

Mike: I am not an expert in speech recognition. Can you give me some background on PODZINGER’s technology?

Alex: BBN started speech recognition in the 1970s and what was interesting about your blog [entry] is that phonemic technology is actually the older of the two technologies. Phonemic has problems because recognizing a sound is not the same as recognizing a word. Our large vocabulary speech recognition techniques do a better job of providing relevant search results because they recognize words.

In the late 1990s, we made a decision to focus on a large vocabulary approach rather than phonemic for two reasons:

better accuracy of recognizing words
you can display the results as text for the searcher to scan

With phonemic approaches, the searcher must listen to the audio [of every match found] and false positives are a worse user experience. We have been funded by DARPA [to develop this technology] because none of their experts believed that phonemic approaches were better [than large vocabulary ones].

Marie: Large vocabulary approaches provide higher accuracy because if some words are not recognized, [the recognizer] can fill in missing words [from context].

Alex: [At some point,] we want to integrate named entity extraction and other [natural language processing] techniques.

If you wanted to find the podcast where Howard Stern was on 60 Minutes, and you typed “Howard Stern on 60 Minutes” as your search, a phonemic system would need the podcast to have the words “Howard Stern on 60 Minutes” recorded in that order to find it [which is very unlikely to be present on the right audio file]. We clearly take advantage of phonemes as a complement—we add the phonemic representations of proper names.

Marie: [One problem with phonemes is that] “Stern” could match the audio phonemes for “best earnings” which would be a false positive. We’re considering using phonemic approaches for search terms not found in our 65,000-word dictionary, but phonemic [has problems because it] is hard to match words spoken with different accents. There may be some applications where phonemic works better because there is a higher proportion of new words [(words not in the speech recognition dictionary), but you can always just add those words and then speech recognition works better again].

We’ll add [a full phonemic analyzer to complement our existing large vocabulary analyzer] when the market needs it—when they say we’re not finding what they’re looking for.
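
[To make Marie's "Stern" versus "best earnings" example concrete, here is a minimal sketch in Python. It is not PODZINGER's or anyone else's actual implementation. The tiny pronunciation table uses simplified ARPAbet-style symbols, and the function names are hypothetical, chosen only for illustration. It shows how a search over the raw phoneme stream can match across a word boundary, while a search over recognized words does not.]

```python
# Toy pronunciation table (assumption: simplified ARPAbet-style symbols,
# hand-written for this example; a real system would use a full lexicon).
PRONUNCIATIONS = {
    "stern": ["S", "T", "ER", "N"],
    "best": ["B", "EH", "S", "T"],
    "earnings": ["ER", "N", "IH", "NG", "Z"],
}


def to_phonemes(words):
    """Flatten a list of words into one phoneme stream, dropping word boundaries."""
    phonemes = []
    for word in words:
        phonemes.extend(PRONUNCIATIONS[word])
    return phonemes


def contains_run(haystack, needle):
    """Return True if `needle` appears as a contiguous run inside `haystack`."""
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


def phonemic_match(query_words, spoken_words):
    """Hypothetical phonemic search: compare phoneme streams only."""
    return contains_run(to_phonemes(spoken_words), to_phonemes(query_words))


def word_match(query_words, spoken_words):
    """Hypothetical word-level search: compare recognized words only."""
    return contains_run(spoken_words, query_words)


spoken = ["best", "earnings"]
query = ["stern"]

# "beST EARNings" contains the phonemes S T ER N, so the phonemic search hits.
print(phonemic_match(query, spoken))  # True (a false positive)
# No recognized word is "stern", so the word-level search does not.
print(word_match(query, spoken))      # False
```

[The sketch ignores accents, recognition errors, and scoring, all of which matter in a real recognizer. It only illustrates why matching sounds is not the same as matching words.]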

Alex: We could do both [a full phonemic analyzer and our existing large vocabulary analyzer] but we’ve decided not to implement a full phonemic analyzer [yet]. But we do add new words with a phonemic representation along with information on its common usage [to aid its recognition by the large vocabulary speech recognizer].

Nexidia [or any phonemic analyzer] works well with Call Center applications [where there is a very diverse set of words being used].

Marie: If you are concerned about performance or memory footprint, phonemic analyzers tend to run faster over just 48 phonemes rather than 65,000 words.

Mike: Why don’t you run into more misses in just 65,000 words? That does not seem like a lot to me, considering how many proper nouns there are.

Marie: For podcasts, people just don’t tend to search for words that are missing. We’re constantly adding and deleting words from the dictionary to keep up with the news. There’s no technology that has all the answers. [One of the good things about a speech recognition approach is that] people can interpolate from context what’s missing from the [snippet when the analyzer misses a word or two].

Alex: We’re going for good search results more than good speech recognition.

Mike: Are you concerned about Google, Yahoo!, or one of the other mainstream search engines entering this space?

Alex: Our target is to provide this technology to partners—so far Google and Yahoo! are experimenting only with searching metadata by authors. Like Amazon’s A9 does with search inside the book, we want to do search inside the audio or video.

Mike: You don’t seem to consider Nexidia a competitor. Who are your competitors?

Alex: The press is talking about Blinkx and Podscope. Podscope uses a phonemic-based approach, so you should try them to see whether you think the results are better than ours. Blinkx does have some speech recognition technology using a large vocabulary approach [like ours].

Mike: Would you say you’re more likely to pursue a strategy of licensing to search engines than pursuing your own public portal?

Alex: I would not say either is more likely or less likely. I think it was a mistake by Overture to give up their public site to go after private label deals, and it came back to bite them because they didn’t have a consumer site pushing them to improve the user experience. We give [licensing and the public portal] equal weight. You can do ads based on both keywords and on the context itself.

Mike: What’s an example of context?

Alex: A combination of words or names that were not purchased [as keywords in AdSense, for example]. For example, Adam Stern is an obscure Red Sox outfielder whom no one has likely purchased as a keyword. By being able to recognize words as opposed to phonemes, Google can show ads based on the Red Sox or baseball, which probably do have paid ads for them.

Mike: Thank you both for your insights today.

Mike Moran
