As Video Proliferates, Search Tools Struggle to Keep Up
Vendors offer more than basic text-based search, but advanced features are still years away.
By: J. Nicholas Hoover
Information Week
Mar. 31, 2007
For The NewsMarket, a Web site that hosts hundreds of hours of video footage, search is among its most important capabilities. Most of that footage is PR-related background video known as B-roll, used by 10,000 TV news outlets in 144 countries. Journalists need to be able to find relevant video snippets for their stories quickly and easily.
The NewsMarket started out seven years ago trying to develop its own search engine but soon found video search was no easy task. Unlike text search, video search can't rely on the simple parsing of characters and words or on link analysis. Instead, it needs a way to characterize the video content itself, and today that's typically accomplished by using word-based descriptions or some sort of audio-to-text translation. Compared with Google, it's an immature art.
As video explodes on the Web and grows in importance in business, so will the need for sophisticated video search. Many companies want to be able to locate appropriate video, or reuse video they already have, for communicating and training, says Stopher Eagen, U.S. CEO of knowledge management and search software company Autonomy. And they want to make sure the appropriate people can find the video they need when they need it.
VIDEO TAG--YOU'RE IT
The NewsMarket, which now uses Autonomy's video search application, relies mostly on meta tagging its videos, attaching identifying details and, in some cases, even descriptions of the various scenes in the video file itself. The NewsMarket tags content by general subject--or beat--area, as well as by the source and specific topics, like "auto show" or "Microsoft Windows."
Tagging is the typical way of searching video. But Autonomy's and other vendors' search engines do more than just look through the tags. "Tagging sounds great, but you're not really finding the video, you're finding a piece of data about the video," says Suranga Chandratillake, CEO of Blinkx, a consumer video search engine.
Embedded audio is a powerful tool for video search engines, whether through transcription or phoneme search, which searches the actual sounds rather than relying on dictionaries. One common method is automatic audio-to-text transcription. However, transcription can be sloppy, often resulting in meaningless or inaccurate words. "We've tried some tools that were voice to text, and they just didn't work very well," says Rogulja Wolf, streaming content manager at Sandia National Laboratories. Sandia has used BBN and several other companies' products to search its video, which includes everything from meetings to highly technical process reviews. "We spent more time editing than what it was worth," Wolf says. Sandia is now testing Sonic Foundry's phoneme search tool and finds it works better.
Both Autonomy and Sonic Foundry, as well as other vendors such as Nexidia, rely more on phoneme-based search than on transcription. With phoneme search, users type the subject into the search engine, and the software matches what was typed against a list of phonetically spelled words drawn from the audio track of the video being searched. Phonemes aren't matched to text until someone actually searches for them, and even then words don't need to be spelled correctly because the engine is searching for sounds rather than words.
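To make the idea concrete, here is a minimal Python sketch of sound-based matching. It is not any vendor's algorithm: the `RULES` table, the `sound_key` and `search` functions, and the sample track are all hypothetical, and the "phonemes" are a crude spelling-to-sound reduction rather than the output of a real acoustic model. In this toy, the track's sound strings are generated from text purely as a stand-in for what audio analysis would produce.

```python
import re

# Hypothetical toy rules mapping spellings to coarse sound classes.
# A real engine would derive phonemes from the audio with an acoustic model.
RULES = [("ph", "f"), ("ck", "k"), ("c", "k"), ("q", "k"), ("z", "s"), ("x", "ks")]


def sound_key(text):
    """Reduce text to a rough consonant 'sound' string."""
    s = re.sub(r"[^a-z]", "", text.lower())   # keep letters only
    for spelling, sound in RULES:
        s = s.replace(spelling, sound)
    s = re.sub(r"[aeiouyhw]", "", s)           # drop vowels and weak consonants
    s = re.sub(r"(.)\1+", r"\1", s)            # collapse repeated sounds
    return s


def search(track, query):
    """Return start times of segments whose sounds contain the query's sounds."""
    key = sound_key(query)
    return [start for start, sounds in track if key and key in sounds]


# Stand-in for an indexed audio track: (start time in seconds, sound string).
track = [
    (12.5, sound_key("welcome to the auto show")),
    (47.0, sound_key("Microsoft Windows update")),
]

print(search(track, "otto sho"))    # misspelled query still lands on the first segment
print(search(track, "Mikrosoft"))   # misspelled query still lands on the second segment
```

Because the query is converted to sounds only at search time, a misspelling such as "Mikrosoft" reduces to the same key as the correct spelling. The reduction here is far too coarse for production use and would produce many false matches on real data.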
Autonomy uses both phoneme search and transcription. The two act as multipliers, since Autonomy uses a probabilistic model that builds on what it's heard before to clean up phonemes and account for accents in the transcription process. Eagen says Autonomy also can do speaker recognition. For example, if a company has a lot of videos from its CEO, the engine can be taught to recognize any video in which the CEO is speaking.
Since Sonic Foundry was built for searching Webcasts, it reads text in PowerPoint presentations that run in a segregated window alongside the video. "You're improving the accuracy of these results through these multimodal processes," says Rimas Buinevicius, Sonic Foundry's chairman and CEO. Other applications, Autonomy included, can actually read text and subtitles embedded into the video itself.
CHOOSE YOUR WORDS
Autonomy's automatic clustering feature, which The NewsMarket uses, groups content based on tags and other unstructured information. So videos that contain the word "education," either in a tag or somewhere in the audio, would be grouped automatically as soon as the video is uploaded. It can also cluster based on dictionary meanings of words. When users search with Autonomy, the engine also can provide suggestions for related videos based on user history and what other users searched, similar to the way Amazon.com's search works.
Nexidia also categorizes content, though in a more structured way. Users must define lists of words that, for example, make up a product category for a retailer. When the search engine hears those words in a video, it automatically categorizes that video based on the number of times it hears words that are connected to a specific category.
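The counting approach described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Nexidia's implementation: the `categorize` function, the category names, and the word lists are hypothetical, and the "heard" words are supplied as plain text where a real system would take them from the audio search engine.

```python
from collections import Counter


def categorize(heard_words, categories):
    """Count hits per user-defined category and return (best_category, counts).

    best_category is None when no listed word was heard.
    """
    counts = Counter()
    for word in heard_words:
        for name, terms in categories.items():
            if word.lower() in terms:
                counts[name] += 1
    best = max(counts, key=counts.get) if counts else None
    return best, counts


# Hypothetical word lists a retailer might define.
categories = {
    "footwear": {"boots", "sneakers", "sandals"},
    "outerwear": {"jacket", "parka", "coat"},
}

heard = "these boots pair well with a parka and the sneakers are on sale".split()
best, counts = categorize(heard, categories)
print(best, dict(counts))
```

Here the video would be filed under "footwear" because two of its words appear in that list against one for "outerwear." A tie goes to whichever category was counted first, which a real system would likely handle more deliberately.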
Another interesting video search feature is active monitoring. Nexidia's system can listen in on all the live streaming video or videoconferences happening inside an organization and alert employees if it hears words or elements important to their jobs.
One of the remaining hurdles for video search is just that--actually searching the video itself. It's easy to search text meta tags, not so easy to search audio tracks, and searching actual video content is a challenge. As Sonic Foundry's Buinevicius notes, some of the video on the Web is just "five-minute videos of a Coke bottle exploding, and there's nothing to analyze."
With more complex videos, vendors are exploring ways to conduct deeper analysis. Autonomy bought rival Virage three years ago, partially for its research into technology that could detect the split second that the screen fades during a scene change. That technology might let the video search engine break down video content into searchable chapters.
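One simple way to picture fade-based chaptering is to watch average frame brightness and start a new chapter whenever the picture comes back up after going dark. The Python sketch below assumes exactly that and nothing more: the article does not describe how Virage's research works, and the `chapter_starts` function, the `dark_threshold` value, and the brightness figures are all hypothetical.

```python
def chapter_starts(brightness, dark_threshold=0.1):
    """Return frame indexes where a chapter begins.

    brightness: per-frame average brightness, scaled 0.0 (black) to 1.0 (white).
    A new chapter starts at the first frame that is bright again after a fade.
    """
    starts = [0] if brightness else []
    in_fade = False
    for i, level in enumerate(brightness):
        if level < dark_threshold:
            in_fade = True            # screen has faded to (near) black
        elif in_fade:
            starts.append(i)          # picture is back: next chapter begins here
            in_fade = False
    return starts


# Hypothetical brightness readings for ten frames with two fades.
frames = [0.8, 0.7, 0.4, 0.05, 0.02, 0.5, 0.9, 0.85, 0.03, 0.6]
print(chapter_starts(frames))
```

Each returned index could then be paired with the audio or tags for that stretch of video, so a search returns the relevant chapter rather than the whole file. Hard cuts with no fade would need a different signal, such as a sudden change between consecutive frames.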
Autonomy is using that capability to provide more precise search results. For example, it can locate a specific scene that's relevant to a user's search rather than an entire movie. Autonomy also can do limited recognition of elements in a screen. For example, its security video technology uses facial recognition capability to distinguish people who belong in a building from those who don't.
Video search technology that goes beyond ferreting out the familiar from the unfamiliar requires advanced technology that's still years away, Autonomy's Eagen says. But as video becomes more prevalent in business, Autonomy and other search vendors will need to provide those advanced capabilities sooner rather than later.




