musings on search

For general rambling.
Jonathan
Grand Pooh-Bah
Posts: 6722
Joined: Tue Sep 19, 2006 8:45 pm
Location: Portland, OR

Post by Jonathan »

seems to me like there are two kinds of p2p file search. the first kind is when you're searching for specific content: "Battlestar Galactica Season 1 Episode 1." the second is when you're searching for generic content: "show me videos with puppies in them!" are these semantically (and thus, possibly structurally) different kinds of queries, or am i being needlessly pedantic?

i bring it up because the p2p systems i have used have generally slotted into being better at one kind of search. napster: specific content. kazaa: generic content. bt: specific content. i wonder if these differences are due to the different methods by which each system implemented file discovery. if so, that suggests a Better system which implements either two different file discovery systems or some kind of hybrid approach.
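
to make the distinction concrete, here is a rough sketch of the two query shapes against a toy in-memory catalog. everything in it is an assumption for illustration (the file names, the tags, the function names); it is not how napster, kazaa, or bt actually implemented discovery. a specific query is an exact-key lookup, while a generic query intersects the sets of files that carry each requested tag.

```python
from collections import defaultdict

# Toy catalog: every file name and tag below is made up for illustration.
CATALOG = {
    "bsg.s01e01.avi": {"battlestar", "galactica", "scifi"},
    "puppy_park.mpg": {"puppies", "dogs", "cute"},
    "puppy_bath.mpg": {"puppies", "bath", "cute"},
}


def find_specific(name):
    """Specific query: exact-key lookup; returns the name or None."""
    return name if name in CATALOG else None


def build_tag_index(catalog):
    """Build an inverted index mapping each tag to the files that carry it."""
    index = defaultdict(set)
    for name, tags in catalog.items():
        for tag in tags:
            index[tag].add(name)
    return index


def find_generic(index, *tags):
    """Generic query: files that carry every requested tag, sorted by name."""
    sets = [index.get(tag, set()) for tag in tags]
    return sorted(set.intersection(*sets)) if sets else []


if __name__ == "__main__":
    index = build_tag_index(CATALOG)
    print(find_specific("bsg.s01e01.avi"))
    print(find_generic(index, "puppies", "cute"))
```

the two lookups need different data structures (a key table versus an inverted index), which is one way a system could end up better at one kind of search than the other.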

Jason
Veteran Doodler
Posts: 1520
Joined: Fri Jul 18, 2003 12:53 am
Location: Fairfax, VA

Post by Jason »

I do this sort of stuff for a living and I have no idea what you're talking about ...

Jonathan
Grand Pooh-Bah
Posts: 6722
Joined: Tue Sep 19, 2006 8:45 pm
Location: Portland, OR

Post by Jonathan »

Forget what I said for a minute. You do search sort of stuff for a living?

Jason
Veteran Doodler
Posts: 1520
Joined: Fri Jul 18, 2003 12:53 am
Location: Fairfax, VA

Post by Jason »

Part of my current position is to develop solutions for knowledge management. Knowledge management seems to be a catch-all phrase right now, but for me it entails a fair amount of trying to find tech and products to perform extraction, categorization, search, and analysis of very disparate data sources.

A good example would be my last project. We were developing a proposal for the army for their world-wide portal. I was helping to prototype and demo two products that take large quantities of documents and categorize them using different taxonomies, so that they can then be searched by concept. It was a fairly nice demo. I have no idea whether it would work in real life, but still a fairly nice demo.

Peijen
Minion to the Exalted Pooh-Bah
Posts: 2790
Joined: Fri Jul 18, 2003 2:28 pm
Location: Irvine, CA

Post by Peijen »

Jason wrote: stuff
Is the demo similar to what I saw at informedia? One of the competing groups was demoing their recognition engine. They showed a video clip, and their engine narrated "Paul is crossing the street" during the section where a guy crossed the street. My advisor was like "Wow!!!! Look at that, their engine can tell it's Paul by looking at a 5'10" guy from the back of his head!! *snicker*"

Jason
Veteran Doodler
Posts: 1520
Joined: Fri Jul 18, 2003 12:53 am
Location: Fairfax, VA

Post by Jason »

Well I'm not doing anything with video or images ... yet. So far it's pretty much just taking a lot of documents and organizing them so that you only have to look through a subset to find something. I guess the next logical step is to do fact extraction on the documents, but I haven't seen anything that does that really well yet. In essence, I'm trying to take unstructured data and give it structure.

Jason
Veteran Doodler
Posts: 1520
Joined: Fri Jul 18, 2003 12:53 am
Location: Fairfax, VA

Post by Jason »

Actually, I think informedia has come a long way. I recently saw a demo where they had segmented a CNN program for about six months and allowed you to search for a particular topic. I saw something made by BBN which was comparable to what informedia was doing, and it could do it in foreign languages.

Peijen
Minion to the Exalted Pooh-Bah
Posts: 2790
Joined: Fri Jul 18, 2003 2:28 pm
Location: Irvine, CA

Post by Peijen »

No, my point was that the data and results were heavily rigged in a lot of these demos.

Jason
Veteran Doodler
Posts: 1520
Joined: Fri Jul 18, 2003 12:53 am
Location: Fairfax, VA

Post by Jason »

Oh. I don't think the demo was rigged. We took customer (in this case the army) documents, ran them through third-party software, and just changed the names of the topic headings that came out. The results seemed reasonable. I mean, not as good as a human would be, but a lot better than just clumping everything together. The search demo could be considered rigged in that we used a system that was already working at the NIH to show off the product. It took a fair amount of work to get that system running, and it could also be considered a special case because doctors and researchers are anal about documentation and referencing.

bob
Poser
Posts: 344
Joined: Fri Jul 18, 2003 1:26 am
Location: p-town, pa

Post by bob »

Dwindlehop wrote:Forget what I said for a minute. You do search sort of stuff for a living?
I do too. Yesterday, for example, I searched for movies and downloaded Ocean's Twelve. But it turned out to be Harold and Kumar mislabeled. Damn them!
