Monday, December 20, 2004

Flaw in Google's New Desktop Search Program

[The Rice University group] was able to create a Java program that makes network connections back to the computer from where it was downloaded and then make it appear as if it were asking for a search at [That is] enough to fool the Google desktop software into providing the user's search information. The program was able to do anything with the results, including transmitting them back to the attacking site.

Friday, December 17, 2004


Ever use Skype?
Many signs suggest that Skype will become the dominant IP telephony technology.
It is the top rated Net phone on CNET.

Ever heard about Jyve?
This is the community for Skype users.
And the word "community" is not just a buzzword here.
Jyve offers ultra-specialized services for finding people that share interests.
It promises to become a major entry point for Skype and a credible marketplace for businesses.

Friday, December 10, 2004

Google "Suggest"

A quite useful "topic browsing" tool.

While you type a query, the most popular variations of your query are proposed.

This is interesting because many research papers from the late 1990s proposed this kind of feature to improve search efficiency. However, it was usually offered as a list of phrases that you could only see after performing an initial search. Google expands your query while you type!
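To make the idea concrete, here is a minimal sketch of "suggest while you type": given a prefix, return the most popular logged queries that start with it. This is only an illustration of the general technique, not how Google implements it. The class name PrefixSuggester, the in-memory query log and the counts are all made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PrefixSuggester {
    // Hypothetical query log: query text -> number of times it was issued
    private final Map<String, Integer> popularity = new HashMap<>();

    // Add `count` occurrences of a query to the log
    public void record(String query, int count) {
        popularity.merge(query.toLowerCase(), count, Integer::sum);
    }

    // Return up to `limit` logged queries that start with `prefix`, most popular first
    public List<String> suggest(String prefix, int limit) {
        String p = prefix.toLowerCase();
        List<Map.Entry<String, Integer>> matches = new ArrayList<>();
        for (Map.Entry<String, Integer> e : popularity.entrySet()) {
            if (e.getKey().startsWith(p)) {
                matches.add(e);
            }
        }
        // Sort by descending count; break ties alphabetically
        matches.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < matches.size() && i < limit; i++) {
            result.add(matches.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        PrefixSuggester s = new PrefixSuggester();
        s.record("google desktop", 50);
        s.record("google suggest", 120);
        s.record("google news", 80);
        s.record("skype", 200);
        // Prints the two most popular queries starting with "goo"
        System.out.println(s.suggest("goo", 2));
    }
}
```

A linear scan over the log is fine for a toy example; a real service would more likely keep the queries in a trie or a sorted index so that each keystroke is answered quickly.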

Now, can they create the same thing in a multilingual version?

Wednesday, December 01, 2004

How to extract information from text with Balie

Balie is a kick start for the extraction of information from multilingual texts.

It identifies the language, tokenizes the text, finds sentence boundaries and guesses part-of-speech.

Here is the Java code for all those steps:

// Load language identification module
LanguageIdentification li = new LanguageIdentification();

// Create a tokenizer for the appropriate language
Tokenizer t = new Tokenizer(li.DetectLanguage(strText), true, true);

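// Tokenize the text (the call is missing from this listing; the method name below is assumed from Balie's naming style)
t.Tokenize(strText);
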
// Obtain the token list
TokenList tokenList = t.GetTokenList();

Then, you can loop through the tokenList and extract key phrases, named entities or semantic roles, or use the term frequency information for text classification or text clustering.
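As an illustration of the last idea, here is a small sketch that computes term frequencies from a list of tokens. Since the accessors of Balie's TokenList are not shown above, plain Java strings stand in for the tokens; the class name TermFrequency and the sample tokens are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TermFrequency {
    // Count how often each term occurs, ignoring case.
    // Plain strings are used here as a stand-in for Balie tokens (an assumption).
    public static Map<String, Integer> count(List<String> tokens) {
        Map<String, Integer> tf = new TreeMap<>();
        for (String token : tokens) {
            String term = token.toLowerCase();
            // Skip empty tokens and tokens that do not start with a letter or digit (e.g. punctuation)
            if (!term.isEmpty() && Character.isLetterOrDigit(term.charAt(0))) {
                tf.merge(term, 1, Integer::sum);
            }
        }
        return tf;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "Balie", "tokenizes", "the", "text", ",",
                "then", "tags", "the", "text", ".");
        // Prints each term with its count, in alphabetical order
        System.out.println(count(tokens));
    }
}
```

The resulting map can be turned into a feature vector for a classifier or a clustering algorithm, usually after removing very common words.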