Wednesday, January 18, 2006

Call for Computer-generated papers

The conference AAAI 2006 will have a workshop on Computational Aesthetics: AI Approaches to Beauty & Happiness. In this workshop, computer-generated papers (1 page) are accepted!

Monday, January 16, 2006

The Nébuloscope

The Nébuloscope is a French tool that bridges information retrieval and tag clouds. Given a query on an index, it returns a tag cloud made of automatically extracted keyphrases.

Here's the cloud for the query "carnaval de québec":

Friday, January 13, 2006

Reporters Without Borders urges Internet users and bloggers to support its recommendations on freedom of expression.

On 6 January, Reporters Without Borders issued six concrete proposals aimed at ensuring that Internet-sector companies respect free expression when operating in repressive countries. The organisation calls on bloggers and Internet users to sign an online petition in support of this initiative.

I prefer linking to their press release to signing yet another online petition.

Wednesday, January 11, 2006

Turn your NLP application into a powerful UIMA annotator

The UIMA framework lets you plug information-processing software into a powerful and flexible service environment. UIMA manages input (e.g., batch processing), parallel or distributed execution, logging, communication with other modules, and delivery (e.g., as a web service).

Here's how you can create a (minimal) UIMA annotator with your NLP software. In this example, I work in Eclipse 3.0:

  1. Define the type you will annotate using a Type System Descriptor File object (File > New > Other > UIMA > Type System Descriptor File). For instance, define a type "MyNamedEntity" if your system extracts named entities from text. If you've installed the UIMA-Eclipse plug-in, you can simply fill in the "Type System" tab.
  2. Create a class that extends the "JTextAnnotator_ImplBase" class. This is the default implementation for an annotator that processes textual data.
  3. Your class must implement at least the "process" method and, basically, this method can be written in 5 parts:

    3.1) Obtain the text to process: String docText = aJCas.getDocumentText();
    3.2) Call your NLP system
    3.3) For each element of information found by your system, create a typed object as defined at step 1, for instance: MyNamedEntity ne = new MyNamedEntity(aJCas);
    3.4) Set the properties of this object, at minimum its begin and end positions, counted in characters from the start of the document. Ex.: ne.setBegin(999);
    3.5) Add the object to UIMA indexes: ne.addToIndexes();

  4. Finally, create an Analysis Engine Descriptor File (File > New > Other > UIMA > Analysis Engine Descriptor File) that describes your service. If you've installed the UIMA-Eclipse plug-in, you can simply fill in the "Type System" and "Capabilities" tabs.
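
The part of step 3 that usually needs care is 3.4, the character offsets. Here is a minimal sketch in plain Java, with no UIMA dependency, of an NLP system that reports each result as a begin/end pair counted from the document start. Everything in it is an assumption made for illustration: the class OffsetSketch, the Span holder, and the toy rule that treats a run of two or more capitalized words as a named entity. Your own system goes in its place. The comments mark where the UIMA calls from steps 3.1 to 3.5 would fit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OffsetSketch {

    /** Hypothetical holder for one result of the NLP system. */
    static final class Span {
        final int begin;   // offset of the first character, from document start
        final int end;     // offset just past the last character
        final String text; // the covered text, kept for checking

        Span(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    // Toy rule (an assumption, not a real NER system): a run of two or
    // more capitalized words separated by single spaces.
    private static final Pattern CAPITALIZED_RUN =
            Pattern.compile("\\p{Lu}\\p{L}+(?: \\p{Lu}\\p{L}+)+");

    /** Step 3.2: call the NLP system on the document text. */
    static List<Span> findEntities(String docText) {
        List<Span> spans = new ArrayList<Span>();
        Matcher m = CAPITALIZED_RUN.matcher(docText);
        while (m.find()) {
            spans.add(new Span(m.start(), m.end(), m.group()));
        }
        return spans;
    }

    public static void main(String[] args) {
        // Step 3.1: in the annotator this would be aJCas.getDocumentText().
        String docText =
                "The petition was launched by Reporters Without Borders in January.";

        for (Span s : findEntities(docText)) {
            // Steps 3.3 to 3.5: in the annotator, this is where you would
            // create a MyNamedEntity from aJCas, set its begin and end
            // from s.begin and s.end, and call addToIndexes().
            System.out.println(s.begin + "-" + s.end + " " + s.text);
        }
    }
}
```

The end offset is exclusive here, so docText.substring(s.begin, s.end) gives back the entity text. That makes it easy to check your offsets before handing them to UIMA.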

That's all.

A sample implementation of a named-entity annotator can be found on the Balie CVS.

Thursday, January 05, 2006

.NET compliant Python

Let the future begin.

Microsoft releases a distribution of the Python programming language that is "well integrated" with the rest of the .NET programming framework. The project is codenamed IronPython and is available for download under an open source license.

Wednesday, January 04, 2006


Quaero (the Latin word meaning "to seek") is a European (Franco-German?) initiative aimed at creating a multimedia search engine with a wide range of functionalities.

The idea is clearly to create a search platform that benefits European industry in the face of (mainly) US domination. The business model explicitly mentions "Search software licensing".

Among the interesting aspects of Quaero is its partnership with Exalead, a promising search engine company from France. Quaero promises to deliver solutions for multimedia search (speech-to-text, OCR, image recognition), multilingual search (cross-language IR, translation) and automatic annotation (named-entity recognition, categorization).