Monday, August 29, 2005

Unstructured Information Management Architecture SDK

IBM's UIMA technology seems to be a HUGE text*-processing toolkit...

* Also audio, video and images.

Update: UIMA seems NOT to include any NLP capability. It is rather a framework that allows cascading modules, crunching corpora and logging anything in between...

From the website:

Unstructured information management (UIM) applications are software systems that analyze unstructured information (text, audio, video, images, etc.) to discover, organize, and deliver relevant knowledge to the user. In analyzing unstructured information, UIM applications make use of a variety of analysis technologies, including statistical and rule-based Natural Language Processing (NLP), Information Retrieval (IR), machine learning, and ontologies.

This technology, the UIMA SDK (Software Development Kit), is an all-Java™ implementation of the UIMA framework, and it supports the implementation, description, composition, and deployment of UIMA components and applications. It also supports the developer with an Eclipse-based development environment that includes a set of tools and utilities for using UIMA.

http://www.alphaworks.ibm.com/tech/uima
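To make the "cascading modules" idea concrete, here is a minimal, self-contained Java sketch of a cascade: each module reads the annotations left by earlier modules on a shared document and adds its own, and a log line is recorded after every stage. The `Annotator` interface, the `Document` and `Annotation` classes and the two example modules are hypothetical illustrations, NOT the UIMA SDK API (real UIMA components work on a shared structure called the CAS and are described with XML descriptors).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a module cascade; NOT the UIMA SDK API.
public class CascadeSketch {

    // One annotation: a labelled span [begin, end) of the document text.
    record Annotation(String type, int begin, int end) {}

    // The shared structure every module reads from and writes to.
    static class Document {
        final String text;
        final List<Annotation> annotations = new ArrayList<>();
        Document(String text) { this.text = text; }
    }

    // A module in the cascade (hypothetical interface).
    interface Annotator {
        String name();
        void process(Document doc);
    }

    // First module: marks whitespace-separated tokens.
    static class Tokenizer implements Annotator {
        public String name() { return "Tokenizer"; }
        public void process(Document doc) {
            int i = 0;
            while (i < doc.text.length()) {
                while (i < doc.text.length() && Character.isWhitespace(doc.text.charAt(i))) i++;
                int start = i;
                while (i < doc.text.length() && !Character.isWhitespace(doc.text.charAt(i))) i++;
                if (i > start) doc.annotations.add(new Annotation("Token", start, i));
            }
        }
    }

    // Second module: reuses the Token annotations and marks capitalized ones.
    static class CapitalizedWordFinder implements Annotator {
        public String name() { return "CapitalizedWordFinder"; }
        public void process(Document doc) {
            List<Annotation> found = new ArrayList<>();
            for (Annotation a : doc.annotations) {
                if (a.type().equals("Token") && Character.isUpperCase(doc.text.charAt(a.begin()))) {
                    found.add(new Annotation("Capitalized", a.begin(), a.end()));
                }
            }
            doc.annotations.addAll(found);
        }
    }

    // Runs every module in order over every document of the corpus,
    // logging the total annotation count after each stage.
    static List<Document> run(List<String> corpus, List<Annotator> cascade, List<String> log) {
        List<Document> results = new ArrayList<>();
        for (String text : corpus) {
            Document doc = new Document(text);
            for (Annotator module : cascade) {
                module.process(doc);
                log.add(module.name() + ": " + doc.annotations.size() + " annotations");
            }
            results.add(doc);
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        List<Document> docs = run(
                List.of("IBM released the UIMA SDK"),
                List.of(new Tokenizer(), new CapitalizedWordFinder()),
                log);
        log.forEach(System.out::println);
        System.out.println(docs.get(0).annotations);
    }
}
```

Running `main` logs `Tokenizer: 5 annotations` and then `CapitalizedWordFinder: 8 annotations`. Keeping a single shared document, rather than passing each module's output to the next, lets any later module build on everything computed before it.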

Friday, August 19, 2005

Automating Blogger in Java (using Atom API)

Here's how to automate Blogger in Java, following the Atom API specification.

The first thing to do is to download the excellent HTTPClient by Ronald Tschalär:

http://www.innovation.ch/java/HTTPClient/

Then, automating Blogger (or any other Atom-based blog tool) takes only the following lines of Java code (I omitted the try-catch):


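// Connect to the Blogger host (this constructor takes a host name, not a URL)
HTTPConnection con = new HTTPConnection("www.blogger.com");
// Send the Blogger credentials using HTTP Basic authentication
con.addBasicAuthorization("Blogger", "username", "password");
// Remove the cookie-handling module from this connection
con.removeModule(Class.forName("HTTPClient.CookieModule"));
// GET the Atom endpoint (the path must be absolute)
HTTPResponse rsp = con.Get("/atom");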


Et voilà!
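For comparison, here is a minimal sketch of the same request using only the JDK's `java.net.http` client (Java 11 or later) instead of the HTTPClient library. The endpoint `http://www.blogger.com/atom` is simply assembled from the snippet above and is an assumption: Blogger's 2005 Atom API has almost certainly changed since, so the live call in `main` may no longer succeed. The `basicAuth` helper builds the standard Basic `Authorization` header by hand, which is why there is no counterpart to the `"Blogger"` realm argument: the header is sent up front rather than in answer to a server challenge.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch using only the JDK (Java 11+) instead of the HTTPClient library.
// The endpoint used in main() is assumed from the 2005 post, not verified.
public class AtomGetSketch {

    // Builds the value of an HTTP Basic "Authorization" header:
    // "Basic " followed by base64("user:password").
    static String basicAuth(String user, String password) {
        String pair = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    // Builds a GET request for the Atom endpoint with Basic authentication.
    // The default HttpClient has no cookie handler, which plays the role
    // of removing HTTPClient's CookieModule.
    static HttpRequest atomRequest(String endpoint, String user, String password) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Authorization", basicAuth(user, password))
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = atomRequest("http://www.blogger.com/atom", "username", "password");
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

Only `main` touches the network; `basicAuth` and `atomRequest` can be checked offline, e.g. `basicAuth("username", "password")` returns `Basic dXNlcm5hbWU6cGFzc3dvcmQ=`.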

Tuesday, August 16, 2005

CLiNE 2005 - A regional conference on computational linguistics

The proceedings of CLiNE 2005 are available online:

http://www.crtl.ca/cline05/papers_enfr.htm

Contributions come mainly from the Ottawa/Gatineau and Montreal regions.

Wednesday, August 10, 2005

Legit Torrents

iBiblio has just opened a Torrent section to share all of its legit content.

http://torrent.ibiblio.org/index.php

You'll find Linux kernels, Eclipse, Project Gutenberg texts and more!

Friday, August 05, 2005

PhD Life Plan


[Comic strip: PhD Life Plan]

(from phdcomics)

Start.com

Microsoft's incubation project Start.com is a search engine that makes brilliant use of Ajax (JavaScript + DOM manipulation + XML + CSS).

A really interesting feature is the ability to "pin" any search result listing on the home page, along with RSS feeds, weather, etc.

Worth a look!

Thursday, August 04, 2005

The equation of a successful software company

The equation of a successful software company:

Best Working Conditions → Best Programmers → Best Software → Profit!

This article gives tons of arguments for why "if you try to skimp on programmers, you'll make crappy software, and you won't even save that much money".

Tuesday, August 02, 2005

Google: King of Innovation!

Filing a patent, for the common mortal in the common organisation, follows an extraordinary invention. For Google, it now seems to be a boring routine made of simple strategic moves.

Their latest move was to file a patent on RSS advertising. The idea is so obvious, and was discussed so long before the application date (Dec. 31, 2003), that the whole story sounds ridiculous.