
Maui and Wikipedia Miner

Link parkin’:

Maui:

Maui automatically identifies main topics in text documents. Depending on the task, topics are tags, keywords, keyphrases, vocabulary terms, descriptors, index terms or titles of Wikipedia articles.

Wikipedia Miner:

Wikipedia Miner is a toolkit for tapping the rich semantics encoded within Wikipedia.

It makes it easy to integrate Wikipedia’s knowledge into your own applications, by:

  • providing simplified, object-oriented access to Wikipedia’s structure and content.
  • measuring how terms and concepts in Wikipedia are connected to each other.
  • detecting and disambiguating Wikipedia topics when they are mentioned in documents.
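To make the "how connected are two concepts" idea concrete for myself, here is a minimal sketch. It is not Wikipedia Miner's API or its actual measure; it swaps in a plain Jaccard overlap of inbound links, and the `link_overlap` function and the tiny `inbound` table are hypothetical, made up purely for illustration.

```python
def link_overlap(links_a, links_b):
    """Jaccard overlap of two sets of inbound links (0.0 = unrelated, 1.0 = identical).

    Illustrative stand-in only; not the measure Wikipedia Miner implements.
    """
    a, b = set(links_a), set(links_b)
    if not a and not b:
        # Two articles with no inbound links share nothing we can measure.
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical toy data: article title -> titles of articles linking to it.
inbound = {
    "Cat": {"Pet", "Mammal", "Felidae"},
    "Dog": {"Pet", "Mammal", "Canidae"},
    "Granite": {"Rock", "Quartz"},
}

cat_dog = link_overlap(inbound["Cat"], inbound["Dog"])
cat_granite = link_overlap(inbound["Cat"], inbound["Granite"])
print(f"Cat/Dog: {cat_dog:.2f}, Cat/Granite: {cat_granite:.2f}")
```

Articles that are linked to from many of the same pages score closer to 1, and articles with no shared inbound links score 0. A real toolkit would pull the link sets from a Wikipedia dump rather than a hand-written dictionary.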

Bonus, Chromium Compact Language Detector:

Wonderfully, Google has open-sourced most of Chrome’s source code, including the embedded CLD (Compact Language Detector) library that’s used to detect the language of any UTF-8 encoded content. It looks like CLD was extracted from the language detection library used in Google’s toolbar.

It turns out the CLD part of the Chromium source tree is nicely standalone, so I pulled it out into a separate Google Code project, making it possible to use CLD directly from any C++ code.

© 2008-2024 C. Ross Jam. Built using Pelican. Theme based upon Giulio Fidente’s original svbhack, and slightly modified by crossjam.