NMH: Milestones

Posted on: Wed 31 January 2007

My third blogiversary came and went on Jan 18th.

This post is post 1500. Not bad for four years' worth of work. While I'm not huge into analysis, I'm pretty confident I've avoided padding this blog with a lot of personal fluff, status updates, and echo chambering. Despite that (and no comments), I'm still pretty high up on ego searches for my name and media hack.

And finally I completed my first full month with my new employer, Lockheed Martin Corporation (cf. Sayonara, Evanston). I work in a small, Arlington, VA-based division of Lockheed Martin Advanced Technology Labs. The gig involves much less basic science than a faculty position, a much bigger business development focus, and more time to concentrate on hacking up proof of concept technologies. And of course a tilt towards the military and intelligence communities. I'm having a lot of fun!!

Northwestern's been kind enough to provide me a courtesy adjunct appointment, which will expire in August, so you may see some winding down of this blog, and a slow migration to personal hosting, probably with a reduced technical focus. Then again I've said that before and not followed through. We'll see.

Ciao!!

NMH: Many Eyes Thoughts

Posted on: Wed 31 January 2007

Obviously I have a great deal of admiration for Many Eyes, both the project and the folks behind it, many of whom I've met. Herewith, some minor suggestions and thoughts:

Echoing my same complaint about swivel, data without task is a hard sell job. Maybe they need a curator who authors interesting data exploration contests.
Having a viz distribution mechanism, à la YouTube video players, would be a major feature upgrade.
While discussion near a viz is great, a groundbreaking capability would be an elegant naming mechanism so you could point people to particular points in a visualization. A radical road to follow would treat each viz as its own little REST space, read only, from which URLs could be passed around. Other than the snapshot capability, there's really no good way to say, "Look at the viz here!".
Betcha there's prefuse inside.
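
To make the "each viz as its own little REST space" idea concrete, here is a minimal sketch of addressable view state. Everything in it is hypothetical: the host name, the `/viz/<id>` path layout, and the state keys (`x`, `y`, `highlight`) are invented for illustration and have nothing to do with how Many Eyes actually works.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl

BASE = "http://viz.example.org"  # hypothetical host, not a real service


def view_url(viz_id, **state):
    """Encode a read-only view of a visualization as a shareable URL."""
    # Sort the state so the same view always yields the same URL.
    query = urlencode(sorted(state.items()))
    return f"{BASE}/viz/{viz_id}?{query}"


def parse_view_url(url):
    """Recover the viz id and view state from a URL built by view_url."""
    parts = urlsplit(url)
    viz_id = parts.path.rsplit("/", 1)[-1]
    return viz_id, dict(parse_qsl(parts.query))


url = view_url("1234", x="year", y="sales", highlight="2005")
print(url)
print(parse_view_url(url))
```

The point is only that "Look at the viz here!" becomes a plain URL anyone can paste into a discussion thread.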

Just thinking out loud.

Delbosc: plush

Posted on: Wed 31 January 2007

PyLucene is the Rube Goldbergian combination of Python and Java Lucene. It gives you the core guts of a flexible, high performance, full text indexing engine in Python, but isn't very friendly to work with on an exploratory basis. Benoit Delbosc's plush is an initial crack at providing a nice interactive command line for PyLucene.

Horman & Butscher:

Posted on: Wed 31 January 2007

Link parkin': WikidPad, a notebook/outliner for Windows. Open source with Python inside!

[Via Brian Carnell]

Gray: Missing At Sea

Posted on: Mon 29 January 2007

In my second or third year at Cal, I took the grad database class with Michael Stonebraker. Adam Sah pulled a little stunt for our final exam and invited Jim Gray to show up and sit in the front row. Stonebraker's reaction was mildly entertaining. Whoever you are, Jim Gray has probably forgotten more about relational database management systems than you ever knew. Except maybe for Hector Garcia-Molina. Ha ha, only serious.

Werner Vogels reports that Jim Gray went sailing recently and hasn't been heard from. I only had a micro-moment with Gray, but I'm with Vogels praying for his safe return.

Ephemeral Security: Mosquito Lisp

Posted on: Mon 29 January 2007

Link parkin': Mosquito Lisp

Mosquito Lisp is a network-oriented and compact Lisp with strong influence from Scheme. It is available as part of the Mosquito Remote Execution Framework distribution, and there is a Reference Manual.

A quick scan leaves a distinctly Erlangish flavor. Distributed security hacking in Scheme sounds entertaining.

Auer et al.: dbpedia

Posted on: Mon 29 January 2007

No matter what you think of Wikipedia's quality, it's sort of cool that you can download the entire contents of Wikipedia. That's a whole lot of human generated text, mostly structured, mostly vetted, that motivated hackers can grovel over.

Enterprising German hackers Sören Auer, Chris Bizer, Richard Cyganiak, Jens Lehmann, and Georgi Kobilarov have put together dbpedia:

dbpedia.org is a community effort to extract structured information from Wikipedia and to make this information available on the Web. dbpedia allows you to ask sophisticated queries against Wikipedia and to link other datasets on the Web to Wikipedia data.

Basically they've extracted a couple of decent sized datasets of well structured information nodes from Wikipedia, e.g. music albums, city entries, and put Semantic Web search on top. You get some pretty powerful query capabilities out of this.

I don't hang with this crowd, but seems to me that Wikipedia snapshots would make great grist for web mining and text mining folks.

Carter et al.: Iraq Fallen Viz

Posted on: Sun 28 January 2007

Shan Carter, Aron Pilhofer, Andy Lehren, and Jeff Damens of The New York Times put together an impressive multimedia interactive on fallen US soldiers in Iraq. This is a nice example of combining interactive visualizations and infographics so that a user can "drill down" into data without having to lose a bunch of context.

A key component is the adjustable/slidable window, which is reminiscent of Oliver Steele's expialidocio.us. Wonder where Oliver wandered off to?

[Via infoesthetics]

Ippolito: simplejson

Posted on: Sun 28 January 2007

JSON is a format increasingly emitted by web services, being fairly lightweight, flexible, and cross-language. Bob Ippolito's simplejson is a pure Python JSON toolkit:

simplejson is a simple, fast, complete, correct and extensible JSON encoder/decoder for Python 2.3+. It is pure Python code with no dependencies. simplejson 1.5 is a major update that provides better Python 2.5 and Windows compatibility, and two new features that control encoding (indent for pretty-printing, and separators for generating optimally compact JSON)

Good to have in the toolbox!
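
A minimal sketch of the two encoding controls mentioned in the release note, `indent` and `separators`. One assumption up front: the example imports the standard-library `json` module, which shares simplejson's `dumps`/`loads` interface; with simplejson installed, `import simplejson as json` should behave the same way. The `record` data is made up for illustration.

```python
import json  # assumption: same dumps/loads interface as simplejson

# Hypothetical record, for illustration only.
record = {"title": "simplejson", "tags": ["python", "json"], "version": 1.5}

# indent: pretty-print with nested levels indented by two spaces.
pretty = json.dumps(record, indent=2, sort_keys=True)

# separators: drop the default spaces after "," and ":" for compact output.
compact = json.dumps(record, separators=(",", ":"), sort_keys=True)

# Both encodings decode back to the same Python object.
assert json.loads(pretty) == json.loads(compact) == record

print(pretty)
print(compact)
```

Pretty output is for humans reading logs; the compact form is what you'd want going over the wire.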

Holovaty & Kaplan-Moss: Django Book

Posted on: Sun 28 January 2007

It's 2007 and Apress is scheduled to ship a book on Django. But if you can't wait, you can read beta chapters of the Django Book online.

Greene: Digg Viz

Posted on: Sun 28 January 2007

Not a whole lot of meat, but Kate Greene's article on Digg's use of visualization to combat gaming indicates that their tools could have some upside. However, the article doesn't clearly portray anyone from Digg as signing on to this notion. It's mostly Stamen Design talking about stuff they developed. Interesting numbers on story submissions and number of diggs/votes processed though.

[Via SmartMobs]

Sampson & Clapper: rawdog & curn

Posted on: Fri 26 January 2007

Link parkin': I was doing my bimonthly search for pieces of the Emacs of Aggregators (TM) and ran across:

rawdog:

rawdog is an RSS Aggregator Without Delusions Of Grandeur. Written in Python, it uses Mark Pilgrim's feed parser to read RSS 0.9, 1.0, 2.0, CDF and Atom feeds. It runs from cron, collects articles from a number of feeds, and generates a static HTML page listing the newest articles in date order. It supports per-feed customisable update times, and uses ETags, Last-Modified, and gzip compression to minimise network bandwidth usage.

curn:

Unlike many RSS readers, curn does not use a graphical user interface. It is a command-line utility, intended to be run periodically in the background by a command scheduler such as cron(8) (on Unix-like systems) or the Windows Scheduler Service (on Windows).

Rawdog is conveniently written in Python, but curn's plug-in mechanism looks more polished, taking advantage of Java metaprogramming. If you just want to monitor a pile of feeds and don't need megascale performance, these might be a good place to start.
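
The bandwidth trick in rawdog's description (ETags, Last-Modified, gzip) boils down to a conditional HTTP GET. Below is a rough sketch of building such a request with Python's standard library. It is not rawdog's code: the function name and the feed URL are hypothetical, and the validator values are made up. A server that honors these headers answers 304 Not Modified when the feed hasn't changed, so nothing needs to be downloaded again.

```python
import urllib.request


def build_conditional_request(url, etag=None, last_modified=None):
    """Build a GET request carrying the validators saved from the last fetch.

    Hypothetical helper for illustration; not taken from rawdog or curn.
    """
    headers = {"Accept-Encoding": "gzip"}  # ask for a compressed body
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return urllib.request.Request(url, headers=headers)


# Made-up validators, as if remembered from an earlier response.
req = build_conditional_request(
    "http://example.org/feed.xml",
    etag='"abc123"',
    last_modified="Fri, 26 Jan 2007 12:00:00 GMT",
)
for name, value in sorted(req.header_items()):
    print(name, value)
```

The sketch stops short of sending the request; a cron-driven aggregator would store the `ETag` and `Last-Modified` values from each response and feed them back in on the next run.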

CIIR & alias-i: Lemur & LingPipe

Posted on: Fri 26 January 2007

Link parkin': Two text mining toolkits:

Lemur:

The Lemur Toolkit is a open-source toolkit designed to facilitate research in language modeling and information retrieval. Lemur supports a wide range of industrial and research language applications such as ad-hoc retrieval, site-search, and text mining.

LingPipe:

LingPipe is a suite of Java libraries for the linguistic analysis of human language.

Lemur is bleeding edge research grade stuff, while LingPipe is a stable commercial product.

Colburn: Snap! Followup

Posted on: Fri 26 January 2007

Rafe Colburn, also on the "-1 Snap Previews" bandwagon, got some interesting commentary on a blog post regarding the issue, including how end users can opt out of the previews and a couple of other strategies for defeating the pesky popups. Apparently, it's a bit trickier for publishers though.

Also serves to remind me to subscribe to the rc3 webfeed.

[Via Scripting News]

Machhausen: Feed'N Read

Posted on: Thu 25 January 2007

Aggregator++: Feed'N Read is an open source, Java/Eclipse based RSS aggregator. Yet another option for potential aggregator hacks. Plus there's always BlogBridge.

Snap: -1 Previews

Posted on: Thu 25 January 2007

Along with quite a few other folks, I can't stand the usage of Snap previews. I have yet to hit one of these where I can actually read the little preview, in essence obfuscating the destination!!

-1 on usability.

Hammersley: MT Atom Support

Posted on: Thu 25 January 2007

According to Ben Hammersley et al.'s book, Hacking Movable Type, MT already supports the Atom publishing protocol. Hold the train on moving to WordPress.

Nolan: EarthSLOT

Posted on: Wed 24 January 2007

Digging around for information on Google Earth, I ran across the EarthSLOT project:

EarthSLOT is a collection of 3D GIS and terrain visualization applications designed to allow scientists, resource managers, educators, and the public better understand our planet and the earth science that goes on here.

...

Our mission at EarthSLOT is to advance earth science and earth science education through the use of on-line 3D terrain visualization and GIS tools. This technology is improving at a rapid pace, as more 3D engines come on-line and more developers begin using them. What seems to be lacking in the community right now is a site that hosts applications from various engines, reviews the technology, and discusses their strengths and weakness in regards to earth science and earth science education. Our goal, therefore, is to serve as this repository and forum for 3D applications that advance earth science and earth science education using any 3D software engines.

Mike Nolan also has a good overview of some 3D geospatial engines, including Google Earth.

Ortega-Ruiz: emacs hacks

Posted on: Wed 24 January 2007

Link parkin': minor emacs wizardry, a well written blog of emacs and elisp hacking.

Flickr: Machine Tags

Posted on: Wed 24 January 2007

Flickr has a new extension to their tagging mechanism: machine tags. As far as I can see, it's a minor extension to the tagging syntax: a prefix followed by a colon, then a keyword/value pair separated by an =, e.g. bmd:some=value. The real action, though, is that Flickr's search API actually leverages the machine tags for more sophisticated queries. Seems like a slow drift towards the semantic web.
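
A quick sketch of what picking apart the `prefix:keyword=value` shape might look like. The regular expression is my guess at a reasonable grammar, not Flickr's actual rules, and `parse_machine_tag` and `select` are hypothetical helpers, not part of any Flickr API.

```python
import re
from typing import NamedTuple, Optional


class MachineTag(NamedTuple):
    prefix: str
    key: str
    value: str


# Assumed grammar: prefix:key=value. Flickr's real character rules may differ.
_MACHINE_TAG = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.*)$")


def parse_machine_tag(tag: str) -> Optional[MachineTag]:
    """Return the parsed tag, or None for an ordinary (non-machine) tag."""
    m = _MACHINE_TAG.match(tag)
    return MachineTag(*m.groups()) if m else None


def select(tags, prefix, key=None):
    """Return machine tags under a prefix, optionally narrowed to one key."""
    parsed = (parse_machine_tag(t) for t in tags)
    return [
        p for p in parsed
        if p is not None and p.prefix == prefix and (key is None or p.key == key)
    ]


tags = ["sunset", "bmd:some=value", "geo:lat=38.88", "geo:lon=-77.1"]
print(parse_machine_tag("bmd:some=value"))
print(select(tags, "geo"))
```

Filtering on prefix alone, or on prefix plus keyword, is the sort of query that plain flat tags can't express.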

Mullenweg: WordPress Atom

Posted on: Tue 23 January 2007

Tantalizingly, Matt Mullenweg teases Atom API support in WordPress 2.2. Now that might get me to reconsider WordPress for my next blogging platform.

Miller: MacFUSE Spotlight Folders

Posted on: Tue 23 January 2007

Link parkin': A Macintosh filesystem where accessing folders generates queries to Spotlight, Apple's desktop search service. Written completely at user level using MacFUSE.

As predicted, there are all sorts of fun tricks that can be pulled with user level file systems and builtin Mac OS X services.

IBM VCL: Many Eyes

Posted on: Tue 23 January 2007

Following up on my admiration for their Communication-Minded Visualization manifesto, Martin Wattenberg, Fernanda Viégas, and the rest of their compatriots at IBM's Visual Communication Lab have launched Many Eyes. Many Eyes is the concrete manifestation of the themes in their manifesto:

Many Eyes is a bet on the power of human visual intelligence to find patterns. Our goal is to "democratize" visualization and to enable a new social kind of data analysis.

Similar to swivel, the site supports the upload of data sets, generation of visualizations from those data sets, and discussion around the visualizations. The main difference is that Many Eyes offers a rather different set of non-traditional visualizations.

I'm betting there'll be some interesting research papers to come out of this.

[Via Tim O'Reilly]

Fowler: Dabble DB Plugin API

Posted on: Mon 22 January 2007

A few years ago, I pondered the possibility of pluggable web applications, invoking St. Graham to solve the problem. As described by Chad Fowler, it looks like Dabble, the Web DB, has taken a good stab at a web app plug-in mechanism. The gist is to plug-out, shipping off plain old CSV text to a URL, and receiving CSV in return. There must be some trickiness in avoiding DoS attacks, intentional or unintentional, but it's probably a bit easier than dealing with sandboxed code.
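
For flavor, here is a minimal sketch of what the far end of such a plug-out might do: take CSV text in, hand CSV text back. It is not Dabble's actual API. The function name, the column names, and the row cap are all hypothetical, and the HTTP plumbing around it is left out.

```python
import csv
import io

MAX_ROWS = 1000  # hypothetical cap, a crude guard against oversized requests


def transform_csv(csv_text: str) -> str:
    """Hypothetical plug-in body: append a computed 'total' column to each row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + ["total"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for count, row in enumerate(reader, start=1):
        if count > MAX_ROWS:
            raise ValueError("too many rows")
        # Column names below are invented for this example.
        row["total"] = str(int(row["quantity"]) * int(row["unit_price"]))
        writer.writerow(row)
    return out.getvalue()


# What the web app would POST, and what it would get back.
request_body = "item,quantity,unit_price\nwidget,3,4\ngadget,2,10\n"
response_body = transform_csv(request_body)
print(response_body)
```

The appeal is that the plug-in runs on somebody else's box in whatever language they like; the host app only has to trust text in and text out.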

[Via Simon Willison]

Kantor: Streampad

Posted on: Mon 22 January 2007

From Paul Lamere:

Streampad is a web music player. Its not just a music player that plays in your browser. Its a music player that plays the web.

...

Streampad can also be used to give you access to your own music collection when you are on the road. Streampad has a little server that runs on your computer that will serve up your music collection so whereever you are you can listen to your home music collection. For those of us that have terrabyte-sized music collections that don't fit on an iPod or a laptop.

Streampad is another example of the universal music player - it lets you play music from any source - helping you to play your music where ever you are (as long as you are connected to the web).

Streampad is currently a one man mission of Dan Kantor, formerly of del.icio.us.

By the by, Lamere's Duke Listens! is a good read.

Frisch: Nostalgy

Posted on: Sun 21 January 2007

Alain Frisch's Nostalgy plug-in makes filing e-mail messages in Thunderbird a snap from the keyboard.

Skrenta: Blogging

Posted on: Sun 21 January 2007

Maybe it's because I've been following NU alum Rich Skrenta's company, Topix.Net, but it just seems to me like he's been blogging forever. However, he launched another blog in mid December so he could stretch out on some topics, and he's doing a great job. Well worth the addition to your aggregator.

WWW2007: Workshops

Posted on: Sun 21 January 2007

An interesting menu of workshops is being offered as part of this year's International World Wide Web Conference. Weblogs have moved off onto their own conference, but there's a second edition of the tagging workshop, along with others on socially constructed knowledge, query log analysis, and IPTV. Also, the conference location is Banff, Canada, which I've heard is quite scenic.

Kipp & Campbell: DIU Tagging Patterns