NMH: Milestones

My third blogiversary came and went on Jan 18th.

This post is post 1500. Not bad for four years’ worth of work. While I’m not huge into analysis, I’m pretty confident I’ve avoided padding this blog with a lot of personal fluff, status updates, and echo chambering. Despite that (and no comments), I’m still pretty high up on ego searches for my name and media hack.

And finally, I completed my first full month with my new employer, Lockheed Martin Corporation (cf. Sayonara, Evanston). I work in a small, Arlington, VA-based division of Lockheed Martin Advanced Technology Labs. The gig involves much less basic science than a faculty position, a much bigger business development focus, and more time to concentrate on hacking up proof of concept technologies. And of course a tilt towards the military and intelligence communities. I’m having a lot of fun!!

Northwestern’s been kind enough to provide me a courtesy adjunct appointment, which will expire in August, so you may see some winding down of this blog, and a slow migration to personal hosting, probably with a reduced technical focus. Then again I’ve said that before and not followed through. We’ll see.

Ciao!!

NMH: Many Eyes Thoughts

Obviously I have a great deal of admiration for Many Eyes, both the project and the folks behind it, many of whom I’ve met. Herewith, some minor suggestions and thoughts:

  • Echoing my complaint about swivel, data without a task is a hard sell. Maybe they need a curator who authors interesting data exploration contests.
  • Having a viz distribution mechanism, à la YouTube video players, would be a major feature upgrade.
  • While discussion near a viz is great, a groundbreaking capability would be an elegant naming mechanism so you could point people to particular points in a visualization. A radical road to follow would treat each viz as its own little REST space, read only, from which URLs could be passed around. Other than the snapshot capability, there’s really no good way to say, “Look at the viz here!”.
  • Betcha there’s prefuse inside.

Just thinking out loud.

Delbosc: plush

PyLucene is the Rube Goldbergian combination of Python and Java Lucene. It gives you the core guts of a flexible, high performance, full text indexing engine in Python, but isn’t very friendly to work with on an exploratory basis. Benoit Delbosc’s plush is an initial crack at providing a nice interactive command line for PyLucene.

Horman & Butscher: WikidPad

Link parkin’: WikidPad, a notebook/outliner for Windows. Open source with Python inside!

[Via Brian Carnell]

Gray: Missing At Sea

In my second or third year at Cal, I took the grad database class with Michael Stonebraker. Adam Sah pulled a little stunt for our final exam and invited Jim Gray to show up and sit in the front row. Stonebraker’s reaction was mildly entertaining. Whoever you are, Jim Gray has probably forgotten more about relational database management systems than you ever knew. Except maybe for Hector Garcia-Molina. Ha ha, only serious.

Werner Vogels reports that Jim Gray went sailing recently and hasn’t been heard from. I only had a micro-moment with Gray, but I’m with Vogels praying for his safe return.

Ephemeral Security: Mosquito Lisp

Link parkin’: Mosquito Lisp

Mosquito Lisp is a network-oriented and compact Lisp with strong influence from Scheme. It is available as part of the Mosquito Remote Execution Framework distribution, and there is a Reference Manual.

A quick scan leaves a distinctly Erlangish flavor. Distributed security hacking in Scheme sounds entertaining.

Auer et al.: dbpedia

No matter what you think of Wikipedia’s quality, it’s sort of cool that you can download the entire contents of Wikipedia. That’s a whole lot of human generated text, mostly structured, mostly vetted, that motivated hackers can grovel over.

Enterprising German hackers Sören Auer, Chris Bizer, Richard Cyganiak, Jens Lehmann, and Georgi Kobilarov have put together dbpedia:

dbpedia.org is a community effort to extract structured information from Wikipedia and to make this information available on the Web. dbpedia allows you to ask sophisticated queries against Wikipedia and to link other datasets on the Web to Wikipedia data.

Basically they’ve extracted a couple of decent sized datasets of well structured information nodes from Wikipedia, e.g. music albums, city entries, and put Semantic Web search on top. You get some pretty powerful query capabilities out of this.

I don’t hang with this crowd, but seems to me that Wikipedia snapshots would make great grist for web mining and text mining folks.

Carter et al.: Iraq Fallen Viz

Shan Carter, Aron Pilhofer, Andy Lehren, and Jeff Damens of The New York Times put together an impressive multimedia interactive on fallen US soldiers in Iraq. This is a nice example of combining interactive visualizations and infographics so that a user can “drill down” into data without having to lose a bunch of context.

A key component is the adjustable/slidable window, which is reminiscent of Oliver Steele’s expialidocio.us. Wonder where Oliver wandered off to?

[Via infoesthetics]

Ippolito: simplejson

JSON is a format increasingly emitted by web services, being fairly lightweight, flexible, and cross-language. Bob Ippolito’s simplejson is a pure Python JSON toolkit:

simplejson is a simple, fast, complete, correct and extensible JSON encoder/decoder for Python 2.3+. It is pure Python code with no dependencies. simplejson 1.5 is a major update that provides better Python 2.5 and Windows compatibility, and two new features that control encoding (indent for pretty-printing, and separators for generating optimally compact JSON)

Good to have in the toolbox!
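
For the curious, here's a minimal sketch of the two encoding controls mentioned in the release note, indent and separators. It assumes that falling back to the standard library json module (a descendant of simplejson that accepts the same two keyword arguments) is acceptable when simplejson isn't installed. The sample record is made up.

```python
try:
    import simplejson as json  # Bob Ippolito's package, if it is installed
except ImportError:
    import json  # stdlib module with the same dumps()/loads() interface

# Hypothetical record, purely for illustration.
record = {"title": "Milestones", "posts": 1500, "tags": ["blog", "python"]}

# indent pretty-prints; sort_keys keeps the key order stable between runs.
pretty = json.dumps(record, indent=2, sort_keys=True)

# Separators with no trailing spaces give the most compact encoding.
compact = json.dumps(record, separators=(",", ":"), sort_keys=True)

# Round trip: decoding the compact text gives back an equal structure.
decoded = json.loads(compact)

print(pretty)
print(compact)
```

The compact form is what you'd ship over the wire; the indented form is for humans reading logs.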

Holovaty & Kaplan-Moss: Django Book

It’s 2007 and Apress is scheduled to ship a book on Django. But if you can’t wait, you can read beta chapters of the Django Book online.

Greene: Digg Viz

Not a whole lot of meat, but Kate Greene’s article on Digg’s use of visualization to combat gaming indicates that their tools could have some upside. However, the article doesn’t clearly portray anyone from Digg as signing on to this notion. It’s mostly Stamen Design talking about stuff they developed. Interesting numbers on story submissions and number of diggs/votes processed though.

[Via SmartMobs]

Sampson & Clapper: rawdog & curn

Link parkin’: I was doing my bimonthly search for pieces of the Emacs of Aggregators (TM) and ran across:

rawdog:

rawdog is an RSS Aggregator Without Delusions Of Grandeur. Written in Python, it uses Mark Pilgrim’s feed parser to read RSS 0.9, 1.0, 2.0, CDF and Atom feeds. It runs from cron, collects articles from a number of feeds, and generates a static HTML page listing the newest articles in date order. It supports per-feed customisable update times, and uses ETags, Last-Modified, and gzip compression to minimise network bandwidth usage.

curn:

Unlike many RSS readers, curn does not use a graphical user interface. It is a command-line utility, intended to be run periodically in the background by a command scheduler such as cron(8) (on Unix-like systems) or the Windows Scheduler Service (on Windows).

Rawdog is conveniently written in Python, but curn’s plug-in mechanism looks more polished, taking advantage of Java metaprogramming. If you just want to monitor a pile of feeds and don’t need megascale performance, these might be a good place to start.
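
The bandwidth trick in the rawdog blurb (ETags, Last-Modified, gzip) boils down to a conditional GET. Here's a rough sketch of how an aggregator might build one with nothing but the standard library. This is not rawdog's code; the function name, feed URL, and validator values are all made up for illustration.

```python
import urllib.request


def conditional_request(url, etag=None, last_modified=None):
    """Build a GET request that lets the server answer 304 Not Modified."""
    request = urllib.request.Request(url)
    if etag:
        # Validator the server handed back in the ETag header last time.
        request.add_header("If-None-Match", etag)
    if last_modified:
        # Timestamp the server handed back in Last-Modified last time.
        request.add_header("If-Modified-Since", last_modified)
    # Advertise that a gzip-compressed body is acceptable.
    request.add_header("Accept-Encoding", "gzip")
    return request


# Hypothetical feed URL and validators remembered from an earlier fetch.
req = conditional_request(
    "http://example.com/feed.xml",
    etag='"abc123"',
    last_modified="Mon, 22 Jan 2007 10:00:00 GMT",
)
print(req.header_items())
```

Nothing is sent here; the request would go out via urllib.request.urlopen(req). If the feed hasn't changed, the server can reply with a bodiless 304, which the aggregator treats as "nothing new".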

CIIR & alias-i: Lemur & LingPipe

Link parkin’: Two text mining toolkits:

Lemur:

The Lemur Toolkit is a open-source toolkit designed to facilitate research in language modeling and information retrieval. Lemur supports a wide range of industrial and research language applications such as ad-hoc retrieval, site-search, and text mining.

LingPipe:

LingPipe is a suite of Java libraries for the linguistic analysis of human language.

Lemur is bleeding edge research grade stuff, while LingPipe is a stable commercial product.

Colburn: Snap! Followup

Rafe Colburn, also on the “-1 Snap Previews” bandwagon, got some interesting commentary on a blog post regarding the issue, including how end users can opt out of the previews and a couple of other strategies for defeating the pesky popups. Apparently, it’s a bit trickier for publishers though.

Also serves to remind me to subscribe to the rc3 webfeed.

[Via Scripting News]

Machhausen: Feed’N Read

Aggregator++: Feed’N Read is an open source, Java/Eclipse based RSS aggregator. Yet another option for potential aggregator hacks. Plus there’s always BlogBridge.

Snap: -1 Previews

Along with quite a few other folks, I can’t stand the usage of Snap previews. I have yet to hit one of these where I can actually read the little preview, in essence obfuscating the destination!!

-1 on usability.

Hammersley: MT Atom Support

According to Ben Hammersley et al.’s book, Hacking Movable Type, MT already supports the Atom publishing protocol. Hold the train on moving to WordPress.

Nolan: EarthSLOT

Digging around for information on Google Earth, I ran across the EarthSLOT project:

EarthSLOT is a collection of 3D GIS and terrain visualization applications designed to allow scientists, resource managers, educators, and the public better understand our planet and the earth science that goes on here.

Our mission at EarthSLOT is to advance earth science and earth science education through the use of on-line 3D terrain visualization and GIS tools. This technology is improving at a rapid pace, as more 3D engines come on-line and more developers begin using them. What seems to be lacking in the community right now is a site that hosts applications from various engines, reviews the technology, and discusses their strengths and weakness in regards to earth science and earth science education. Our goal, therefore, is to serve as this repository and forum for 3D applications that advance earth science and earth science education using any 3D software engines.

Mike Nolan also has a good overview of some 3D geospatial engines, including Google Earth.

Ortega-Ruiz: emacs hacks

Link parkin’: minor emacs wizardry, a well written blog of emacs and elisp hacking.

Flickr: Machine Tags

Flickr has a new extension to their tagging mechanism: machine tags. As far as I can see, it’s a minor extension to the tagging syntax, allowing for a colon-separated prefix followed by a keyword/value pair separated by an =, e.g. bmd:some=value. The real action, though, is that Flickr’s search API actually leverages the machine tags for more sophisticated queries. Seems like a slow drift towards the semantic web.
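
To make the syntax concrete, here's a small sketch that splits a tag of the prefix:keyword=value shape into its three parts and ignores ordinary tags. The regular expression is my guess at reasonable character rules, not Flickr's published grammar, and the sample tags are made up.

```python
import re

# prefix:keyword=value, per the description above. The character classes
# are an assumption for illustration, not Flickr's official grammar.
MACHINE_TAG = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)$")


def parse_machine_tag(tag):
    """Return (prefix, keyword, value), or None for an ordinary tag."""
    match = MACHINE_TAG.match(tag)
    return match.groups() if match else None


# Hypothetical tag list mixing machine tags with a plain one.
tags = ["bmd:some=value", "sunset", "geo:lat=42.05"]
parsed = [parse_machine_tag(t) for t in tags]
machine_only = [p for p in parsed if p is not None]
print(machine_only)
```

Once tags are split this way, a search layer can filter on the prefix alone, the prefix plus keyword, or the full triple.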

Mullenweg: WordPress Atom

Tantalizingly, Matt Mullenweg teases Atom API support in WordPress 2.2. Now that might get me to reconsider WordPress for my next blogging platform.

Miller: MacFUSE Spotlight Folders

Link parkin’: A Macintosh filesystem where accessing folders generates queries to Spotlight, Apple’s desktop search service. Written completely at user level using MacFUSE.

As predicted, there are all sorts of fun tricks that can be pulled with user level file systems and builtin Mac OS X services.

IBM VCL: Many Eyes

Following up on my admiration for their Communication-Minded Visualization manifesto, Martin Wattenberg, Fernanda Viégas, and the rest of their compatriots at IBM’s Visual Communication Lab have launched Many Eyes. Many Eyes is the concrete manifestation of the themes in their manifesto:

Many Eyes is a bet on the power of human visual intelligence to find patterns. Our goal is to “democratize” visualization and to enable a new social kind of data analysis.

Similar to swivel, the site supports the upload of data sets, generation of visualizations from those data sets, and discussion around the visualizations. The main difference is that Many Eyes offers a rather different set of non-traditional visualizations.

I’m betting there’ll be some interesting research papers to come out of this.

[Via Tim O’Reilly]

Fowler: Dabble DB Plugin API

A few years ago, I pondered the possibility of pluggable web applications, invoking St. Graham to solve the problem. As described by Chad Fowler, it looks like Dabble, the Web DB, has taken a good stab at a web app plug-in mechanism. The gist is to plug-out, shipping off plain old CSV text to a URL, and receiving CSV in return. There must be some trickiness in avoiding DoS attacks, intentional or unintentional, but it’s probably a bit easier than dealing with sandboxed code.
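
The CSV-in, CSV-out contract is easy to picture as a plain function. Below is a minimal sketch of what the plug-in side might do once the posted text arrives; the HTTP wiring is left out, and the function name, column names, and sample rows are all hypothetical rather than anything from Dabble's actual API.

```python
import csv
import io


def transform_csv(csv_text):
    """Take CSV text in, return CSV text with an extra 'name_length' column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + ["name_length"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # The derived value is the plug-in's whole contribution.
        row["name_length"] = len(row["name"])
        writer.writerow(row)
    return out.getvalue()


# Hypothetical payload the host application might ship to the plug-in URL.
incoming = "name,city\nAda,London\nGrace,Arlington\n"
result = transform_csv(incoming)
print(result)
```

Because only text crosses the boundary, the host never runs third-party code, which is the appeal over sandboxing.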

[Via Simon Willison]

Kantor: Streampad

From Paul Lamere:

Streampad is a web music player. Its not just a music player that plays in your browser. Its a music player that plays the web.

Streampad can also be used to give you access to your own music collection when you are on the road. Streampad has a little server that runs on your computer that will serve up your music collection so whereever you are you can listen to your home music collection. For those of us that have terrabyte-sized music collections that don’t fit on an iPod or a laptop.

 

Streampad is another example of the universal music player – it lets you play music from any source – helping you to play your music where ever you are (as long as you are connected to the web).

Streampad is currently a one man mission of Dan Kantor, formerly of del.icio.us.

By the by, Lamere’s Duke Listens! is a good read.

Frisch: Nostalgy

Alain Frisch’s Nostalgy plug-in makes filing e-mail messages in Thunderbird a snap from the keyboard.

Skrenta: Blogging

Maybe it’s because I’ve been following NU alum Rich Skrenta’s company, Topix.Net, but it just seems to me like he’s been blogging forever. However, he launched another blog in mid December so he could stretch out on some topics, and he’s doing a great job. Well worth the addition to your aggregator.

WWW2007: Workshops

An interesting menu of workshops is being offered as part of this year’s International World Wide Web Conference. Weblogs have moved off onto their own conference, but there’s a second edition of the tagging workshop, along with others on socially constructed knowledge, query log analysis, and IPTV. Also, the conference location is Banff, Canada, which I’ve heard is quite scenic.

Kipp & Campbell: DIU Tagging Patterns

The abstract from Margaret Kipp and Grant Campbell’s “Patterns and Inconsistencies in Collaborative Tagging Systems: An Examination of Tagging Practices” (PDF):

This paper analyzes the tagging patterns exhibited by users of del.icio.us, to assess how collaborative tagging supports and enhances traditional ways of classifying and indexing documents. Using frequency data and co-word analysis matrices analyzed by multi-dimensional scaling, the authors discovered that tagging practices to some extent work in ways that are continuous with conventional indexing. Small numbers of tags tend to emerge by unspoken consensus, and inconsistencies follow several predictable patterns that can easily be anticipated. However, the tags also indicated intriguing practices relating to time and task which suggest the presence of an extra dimension in classification and organization, a dimension which conventional systems are unable to facilitate.

Hmmm, sounds like they might have discovered some of the interesting effects I found when looking at Flickr (PDF), but I’ll have to read the full paper. Maybe I should look at applying their techniques to my old data.

[Via search engine land]

Yang: Nginx vs. Lighttpd

Nice comparison by Scott Yang of two lightweight HTTP servers, nginx and lighttpd, for when Apache is a bit on the heavy side.

Singh: MacFUSE

Link parkin’: Google hacker Amit Singh releases MacFUSE, allowing you to write file system hacks in user level code on the Macintosh.

Combine with the Mac’s powerful, uniform application scripting and other automation tools for powerful effect, e.g. make blog posting as easy as dropping a file in a folder.

Linden: Findory Sunsetted

Foo. Looks like Findory will be riding off into the sunset. I originally pooh-poohed Findory’s core concept but over time came to appreciate its utility, not to mention Greg Linden’s wonderful blogging about the whole process. Hopefully one day Greg will write up and publish some of his experience and techniques.

Hmmmm, maybe he should have been a bit more ruthless, although we’d all be the worse for it.

Apache: Solr

Neato. Solr is an Apache incubator project that turns Lucene into an enterprise search server. The cool thing about Solr is that it has an exceedingly pleasant RESTful web services api. To quote the FAQ:

Solr itself is a Java Application, but all interaction with Solr is done by POSTing XML messages over HTTP (to index documents) and GETing search results back as XML, or a variety of other formats (JSON, Python, Ruby, etc…)

Cross platform. Flexible, powerful, extensible, full text search. Easy to program against. What’s not to like?
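
To give a feel for the "POST XML in, GET results out" style the FAQ describes, here's a sketch that builds an add message and a search URL using only the standard library. Nothing is sent over the network. The base URL (a local install on port 8983), the /select path, the wt parameter, the field names, and the helper names are assumptions on my part, so check them against your own Solr setup.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed location of a local Solr instance; adjust for a real deployment.
SOLR_BASE = "http://localhost:8983/solr"


def build_add_message(doc):
    """Build the XML body that would be POSTed to index one document."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_select_url(query, fmt="json"):
    """Build the URL that would be fetched with GET to run a search."""
    return SOLR_BASE + "/select?" + urlencode({"q": query, "wt": fmt})


# Hypothetical document and query, for illustration only.
xml_message = build_add_message({"id": "post-1500", "title": "Milestones"})
select_url = build_select_url("title:milestones")
print(xml_message)
print(select_url)
```

Since both halves are just strings over HTTP, any language with an HTTP client can play, which is most of the appeal.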

PNNL: InfoViz Tech

Link parkin’: Pacific Northwest National Laboratory’s varied visualization technologies.

Baumgart et al.: OverSim

A talented team of German researchers is developing OverSim, a framework for doing simulations of overlay networks. Overlay network techniques, as canonically exhibited by Gnutella, are a foundation of academic P2P research. Of course you’d like to try out your protocols and applications before you cut something loose on the real Internet.

Mehta: Tagline Generator

Chirag Mehta’s PHP based Tagline Generator is a handy little tool. This should remind folks that you can have tag clouds without necessarily having tagging.

Probably a pretty straightforward Python port.
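
Here's roughly what I have in mind: count words in arbitrary text and scale the counts into font sizes. This is a from-scratch sketch, not a port of Mehta's PHP; the length cutoff, size range, and linear scaling are all my own assumptions.

```python
from collections import Counter
import re


def tag_cloud(text, top_n=5, min_size=10, max_size=30):
    """Return (word, font_size) pairs for the most frequent words in text."""
    words = re.findall(r"[a-z']+", text.lower())
    # Crude stop-word filter: ignore anything of three letters or fewer.
    counts = Counter(w for w in words if len(w) > 3)
    top = counts.most_common(top_n)
    if not top:
        return []
    high = top[0][1]
    low = top[-1][1]
    spread = max(high - low, 1)  # avoid dividing by zero when counts tie
    return [
        (word, min_size + (count - low) * (max_size - min_size) // spread)
        for word, count in top
    ]


# Hypothetical input text; any prose works, no tagging system required.
cloud = tag_cloud("python python python lucene lucene solr")
print(cloud)
```

Feed the sizes into inline styles and you have a cloud from any text source.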

bard: xmpp4moz

Not just the Jabber messaging protocol embedded in your web browser, but a whole philosophy and suite of applications built on top. Sort of like a radically open source KnowNow before they went all enterprisey on us.

[Via the l. m. orchard del.icio.us feed]

James: RESTful DIU API

Link parkin’: Paul James takes the del.icio.us API and makes it truly RESTful.

Debatty: VJ Culture Book

Since I’ve revealed that I’m a lapsed DJ, some of my noodlings about music and tech might start to make more sense. Looks like I’ll have to pick up Michael Faulkner’s “Audio-Visual Art and VJ Culture”, eloquently reviewed by Régine Debatty. I’m particularly interested in the hw/sw combos these folks use, which the book covers, plus you get a bigass 130 minute DVD to boot.

merholz: AskCity Doesn’t Suck

Peter Merholz, aka peterme, demonstrates some of the interesting features of AskCity. AskCity is Ask.com’s local search tool, which makes heavy use of maps for information presentation. AskCity takes online maps as media artifacts to a new level, presenting an interesting challenge to Google and Yahoo!.