I’ve been meaning to link park the Kotlin programming language. In general, I’m just a programming language nerd and when Google promoted Kotlin for official Android programming, the language hit my radar. Via some random Web surfing (people still do that right?) I came across this brief RedMonk overview of why Kotlin is gaining in popularity:
The short version is that Kotlin is a JVM-based language originally released in 2011 by the JetBrains (makers of IntelliJ) team from St Petersburg, Russia. Like Scala, an inspiration for the language, Kotlin is intended to improve on the Java foundations both syntactically and otherwise while trading on that platform’s ubiquity.
I enjoyed Derrick Harris’ interview with the founders of StackRox:
In this episode of the ARCHITECHT Show, StackRox co-founders Sameer Bhalotra and Ali Golshan break down the state of container security and the new technology they have built to solve it. Bhalotra and Golshan have deep histories doing cybersecurity everywhere from startups to Google to the White House, which they draw on to discuss the security threats and opportunities that microservices present, as well as best practices for cybersecurity in general. This week, StackRox emerged from stealth mode after building the product and company for nearly 3 years.
Sameer and Ali had interestingly different backgrounds coming from government and enterprise consulting. From a total nerd perspective, they came across as a skoosh slick in their answers and choreographed handoffs, but I’ll chalk that up to being well-polished founders who’ve been on the fundraising and customer development trail for a while. That’s how you gotta sound to get C-suite types to fork over the cash.
But on the surface there are some neat ideas in the StackRox product. In the same way that networking technology has become disaggregated, microservices architectures have disaggregated applications and allowed for deeper introspection, monitoring, and remediation.
Have to say, I’ve been impressed by the guests that Harris has been able to get for his interviews.
If it happens, I could get into a graphic novel version of Takeshi Kovacs.
Author Richard K. Morgan will bring Altered Carbon, the Philip K. Dick Award-winning novel published by Gollancz in the UK and soon to be adapted as a Netflix television series, to Dynamite Entertainment with all-new, in-continuity stories, exclusively available in the comic book and graphic novel formats.
Heck, this might be enough motivation to sign up for Netflix.
As I’ve said before, there’s been a bit of gardening going on here behind the scenes. This has made me revisit a number of older posts on this here blog.
Circa 2010, I was seriously investigating ways to get mobile data access for a reasonable price. The number of posts regarding the HTC Evo as a potential phone + hotspot combo is impressive. That’s a cute little time capsule of technology.
Not to mention there used to be some company called Palm back then.
Eventually I wound up just getting an iPhone, which at the time only provided 2GB of 3G connectivity per month. Eight years later, with rollover, I usually have 8GB of LTE for two devices for around the same price. Unlimited text messages to boot. The 8GB isn’t particularly impressive, but the rest of the kit vice price is of note.
I’m still on the iPhone (6S Plus), but becoming really intrigued by a top of the line Google Pixel on Google Fi. A friend of mine speaks highly of the Android experience and iOS isn’t providing any level of excitement to me these days.
Times may have changed but technolust never goes away forever!
Google Cloud Platform hosts a number of public datasets:
Public Datasets on Google Cloud Platform makes it easy for users to access and analyze data in the cloud. These datasets are freely hosted and accessible using a variety of data warehouse and analytics software, from open source Apache Spark to cutting edge Google technologies like Google BigQuery and Google Cloud Dataflow. From structured genomic or encyclopedic data to unstructured climate data, Public Datasets provide a playground for those new to big data and data analysis and a powerful repository for skilled researchers. You can also integrate with your application to add valuable insights for your users. Whatever your use case, these datasets are freely available on GCP.
The thing I find surprising is that the Common Crawl web archives aren’t on GCP, especially given Google’s web heritage. Apropos the late, lamented Fairness Doctrine, Common Crawl is hosted on AWS. There was a good, recent GCP Podcast episode with the Public Datasets team that had an e-mail contact. Maybe I’ll fire off a question.
Here be dragons. I know from personal experience but Hynek Schlawack explains why way better than me.
Proper cleanup when terminating your application isn’t less important when it’s running inside of a Docker container. Although it only comes down to making sure signals reach your application and handling them, there’s a bunch of things that can go wrong.
Really, as Hynek says, “Avoid being PID 1.”
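For the "handling them" half, here's a minimal Python sketch of what I mean. It assumes the signal actually reaches your process, which is the part Hynek's post is really about. The names (`shutdown_requested`, `request_shutdown`, `install_handlers`, `run_worker`) are made up for illustration and don't come from his article.

```python
import signal
import threading

# Hypothetical flag polled by the main loop; set once a termination signal arrives.
shutdown_requested = threading.Event()


def request_shutdown(signum, frame):
    """Signal handler: just record the request and return quickly."""
    shutdown_requested.set()


def install_handlers():
    """Register the handler; must be called from the main thread."""
    # `docker stop` sends SIGTERM first, then SIGKILL after a grace period.
    signal.signal(signal.SIGTERM, request_shutdown)
    signal.signal(signal.SIGINT, request_shutdown)


def run_worker(do_work, cleanup, poll_seconds=0.1):
    """Call do_work() repeatedly until shutdown is requested, then clean up."""
    try:
        while not shutdown_requested.is_set():
            do_work()
            # Sleep, but wake up early if a shutdown request comes in.
            shutdown_requested.wait(poll_seconds)
    finally:
        cleanup()
```

In a real entry point you'd call `install_handlers()` once and then `run_worker(...)` with your own work and cleanup callables. None of it helps if a wrapper shell is sitting at PID 1 and swallowing the signal.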
A few years ago, I had the pleasure of meeting and chit-chatting with Paco Nathan. Back then he was with Databricks, but now he’s at O’Reilly Media doing interesting things with Jupyter and learning. I enjoyed a couple of his recent presentations. The first on AI inside O’Reilly Media.
And one on a TextRank rewrite in Python.
Yowsa! That slideshare shortcode actually worked. We’ll see how it comes out in the RSS feed.
What is Iris?
Iris is designed to help non-expert programmers who understand what kinds of analyses they need to run (for example, creating a logistic regression model, or computing a Mann-Whitney U test) but not how to write the code to accomplish these goals. Iris also allows expert programmers to accomplish data science tasks more quickly.
Iris supports a broad set of functionality available in popular Python scientific libraries such as scipy and scikit-learn, and we intend to open source the system upon release.
And from a deeper explainer:
Iris supports interactive command combination through a conversational model inspired by linguistic theory and programming language interpreters. Our approach allows us to leverage a simple language model to enable complex workflows: for example, allowing you to converse with the system to build a classifier based on a bag-of-words embedding model, or compare the inauguration speeches of Obama and Trump through lexical analyses.
Iris is an academic research project led by Ethan Fast of the Stanford CS department. I’ll be interested to see how far this gets. Conversational agents that are domain specific, vertically integrated with an environment, and targeted at complex activities seem a bit more promising than the low bar tasks industry currently seems to be focusing on (cough, meeting scheduling, cough). Also feels like a “right moment” with Siri, Cortana, Alexa, Slackbots, Twitterbots, Xiaoice, Tay, and friends establishing a beachhead but bigger wins coming down the road.
Better late than never.
Hip Hop, can we get 30,000 RTs for our 30th Anniversary? pic.twitter.com/MVsrl4qbZi— Eric B and Rakim™ (@EricBandRakim) July 8, 2017
“You thought I was a doughnut. You tried to glaze me.”
The funny thing about the iconic Paid In Full album is that I always found the album version of Eric B. is President ultra irritating. I was lucky enough to purchase the 12″ single well before the album came out. The single cut didn’t have that annoying grinding sound all over it. It was just the simple beat, Eric B. scratching, and Rakim’s dynamically unique rap style. That’s the real track to me.
30 years!! Damn time flies!
First of all, let me start by saying that literally everybody is doing (or claiming to do) AI in the bay area. AI has inflamed the spirits of pretty much every single software engineer, data scientist, business developer, talent scout, and VC in the greater San Francisco area.
All tools and services presented at the conference embed some form of machine intelligence, and scientists are the new cool kids on the block. Software engineering has probably reached an all-time low in terms of coolness in the bay area, and regarded almost as the “necessary evil” in order to unleash the next AI interface. This is somewhat counter-intuitive, as actually Machine Learning and AI are more like the raisins in raisin bread, as Peter Norvig and Marcos Sponton say.
I like the raisin bread analogy, which means the data platform engineering aspect of building AI products might be seen as a lucrative “dirty job”.
Seriously. How did I not know about this?
Since December 16, 2006 MixesDB is the database for DJ mixes, radio shows and podcasts.
Together with their dates, tracklists, file details and flyers a useful collection of artists, events, clubs, and podcasts is built:
The mixes are added by music lovers from all over the world. Our slogan: We care about correctness because most do not.
We don’t offer any downloads or secret ways to get download links.
Also Why No Padlock? helped me figure out why Chrome wasn’t giving me the prized lock. Which then led to installing the SSL Insecure Content Fixer plugin for WordPress. Now my image URLs are cleaned up automagically.
No thanks to systemd under Ubuntu Linux 16.04, which got itself twisted up and held me back from upgrading to Ubuntu 17.04. Boiled down to moving some arcane config file out of the way to allow a couple hundred odd packages to upgrade. That’s actually where the majority of my time was spent in this exercise.
Now I just have to figure out what all the certificate mumbo jumbo actually means.
Traveling in the Kubernetes orbit, I couldn’t help but hear about some new Istio thing. Unfortunately, I didn’t really have time to dig in. Google Cloud Platform Podcast during the commute for the win:
Due to popular demand, this week Francesc and Mark are joined by Product Manager Varun Talwar and Senior Staff Software Engineer Sven Mawson to discuss all things Istio, an open platform to connect, manage, and secure microservices.
This document introduces Istio: an open platform to connect, manage, and secure microservices. Istio provides an easy way to create a network of deployed services with load balancing, service-to-service authentication, monitoring, and more, without requiring any changes in service code. You add Istio support to services by deploying a special sidecar proxy throughout your environment that intercepts all network communication between microservices, configured and managed using Istio’s control plane functionality.
Istio currently only supports service deployment on Kubernetes, though other environments will be supported in future versions.
Serendipitously, the latest episode of The ArchiTECHt Show podcast featured an interview with the CEO of Buoyant, William Morgan, about Linkerd, which seems to be an alternative product for service meshes. From the Linkerd site:
Linkerd is an open source, scalable service mesh for cloud-native applications.
Linkerd was built to solve the problems we found operating large production systems at companies like Twitter, Yahoo, Google and Microsoft. In our experience, the source of the most complex, surprising, and emergent behavior was usually not the services themselves, but the communication between services. Linkerd addresses these problems not just by controlling the mechanics of this communication but by providing a layer of abstraction on top of it.
Both platforms essentially put a proxy layer between the microservices and the underlying network transport. The GCP Podcast made this crystal clear. Then a bunch of functionality related to distributed services can be factored out of the apps and into the service mesh (e.g., load balancing, retries, circuit breaking). Istio is k8s only at the moment, while Linkerd is friendly with other orchestration tools like Marathon on Mesos.
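To make "factored out of the apps" a bit more concrete, here's a toy version of the kind of logic that otherwise ends up copy-pasted into every service: a circuit breaker plus a retry loop. This is an in-process sketch in plain Python, not anything from the Istio or Linkerd APIs, and every name in it (`CircuitBreaker`, `CircuitOpenError`, `call_with_retries`) is hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the downstream service."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_seconds=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_seconds = reset_seconds
        self.clock = clock          # injectable so the cool-down can be tested
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_seconds:
                raise CircuitOpenError("downstream marked unhealthy")
            # Cool-down elapsed: allow one trial call ("half-open").
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


def call_with_retries(breaker, func, attempts=3):
    """Retry func through the breaker (attempts >= 1); stop if the circuit opens."""
    last_error = None
    for _ in range(attempts):
        try:
            return breaker.call(func)
        except CircuitOpenError:
            raise
        except Exception as exc:
            last_error = exc
    raise last_error
```

As I understand the pitch of a mesh, the sidecar proxy does this sort of bookkeeping per destination, so the application just makes a plain network call.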
Once upon a time, I worked on a project that could have really used this technology.
From the GitHub repo:
Winton Kafka Streams is a Python implementation of Apache Kafka’s Streams API. It builds on Confluent’s librdkafka (a high performance C library implementing the Kafka protocol) and the Confluent Python Kafka library to achieve this.
The power and simplicity of both Python and Kafka’s Streams API combined opens the streaming model to many more people and applications.
Wasn’t really into using Java to tinker with Kafka Streams, but now I’m intrigued. Wonder if the Python library is at feature parity?
According to DoesMySiteNeedHTTPS.com, yes.
“But my site doesn’t have forms or collect information from users.”
Doesn’t matter. HTTPS protects more than just form data! HTTPS keeps the URLs, headers, and contents of all transferred pages confidential.
Looks like I have some work to do.
Adrian Colyer is taking a well-deserved, short break from the morning paper. He presented a few back pointers to top material from this last “term” of reading. In addition, there are some ways of translating the scale of differences that occasionally pop up in computing into human recognizable scales:
And here’s something a little different which didn’t quite fit in any particular paper review as a fun thought to leave you with for now: developing an intuition for orders of magnitude and some of the numbers you see in CS papers.
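One common way to build that intuition (not necessarily the one used in the linked post) is to pretend every nanosecond lasts a full second and see what the durations turn into. A quick sketch, with a made-up helper name `as_human_scale`:

```python
# Units from largest to smallest, with their size in seconds.
UNITS = [
    ("years", 365 * 24 * 3600),
    ("days", 24 * 3600),
    ("hours", 3600),
    ("minutes", 60),
    ("seconds", 1),
]


def as_human_scale(nanoseconds):
    """Pretend each nanosecond lasts one second and describe the result."""
    seconds = nanoseconds  # 1 ns of machine time -> 1 s of "human" time
    for name, size in UNITS:
        if seconds >= size:
            return f"{seconds / size:.1f} {name}"
    return f"{seconds:.3g} seconds"
```

On that scale something that takes 100 ns feels like 1.7 minutes, 1 ms (1,000,000 ns) is 11.6 days, and 100 ms is a bit over three years.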
Speaking of Python, I never knew Intel had their own custom, performance supercharged version:
The Intel® Distribution for Python* is an easy-to-access, integrated package that delivers faster Python* application performance on modern Intel® platforms. Available for Windows*, Linux* and macOS*.
Those stars are for a link to the varied trademarks. Gotta love those big corporate lawyers. That’s probably also why you have to go through an annoying registration form to download the bundle.
While Jupyter is quite hot these days, it did take a while to emerge. Some folks conflate IPython and Jupyter in casual conversation, but the projects have had distinctly different paths. Karlijn Willems did a deep dive into the differences and even got some feedback and input from the creators:
Today’s blog post intends to illustrate some of the core differences between the two more explicitly, not only starting from the origins of both to explain how the two relate, but also covering some specific features that are either part of one or the other, so that it will be easier for you to make the distinction between the two!
Consider also reading DataCamp’s Definitive Guide to Jupyter Notebook for tips and tricks, best practices, examples, and much more.
There are definitely some interesting twists and turns.
Julia Evans put together an extended blog post on “Linux tracing systems & how they fit together”.
The thing I learned last week that helped me really understand was – you can split linux tracing systems into data sources (where the tracing data comes from), mechanisms for collecting data for those sources (like “ftrace”) and tracing frontends (the tool you actually interact with to collect/analyse data). The overall picture is still kind of fragmented and confusing, but it’s at least a more approachable fragmented/confusing system.
Even better, she made a nice illustrated ’zine to go with.
Wrote a really quick zine out of the linux tracing tools post from yesterday. It’s not super fancy but here it is. It’s 12 pages, there’s a print version & a version to read on your computer as usual.
Been listening to a few episodes of the Datanauts podcast, and as I anticipated, the material is right up my alley. Actually, the hosts Ethan C. Banks and Chris Wahl really impressed me in a discussion of Apache Geode, an in-memory data grid (IMDG). While Banks and Wahl clearly aren’t distributed systems researchers / hackers, they asked exceptionally good, fundamental questions about how Geode fares under various conditions (e.g., network partitions).
So far, so good. I can recommend a subscription in your podcatcher to the Datanauts.
Something interesting is happening over at the University of Chicago.
As part of a plan to greatly increase the scale, scope and impact of computer science research and education across the University community, the University of Chicago has appointed prominent data science scholar Michael Franklin to chair its Department of Computer Science and to serve as senior advisor to the provost on computation and data science.
Ben Y. Zhao is a UC Berkeley CS Division alum, well-regarded systems researcher, and formerly at UC Santa Barbara. He just moved over to the University of Chicago this month.
I am Neubauer Professor of Computer Science at University of Chicago. Prior to joining UChicago, I was a Professor of Computer Science at UC Santa Barbara. My research covers a range of topics from large-distributed networks and systems, HCI, security and privacy, and wireless / mobile systems, mostly from a data-driven perspective. My current projects are focused on three areas: data-driven models of user behavior/interactions, security of online and mobile communities, and wireless systems and protocols. My work targets a range of top conferences, including WWW/IMC, Mobicom/SIGCOMM/NSDI, UsenixSecurity/NDSS/S&P, CHI/CSCW.
Luis Bettencourt, of the Santa Fe Institute, applies techniques from the complex systems community to the study of urban dynamics. He just joined up with the U of C, although he’ll maintain an appointment as external faculty to SFI.
Luis M. Bettencourt, a leading researcher in urban science and complex systems, has been appointed the inaugural Pritzker Director of the Mansueto Institute for Urban Innovation at the University of Chicago.
… In his research, Bettencourt uses the growing availability of data worldwide on topics ranging from transportation to housing to understand cities in quantitative and predictive ways. He is dedicated to creating new urban theory to explain how cities thrive and the challenges they face, based on the integration of ideas from urban disciplines such as geography, economics and sociology with methodologies from the natural and computational sciences. He also focuses on understanding the role of innovation and technological change as a driver of economic growth and human development in cities, across the world and throughout history. One of his most influential research projects has helped explain the systematic association between the size of urban areas and higher rates of economic productivity and innovation, as well as higher costs of living and violent crime.
I don’t know if these are coordinated events and I haven’t dug into any other recent appointments. Even if not, this is a kernel of talent that a world class university can build around for incredible outcomes. Also, with Northwestern University targeting a big expansion of the Computer Science program, a nice, metropolitan, bi-polar axis of computing research could emerge.
Yesterday I finished reading William Gibson’s Distrust That Particular Flavor. I’m on record as being a Gibson fanboy and a completist for all of his fiction, to the best of my knowledge. Yet Distrust That Particular Flavor had been sitting on my virtual ToRead pile for quite a long time.
The book is a collection of articles, book introductions, and speeches by Gibson, across a variety of venues: Wired, The New York Times, Time, Book Expo America, etc. As a former Wired subscriber, I was familiar with his style of journalism and had already read many of the articles, admittedly quite a while ago. Gibson self-acknowledges that he’s not really a journalist and is in fact not quite comfortable writing non-fiction. Thus, these snippets are truly of a distinctive flavor.
Overall, these are mostly interesting just as a time capsule of technological and cultural shifts, the most recent dated from the year 2006. Anybody remember AltaVista? There are a couple of standouts like The Road to Oceania but nothing earth shattering.
The collection also provides some insights into Gibson’s thinking as particular novels developed. William Gibson’s Filmless Festival is a useful precursor to Pattern Recognition.
Ultimately, Distrust That Particular Flavor is worthwhile if one is deep into @GreatDismal, Gibson’s handle for his prolific Twitter output. If not, no worries if you skip it.
Greg Linden and I go way back, although very superficially. He linked to a post on my old blog. I was a user of Findory. We may have exchanged a few emails. I still subscribe to his blog’s feed.
Even so, I get a kick out of “knowing” someone who’s had a big impact on the computing industry, as evidenced by Greg, along with collaborators Brent Smith and Jeremy York, receiving a Test of Time award for their research article on Amazon’s early item-based recommendation system.
On its 20th anniversary, the editorial board created its first ever “The Test of Time” award. I’m honored to say they gave it to our 2003 article, “Amazon.com Recommendations: Item-to-Item Collaborative Filtering”, which continues to be accessed, cited, and used in industry and research many years after its original publication.
Their follow-up article is also quite enjoyable. It provides practical insights into actually deploying such a recommendation algorithm, especially as experience has been gained over time. Congratulations!
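For anyone who hasn't run into the idea, here's a rough sketch of item-to-item collaborative filtering as I understand it: compare items by who bought them, then recommend items similar to what a customer already has. This is a toy using cosine similarity over made-up purchase sets, definitely not Amazon's production algorithm, and the function names are mine.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(purchases):
    """purchases: dict of user -> set of items. Returns {(a, b): cosine}."""
    buyers = defaultdict(set)
    for user, items in purchases.items():
        for item in items:
            buyers[item].add(user)

    sims = {}
    for a in buyers:
        for b in buyers:
            if a == b:
                continue
            common = len(buyers[a] & buyers[b])
            if common:
                # Cosine similarity between binary "who bought it" vectors.
                sims[(a, b)] = common / sqrt(len(buyers[a]) * len(buyers[b]))
    return sims


def recommend(user, purchases, sims, top_n=3):
    """Score unowned items by their similarity to the items the user owns."""
    owned = purchases[user]
    scores = defaultdict(float)
    for (a, b), score in sims.items():
        if a in owned and b not in owned:
            scores[b] += score
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]
```

The appeal is that the similarity table can be computed ahead of time, so serving a recommendation is mostly lookups against what the customer already has.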
Feeling sort of blunted this Friday.
Blunted Dummies, “House For All”. Classic house.
I’m thrilled that we have hit an exciting milestone the Kafka community has long been waiting for: we have introduced exactly-once semantics in Apache Kafka in the 0.11 release. In this post, I’d like to tell you what exactly-once semantics mean in Apache Kafka, why it is a hard problem, and how the new idempotence and transactions features in Kafka enable correct exactly-once stream processing using Kafka’s Streams API.
Jay Kreps, one of the creators of Kafka, dove deeply into the technical weeds of how this is achieved:
There is this claim floating around, and everyone seems quite sure it is true without knowing exactly why, that Exactly Once Delivery/Semantics is mathematically impossible. Yet despite this being apparently common knowledge, you rarely see people linking to some kind of proof of this or even a precise definition of what is meant by exactly-once. They link to other things such as the FLP result or the Two Generals problem as evidence, but nothing about exactly once. In distributed systems you can’t talk about something being possible or impossible without describing precisely what the thing is, as well as describing a setting that controls what is possible (asynchronous, semi-synchronous, etc), and a fault-model that describes what bad things can happen.
So is there a way we could formally define a property like what we want to achieve?
Yes, it turns out that there is just such a property. …
The key things people need to ask are 1) what are the operational semantic definitions; 2) what are the failure modes; and 3) what are the guarantees under failures. Whenever you’re trying to determine if a distributed system claim is true, read the fine print. Closely.
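As a way of pinning down just one of those definitions, here's a toy model of idempotent appends: the log remembers the highest sequence number it has accepted from each producer and drops anything it has already seen, so a retried send doesn't create a duplicate. This is a sketch of the general idea only, not Kafka's implementation. It ignores transactions, crashes, and out-of-order delivery, and all the names are hypothetical.

```python
class IdempotentLog:
    """Append-only log that drops duplicate (producer_id, sequence) writes."""

    def __init__(self):
        self.entries = []
        self.last_seq = {}  # producer_id -> highest sequence number appended

    def append(self, producer_id, seq, value):
        """Return True if the value was appended, False if it was a duplicate."""
        if seq <= self.last_seq.get(producer_id, -1):
            return False  # a retry of something we already have
        self.entries.append(value)
        self.last_seq[producer_id] = seq
        return True


def send_with_retries(log, producer_id, seq, value, acks_lost=1):
    """Simulate a producer whose first `acks_lost` acknowledgements vanish.

    The producer cannot tell a lost ack from a lost message, so it resends
    with the same sequence number. Returns the number of send attempts.
    """
    attempts = 0
    while True:
        log.append(producer_id, seq, value)
        attempts += 1
        if attempts > acks_lost:
            return attempts
```

Even in the toy, the guarantee only holds under the stated failure model, which is exactly the fine print worth reading.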
I haven’t had my toes in the Kafka stream much recently but I’m fascinated by the toolkit’s rise in the developer community. Also, I used to be a bit of a messaging system wonk in my previous job. So this bit of news captivates me.
More to come on this…
I enjoyed this Datanauts podcast, “Unikernels Vs. Containers,” which dove into what unikernels are and why they matter. The interview guest, Adam Wick, really had a depth of knowledge from real working experience researching and using unikernel technology. He basically had the most concise explanation of unikernels I’ve heard yet. To paraphrase:
- Build your application for a kernel
- Turn the kernel into a library
- Combine your application and the library
- Throw away the stuff you don’t need in the library
- Congratulations! You’ve got a unikernel!
- Now just launch it on a hypervisor or even bare metal. It should be relatively secure and resource constrained.
I need to listen to that episode again as I was distractedly tuning out for certain segments. Sounded like there was some good discussion of when unikernels are actually a good use case fit.
Also, from examining the archives, the Datanauts Podcast looks right up my alley.
A little stale but an interesting deployment of citywide sensor tech:
The City will be installing smart nodes that can use real-time anonymous sensor data to do things such as direct drivers to open parking spaces, help first responders during emergencies, track carbon emissions and identify intersections that can be improved for pedestrians and cyclists. The information can be used to support San Diego’s “Vision Zero” strategy to eliminate traffic fatalities and severe injuries.
… The anonymous information from the sensors can be used by developers to create apps and software that can benefit the community.
I’m curious how such apps would be built and managed. At least from the building part, feels like a project at the UC Berkeley RISE lab might be applicable:
A critical part of enabling cities to implement their Vision Zero policies – the goal of the current National Transportation Data Challenge – is to be able to generate open, multi-modal travel experience data. While existing datasets use police and hospital reports to provide a comprehensive picture of fatalities and life altering injuries, by their nature, they are sparse and resist use for prediction and prioritization. Further, changes to infrastructure to support Vision Zero policies frequently require balancing competing needs from different constituencies – protected bike lanes, dedicated signals and expanded sidewalks all raise concerns that automobile traffic will be severely impacted.
… The e-mission project in the RISE and BETS labs focuses on building an extensible platform that can instrument the end-to-end multi-modal travel experience at the personal scale, collate it for analysis at the societal scale, and help solve some of the challenges above.
A few months ago, I was futzin’ around with Shazam to trainspot a particular song regularly played at Washington Wizards games. Don’t ask. I just happened to be killing time at a cafe and a really intriguing track popped up. Whip out the phone, bust out the app, and voilà, Start Shootin’ (Little People’s Americana Remix) is etched in my memory.
PEOPLE! WHY WAS I NOT INFORMED OF THIS TRACK IN A TIMELY FASHION!!
I am a damn sucker for a cut with killer piano and phat beats. The non-remix version works as well, a little more atmospheric and a little less urgent. Too bad Little People’s output has tailed off. Thank goodness both versions are available on Spotify so I can listen to my heart’s content.
Oddly enough, this is exactly the same way I discovered Tosca’s Rondo Acapricio.
Completed Nnedi Okorafor’s novella Binti today. Okorafor writes from an African perspective, making Binti’s tale distinct from the average fare. There are exquisite moments of self-discovery, tension, and terror. I empathized a lot from my own personal experience. And like a lot of good science fiction, the story says a lot more about The Now than The Future.
I also appreciated that there wasn’t a big reveal or shock ending. I was tensing up for a moment, but breathed easy after completing the final page.
Definitely recommended. A tight, efficient journey to other places that is well worth your time.
So O’Reilly Media departed from the online book purchasing business recently. Don’t worry, they’re still going to publish books, and videos, just not run a book selling ecommerce site themselves. Basically, they’ll be outsourcing to Amazon, Google Books, etc.
The fundamental reason seems to be that technical book sales have flatlined relative to the burgeoning all you can rent buffet of books, videos, and training material that is Safari. Last year, I sprung for a lifetime (fingers crossed), discounted, annual subscription to Safari. I’m coming up on the anniversary and will probably let the subscription renew, despite it not being particularly cheap. There is actually a metric crap ton of content that interests me, without even searching too hard. Sort of like a Spotify subscription for nerd stuff.
O’Reilly and I go way back. All the way back to the printed documentation for the X Window System. As a summer intern, I got stuck implementing an X server on the OS/2 platform. Lucky me, but those books were the reference.
When I read the news, I sort of went “hunh” then “enh”. There was some consternation over at Hacker News due to reasonably concerned, anti-DRM holdouts, but I think the eminently respectable Scott Meyers has the right analysis:
My guess is that a component of O’Reilly’s no-DRM policy was a hope that it would distinguish O’Reilly from other publishers and would attract buyers who felt strongly about DRM. Whether it did that, I don’t know, but O’Reilly’s decision to stop selling individual products at its web site suggests that DRM (or the lack thereof) is not an important differentiator for most buyers of technical books and videos.
All we can expect in this industry is change.
I told you BSD Packet Filtering was cool. How cool? So cool that Julia Evans thinks it’s cool. She wrote up some notes on a talk regarding the original BPF paper. When she gets amped about something everybody gets an enthusiastic and accessible explanation.
But BPF has been around for a long time! Now we live in the EXCITING FUTURE which is eBPF. I’d heard about eBPF a bunch before but I felt like this helped me put the pieces together a little better. (i wrote this XDP & eBPF post back in April when I was at netdev)
There’s an easy peasy list of places in Linux where you can hang an eBPF program, plus her post is chock full of links to relevant material.
Brush with greatness. I used to be a barista for Nefeli Caffè on the north side of the Berkeley campus, around the corner from Soda Hall. I worked the closing night shift a lot. Steve McCanne was a regular. No small talk. Really hot, double cappuccino and see ya’.
NSDI is the abbreviation for the USENIX Symposium on Networked Systems Design and Implementation. It’s a highly regarded conference for Systems researchers. I’ve been occasionally scanning the proceedings for 2017, reading a paper here or there.
Some of the folks at the Stanford DAWN project attended the 2017 meeting and wrote up their perspective. Definitely provides a different angle from the way I was looking at the conference proceedings:
A group of us at DAWN went to NSDI last month. The program was quite diverse, spanning a wide variety of sub-areas in the networking and distributed systems space.
We were excited to see some trends in the research presented that meshed well with the DAWN vision.
In bullet points the trends were:
- More support for machine learning
- Video as a data source for analytics
- Embracing the use of hardware accelerators and FPGAs
- Frameworks that exploit fine grained parallelism
- High performance with high programmer productivity
DAWN is shaping up to be an interesting project, in the Berkeley CS tradition of highly collaborative research teams bounded by a 5 year lifespan. Go figure, given some of the principals involved.
Also, Matei Zaharia and Peter Bailis popped up on an ArchiTECHt podcast, which was pretty informative. Alex Ratner had a related discussion with Ben Lorica on generating training data with limited resources.
Finished reading Chris Guillebeau’s The $100 Startup. Here’s the premise:
In The $100 Startup, Chris Guillebeau shows you how to lead a life of adventure, meaning and purpose — and earn a good living.
… In preparing to write this book, Chris identified 1,500 individuals who have built businesses earning $50,000 or more from a modest investment (in many cases, $100 or less), and from that group he’s chosen to focus on the 50 most intriguing case studies. In nearly all cases, people with no special skills discovered aspects of their personal passions that could be monetized, and were able to restructure their lives in ways that gave them greater freedom and fulfillment.
I’m hitting a point in my life where I want to think about different ways to generate income and/or have more control over how I spend my time. The book has been sitting in my ToRead pile for a while, so I finally got around to it.
It’s hard to gainsay a New York Times bestselling author, but two thoughts came to mind. First, a careful reader needs to control for survivor bias. The book’s approaches are reasonable and pragmatic, although a sharp businessperson could see them as a tad shallow. Still, I have to wonder how many followed the advice and fell flat on their faces, or worse. Second, I personally got a heavy Jerry Maguire vibe. Not as much romance, but a lot of personally inspiring, guy-or-gal-gets-knocked-down-and-gets-back-up anecdotes.
I’m not widely read enough in this space to say whether The $100 Startup is a standout business book or not. YMMV, but it at least feels like a good starting point.
Bradford Cross is back. Following up on his projections for the AI startup market, he now has a four-factor approach to building a business around AI. The factors from the top level? 1. Full stack products, 2. Subject matter expertise, 3. Proprietary data, 4. AI delivers core value.
Read Cross’ in-depth post to get the details. And there’s plenty of detail. Apparently there’s also a YouTube recording of a parallel talk he gave that has some further pertinent details.
The one thing on my mind is how much “productionization” (sic) factors in. Cross would probably just roll it into point 1, but it’s pretty clear that at this point, actually building stable production systems around AI is ridiculously difficult. You begin in a place, model design and construction, that has extreme asymmetries with regard to expertise. Then layer on the fact that the basic practices of software engineering aren’t particularly applicable to AI products, which in addition consume resources in vastly different fashions than typical applications. There’s no Rails or Django for this stuff. Agile and DevOps aren’t straightforwardly applicable to AI unless you’re Google, MS, Amazon, FB, etc.
Could production AI capabilities be commoditized? Soon? Maybe, maybe not. I wouldn’t hold my breath though.
If you like the computer science research stuff that shows up here, you should subscribe to Adrian Colyer’s The Morning Paper blog.
an interesting/influential/important paper from the world of CS every weekday morning, as selected by Adrian Colyer
While Adrian’s selections tend towards the systems literature, he does foray into other important areas such as security, machine learning, and software engineering. He’s entertainingly eclectic.
I have 56 posts from his blog starred in my feed reader (yup, still use one, actually two). A bit of a retrospective look at those starred items could make for some good fodder.
By systems literature, I mean computer architecture, operating systems, networking, programming language design and implementation, databases, and distributed systems. Staples of a Berkeley grad CS education.
This blog has been around so long that it started off as a Movable Type site and ran that way for quite a few years. Then when MT hit a bad patch of little progress, I switched over to WordPress. I thought my conversion had pretty much completely preserved all the old content cleanly, but looking at some older posts recently, I discovered a significant amount of intrablog link rot.
The number of posts that need fixing isn’t huge, so I’m just working through them by hand. But it’s an interesting look back at older material. Back then I definitely wrote more, and more about stuff I was doing. The link + pull quote post has been a staple of recent material. While easy on the generation muscles, I have been somewhat disappointed in myself on that front.
Every now and then, I run across an old gem like “Enthusiastic Making”:
Folks who dismiss various forms of social media as trivial and narcissistic often forgot this aspect. These media provide venues for pent-up creative enthusiasm. Yeah a lot of the results will be poor or even hateful. The act of making makes it easier for many more to care about others. In the aggregate, that’s a great thing.
Pretty much on target, although now I’m less optimistic about social media in general. Bad hombres have discovered how to exploit our individual and social cognitive biases for evil ends.
Anyhoo, I will be endeavoring to get back to more original content in this space. Might take a bit, but it will happen.