A few months ago, I was futzin’ around with Shazam to trainspot a particular song regularly played at Washington Wizards games. Don’t ask. I just happened to be killing time at a cafe and a really intriguing track popped up. Whip out the phone, bust out the app, and voilà, Start Shootin’ (Little People’s Americana Remix) is etched in my memory.
PEOPLE! WHY WAS I NOT INFORMED OF THIS TRACK IN A TIMELY FASHION!!
I am a damn sucker for a cut with killer piano and phat beats. The non-remix version works as well, a little more atmospheric and a little less urgent. Too bad Little People’s output has tailed off. Thank goodness both versions are available on Spotify so I can listen to my heart’s content.
Oddly enough, this is exactly the same way I discovered Tosca’s Rondo Acapricio.
Completed Nnedi Okorafor’s novella Binti today. Okorafor writes from an African perspective, making Binti’s tale distinct from the average fare. There are exquisite moments of self-discovery, tension, and terror. I empathized a lot from my own personal experience. And like a lot of good science fiction, the story says a lot more about The Now than The Future.
I also appreciated that there wasn’t a big reveal or shock ending. I was tensing up for a moment, but breathed easy after completing the final page.
Definitely recommended. A tight, efficient journey to other places that is well worth your time.
So O’Reilly Media departed the online book selling business recently. Don’t worry, they’re still going to publish books and videos, just not run a book selling ecommerce site themselves. Basically, they’ll be outsourcing to Amazon, Google Books, etc.
The fundamental reason seems to be that technical book sales have flatlined relative to the burgeoning all-you-can-rent buffet of books, videos, and training material that is Safari. Last year, I sprang for a lifetime (fingers crossed), discounted, annual subscription to Safari. I’m coming up on the anniversary and will probably let the subscription renew, despite it not being particularly cheap. There is actually a metric crap ton of content that interests me, without even searching too hard. Sort of like a Spotify subscription for nerd stuff.
O’Reilly and I go way back. All the way back to the printed documentation for the X Window System. As a summer intern, I got stuck implementing an X server on the OS/2 platform. Lucky me, but those books were the reference.
When I read the news, I sort of went “hunh” then “enh”. There was some consternation over at Hacker News due to reasonably concerned, anti-DRM holdouts, but I think the eminently respectable Scott Meyers has the right analysis:
My guess is that a component of O’Reilly’s no-DRM policy was a hope that it would distinguish O’Reilly from other publishers and would attract buyers who felt strongly about DRM. Whether it did that, I don’t know, but O’Reilly’s decision to stop selling individual products at its web site suggests that DRM (or the lack thereof) is not an important differentiator for most buyers of technical books and videos.
All we can expect in this industry is change.
I told you BSD Packet Filtering was cool. How cool? So cool that Julia Evans thinks it’s cool. She wrote up some notes on a talk regarding the original BPF paper. When she gets amped about something, everybody gets an enthusiastic and accessible explanation.
But BPF has been around for a long time! Now we live in the EXCITING FUTURE which is eBPF. I’d heard about eBPF a bunch before but I felt like this helped me put the pieces together a little better. (i wrote this XDP & eBPF post back in April when I was at netdev)
There’s an easy peasy list of places in Linux where you can hang an eBPF program, plus her post is chock full of links to relevant material.
Brush with greatness. I used to be a barista at Nefeli Caffè on the north side of the Berkeley campus, around the corner from Soda Hall. I worked the closing night shift a lot. Steve McCanne was a regular. No small talk. Really hot double cappuccino and see ya.
NSDI is the abbreviation for the USENIX Symposium on Networked Systems Design and Implementation. It’s a highly regarded conference for Systems researchers. I’ve been occasionally scanning the proceedings for 2017, reading a paper here or there.
Some of the folks at the Stanford DAWN project attended the 2017 meeting and wrote up their perspective. Definitely provides a different angle from the way I was looking at the conference proceedings:
A group of us at DAWN went to NSDI last month. The program was quite diverse, spanning a wide variety of sub-areas in the networking and distributed systems space.
We were excited to see some trends in the research presented that meshed well with the DAWN vision.
In bullet points the trends were:
- More support for machine learning
- Video as a data source for analytics
- Embracing the use of hardware accelerators and FPGAs
- Frameworks that exploit fine grained parallelism
- High performance with high programmer productivity
DAWN is shaping up to be an interesting project, in the Berkeley CS tradition of highly collaborative research teams bounded by a 5-year lifespan. Go figure, given some of the principals involved.
Also, Matei Zaharia and Peter Bailis popped up on an ArchiTECHt podcast, which was pretty informative. Alex Ratner had a related discussion with Ben Lorica on generating training data with limited resources.
Finished reading Chris Guillebeau’s The $100 Startup. Here’s the premise:
In The $100 Startup, Chris Guillebeau shows you how to lead a life of adventure, meaning and purpose — and earn a good living.
… In preparing to write this book, Chris identified 1,500 individuals who have built businesses earning $50,000 or more from a modest investment (in many cases, $100 or less), and from that group he’s chosen to focus on the 50 most intriguing case studies. In nearly all cases, people with no special skills discovered aspects of their personal passions that could be monetized, and were able to restructure their lives in ways that gave them greater freedom and fulfillment.
I’m hitting a point in my life where I want to think about different ways to generate income and/or have more control over how I spend my time. The book has been sitting in my ToRead pile for a while, so I finally got around to it.
It’s hard to gainsay a New York Times bestselling author, but two thoughts came to mind. First, a careful reader needs to control for survivor bias. The book’s approaches are reasonable and pragmatic, although a sharp businessperson could see them as a tad shallow. Still, I have to wonder how many readers followed the advice and fell flat on their faces, or worse. Second, I personally got a heavy Jerry Maguire vibe. Not as much romance, but a lot of personally inspiring, knocked-down-but-back-on-their-feet anecdotes.
I’m not widely read enough in this space to say whether The $100 Startup is a standout business book or not. YMMV, but it at least feels like a good starting point.
Bradford Cross is back. Following up on his projections for the AI startup market, he now has a four factor approach to building a business around AI. The factors from the top level:

- Full stack products
- Subject matter expertise
- Proprietary data
- AI delivers core value
Read Cross’ in-depth post to get the details. And there’s plenty of detail. Apparently there’s also a YouTube recording of a parallel talk he gave that has some further pertinent details.
The one thing I have in my mind is how much “productionization” (sic) factors in. Cross would probably just roll it into point 1, but it’s pretty clear that, at this point, actually building stable production systems around AI is ridiculously difficult. You begin with model design and construction, an area with extreme asymmetries in expertise. Then layer on the fact that the basic practices of software engineering aren’t particularly applicable to AI products, which also consume resources in vastly different fashions than typical applications. There’s no Rails or Django for this stuff. Agile and DevOps aren’t straightforwardly applicable to AI unless you’re Google, MS, Amazon, FB, etc.
Could production AI capabilities be commoditized? Soon? Maybe, maybe not. I wouldn’t hold my breath though.
If you like the computer science research stuff that shows up here, you should subscribe to Adrian Colyer’s The Morning Paper blog.
an interesting/influential/important paper from the world of CS every weekday morning, as selected by Adrian Colyer
While Adrian’s selections tend towards the systems literature, he does foray into other important areas such as security, machine learning, and software engineering. He’s entertainingly eclectic.
I have 56 posts from his blog starred in my feed reader (yup, still use one, actually two). A bit of a retrospective look at those starred items could make for some good fodder.
By systems literature, I mean computer architecture, operating systems, networking, programming language design and implementation, databases, and distributed systems. Staples of a Berkeley grad CS education.
This blog has been around so long that it started off as a Movable Type site and ran that way for quite a few years. Then when MT hit a bad patch of little progress, I switched over to WordPress. I thought my conversion had pretty much completely preserved all the old content cleanly, but looking at some older posts recently, I discovered a significant amount of intrablog link rot.
The number of posts that need fixing isn’t huge, so I’m just working through them by hand. But it’s an interesting look back at older material. I definitely wrote more, and more about stuff I was doing. The link + pull quote post has been a staple of recent material. While easy on the generation muscles, I have been somewhat disappointed in myself on that front.
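Were the pile bigger, a little script could at least do the triage. Here’s a hypothetical sketch of flagging intrablog link rot; the domain and slug set are made up, and a real version would crawl the WordPress export instead:

```python
import re

# Made-up site and known-good permalink set, for illustration only
SITE = "https://example-blog.com"
known_slugs = {"/2004/05/enthusiastic-making", "/2006/01/hello-wordpress"}

def broken_internal_links(html):
    """Return same-site link paths that don't match any known post slug."""
    links = re.findall(r'href="([^"]+)"', html)
    internal = [u[len(SITE):] for u in links if u.startswith(SITE)]
    return [path for path in internal if path not in known_slugs]

post = ('<a href="https://example-blog.com/2004/05/enthusiastic-making">ok</a> '
        '<a href="https://example-blog.com/2003/09/old-mt-permalink">rotted</a>')
print(broken_internal_links(post))  # → ['/2003/09/old-mt-permalink']
```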
Every now and then, I run across an old gem like “Enthusiastic Making”:
Folks who dismiss various forms of social media as trivial and narcissistic often forget this aspect. These media provide venues for pent-up creative enthusiasm. Yeah, a lot of the results will be poor or even hateful. But the act of making makes it easier for many more to care about others. In the aggregate, that’s a great thing.
Pretty much on target, although now I’m less optimistic about social media in general. Bad hombres have discovered our individual and social cognitive biases to exploit for evil ends.
Anyhoo, I will be endeavoring to get back to more original content in this space. Might take a bit, but it will happen.
Previously, I claimed ignorance of blockchain technology. To help rectify the situation let us return to two of my former stomping grounds.
First, the University of California, Berkeley has a student-led blockchain advocacy group appropriately entitled “Blockchain at Berkeley”:
We’re a student-run organization at UC Berkeley dedicated to serving the Berkeley and greater East Bay crypto and blockchain communities. Our members include Berkeley students, alumni, community members, and blockchain enthusiasts from all educational and industrial backgrounds.
Our team consists of undergraduates from a variety of backgrounds: Electrical Engineering and Computer Science, Economics, Business, and more.
Looks like they have some great training material.
On the other side of the country, MIT ran an interesting blockchain experiment:
The objective of the study is to understand the process of diffusion of Bitcoin, a software-based, open-source, peer-to-peer payment system on the MIT campus. Bitcoin is an innovative payment network that allows for instant peer-to-peer transactions with zero or very low processing fees on a worldwide scale.
…The Bitcoin ecosystem currently resembles the state of the Internet in the mid-90s, i.e. many of the applications that will be built on top of it have not been created. This offers a unique experience for the most inventive and entrepreneurial students at MIT, as they will have a chance to experiment and test their ideas within a campus where the diffusion of digital currencies will be years ahead of anywhere else. Essentially, participants will be the first ones to see the opportunities and possibilities the technology opens up.
In the same way that MIT gave students early access to computing resources through the Athena project in 1983, this project intends to give participants early access to a digital currency. The ultimate objective is to place the broader MIT community at the frontier of this new exciting wave of innovation.
The MIT site is heavier on the research papers, especially from a diffusion of innovation perspective.
Had to slow the Princeton roll there, for a moment.
I’ve already noted the interesting TimescaleDB project. Recently, Michael Freedman, a co-founder of the Timescale company and a full professor at Princeton University, did a podcast interview with Ben Lorica. I found it to be quite enjoyable, diving into how and why TimescaleDB came to be:
We initially were developing a platform to collect and store and analyze IoT data, and certainly a lot of IoT data is time-series in nature. We found ourselves struggling. The reason a lot of people adopt NoSQL was they thought it offered scale in the ways that more traditional relational databases did not—yet, they often gave up a lot of the rich query language, optimized complex queries, joins, and an ecosystem that you get in these more traditional relational databases. Customers who were using our platform kept wanting all these ways to query the data, and we couldn’t do it with the existing NoSQL database we were using. It just didn’t support those types of queries.
What I didn’t know is how accomplished a systems researcher Freedman is. His research contributions go all the way back to turn-of-the-millennium peer-to-peer stuff, and he was talking “eventual consistency” before people got CAPed. Blame it on my Berkeley blinders.
Criminy! Now even the dang light bulbs can be hacked:
Then they installed 5 smart bulbs in the third floor of an office building and mounted the attack kit onto a DJI drone. Successful takeover was achieved with a fly-by (warflying) and attack code that caused all the lamps to repeatedly signal SOS in Morse code while the drone hovered in front of the building.
Seems like IoT will be the nth full employment act for security professionals ever devised.
Finished up rereading Richard K. Morgan’s Altered Carbon for a third time. My opinion hasn’t changed much since my last read. Still a great noir opening, lots of great action, and an entertaining science fiction premise.
And of course Quellcrist Falconer’s philosophies hold:
The personal, as everyone’s so fucking fond of saying, is political. So if some idiot politician, some power player, tries to execute policies that harm you or those you care about, take it personally. Get angry. The Machinery of Justice will not serve you here – it is slow and cold, and it is theirs, hardware and soft-… If you want justice, you will have to claw it from them. Make it personal. Do as much damage as you can. Get your message across. That way you stand a far better chance of being taken seriously next time. Of being considered dangerous.
After this time around though, I feel trimming the text by about 15 to 20% would have really tightened things up to no ill effect. All of the nemesis encounter stuff between Kovacs, Kadmin, and Kawahara could have been more tightly woven early into the novel.
Still highly recommended.
I don’t know why I was surprised, but I recently decided to try my MLB.com Gameday Audio subscription from my laptop. For the past few years, I had been using the service exclusively from my phone. Hit the website, logged in with my pretty-much-never-used account, and I was streaming play-by-play in 30 seconds at my desk. For some reason, I was under the impression that mobile and desktop were segregated.
Don’t know what I was thinking. MLB Advanced Media are the pros in this space.
More good civic work out of Princeton University:
The US National Highway Traffic Safety Administration (NHTSA) is proposing a requirement that every car should broadcast a cleartext message specifying its exact position, speed, and heading ten times per second. In comments filed in April, during the 90-day comment period, we (specifically, Leo Reyzin, Anna Lysyanskaya, Vitaly Shmatikov, Adam Smith, together with the CDT via Joseph Lorenzo Hall and Joseph Jerome) argued that this requirement will result in a significant loss to privacy.
Ya think? SMH.
As advertised, today I attended the opening session of the local area Data Intelligence Conference. Overall, an enjoyable start. The talks I witnessed were good, on target, and eclectic. Not super deep on tech details or science, but generally thought provoking.
I’ve been there for a few meetups, but the Capital One space is even more impressive when in full conference mode. Also have to give the hosts credit for tasty lunch and break vittles. Plus, there were quite a few employees hanging out and chatting, leading to interesting discussion. The hallway track appeared to be vibrant although it’s been a while since I’ve done that heavily.
Due to an unfortunate scheduling snafu, it doesn’t look like I’m going to be able to attend the next two days. I suspect that Saturday will be the premier session, with a number of top notch talks, heavier attendance, and a quality after hours event. Ah, well!
If you have the interest, opportunity, and finances, I’d definitely recommend attending.
I aspire to once in my lifetime deliver a takedown as thorough as this dismantling of a new cryptocurrency from Emin Gün Sirer and Phil Daian:
The quick takeaway is that Bancor can be gamed by miners, and, even if the miners are naive or benevolent, will always trail the real market. It provides no efficiency guarantee during this discovery process, and will likely waste its reserves on market price discovery. You should think twice before you layer a coin on top of Bancor.
Let’s do a quick walk through the red flags we encountered as we read the code and documentation for Bancor tokens, known as BNT.
Followed by 29 distinct issues identified.
I’m not a cryptocurrency nerd, barely understanding how Bitcoin works at the 10,000 foot level. In the interest of fairness and balance, you might want to check out the related Hacker News discussion.
From the pitch:
Of course there is a lot more going on in this novel. The Last Good Man is a fast-paced, high-tech, military thriller that deals with autonomous weapons, big data, A.I., surveillance, remote warfare—and their effects on human relationships. But from the first day that the story truly started to take shape, I knew it would be centered on a woman. Specifically, True Brighton, retired US Army soldier, former helicopter pilot with frontline experience, a forty-nine-year-old mother of three who’s been happily married for three decades, and who is not at all ready to retire.
This speaks to me on so many levels, can’t wait to give it a read.
One additional benefit of using the SMACK stack is your choice of options for adding features or getting support. The LAMP stack had a broad set of commercial champions. The same is true with SMACK today. Each SMACK stack technology have leading commercial entities behind them, that offer supported enterprise products and support. Examples include Lightbend and Databricks for Spark, DataStax for Cassandra, Confluent for Kafka, and Mesosphere for Mesos. So if you’re already an expert to part of the stack or new to all of SMACK, a broad set of options are available.
Not sure I completely buy it. Even though I have deep technical respect for Mesos, Kubernetes seems to be the hot core of open source orchestration. (SK8SACK? Skatesack?). There are at least links to free e-books at the bottom of the article, for the low, low price of contact information, the better to spam you with promotional material.
Saw a post on Hacker News discussing messaging. A comment mentioned Californium. Being the messaging nerd that I am, had to chase the reference. Little did I know there was a whole IETF protocol for machine to machine messaging on resource constrained devices, CoAP:
The Constrained Application Protocol (CoAP) is a specialized web transfer protocol for use with constrained nodes and constrained networks in the Internet of Things. The protocol is designed for machine-to-machine (M2M) applications such as smart energy and building automation.
According to the IETF RFC (7252), CoAP has the following main features:
- Web protocol fulfilling M2M requirements in constrained environments
- UDP [RFC0768] binding with optional reliability supporting unicast and multicast requests
- Asynchronous message exchanges
- Low header overhead and parsing complexity
- URI and Content-type support
- Simple proxy and caching capabilities
- A stateless HTTP mapping, allowing proxies to be built providing access to CoAP resources via HTTP in a uniform way or for HTTP simple interfaces to be realized alternatively over CoAP
- Security binding to Datagram Transport Layer Security (DTLS) [RFC6347]
HTTPish packets over UDP with optional reliable transport.
Wonder if anybody uses CoAP in practice?
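Out of curiosity, the fixed four-byte CoAP header from RFC 7252 is simple enough to pack by hand. A minimal sketch, with the message type values and the GET code taken from the RFC and everything else illustrative:

```python
import struct

# CoAP message types per RFC 7252 §3
CON, NON, ACK, RST = 0, 1, 2, 3
GET = (0 << 5) | 1  # method code 0.01

def encode_header(msg_type, code, message_id, token=b""):
    """Pack the fixed 4-byte CoAP header (Ver|Type|TKL, Code, Message ID),
    followed by the optional token."""
    if not 0 <= len(token) <= 8:
        raise ValueError("token length must be 0-8 bytes")
    first = (1 << 6) | (msg_type << 4) | len(token)  # Version is always 1
    return struct.pack("!BBH", first, code, message_id) + token

# A confirmable GET with message ID 0x1234
print(encode_header(CON, GET, 0x1234).hex())  # → "40011234"
```

That tiny header, versus dozens of bytes of HTTP request line and headers, is a big part of the “low header overhead” claim.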
Just because I dig the many cool uses SQLite has been put to:
rqlite is a distributed relational database, which uses SQLite as its storage engine. rqlite uses Raft to achieve consensus across all the instances of the SQLite databases, ensuring that every change made to the system is made to a quorum of SQLite databases, or none at all. It also gracefully handles leader elections, and tolerates failures of machines, including the leader. rqlite is available for Linux, OSX, and Microsoft Windows.
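To get a feel for the quorum idea, here’s a toy simulation over in-memory SQLite databases. This is emphatically not how rqlite works internally (Raft handles log replication and leader election); it just illustrates the “quorum of replicas or nothing” rule:

```python
import sqlite3

# Three stand-in "replicas", each a plain SQLite database
replicas = [sqlite3.connect(":memory:") for _ in range(3)]
for db in replicas:
    db.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")

def quorum_write(sql, params=()):
    """Apply a statement everywhere; commit only if a majority succeeded."""
    acks = 0
    for db in replicas:
        try:
            db.execute(sql, params)
            acks += 1
        except sqlite3.Error:
            pass
    committed = acks > len(replicas) // 2
    for db in replicas:
        db.commit() if committed else db.rollback()
    return committed

print(quorum_write("INSERT INTO kv VALUES (?, ?)", ("lang", "go")))  # → True
```

A second insert with the same primary key fails on every replica, so the quorum check rejects it and nothing commits.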
A HotOS 2017 paper (PDF) authored by Maas, Asanović, and Kubiatowicz hits on some of the systems trends that have piqued my interest. Definitely from a cloud-computing, datacenter perspective, but vertically integrated programming stacks on top of disaggregated resources hold promise. Does this make me look buzzword compliant?
In this paper, we argue that we should rethink how language runtimes are designed for the Cloud 3.0 era. We do this by laying out seven tenets of building language runtimes for the next generation of cloud data centers. We then distill these tenets into a proposal for a shared substrate to underpin these future runtimes.
With a title like “Return of the Runtimes: Rethinking the Language Runtime System for the Cloud 3.0 Era,” how could you go wrong?
Brush with greatness, Asanović and I were in the same incoming grad cohort in Berkeley CS. We even had a class or two together.
Speaking of Princeton, I’m intrigued by the Cornell-Princeton Center for Network Programming:
The Center for Network Programming supports research on languages, algorithms, and tools for network programming, and facilitates closer interactions with partners in industry and government.
As a bona fide, card carrying, UC Berkeley computer systems junkie, this warms the cockles of my heart. Especially projects like Frenetic. In particular, I sense that the new era of software defined networking (SDN) and network programming languages is a phase change. I may just be an old PLDI nerd, but this feels like a big deal. At least it will be a lot of fun!
WebTAP is a public interest and research project at Princeton University looking into how Web entities track users through a variety of techniques. Based upon rigorous research methods they provide the public at large with insights and policy recommendations regarding online privacy.
OpenWPM is the software framework that WebTAP uses to conduct a large scale census of websites:
OpenWPM is a web privacy measurement framework which makes it easy to collect data for privacy studies on a scale of thousands to millions of sites. OpenWPM is built on top of Firefox, with automation provided by Selenium. It includes several hooks for data collection, including a proxy, a Firefox extension, and access to Flash cookies. Check out the instrumentation section below for more details.
OpenWPM is the basis of an extensive academic publication (which I need to read).
The cherry on top is the collection, documentation, and archiving of their Web census data. According to a recent blog post on a new notebook wrapper around the Web Census data, they collect 500GB of data on a monthly basis. Juicy!
Great work by computer science and policy researchers on behalf of the greater good.
Love this track:
Thievery Corporation’s Culture of Fear
Security alert on orange
It’s been on orange since ’01, G
I mean wassup man, can’t a brother get yellow, man
Just for like two months or something
Goddamn, sick of that
A bit dusty, but I really enjoyed this O’Reilly Bots Podcast discussion between Jon Bruner and Tom Coates. Given the potential for a buzzword laden, hype explosion (voice interfaces, IoT, connected home, bots) the conversation was surprisingly thoughtful.
In this episode of the O’Reilly Bots Podcast, I speak with Tom Coates, co-founder of Thington, a service layer for the Internet of Things. Thington provides a conversational, messaging-like interface for controlling devices like lights and thermostats, but it’s also conversational at a deeper level: its very architecture treats the interactions between different devices like a conversation, allowing devices to make announcements to any other device that cares to listen.
Internet of Things is an area which feels compelling to me, on a number of different tech angles, despite the hype. Still pondering on bots.
Link parkin’: AgensGraph
AgensGraph is a new generation multi-model graph database for the modern complex data environment, which supports the relational and graph data models at the same time. AgensGraph supports ANSI-SQL and openCypher. SQL and Cypher can be integrated into single queries in AgensGraph.
AgensGraph is yet another PostgreSQL derivative. So at this point you could conceivably have one DB engine that has strong relational credibility, hardened geospatial functionality, support for time series via extension, semi-structured document data capabilities, and at least commercially developed graph data support.
Could this be the small to medium data management Rapture? Ha, ha! Only serious.
FPGA stands for Field-Programmable Gate Array (sic). I was struck by this article from The New Stack summarizing ways FPGAs can be incorporated into cloud computing offerings.
The array of gates that make up an FPGA can be programmed to run a specific algorithm, using the combination of logic gates (usually implemented as lookup tables), arithmetic units, digital signal processors (DSPs) to do multiplication, static RAM for temporarily storing the results of those computation and switching blocks that let you control the connections between the programmable blocks. Some FPGAs are essentially systems-on-a-chip (SoC), with CPUs, PCI Express and DMA connections and Ethernet controllers, turning the programmable array into a custom accelerator for the code running on the CPU.
The combination means that FPGAs can offer massive parallelism targeted only for a specific algorithm, and at much lower power compared to a GPU. And unlike an application-specific integrated circuit (ASIC), they can be reprogrammed when you want to change that algorithm (that’s the field-programmable part).
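The lookup-table idea is simple enough to sketch in a few lines of Python: a k-input LUT is just a 2^k-entry truth table, and “programming” the gate array amounts to filling in those tables. A toy illustration, not a synthesis tool:

```python
def make_lut(k, fn):
    """Precompute a truth table for fn over all 2^k input combinations,
    then return a function that just indexes into it."""
    table = []
    for i in range(2 ** k):
        bits = [(i >> b) & 1 for b in range(k)]
        table.append(fn(*bits))
    return lambda *bits: table[sum(b << n for n, b in enumerate(bits))]

# "Program" a 2-input LUT as XOR, the way synthesis maps logic onto LUTs
xor_lut = make_lut(2, lambda a, b: a ^ b)
print([xor_lut(a, b) for a in (0, 1) for b in (0, 1)])  # → [0, 1, 1, 0]
```

Reprogramming the chip for a new algorithm is, at bottom, rewriting those tables and the routing between them; that’s the field-programmable part.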
Give the article a read to hear about how Microsoft is using FPGAs to accelerate network packet processing for software defined networking (SDN) applications. FPGAs have always seemed to only be applicable in really niche, vertical applications, but this feels like a relatively broad use case to me. Also, a number of other really important verticals (crypto, *omics) along with potential to join in the AI hype wave would seem to make for a bright FPGA future. The “hardware microservices” portmanteau is a good soundbite.
FPGA programming has always been extremely difficult. I’m surprised that tightly integrated programming stacks haven’t emerged to make this a lot easier, given the relatively high value they would seem to bring. The article does hint at this potential future though. Alternatively, one can look at cloud APIs as eventually becoming the “programming stack” that many developers use to exploit FPGAs.
Feels like a trend to keep an eye on.
Completed Ramez Naam’s Crux today. Crux is the sequel to Nexus, which I read a while back. The basic premise rests on the intersection of nano-computation and cognitive augmentation leading to the emergence of post-human capabilities. Bits of quantum computing, climate change, and other speculative technologies are thrown in to boot. Mayhem ensues. In Crux, there are much heavier political and social dilemmas woven throughout.
Crux maintains the rapid action, technothriller pace of its predecessor. Per usual, the breakneck combat and carnage aren’t really my thing, but the rapid plot advances make it an easy read. It’s been a while since I read Nexus, but I don’t remember a similarly bewildering number of characters as were eventually put in play in Crux.
Relative to other science fiction that I really love, Crux is lacking those slower, interstitial moments where the author paints out many of the unspectacular details of the world. Being almost all chase and conflict, there’s not much time for reflection on how the Nexus drug plays out in the more quotidian aspects of people’s lives.
Of course Crux ends in a cliffhanger since it’s part of a trilogy. I’ll have to read Apex to see how it all ends, but boy are there a lot of threads to tie off.
While it wouldn’t be my first selection off the bookshelf to read, Crux is definitely not time wasted.
Link parkin’: TimescaleDB
An open-source time-series database optimized for fast ingest and complex queries. Looks, feels, speaks like Postgres.
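TimescaleDB’s signature trick is time-bucketed aggregation, which you can emulate crudely in plain SQLite with integer arithmetic. A toy sketch with a made-up metrics table (TimescaleDB itself does this via its time_bucket() function on real Postgres):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (ts INTEGER, value REAL)")
# Epoch-second timestamps with some readings
conn.executemany("INSERT INTO metrics VALUES (?, ?)",
                 [(0, 1.0), (30, 3.0), (90, 5.0), (100, 7.0)])

# 60-second buckets: integer division floors ts to the bucket start
buckets = conn.execute(
    "SELECT (ts / 60) * 60 AS bucket, AVG(value) "
    "FROM metrics GROUP BY bucket ORDER BY bucket"
).fetchall()
print(buckets)  # → [(0, 2.0), (60, 6.0)]
```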
and “The Last Supper”.
Now you’ve seen the best parts of the entire Alien: Covenant production. None of which appear in the theatrical release.
Use your money to buy a ticket for Guardians of the Galaxy v2.
What is eBPF and why is it useful?
eBPF is a weird Linux kernel technology that powers low-overhead custom analysis tools, which can be run in production to find performance wins that no other tool can. With it, we can pull out millions of new metrics from the kernel and applications, and explore running software like never before. It’s a superpower. It’ll benefit many people on Linux as they’ll add a toolkit of new analysis tools, or use new plugins for deep monitoring. That’s what I’ll show in my Velocity talk: new tools you can use.
There are four other good questions to go along with the above.
I bought a ticket for the Data Intelligence conference:
The 2017 Data Intelligence conference, which will take place in Mclean, Virginia is the first machine learning gathering for the community using and developing machine learning and data intelligence. It is produced and underwritten by NumFOCUS, the 501(c)(3) nonprofit that supports and promotes world-class, innovative, open source scientific computing. Through the Data Intelligence Conference, NumFOCUS advances its mission of growing the international community of open source developers.
To be honest, the event description is a bit buzzword laden for my taste. I think this conference is a substitute for last year’s PyData conference in DC. This one is local though, mostly over a weekend, and the entrance fee was the right price. Maybe I’m missing them, but DC based technology events that are grassroots and outside of the Federal space are hard to find. So I’m really looking forward to the conference.
Given how hiccupy (sic) this blog has been, I doubt there’s anyone reading who might also be in attendance, but give me a shout if you do actually exist.
Still in early release, Data Science on the Google Cloud Platform might be a good read:
Valliappa (Lak) Lakshmanan, Technical Lead for Data & ML Professional Services at Google Cloud, is the author of the upcoming O’Reilly Media book “Data Science on the Google Cloud Platform” (now in Early Release). In the following Q&A, Lak describes his reasons for writing this book, its intended readers, what readers will learn and how to think about the practice of data science on Google Cloud Platform (GCP)-based architecture.
The pull quote is from a post about the book over at the Google Cloud Platform blog.
Knocked off Liu Cixin’s The Three-Body Problem this weekend, an extremely entertaining tale of initial alien encounter. While the overall plot and literary execution are outstanding, the key factor is that this is a translation from a popular Chinese work. In general, the shift from Western norms is bracing and in particular, the Communist Revolution in China is woven throughout the story to devastating effect. The overall reverence for science, apparent in the text and Liu’s afterword, is also refreshing.
The tale additionally prompts serious consideration of humanity, inhumanity, and the fate of man on Earth.
My only nit is that at the end, the aliens are heavily anthropomorphized, which didn’t work for me. But I acknowledge the wry symmetry that Liu invoked by doing so.
Apologies for the copyediting twitches.
Derrick Harris used to be a GigaOM reporter on the big data beat. When GigaOM went under he moved on to doing media for Mesosphere.
Looks like Harris launched out on his own again in January, doing a combo of blogging, newsletter, and podcasting. All of it can be found at architecht.io. The interviewee lineup on the podcast looks especially good with some high profile names like Eric Brewer, Mike Olson, Jay Kreps, and Julia Austin.
The aim of this article has been to introduce a selection of recent techniques that provide approximate answers to some general questions that often occur in data analysis and manipulation. In all cases, simple alternative approaches can provide exact answers, at the expense of keeping complete information. The examples shown here have illustrated, however, that in many cases the approximate approach can be faster and more space efficient. The use of these methods is growing. Bloom filters are sometimes said to be one of the core technologies that “big data experts” must know. At the very least, it is important to be aware of sketching techniques to test claims that solving a problem a certain way is the only option. Often, fast approximate sketch-based techniques can provide a different tradeoff.
I say “insanely good” because there is some seriously hairy math behind these techniques. Yet Cormode makes the principles easily accessible to a general, admittedly already technically inclined, audience. Speaking as a former instructor, this is an article you could give to a bunch of upperclassmen and then spend two good lectures working through details and implications. No mean feat. Plus, these types of data structures are increasingly important to know about.
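For a taste of what Cormode covers, here’s a minimal Bloom filter sketched in Python. This is my own toy illustration, not code from the article; the class name, bit-array size, and hashing scheme are all arbitrary choices for the sake of the example:

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: set membership with possible false positives,
    but never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int doubles as an arbitrary-size bit array

    def _positions(self, item):
        # Derive k bit positions by hashing the item with different salts.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits & (1 << pos) for pos in self._positions(item))

bf = BloomFilter()
for word in ["bloom", "sketch", "hyperloglog"]:
    bf.add(word)
```

The tradeoff Cormode describes is visible even here: the filter stores only 1024 bits no matter how many items go in, and in exchange `might_contain` can occasionally claim membership for an item that was never added.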
Pinboard has acquired Delicious. Here’s what you need to know:
If you’re a Pinboard user, nothing will change. Sad!
If you’re a Delicious user, you will have to find another place to save your bookmarks. The site will stay online, but on June 15, I will put Delicious into read-only mode. You won’t be able to save new bookmarks after that date, or use the API.
Not sure if I’m more surprised that del.icio.us is still live or that it went for so cheap.
Simit, a language for computing on sparse systems:
Simit is a new programming language that makes it easy to compute on sparse systems using linear algebra. Simit programs are typically shorter than Matlab programs yet are competitive with hand-optimized code and also run on GPUs.
With Simit you build a graph that describes your sparse system (e.g. a spring system, a mesh or the world wide web). You then compute on the system in two ways: locally or globally. Local computations apply update functions to each vertex or edge of the graph that update local state based on the vertex or the edge and its endpoints. This part of the language is similar to what you find in graph processing frameworks such as GraphLab and its descendants.
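The per-edge local update idea described above can be sketched in plain Python. This is a toy 1-D spring system of my own devising, mimicking the concept rather than Simit’s actual syntax; every name here is mine, not Simit’s:

```python
# Toy illustration of Simit-style local updates: apply an update
# function to each edge, using only the edge and its endpoints.

springs = [  # each edge: (endpoint a, endpoint b, rest length, stiffness)
    ("a", "b", 1.0, 10.0),
    ("b", "c", 1.0, 10.0),
]
positions = {"a": 0.0, "b": 1.5, "c": 2.5}  # 1-D positions for simplicity
forces = {v: 0.0 for v in positions}

def spring_force(edge):
    """Local update: forces on an edge's endpoints from its own state."""
    a, b, rest, k = edge
    stretch = (positions[b] - positions[a]) - rest
    f = k * stretch  # Hooke's law
    return {a: f, b: -f}

# Run the update function over every edge, accumulating vertex state.
for edge in springs:
    for vertex, f in spring_force(edge).items():
        forces[vertex] += f
```

The point is the access pattern: each update touches only one edge and its two endpoints, which is what lets a system like Simit compile such code to efficient sparse linear algebra or GPU kernels.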