10 November 2011

It's a good day ... for data science!

Despite the lab coat, he's more of an engineer than a scientist.
Kaiser Fung recounts three hours in a day in the life of a "data scientist." The post triggers a few observations.

First, what Fung doesn't mention is that this is actually fun. Screwing around with computers is a perfect example of nonwork, in that it is labor-intensive enough to make you feel productive while adding no actual value. (Much like blogging!) But unlike much nonwork (in the real world, examples include answering the phone, answering emails, going to meetings, and so forth), writing code is like solving a whole bunch of logic puzzles all at once. And the frequently (apparently) arbitrary relationship between success and effort makes you feel like a lab rat in one of those experiments that prove that random rewards are more successful at generating effort than rules-based ones.

Second, the term "data scientist" is a little misleading. Just as most mad scientists are actually mad engineers, so too are most data scientists really data engineers, at least day-to-day. (There's nothing wrong with that; engineers get things to work! The software engineers who built Google's search functions are praiseworthy!) But note what Kaiser is doing: he's moving data from X to Y. No hypothesis testing, just problem-solving.

Third, I'm again reminded of the difference in practice between the lives of quants, quals, and squishes. (In classic social science tradition, I'm breaking up the dialectic and calling this progress.) Quants spend their time wrestling with datasets, which is often way harder than quals or squishes believe. Quals spend their time wrestling with cases, which is often much harder than quants or squishes admit. And squishes spend their time figuring out the substrate of reality, which confuses quants and quals, who simply assume that problem away.

But at the end of the day, the quant approach is actually more collaborative than quals or squishes admit, and the qual/squish approach is more solitary. Because so many quant problems are engineering in nature, two (or more) heads are better than one--and once a given problem is cracked, the answer is open to everyone immediately. But squishes and quals have to rely on a lot of tacit knowledge. It's very easy for me to consult with someone on a Stata problem. It's very, very hard for me to consult even on a qual topic I know well, like the Nixon administration.
