When people talk about analytics jobs, they usually have a mental picture of a single job and skill-set. They talk about analysts, or data analysts (in Silicon Valley we call them data scientists). But it’s helpful to categorize data analysts by the kind of job they have. The builders of the analysts’ tools must have the same skill-set, but at a much deeper level.

The first job type is the office worker. Today, every employee is expected to be able to produce some analytics. We all know the basic tools we get from Microsoft, Google or Apple. Proficiency in more specialized tools like Adobe Illustrator, InDesign, Acrobat, FileMaker and Tableau is a plus. The office worker is expected to be able to convert data between formats like CSV and Excel. Workers are typically given assignments like “Prepare a presentation explaining our performance and suggesting how it can be improved.” To do that well, an office worker must produce visualizations, generally graphics built from tables using Adobe Illustrator, Microsoft Excel and PowerPoint. By the nature of their work, office workers are domain experts in their daily activities.

The second job type is a data analyst in a traditional company. The all-around data analyst must be proficient in a relational database system like MySQL as well as Excel. The analyst must also have a good understanding of descriptive statistics. A key skill is munging data across applications and file formats; this is also known as data shaping, wrangling, ETL, etc. The required statistical expertise is not deep, but basic A/B testing and Google Analytics experience are required. Presenting and selling the results of an analysis is very important, and it requires the ability to do basic data visualization in Excel and Tableau. The data analyst has to have a good understanding of the company’s products and general, well-rounded skills.
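As a rough illustration of the level of statistics meant here, the following sketch computes a few descriptive statistics and runs a basic A/B test as a two-proportion z-test. The function names and all numbers are hypothetical and chosen only for illustration; a real analysis would also check sample sizes and test assumptions.

```python
import math
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical data, for illustration only.
order_values = [12.5, 8.0, 15.25, 9.75, 30.0, 11.0]
summary = describe(order_values)
z, p_value = two_proportion_z_test(conv_a=120, n_a=2400, conv_b=156, n_b=2400)

print(summary)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

The z-test is only one common choice for comparing two conversion rates; it is used here because it needs nothing beyond the standard library.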

The third job type is that of an analyst in a startup company, where a typical assignment might be “please munge our data.” This requires proficiency in the basic tools and the ability to move fast: go for the low-hanging fruit and be able to quickly implement a new analysis or visualization by writing Excel macros, Access programs, or R functions, which in turn requires a good knowledge of the available libraries in Excel, R or Tableau. A startup data analyst must be proficient in implementing advanced parsers and creating ad hoc MySQL databases for persistent storage. Basic statistics knowledge, for example of contingency tables and Poisson tests, is also a must. Since a startup does not have historical data, the analyst must be able to do the ground-truthing alone. Since a lot of the data may come from social networks, this job type also requires the ability to use linguistics functions to clean up unstructured text and extract useful information.
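To make the statistics mentioned above concrete, here is a minimal sketch that builds a contingency table from raw observations, computes the Pearson chi-square statistic for it, and evaluates an exact one-sided Poisson test. The function names and the customer data are hypothetical; in practice an analyst would more likely reach for R or a statistics library.

```python
import math
from collections import Counter


def contingency_table(pairs):
    """Count (row, column) pairs into a nested dict: table[row][col] -> count."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    return {r: {c: counts.get((r, c), 0) for c in cols} for r in rows}


def chi_square_statistic(table):
    """Pearson chi-square statistic for independence of rows and columns."""
    rows = list(table)
    cols = list(next(iter(table.values())))
    row_tot = {r: sum(table[r].values()) for r in rows}
    col_tot = {c: sum(table[r][c] for r in rows) for c in cols}
    total = sum(row_tot.values())
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / total
            stat += (table[r][c] - expected) ** 2 / expected
    return stat


def poisson_upper_tail(k, lam):
    """Exact P(X >= k) for X ~ Poisson(lam), i.e. a one-sided Poisson test."""
    below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below


# Hypothetical observations: (plan, outcome) for 200 customers.
pairs = (
    [("free", "churned")] * 30 + [("free", "stayed")] * 70
    + [("paid", "churned")] * 10 + [("paid", "stayed")] * 90
)
table = contingency_table(pairs)
chi2 = chi_square_statistic(table)

# Is seeing 3 or more events surprising if the expected rate is 1 per period?
p_tail = poisson_upper_tail(k=3, lam=1.0)

print(table)
print(f"chi-square = {chi2:.2f}, Poisson tail p = {p_tail:.4f}")
```

The chi-square statistic would still have to be compared against a chi-square distribution with the appropriate degrees of freedom (one, for a 2×2 table) to obtain a p-value.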

The fourth job type is an analyst in a data company, which is a completely different job. Here data is the product: “we are data; data is us.” This requires a formal background in mathematics, statistics, machine learning or linguistics (that is, natural language processing, or NLP). The analyst must be able to discriminate among the various algorithms and understand their parameters. On the bright side, most data is already munged, but the analyst must be able to customize parsers and workflows. Understanding privacy laws is a must, especially the European ones, because while the Internet has no borders, laws definitely do, and they come with debilitating fines. The analyst in a data company must have a good sense of emerging techniques, like topological data analysis.

The fifth job type is an analyst in an enterprise, where the analyst is a member of an established data team with experts in various tools. By “enterprise” we mean a reasonably sized non-data company that is data-driven (to distinguish it from the second job type). The work is about data, but data is often not central to the product. An example is the fourth industrial revolution, or Industry 4.0. This analyst is a generalist with broad experience, a jack-of-all-trades. To survive, this analyst must be able to find blind spots where niche roles can be played. The job requires heavy experience in munging and aggregating data from all possible sources: SQL and NoSQL, logs, IoT, social networks (Twitter, LinkedIn, Facebook, etc.), news feeds, REST services, data.gov, Google Public Data Explorer, etc.
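The kind of aggregation described above might look like the following sketch, which merges three sources into one record per device: a relational table, raw log lines, and a JSON payload. Everything here is an assumption made for illustration: an in-memory SQLite database stands in for a real SQL server, the log format is invented, and the JSON string stands in for the response of a hypothetical REST service.

```python
import json
import re
import sqlite3
from collections import defaultdict

# Source 1: a relational table (in-memory SQLite stands in for a real database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (device_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("d1", 20.0), ("d1", 5.0), ("d2", 12.5)],
)

# Source 2: raw log lines in an invented format, including one that is malformed.
LOG_LINES = [
    "2024-01-01T10:00:00 device=d1 status=ok",
    "2024-01-01T10:05:00 device=d2 status=error",
    "2024-01-01T10:07:00 device=d1 status=error",
    "malformed line without fields",
]
LOG_PATTERN = re.compile(r"device=(?P<device>\S+)\s+status=(?P<status>\S+)")

# Source 3: a JSON payload, as a hypothetical REST service might return it.
REST_PAYLOAD = '[{"id": "d1", "site": "plant-a"}, {"id": "d2", "site": "plant-b"}]'


def aggregate():
    """Merge the three sources into one record per device."""
    merged = defaultdict(lambda: {"revenue": 0.0, "errors": 0, "site": None})

    # Revenue per device, aggregated by the database.
    query = "SELECT device_id, SUM(amount) FROM orders GROUP BY device_id"
    for device_id, total in conn.execute(query):
        merged[device_id]["revenue"] = total

    # Error counts per device, parsed from the logs.
    for line in LOG_LINES:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # skip lines that do not parse
        if match["status"] == "error":
            merged[match["device"]]["errors"] += 1

    # Site metadata per device, from the JSON payload.
    for item in json.loads(REST_PAYLOAD):
        merged[item["id"]]["site"] = item["site"]

    return dict(merged)


report = aggregate()
for device_id, record in sorted(report.items()):
    print(device_id, record)
```

Keying every source on the same identifier (here `device_id`) is the design choice that makes the merge simple; in real data, reconciling identifiers across sources is often the hardest part of the munging.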

See the table above for a summary of all these job types and what they require. So what are the skills that really top-notch data analysts will need? There are many, but here are just a few:

  • Tools of the trade: SQL, R, Java, Scala, Python, Spark, MapReduce.
  • Basic statistics: distributions, maximum likelihood estimation, statistical tests and regression.
  • Machine learning: k-nearest neighbors and random forests.
  • Linear algebra and multivariate calculus.
  • Data munging: imputation, parsing, formatting; a.k.a. wrangling or shaping.
  • Data visualization and communication: Tableau, ggplot, d3.js.
  • Software engineering: logging, performance analysis, REST interfaces and connectors.
  • Curiosity for emerging technologies, like algebraic topology.
  • Thinking like a data scientist: business sense, approximations, and teamwork.
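Two of the skills listed above, imputation and k-nearest neighbors, are small enough to sketch in a few lines. The example below fills missing values with the column median and then classifies a query point by majority vote among its nearest neighbors. The function names and the data are hypothetical, and median imputation is only one of several possible strategies.

```python
import math
import statistics
from collections import Counter


def impute_median(rows):
    """Replace None in each column with the median of that column's observed values."""
    n_cols = len(rows[0])
    medians = [
        statistics.median(r[j] for r in rows if r[j] is not None)
        for j in range(n_cols)
    ]
    return [
        [medians[j] if r[j] is None else r[j] for j in range(n_cols)]
        for r in rows
    ]


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical data: two features per row, one value missing.
raw = [[1.0, 2.0], [1.5, None], [2.0, 1.0], [8.0, 9.0], [9.0, 8.5], [8.5, 9.5]]
labels = ["low", "low", "low", "high", "high", "high"]

clean = impute_median(raw)
print(clean[1])
print(knn_predict(clean, labels, [1.2, 1.5]))
print(knn_predict(clean, labels, [8.7, 9.0]))
```

Note that the imputed value for the second row is pulled toward the "high" group, because the column median ignores the class labels. Choices like this are where the approximations and business sense in the last bullet come into play.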