The end goals of research data science

What are the end goals of research data science?

One of them is to explain the world, ideally in a causal fashion (see: The goal of scientific model building is high explanatory power).

The kind of data scientist we need for this work is different from the kind needed for business data science (see: The end goals of business data science).

Research data science operates in a context where business processes (and their corresponding business outcomes) are often not well defined. Defining the return on investment (ROI) is therefore harder than it would be for a well-defined process. As such, we need to view research data science as an investment in the future, just as we view any research organization.

See also: Researchers think mechanistically about the world

The end goals of business data science

What are the end goals of business data science?

I think one of them is automation: capturing value from existing business processes by automating away manual, repetitive tasks, so that humans are free to do other things.

Data science, as applied directly in most business contexts, usually revolves around business goals that are easily defined in monetary terms, and where well-established processes are available to optimize. Put bluntly, the aim is to cut the costs of a business process, or to increase the profit extracted from it.

Research vs Business Data Science

One of my colleagues (well, strictly speaking, my boss's boss) recently crystallized a key idea for the rest of us: the difference between biomedical research data science and tech business data science. I gave his ideas some thought and decided to write down what I see as the biggest similarities and differences.

The goals of the two "forms" of data science differ:

I am also seeing issues in the data science field. Some of the problems I have seen thus far:

And what I think is needed:

The key difference, I think, is that business data science is about capturing value from existing processes (see: The end goals of business data science), while research data science is about opening new avenues of value from unknown, undeveloped, and uncaptured business processes (see: The end goals of research data science). The latter is, and always has been, an investment; in a well-oiled system, the former likely generates profit that can and should be invested in the latter.

Researchers think mechanistically about the world

What do we mean by "thinking mechanistically"? It refers to being able to reason step by step through the process that generated the data. In research data science, this likely requires deep knowledge of the field to which one is applying quantitative methods, i.e., domain expertise. Deep domain knowledge usually correlates with doctoral training or many years of prior work experience, but a newcomer can compensate for gaps in domain knowledge by learning it very quickly, or by bringing a complementary breadth of modelling knowledge honed in a diverse set of settings.

Someone who does not yet have the domain expertise to think mechanistically about a problem needs to Learn how to learn fast.

The goal of scientific model building is high explanatory power

Why does mechanistic thinking matter? In research data science, we are in pursuit of invariants, i.e., knowledge that stands the test of time (see: The end goals of research data science). How our business contexts exploit that knowledge for the mutual benefit of society and the business is a matter to discuss another day.

When we build models, particularly of natural systems, predictive power matters only in the context of explanatory power, where we can map phenomena of interest to key parameters in a model. For example, in an autoregressive hidden Markov model (AR-HMM), the autoregressive coefficient may correspond to a meaningful property in our research context.
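To make the parameter-to-phenomenon mapping concrete, here is a minimal sketch, assuming a two-state AR-HMM where each hidden state has its own AR(1) coefficient. In an AR(1) process the coefficient reads directly as persistence: values near 1 mean a signal that decays slowly, values near 0 mean one that forgets quickly. The function names `simulate_ar_hmm` and `estimate_phis` are hypothetical, and, as a loudly stated simplification, the hidden states are treated as known; in practice they would be inferred (e.g. with the forward-backward algorithm inside EM), which this sketch does not do.

```python
import random


def simulate_ar_hmm(n_steps, phis, transition, noise_sd=1.0, seed=0):
    """Simulate a hypothetical AR(1) hidden Markov model.

    The hidden state z_t follows a Markov chain with the given transition
    matrix; each observation is x_t = phis[z_t] * x_{t-1} + Gaussian noise.
    """
    rng = random.Random(seed)
    states, xs = [], []
    z, x_prev = 0, 0.0
    for _ in range(n_steps):
        # Draw the next hidden state from row z of the transition matrix.
        z = rng.choices(range(len(phis)), weights=transition[z])[0]
        # The state-specific AR coefficient controls how much of the
        # previous value carries over into the current one.
        x = phis[z] * x_prev + rng.gauss(0.0, noise_sd)
        states.append(z)
        xs.append(x)
        x_prev = x
    return states, xs


def estimate_phis(states, xs, n_states):
    """Least-squares AR(1) coefficient per state, ASSUMING states are known.

    Assumes every state is visited at least once after the first step;
    otherwise the denominator for that state would be zero.
    """
    num = [0.0] * n_states
    den = [0.0] * n_states
    for t in range(1, len(xs)):
        k = states[t]
        num[k] += xs[t] * xs[t - 1]
        den[k] += xs[t - 1] ** 2
    return [n / d for n, d in zip(num, den)]


if __name__ == "__main__":
    # Illustrative values only: state 0 forgets quickly, state 1 persists.
    true_phis = [0.2, 0.9]
    transition = [[0.95, 0.05], [0.05, 0.95]]
    states, xs = simulate_ar_hmm(20_000, true_phis, transition, seed=42)
    print(estimate_phis(states, xs, n_states=2))
```

The recovered coefficients land close to the values used in the simulation, and each one has a direct physical reading (how persistent the signal is in that regime). That interpretability, rather than raw predictive accuracy, is what lets us connect the model back to the system being studied.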

Being able to look at a natural system and find the most appropriate model for the system is a key skill for winning the trust of the non-quantitative researchers that we serve. (ref: Finding the appropriate model to apply is key)