DATA SCIENCE is a twenty-first-century buzzword and umbrella term for many different types of work.1 It essentially refers to any type of work that organizes or analyzes data. With the increase of dataset sizes in neuroscience and biology more broadly, data science approaches certainly have a place in our field.2 When it’s related to genetics or molecular biology, this type of work is typically called bioinformatics. But data science is much, much broader than scientific research: These days, companies and governments are collecting all types of information about almost everything.
In data science, there is a focus on using statistical techniques to make sense of trends in multifaceted datasets and also an emphasis on clean, clever visualizations. The data in data science could be from any source or on any subject: sometimes it is data that a company collects on its employees; in other cases it is data about the incidence of certain words in rap songs.3
Sometimes, the data is important information about a pressing, national healthcare issue.
Ensuring that potent drugs such as opioids get into the right hands and out of the wrong ones is a difficult and important task. Not only do some patients illegally distribute drugs, but nurses and doctors can be at fault as well. The problem of drug diversion (when healthcare professionals take small quantities of drugs for their own purposes) is one that many companies and governments are motivated to solve.4
This may not sound like a data science problem, but many companies are treating it like one.5 There is a lot of data about the flow of prescriptions through a healthcare provider. There is an initial inventory of drugs, the prescriptions for each patient, and a log of delayed and cancelled transactions. Each of these variables helps to build a predictive model of how much of each drug should be in the inventory, and together they can serve as a signal for drug diversion. If there is less of a drug in the inventory than expected, concerns might be raised.
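To make the inventory logic concrete, here is a minimal sketch in Python. It is simple bookkeeping rather than the kind of predictive model a company would actually deploy, and everything in it is hypothetical: the `Transaction` record, the status labels, the drug name, and the numbers are invented for illustration. It also assumes that delayed and cancelled transactions never remove drugs from stock.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    drug: str
    units: int
    status: str  # hypothetical labels: "dispensed", "delayed", or "cancelled"


def expected_inventory(initial_units, transactions):
    """Units that should remain, assuming only dispensed transactions reduce stock."""
    dispensed = sum(t.units for t in transactions if t.status == "dispensed")
    return initial_units - dispensed


def flag_shortfall(initial_units, transactions, counted_units, tolerance=0):
    """Return (flagged, shortfall); flagged is True when too few units were counted."""
    shortfall = expected_inventory(initial_units, transactions) - counted_units
    return shortfall > tolerance, shortfall


# Invented example log for a single drug.
log = [
    Transaction("oxycodone", 10, "dispensed"),
    Transaction("oxycodone", 5, "cancelled"),
    Transaction("oxycodone", 20, "dispensed"),
    Transaction("oxycodone", 5, "delayed"),
]

flagged, shortfall = flag_shortfall(100, log, counted_units=64)
print(flagged, shortfall)
```

A flag like this is only a prompt for a closer look, not proof of diversion; a real system would need to account for waste, returns, and counting errors.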
Many of the skills that you might learn as a neuroscientist, such as designing experiments, analyzing data, and communicating your results, will overlap quite a bit with the role of a data scientist. Let’s break down the different types of data science to see where there is more or less overlap—if you’re a researcher already, I think you’ll find that the tasks here are not that different from what you already do.6
There are various types of operations in data science, beginning with collecting data and ending with reporting the results of your analysis. Capturing data could take many different forms—it may mean administering surveys, monitoring people’s interactions with technology, or observing behavior. For example, a data scientist at Apple might analyze the way individuals use their watch throughout the day, whereas a data scientist working for Google might ask its employees questions about how they prefer to work.
After data is acquired and entered into a database, it needs to be managed. Data science has also come to embody managing databases, cleaning and wrangling data, and what is often called data architecture. As information becomes more complex and multimodal, we increasingly need people who understand how to build intuitive and efficient databases.
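As a small illustration of what cleaning and wrangling can look like, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `responses` table, its columns, and the survey answers are all made up for this example; real pipelines are usually far messier.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE responses (participant_id INTEGER, question TEXT, answer TEXT)"
)

rows = [
    (1, "work_style", "remote"),
    (1, "work_style", "remote"),    # exact duplicate entry
    (2, "work_style", " Office "),  # stray spaces and inconsistent capitalization
    (3, "work_style", None),        # missing answer
]
conn.executemany("INSERT INTO responses VALUES (?, ?, ?)", rows)

# Drop missing answers, normalize the text, and remove duplicate rows.
cleaned = conn.execute(
    """
    SELECT DISTINCT participant_id, question, LOWER(TRIM(answer)) AS answer
    FROM responses
    WHERE answer IS NOT NULL
    ORDER BY participant_id
    """
).fetchall()

print(cleaned)
conn.close()
```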
Once data is organized, companies need people who can describe and analyze it. This could be as simple as determining means or percentages: for example, how many people listen to Celine Dion’s “My Heart Will Go On” during any given Tuesday evening? Or it could be much more involved, such as running a complicated regression to understand the demographics of Celine Dion lovers and whether or not listening to her predicts listening to other artists. Many data science efforts use a class of approaches collectively called machine learning techniques. Diving into the nitty-gritty of these techniques is a bit beyond our scope here, but essentially these approaches allow you to fit a model to one dataset and use it to predict trends in a second dataset. These are popular techniques for anything from image recognition to trying to predict what you’ll listen to after Celine Dion.
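The sketch below walks through both ends of that range in Python: a simple percentage, then a straight-line regression fit to one dataset and checked against a second. The listening counts are invented, the `fit_line` helper is written out by hand for illustration, and a single-variable line is about the simplest possible stand-in for the machine learning techniques mentioned above.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Invented data: (weekly plays of one artist, weekly plays of another artist).
train = [(1, 3), (2, 5), (3, 7), (4, 9)]
test = [(5, 11), (6, 13)]

# Descriptive step: what percentage of listeners play the first artist 3+ times a week?
percent_heavy = 100 * sum(1 for a, _ in train if a >= 3) / len(train)

# Predictive step: fit on the first dataset, then predict the second.
slope, intercept = fit_line([a for a, _ in train], [b for _, b in train])
predictions = [slope * a + intercept for a, _ in test]
errors = [abs(p - b) for p, (_, b) in zip(predictions, test)]

print(percent_heavy, slope, intercept, predictions, errors)
```

Holding back the second dataset is the important design choice here: a model is judged on data it has not seen, not on the data it was fit to.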
Lastly, the communication of these results to other people in the company or to the public is very important. Data science teams normally have individuals tasked with writing white papers, giving presentations, or writing public articles about their findings. Often, the outcomes of these projects will cycle back to the data acquisition stage to inform future projects and data collection.
Knowing how to code, at least somewhat, is almost always a prerequisite for a data science job. Much of data science is coded in Python, SQL, or R, so these are good languages to be familiar with before applying to data science positions. It also helps to have done at least one project in one of these languages. If you’d like to tackle some neuroscience data, there are many open-access datasets, such as those available through the Allen Institute for Brain Science. Or you can tackle a Kaggle challenge.7 Once you have some code to show off, make yourself a GitHub page and host your code there; this is one of the main ways that employers will evaluate your coding abilities.
As mentioned above, machine learning is really popular in data science, at least at the moment. Fortunately, there are many online resources where you can learn these types of approaches. You’ll rarely have to code up your own machine learning analysis from scratch, because there are already coding packages that do this for you, but you should know the theory behind them and how they work.
Given the sudden need for data scientists, there are also many data science bootcamps that you can apply for. These usually run for a couple of months (full time) and typically cost about $15,000–$20,000, but they pretty much guarantee you a job after graduating. At minimum, they’ll give you the skills you need and connect you with other people who are in data science careers. Some of these bootcamps are online, whereas others (especially in the San Francisco Bay Area or at your local university) are full-time, in-person classes. In addition, there are just a few data science fellowships designed specifically for recent PhD graduates, such as the Data Science Fellowship at Insight.8
Alternatively, some companies offer paid or unpaid internships for new data scientists. Such internships are a clear way to show employers that you know what it’s like to work in an industry setting, and that your expertise can apply to real-world problems.
You don’t need a PhD for a career in data science, but it can give you important training that may be hard to find elsewhere. Being able to ask the right questions of data and be critical of information sources—data maturity, if you will—is something that companies clearly value. According to Kyle Frankovich, a data scientist at Insight:
It’s not just knowing how to program. It’s not just knowing how to apply machine learning. It’s not just knowing how to run stats. I very much value the science aspect of data science, and I think there is this whole thing that you can’t learn through [coding] competitions or online coursework. And that is: you need to know experimental design, you need to know how to ask the right questions, to distrust data sources, data intuition…. The experience of going through a PhD, to be trained to think critically about these things, is difficult—not impossible, but difficult—to replicate.9
If you’re coming from academic research, there are a few things that might be startling about life in industry as a data scientist. You’ll spend more time as a data scientist thinking about stakeholders and product development and less time with your hands on a keyboard. There is a lot of discussion about the costs and benefits of building features before features are actually built.
Often the development of projects will move much faster in industry. In Kyle’s words, “You have a deliverable that needs to be done in two weeks, you work on it until it’s good enough, and then you ship it and move onto the next thing. That kind of cadence is something that many people are not ready for when they transition into industry.”10 For some people, this external pressure to get things done can be really helpful, but others may find it frustrating.
If you are someone who really loves the research side of things and would like to continue to work on a team that feels more academic (e.g., less product driven, doing more open-ended research, going to conferences, etc.), there are data science roles like that. They are rare, but they do exist. Many of those roles can be found at bigger companies that have the time and resources to fund more exploratory projects, but they can exist in smaller companies and startups as well.
Many data scientists are self-made. A formal data science undergraduate degree is a fairly new idea, which means that people in data science come from many different fields, including neuroscience. If you’re interested in coding and communicating data, this could be a great career for you. Here are a few resources if you’d like to learn more:
• A practical overview of analysis skills that are particularly relevant to neuroscience: Eric Nylen and Pascal Wallisch, Neural Data Science: A Primer with MATLAB® and Python (Cambridge, MA: Academic Press, 2017).