“With data collection, ‘the sooner, the better’ is always the best answer.” – Marissa Mayer
When it comes to data science, there are three major skill areas that are blended together.
1. Mathematics expertise
2. Technology and hacking skills
3. Business/strategy acumen
Data science is where they all meet. The central point of mining data and creating a data product is to see the data quantitatively. There are correlations, textures, and dimensions in the data that can be expressed mathematically. Finding a solution through data becomes a brain teaser of quantitative and heuristic technique. For many problems, finding a solution involves building an analytic model that is grounded in hard math, and understanding the mechanics underneath those models is crucial for success.
There is also a big misconception that data science deals only with statistics. While statistics plays an important role, it’s not the only math that is used. There are two main branches of statistics: Bayesian and classical. When people talk about statistics, they most often mean classical statistics, but understanding both is extremely helpful.
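To make the difference between the two branches concrete, here is a minimal sketch in Python that estimates a success rate both ways. The function names and the choice of a Beta prior are assumptions made for this illustration, not something prescribed by the text.

```python
def classical_estimate(successes, trials):
    """Classical (frequentist) estimate: the observed proportion."""
    return successes / trials


def bayesian_estimate(successes, trials, prior_a=1.0, prior_b=1.0):
    """Bayesian estimate: posterior mean under an assumed Beta(prior_a, prior_b) prior.

    With the default Beta(1, 1) prior (uniform over rates), the posterior is
    Beta(successes + 1, failures + 1), whose mean is computed below.
    """
    return (successes + prior_a) / (trials + prior_a + prior_b)


# Hypothetical data: 3 successes observed in only 3 trials.
classical = classical_estimate(3, 3)
bayesian = bayesian_estimate(3, 3)
```

With only three observations, the classical estimate is 1.0, while the Bayesian estimate is pulled toward the prior and comes out at 0.8. As more data arrives, the two estimates move closer together.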
As you move into machine learning algorithms and inferential techniques, you will lean heavily on linear algebra. For example, a popular way to find hidden characteristics within a set of data is singular value decomposition (SVD), which is grounded in matrix math and has much less to do with classical statistics. Overall, it’s extremely helpful for a data scientist to have a good understanding of mathematics across all of these areas.
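As a rough illustration of the matrix math behind singular value decomposition (SVD), the sketch below approximates only the largest singular value and its vectors using power iteration, written in plain Python. The function name, the iteration count, and the sample data are assumptions for this example. Power iteration is a simplified stand-in here, and in practice you would normally call a library routine such as `numpy.linalg.svd`.

```python
import math


def top_singular_triplet(matrix, iterations=100):
    """Approximate the largest singular value and its left/right vectors.

    Uses power iteration on A^T A. This is a teaching sketch, not a full SVD.
    """
    rows, cols = len(matrix), len(matrix[0])

    def times_a(vec):
        # Computes A * vec.
        return [sum(matrix[i][j] * vec[j] for j in range(cols)) for i in range(rows)]

    def times_a_transposed(vec):
        # Computes A^T * vec.
        return [sum(matrix[i][j] * vec[i] for i in range(rows)) for j in range(cols)]

    def length(vec):
        return math.sqrt(sum(x * x for x in vec))

    # Start from a uniform unit vector as the initial guess.
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        w = times_a_transposed(times_a(v))
        norm = length(w)
        if norm == 0.0:
            break  # the guess is mapped to zero, so stop iterating
        v = [x / norm for x in w]

    u = times_a(v)
    sigma = length(u)
    if sigma > 0.0:
        u = [x / sigma for x in u]
    return sigma, u, v


# Hypothetical data: every row is a multiple of one hidden pattern, [1, 2, 3].
scores = [
    [2.0, 4.0, 6.0],
    [1.0, 2.0, 3.0],
    [3.0, 6.0, 9.0],
]
sigma, u, v = top_singular_triplet(scores)
```

Here `v` recovers the hidden pattern shared by every row (proportional to 1 : 2 : 3), and `u` shows how strongly each row expresses it (proportional to 2 : 1 : 3). That is the sense in which the decomposition surfaces hidden characteristics in the data.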
When it comes to hacking, we’re not talking about breaking into other people’s computers. We are talking about the programmer subculture that is known as hacking. This is the ingenuity and creativity of using technical skills to create things and to discover new solutions to old problems.
Why is the skill of hacking important? Data scientists use technology to gather large amounts of data and then work with complex algorithms to understand it. This requires tools that tend to be more sophisticated than a spreadsheet. Data scientists must understand how to code, prototype fast solutions, and integrate complex data systems. Some of the most common languages associated with data science are SAS, R, Python, and SQL. Some of the less commonly used ones are Julia, Java, and Scala. But it’s not just about having a good understanding of language fundamentals. A good hacker can creatively work their way through different types of challenges to make their code work.
A data science hacker is therefore great at algorithmic thinking: they can break down tough problems and rework them so that they are solvable. This is crucial because they must work through a lot of algorithmic problems. They need a strong mental grasp of high-dimensional data and tricky data control flows, and they must be fully clear on how all of the pieces work together to create a cohesive solution.
It’s also important that a data scientist is a tactical business consultant. Since data scientists work closely with data, they can learn things from it that other people can’t. This makes them responsible for translating their observations into shared knowledge and for sharing their strategy on how they think the problem should be solved. A data scientist needs to be able to tell a clear story. They shouldn’t just throw out data. It needs to be presented as a cohesive discussion of a problem and its solution that uses data insights as its basis.
Business acumen plays just as important a role as acumen for algorithms and tech. There must be a clear match between business goals and data science projects. In the end, the value won’t come from the tech, data, and math themselves. It will come from leveraging all of this into valuable results for the business.
Let’s break down the prerequisites for data scientists a bit further.