
Conclusion

You have officially made it through this comprehensive guide to data science. There was a lot of information in this book, and I hope that it will help you in a career in data science. The best way to proceed is to read back through this book and practice all of the different guides. Focus on the parts that you find more difficult until they are no longer difficult.

The goal of data science is to help businesses improve their decision-making by basing decisions on insights that have been extracted from large sets of data. Data science as a field encompasses a set of principles, algorithms, problem definitions, and processes for finding useful patterns in large sets of data.

Today, data-driven decision-making is used in almost every area of modern society. Some of the ways that data science could affect your daily life include figuring out which ads are presented to you online; which friend, movie, and book connections you are shown; which emails end up in your spam folder; what offers you get when you renew your cell service; how much your health insurance premiums cost you; the timing and sequencing of traffic lights; how the drugs you could take were designed; and the locations that your city’s police are targeting.

The growth of the data science industry across society has been driven by social media and big data, the increase in computing power, the huge reduction in the cost of computer memory, and the creation of powerful methods for data modeling and analysis. All of these factors together mean that it has never been easier for businesses to gather, store, and process data. Along with this, the innovations in data science and its wider applications mean that the ethics of using data and individual privacy are an even bigger problem.

When it comes to coding, the best thing you can do is code. Try out all of the different coding languages and get familiar with them all just in case you have to use them. The most commonly used language is Python, so take extra care in learning it.

As you have probably figured out, there is a pretty big debate over which programming language a data scientist should use, especially when you are just learning about data science. A lot of people think that the programming language R is the best, which is wrong. A few even think that Scala or Java is the best, and they are also wrong. Python, in my opinion, is the obvious best option.

If you’re still not convinced, let’s review what makes Python the best language for learning and doing data science:

Python may not be the favorite programming language among all data scientists, but it’s the best to start with. Data scientists may find other programming languages more pleasant, better-designed, or more fun to code with, and yet they still end up using Python when they start a new data science project.

Similarly, the best way to make sure that you understand mathematics is by practicing math. Data science and this book are not inherently math-based, so you may not have to worry about “doing math.” That said, you can’t go into a data science career without knowing something about math, especially linear algebra, statistics, and probability. This means that, where it is appropriate, you need to really dive into the mathematical equations, axioms, and intuition. Try not to let the math involved scare you away from data science, because it really is only a small part of it.

While math and statistics may seem boring, they really are amazing tools, especially when it comes to data science. Statistics can be used to help explain things from the idiocy of participating in the lottery to DNA testing. Statistics can be used to identify the factors that are associated with things like heart disease and cancer. They can also help people spot cheating on standardized tests. Statistics can also be used to help you win game shows.
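To make the lottery point concrete, here is a minimal sketch of the arithmetic. It assumes a hypothetical 6-of-49 game; the ticket price and the jackpot are made-up illustration values, not figures from any real lottery, and smaller prizes, taxes, and shared jackpots are ignored.

```python
from math import comb

# Hypothetical 6-of-49 lottery. The ticket price and jackpot below are
# made-up illustration values, not figures from a real game.
NUMBERS_IN_POOL = 49
NUMBERS_PICKED = 6
TICKET_PRICE = 2.00
JACKPOT = 5_000_000.00


def jackpot_odds(pool: int, picked: int) -> int:
    """Count the equally likely tickets, i.e. the N in '1 in N'."""
    return comb(pool, picked)


def expected_net_value(jackpot: float, price: float, odds: int) -> float:
    """Average gain (or loss) per ticket when only the jackpot is counted."""
    return jackpot / odds - price


odds = jackpot_odds(NUMBERS_IN_POOL, NUMBERS_PICKED)
value = expected_net_value(JACKPOT, TICKET_PRICE, odds)

print(f"Chance of winning: 1 in {odds:,}")  # 1 in 13,983,816
print(f"Expected net value per ticket: {value:.2f}")
```

Under these assumed numbers the expected net value is negative, which means that on average every ticket loses money. Changing the assumed jackpot or ticket price changes the size of the loss, but the jackpot would have to be far larger than the price times the odds before the sign flips.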

Besides learning the information found in this book, it’s important that you understand where data science is heading. Jobs in data science have grown from 2.3 million in 2015 to 2.9 million in 2018.

The combination of cheap and fast computation with statistical methods has made possible lots of new methods, such as machine learning. This doesn’t even take into account the cheaper and more reliable ways we have to store data today, which mean that we’re storing even more of it. This is why businesses want to find statisticians who can code or programmers who understand stats, and why there is a demand for new tools to help store and process data.

While the tools used for the job could change in the future, the need for these types of people isn’t going anywhere. Python could be a distant memory in ten years (I doubt it), but that’s just how programming languages work. Somebody will come up with something more efficient, and the data scientist will have to learn how to use it.

In the future, we will likely see new sources of data. While the most common datasets today normally include clickstream data or sales and purchase data, more data scientists will start asking for sensor-generated data from vehicles, retail environments, manufacturing lines, and offices.

There will also be new tools that make this work easier to do. This can already be seen in BI tools and in the open source libraries of the Python and R communities. Algorithms that you would have had to code from scratch ten years ago became available through a single line such as “from sklearn.neighbors import LSHForest” (LSHForest itself has since been removed from scikit-learn, but the point stands).

Lastly, we will see quantitative methods and data science become more distributed throughout several roles instead of only being concentrated in a single department or role.

I hope that this book has given you a sense that playing around with data can be fun, because it actually is.