Introduction

Data is all around us, in everything that we do. In a sense, data science is what makes human beings what they are today. I’m not talking about the computer-driven data science that this book will introduce you to, but about our brain’s ability to see connections, learn from previous experience, and draw conclusions from facts. This is truer for humans than for any other species that has lived on the planet. We depend on our brains to survive, and we have used these abilities to earn our spot in nature. This strategy has worked for us for thousands of years, and I doubt we will change it any time soon.

But the brain can only take us so far when we are faced with raw computing. Humans can’t keep up with all of the data that we are now able to capture. So we turn to machines to do some of the work: to notice patterns, find connections, and answer many different questions.

Our constant quest for knowledge is ingrained in our genes. Using computers to do some of the work for us is not, but it is where we are destined to go.

Welcome to the amazing world of data science. While you were looking over the table of contents, you may have noticed the wide variety of topics covered in this book. The goal of Data Science from Scratch is to give you enough information about each area of data science to help you get started. Data science is a big field, so big that it would take thousands of pages to cover every part of it.

Each chapter covers a different aspect of data science.

I sincerely hope that the information in this book will act as a doorway for you into the amazing world of data science.

Roadmap

Chapter one will give you a basic rundown of what data science is. It will go into the importance, the history, and the reasons data science matters so much.

Chapter two will go into everything that you need for data science. This will include the work ethics that are needed to make sure you are successful.

Chapter three will cover the advantages of data science. You will see the reason why so many people love data science.

Chapter four will cover how data science differs from big data, and how the two work together.

Chapter five will go into what a data scientist is and what they do. It will also cover the skills that a person needs to be a good data scientist. It’s important for a data scientist to be inquisitive, ask questions, and make new discoveries.

Chapter six will go into the reasons why a data scientist should be familiar with hacking.

Chapter seven will cover why data scientists need to know how to code. You will also learn about the most common programming languages that data scientists use.

Chapter eight will talk about how a data scientist works with data, such as munging, cleaning, manipulating, and rescaling.

Chapter nine will go in depth about why the Python programming language is so important for a data scientist.

Chapter ten will look at the differences and similarities between data science, analytics, and machine learning.

Chapter eleven will teach you how to use linear algebra for data science.

Chapter twelve will go into the importance and use of statistics for data science.

Chapter thirteen will explain what decision trees are and how to use them.

Chapter fourteen will explain what neural networks are and the way they are used.

Chapter fifteen will go into the different scalable data processing frameworks and paradigms, such as Hadoop.

Chapter sixteen will cover applications of data science, such as process management, marketing, and supply chain management.

Code

Apart from the sections in chapter seven where we look at a few other programming languages, all of the code in this book is written in Python. Python has become a well-respected and widely used language among data scientists, and it is one of the most common languages in the field.

Whenever code appears in this book, it will be written in italics and will start and end with quotes. Do not type the quotes at the beginning and end when you write your own code; type only the italicized code. All of the code will be explained so that you aren’t confused about what it is supposed to do or how it should be used.
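
To make this convention concrete, here is a small sketch. The statements and the scores list are made up for this illustration and are not taken from a later chapter. If the text showed the line "average = sum(scores) / len(scores)" in italics, you would type only what sits between the quotes:

```python
# Hypothetical sample data, invented for this illustration only.
scores = [2, 4, 6]

# Typed exactly as it would appear between the quotes in the text,
# without the surrounding quotes themselves.
average = sum(scores) / len(scores)

# Show the result so you can check that the line ran.
print(average)
```

Typing the surrounding quotes as well would turn the whole line into a piece of text instead of an instruction, so Python would not carry out the calculation.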

As you dive deeper into data science, you will find that there are lots of libraries, toolkits, modules, and frameworks that efficiently implement the most common, and some of the least common, data science techniques and algorithms. If you do end up becoming a data scientist, you will more than likely become intimately familiar with NumPy, pandas, scikit-learn, and many other libraries. These are all great tools for data science, but they also let people who know nothing about data science get started.

This book approaches the world of data science from scratch. This means that we will start on the ground floor and work our way up, so that you come to understand its many aspects.

It’s now time to get started on that journey. Make sure you are ready. It may even help to read the book through once, and then read it through again while working along with it. This will ensure that you fully understand what you’re doing and are not just blindly following along.