Chapter 6
In This Chapter
Finding the tools to learn
Teaching yourself what you need to know
Maybe you’ve already been in the IT workforce for a while — and you have the scars to prove it. You feel the waters of change churning. You’re thinking it may be time for a career change, and you hear big data calling your name. But you don’t have the time or money for extra schooling. What you know is that big data is all around you, and you have the aptitude to perform well in a big data role. If this sounds familiar, this chapter is for you!
If you’re going to chart your own path to a big data education, you should have an end goal in mind. Are you working today with business intelligence tools and want to become the next whiz in visualization analytics? Maybe you’re a Java programmer and you want to retool your skills around Hadoop, R, and Cassandra so you can apply for a big data programmer job.
This chapter helps you think about how to identify the technical gaps you may have and build a plan to fill them. It isn’t a cookbook, giving you the exact steps you should follow in order to magically become the perfect candidate for a big data job. That’s because there isn’t a specific set of steps that are guaranteed to get you where you want to go — you’re building a career, not a soufflé.
Changing careers requires a certain amount of risk taking, self-motivation, and not being afraid of what you don’t know. If you already have a background in programming, database management, or business intelligence, learning some new tools is a logical next step. If you have no background in software development, data, or math, your self-education may take more perseverance. But it’s never too late to make a change, so if big data is your goal, stick with it!
There are a host of resources — both formal and informal — that you can sink your teeth into if you want to educate yourself in preparation for a big data job. In this section, I cover the kinds of resources that are available, how you can access them, and what you can expect from them. (Appendix A offers some specific resources worth checking out.)
The first place to start is a book. Many books can guide you from theory to hands-on examples. The benefit of a book is that it can often serve as a desktop reference after you’ve become comfortable with a topic. Which book you begin with depends on the type of role you’re looking for and the level of exposure you’ve already had to technology.
If your current background is as a business analyst or marketing analyst, you’ll want to begin your research with foundational topics like the high-level concepts of big data. Chapter 3 covers the Four V’s, but you can find entire books dedicated to volume, variety, veracity, and velocity. You’ll also want to continue with books on big data use cases so that you can see how big data applies to your industry. From there, you’ll be able to understand common analytics tools, which you’re likely familiar with. Chapter 7 and Appendix A cover many of these tools, which are used to tease out insights from structured and unstructured data.
If you’re experienced in programming or database technologies, you’ll want to read up on common modeling tools, languages, and scripting engines. Again, Chapter 7 and Appendix A give an overview of these resources. Make sure you understand common big data use cases, but from an implementation perspective rather than from a pure business standpoint.
If you’ve been a database administrator, take the time to explore the new data models of unstructured data. Find out how to pragmatically model, access, and integrate the new data models with traditional relational database systems. If you haven’t been involved in data warehouse projects, keep a desk reference on those concepts close at hand. You’ll have to bring together traditional relational database models, denormalized data warehouse concepts, and unstructured data to provide the backbone of a big data project.
Some people learn by reading; others learn by doing. Online tutorials are extremely impactful — and plentiful — resources for getting started in a technology. There are two main mediums for online tutorials:
The online community is extremely robust, especially for programmers and data analysts. Since the advent of the Internet, the culture of open collaboration has grown from a simple sharing of ideas to full-blown co-development. Co-development is more than just sharing ideas on how to solve problems — it’s a community of people who work together to jointly develop software, usually under a collaborative, open-source license. Appendix A covers some of these open-source resources.
Online communities are great for people wanting to learn new technologies and concepts. Not only will you be able to get help solving any problems you may have, but you’ll also be able to connect with others who have solved similar problems or are on the same journey. You can even use crowdsourcing to co-develop your ideas into a full-blown project, or participate in someone else’s project yourself.
If you feel trapped in the tragic cycle of “How do I get experience if no one will give me a chance?”, you can participate today by contributing code, ideas, or testing to a host of open-source big data projects.
This culture of open knowledge sharing is an amazing tool for advancing all technologies, for both commercial use and public or free use. Oracle is a great example of this. Oracle claims to be the world’s largest enterprise software company, and the foundation of its massive revenue and profits is the Oracle Database, which is not an open-source platform by any means. What many people don’t know is that Oracle has been, and continues to be, a key contributor and tester for core Linux libraries and functions like Libstdc++ and CRFS. If the community can collaborate to move technology forward, both enterprise and the public benefit.
Can anyone test this code? The answer is yes. With open-source software, the source code is released to the public, so anyone can test it for bugs or gaps. The more eyes on it, the better it becomes.
There are several types of online communities to check out:
Here are some open-source development communities worth checking out:
Membership is open to the public but is tightly controlled by the organizations. They still have the same characteristics as smaller open-source projects found on GitHub, for example, but with a greater degree of support, documentation, and active discussion.
You can’t control the climate of your workplace, but you can control how proactive you are. And the good news is, you’re more important than your workplace. If you have the right attitude and approach, you’re sure to find opportunities — maybe in places where you didn’t even know they existed.
In this section, I explain how to build a learning pattern that covers some specific technologies, places to learn them, and a sandbox to practice in. (A sandbox is a place to run a test.) This section is just a sample pattern for learning core technologies and where to use them. You can start educating yourself in any area you’ve determined is a gap for you. The important thing is to find a project. There is nothing better than a goal to motivate you. So, pick a problem to solve. Figure out what gaps you have in technology and build a plan to fill the gaps.
This use case is an example of how you can teach yourself big data skills by attacking a project with personal spending data. Suppose you want to do some big data analytics on your personal spending during the past three years to see if there are any predictive indicators or interesting insights you can learn by mashing up personal spending with the weather patterns in your city. Maybe you want to get a little more interesting and see if there are any correlations with your Facebook activity, status changes, or friends posting travel pictures. Who knows? That’s the point: Try to find some interesting patterns. Right now, you don’t know if any patterns exist.
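To make the mash-up idea concrete, here is a minimal Python sketch of one way it might look. It assumes you have exported your spending and your city’s daily weather as CSV files; the column names (`date`, `amount`, `high_temp_f`), the sample rows, and the function names are all made up for illustration, and real exports will look different. The sketch totals spending per day, lines it up with the temperature for the same day, and computes a simple Pearson correlation.

```python
import csv
import io
from math import sqrt

# Hypothetical sample data standing in for real bank and weather exports.
SPENDING_CSV = """date,amount
2014-01-03,42.50
2014-01-03,10.00
2014-01-04,120.00
2014-01-05,15.25
2014-01-06,80.00
"""

WEATHER_CSV = """date,high_temp_f
2014-01-03,28
2014-01-04,35
2014-01-05,22
2014-01-06,40
"""


def daily_totals(csv_text):
    """Sum the spending amounts for each date."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["date"]] = totals.get(row["date"], 0.0) + float(row["amount"])
    return totals


def daily_temps(csv_text):
    """Map each date to its high temperature."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["date"]: float(row["high_temp_f"]) for row in reader}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spending_vs_temperature(spending_csv, weather_csv):
    """Correlate daily spending with daily temperature on shared dates."""
    spend = daily_totals(spending_csv)
    temps = daily_temps(weather_csv)
    shared = sorted(set(spend) & set(temps))  # only dates present in both
    return pearson([temps[d] for d in shared], [spend[d] for d in shared])


if __name__ == "__main__":
    r = spending_vs_temperature(SPENDING_CSV, WEATHER_CSV)
    print(f"Correlation between temperature and spending: {r:.2f}")
```

A correlation on a handful of days proves nothing, of course. The point of a sketch like this is the practice you get loading, joining, and summarizing data, not the number it prints.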
The overall analytical goal is important only insofar as it gives you motivation and an objective. Success isn’t based on whether you get any new insights; instead, you succeed when you’ve taken a step closer to learning the skills you need to land the job you want. If your goal is business analytics, tweak this project to focus on the analytics side and run analysis on an existing and publicly available dataset. If you’re trying to learn a particular programming language or data access tool, focus less on the business case and more on the tactical.
Although the following steps focus on learning new programming languages or analytics tools, this is only one pattern for learning, a best practice to help you build your own education plan. This example is not meant to be a tutorial in Python or Tableau. You can easily find more info on particular programs by using the resources mentioned earlier in the chapter.
Take time to journal your experiences — what’s working, what’s frustrating, and what you hope to accomplish while you’re learning. Writing is thinking. Looking back on your writing is an invaluable tool for self-evaluation.
Spend some time defining what technology skills you need to accomplish the project. Using the goals from the preceding section (yours may be different), you’ll be able to figure this out through your research on Python, reading forums, and just trying. If you don’t know what you don’t know, that’s okay. You’ll hit a bump and figure it out.
For this particular project, you’ll need to know
Determine what you don’t know and estimate how much effort it will take to fill in those gaps. Make some notes on where you think you should go for help. You don’t need to look into a crystal ball and try to divine what you don’t know. You’re just estimating the work effort to learn things. You already know at this point that Python is a gap for you. Do your best to estimate how much effort you think it will be to learn it.
Start executing your basic learning plan and go make it happen. For this project, you’ll start off getting basic Python skills. When you’re comfortable with Python, you’ll start making API calls to Facebook to understand how to access data and status changes. At this point, you may feel ready to do some more interesting work, like grabbing picture posts from specific dates that correlate with large credit card purchases you’ve made. You can build your skills along the way until you reach your goals.
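As a rough illustration of that last step, the following Python sketch matches large purchases against picture posts made around the same dates. It deliberately leaves out the Facebook API calls and works on small hard-coded lists instead; the sample data, the function name `posts_near_large_purchases`, and the thresholds (`min_amount`, `window_days`) are assumptions chosen purely for the example.

```python
from datetime import date, timedelta

# Hypothetical sample data: (purchase date, amount) pairs and picture-post dates.
purchases = [
    (date(2014, 3, 1), 25.00),
    (date(2014, 3, 8), 640.00),
    (date(2014, 4, 2), 410.00),
]
picture_posts = [date(2014, 3, 7), date(2014, 3, 20), date(2014, 4, 2)]


def posts_near_large_purchases(purchases, posts, min_amount=300.0, window_days=2):
    """Return (date, amount, nearby_posts) for each large purchase
    that has at least one post within window_days of it."""
    window = timedelta(days=window_days)
    matches = []
    for when, amount in purchases:
        if amount < min_amount:
            continue  # skip purchases below the "large" threshold
        nearby = [p for p in posts if abs(p - when) <= window]
        if nearby:
            matches.append((when, amount, nearby))
    return matches


if __name__ == "__main__":
    for when, amount, nearby in posts_near_large_purchases(purchases, picture_posts):
        print(when, amount, [p.isoformat() for p in nearby])
```

Once a version like this works on made-up data, swapping in real data from your card statement and your own posts becomes a separate, smaller learning step.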
The retrospective is perhaps the most important step in the whole process. The retrospective (borrowed from the Agile software development method, which is a process to do work in small iterative chunks) is a time to look back on the process. Simply put, a retrospective is the exercise of looking back at the endeavor for the purpose of improving future performance. You look not only at what went wrong, but also at what went right, because you want to repeat those successes.
Begin to evaluate whether your endeavor was successful so that you can improve your performance. Here’s where it gets fun! You’re not looking for a binary answer — you don’t just mark down a success or a failure and you’re done. When teaching students about project success and failure, I tell them to evaluate project success through two lenses: process and outcome. You can evaluate whether your process was successful and whether your outcome was successful.
Process and outcome each have three levers, and each lever affects success. The three process levers are
Here are some questions you can ask to illuminate this:
The three outcome levers are
Probing questions for outcome would include
Look at each of the six levers of success and determine how successful you think you are. Doing this type of activity allows you to reflect on your process and outcome with the purpose of learning what went well and what can be improved. By breaking it down into the six levers of process and outcome, you can home in specifically on what to improve next time rather than just going on your gut. If you’ve been keeping good notes through the process, you’ll be able to do this with relative consistency.
There are four possible states:
Here’s a sample of a process/outcome evaluation. The more you write, the more you’ll learn. I keep it simple here for the sake of illustration.