Chapter 6

Making Your Own Way (For the Experienced Professional)

In This Chapter

  • Finding the tools to learn
  • Teaching yourself what you need to know

Maybe you’ve already been in the IT workforce for a while — and you have the scars to prove it. You feel the waters of change churning. You’re thinking it may be time for a career change, and you hear big data calling your name. But you don’t have the time or money for extra schooling. What you know is that big data is all around you, and you have the aptitude to perform well in a big data role. If this sounds familiar, this chapter is for you!

If you’re going to chart your own path to a big data education, you should have an end goal in mind. Are you working today with business intelligence tools and want to become the next whiz in visualization analytics? Maybe you’re a Java programmer and you want to retool your skills around Hadoop, R, and Cassandra so you can apply for a big data programmer job.

This chapter helps you think about how to identify the technical gaps you may have and build a plan to fill them. It isn’t a cookbook, giving you the exact steps you should follow in order to magically become the perfect candidate for a big data job. That’s because there isn’t a specific set of steps that are guaranteed to get you where you want to go — you’re building a career, not a soufflé.

Changing careers requires a certain amount of risk taking, self-motivation, and not being afraid of what you don’t know. If you already have a background in programming, database management, or business intelligence, learning some new tools is a logical next step. If you have no background in software development, data, or math, your self-education may take more perseverance. But it’s never too late to make a change, so if big data is your goal, stick with it!

Learning on Your Own Time

There are a host of resources — both formal and informal — that you can sink your teeth into if you want to educate yourself in preparation for a big data job. In this section, I cover the kinds of resources that are available, how you can access them, and what you can expect from them. (Appendix A offers some specific resources worth checking out.)

Hitting the books

The first place to start is with a book. Many books can guide you from theory to hands-on examples. The benefit of a book is that it can often serve as a desktop reference after you’ve become comfortable with a topic. Which book you begin with depends on the type of role you’re looking for and how much exposure you’ve already had to the technology.

For the business or marketing analyst

If your current background is as a business analyst or marketing analyst, begin with foundational topics such as the high-level concepts of big data. Chapter 3 covers the Four V’s, but you can find entire books dedicated to volume, variety, veracity, and velocity. Continue with books on big data use cases so that you can see how big data applies to your industry. From there, move on to common analytics tools, some of which you’re likely already familiar with. Chapter 7 and Appendix A cover many of these tools, which are used to tease out insights from structured and unstructured data.

For the programmer

If you’re experienced in programming or database technologies, you’ll want to read up on common modeling tools, languages, and scripting engines. Again, Chapter 7 and Appendix A give an overview of these resources. Make sure you understand common big data use cases, but from an implementation perspective rather than from a pure business standpoint.

For the database administrator

If you’ve been a database administrator, take the time to explore the new data models of unstructured data. Find out how to pragmatically model, access, and integrate the new data models with traditional relational database systems. If you haven’t been involved in data warehouse projects, keep a desk reference on those concepts handy. You’ll have to bring together traditional relational database models, denormalized data warehouse concepts, and unstructured data to provide the backbone of a big data project.

Online tutorials

Some people learn by reading; others learn by doing. Online tutorials are effective, plentiful resources for getting started with a technology. They come in two main formats:

  • Step-by-step guides: A step-by-step guide on the web carefully takes you through hands-on examples. It’s very similar to working through a book, but it can be easier if you have large chunks of code to input and build. Just the other day, my son started teaching himself Python from a book. He spent a couple of hours entering several hundred lines of code, only to be frustrated by having to correct syntax errors from his transcription. Although there is a lot of value in coding things by hand from the ground up, it can also slow you down, especially if you’re already familiar with common coding constructs, logic, and data access methods. My son was happy (and frustrated) when he discovered that his book had online resources and guided tutorials that let him cut and paste the code into his editor.
  • Instructional videos: Videos posted on YouTube or other video-sharing sites can provide a level of detail that some books and online guides cannot provide because you have the chance to actually see someone doing something in real time.

Online communities

The online community is extremely robust, especially for programmers and data analysts. Since the advent of the Internet, the culture of open collaboration has grown from a simple sharing of ideas to full-blown co-development. Co-development is more than just sharing ideas on how to solve problems — it’s a community of people who work together to jointly develop software, usually under a collaborative, open-source license. Appendix A covers some of these open-source resources.

Online communities are great for people wanting to learn new technologies and concepts. Not only will you be able to get help solving any problems you may have, but you’ll also be able to connect with others who have solved similar problems or are on the same journey. You can even use crowdsourcing to co-develop one of your ideas into a full-blown project, or participate in someone else’s project yourself.

Crowdsourcing (using a community’s ideas to develop a great idea) usually draws on an online community rather than the employees of a company, but it can be used for commercial purposes. My Starbucks Idea (www.mystarbucksidea.com) is an open community designed to solicit great ideas from customers that Starbucks may turn into a product or service someday. Crowdsourcing has gained in popularity for two main reasons:

  • Companies found that utilizing outsourced talent who didn’t necessarily expect financial compensation was cost-effective.
  • Companies found that by allowing experts from around the world to solve problems, they could get better and more diverse solutions.

If you feel trapped in the tragic cycle of “How do I get experience if no one will give me a chance?”, you can participate today by contributing code, ideas, or testing to a host of open-source big data projects.

This culture of open knowledge sharing is an amazing tool for advancing all technologies, for both commercial and public or free use. Oracle is a great example of this. Oracle claims to be the world’s largest enterprise software company, and the foundation of its massive revenue and profits is the Oracle Database, which is not an open-source platform by any means. What many people don’t know is that Oracle has been and continues to be a key contributor and tester for core Linux libraries and functions like libstdc++ and CRFS. If the community can collaborate to move technology forward, both enterprise and the public benefit.

Can anyone test this code? The answer is yes. With open-source software, the source code is published for anyone to read, so the public can look for and test any bugs or gaps. The more eyes on it, the better it becomes.

There are several types of online communities to check out:

  • Boards and forums: If you’ve been programming or using business intelligence tools for more than two years, you’re likely familiar with online forums, the most common and easiest-to-access communities. Forums are where people post questions, code, and errors on specific topics, and the community of other readers responds. Sometimes a forum is moderated by a leader, and sometimes solely by other readers. Conversations are archived so that future users can explore problems and solutions. When you start to query the Internet with your code errors or problems, you tend to migrate back to the communities that are most active and post helpful responses quickly.
  • Internet Relay Chat (IRC): IRC servers are simply chat servers that transmit text messages back and forth. Although IRC usage has declined during the past several years, there are still more than 500,000 active IRC channels. They’re a great way to get connected with a community of users in real time. For Hadoop, the IRC channel is #hadoop, hosted at http://irc.freenode.net.
  • Open-source development communities: These are hosted communities on the Internet categorized by some sort of open-source project. Here someone — either a single individual or a group — posts source code for some software application, and then people contribute to all the coding and testing of that application. End-users of these applications are freely able to download the source code and can do with it what they want, within the confines of the open-source license. This is a wonderful way to get quickly plugged into a community as a project contributor or a tester. You may even want to offer up your own project to the community at some point.

    Here are some open-source development communities worth checking out:

    • GitHub (http://github.com): An online repository that facilitates collaboration among programmers. Projects can be both public and private.
    • Google Developers (http://developers.google.com): An online repository for projects based on Google applications.
    • SourceForge (http://sourceforge.net): An online repository for storing open and free software projects.
  • Software foundations: Software foundations are usually formal nonprofit organizations that started as simple open-source projects that matured over time because of widespread adoption. Classic examples of these are the following:
    • PHP (http://php.net): Home to one of the predominant web programming languages.
    • The Apache Software Foundation (http://apache.org): Hosts all the Apache open-source projects, including the Hadoop framework.
    • Python (http://python.org): Home to Python, a widely used scripting language. Many big data projects are implemented using Python.

    Membership is open to the public but is tightly controlled by the organizations. These foundations still have the same characteristics as smaller open-source projects found on GitHub, for example, but with a greater degree of support, documentation, and active discussion.

On-the-job training

The best way to get a high-value education is to be proactive and find projects you can learn from where you work. For many people, finding their way into projects outside their specific area of expertise can be challenging in the workplace. Two factors affect your ability to find on-the-job training:

  • The climate of the workplace: Does leadership encourage innovation? Are there avenues to submit new ideas? Can you take training — internal or external — to develop your skills?
  • You: How willing are you to go outside your comfort zone to do what it takes to achieve your goals? Are you willing to put in the time required to learn new things? Are you willing to leverage your network of colleagues and friends to get involved in new projects while maintaining your current workload?

You can’t control the climate of your workplace, but you can control how proactive you are. And the good news is, you’re more important than your workplace. If you have the right attitude and approach, you’re sure to find opportunities — maybe in places where you didn’t even know they existed.

Identify a project at work that you can volunteer for; it may even lead to a new position. Or perhaps you can develop something so great that it gets you recognized within your company.

Building Your Own Big Data Test Lab

In this section, I lay out a pattern for learning some specific technologies: which ones to pick, where to learn them, and how to set up a sandbox to practice in. (A sandbox is a place where you can experiment and run tests.) This is just a sample pattern for learning core technologies and where to use them. You can start educating yourself in any area you’ve determined is a gap for you. The important thing is to find a project. There is nothing better than a goal to motivate you. So, pick a problem to solve. Figure out what gaps you have in technology and build a plan to fill the gaps.

The best way to learn is to do.

This use case is an example of how you can teach yourself big data skills by attacking a project with personal spending data. Suppose you want to do some big data analytics on your personal spending during the past three years to see if there are any predictive indicators or interesting insights you can learn by mashing up personal spending with the weather patterns in your city. Maybe you want to get a little more interesting and see if there are any correlations with your Facebook activity, status changes, or friends posting travel pictures. Who knows? That’s the point: Try to find some interesting patterns. Right now, you don’t know if any patterns exist.
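As a rough sketch of what a first mashup might look like, the following Python joins daily spending totals with daily high temperatures and computes a simple correlation. The sample figures, the variable names, and the choice of Pearson correlation are all assumptions made for illustration; your own data would come from something like a bank export and a weather archive.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical sample data: (date, amount spent) pairs and daily highs in
# degrees Fahrenheit. Real data would be loaded from files instead.
spending = [
    ("2014-01-03", 42.50),
    ("2014-01-03", 10.00),
    ("2014-01-04", 120.00),
    ("2014-01-05", 15.25),
    ("2014-01-06", 80.00),
]
daily_high_f = {
    "2014-01-03": 30.0,
    "2014-01-04": 55.0,
    "2014-01-05": 28.0,
    "2014-01-06": 48.0,
}


def daily_totals(transactions):
    """Add up all transactions that fall on the same date."""
    totals = defaultdict(float)
    for day, amount in transactions:
        totals[day] += amount
    return dict(totals)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


totals = daily_totals(spending)

# Keep only the dates that appear in both datasets.
shared_dates = sorted(set(totals) & set(daily_high_f))

r = pearson(
    [daily_high_f[day] for day in shared_dates],
    [totals[day] for day in shared_dates],
)
print(f"{len(shared_dates)} shared days, correlation r = {r:.2f}")
```

A value of r near 1 or -1 hints at a pattern worth a closer look, and a value near 0 suggests there isn’t much of one. With only a handful of days, treat any result as a curiosity rather than a finding.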

You’re trying to build some big data skills. At this point, you know enough about the required technologies to plan this project. What you don’t have are the skills to make those technologies work. You’ll learn these skills by doing.

The overall analytical goal is important only insofar as it gives you motivation and an objective. Success isn’t based on whether you get any new insights; instead, you succeed when you’ve taken a step closer to learning the skills you need to land the job you want. If your goal is business analytics, tweak this project to focus on the analytics side and run analysis on an existing and publicly available dataset. If you’re trying to learn a particular programming language or data access tool, focus less on the business case and more on the tactical.

Although the following steps focus on learning new programming languages or analytics tools, this is only one pattern for learning, a best practice to help you build your own education plan. This example is not meant to be a tutorial in Python or Tableau. You can easily find more info on particular programs by using the resources mentioned earlier in the chapter.

To execute this project, you need to create a project notebook. It can be digital (using Evernote, OneNote, or Notepad), or you can just use good old-fashioned paper. Project notebooks serve two very important purposes:

  • They’re pragmatic. Notebooks are the place for you to document ideas, learning plans, technical notes, and results.
  • They’re reviewable. You’ll find huge retrospective value (more on this later) in looking back at your progress.

Take time to journal your experiences — what’s working, what’s frustrating, and what you hope to accomplish while you’re learning. Writing is thinking. Looking back on your writing is an invaluable tool for self-evaluation.

Step 1: Define your goals

Spend some time writing down your goals. Studies show that people who articulate their goals in writing are much more likely to accomplish them. Be specific. Here are some examples:

  • Get comfortable with basic Python.
  • Get hands-on experience with big data using social media data. Learn how to grab data from Facebook with Python.
  • Learn a visualization tool (see Chapter 7) to combine personal spending with Facebook data.
  • Complete this project within two weeks by working during the evenings and on weekends.
  • Spend less than $100.

Step 2: Take a skills inventory

Spend some time defining what technology skills you need to accomplish the project. Using the goals from the preceding section (yours may be different), you’ll be able to figure this out through your research on Python, reading forums, and just trying. If you don’t know what you don’t know, that’s okay. You’ll hit a bump and figure it out.

For this particular project, you’ll need to know

  • Basic Python.
  • Database skills, such as MySQL and Excel as a data source.
  • Tableau, a business intelligence and analytics software program.
  • Facebook application programming interfaces (APIs), which are access points that allow two applications — the one you’re building and Facebook — to communicate with each other.

Step 3: Mind the gap

Determine what you don’t know and estimate how much effort it will take to fill in those gaps. Make some notes on where you think you should go for help. You don’t need to look into a crystal ball and try to divine what you don’t know. You’re just estimating the work effort to learn things. You already know at this point that Python is a gap for you. Do your best to estimate how much effort you think it will be to learn it.
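If you like to keep your notes in code, a gap estimate can be as simple as the sketch below. The required skills come from the Step 2 list; the set of skills you already have and the hour estimates are hypothetical placeholders that you would replace with your own.

```python
# Skills the project needs, taken from the Step 2 list.
required = {"Python", "MySQL", "Excel", "Tableau", "Facebook APIs"}

# Hypothetical: skills this particular learner already has.
known = {"MySQL", "Excel"}

# Hypothetical effort estimates, in hours, for each missing skill.
estimated_hours = {"Python": 20, "Tableau": 8, "Facebook APIs": 12}

# The gap is everything the project requires that you don't already know.
gaps = sorted(required - known)
total_hours = sum(estimated_hours[skill] for skill in gaps)

for skill in gaps:
    print(f"{skill}: about {estimated_hours[skill]} hours")
print(f"Total estimated effort: {total_hours} hours")
```

Compare the total against the time you set aside in your goals. If the two are far apart, adjust one of them now rather than halfway through the project.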

Step 4: Acquire knowledge

Start executing your basic learning plan and go make it happen. For this project, you’ll start off getting basic Python skills. When you’re comfortable with Python, you’ll start making API calls to Facebook to understand how to access data and status changes. At this point, you may feel ready to do some more interesting work, like grabbing picture posts from specific dates that correlate with large credit card purchases you’ve made. You can build your skills along the way until you reach your goals.
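Here is a minimal sketch of the date-matching idea, assuming you have already pulled your purchases and posts into simple Python records. It deliberately leaves out the Facebook API call itself; the record layout, the function name, the dollar threshold, and the three-day window are all hypothetical choices for illustration.

```python
from datetime import date, timedelta

# Hypothetical records. Real ones would come from your card statement
# and from whatever the Facebook API returns for your account.
purchases = [
    {"date": date(2014, 3, 1), "amount": 950.00},
    {"date": date(2014, 3, 10), "amount": 12.40},
    {"date": date(2014, 4, 20), "amount": 400.00},
]
posts = [
    {"date": date(2014, 3, 2), "type": "photo"},
    {"date": date(2014, 3, 10), "type": "status"},
    {"date": date(2014, 4, 30), "type": "photo"},
]


def photos_near_large_purchases(purchases, posts, min_amount=250.0, window_days=3):
    """Pair each large purchase with photo posts made within a few days of it."""
    window = timedelta(days=window_days)
    matches = []
    for purchase in purchases:
        if purchase["amount"] < min_amount:
            continue  # Skip small, everyday purchases.
        for post in posts:
            close_in_time = abs(post["date"] - purchase["date"]) <= window
            if post["type"] == "photo" and close_in_time:
                matches.append((purchase, post))
    return matches


matches = photos_near_large_purchases(purchases, posts)
for purchase, post in matches:
    print(f"${purchase['amount']:.2f} on {purchase['date']} "
          f"matches a photo posted on {post['date']}")
```

Once a version like this works on made-up records, swapping in real data becomes a separate, smaller problem to solve.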

Step 5: Look back

The retrospective is perhaps the most important step in the whole process. The retrospective (borrowed from the Agile software development method, which is a process for doing work in small iterative chunks) is a time to look back on the process. Simply put, a retrospective is the exercise of looking back at the endeavor for the purpose of improving future performance. You look not only at what went wrong, but also at what went right, because you want to repeat those successes.

Do not skip this step. Not having a customer doesn’t mean you can skip something this formal.

Begin to evaluate whether your endeavor was successful so that you can improve your performance. Here’s where it gets fun! You’re not looking for a binary answer — you don’t just mark down a success or a failure and you’re done. When teaching students about project success and failure, I tell them to evaluate project success through two lenses: process and outcome. You can evaluate whether your process was successful and whether your outcome was successful.

You’re trying to get a job, so treat this retrospective part of the exercise the way you would any project you’d do for a client or your boss at work.

Process and outcome both have three levers, each of which impacts success. The three process levers are

  • Tools: Were the tools of learning effective for you?
  • Time: Did you accomplish this project in the time you expected?
  • Effort: Was your level of effort to learn greater or lesser than what you anticipated? This isn’t a measure of how hard something was to learn, but whether the difficulty met your expectations.

Here are some questions you can ask to illuminate this:

  • Were you able to easily find resources?
  • Were those resources easy for you to comprehend?
  • Did you make the time to learn?
  • Did you meet your time goals?
  • Did you effectively use tutorials to learn new skills?

The three outcome levers are

  • Knowledge: Do you now possess the knowledge you set out to learn?
  • Learning objectives: Did you accomplish your learning goals? For example, did you get hands-on experience with Python?
  • Value: Is what you learned relevant to finding a job?

Probing questions for outcome would include

  • Did I learn the programming languages and software that I needed to (for example, Python)?
  • Can I do this again with a different set of data with less effort?
  • If I talked to my boss about this project, would she let me try something at work?

Look at each of the six levers of success and determine how successful you think you are. Doing this type of activity allows you to reflect on your process and outcome with the purpose of learning what went well and what can be improved. By breaking it down into the six levers of process and outcome, you can home in specifically on what to improve next time rather than just going on your gut. If you’ve been keeping good notes through the process, you’ll be able to do this with relative consistency.

There are four possible states:

  • Success/Success: Both process and outcome were successful.
  • Success/Failure: The process was deemed successful, but the outcome was a failure.
  • Failure/Success: The process was deemed a failure, but the outcome was successful.
  • Failure/Failure: Both process and outcome were failures.

Here’s a sample of a process/outcome evaluation. The more you write, the more you’ll learn. I keep it simple here for the sake of illustration.

  • Process
    • Tools: Success. Effective use of online tutorials, forums, and videos. I learned a lot from texts and online tutorials. Whenever I had a question, it was answered by other programmers pretty quickly.
    • Time: Failure. It took four times as long as I thought it would to pull in Facebook data. I had to work past midnight every night, and I took a few multiday breaks.
    • Effort: Failure. Python was easy to learn, but getting Facebook status information into something I could use proved harder. Next time around will be easier now that I understand how to get around Facebook APIs. Tableau wasn’t hard once I got it pointing to the right data. I still have a lot to learn, but I got the basic hang of it.
    • Overall: Process failure. Next time, I need to budget way more time to get this done.
  • Outcome
    • Value: Moderate success. I learned a lot about Python and how to mash up transactional data with Facebook status. I need to do a few more small projects before I would want to try this for real at work.
    • Knowledge: Success. I’ve developed a good working knowledge of Python and Facebook APIs.
    • Learning objective: Success. It took way longer than I thought it would, but I did learn what I set out to learn. I can use Python well and do what I need to do with Facebook.
    • Overall: Outcome success.
  • Overall: Failure/Success.
  • Notes: I need to better judge how long things take for me to learn. Generally, I’m underestimating the time required. But I think if I did a similar project again, I could do it faster because I know where to go to get information. Also, I’ve improved my framework for learning and know where I tend to waste time.
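If you want to track retrospectives across several projects, the sample evaluation above can be captured as data. This is only a sketch: the chapter doesn’t prescribe how to roll three lever ratings up into one verdict, so the majority rule used here is my assumption, as are the function and variable names.

```python
PROCESS_LEVERS = ("tools", "time", "effort")
OUTCOME_LEVERS = ("knowledge", "learning_objectives", "value")


def lens_succeeded(ratings, levers):
    """Assumed rule: a lens succeeds when most of its levers succeeded."""
    successes = sum(1 for lever in levers if ratings[lever])
    return successes > len(levers) / 2


def overall_state(ratings):
    """Return one of the four states, written as Process/Outcome."""
    process_ok = lens_succeeded(ratings, PROCESS_LEVERS)
    outcome_ok = lens_succeeded(ratings, OUTCOME_LEVERS)
    process_label = "Success" if process_ok else "Failure"
    outcome_label = "Success" if outcome_ok else "Failure"
    return f"{process_label}/{outcome_label}"


# Ratings from the sample evaluation: True means success.
# The "moderate success" for value is counted as a success here.
ratings = {
    "tools": True,
    "time": False,
    "effort": False,
    "knowledge": True,
    "learning_objectives": True,
    "value": True,
}

state = overall_state(ratings)
print(state)
```

A single label is no substitute for the written notes, but it makes it easy to see at a glance whether your process is improving from one project to the next.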