Chapter 9

Working as an In-House Big Data Specialist

In This Chapter

  • Considering the role of corporate IT
  • Working for a business unit
  • Identifying the pros and cons of in-house jobs

Every business today relies on technology to get work done. It’s a given. You can’t make cars or even serve hamburgers without technology. Enter the role of corporate IT. Firms use three main models to deliver technology to the organization: centralized IT, decentralized IT, and a hybrid approach. Which model a firm uses depends on several factors within the company. Corporate IT came about for several reasons, including purchasing power and overall efficiency.

Imagine a company with five business units of varying size. Each business unit has computing needs, from laptops and software to payroll management for its employees. If each group were in charge of negotiating and buying computer hardware, software, and payroll systems, the company would have a complicated hodgepodge of technologies that would be difficult to support. Plus, if each business unit negotiated its own deal, the company wouldn’t get the optimal price.

Enter corporate IT. By providing a set of shared technology services like technology procurement, networking services, and software development resources (programmers), organizations can improve efficiency for things like technical support and software development, and get more for their money through shared buying power. You may be asking yourself, “What about decentralized IT? What’s that?” Depending on the maturity of the business, or the cycle that the organization is in, there may also be an IT group within a business unit. We call this decentralized IT. In this chapter, I look at jobs within the central IT group, as well as jobs within the business units or decentralized IT.

Working for Central IT to Serve an Organization

The mission of corporate IT is to provide shared computing services to the organization. What is a shared computing service? It could be anything from desktop support and database services to help with buying laptops or providing Internet access to a department. How centralized or decentralized IT is differs from business to business. At some level, every large organization, public or private, has some sort of model for central IT services. Many organizations that have centralized IT for programmers or database services will also have specialized IT needs at the departmental level. (I cover that in the next section, “Working for a Business Unit.”)

Often, these roles are split between application services for the business and infrastructure services, which include maintaining core computing and storage services. Application services include the systems that run finance, human resources, and other functions needed to run the firm. Infrastructure services tend to be more ubiquitous in that they’re viewed like a utility. If a department needs computing resources for custom software development, it can leverage infrastructure services from central IT to provide those resources.

Interestingly, many big data projects begin as shadow IT within business units. Shadow IT is when groups outside of corporate IT buy and run technology largely without the knowledge of the IT department. As the demand for resources grows, some of these resources shift to central IT.

Looking at roles in corporate IT

In the end, the degree to which IT is centralized differs from firm to firm. Big data roles typically fall into the following categories as they relate to infrastructure or application services:

  • Infrastructure specialists: Infrastructure specialists focus on keeping the required underlying hardware — like servers, storage, and network services — running.
  • Analytics programmers: Analytics programmers are resident programmers who work on specific projects that may come up across different organizations within the firm.
  • Systems administrators: Systems administrators keep the big data systems fine-tuned and running well, handle access permissions, and make sure that the latest security and software patches are installed to ensure smooth development, test, or production systems.
  • Systems architects: Sometimes called enterprise architects, systems architects create the blueprints that detail how the system components will be arranged. Just like an architect of a building, these people ensure that systems are built for durability and flexibility and that they adhere to corporate standards for security.

Examining a corporate IT job posting

Following is a sample job posting for a Hadoop programmer within corporate IT. This is an actual posting for shared services within a government agency (with the name changed to protect the innocent). You can see the types of experience needed and functions required, which can give you an idea of what a day in the life of a corporate IT programmer may look like.

  • Senior Hadoop Developer – Health IT
  • Job Description
  • Where will you find innovation and technology?
  • Acme Information Systems is seeking a Senior Hadoop Developer who will analyze data for potential fraud, waste, and abuse using a variety of data analysis methods, supporting the functional requirements of a fraud prevention system, and testing the outcome of the system. This position will be located in Baltimore, Maryland and will service business units throughout the organization.
  • The qualified applicant will become part of Acme’s information technology support services contract for the Social Security Administration (SSA).
  • Responsibilities for this position will include the following:
    • Ensure all programming meets the standards of the production infrastructure of fraud analytics to support business unit analysts.
    • Design, develop, and run analytical models.
    • Work with business partners to translate business requirements into analytical models for programmers to build supporting systems.
    • Assist others in understanding data as they summarize and present complex patterns of fraud in the form of easily deciphered datasets to SSA analysts.
    • Perform ETL (Extract, Transform, and Load) to extract the data from multiple DBMS sources and load into SAS. Set up standard templates and coding best practices.
    • Conduct data extraction that may include analyzing, reviewing, modeling, trending, and presenting information, based on provided specs, to support or refute hypotheses, leading to identification of fraud and abuse.
    • Design and prepare technical specifications. Assist in project artifact documentation in business requirements documents, functional specs, flow charts. Architects design system architecture and map business needs to data requirements.
    • Act as self-starter with the ability to take on complex projects and analyses independently.
  • Desired Skills and Experience
  • Basic qualifications:
    • Bachelor’s degree and 14 years of related experience OR 18 years of experience without a bachelor’s degree
    • 3 years of experience with big data analytics
    • Experience with ETL tools
    • Experience with open-source analytics tools like R
    • 3 years of experience using Hadoop ecosystem — HDFS, MapReduce, HBase, Hive, Python, and Pig. Experience with large-scale distributed analytics platforms, databases, and reliable data movement.

There are a few things here worth calling out, starting with the specialized knowledge of this particular organization. This individual is expected to be a subject matter expert (SME) in this particular field. In this case, the level of expertise is related to the federal agency’s regulatory requirements and knowledge of the enterprise architecture, the set of standards the agency uses to build and deliver software applications.

In addition, this person will need to be able to work with business customers. This is a reference to the nature of corporate IT. Remember: You aren’t attached to a particular line of business — instead, you serve as a “shared service” to the entire organization. This can be a really good thing if you like to stay engaged with technology and be exposed to different functions of the organization. The downside, as I explore later, is that you’re a degree away from the core function of the firm.

The job also requires a significant amount of experience in related technologies. You don’t simply need to know how to program Hadoop. Of course, in order to be a Hadoop developer you need knowledge of database systems and general programming knowledge. But pay attention to the breadth of technologies that are required: statistical programming (R, which is an open-source statistical programming language); database knowledge; Extract, Transform, and Load (ETL) technologies; and scripting languages like Python.
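
To make the ETL requirement in the posting a little more concrete, here is a minimal sketch of the extract, transform, and load steps in Python. It is an illustration under stated assumptions, not the agency’s actual pipeline: the posting loads data into SAS, whereas this sketch uses in-memory SQLite databases as stand-ins, and the table names, column names, function names, and sample rows are all hypothetical.

```python
import sqlite3


def make_source(rows):
    """Build a hypothetical in-memory source database holding raw claims."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE claims (claim_id INTEGER, provider TEXT, amount TEXT)")
    conn.executemany("INSERT INTO claims VALUES (?, ?, ?)", rows)
    return conn


def extract(conn):
    """Extract: read the raw claim rows from one source database."""
    return conn.execute("SELECT claim_id, provider, amount FROM claims").fetchall()


def transform(rows, source_name):
    """Transform: drop incomplete rows, normalize names, cast amounts to numbers."""
    cleaned = []
    for claim_id, provider, amount in rows:
        if amount is None:
            continue  # skip records with no amount
        cleaned.append((claim_id, provider.strip().upper(), float(amount), source_name))
    return cleaned


def load(conn, rows):
    """Load: append the cleaned rows to the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS claims_clean "
        "(claim_id INTEGER, provider TEXT, amount REAL, source TEXT)"
    )
    conn.executemany("INSERT INTO claims_clean VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run extract, transform, and load across two made-up sources."""
    sources = {
        "region_a": make_source([(1, " acme clinic ", "120.50"), (2, "Beta Labs", None)]),
        "region_b": make_source([(3, "beta labs", "75")]),
    }
    target = sqlite3.connect(":memory:")
    for name, conn in sources.items():
        load(target, transform(extract(conn), name))
    return target.execute(
        "SELECT claim_id, provider, amount, source FROM claims_clean ORDER BY claim_id"
    ).fetchall()


if __name__ == "__main__":
    for row in run_etl():
        print(row)
```

Keeping the three steps in separate functions mirrors how the posting describes the work: pulling data from multiple sources, cleaning it to a standard template, and loading it into one place for analysts.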

So, what are the implications of these requirements? If you’re coming from a traditional programming background (say, you’re a C++ or Java developer), you must be comfortable being self-directed and working outside the confines of your “language.” As you’ll see in your research of big data jobs, the required technologies and knowledge aren’t confined to one or two areas of expertise, as they are in many other areas of technology.

Finally, this is a senior position. This job requires a person to have 14 to 18 years of experience. It’s a paradox that this role within big data, which is only a few years old, would require such experience. But this is a great example of a job function for a senior-level IT professional who is able to retool her skills for a new field. Only 3 years of big data experience is needed, but you need 14+ years of overall IT experience.

Tip: What if you don’t have 14 years of experience? Flip back to Chapter 2 and see how to plan your future path.

Working for a Business Unit

Most technology jobs within a firm naturally fall within central IT, but with big data, all that has changed. Because the promise of big data is to drive new revenue, insights, and innovation, many business units have directly hired people with this expertise. Earlier in the chapter, I mention that individual business units may have specialized needs, and to that end there are in-house IT jobs that aren’t a part of central IT. This is particularly true with big data.

There is a benefit to working directly within a department — you can more clearly see the fruits of your labor. You have a clear line of sight to revenue — your job function can be easily traced to the bottom line.

If you like being able to easily see how your work impacts the bottom line, working for a business unit is the place for you.

Pros and Cons of In-House Positions

Being the in-house big data expert, whether in corporate IT or within a business unit, is a great path to consider if consulting or working for a product company isn’t for you. Working directly for a company comes with its own cultural, professional, and career path considerations, both positive and negative.

Pros

Working on the inside has a lot of benefits over working for a consulting firm or a product company.

Just today, I spent time with a soon-to-be college graduate who was reflecting on her internship at a Big 4 accounting firm (see Chapter 8). Although she really liked having diversity in her job, she felt like she didn’t have a chance to go deep within her clients’ product offerings because her projects tended to last only six weeks or so. Working directly for a company allows you to go extremely deep within that firm’s business.

Here are some advantages of working for a company directly:

  • You’ll have a unique skill set within the organization. In almost any in-house big data role, you’ll be in the minority even among technologists.
  • You’ll have the opportunity to go very deep within an industry. Staying put in one place gives you a chance to gain deep industry experience. This can happen as a consultant, to be sure, but it will almost always happen for in-house talent.
  • You’ll be highly visible. Big data is extremely important to driving innovation and value. Any project you work on will have a high degree of management visibility and likely high reward for success.
  • You’ll have a chance to build something from the ground up. Most big data teams within a firm are small. As such, no matter your level, you’ll have a significant impact on how the team grows and matures.

Cons

Working directly for a company has its downsides. These cons are largely related to the type of culture you prefer to work in.

  • You’ll be a lone wolf. You may not be a “one-man wolf-pack” like Zach Galifianakis’s character in The Hangover, but you’ll certainly be in the minority. That means you’re the expert, and career development may be a challenge.
  • You may be pigeonholed. Some people who work within a company’s IT department can feel stuck after a number of years. If you like frequent change, this may not be the place for you.
  • You have limited earning potential. Money isn’t everything, but it is important. Consultants tend to have higher earnings, so be prepared to earn less than your consulting friends.