© Robert de Graaf 2019
Robert de Graaf, Managing Your Data Science Projects, https://doi.org/10.1007/978-1-4842-4907-9_6

6. Promoting Your Data Science Work

Robert de Graaf (Kingsville, VIC, Australia)
 

The previous chapters have seen us land the opportunity to do a useful data science project, confirm the customer’s willingness to implement the results, and ensure that the desired results will be achieved. In this chapter, we will look at how to ensure that your efforts up to this point are recognized in your organization and beyond. That recognition can be important capital for landing and taking on more exciting work.

We have seen the importance of getting people’s attention to be allowed to begin and implement projects. You might be forgiven for thinking that that’s enough—you convinced someone to let you build something for them and delivered the goods.

Unfortunately, it’s only part of the story. You can’t rely on people to recognize the benefits they’ve been given, and you can’t rely on word of mouth to let them know. You have to take responsibility for people understanding the value of what you’ve done.

There are a few different modes to this. One mode is recording what happened inside your own organization and letting people know it happened. Another is communicating to people outside your organization or people who are otherwise new to you.

People often think of communication with the world outside their organization first, as that’s what promotes sales. However, even if your main role is to improve an organization from the inside, you should be communicating in a very similar way. In fact, we will see that the structure and content transfer from one situation to the other quite well.

Reaching people outside your organization is usually more difficult than reaching people inside it. External people are less likely to know you exist in the first place, and less inclined to assume that you are doing something of value to them. The idea behind the next tool we will discuss is to grab their attention by offering some information for free. The tool that has been developed to promote experts in this way is the whitepaper.

It ought to be easier to get the word out to people in your own organization. They work within the same four walls that you do, so you have some ability to reach them face to face. In this situation, the best option is to give a talk on your work directly, and we’ll discuss some ways to get the most benefit from that later in this chapter.

Data Science Whitepapers

A key way to let people know what you’ve done is to write a whitepaper. A whitepaper is a marketing document, which aims to showcase the author’s expertise in a particular area.

When writing a whitepaper, the author will typically try either to explain how they solved a problem with their expertise or to teach some basic aspects of their field, with the aim of helping the reader understand when it’s time to call in the experts. For example, a tradesperson might share some tips on very small jobs, leading up to the point at which the reader should call in the professionals.

There are a great many guides to writing whitepapers on the Internet, often including a suggested structure. They vary enough that you can choose the one that best fits your needs, so look at a few and pick one that makes sense to you. Two to start you off are the comprehensive guide by the Content Factor1 (itself a whitepaper, hosted on a web site with other good example whitepapers) and the guide from Foleon.com,2 which has some pointers on distributing your whitepaper that aren’t found in every guide.

In data science, though, there is a twist: usually the author is using their data science expertise to solve a problem in a domain where the reader is the expert, whereas normally the writer is the domain expert and the reader is not. This has a small but noticeable effect on how the document needs to be structured and how to approach the audience.

As a result, your first task will be to establish credibility in the reader’s problem domain. As you are unlikely to have higher qualifications or more experience in that area than the reader, simply offering up your own credentials is unlikely to succeed. Instead, the best path forward is likely the “show, don’t tell” approach familiar from creative writing classes.

In creative writing, it means letting the reader see your characters in action and their story unfold, rather than spelling out their traits or outlining the plot. In a whitepaper, it means explaining the domain problem you worked on in a way that leaves no doubt about its importance to the field.

You wouldn’t be working on the problem if a solution weren’t valuable, so explain where the value lies. Often the problem will be a roadblock on the way to a bigger goal. Overall, demonstrating that you understand how the problem affects the reader’s business allows you to win the audience over.

In many ways, this process within the paper is simply a recapitulation of the journey towards establishing trust I’ve presented in earlier chapters. The difference is that this time you don’t have the benefit of being in the room with your audience to begin a two-way conversation—you have to anticipate a little of the audience’s possible reaction to ensure you get there.

Once you’ve established the problem, the next step in the story is how you solved it. In data science, obtaining a solution commonly requires two tools: an adequate data set (“adequate” because most data sets fall far short of our “ideal data set”) and suitable analysis tools.

Because so many data science tools are open source, there is a reasonable chance that the data set, if not in its original state then often after the cleaning and preprocessing you’ve performed on it, is what gives you an advantage over competitors.

Hence, mentioning how the data was obtained or cleaned may further establish your credibility. This is especially true if you used advice from subject matter experts to improve the preprocessing, for example, if a feature of the collection process explained why some data were missing and that explanation determined how the missing values were treated.

When discussing the algorithm used, it’s not just a question of tailoring the discussion to a nontechnical audience but also a question of pacing. To hold your readers’ attention, the whitepaper needs the feel and pacing of an unfolding story; too much detail on how the algorithm works and how you implemented it will slow the story down and put off the reader.

Crucially, the reader does not need to come away with a complete understanding of the algorithm for your message to get across. It is closer to the truth to say that any description of the algorithm provides color and interest rather than a true explanation of how it works.

Applying your algorithm to the data is the second act in your three-act story, the first act being understanding the problem and the data. Here, the solution itself may not be the selling point, important as it is. When you’re implementing something such as a predictive model, the selling point will often be what you observed about your data along the way: an extra lesson about how the variables interact, or a surprise about which variables are most influential or about the shape of a relationship.

Although whitepapers almost always focus on external readers, there is room for whitepapers aimed at internal users, or at both audiences without distinction. When I worked as an engineer, my manager had the team maintain a library of what were effectively whitepapers, under another name, on a variety of issues. They were useful both for distributing directly to customers and for keeping a range of customer-facing staff informed.

If you frame it correctly, your whitepaper will make people remember you and think of you as someone who can be useful in their field. One of the biggest barriers to the acceptance of data science solutions is a feeling that data science is usurping expert knowledge. The whitepaper is a golden opportunity to show that data science is not a usurper, but complements expert knowledge.

Talking About Your Work

For an internal audience, you probably have more access to your audience, so you aren’t restricted to using a whitepaper to promote yourself to people within your organization (although we will see later that there are still times when that might be useful).

The best way to spread the word about what you have achieved is to do it face to face with a talk or a presentation. Back in Chapter 3, we looked at some of the things that make a successful data science talk from the point of view of persuading an audience.

In that context, persuasion was clearly the main goal. You were trying to win over an audience that was going to decide whether or not to proceed with your project. You may now believe that you’ve passed that point and can go easy on the sales pitch and proceed directly to dispensing information.

The danger for a lot of data scientists is to assume that the objective in a data science presentation is solely or largely to deliver information. This can be true if you are delivering a presentation to an audience of other data scientists to explain a technical point. However, this is not likely to be the most frequent or the most important scenario in which you find yourself giving a presentation.

The more frequent scenario is that you need to persuade people that what you are doing is a good idea, or convince them that your work has had a positive effect on the organization.

The reasoning comes back to a point frequently made in guides to preparing presentations, whether they come in five, six, or eight steps (a matter of personal taste; as with guides to whitepapers, just choose the one that makes the most sense to you): one of the first steps when preparing a presentation is to consider your audience.

As a data scientist speaking to an audience that contains non-data scientists, you can’t assume that your audience accepts that your work has value and you can’t assume that your audience has the same concept of what the value of your work is that you want them to have.

Even after the work is complete, you need to continue to sell the benefits. You also need to continue to avoid adding in technical details your audience will find extraneous.

Your tale of how difficult it was to speed up your algorithm on an ancient machine with an unsuitable operating system won’t excite a non-data scientist audience. Nor will that audience care how impressed other data scientists were with the technical brilliance you displayed implementing a new kind of algorithm on a difficult-to-use coding platform. They won’t understand why it’s impressive, and you will lose their attention in those sections.

Instead, they will care about how your innovation will reduce the time they spend doing things or how it helps ensure that what they spend their time on pays off. By sticking close to these attributes of your project, you will ensure that your work is remembered throughout the organization. Otherwise, you may find yourself relegated to explaining to external consultants what the different tables in your company’s data warehouse contain, so they can earn a huge multiple of your salary to do something you could do in your sleep.

Unlike when you were first pitching the project, this time the benefits have either been realized or are about to be. Hence, before presenting, it is important to review whether those benefits have actually been realized or are likely to be.

As much as possible, stick to undisputed gains; otherwise, you run the risk of being challenged in your own meeting. If you are challenged successfully, you may lose some of your license for future efforts, which defeats the purpose of holding this sort of meeting.

In most cases, the gain will be big enough not to need embellishment, so avoid making claims that can’t be substantiated or that will antagonize your audience. Also avoid the temptation to oversell by referring to gains that haven’t occurred yet, especially gains that require additional rounds of work. Just stick with what’s happened so far.

There is a sweet spot for how often you do this. People are happy to take a few minutes out of their day to learn about the rest of their organization and about initiatives that make their jobs easier, but there is a frequency at which the exercise becomes a little too routine. Four or five times a year is probably the upper limit.

However, present fewer than about two times a year and you’ll end up being forgotten, so commit to getting in front of those internal audiences by diarizing times to look for work within your department that is suitable for sharing with the rest of your organization.

If your organization is large, though, this could actually translate to more than two or three talks through the year as you will likely present to different groups at different times.

Presenting to the Outside World

Many data scientists present their work to people outside their own organizations, for example at data science meetups or similar groups in their area. The aim is often to promote their company’s data science team as a great place to work that is doing interesting work.

Beyond that, creating a presentation based on your work forces you to think through what about it would engage an audience, and in doing so you will work out what about your work was important.

You can enhance that last benefit further by breaking free of the tendency to think about work in terms of specific projects and thinking about common lessons that apply across multiple projects. These could be technical lessons, for example, best practices that apply to particular tools, or more human-centered, such as the best way to talk to customers from a particular background or in a particular situation.

It also gives you a great chance to get a free validation of your model via questions from the audience or people who ask you things afterward. Although people are likely to be polite and encouraging, when they have questions you can’t easily answer, you will know they have found a hole.

Another side benefit is that, as you will want to rehearse your talk with a friendly audience, most likely made up of people from inside your team, you get an additional opportunity to talk to those people about what you or the team have been working on, and an especially good opportunity to discuss the team’s work outside the narrow goals of individual projects.

At the same time, there will only be a limited number of meetups, so the opportunities to speak will also be finite. Fortunately, you can achieve many of the same benefits by blogging, especially if you publish with a site like Medium.com, where there’s a decent-sized audience.

Even if you don’t use a site with a large audience that is generous with comments, the process of choosing the best projects or lessons, and then explaining them from scratch to readers who should be assumed to know nothing about your organization, will help you reconsider what you are doing and find new ways in which your work is exciting.

Lastly, in both of these situations, audiences of other data scientists can give you the kudos for technical achievements that you can’t get from a lay audience, as mentioned in the previous section. If you want feedback on your novel technical solution, these are the key avenues to find it.

Making History

One of the great things about being a data scientist is getting to try a lot of different ways to solve problems. Naturally, many of these attempts will be glorious failures, where the intended problem is not solved, but something is learned that can be used elsewhere.

Many will also be straightforward failures, where all you learned is that the proposed technique is not the right approach for that problem, or, at least, that the proposed solution requires too much effort to justify the payoff. These are important lessons to learn if your organization doesn’t want to try the same unsuitable approach on each problem every other year.

Therefore, you ought to be as proud of your failures as you are of your successes if you are to ensure that people don’t attempt your failed pathways again and again. Strange as it may sound when you are in the middle of something that looks like it won’t turn out right, the organization repeating your mistakes would be more embarrassing than letting your co-workers know about them as they occur.

Ensuring that others don’t follow in your footsteps when you’d rather they didn’t requires you to be forthright about what has worked and what hasn’t. At the same time, because you usually won’t be able to predict precisely who is going to repeat your mistakes, you need to keep this information in a form that future readers can find.

This is one of the key outcomes of project documentation—recording what worked and what didn’t for someone in your shoes in the future, who could be you or could be someone else.

You can capture this aspect of your projects via “Lessons Learned” documentation, which should be considered a significant deliverable from any data science effort.

Effectively, these are documents where you record what was attempted, what worked, and what failed. They are different from a laboratory notebook, however, in that they are intended for a general audience, rather than just as a personal aide-memoire.

Therefore, you need to think carefully about how to structure your account to match its intended purpose. In this context, the important thing is to cut to the chase, so that someone who wasn’t around to understand the context of your project can still easily understand the important lesson that was learned. As much as possible, leave out the details of the business case for doing the work; include just enough information that people understand why you looked in this area at all.

■ Pro Tip

Successfully creating a library of information on previous efforts that is well used can be a substantial competitive advantage. For example, the author of The McKinsey Way3 says one of the great advantages of being at McKinsey is being able to access the database of work McKinsey has done on previous projects. On the other hand, in my first couple of jobs, I spent a lot of time re-establishing knowledge that had been lost, and I can attest that simply reinventing what you know has been established before doesn’t make one look forward to Mondays.

The crucial part is to agree where to put the lessons learned documents, as these will be essential parts of your organization’s corporate memory, provided anyone who needs them can find them. Your organization’s network may be either a blessing or a curse here: simply placing the documents on the shared drive runs a strong risk of their being forgotten and unfindable. Using a Git repository or similar is better, and is fine for an intended audience of data scientists. The trick, though, is to avoid tying the lessons learned documents too closely to the individual projects they came from.

For the non-data science component of your company, it is better to get the message out through wider channels. These could include company newsletters.

In some of these forums, you wouldn’t want to refer too directly to things that haven’t worked as you expected. When you report on a data science project that didn’t go the way you expected, reframe it away from the original goal. That is, emphasize what you discovered as if it were the goal from the beginning, and let the original goal appear as a secondary one.

Differing Audiences for Documentation

The most obvious and natural audience for lessons learned documentation is the other members of your data science team. They are best placed to benefit directly from knowing, for example, that an approach popular within your team doesn’t work as expected on a particular data set, or from other similar insights.

That doesn’t mean you should neglect writing documents that a lay audience can understand, especially for senior management. If you fail to keep management informed about what you have learned, you run a strong risk that they will ask you to repeat work you have already done and know won’t achieve the desired result.

At the same time, senior managers and others outside the data science function are unlikely to have either the time or the inclination to wade through the details of every project to discover the most important lessons learned for them. Instead, you’ve got to go to them.

When you prepare documentation for this audience, ensure that it is intuitive for them and speaks directly to their needs. It’s fine to be local, in the sense of referring to in-house data sets or to your organization’s customers or product lines using in-house terminology, but the technical side has to match their level of understanding. Don’t be embarrassed to keep it very simple.

To help ensure that people who want the technical details can read them, and those who don’t can avoid them, consider the structure of your document carefully. By dividing the document into sections and marking it out with clear subheadings, you can help people find the parts they want to read most easily.

Finally, keeping the document as short as possible maximizes the chance that people read far enough to reach the parts you want them to read. Every extra word increases the risk that your reader loses interest and stops reading.

Taken together, this advice might seem very familiar. In fact, what you are really doing here is creating a whitepaper for internal circulation within your company.

The goals are surprisingly similar. You may not realize it at first, but half the purpose of these documents is to ensure that the business thinks of you and your data science team early as the people who can help with any given problem. The crucial message is that you can help with anything, and your answer will be useful.

The main difference is length. Keeping your document short is especially important when you are writing for an internal audience, because people are more inclined to assume there is value in an external whitepaper. This is partly because they know the authors of external whitepapers see them as a potential source of income, and partly because they sense that those authors are difficult to access. If you are someone they see every day, or believe they could see any time they wanted, reading your paper will seem less worth the effort, so you’d better make it a short and easy read.

The lesson is that just as you would never assume that external customers continue to see your value without you reminding them every so often, you can’t assume that your internal customers automatically see how valuable you are either.

Summary

Implementation isn’t the endgame. You need to ensure that others hear of your best results. You also need to ensure that you are the person who communicates what happened when projects didn’t succeed so you can explain the lessons you learned.

There are both written and spoken methods to do this. To promote your work, you may wish to write a whitepaper—when written well you can attract more work very effectively this way. It is important, however, to get the balance right and certainly to ensure that you are generous toward your reader. That is, you need to give the reader useful information, and not simply promote your product.

Documenting what you achieved is also essential. It might be tempting to think that documentation intended for other data scientists is the end of the story, since the other data scientists on your team likely know where to look for information on previous projects. The other group you must not forget is the non-data scientists, particularly as this group often includes senior people who can ask you to repeat work you’ve already done.

Although whitepapers are seen as documents for external stakeholders, you can use simplified versions of the same structure to create internal whitepapers that do the same job within your company.

Presenting your work to your users personally will be more work for you, but less work for your audience. In general, people are pleased to hear about innovations that reduce their workload, so they will be keen to come to your presentation, but make sure the results are what you claim they are before you set up the meeting.

Across this whole chapter, one of the key lessons was the importance of learning as much as possible from your efforts while communicating to as many people as possible what you learned. These are some of the most useful initial steps you can take to build your data science team’s brand.

The next chapter builds on this idea to examine how to develop behaviors that help your data science team learn and function more effectively at the same time.

Promotion Checklist

  • Have you developed a whitepaper that showcases a key insight you discovered in a way that also showcases your team’s capabilities?

  • Did you make sure your whitepaper builds trust by giving readers information they probably don’t already have, and that it establishes your credibility in the relevant subject domain?

  • Have you presented your work to a local meetup group, showcasing a different side of it compared to the side you showcased to your customers?

  • Have you blogged about your work, presenting some of the work that you haven’t been able to present at a meetup group, or presenting some of the lessons you learned only by doing a few different projects?

  • Have you diarized the next two or three times you will present the data science team’s progress to others in the business?