For this task, we will translate the program in our main.py file into a Jupyter notebook so that we can compare the interface Jupyter offers with that of a traditional Python script. Again, note that we will not be using PyCharm during this process. Let's go through the following steps:
- First, create a regular folder for this example, outside of PyCharm, and open a Terminal in that directory.
- Then, we will need to install Jupyter, which can be done via the pip package manager:
pip install jupyter
- Next, since Jupyter is, in essence, a web application, we need to serve it from a local server by running the following command in the Terminal:
jupyter notebook
- This command will open a new tab in your web browser, displaying the contents of the directory where you ran the command. For example, my Jupyter page opens in our current folder:
- From here, you can create new notebooks or upload existing ones from your local machine using the two buttons highlighted in the preceding screenshot. For now, we will use the New button and choose the Python 3 option to create a new notebook.
- Another tab in your browser will open, displaying the newly created notebook for you to edit:
- We can edit the name of the notebook in the top-left corner of the window. The notebook currently contains a single code cell. As we mentioned previously, each cell typically holds only part of our code, and each cell can be run independently of the others (all cells share one Python session, so a variable defined in one cell is available to cells run after it). For now, we will use this code cell to import the libraries that our program uses. Enter the following code into the cell:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
- To run a code cell, you can click on the Run button, as shown here, or simply use the Shift + Enter shortcut:
- Next, type the remaining parts of our program into separate code cells. By the end, you should have the following notebook:
As we can see, any code output (be it printed output or visualization) is displayed immediately following the code that produces it. This, again, allows Jupyter users to read and edit their notebooks in a sequential and incremental way.
- To improve the readability of our notebook even further, let's add some Markdown to it. Insert a new cell above our first one (using the Insert menu).
A newly inserted cell is a code cell by default, so we need to convert it into a Markdown cell before we can enter Markdown. To do that, select the new cell, open the cell-type drop-down menu on the toolbar like so, and choose the Markdown option (alternatively, press M while the cell is selected in command mode):
- After this, enter the following Markdown code:
### Importing libraries
When this cell is run, it renders as a level-three heading.
- Here, we are using these headings to describe our individual code cells. In the same manner, insert a Markdown heading above each of your code cells, like so:
- We mentioned earlier that one reason for the popularity of Markdown is its support for mathematical equations in LaTeX. Let's see how that plays out in Jupyter. Insert a Markdown cell right before the Correlation matrix in heatmap section and enter the following code:
### Pearson correlation formula
$r_{XY}
= \frac{\sum^n_{i=1}{(X_i - \bar{X})(Y_i - \bar{Y})}}
{\sqrt{\sum^n_{i=1}{(X_i - \bar{X})^2}}\sqrt{\sum^n_{i=1}{(Y_i - \bar{Y})^2}}}$
When rendered, the preceding Markdown displays the formula for the Pearson correlation coefficient between two arrays of numbers, which is what the corr() method in our code computes. After running the cell, you will see the following rendered output:
The ability to combine LaTeX and general Markdown text with live code makes Jupyter notebooks a flexible tool in data science projects. Interleaving code with text that explains each step of a data analysis makes it much easier for readers of a notebook to follow what is being done to the data. This is why Jupyter notebooks are a common tool for presentations and reports in data science teams.
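As a small example of pairing a formula with live code, the Pearson formula from our Markdown cell can be evaluated term by term in a code cell placed right after it. The sketch below uses only the standard library; the function name `pearson_r` is hypothetical and is not part of our program. In the notebook itself, pandas' `DataFrame.corr()` (whose default `method` is `'pearson'`) performs this computation for every pair of columns.

```python
import math


def pearson_r(x, y):
    """Hypothetical helper: Pearson correlation r_XY computed directly from the formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    x_bar = sum(x) / n  # \bar{X}
    y_bar = sum(y) / n  # \bar{Y}
    # Numerator: sum of the products of deviations from the means
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: product of the square roots of the summed squared deviations
    denominator = (math.sqrt(sum((xi - x_bar) ** 2 for xi in x))
                   * math.sqrt(sum((yi - y_bar) ** 2 for yi in y)))
    if denominator == 0:
        raise ValueError("r is undefined when either input is constant")
    return numerator / denominator


print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0
print(round(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]), 6))  # -1.0
```

Rounding is used in the printout only because the two square roots can introduce tiny floating-point error; the underlying values are perfect positive and negative correlations.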
Finally, when you finish working on your notebooks, come back to the Terminal and stop the Jupyter server with the Ctrl + C shortcut. We have now covered the basic uses of Jupyter notebooks. In the next section, we'll see how PyCharm supports this tool.
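One detail worth knowing before moving on: the notebook Jupyter saves for us is a JSON file with the .ipynb extension, in which Markdown cells and code cells are stored as a list. The standard-library sketch below builds a minimal notebook of that shape by hand, assuming the nbformat version 4 layout; the helpers `build_notebook` and `save_notebook`, and the example content, are hypothetical illustrations rather than part of Jupyter's own API.

```python
import json


def build_notebook(sections):
    """Hypothetical helper: return a minimal nbformat-4 notebook as a dict.

    `sections` is a list of (heading, code) pairs; each pair becomes a
    Markdown heading cell followed by a code cell.
    """
    cells = []
    for heading, code in sections:
        cells.append({
            "cell_type": "markdown",
            "metadata": {},
            "source": f"### {heading}",  # renders as a level-three heading
        })
        cells.append({
            "cell_type": "code",
            "execution_count": None,  # None until the cell has been run
            "metadata": {},
            "outputs": [],  # Jupyter stores printed output and plots here
            "source": code,
        })
    return {
        "cells": cells,
        "metadata": {
            "kernelspec": {
                "name": "python3",
                "display_name": "Python 3",
                "language": "python",
            }
        },
        "nbformat": 4,
        "nbformat_minor": 4,  # format 4.4: cells do not need an "id" field
    }


def save_notebook(notebook, path):
    """Hypothetical helper: write the notebook dict to `path` as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1)


# Example content mirroring the first cell of our notebook
nb = build_notebook([
    ("Importing libraries",
     "import pandas as pd\nimport numpy as np\nimport matplotlib.pyplot as plt"),
])
print([cell["cell_type"] for cell in nb["cells"]])  # ['markdown', 'code']
```

Calling `save_notebook(nb, "example.ipynb")` in the folder served by Jupyter should make the file appear on the dashboard, where it opens like any other notebook. For anything beyond a sketch, the official nbformat package is the safer choice, since it can also validate the result.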