In the preceding data science pipeline, there are two main stages: data cleaning (where we remove inconsistent data, fill in missing data, and appropriately encode the attributes) and data analysis (where we generate visualizations and insights from our cleaned dataset).
The data cleaning process was implemented as a Python script, while the data analysis was done in a Jupyter notebook. In general, deciding whether a Python program belongs in a script or in a notebook is an important, yet often overlooked, decision when working on a data science project.
As we discussed in the previous chapter, Jupyter notebooks are well suited to iterative development, where we can transform and manipulate our data as we go. A traditional Python script offers no such dynamism: we need to write all of the necessary code up front and run it as a complete program.
However, as illustrated in the Data cleaning and pre-processing section, PyCharm allows us to divide a traditional Python script into separate code cells and inspect our data as we go using the SciView panel. In other words, the interactive style of programming offered by Jupyter notebooks can also be achieved with PyCharm.
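As a minimal sketch of what such a cell-divided script might look like, the example below uses PyCharm's `# %%` comment markers, which split a regular `.py` file into individually runnable cells (the records and cleaning rule here are made up purely for illustration):

```python
# %% Cell 1: define some raw records inline, so the sketch is self-contained
raw = [
    {"name": "Alice", "age": "34"},
    {"name": "Bob", "age": ""},  # missing value
    {"name": "Carol", "age": "29"},
]

# %% Cell 2: a simple cleaning step - fill missing ages with the mean of known ones
known_ages = [int(r["age"]) for r in raw if r["age"]]
mean_age = sum(known_ages) / len(known_ages)
cleaned = [
    {**r, "age": int(r["age"]) if r["age"] else round(mean_age)}
    for r in raw
]

# %% Cell 3: inspect the intermediate result
# When run cell by cell in PyCharm's scientific mode, variables like
# `cleaned` can be examined in the SciView panel between cells.
print(cleaned)
```

Each `# %%` marker begins a new cell, so you can rerun just the cleaning step after tweaking it, without rerunning the whole script, much as you would in a notebook.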
Another core difference between regular Python scripts and Jupyter notebooks is that printed output and visualizations are included inside a notebook, alongside the code cells that generated them. From a data scientist's perspective, this feature is particularly useful when creating reports and presentations.
Specifically, say you are tasked with finding actionable insights from a dataset in a company project, and you need to present your final findings, as well as how you arrived at them, to your team. Here, a Jupyter notebook can serve quite effectively as the main platform for your presentation: not only will people be able to see which specific commands were used to process and manipulate the original data, but you will also be able to include Markdown text to further explain any subtle discussion points.
By contrast, regular Python scripts are better suited to lower-level tasks where the general workflow has already been agreed upon and you will not need to present it to anyone else. In our current example, I chose to clean the dataset using a Python script, as most of the cleaning and formatting changes we applied don't generate any actionable insights that address our initial question. I only used a notebook for the data analysis tasks, where there are many visualizations and insights worthy of further discussion.
Overall, the decision to use either a traditional Python script or a Jupyter notebook ultimately depends on your tasks and purposes. We simply need to remember that, whichever tool we choose, PyCharm offers excellent support that can streamline our workflow.