CHAPTER 5

Energia’s Data Analysis
Methods and Results

Upon their return to Energium, Chercheur would advise Energia to perform a content analysis on the newspapers of the kingdom to make sense of the main debate topics across the nation. This would help both him and her update themselves, after a long absence from Energium.

“What is a content analysis?” she asked him.

“You conduct a content analysis,” he said, “whenever you study the content of any recorded communication. It could be the content of a photo album, a stamp album, a sticker album, an autograph album, a musical album, a song, a painting, a book, a journal, a magazine, a newspaper, a website, a letter, a text message, or an email.”

“How do I perform the content analysis of newspapers in the case of my research?” she asked, both responsively and adumbratively.

“Listen, and make a mental note of it!” he exclaimed oratorically. “First, you come up with a research hypothesis, a general opinion or statement you would like to test. Once you have a hypothesis, you need to identify the dependent variable in the hypothesis and create at least two categories to operationalize or measure the dependent variable. Coding your categories numerically will help the analysis tremendously. After you have identified the categories of your dependent variable, you need to identify the unit of analysis, which is the basic unit you focus on for data collection. In this case, your unit of analysis is the article, that is, each article in the newspapers. Next, you choose a time period, and you randomly select a newspaper to start reading its articles. For example, if your hypothesis is that the articles from newspapers in Energium echo the debate on renewable energy, as you read each article, you identify whether it is about renewable energy or not. If, for instance, you find that more than fifty percent of the articles reflect the topic of renewable energy, then you will conclude that renewable energy represents the main topic being debated in the kingdom, which would support your research hypothesis and lead you to reject the null hypothesis.”

To heed her mentor’s advice, Energia set out to discover the trends in social debates in the Kingdom of Energium through a content analysis. She would randomly select a sample of one hundred local newspapers and monitor them to uncover local publication trends. She started the process of the content analysis by conceptualizing two broad categories to classify the topics covered in the newspapers. Her first category covered the newspapers with at least one article on the benefits of renewable energy sources. She coded that group numerically as one (1). The second category covered the newspapers without any allusion to renewable energy, which she coded as zero (0). As she leafed through the newspapers, she found that seventy-five out of the one hundred newspapers had at least one article on the benefits of promoting renewable energies.

Encouraged by those findings, she decided to take a step further in the content analysis. She randomly selected ten of the one hundred newspapers and read all their articles. She wanted to test the hypothesis that the articles in the newspapers reflected the need to promote renewable energies. At this stage, she would create three thematic rubrics to organize the articles in the newspapers she had selected. The first rubric was on articles without any allusion to energy sources, the second was on articles advocating for non-renewable energy sources, and the third one was on articles in favor of renewable energy sources. She coded the three rubrics, using a numerical logic. The rubric on articles without any allusion to energy sources received the number 0 as its code. The rubric on articles advocating for non-renewable energy sources got the number 1 as its code. She assigned the number 2 as a code to the rubric on articles in favor of renewable energy sources.

Out of the two hundred ninety-eight articles contained in the ten newspapers, one hundred fifty-four articles examined the benefits of renewable energies and advocated for the need to promote renewable energy sources. Ninety-three other articles did not allude in any way to energy sources. Only fifty-one of the two hundred ninety-eight articles backed some advantages of non-renewable energy sources. The results showed that the category coded as 2 was the most dominant, meaning that the articles supporting renewable energy sources had the highest coverage in the newspapers, which confirmed Energia’s research hypothesis.
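The tally described above can be sketched in a few lines of Python. This is only an illustration: the list of codes is rebuilt from the totals reported in the chapter, whereas in practice each code would come from reading an article. The label names and the majority check (borrowed from Chercheur's "more than fifty percent" rule) are assumptions made for the example.

```python
from collections import Counter

# Codes from the chapter: 0 = no allusion to energy sources,
# 1 = advocates non-renewable sources, 2 = favors renewable sources.
LABELS = {0: "no allusion", 1: "non-renewable", 2: "renewable"}

# Rebuild the coded articles from the reported totals (93 + 51 + 154 = 298).
coded_articles = [0] * 93 + [1] * 51 + [2] * 154

counts = Counter(coded_articles)
total = sum(counts.values())

# Report the frequency and share of each coded category.
for code in sorted(counts):
    share = counts[code] / total
    print(f"{code} ({LABELS[code]}): {counts[code]} articles, {share:.1%}")

# Identify the dominant category and apply the "more than fifty percent" rule.
dominant_code, dominant_count = counts.most_common(1)[0]
majority = dominant_count / total > 0.5
print(f"Dominant code: {dominant_code}; holds a majority: {majority}")
```

With these totals, the renewable category is both the most frequent and, narrowly, a majority of the articles.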

In light of those results, she got a clear message: she was not alone in the kingdom of Energium in thinking about the urgent need to promote renewable energy sources. She understood that the data collected from the newspapers echoed the voices of many citizens of the kingdom. She assumed that such relevant data reflected her fellow citizens’ concerns and wishes clearly. She felt comforted to understand that many of the citizens of Energium were supportive of renewable energies. The results of the content analysis would reassure Energia that she was fighting for a good cause. She felt happy that she was on the right path with her research initiative.

In view of the results of the content analysis, she drew near Chercheur and whispered in his ear, “Content analysis is very helpful for data processing. I really appreciate your guidance through this useful method.”

As she expressed her gratitude to him for advising her on performing a content analysis, Chercheur said, “Indeed, content analysis is a convenient method to process and analyze any recorded data, including data from diaries or journals, minutes of meetings, book reports or reviews, observations, interviews, and surveys. In the specific case of your research, you could use it to analyze the data you had collected from your observations, interviews, and surveys for meaning making. You could also do a content analysis of your personal diaries to apprehend or comprehend your recorded experiences about renewable and non-renewable energies.”

Chercheur was mindful to alert Energia to the strengths and weaknesses of content analysis. He said, “Content analysis would save you time and money; it is a relatively cheap, quick, and easy research method. You can conduct it behind your computer in record time. However, it only applies to recorded oral, written, or painted communications; you cannot use it for non-recorded communications. It tends to be superficial, as it generally proceeds by quantitative word counts. It does not perform an in-depth analysis that digs into the innuendos, the feelings behind the words used, and their meanings.”

“Which other methods would you advise for data analysis?” she asked him.

“I have a plethora of them boggling my mind right now,” he joked.

She motioned to him to spell them out: “Why don’t you spit them out, sir?” she jested.

“Certainly!” he answered. “I would encourage narrative analysis, comparative analysis (including t-test for independent groups, paired-sample t-test, and analysis of variance), correlation analysis, regression analysis, and factor analysis. The list could be longer, but you can focus on those ones.”

“What could you tell me about narrative analysis?” she asked.

“Narrative analysis is the study of a narration,” he replied. “It scrutinizes written and verbal communications.”

“I find it hard to differentiate between content analysis and narrative analysis,” she complained.

“Content analysis and narrative analysis,” he said, “are similar to some degree; they both analyze human communications, by examining who said what to whom, how, and why. The difference between the two resides mainly in the depth of narrative analysis. Content analysis often remains on the surface of the words: it counts the manifest words and draws conclusions from how frequently they appear in the narration. Narrative analysis descends deeply inside the words to understand and exhibit their meanings and effects.”

“Please explain further what you mean,” she interjected.

“Narrative analysis,” he went on, “is highly interpretative; it emphasizes more meticulously the underlying assumptions and meanings of the words, by capturing the details of a written communication, including the meanings of punctuations and insinuations. In a spoken communication, narrative analysis reflects on the pauses, the eye contact, the facial expression, the gestures, the emotions, and other non-verbal attitudes occurring in the communication.”

“How would you illustrate the use of narrative analysis in the case of my study?” she interrupted.

“You could well apply narrative analysis to your diaries or journals,” he said. “You can use it to analyze the daily records of events and experiences you have been keeping. You can also perform a narrative analysis of the transcripts of your interviews on the benefits of wind energy in Wind City, by reflecting on the specifics of the narratives of the respondents, including the meanings of their words, their voice tones, their emotions, their gestures, and their hems and haws.”

Motivated by Chercheur’s suggestions, Energia would read through the transcripts of the interviews she had conducted in Wind City to organize the narratives and code them thematically. She would also listen to the narratives she had recorded in Wind City to understand the meanings of the respondents’ facial expressions, their gestures, their emotions, their pauses, or the hems and haws. She would engage in a double process of conceptualization and operationalization of the interviews.

After spending much of her time reading the transcripts of the interviews and listening to the recordings, she conceptualized and created two coding categories to classify the data from the interviews. The first coded category was about the health benefits of wind energy in Wind City. The second coded category was on the economic benefits of wind energy in Wind City. As she analyzed the narratives from the interviews, she identified clear indicators for each coded category. The indicators of health benefits included low CO2 emissions, breathing clean air, feeling relaxed, feeling healthier physically and mentally, and feeling happy. The indicators of economic benefits included the affordability of wind energy, with citizens spending less money on electric bills, and the high employment rates in the sector of wind energy. The two sets of qualitative indicators mirrored the respondents’ multidimensional experience in terms of the benefits they had harnessed from wind energy.

Energia’s tight scrutiny of the interviewees’ facial expressions, tones, gestures, emotions, and pauses helped clarify the underlying assumptions of the narratives. The coding operations contributed to a better understanding of the latent or hidden content of the interviews in reference to the health and economic benefits of wind energy for the citizens of Wind City.

Satisfied with the results of her conceptualization and operationalization of the interviews, she exclaimed, “Wow! What a helpful method narrative analysis is!”

Chercheur nodded his head in agreement. “It is very useful for meaning making!” he exclaimed.

“What about comparative analysis?” she asked him.

“I would present comparative analysis,” he said, “as a process that allows a researcher to examine and compare two or more social units, objects, individuals, groups, cases, events, processes, interventions, documents, variables, or data sets. Though it can be qualitative, as in situations with a small number of cases (fewer than ten cases), it is often quantitative, because it covers a large number of cases (ten or more cases).”

“Could you specify and explain some quantitative tests for comparative analysis?” she interposed.

“Absolutely,” he said. “We could think of the independent t-test, the paired-samples t-test, and the one-way analysis of variance, to name a few tests used for comparison.”

“How would you define the independent t-test?” she asked. “When and how can I use it?”

“The independent t-test,” he said, “allows you to compare two independent groups, by testing the difference between the means of the two groups. For example, you could use the independent t-test to examine the effects of non-renewable energy in comparison to the effects of renewable energy on the environment. You could also rely on the independent t-test to test whether the men and women of the kingdom of Energium differ in their interest in renewable energy. Be mindful that the independent t-test applies to cases where you compare only two independent groups. If you have more than two independent groups, you cannot use that test.”

“That is interesting!” she exclaimed. “What about the paired-samples t-test?”

“The paired-samples t-test,” he said, “allows you to compare pairs, or matching objects or subjects. Whereas the independent t-test requires independent groups, the paired-samples t-test requires matched groups or correlated groups. In practice, the paired-samples t-test involves a pre-test and a post-test design. For instance, you could use the paired-samples t-test to analyze the data from the experiment you had conducted in Choice City on the benefits of renewable energy.”
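A minimal sketch of the paired-samples t-test, assuming a simple pre-test and post-test design: the statistic is the mean of the within-pair differences divided by its standard error. The scores below are hypothetical and are not the data from the Choice City experiment; the function name `paired_t` is likewise invented for the illustration.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t, degrees of freedom) for a paired-samples t-test."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # Work on the difference within each matched pair.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / std_error, n - 1


# Hypothetical scores for the same eight participants, before and after.
pre_test = [5, 6, 4, 7, 5, 6, 5, 4]
post_test = [7, 8, 5, 9, 6, 8, 6, 6]

t_stat, df = paired_t(pre_test, post_test)
print(f"t = {t_stat:.2f}, df = {df}")
```

The sketch stops at the t statistic; to judge significance, one would compare it with a critical value from a t table for the given degrees of freedom, or let a statistics package such as SciPy (`scipy.stats.ttest_rel`) report the p-value.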

“This conversation is so helpful,” she interrupted. “How about the one-way analysis of variance?”

“The one-way analysis of variance,” he replied, “allows the researcher to compare three or more independent groups. Remember what I said previously about the independent t-test. If you are interested in comparing more than two groups, you cannot use the independent t-test to achieve your goal; the one-way analysis of variance is the tool for this level of comparison. For example, you can use the one-way analysis of variance to find out whether the type of energy source an automobile relies on would affect the level of air pollution in the environment.”

“This is all exciting,” she interjected.

Excited and inspired by Chercheur’s explanations, Energia decided to spend the rest of that day applying the independent t-test, the paired-samples t-test, and the one-way analysis of variance to analyze the data she had collected in Sun City, Wind City, Choice City, and Water City.

She resolved to compare the monthly direct and well-to-wheel carbon emissions of gasoline-powered vehicles to the monthly carbon emissions of electric vehicles. She defined direct vehicle carbon emission as the carbon a vehicle would produce through its tailpipe and the carbon evaporating through fueling. She defined well-to-wheel emissions as the carbon from the vehicle using energy, as well as the carbon from the production, processing, and distribution of the vehicle and the energy it would use (whether it is electric energy or gasoline). In her perspective, the well-to-wheel emissions do not include the recycling process or disposal of the vehicle. Energia would include the disposal or recycling in another broader variable, one of the life cycle emissions of the vehicle. She perceived the life cycle emissions of a vehicle as encompassing the overall emissions relating to the production, the distribution, the operation, and the disposal or recycling of the vehicle and the energy it would use. She did not consider the variable of life cycle emissions in this analysis.

For the purposes of her study, Energia would use the independent t-test to analyze the data from a major automobile company in Choice City. The data set she relied on presented the city’s monthly averages of carbon emissions per vehicle and measured the carbon dioxide emissions in metric pounds for a sample of one thousand vehicles, including five hundred electric vehicles and five hundred gasoline-operating vehicles. The comparison revealed a significant difference between the mean emissions of the electric vehicles and the mean emissions of the vehicles running on gasoline.

The results indicated that the electric vehicles had no direct carbon emission during operation. The gasoline-powered vehicles had high direct carbon emissions through their tailpipes and through evaporating fuel. The results also showed that electric vehicles had lower well-to-wheel carbon emissions compared to gasoline-powered vehicles. The relatively low carbon emissions associated with electric vehicles came not from their tailpipes, but from the electric power plants and the process of production and distribution of the electricity they used.

Energia would also use the independent t-test for testing how the male and female citizens of Sun City would agree with an increase in the city budget for solar energy. She randomly selected a sample of one hundred citizens of Sun City, including fifty males and fifty females. She would ask them to express their level of agreement with the increase on a scale of one (1) to four (4), with:

1 meaning strongly disagree,

2 meaning disagree,

3 meaning agree,

4 meaning strongly agree.

The results of the analysis showed a major difference between the females and males. The mean for the female citizens was greater than the mean for the male citizens. The support for increased spending for solar energy was higher among the female than the male citizens of Sun City.
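The comparison of two group means can be sketched as follows. This is a hedged illustration, not Energia's computation: the scores are hypothetical, the function name `independent_t` is invented, and the sketch uses the classic pooled-variance form of the test, which assumes the two groups have similar variances. It also follows the chapter in treating the agreement scale as numeric.

```python
import math
import statistics


def independent_t(group_a, group_b):
    """Return (t, degrees of freedom) for a pooled-variance independent t-test."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_error, df


# Hypothetical agreement scores on the 1-to-4 scale (ten respondents per group).
females = [4, 3, 4, 3, 4, 3, 4, 4, 3, 4]
males = [2, 3, 2, 1, 3, 2, 2, 3, 1, 2]

t_stat, df = independent_t(females, males)
print(f"female mean = {statistics.mean(females)}, male mean = {statistics.mean(males)}")
print(f"t = {t_stat:.2f}, df = {df}")
```

A positive t here means the first group's mean is higher. Deciding whether the difference is significant still requires a p-value or a critical value, which a package such as SciPy (`scipy.stats.ttest_ind`) would supply.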

She decided subsequently to examine whether first-time male and female drivers in Energium would differ in their inclination to drive electric cars or gasoline-powered cars. She randomly selected fifty first-time female drivers and fifty first-time male drivers from a population of Generation Z drivers in Energium. She would ask all one hundred drivers to choose between driving an electric vehicle and a vehicle running on gasoline. She would use the independent t-test to compare the means of the two groups. The analysis revealed a significant difference between the females and males. The mean for the female drivers was greater than the mean for the male drivers. Per the mean values, significantly more female drivers than male drivers preferred driving electric cars.

Obviously satisfied with the results of her t-test analyses, Energia would extend her analytical efforts, by applying the techniques of the one-way analysis of variance (ANOVA) to find out the differences in the levels of air pollution by electric vehicles, hybrid vehicles, and vehicles operating solely on gasoline. In that perspective, she would rely on the same data she had previously used, from the automobile company in Choice City. She defined the level of air pollution as the amount of greenhouse gases and other air pollutants (measured in metric tons) vehicles released in the natural environment annually. She would rank the levels of air pollution as low (coded as 1), medium (coded as 2), and high (coded as 3). The electric vehicles relied on electric power one hundred percent. The hybrid vehicles had both an electric mode and a combustion mode. The gasoline-powered vehicles ran exclusively on gasoline.

She performed the analysis on one hundred and fifty vehicles, including fifty electric vehicles, fifty hybrid vehicles, and fifty gasoline-powered vehicles. The mean values for the three groups of vehicles indicated that the vehicles running on gasoline had the highest emissions of carbon dioxide, methane, nitrous oxide, and other air pollutants such as nitrogen oxides. The hybrid vehicles followed with a medium level of emissions of carbon dioxide, methane, and nitrogen oxides. The electric vehicles came last with the lowest levels. The results of the analysis showed that the type of energy source a vehicle used had a significant impact on the level of air pollution, meaning that the vehicles running solely on gasoline polluted the air more than the hybrid vehicles, and far more than the electric vehicles. The analysis also indicated that the electric vehicles running on electricity from wind or solar energy recorded lower well-to-wheel or life cycle emissions, when compared to the electric vehicles running on electricity from coal and natural gas. On a side note, the analysis signaled that electric vehicles were quieter than vehicles operating on petrol or diesel, which offers a good remedy for noise pollution. In light of those results, she concluded that the electric vehicles were good for a clean environment and public health.
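A one-way analysis of variance of this kind can be sketched as below. The F statistic compares the variation between the group means with the variation inside the groups. The emission figures are hypothetical (five vehicles per group rather than fifty), and the function name `one_way_anova` is an invention for the example.

```python
import statistics


def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    means = [statistics.mean(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical annual emissions per vehicle, in metric tons.
electric = [1.8, 2.1, 2.0, 1.9, 2.2]
hybrid = [3.0, 3.4, 3.1, 2.9, 3.6]
gasoline = [4.8, 5.1, 4.6, 5.3, 5.2]

f_stat, df_between, df_within = one_way_anova(electric, hybrid, gasoline)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F suggests that at least one group mean differs from the others; it does not say which one. The p-value would come from an F table or from a package such as SciPy (`scipy.stats.f_oneway`).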

At the end of that day, Energia happily presented the results of the different analyses to her advisor for his feedback.

It was late, and Chercheur advised her to go rest.

She gave a deep groan, left reluctantly, and went back home.

The same was not true of her advisor. He withdrew to his office quietly, and spent some time reviewing the results of Energia’s data analysis. As he read through the analysis, feasting his eyes on the findings, his phone rang.

It was his wife, Luz.

“The children and I are waiting. I hope you are joining us for dinner today?” she said desperately.

Apparently surprised, Chercheur looked up. It was eight o’clock; he was due back home at six o’clock for dinner with his family.

“I am very sorry, Sweetheart!” he said to his wife. “It is my mistake, I am leaving my office right now.”

“You mean you are still in your office at this time,” interrupted Luz. “I hope everything is fine.”

“I got hooked on reviewing the results of Energia’s data analyses, and the time had slipped by quickly, it is my mistake,” he replied remorsefully.

“It happens; setting your alarm as a reminder could help you regulate your passion for research, for a balanced lifestyle.”

Chercheur got the message, and on his way out, he set his alarm right away for the following day.

Back at his house, he rushed through the entrance, and before Luz and their two children could welcome him back in, he said to them in a soft voice, “My dear wife, son, and daughter, I know I owe you all a thousand apologies. I am very sorry.”

His daughter Mira replied teasingly, “Let this be a warning for you, Mr. Researcher! Be punctual in the future, or you will get a fine, a probation, or a sentence.”

Her joke made their entire family giggle.

“No,” Chercheur interrupted with an emotional plea, “I may be able to handle a fine with your mother’s assistance, but I do not want a sentence, and I cannot afford a probation away from my lovely and loving family.”

“Be reassured I have learned from my mistake, and I promise sincerely to make up for this,” he added apologetically.

In the blink of an eye, Luz gave her husband an affectionate hug and covered his face with kisses. Mira and her brother Stellus followed suit, and the family moved to the dining room to enjoy a late dinner peacefully.

Chercheur was very grateful to Luz for her understanding and helpful tip.

Early in the morning, the next day, apparently satisfied with the results of the data analyses, he gave Energia high marks, but he added,

“Though your analysis reveals that the electric vehicles release fewer nitrogen oxides and fewer greenhouse gases such as carbon dioxide and methane, it also sounds the alarm on the roots or sources of electricity. Depending on their electricity sources, electric vehicles could have a greater acidification effect on the environment than gasoline vehicles; their production processes and the manufacture of their batteries demand greater quantities of toxic minerals, including copper, nickel, and aluminum. This red flag reminds you that electric vehicles do not represent a perfect recipe for environmental health. So, don’t be fooled; be on your guard against any glamorous lure!

“Your analysis also seemed to shed light on another significant difference. Driving electric vehicles in geographic locations of Choice City that relied heavily on non-renewable energy sources, such as the fossil fuels, to generate electricity did not result in easily detectable and quantifiable direct emissions of greenhouse gases. Meanwhile, when driven in the neighborhood running on renewable energy, the electric vehicles clearly recorded zero direct carbon emission.”

He quickly followed up with positive comments when he said to Energia, “You show proficiency not only in data collection but also in data analysis. This illustrates how you develop your research skills by applying the research strategies you learn. Well done! You make considerable progress, and you earn full credit for it. I know you can do better by going even further.”

“I appreciate your compliments,” she replied tactfully. “I can go further only with your help.”

“At your service!” he exclaimed. “I am glad I can help.”

“How may I assist you?” he asked subsequently.

She giggled and said jokingly, “I would appreciate it if Your Highness could now lead me through the other analytical tools you had mentioned previously.”

“Could you be more specific?” he requested.

“I need your help to understand correlation analysis, regression analysis, time-series analysis, and factor analysis,” Energia responded.

Chercheur said, “Well! I understand correlation analysis as a test for a possible relationship between two variables at a minimum, by examining the magnitude of that relationship and its direction as positive or negative. You could perform a correlation analysis to determine the extent and direction of the relationship between the two variables of energy source and air quality.”

“How can I do that in practice?” she enquired, perplexed.

He looked at her to reassure her and said, “Using a computer program designed for data analysis, you can analyze statistically a pair of data on the two variables of energy source and air quality to find out if there is any relationship between the two and to determine the magnitude and direction of that relationship.”

“What would indicate that the variables in the analysis correlate?” she wondered nervously.

He smiled to appease her and said, “Do not worry! By convention, we call the number that expresses the relationship between your variables a correlation coefficient. The correlation analysis will display the correlation coefficient as a number. That number may vary from -1 to +1.”

“How do I interpret the negative one (-1) and the positive one (+1)?” she persisted.

He looked at her again and replied assuredly, “You will interpret -1 or +1 as a perfect relationship between the two variables (with respect to the magnitude of that relationship). The closer the absolute value of the correlation coefficient is to 1, the stronger the relationship between the variables. Whether that relationship is statistically significant is a separate question, which the p-value of the test answers. As a conventional rule or principle, the level of significance is usually set at 0.05 in social research, so a p-value less than or equal to 0.05 counts as significant.”

“What if the correlation coefficient stands at 0?” she continued insistently.

“That is a great question,” he replied intently. “When the correlation coefficient is 0, this indicates the absence of a linear relationship between the variables.”

Energia nodded and said, “How about the direction of that relationship? How do I determine if a relationship is positive or negative?”

Chercheur said, “The positive relationship means that the variables under study go in the same direction (the variables go together either east, or west, or south, or north, or up, or down, or high, or low). The correlation coefficient here shows a positive sign (as in + 0.513 or 0.513). An example could be the positive relationship between high blood pressure and high risk of heart attack, or the positive correlation between increase in stress and high cortisol levels.

“The negative relationship implies that the variables go in different or opposite directions (when a variable moves east, the other one heads west; when one heads north, another one goes south; when one goes up or high, the other one heads down or low). The correlation coefficient shows a negative sign (as in - 0.479). An example here could be the negative correlation between cholesterol level and quality of health, or between liver enzyme levels and life expectancy. In each case there is a relationship between the respective variables, but that relationship is negative.

“In the case of your study, if the energy source records high scores, while the air quality records high scores, this indicates a positive relationship. When both the energy source and the air quality register low scores consistently, this also means a positive relationship between the two variables.

“However, when the energy source scores high, while the air quality scores low, this implies a negative relationship between the two variables. If the low scores of the energy source contrast the high scores of the air quality, this also means there is a negative relationship between the two variables.”

“In the social sciences, we use different types of correlation coefficients,” he went on, without breaking his train of thought. “The commonly used correlation coefficients include the Pearson correlation coefficient and the Spearman correlation coefficient. We use the Pearson correlation coefficient for scale (that is interval or ratio) variables such as age, income, weight, height, temperature, and other variables of the same kind. The Spearman correlation coefficient is for ordinal or ranking variables such as social class (as lower, middle, or upper class), rank of basketball team (as fourth, third, second, or first team), size of T-shirt (as small, medium, large, or extra-large size), the intensity of heat or pain (as low, medium, or high), and other similar variables. We use the lowercase r to symbolize the Pearson correlation coefficient, and the symbol rho to represent the Spearman correlation coefficient.”
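The two coefficients can be sketched in plain Python. The Pearson coefficient is computed from deviations around the means; the Spearman coefficient is obtained here by applying the same formula to the ranks of the values, with tied values sharing their average rank. All data and function names below are hypothetical illustrations.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman coefficient: Pearson correlation of the ranked values."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical scale variables: mileage (thousands of km) and emissions.
mileage = [10, 20, 30, 40, 50]
emissions = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(mileage, emissions)

# Hypothetical ordinal variables coded 1 = low/small, 2 = medium, 3 = high/large.
panel_size = [1, 1, 2, 2, 3, 3]
output_level = [1, 2, 2, 3, 3, 3]
rho = spearman_rho(panel_size, output_level)

print(f"r = {r:.3f}, rho = {rho:.3f}")
```

Both values fall between -1 and +1, and their signs give the direction of the relationship. Recent Python versions also offer `statistics.correlation` for the Pearson coefficient.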

Staring at Chercheur all this time, Energia exclaimed, “How about some illustrations!”

“Good for you,” he answered. “Let’s suppose there is a correlation between the two variables of vehicle mileage and rate of greenhouse gas emissions. If the strength of the correlation shows up as 0.645, it means the value of r is 0.645. Notice we use the lowercase r to indicate the correlation coefficient, because we are in the presence of two scale variables. Meanwhile, if there is a correlation between the two variables of size of solar panel and level of electricity production, we will express the correlation coefficient with the symbol rho, because the two variables in question are ranking variables. If the strength of the correlation is 0.723, it means the value of rho is 0.723. In both cases here, r and rho indicate positive correlations; whether they are statistically significant depends on the p-value, which reflects the sample size as well as the strength of the correlation.”

After listening carefully to his magisterial or masterful explanation, Energia uttered her gratitude to Chercheur. “Thanks to you,” she said. “I now understand correlation analysis as a valuable test for social research.”

“What about regression analysis?” she continued enthusiastically.

Following a deep breath, he said convincingly, “Regression analysis is another test of the relationship between two or more variables. Unlike correlation analysis, which focuses on the magnitude and direction of the relationship, regression analysis uses the relationship for prediction. For instance, if you know the scores of the energy source, regression analysis allows you to predict the scores of the air quality.

“Keep in mind there are various forms of regression analyses. Most commonly, you will hear about linear regression and multiple regression. A linear regression implies the possibility of representing the relationship between the two variables by a straight line; you could do this by using graphical techniques or statistical plots. A multiple regression allows you to predict a variable simply from knowing a series of other correlating variables. For example, you could use multiple regression to analyze how the variable of environmental health depends on many other predicting variables.”
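The straight-line case can be sketched with ordinary least squares, one common way of fitting such a line. The scores are hypothetical: the energy-source variable is imagined here as the percentage of electricity drawn from renewable sources, and air quality as a score out of one hundred. The function name `fit_line` is also an invention for the example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / ss_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical scores: renewable share of electricity (%) and air-quality score.
energy_source = [10, 20, 30, 40, 50]
air_quality = [32, 41, 49, 62, 71]

slope, intercept = fit_line(energy_source, air_quality)

# Use the fitted line to predict air quality for a renewable share of 60 percent.
predicted = intercept + slope * 60
print(f"air_quality = {intercept:.2f} + {slope:.2f} * energy_source")
print(f"predicted air quality at 60%: {predicted:.1f}")
```

A multiple regression extends the same idea to several predicting variables at once and is usually left to a statistics package. Predictions far outside the range of the observed scores should be treated with caution.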

With her eyes fixed on her advisor, Energia said, “It seems to me that regression analysis implies correlation analysis to some degree; prediction presupposes there is a relationship between the variables. I hope my perspective is accurate.”

“I understand your point,” Chercheur responded. “Correlation analysis is a prerequisite to regression analysis. Regression analysis inherently encompasses correlation analysis somehow.”

“How about time-series analysis?” she asked.

“I perceive time-series analysis,” he answered, “as a test for changes in a variable over time. You could use time-series analysis to examine the changes in energy source over time in human history and test the explanations or justifications for the trends. You can do this by relying on the data you recorded in your personal journals. In practice, regression analysis is helpful in time-series analysis.”
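One simple way to look at changes in a variable over time is to smooth the yearly scores with a moving average and compare the start of the series with its end. The sketch below does that in Python. The function name and the yearly figures are hypothetical, and a moving average is only one of several possible time-series techniques, not necessarily the one Chercheur has in mind.

```python
def moving_average(series, window):
    """Return the simple moving average of `series` over `window` points."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical yearly share of renewable energy (in percent), oldest year first.
renewable_share = [60, 58, 55, 50, 52, 45, 40, 42, 35, 30]

smoothed = moving_average(renewable_share, window=3)

# Comparing the first and last smoothed values gives the direction of the trend.
trend_is_downward = smoothed[-1] < smoothed[0]
print(smoothed, trend_is_downward)
```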

“That is very interesting,” she said. “What about factor analysis?”

“Factor analysis,” he replied, “is a test that reduces a large number of highly correlated predicting variables to a small number of independently representative groups or factors in order to explain the outcome variable.”

After uttering a deep groan, she said, “That is difficult for me to understand, please illustrate!”

“Of course, I will,” he responded. “Imagine that a large number of variables contribute to air pollution. Let’s suppose those variables include emissions from cars, emissions from airplanes, industrial emissions, plastic disposals, poor management of garbage, wildfires, pesticides, farming, chemical wastes from pharmaceutical laboratories and hospitals, chemical leaks or spills, fracking or hydraulic fracturing and oil extraction and production processes, gas stations, burning gasoline, oil spills, coal plants, nuclear plants, wars, atomic explosions, and many other variables. Suppose we have as many as fifty variables contributing to air pollution. That is a large number of variables. When you look at these variables closely, you realize that some of them are redundant, or they correlate highly. For example, the three variables of emissions from cars, emissions from airplanes, and burning gasoline are redundant, because they boil down to greenhouse gas emissions. Greenhouse gases become the common denominator, a factor hosting multiple redundant variables. Factor analysis is a method that allows you to group such redundant variables under the umbrellas of common denominators or factors. So, in the case of your research, instead of having up to fifty variables, we could reduce them to only five factors, with each respective factor regrouping ten redundant variables.”

“How do I know the weight of each respective variable in a factor?” she questioned.

“The computer output of a factor analysis generates numbers or coefficients that express how each variable relates to its factor; we call them the factor loadings,” he answered promptly. “Also be mindful of what we call the eigenvalue, the amount of the total change across all the variables that an extracted factor accounts for. As a rule of thumb, in factor analysis the extracted factors displaying eigenvalues of 1 or higher are retained as significant; the analysis would disregard factors displaying eigenvalues less than 1. When the eigenvalue is higher than 1, it means that the extracted factor explains more of the change than any single variable contributes on its own.”

“I anticipate you may struggle to understand what I mean by eigenvalues, but do not worry, you will understand it better as you apply and practice factor analysis,” he continued.

“I am afraid I am confused about the eigenvalues,” she said with annoyance; “neither do I understand what you mean by factor loadings.”

He pondered and said, “The factor loadings indicate the strength of the correlations between each variable and its factor.”
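To make factor loadings and eigenvalues more concrete, here is a minimal sketch in Python that extracts the first principal component from a small correlation matrix by power iteration. The three-variable matrix and the function name are assumptions for illustration only, and power iteration is a stand-in technique chosen for its simplicity; it is not a description of how SPSS computes its output.

```python
from math import sqrt

# Hypothetical correlation matrix for three highly correlated variables
# (for instance, three kinds of emissions).
R = [
    [1.0, 0.8, 0.7],
    [0.8, 1.0, 0.6],
    [0.7, 0.6, 1.0],
]


def first_component(matrix, iterations=200):
    """Largest eigenvalue and its unit-length eigenvector, by power iteration."""
    n = len(matrix)
    v = [1.0 / sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    rv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * rv[i] for i in range(n))
    return eigenvalue, v


eigenvalue, vector = first_component(R)

# Each loading is the correlation between one variable and the factor.
loadings = [sqrt(eigenvalue) * x for x in vector]

# The eigenvalue equals the sum of the squared loadings on the factor.
sum_of_squared_loadings = sum(l * l for l in loadings)
print(round(eigenvalue, 3), [round(l, 3) for l in loadings])
```

In this made-up case a single factor has an eigenvalue well above 1, so it carries more of the change than any one of the three variables would on its own, and all three variables load highly on it.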

“I hope this is helpful,” he added instantly.

After Energia had heard his clarification, she was amazed, and she exclaimed, “Factor analysis is quite sophisticated and impressive!”

“Yes, it is indeed,” he agreed enthusiastically.

After spending several hours listening to her advisor’s perspectives on correlation analysis, regression analysis, time-series analysis, and factor analysis, Energia rushed across into a quiet computer lab.

Puzzled by her move, Chercheur got worried. “What explains that rush?” he asked, uttering concerns about her. “I hope everything is fine.”

“There is nothing to worry about, I am fine and I have everything under control,” she answered metaphorically.

She went into the computer lab carrying a set of data she had obtained from Mr. Sun’s office in Sun City. She also carried data she had collected from Choice City and Wind City. This made Chercheur guess she intended to analyze some data, and he was not wrong.

Inside the computer lab, Energia sat down comfortably before a computer located in a remote corner of the room. She opened a computer program called SPSS (which means Statistical Package for the Social Sciences) to perform a battery of tests on her data.

Chercheur had introduced her to SPSS as a standard integrated system of computer tests for data analysis.

She was fond of SPSS for a few specific reasons. She appreciated the impressive number of tests it offered for statistical analysis, including t-test for independent groups, paired-samples t-test, one-way analysis of variance, correlation analysis, regression analysis, factor analysis, and many other tests. She also enjoyed its flexible data format. She found it helpful how SPSS provided its users with the options of a Windows method and a syntax method. The Windows method would take its users through the clicks of windows and dialog boxes to select the tests and variables relevant to the analysis. The syntax method would require its users to learn and demonstrate proficiency in the language of formulas used to implement statistical analyses.

Though Chercheur had warned Energia that both methods presented some advantages and disadvantages, she felt more comfortable with the Windows method, because she found it easy to use, following much practice or lab trials, and after mulling over her choice.

“The Windows method feels like a fun game that takes you from one click to another one, you just need to know the next step and its meaning,” she once confided to her advisor.

Concentrating on her data in front of the computer, Energia identified three variables of interest for her analysis. The three variables were solar energy use, air quality, and water quality.

She set out to examine the correlation between solar energy use and air quality in Sun City by analyzing the data she had obtained from the office of the Mayor of Sun City, Mr. Sun.

As she prepared to set up a data file using the Windows method in SPSS, she created a codebook with the names of her variables and their coding descriptions on a sheet of scratch paper. She would use that information to define her variables in the Variable View icon of SPSS, by specifying their names, their labels, their values (if necessary), and their levels of measurement as nominal, ordinal, or scale.

After defining her variables, she clicked on the Data View icon of SPSS to enter her data.

Next, she would test the basic assumptions of linearity and homoscedasticity to make sure correlation analysis would be an appropriate test in this case. The assumption of linearity would imply that a straight line could represent the relationship between the two variables of solar energy use and air quality. The assumption of homoscedasticity would mean that the spread of the scores of air quality stayed roughly the same across the scores of solar energy use.

To that end, from the menu bar of her SPSS, Energia clicked on Graphs, then Scatter/Dot, and she selected the Simple Scatter icon. Following SPSS, step by step, she clicked on the Define icon to open the Simple Scatterplot window. In the open window, she transferred the variable of solar energy use to the Y axis and the variable of air quality to the X axis. She would also click on the Options icon in the open window to select the option of excluding cases listwise to handle missing values.

She would click on the Continue icon to go back to the Simple Scatterplot window. There, she clicked on the OK icon to run the analysis of testing the assumptions on linearity and homoscedasticity.

The results of the analysis presented a scatterplot that confirmed a linear relationship between the variables of solar energy use and air quality. As the scores of solar energy use increased, so did the scores of air quality. The scatterplot obtained also showed the assumption of homoscedasticity was met. Changes in the scores of air quality remained relatively constant from one score to another score of solar energy use.
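Energia judged both assumptions by eye from the scatterplot. As a crude numeric companion to that visual check, the sketch below fits a straight line and compares the spread of the residuals in the lower and upper halves of the predictor. The function names, the data, and the idea of using a simple ratio are all assumptions for illustration; this is not an SPSS procedure or a formal test.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y predicted from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_square(values):
    """Average of the squared values."""
    return sum(v * v for v in values) / len(values)


def residual_spread_ratio(x, y):
    """Residual spread in the upper half of x divided by that in the lower half.

    A ratio near 1 is consistent with homoscedasticity; a ratio far from 1
    hints that the spread changes along x.
    """
    slope, intercept = fit_line(x, y)
    pairs = sorted(zip(x, y))
    residuals = [b - (intercept + slope * a) for a, b in pairs]
    half = len(residuals) // 2
    return mean_square(residuals[-half:]) / mean_square(residuals[:half])


# Hypothetical scores of the predictor.
x = list(range(1, 11))

# Even scatter: the same amount of noise at every level of x.
even = [2 * a + (0.5 if a % 2 else -0.5) for a in x]

# Fan-shaped scatter: the noise grows as x grows.
fan = [2 * a + 0.1 * a * (1 if a % 2 else -1) for a in x]

even_ratio = residual_spread_ratio(x, even)
fan_ratio = residual_spread_ratio(x, fan)
print(round(even_ratio, 2), round(fan_ratio, 2))
```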

Happy with the outcome of that first step, Energia felt confident to run a correlation analysis.

From the menu bar, she would click on the Analyze icon. From the long list of tests, she would select Correlate, then Bivariate Correlations. Once the window of Bivariate Correlations opened, she ensured the boxes of both the Pearson correlation coefficient and the two-tailed test of significance were checked. There, she transferred the two variables of solar energy use and air quality to the Variables field. She also clicked the Options icon to ensure the field of Exclude cases pairwise was checked for handling missing values. Next, she clicked on the Continue icon to return to the Bivariate Correlations window where she would click the OK icon to complete the analysis.

The results displayed in the SPSS output confirmed a positive and statistically significant correlation between solar energy use and air quality over a period of ten years in Sun City. The Pearson correlation coefficient (r) stood at 0.895. This meant that as the scores of solar energy use increased, so did the scores of air quality. The results suggested that the use of solar energy significantly contributed to reducing air pollution and improving the chances of good respiratory health in Sun City over the past ten years. The more consistently the city implemented its policies on the use of solar panels, the more its citizens enjoyed good air quality.

Thrilled by these results, Energia seized the same opportunity to examine the correlation between the two variables of solar energy use and water quality in Sun City. The results of her Spearman correlation analysis showed a positive and statistically significant correlation between solar energy use and water quality over a period of ten years in Sun City. The Spearman correlation coefficient (rho) was 0.853. This meant that as the scores of solar energy use increased, so did those of water quality. The results implied that the use of solar panels contributed to significant reductions in water pollution in Sun City over the past ten years.

These results reminded Energia of some happy encounters in Sun City. While she was there for data collection, she encountered citizens of Sun City who were proud of living in a city with drinking water free of lead. During her stay in Sun City there were no known risks of lead contamination. She also remembered some history in the legislature of the city. At some point in the history of Sun City, the city council, mindful of the potentials of their territory in oil and natural gas, had put in place solid legislation to forbid any practice of or inclination to fracking or hydrocracking in Sun City, to prevent the pollution of its surface and groundwater and the displacement of wildlife.

Using correlation analysis, she went on to examine the relationship between the two variables of hydrocarbon fuels and water pollution in Energium. She would find a positive and statistically significant correlation between those two variables. She uncovered that burning fossil fuels contributed to lead contamination, to some degree, in some neighborhoods of the kingdom of Energium where fracking and oil cracking were heavy practices.

Encouraged by the findings, and with her face lit up, Energia decided to take a step further in her analysis. Using regression analysis, she would examine how to predict the scores of air quality from knowing the scores of solar energy use by relying on the same data from Sun City.

Hooked on SPSS, she clicked on the Analyze icon from the menu bar. From the drop-down list, she clicked on Regression, and she selected Linear Regression.

In the Linear Regression window, she clicked on the air quality variable to transfer it to the Dependent field. She would also transfer the variable of solar energy use to the Independent field. In the drop-down list of the Method field, she selected Enter as the method of entry of solar energy use, the predicting variable.

She also clicked on the Statistics icon. The statistics window opened, and she checked the fields for Estimates, Confidence intervals, and Model fit to obtain the statistics required for that analysis.

Next, she clicked the Continue icon to return to the Linear Regression window. There, she would click on the Options icon to make sure that the fields of Use probability of F and Include constant in equation were checked.

Here again, she would click the Continue icon to go back to the Linear Regression window.

In the Linear Regression window, she clicked the OK icon to complete the analysis. The SPSS output displayed results showing the R-square (the strength of the regression or prediction) as 0.760. It meant that the variable of solar energy use had explained 76% of the change observed in the variable of air quality in Sun City over the past ten years.
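For readers who want to see what lies behind the R-square figure, the sketch below fits a simple linear regression by least squares in Python and computes R-square as the share of the change in the outcome that the prediction line accounts for. The function name and the five pairs of scores are hypothetical; they are not the Sun City data and will not reproduce the value in Energia's output.

```python
def fit_simple_regression(x, y):
    """Return slope, intercept, and R-square for y predicted from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    predicted = [intercept + slope * a for a in x]
    # Change left unexplained by the line, and total change in y.
    ss_residual = sum((b - p) ** 2 for b, p in zip(y, predicted))
    ss_total = sum((b - mean_y) ** 2 for b in y)
    r_squared = 1 - ss_residual / ss_total
    return slope, intercept, r_squared


# Hypothetical scores (not Energia's data).
solar_energy_use = [1, 2, 3, 4, 5]
air_quality = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r_squared = fit_simple_regression(solar_energy_use, air_quality)

# Predict the air quality score for a new, hypothetical solar energy use score.
predicted_air_quality = intercept + slope * 6
print(round(slope, 2), round(intercept, 2), round(r_squared, 3))
```

An R-square of 0.760 would be read the same way: the prediction line accounts for 76 percent of the change in the outcome variable.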

Energia could not believe her eyes; she thought those results were very significant. They confirmed that the clean air the citizens of Sun City enjoyed depended to a large extent, up to seventy-six percent, on solar energy use.

The results contrasted with and challenged policymaking in the kingdom of Energium, which relied heavily on hydrocarbon fuel for energy production and consumption.

Driven by the results, and aware of the potentials of Energium in offshore and onshore wind, Energia used regression analysis to predict that the exploitation of its wind potentials alone would allow the kingdom to achieve energy security maximally. If Energium were to depend mainly on its abundant wind resource, the kingdom would cut down its greenhouse gas emissions drastically, and this would result in improved air quality for its citizens. Per the results of Energia’s regression analysis, wind energy stood as a healthier alternative to oil, gas, and coal for power generation in the kingdom of Energium.

She was still contemplating the results of the regression analysis when Chercheur stopped by the computer lab on his way to the cafeteria. He wanted to check on Energia, out of curiosity, after so many hours; it was past lunchtime.

“You may want to take a break for lunch,” he advised unexpectedly.

“I do not think so, I still have a long way to go, Sir!” she answered, surprised by his sudden appearance in the lab.

Obviously, Energia was fond of her data analyses.

“But I would appreciate it if you could bring me a bottle of water with a light snack,” she went on gently.

“I would love to,” he answered politely, “unfortunately, the policies of this lab do not allow it.”

“What do you mean?” she asked, staring at her advisor in amazement.

“Look!” he exclaimed, pointing to a notice by the entrance of the lab. It read: “No food or drink is allowed in this lab!”

“Why such a harsh policy?” she asked in a plaintive tone and strain.

With a reassuring voice tone, Chercheur said, “I think the intention behind the policy is positive; it is to prevent distractions, so that the users of this lab concentrate on their research; food and drink could distract at times.”

“What if I am very hungry while using the lab?” she continued.

“In that case, your body whispers you need a lunch break,” he said softly. “When your body speaks, it is in your interest to listen; your health, productivity, effectiveness, and success all depend on it. You deserve a break after working hard all morning; you need to stop for a moment to go eat.”

Chercheur’s words inspired Energia; she understood she needed a break, and she said to him, “I agree with you.”

Any observer could tell by Energia’s body language that it was a difficult decision for her; the bystander could easily observe that she took Chercheur’s advice seriously but not literally.

Yet she decided to take a break. She saved her document on a USB flash drive and shut down her computer, before exiting the lab reluctantly.

Dragging her feet, she followed her mentor to the cafeteria for what she had anticipated to be a quick lunch break.

Inside the cafeteria she quickly grabbed a plate and put on it some salad, mashed potatoes, and a salmon steak.

As she looked around to find a table to sit at, she noticed her high school friend, named Oxwe.

He had volunteered to partake in Energia’s experiments in Choice City. He was excited to meet Energia in the cafeteria.

“What is new about your research? Fill me in,” he said delightfully.

“I am currently analyzing the data,” she replied concisely.

“Good for you!” he continued ardently.

“Let me know if I could help, I still possess a residue of skills in quantitative analysis,” he added humbly.

Oxwe was the Chief Information Officer (CIO) for a major and famous multinational corporation, headquartered in the kingdom of Energium. In his capacity as a distinguished patron of data processing for the organization, he had a wealth of knowledge and experience in data analysis, and he was very proficient in running and interpreting quantitative data. He was a young, self-motivated, and successful chief information officer.

Energia’s initial answer to his request was, “That is awesome to hear and know.”

“Please tell me about it!” she exclaimed with admiration on second thought.

“For your type of research,” Oxwe said eagerly, “correlation analysis, regression analysis, and factor analysis would all be helpful tests to run your data.”

“You read my mind, that is exactly what I am doing,” she answered, mesmerized.

Before he could elaborate further, she interrupted, “I need to go now; it is nice to see you again, I will catch up with you soon, my friend.”

While she was still speaking, Oxwe rushed to go get some fruit salad and cake.

He presented them to his friend and said, “Remember to enjoy some fruit and dessert before you head back, my experience in data analysis has taught me some good lessons. Your brain will need these ingredients for a balanced sugar level; data analysis can be very consuming physically, mentally, and emotionally.”

Oxwe’s words reminded Energia of Chercheur’s most recent advice.

“Thank you, my good friend, for your kindness and wise suggestions,” she said calmly.

She took the plate of fruit salad and cake from Oxwe and sat down for a few more minutes to enjoy her food.

In the end, she felt really good about the decision to take a break for lunch, and she was very grateful to Chercheur, as she thought to herself: “It certainly helps to trust your advisor; sooner or later there are benefits to listening to and following a good mentor’s golden advice.”

Her lunch boosted her energy genuinely. The good hormones from the positive energy contaminated her often labile mood, and she suddenly felt relaxed and happy.

She felt refreshed and rejuvenated when she returned to the lab.

Back in the lab, she was eager to continue her data analysis by running time-series analysis and factor analysis on her raw data.

By relying on the data she had recorded in her journal, Energia would use time-series analysis to find that her ancestors’ generation had enjoyed cleaner air than her generation, because they had heavily depended on solar energy, wind energy, and water energy. The results of the analysis allowed her to forecast the future in terms of the long-term trends in renewable energy use. These results confirmed the previous results of her regression analysis that the rates of air pollution would decrease significantly in Energium if the kingdom shifted from hydrocarbon fuels to wind and solar energy production.

Soon after, Energia decided to utilize factor analysis to run the set of data she had collected previously in Water City, to identify the main factors contributing to global warming and climate change.

She had recorded as many as fifty variables to explain global warming and climate change. The long list of variables included gas emissions from cars, emissions from planes, industrial emissions, plastic removals, poor garbage dumping, wildfires, pesticides, agriculture, chemical wastes from pharmaceutical laboratories and hospitals, chemical leakages or spills, fracking and oil extraction and production processes, gas stations, burning gasoline, oil spills, coal plants, nuclear plants, wars, atomic explosions, deforestation, and non-human activities, to name a few.

She thought the list was way too long, and some of the variables seemed redundant. Rather than going by that long list of variables, she chose to follow a different line of reasoning, and she hypothesized that a small number of factors would explain global warming and climate change.

While reading through the long list of variables, using techniques of factor analysis, she would identify five common denominators or factors. The five factors were: (1) greenhouse gases, (2) other gases and hazardous substances such as mercury and arsenic, (3) deforestation, (4) other human activities, and (5) non-human activities such as naturally-occurring radioactivity. She reduced the long list by regrouping its variables in five groups of ten variables (around the five factors identified).

She would employ factor analysis to examine whether and how the fifty variables reflected the five factors she had identified. Step by step, she would apply the Windows method of SPSS for data entry and analysis.

From the menu bar, she clicked the Analyze icon. From the drop-down list, she selected the Data Reduction icon and chose the Factor Analysis test. The Factor Analysis window opened, and she highlighted the fifty variables and transferred them to the Variables field.

She clicked the Descriptives icon to acquire a correlation matrix with enough correlations to justify the use of factor analysis as a test in that situation.

In the open Descriptives window, she ensured the field of Initial solution was checked by default, and she also checked the fields of Coefficients and KMO and Bartlett’s test of sphericity.

She would next click the Continue icon to return to the Factor Analysis window.

In the open Factor Analysis window, she clicked the Extraction icon. When the Extraction window opened, she selected the Principal components from the drop-down list as the extraction method. She also ensured the default value was 1 in the field of Eigenvalues over. Next, she checked the Scree plot field to get a Scree plot of the number of factors extracted, and she clicked the Continue icon to go back to the Factor Analysis window.

In the open Factor Analysis window, she would click the Rotation icon. In the open Rotation window, she checked the Varimax field to request a Varimax rotation for the extracted factors. Then, she clicked the Continue icon to return to the Factor Analysis window.

In the open Factor Analysis window, she clicked the Options icon. In the open Options window, she selected the field of Exclude cases pairwise, so that a case with a missing value on a variable would be left out only of the correlations involving that variable. She also checked the Sorted by size field to rank the factor loadings from the largest to the smallest in the SPSS output. Additionally, she checked the field of Suppress absolute values less than and typed the value of 0.33 in the empty field to request the suppression of factor loadings with smaller values than 0.33 in the SPSS output. Her goal here was to retain, for significance, only factor loadings whose factor accounted for at least ten percent of the change in the variable (0.33 squared is about 0.11).

Next, she would click the Continue icon to go back to the Factor Analysis window. In the open Factor Analysis window, she clicked the OK icon to finalize the analysis.

The SPSS output presented the results of the factor analysis in the forms of a Correlation Matrix table, a KMO and Bartlett’s test table, a table of Communalities, a table of the Total Variance Explained, a Component Matrix table, a Rotated Component Matrix table, and a table of Component Transformation Matrix.

The Correlation Matrix table showed high correlations among the fifty variables. The inter-correlations between gas emissions from cars, emissions from planes, industrial emissions, wildfires, chemical leaks or spills, gas stations, burning gasoline, oil spills, and mining were higher than 0.33 in magnitude. It meant that the choice of factor analysis as a test was a good one for that case. The table of the Bartlett’s test of sphericity also confirmed factor analysis as the correct choice of test.

The Communalities table reflected the proportion of change the common factors accounted for in each variable, by using the principal components analysis as the method for extracting the factors.

The table of the Total Variance Explained showed five common factors, with their respective eigenvalues. It also displayed the percentage of total variance and the cumulative percentage of total variance each of the factors accounted for. The analysis retained the five factors (greenhouse gases, other gases and hazardous substances, deforestation, other human activities, and non-human activities) that displayed eigenvalues of 1 or higher. The factor of the greenhouse gases accounted for thirty-five percent of the global warming and climate change; the factor of the other gases explained global warming and climate change by nineteen percent; the factor of deforestation accounted for global warming and climate change by eighteen percent; the factor of other human activities explained global warming and climate change by fifteen percent; the factor of non-human activities accounted for global warming and climate change by ten percent. It meant that the five factors explained ninety-seven percent of the total variance, which is significant and impressive.
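The link between eigenvalues and percentages of total variance can be checked with a little arithmetic. The sketch below assumes the principal components were extracted from the correlation matrix, so that each of the fifty variables contributes one unit of variance and the total is fifty. The eigenvalues it prints are only implied by the reported percentages under that assumption; the story does not state them.

```python
n_variables = 50  # each standardized variable contributes 1 unit of variance

# Percentages of total variance reported for the five retained factors.
percent_explained = {
    "greenhouse gases": 35,
    "other gases and hazardous substances": 19,
    "deforestation": 18,
    "other human activities": 15,
    "non-human activities": 10,
}

# Eigenvalue implied by each percentage: percent of the total of 50 units.
implied_eigenvalues = {
    name: pct * n_variables / 100 for name, pct in percent_explained.items()
}

# Running total, as in the cumulative percentage column.
cumulative_percent = []
running_total = 0
for pct in percent_explained.values():
    running_total += pct
    cumulative_percent.append(running_total)

# Factors that pass the "eigenvalue of 1 or higher" rule.
retained = [name for name, ev in implied_eigenvalues.items() if ev >= 1]

for name, ev in implied_eigenvalues.items():
    print(f"{name}: eigenvalue {ev}")
print("cumulative percent:", cumulative_percent)
```

Under this assumption all five factors sit comfortably above the cutoff of 1, and the cumulative column ends at ninety-seven percent, matching the total in the narrative.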

The table of the Component Matrix presented the five factors extracted with specific correlation coefficients or factor loadings reflecting the correlations between the five factors and the fifty variables respectively. The factors the Component Matrix presented were not rotated, meaning their extraction was performed based on the percentage of the overall change (or total variance) explained. Energia noticed that the absence of rotation coincided with significant cross-loadings, showing some variables loaded highly on multiple factors, which would make it difficult to interpret the factors for meaning.

The table of the Rotated Component Matrix displayed the five factors extracted with Varimax rotation. There, the factor loadings indicated that forty of the fifty variables loaded highly on the five factors (greenhouse gases, other gases and hazardous substances, deforestation, other human activities, and non-human activities). Ten variables cross-loaded significantly across multiple factors.

To make the interpretation of the results of her factor analysis easier, Energia would delete the ten cross-loaded variables. In the end, the results confirmed her initial hypothesis that a small number of factors explained global warming and climate change. Making due allowances, she utilized factor analysis successfully to reduce a long list of fifty variables to identify five factors contributing to global warming and climate change. Per the results of her analysis, the five factors encompassed greenhouse gases, other gases and hazardous substances, deforestation, other human activities, and non-human activities.

Thrilled with the results and the line of reasoning, Energia was amazed by the technical ability of factor analysis to simplify or reduce data.

Mesmerized by these meaningful results, she was staring intently at the SPSS output on her computer when Chercheur suddenly appeared out of nowhere and asked, “How far are you?”

Her answer was spontaneous. “I just finished,” she said while jumping for joy, “and the results are quite significant.”

“Come and see,” she added, inviting her advisor to draw near the computer screen.

He checked the results carefully and exclaimed enthusiastically, “Scientific methods are interestingly beautiful and relevant to knowledge and progress!”

“What do you think of such magnificent results?” he went on softly.

“I still can’t believe my eyes, but I did it,” she said promptly.

“Congratulations!” he replied.

“What do the results tell you? What do they imply overall?” he added curiously.

“I think the message is clear: if we promote renewable energy use, we will be able to mitigate or curb greenhouse gas emissions and boost our chances to be healthy and save our planet,” she answered smoothly.

“Well said and well done,” Chercheur responded before exiting the room to head home for dinner with his family.

Energia was bidding him goodbye when the alarm on his phone sounded suddenly and quite loudly.

“Is everything fine?” she asked, worried.

“Do not worry, this is to remind me it is time to go home for dinner. If you want to know more about it, go check with my wife and daughter,” he said, trying to rush out.

Energia giggled and said while nodding, “I understand, Ms. Luz has the appropriate toolbox for good and helpful ground rules. You better hurry up!”

Not long after Chercheur’s departure, she rearranged her papers, saved her files, and shut down her computer before leaving. She was apparently happy and satisfied with what she had achieved in terms of her data analyses and the outstanding results.