Case study–Python for non-graphic EDA
A team member wants to determine whether the viscosity of resin batches produced by Reactor A differs from that of batches produced by Reactor B. She collects viscosity data from 39 batches from each reactor and begins with non-graphic EDA to examine the main characteristics of the dataset.
In [1]:
import numpy as np
import pandas as pd
from scipy import stats
In [2]:
df = pd.read_excel('case moods medain.xlsx')
In [3]:
df.head()
Out[3]:
   Reactor A  Reactor B
0       1100        660
1        880       1430
2       1100       1045
3       1210        935
4        770       1210
In [4]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 39 entries, 0 to 38
Data columns (total 2 columns):
Reactor A    39 non-null int64
Reactor B    39 non-null int64
dtypes: int64(2)
memory usage: 752.0 bytes
In [4]:
df.describe()
Out[4]:
         Reactor A    Reactor B
count    39.000000    39.000000
mean    985.769231   965.384615
std     196.045170   215.640004
min     713.000000   603.000000
25%     824.000000   878.000000
50%     935.000000   935.000000
75%    1102.000000  1047.000000
max    1322.000000  1432.000000
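The dataset has two columns, each with 39 non-null data points. Viscosity is a continuous variable, although every value here is recorded as an integer (int64). The two reactors have the same median (935), and their means differ by only about 20 (985.77 for Reactor A versus 965.38 for Reactor B), which is roughly a tenth of either standard deviation (196.05 and 215.64). Reactor B has the wider range (603 to 1432, against 713 to 1322 for Reactor A).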
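One way to put this kind of comparison on a numeric footing is to express the gap between the two means, and between the two medians, relative to the spread of the data. The block below is a minimal sketch of that idea, not the author's method. It assumes a hypothetical helper named compare_columns and, because the Excel file is not reproduced here, uses only the five rows shown in the df.head() output, so its numbers will not match the 39-batch summary from df.describe().

```python
import pandas as pd

# Assumption: only the first five rows from the df.head() output are used here;
# this is NOT the full 39-batch dataset.
sample = pd.DataFrame({
    "Reactor A": [1100, 880, 1100, 1210, 770],
    "Reactor B": [660, 1430, 1045, 935, 1210],
})


def compare_columns(df, a, b):
    """Hypothetical helper: gap in mean and median between columns a and b,
    each also scaled by the larger of the two sample standard deviations."""
    desc = df[[a, b]].describe()
    mean_gap = desc.loc["mean", a] - desc.loc["mean", b]
    median_gap = desc.loc["50%", a] - desc.loc["50%", b]
    spread = max(desc.loc["std", a], desc.loc["std", b])
    return pd.Series({
        "mean_gap": mean_gap,
        "median_gap": median_gap,
        "mean_gap_in_std": mean_gap / spread,
        "median_gap_in_std": median_gap / spread,
    })


result = compare_columns(sample, "Reactor A", "Reactor B")
print(result)
```

With the full dataset loaded as df, the same call would be compare_columns(df, "Reactor A", "Reactor B"). A gap that is small compared with the standard deviation suggests the two reactors are similar in location, but only a formal test can say whether the difference is statistically significant.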