
Case study–Python for non-graphic EDA

A team member wants to determine whether the viscosity of resin batches produced by Reactor A differs from that of batches produced by Reactor B. She collects viscosity data from 39 batches from each reactor. She first performs non-graphical EDA to examine the main characteristics of the dataset.

In [1]:

import numpy as np
import pandas as pd
from scipy import stats

In [2]:

# Load the viscosity data (one column of 39 batch values per reactor)
df = pd.read_excel('case moods medain.xlsx')

In [3]:

df.head()

Out[3]:

	Reactor A	Reactor B
0	1100	660
1	880	1430
2	1100	1045
3	1210	935
4	770	1210

In [4]:

df.info()

RangeIndex: 39 entries, 0 to 38
Data columns (total 2 columns):
Reactor A    39 non-null int64
Reactor B    39 non-null int64
dtypes: int64(2)
memory usage: 752.0 bytes
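The checks that `df.info()` summarizes (column types and non-null counts) can also be made explicitly. The sketch below assumes the workbook is not at hand, so it builds a small DataFrame from the five rows shown by `df.head()`; this is only a sample, not the full set of 39 batches, and the name `sample` is hypothetical.

```python
import pandas as pd

# Hypothetical sample: only the first five batches shown by df.head(),
# not the full 39-batch dataset
sample = pd.DataFrame({
    "Reactor A": [1100, 880, 1100, 1210, 770],
    "Reactor B": [660, 1430, 1045, 935, 1210],
})

# Per-column data type, as listed in the df.info() output
print(sample.dtypes)

# Missing values per column; df.info() reports the complementary non-null count
missing = sample.isna().sum()
print(missing)

# True when every column is stored with an integer dtype
all_int = all(pd.api.types.is_integer_dtype(t) for t in sample.dtypes)
print(all_int)
```

On the real `df`, the same calls would be expected to show zero missing values and integer columns, consistent with the `df.info()` output.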

In [5]:

df.describe()

Out[5]:

	Reactor A	Reactor B
count	39.000000	39.000000
mean	985.769231	965.384615
std	196.045170	215.640004
min	713.000000	603.000000
25%	824.000000	878.000000
50%	935.000000	935.000000
75%	1102.000000	1047.000000
max	1322.000000	1432.000000
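`df.describe()` reports the count, mean, sample standard deviation (ddof=1), minimum, quartiles, and maximum of each column. As a sketch of how those numbers are defined, the block below recomputes them by hand for Reactor A on the five-row sample from `df.head()` (not the full dataset, so the values differ from the table) and compares them with `describe()`. It assumes pandas' default quartile method (linear interpolation), which matches NumPy's default for `np.percentile`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the five rows shown by df.head()
sample = pd.DataFrame({
    "Reactor A": [1100, 880, 1100, 1210, 770],
    "Reactor B": [660, 1430, 1045, 935, 1210],
})
a = sample["Reactor A"].to_numpy()

# Manual versions of each statistic that describe() reports
manual = {
    "count": a.size,
    "mean": a.mean(),
    "std": a.std(ddof=1),          # sample standard deviation, as in describe()
    "min": a.min(),
    "25%": np.percentile(a, 25),   # linear interpolation between order statistics
    "50%": np.percentile(a, 50),   # the median
    "75%": np.percentile(a, 75),
    "max": a.max(),
}

summary = sample["Reactor A"].describe()
for stat, value in manual.items():
    print(f"{stat:>5}: manual={value:10.4f}  describe={summary[stat]:10.4f}")
```

The 50% row of `describe()` is the median, which is why the table can be used to compare both means and medians.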

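The dataset has two columns, each with 39 data points and no missing values. Viscosity is a continuous variable, although every value is recorded as an integer (dtype int64). The two reactors have the same median (935), and their means differ by only about 20 (985.8 for Reactor A versus 965.4 for Reactor B), which is small compared with their standard deviations of about 196 and 216.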
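The comparison of means and medians can be made concrete with simple arithmetic on the `df.describe()` output. The sketch below copies those summary statistics (it does not recompute them from the raw data) and expresses the mean difference as a fraction of each reactor's standard deviation. This ratio is just one simple, illustrative way to judge the size of a difference; it is a descriptive check, not a significance test.

```python
# Summary statistics copied from the df.describe() output
stats_a = {"mean": 985.769231, "std": 196.045170, "median": 935.0}
stats_b = {"mean": 965.384615, "std": 215.640004, "median": 935.0}

mean_diff = stats_a["mean"] - stats_b["mean"]
median_diff = stats_a["median"] - stats_b["median"]

print(f"Difference in means:   {mean_diff:.2f}")
print(f"Difference in medians: {median_diff:.2f}")

# Mean difference relative to the spread of each reactor's data
print(f"Mean difference / std of A: {mean_diff / stats_a['std']:.3f}")
print(f"Mean difference / std of B: {mean_diff / stats_b['std']:.3f}")
```

Both ratios come out at roughly 0.1, so the means are about a tenth of a standard deviation apart, while the medians are identical.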