COURSE PROJECT PART A
Introduction
Exploratory Data Analysis (EDA) is a crucial step in the data analysis process. It helps
data scientists gain insight into the characteristics of the data before applying formal
statistical techniques or building models. The purpose of EDA is to examine a data set in
order to understand the main patterns, relationships, and characteristics among its
variables.
Interpretation of different variables
This section discusses the variables in the data set, their interpretation, and their
graphical representation. The variables are the Sales, Calls, Time, and Years columns. The
summary statistics for each column are listed below, starting with the Sales column (Y).
1. Sales Column (Y)
Mean: 43.63
Median: 42.5
Mode: 41
Standard Deviation: 7.43952725
2. Calls Column (X1)
Mean: 160.33
Median: 160.5
Mode: 148
Standard Deviation: 19.26427
3. Time Column (X2)
Mean: 14.947
Median: 14.75
Mode: 15.6
Standard Deviation: 2.317368
4. Year Column (X3)
Mean: 2.03
Median: 2
Mode: 2
Standard Deviation: 1.2428
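The four statistics reported for each column can be reproduced with a few lines of Python. The following is a minimal sketch using only the standard library. The helper name `describe` and the list `sample_sales` are hypothetical and used for illustration only; the values are not the project's data. The sketch also assumes the sample standard deviation (dividing by n - 1), since the source does not say which version was used.

```python
import statistics


def describe(values):
    """Return the mean, median, mode, and sample standard deviation of a list."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "stdev": statistics.stdev(values),
    }


# Hypothetical values for illustration only; not the project's data set.
sample_sales = [41, 38, 45, 41, 50, 37, 44]

summary = describe(sample_sales)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

If the population standard deviation is wanted instead, `statistics.pstdev` can be substituted for `statistics.stdev`.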
The analysis above reports the measures of central tendency and spread for each column.
The mean is the average value of the data, while the median is the middle value when the
data are sorted in ascending order. The mode is the value that appears most frequently in
the data set. In each column the median is very close to the mean, which suggests that the
distribution is roughly symmetric and may be approximately normal. However, the
relationship between the mean and the median does not by itself determine the shape of the
distribution, so further checks are needed. A proper Exploratory Data Analysis therefore
proceeds in several steps. First, calculate the measures of central tendency (mean, median,
and mode) to understand the typical, or central, value of the data set. Next, assess the
data distribution by plotting a histogram to check whether the data follow a normal
distribution or some other distribution. Then examine the spread, or variability, of the
data using the standard deviation. After that, visualize the data with appropriate
visualization techniques to gain insight into patterns, characteristics, and outliers.
Finally, where relevant, explore the relationships between variables by investigating
correlations or dependencies among the variables in the data set.
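Two of the steps above, checking the shape of the distribution and measuring the relationship between variables, can be sketched as follows. This is a rough illustration using only the standard library. The function names `histogram_counts` and `pearson_r`, the bin width of 5, and both sample lists are assumptions made for the example and are not taken from the project's data.

```python
import math


def histogram_counts(values, bin_width):
    """Count how many values fall into each bin of the given width."""
    counts = {}
    for v in values:
        bin_start = math.floor(v / bin_width) * bin_width
        counts[bin_start] = counts.get(bin_start, 0) + 1
    return dict(sorted(counts.items()))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only; not the project's data set.
sample_calls = [148, 160, 175, 152, 181, 139, 166]
sample_sales = [41, 38, 45, 41, 50, 37, 44]

# Text histogram: one '#' per observation in each bin (integer data assumed).
for bin_start, count in histogram_counts(sample_sales, 5).items():
    print(f"{bin_start}-{bin_start + 4}: {'#' * count}")

r = pearson_r(sample_calls, sample_sales)
print(f"Correlation between calls and sales: {r:.3f}")
```

A value of r near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship. A histogram that is roughly bell-shaped is consistent with a normal distribution but does not prove it.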
Graphical Analysis
Pivot table: A pivot table allows a large amount of data to be summarized very quickly.
Pivot tables are used to analyze numerical information in detail and can answer
unanticipated questions that arise about the data.
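The kind of aggregation a "Sum of" pivot table performs can be sketched in Python as grouping rows by one column and summing the others. The function `pivot_sum`, the field names, and the three records below are hypothetical and serve only to show the idea; they are not the spreadsheet's actual data or the tool used in this project.

```python
from collections import defaultdict


def pivot_sum(rows, key, value_fields):
    """Group rows by `key` and sum each field listed in `value_fields`."""
    totals = defaultdict(lambda: {field: 0 for field in value_fields})
    for row in rows:
        for field in value_fields:
            totals[row[key]][field] += row[field]
    # Sort by the grouping value so the output reads like pivot-table rows.
    return dict(sorted(totals.items()))


# Hypothetical records for illustration only; not the project's data set.
rows = [
    {"sales": 41, "calls": 148, "time": 15.6, "years": 2},
    {"sales": 38, "calls": 160, "time": 14.0, "years": 1},
    {"sales": 41, "calls": 170, "time": 13.5, "years": 3},
]

table = pivot_sum(rows, "sales", ["calls", "time", "years"])
for label, sums in table.items():
    print(label, sums)
```

Each key of `table` plays the role of a row label, and each inner dictionary holds the summed columns for that label.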
Row Labels    Sum of Calls (X1)    Sum of Time (X2)    Sum of Years (X3)
30            175                  14.3                3
33            164                  15.9                3
34            596                  63.4                12
35            337                  37.5                4
36            697                  62.3                8
37            842                  68.1                9
38            716                  68.7                14
39            730                  78.8                12
40            1246                 119                 18
41            1311                 140.3               13
42            989                  87.3                13
43            941                  92.6                7
44            1057                 112.1               15

