COURSE PROJECT PART A

Introduction

Exploratory Data Analysis (EDA) is a crucial step in the data analysis process. It helps data scientists gain insight into the characteristics of the data before they apply formal statistical techniques or build models. The purpose of EDA is to examine data sets in order to understand the main patterns, relationships, and characteristics among the variables.

Interpretation of different variables

We first discuss the variables in the data set, namely the Sales, Calls, Time, and Years columns, and interpret their summary statistics along with graphs of their representation. We start with the Sales column, which is Y.

1. Sales Column (Y)
Mean: 43.63
Median: 42.5
Mode: 41
Standard Deviation: 7.43952725

2. Calls Column (X1)
Mean: 160.33
Median: 160.5
Mode: 148
Standard Deviation: 19.26427

3. Time Column (X2)
Mean: 14.947
Median: 14.75
Mode: 15.6
Standard Deviation: 2.317368

4. Year Column (X3)
Mean: 2.03
Median: 2
Mode: 2
Standard Deviation: 1.2428

The analysis above reports the measures of central tendency (mean, median, and mode) and the standard deviation for each variable. The mean is the average value of the data, while the median is the middle value when the data are sorted in ascending order. The mode is the value that appears most frequently in the data set. For each variable the median is very close to the mean, which is consistent with a roughly symmetric distribution and suggests that the data may follow a normal distribution. However, the relationship between the mean and the median does not by itself determine the shape of the distribution. So, in order to perform a proper Exploratory Data
Analysis, you first have to calculate the measures of central tendency, which means finding the mean, median, and mode to understand the typical or central value of the data set. Next, you have to assess the data distribution by plotting a histogram to check whether the data follow a normal distribution or some other distribution. After that, you examine the spread, or variability, of the data using the standard deviation. Next, you visualize the data using appropriate data visualization techniques to gain insight into patterns, characteristics, and outliers. Finally, if it is relevant, you explore the relationships between variables by investigating correlations or dependencies between different variables in the data set.

Graphical Analysis

Pivot Table: A pivot table allows people to summarize large amounts of data very quickly. These tables are used to analyze numerical information in detail and can answer unanticipated questions that may arise about the data.

Row Labels   Sum of Calls (X1)   Sum of Time (X2)   Sum of Years (X3)
30           175                 14.3               3
33           164                 15.9               3
34           596                 63.4               12
35           337                 37.5               4
36           697                 62.3               8
37           842                 68.1               9
38           716                 68.7               14
39           730                 78.8               12
40           1246                119                18
41           1311                140.3              13
42           989                 87.3               13
43           941                 92.6               7
44           1057                112.1              15
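
The summary statistics and the pivot-style sums discussed in this part can be reproduced with a short script. The following is only a minimal sketch in Python using the standard library. It makes several assumptions that are not stated in the paper: the five records in ROWS are made-up illustrative values rather than the course data set, the standard deviation is taken to be the sample standard deviation, and the pivot table's row labels are taken to be Sales values. The function names describe and pivot_sum are hypothetical helpers introduced here for illustration.

```python
from collections import defaultdict
from statistics import mean, median, multimode, stdev

# Hypothetical records (Sales, Calls, Time, Years).
# These are illustrative values only, NOT the course data set.
ROWS = [
    (41, 150, 15.5, 2),
    (41, 148, 14.0, 1),
    (42, 160, 15.0, 3),
    (44, 172, 13.5, 2),
    (40, 148, 16.5, 2),
]
COLUMNS = ("Sales", "Calls", "Time", "Years")


def describe(values):
    """Return the mean, median, mode(s), and sample standard deviation."""
    return {
        "mean": mean(values),
        "median": median(values),
        "modes": multimode(values),  # list, because ties are possible
        "stdev": stdev(values),      # assumption: sample (n - 1) formula
    }


def pivot_sum(rows):
    """Sum Calls, Time, and Years for each distinct Sales value.

    Assumption: the pivot table groups by Sales (the row label).
    """
    totals = defaultdict(lambda: [0, 0.0, 0])
    for sales, calls, time_value, years in rows:
        totals[sales][0] += calls
        totals[sales][1] += time_value
        totals[sales][2] += years
    return {label: tuple(sums) for label, sums in sorted(totals.items())}


if __name__ == "__main__":
    # Summary statistics for each column.
    for index, name in enumerate(COLUMNS):
        print(name, describe([row[index] for row in ROWS]))
    # Pivot-style sums, one line per Sales value.
    for label, sums in pivot_sum(ROWS).items():
        print(label, sums)
```

Running the same two helpers on the full data set should give values comparable to those reported above, provided the same standard deviation formula and grouping column were used in the original spreadsheet.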