Organizing and Displaying Data:
-How do we make sense of our data?
●When researchers collect data, they often end up with large quantities of it, so the data must be simplified. One method is:
-Frequency Distribution
●A systematic method of ordering, organizing and displaying the data from a set
Purpose:
1.Simplifies calculations for other statistics
2.Serves as a transition step in constructing a frequency histogram
Types of Frequency Distributions:
-Ungrouped
●Each value of x in the distribution represents one value in the data (used if you have a category for each possible value, e.g. nominal scores, OR if you have a small dataset)
-Grouped (class intervals):
●Several values in the data are classified into one interval (used if you have a large dataset, which is more typical)
Steps in Constructing a Frequency Distribution:
-Eight steps in constructing a frequency distribution when the data are interval/ratio.
1.Count the number of scores
2.Identify the highest and lowest scores
●Organize the scores
3.Identify the smallest unit of measurement
●What is the smallest possible division that was used on the measuring scale when the scores were collected? (i.e. by how much can a score increase from one participant to another?)
4.Decide on an appropriate number of class intervals
●Use the Modified Sturges' Rule. This is only a guide
●But if you have only a small number of unique values in your dataset, use these unique (ungrouped) values to determine the number of class intervals (so skip Steps 4 & 5).
5.Decide on the score range of each class interval (i)
●Use the following formula, where:
-i = width of class interval
-i = (largest score - smallest score) / (number of class intervals)
6.Round to a nice number
●As a general rule, "i" equal to 1, 2, 3, 4, 5, 10, or 20 score divisions will be suitable. (Pick a "nice" number!)
●For example, if instead of 5 we had 6 intervals (so 6 in the denominator), i = 0.833.
  Better: i = 1
7.List the class intervals of scores in order
●Usually the largest interval is put at the top
8.Final step: Compute the frequency (f) of each class interval.
●Cross off the scores as you count them and use check marks to keep track
-For larger datasets (e.g. > 25 data points with many unique values):
-Class intervals need to be GROUPED (i.e. each is a range of numbers, not a single number)
-It is not enough to use the smallest possible unit of measure (step 3) to determine the class interval.
-A range of these scores is needed for each interval.
-This class interval is sometimes called a bucket or a bin (Excel calls it a Bin)
-For UNGROUPED data, the class interval (i) is a single value (or category), taken from the set of all possible unique values in that dataset
-For GROUPED data, the class interval (i), or bin, is a range of values, calculated in step 5
●To determine the bins, start the first bin at a value ≤ the smallest value in the dataset, then add i to the start of each bin to get the start of the next
●These bins should (1) have the same width/range, (2) not overlap, (3) leave no gaps and (4) cover all the data in the set.
Graphs:
-A pictorial representation of a frequency distribution or data table
-Helpful in understanding concepts, e.g. frequencies, mean, standard deviation
Bar Graphs:
-The frequency or amount of each type of observation is represented by a vertical bar
-Bars are separated by some space
-Useful to depict frequencies of nominal variables
-Also used to depict other group statistics, e.g. the mean of the DV (dependent variable) measures for each group
Histogram:
-A histogram uses vertical bars to depict frequencies of an interval/ratio variable
-It differs from a bar graph in not having spaces between the bars
Key characteristics of a Histogram (and other graphs):
-The distribution of your sample data (for interval and ratio units of measure)
●Peaks
●Spread
●Symmetry
Peak/s
-Identify the peaks
●The tallest cluster/s of bars
●Represent the most common values/bulk of the data
Spread
-How much do the data vary?
Kurtosis
-The relative peakedness or flatness of the distribution.
-It reflects whether the scores are more or less evenly distributed throughout the measurement range
Symmetry
Normal distribution
-A distribution is termed symmetrical when the data frequencies decrease at equal rates above and below a central point.
-Visually bisected (one half is the mirror image of the other)
Non-symmetrical distributions
-Skewed
●Bunching of the observations at one or the other end of the measurement range
Outliers:
-Data values that are far away from other data values.
-They can strongly affect your results.
Other shapes and distributions
-Histograms, bar graphs and polygons (or line graphs) can take on these and more shapes, which we will discuss in Topic 3-4
-This is true not only for frequencies but also for other measures on the y-axis
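Worked sketch: grouped frequency distribution
The steps for constructing a grouped frequency distribution described earlier in these notes can be sketched in Python. This is only an illustration under stated assumptions, not the course's official procedure: the scores are made up, they are whole numbers (smallest unit of measurement = 1), and the standard Sturges' rule, k = 1 + 3.322 * log10(N), stands in for the Modified Sturges' Rule, which is not reproduced in these notes. The function names choose_width and frequency_distribution are hypothetical.

```python
import math

# "Nice" interval widths from step 6, in score divisions.
NICE_WIDTHS = [1, 2, 3, 4, 5, 10, 20]


def choose_width(scores, n_intervals=None):
    """Steps 1-6: pick a 'nice' class-interval width i."""
    n = len(scores)                        # step 1: count the scores
    low, high = min(scores), max(scores)   # step 2: lowest and highest score
    if n_intervals is None:
        # Step 4 (ASSUMPTION): standard Sturges' rule as a stand-in
        # for the modified rule mentioned in the notes.
        n_intervals = round(1 + 3.322 * math.log10(n))
    raw = (high - low) / n_intervals       # step 5: i = range / number of intervals
    for w in NICE_WIDTHS:                  # step 6: round up to a nice number
        if w >= raw:
            return w
    return NICE_WIDTHS[-1]


def frequency_distribution(scores, width):
    """Steps 7-8: list the class intervals and count the frequency f of each.

    Assumes whole-number scores, so each bin covers start..start+width-1.
    Bins have equal width, no overlap, no gaps, and cover all the data.
    """
    low, high = min(scores), max(scores)
    start = (low // width) * width         # first bin starts at a value <= smallest score
    table = []
    while start <= high:
        end = start + width - 1            # inclusive upper limit of this bin
        f = sum(start <= s <= end for s in scores)
        table.append((start, end, f))
        start += width                     # add i to get the start of the next bin
    return table[::-1]                     # largest interval at the top


# Hypothetical dataset of 20 whole-number scores.
scores = [12, 15, 17, 18, 21, 22, 22, 24, 25, 27,
          28, 28, 29, 31, 33, 34, 36, 38, 41, 44]

width = choose_width(scores)
table = frequency_distribution(scores, width)
for start, end, f in table:
    print(f"{start}-{end}: {f}")
```

For these made-up scores the sketch picks a width of 10 and lists the intervals 40-49, 30-39, 20-29 and 10-19 with frequencies 2, 5, 9 and 4. The rule suggested 5 intervals but rounding to a nice width leaves 4, which is consistent with treating the rule as only a guide.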