Basic Statistics - Chapter 1 - Types of Samples and Types of Data

A population is the entire collection of individuals about which information is sought. Parameters are numbers that describe the population.
A sample is a subset of a population, containing the individuals that are actually observed. Statistics are numbers that describe a sample.

A simple random sample is chosen by a method in which each collection of items of the same size is equally likely to be selected.
For cluster sampling, the population is divided into groups, and a random sample of groups is drawn.
For stratified sampling, the population is divided into groups, and a random sample of individuals is drawn from each group.
A sample of convenience is a sample that is not drawn by a well-defined random method.

Qualitative data refers to categories or features (labels). Quantitative data refers to counts or measures (numbers).
Nominal data refers to categories that have NO natural order. Ordinal data refers to categories that can be ordered.
Continuous data can take on any value in an interval (measures). Discrete data can be listed (counts).

Basic Statistics - Chapter 3 - Numerical Summaries of Data

STAT EDIT lets you enter data, and STAT CALC lets you calculate two screens of 1-Var Stats, where x̄ = Sample Mean. The symbol µ (mu) represents the Population Mean in many formulas.
Sx = Sample Std. Deviation
σx = Population Std. Deviation
Variance = (Std. Deviation)²
Coefficient of Variation = σ / µ
Five Number Summary: min, Q1, median, Q3, max (the median is the same as Q2)

Empirical Rule: For data sets that are approximately symmetric and bell-shaped, about 68% of the data values are between µ - σ and µ + σ, about 95% of the data values are between µ - 2σ and µ + 2σ, and almost all of the data values are between µ - 3σ and µ + 3σ.
Chebyshev's Inequality: For any data set (even very skewed, with one tail), 75% or more of the data values are between µ - 2σ and µ + 2σ, and 89% or more of the data values are between µ - 3σ and µ + 3σ.
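The Chebyshev percentages can be checked on a small data set. The sketch below is only an illustration: the function names (fraction_within, chebyshev_bound) and the skewed sample values are made up for this example, and the general bound 1 - 1/k² is the standard form behind the 75% (k = 2) and 89% (k = 3) figures.

```python
from statistics import fmean, pstdev

def fraction_within(data, k):
    """Fraction of values within k population standard deviations of the mean."""
    mu = fmean(data)          # population mean of the data set
    sigma = pstdev(data)      # population standard deviation (the sigma-x value)
    low, high = mu - k * sigma, mu + k * sigma
    return sum(low <= x <= high for x in data) / len(data)

def chebyshev_bound(k):
    """Minimum fraction guaranteed by Chebyshev's Inequality for k > 1."""
    return 1 - 1 / k**2

# Hypothetical, strongly right-skewed data (not from the notes).
skewed = [1, 1, 1, 1, 2, 2, 3, 4, 10, 25]

for k in (2, 3):
    print(k, fraction_within(skewed, k), chebyshev_bound(k))
```

Even for this skewed sample, the observed fractions stay at or above the Chebyshev minimums. The Empirical Rule percentages, in contrast, are only approximate and should not be expected to hold for data shaped like this.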
z = (x - µ) / σ = how many standard deviations a value is from its population mean.
x = µ + (z * σ) = value based on a given z-score
Interquartile Range (IQR) = Q3 - Q1
Lower Outlier Boundary = Q1 - 1.5 * IQR
Upper Outlier Boundary = Q3 + 1.5 * IQR
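As a worked illustration of the formulas above, here is a minimal Python sketch. The function names and the sample data are hypothetical. The quartiles are computed as the medians of the lower and upper halves of the sorted data (leaving out the middle value when the count is odd); that convention is an assumption made here, and other software may use a different quartile rule and give slightly different boundaries.

```python
from statistics import median

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

def value_from_z(z, mu, sigma):
    """Data value that corresponds to a given z-score."""
    return mu + z * sigma

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (assumed convention)."""
    values = sorted(data)
    if len(values) < 2:
        raise ValueError("need at least two values")
    half = len(values) // 2   # the middle value is excluded when the count is odd
    return median(values[:half]), median(values[-half:])

def outlier_boundaries(data):
    """Lower and upper outlier boundaries: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical data set (not from the notes).
sample = [2, 4, 4, 5, 7, 8, 9, 30]
low, high = outlier_boundaries(sample)
outliers = [x for x in sample if x < low or x > high]
print(quartiles(sample), (low, high), outliers)
```

Any value below the lower boundary or above the upper boundary is flagged as an outlier; in this made-up sample only the value 30 is flagged.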
Basic Statistics - Chapter 2 - Graphical Summaries of Data

Here are some things to remember about HISTOGRAMS:
Approximately symmetric means that the right side and the left side are almost mirror images.
Skewed to the right is also called positively skewed, which means the long tail is on the right side.
Skewed to the left is also called negatively skewed, which means the long tail is on the left side.
Frequency histograms are based on counts, and Relative Frequency histograms are based on proportions (percents).
Finally, classes must not overlap, must be of equal width, and there should be NO missing classes.

Here are some things to remember about STEM-&-LEAF PLOTS:
The STEM is the first part of the number, and NO values are skipped when setting up your stems.
The LEAVES are the rightmost part of the number, which is only the last digit.
Finally, the leaves are ordered from smallest to largest as you move away from the stems.

Here are some things to remember about FREQUENCIES and RELATIVE FREQUENCIES:
The frequency of a category is the number of times it occurs in the data set.
A frequency distribution is a table that presents the frequency for each category.
The relative frequency of a category is the frequency of the category divided by the sum of all the frequencies (decimal or percent).
Pie Charts are based on relative frequency.

Basic Statistics - Chapter 4 - Summarizing Bivariate Data

Scatterplots
Press 2nd, Y= (STAT PLOT), press 1 for the first plot, then select On and the scatterplot icon (first icon).
Press ZOOM and 9: ZoomStat.
Press STAT and CALC. Select 4: LinReg (ax+b) -or- 8: LinReg (a+bx) and press ENTER.

Linear Equations
Two uses for the equation of the regression line include:
Determine how much y differs, when given the difference in two values of x. The slope = b in the example screen above (the y = a + bx form; in the y = ax + b form the slope is a), and ∆x is the difference in x, so the difference in y is ∆y = b * ∆x.
Predict the value of y, when given a value for x. Replace x with the given value and solve for y.
Use either 4: LinReg for y = ax + b or 8: LinReg for y = a + bx on your calculator.

Two values that are used with the regression line include:
r² = Coefficient of Determination, which is the proportion (%) of the variation in y explained by the regression line
r = Correlation Coefficient, which describes the strength of the linear relationship (-1 ≤ r ≤ +1), and the direction of the line (negative is downward and positive is upward).

Remember, correlation DOES NOT equal causation, as in the example of ice cream sales and shark attacks!

Another Reminder: If r and r² DO NOT appear when you use LinReg, then press 2nd and 0 to get the Catalog list. Scroll down to DiagnosticOn, press ENTER twice, and Done should appear on your screen.
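For readers who want to see what the calculator's LinReg output corresponds to, the following is a minimal sketch of a least-squares fit in the y = a + bx form. The function name fit_line and the data points are hypothetical; the formulas used are the standard ones (slope = Sxy / Sxx, intercept = ȳ - b * x̄, r = Sxy / sqrt(Sxx * Syy)).

```python
from math import sqrt
from statistics import fmean

def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b, r)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b = sxy / sxx                 # slope
    a = y_bar - b * x_bar         # intercept
    r = sxy / sqrt(sxx * syy)     # correlation coefficient
    return a, b, r

# Hypothetical data (not from the notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b, r = fit_line(xs, ys)
r_squared = r ** 2               # coefficient of determination
predicted = a + b * 6            # use 1: predict y for a given x (here x = 6)
delta_y = b * 2                  # use 2: difference in y when x differs by 2
print(a, b, r, r_squared, predicted, delta_y)
```

The last two computed values mirror the two uses of the regression line: replacing x with a given value to predict y, and multiplying the slope by ∆x to get ∆y.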