Foundations of statistics

Introduction to statistics

Population
All of the 'things' we are interested in, such as people, households, or trains.

Population parameter
Describes the entire population; it is a measurement value of the population (for example, its size).
For example:
- population of Venezuela
- population of Australian households
N represents the population size.
For example: population of Australian households, N = 8,286,084

Sample
A subset of the population.
Must be representative of the population if we want to infer (generalise) our findings to the population of interest.
The sample must be random, because a random sample allows an unbiased representation of the population.

Sample statistic
Describes the sample.
Number of scores in the sample.
For example:
- Measurement value (size) of the sample
- Sample of 2,500 Australian households
The sample statistic can provide us with an estimate of the population parameter.
n represents the sample size.
For example:
- Sample from the population of Australian households, n = 10,000

Variable
An overarching label for the element/feature we want to measure.
Variables must always vary.
For example:
- Gender: male, female, non-binary, etc.
- Age: in years, or by age group

Descriptive statistics
Organises raw information into 'manageable' information.
Numerical and/or graphical form.
For example:
- Household size
- The most popular type of vehicle

Inferential statistics
Techniques that allow us to use sample statistics to generalise to (draw conclusions about) the population.
For example:
- Population tests
- Confidence intervals

Descriptive research
When we want to find out about something.
For example:
- Age of students studying this unit
- Percentage of students studying full time

Correlational research
When we want to explore a relationship between two or more variables.
For example:
- Is there a relationship between height and weight?
- Is there a relationship between study hours and employment hours?

Comparative research
When we want to compare two or more groups on the same measure.
For example:
- Do part-time students spend more hours working than full-time students?
- Comparing the hours of work for the different groups

Independent variable (IV)
Also called the predictor or explanatory variable.

Dependent variable (DV)
Also called the outcome or response variable.

Experimental research method
Allows a cause-and-effect explanation: one variable (IV) causes a change in the other (DV).
For example:
- The IV could be the amount of medication taken in mg and the DV the pain reduction in minutes.
The IV is manipulated (while other variables are held constant) and the DV is observed/measured.

Non-experimental method
Does not allow a cause-and-effect explanation.
Both variables are observed/measured.
- Correlational research: information is recorded only, with no manipulation (e.g. Is there a relationship between height (IV) and weight (DV)?)
- Non-equivalent groups: the IV is not manipulated (e.g. comparing heights (DV) of males and females, where gender is the IV)
- Pre-post studies: the DV is measured twice, at different times (e.g. comparing scores (DV) before and after completing competency training)
For example:
- The IV could be gender (male, female, other) and the DV income ($ per week).

Discrete variable
Separate categories (no intermediate values between them).
Countable number of values.
For example:
- Number of children
- Blood type

Continuous variable
Infinite number of values that fall between any two observed/measurable values.
Divisible into an infinite number of fractional parts.
For example:
- Distance travelled
- Speed of aircraft

Nominal scale of measurement
Used to label (name) the group.
For example:
- 1 = male, 2 = female, 3 = other

Ordinal scale of measurement
Used to label and order.
For example:
- 1st, 2nd, 3rd, 4th in a race

Interval scale of measurement
Numbers are used to label and order, and the intervals between the numbers are equal.
For example:
- Temperature in degrees Celsius

Ratio scale of measurement
Numbers are used to label and order, the intervals between the numbers are equal, and zero means a complete absence of something.
For example:
- Number of correct answers in a test

Raw scores
Original, unchanged scores obtained from the data collection.
Typically represented by X.
If there is more than one variable, Y represents the other.

Summing scores
When adding scores, the Greek letter sigma (Σ) is used to designate summation.
ΣX means add all the X values.
For example:
- Suppose we have scores of 2, 1, 9, 4
- ΣX = 2 + 1 + 9 + 4
- ΣX = 16

Order of operations
1. Any calculation in brackets is done first
2. Squaring is done second
3. Multiplication or division is done next; if there is more than one, they are done left to right
4. Summation (Σ) is done fourth
5. Any other addition/subtraction is the final step

Statistical notation
Population size = N
Sample size = n
Raw scores = X
Summing scores = Σ

Bias
Any effect that makes our results non-representative.
If there is any bias, we cannot conclude anything meaningful about the population of interest from looking at the sample.

Selection bias
Occurs when the sample selected does not truly represent the population of interest.
Random sampling of the population of interest can help eliminate selection bias.

Information bias
Refers to how the information was collected/measured.
Need to be careful of question wording etc.
- Emotive language
- Leading question
  E.g. Most people don't want to work more than 30 hours a week. What do you have to say about it?
- Double-barrelled question
  E.g. When was the last time you purchased a house or a car?
- Unclear question
- Source of the study/who conducted the study
  E.g. A shampoo company telling us their product is the best

Identifying possible sources of bias in a study
- How were individuals/objects in the study selected?
- What measurements were made?
- What questions were asked?
- Who conducted/sponsored the study?

Avoiding bias
- Sample should be randomly selected where possible
- Non-leading language in question wording
- Source should be reliable/credible
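
The advice to select the sample randomly can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the household sizes are made-up data, the seeds are arbitrary, and `draw_simple_random_sample` is a hypothetical helper name. It draws a simple random sample of size n from a population of size N and compares the sample statistic with the population parameter.

```python
import random
from statistics import mean


def draw_simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; every unit is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)


# Hypothetical population: household sizes (1 to 6 people) for N = 10,000 households.
# These values are generated, not real data.
data_rng = random.Random(0)
population = [data_rng.randint(1, 6) for _ in range(10_000)]
N = len(population)

# Simple random sample of n = 500 households.
sample = draw_simple_random_sample(population, n=500, seed=1)
n = len(sample)

population_mean = mean(population)  # population parameter
sample_mean = mean(sample)          # sample statistic (an estimate of the parameter)

print(f"N = {N}, population mean household size = {population_mean:.2f}")
print(f"n = {n}, sample mean household size = {sample_mean:.2f}")
```

Because every household has the same chance of being selected, the sample mean should usually land close to the population mean. A sample chosen by convenience (for example, only the first 500 households in the list) would not come with that protection against selection bias.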
Research approaches and data collection methods
- calculate ΣX and ΣX², and the proportion and percentage of the group associated with each score, from a frequency table
- determine percentiles and percentile ranks for values corresponding to real limits in a frequency distribution table
- construct the three types of frequency distribution graphs (histograms, polygons, and bar graphs), know when each type is used, and explain what they are describing
- calculate the three measures of central tendency: mean, median, and mode
- report on the three measures of central tendency, including when they should be used, the advantages and disadvantages of each, and what they are telling you about the data
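
Several of these calculations can be sketched in Python. The first part reuses the scores 2, 1, 9, 4 from the summation example to show why the order of operations matters: ΣX² (square each score, then sum) is not the same as (ΣX)² (sum first, then square). The second part uses a frequency table that is made-up data, assumed purely for illustration.

```python
from statistics import mean, median, mode

# Scores from the summation example in these notes.
scores = [2, 1, 9, 4]
sum_x = sum(scores)                          # ΣX = 16
sum_x_squared = sum(x ** 2 for x in scores)  # ΣX² = 4 + 1 + 81 + 16 = 102
squared_sum_x = sum(scores) ** 2             # (ΣX)² = 16² = 256

print(f"ΣX = {sum_x}, ΣX² = {sum_x_squared}, (ΣX)² = {squared_sum_x}")

# Hypothetical frequency table (made-up data): score X -> frequency f.
frequency_table = {1: 2, 2: 3, 3: 5, 4: 4, 5: 1}

n = sum(frequency_table.values())                                         # number of scores: 15
table_sum_x = sum(x * f for x, f in frequency_table.items())              # ΣX = 44
table_sum_x_squared = sum(f * x ** 2 for x, f in frequency_table.items()) # ΣX² = 148

# Proportion = f / n; percentage = proportion * 100.
proportions = {x: f / n for x, f in frequency_table.items()}
percentages = {x: 100 * p for x, p in proportions.items()}

# Expand the table back into raw scores to get the measures of central tendency.
raw_scores = [x for x, f in frequency_table.items() for _ in range(f)]
table_mean = mean(raw_scores)
table_median = median(raw_scores)
table_mode = mode(raw_scores)

print(f"n = {n}, ΣX = {table_sum_x}, ΣX² = {table_sum_x_squared}")
for x in frequency_table:
    print(f"X = {x}: f = {frequency_table[x]}, p = {proportions[x]:.3f}, % = {percentages[x]:.1f}")
print(f"mean = {table_mean:.2f}, median = {table_median}, mode = {table_mode}")
```

Note that each score in the frequency table is weighted by its frequency when summing: a score of 3 that occurs five times contributes 15 to ΣX, not 3.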