School: Fitchburg State University
Course: MGMT MISC
Subject: Statistics
Date: Aug 26, 2023
Uploaded by LieutenantDove2617 on coursehero.com

Chapter 13: Big Data Basics: Describing Samples and Populations (pp. 357-383)
Introduction
o Loss leader: an item sold at a loss in the hope of making up the difference on the customer's additional purchases
Descriptive Statistics and Basic Inferences
o Metric: a summary number that allows analysts to compare characteristics of a sample with a population benchmark, with characteristics of another sample, or with some other critical value
o Inferential statistics: summary representations of sample data that allow us to draw conclusions about (i.e., infer from the sample to) an entire population
o Two applications of statistics exist:
  ▪ To describe characteristics of the population or sample (descriptive statistics)
  ▪ To generalize from a sample to a population (inferential statistics)
o Sample statistics: summary measures about variables computed using only data taken from a sample
o Population parameters: summary characteristics describing the properties of a population
o Frequency distribution: a table or chart summarizing the number of times a particular value of a variable occurs
o Percentage distribution: a frequency distribution organized into a table (or graph) that summarizes the percentage values associated with particular values of a variable
o Probability: the long-run relative frequency with which an event will occur
o Proportion: the percentage of elements that meet some criterion for membership in a category
o Top-box score: the proportion of respondents who choose the most positive option on a multiple-choice question, usually one dealing with customer opinion
o Bottom-box score: the proportion of respondents who choose the least favorable response to some question about customer opinion
o Mean: a basic measure of central tendency, computed as the arithmetic average
  ▪ Although widely relied upon, the mean can be misleading, particularly when extreme values or outliers are present
o Median: a measure of central tendency that is the midpoint; the value below which half the values in the distribution fall
o Mode: a measure of central tendency; the value that occurs most often
o The simplest measure of dispersion is the range: the distance between the smallest and the largest values of a frequency distribution
o Individual deviation scores: the difference between each observation and the mean (X − X̄), showing how far that observation lies from the mean
o Variance: a metric of variability or dispersion, equal to the average squared deviation from the mean (dividing by n − 1 for a sample); its square root is the standard deviation
o Standard deviation: the most popular indicator of spread or dispersion; a quantitative index of a distribution's variability, equal to the square root of the variance
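The descriptive measures above can be illustrated with a minimal Python sketch that uses only the standard library. The ratings and purchase amounts are hypothetical values invented for illustration (5 = most positive response), and the helper name `describe` is an assumption, not something the chapter defines. Note that `statistics.variance` and `statistics.stdev` divide by n − 1 (sample statistics); `statistics.pvariance` and `statistics.pstdev` give the population versions.

```python
from collections import Counter
import statistics


def describe(ratings):
    """Descriptive summary of hypothetical 1-5 ratings, where 5 is the most positive choice."""
    n = len(ratings)
    freq = Counter(ratings)                         # frequency distribution: value -> count
    mean = statistics.mean(ratings)
    return {
        "frequency": dict(sorted(freq.items())),
        "percentage": {v: 100 * c / n for v, c in sorted(freq.items())},  # percentage distribution
        "top_box": freq[5] / n,                     # share choosing the most positive option
        "bottom_box": freq[1] / n,                  # share choosing the least favorable option
        "mean": mean,
        "median": statistics.median(ratings),
        "mode": statistics.mode(ratings),
        "range": max(ratings) - min(ratings),
        "deviations": [x - mean for x in ratings],  # individual deviation scores
        "variance": statistics.variance(ratings),   # sample variance (divides by n - 1)
        "std_dev": statistics.stdev(ratings),       # square root of the sample variance
    }


ratings = [5, 4, 4, 3, 5, 2, 4, 1, 5, 4]   # hypothetical survey responses
summary = describe(ratings)
print(summary["frequency"], summary["top_box"], summary["bottom_box"])
print(summary["mean"], summary["median"], summary["mode"], summary["std_dev"])

# An outlier pulls the mean far more than the median.
spend = [20, 22, 25, 30, 500]               # hypothetical purchase amounts
print(statistics.mean(spend), statistics.median(spend))
```

The last two lines show why the mean can mislead: one extreme purchase drags the mean well above every typical value, while the median stays near the middle of the data.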
Distinguish Between Population, Sample and Sample Distribution
o Normal distribution: a symmetrical, mean-centered, bell-shaped distribution that describes the expected probability distribution of observations
o Standardized normal distribution: a purely theoretical probability distribution that reflects a specific normal curve for the standardized value, Z = (X − μ)/σ, which has a mean of 0 and a standard deviation of 1
  ▪ The most useful distribution in inferential statistics
o Population distribution: a frequency distribution of the elements of a population
o Sample distribution: a frequency distribution of a sample
o Sampling distribution: a theoretical probability distribution of sample means for all possible samples of a certain size drawn from a particular population
o Standard error of the mean: the standard deviation of the sampling distribution of the mean, equal to σ/√n
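The distinction between a population distribution, a sample distribution, and a sampling distribution can be made concrete with a small simulation. This is a sketch under stated assumptions: the population (the integers 1 to 100), the sample size, the number of draws, and the seed are all hypothetical choices, and the function names are illustrative. Samples are drawn with replacement so that the standard error σ/√n applies without a finite-population correction. The same sketch standardizes a value as Z = (X − μ)/σ.

```python
import math
import random
import statistics


def z_score(x, mu, sigma):
    """Standardized value Z: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma


def sample_means(population, n, draws, seed=0):
    """Means of `draws` random samples of size n, drawn with replacement."""
    rng = random.Random(seed)
    return [statistics.mean(rng.choices(population, k=n)) for _ in range(draws)]


population = list(range(1, 101))                 # hypothetical population
mu = statistics.mean(population)                 # population parameter: mean
sigma = statistics.pstdev(population)            # population parameter: standard deviation
n = 25
means = sample_means(population, n, draws=5000)  # approximates the sampling distribution
standard_error = sigma / math.sqrt(n)            # theoretical standard error of the mean

print("sigma / sqrt(n):", round(standard_error, 3))
print("SD of simulated sample means:", round(statistics.stdev(means), 3))
print("Z for X = 75:", round(z_score(75, mu, sigma), 3))
```

The standard deviation of the simulated sample means should land close to σ/√n, which is what the definition of the standard error of the mean states.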
Central-Limit Theorem
o Central-limit theorem: the theory that, as sample size increases, the distribution of the means of randomly selected samples of size n approaches a normal distribution
o The distribution of sample means quickly approaches normal as sample size increases, even when the population itself is not normally distributed
o The theoretical knowledge about sampling distributions helps us solve two basic and very practical marketing analytics problems:
  ▪ Estimating population parameters
  ▪ Determining sample size
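The central-limit theorem can be checked informally by simulation. The sketch below assumes a right-skewed exponential population (skewness 2), chosen only because it is clearly not normal; for that population, theory predicts the skewness of sample means falls roughly as 2/√n, so the sampling distribution looks increasingly symmetric and bell-shaped as n grows. The function names, sample sizes, and seed are illustrative assumptions.

```python
import random
import statistics


def skewness(values):
    """Moment skewness: about 0 for a symmetric (e.g. normal) distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean((x - m) ** 3 for x in values) / s ** 3


def sample_means(n, draws, rng):
    """Means of `draws` samples of size n from an exponential population with mean 1."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(draws)]


rng = random.Random(42)
for n in (1, 5, 30):
    means = sample_means(n, 4000, rng)
    print(f"n={n:2d}  skewness of sample means = {skewness(means):.2f}  (theory about {2 / n ** 0.5:.2f})")
```

With n = 1 the "sample means" are just the skewed population values; by n = 30 the skewness is much closer to 0, which is the behavior the theorem describes.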
Estimation of Parameters and Confidence Intervals
o Point estimate: an estimate of the population mean in the form of a single value, usually the sample mean
o Confidence interval estimate: a specified range of numbers within which a population mean is expected to lie; an estimate of the population mean based on the knowledge that it will be equal to the sample mean plus or minus a small sampling error
o Confidence level: a percentage or decimal value that tells how confident a researcher can be about being correct; it states the long-run percentage of confidence intervals that will include the true population mean
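A confidence interval estimate is the point estimate plus or minus an allowance for sampling error. The minimal sketch below computes a large-sample interval as X̄ ± Z · S/√n, using `statistics.NormalDist` for the Z value; it assumes the sample is large enough for the normal approximation (for small samples a t critical value is usually substituted). The function name and example data are hypothetical.

```python
import math
import statistics


def confidence_interval(sample, level=0.95):
    """Large-sample confidence interval for the population mean: X-bar +/- Z * S / sqrt(n)."""
    n = len(sample)
    point_estimate = statistics.mean(sample)               # sample mean as the point estimate
    std_error = statistics.stdev(sample) / math.sqrt(n)    # estimated standard error
    z = statistics.NormalDist().inv_cdf((1 + level) / 2)   # about 1.96 for a 95% level
    margin = z * std_error                                 # allowance for random sampling error
    return point_estimate - margin, point_estimate + margin


sample = [10, 12, 14, 16, 18] * 8          # hypothetical 40 observations
print(confidence_interval(sample))         # 95% interval
print(confidence_interval(sample, 0.99))   # a higher confidence level gives a wider interval
```

Raising the confidence level widens the interval for the same data, which is the trade-off the confidence-level definition implies.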
Sample Size
o Other things being equal, the larger the sample, the more accurate (precise) the research results
o Random sampling error varies with sample size: increasing the sample size decreases the width of the confidence interval at a given confidence level
o Three factors are required to specify sample size:
  ▪ The variance, or heterogeneity, of the population
  ▪ The magnitude of acceptable error
  ▪ The confidence level
o As heterogeneity increases, so must sample size
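o Magnitude of error and confidence level: the smaller the acceptable error, or the higher the required confidence level, the larger the sample must be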
o Sequential sampling: the application of results from one or more pilot studies prior to deciding on the sample size for a definitive study
o A general rule of thumb is to estimate the standard deviation as one-sixth of the range, since about six standard deviations (±3) span nearly all values of a normal distribution
o In most cases, the size of the population does not have a major effect on the sample size
  ▪ The variance of the population has a greater effect on sample size requirements than does the population size
o Determining the sample size for a proportion requires the researcher to judge the confidence level and the maximum allowance for random sampling error, and to estimate the expected proportion (p = 0.5 is the most conservative choice)
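The three factors listed above combine in the standard formulas n = (ZS/E)² for a mean and n = Z²pq/E² for a proportion, where E is the acceptable error and q = 1 − p. The sketch below is an illustration under assumptions: the $10 to $100 spending range and the error allowances are hypothetical, the standard deviation is guessed with the one-sixth-of-the-range rule, the function names are illustrative, and no finite-population correction is applied, consistent with population size having little effect on sample size.

```python
import math
from statistics import NormalDist


def z_for(level):
    """Two-sided standard normal critical value, e.g. about 1.96 for 95%."""
    return NormalDist().inv_cdf((1 + level) / 2)


def sample_size_mean(std_dev, max_error, level=0.95):
    """n = (Z * S / E)^2 for estimating a mean, rounded up to a whole respondent."""
    return math.ceil((z_for(level) * std_dev / max_error) ** 2)


def sample_size_proportion(p, max_error, level=0.95):
    """n = Z^2 * p * (1 - p) / E^2 for estimating a proportion, rounded up."""
    return math.ceil(z_for(level) ** 2 * p * (1 - p) / max_error ** 2)


# Rule of thumb: with no pilot data, guess S as one-sixth of the expected range.
s_guess = (100 - 10) / 6    # hypothetical weekly spending range of $10 to $100, so S = 15
print(sample_size_mean(s_guess, max_error=2))        # estimate within +/- $2 at 95% confidence
print(sample_size_proportion(0.5, max_error=0.05))   # within +/- 5 points, conservative p = 0.5
```

Doubling the assumed standard deviation, tightening the error, or raising the confidence level all push n upward, matching the factors listed above.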
Assess the Potential for Nonresponse Bias
o Researchers provide an assessment of generalizability in their reports
o Nonresponse bias, the bias introduced when selected sample units provide no response, can significantly damage generalizability
o Nonrespondents should routinely be considered a threat to external validity, because there could be a systematic reason that members selected for inclusion from the sampling frame did not respond
o Auxiliary variables: variables the researcher builds into a survey that allow a comparison between sample units that do not respond and those that do
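Auxiliary variables make a simple nonresponse check possible: compare respondents and nonrespondents on a variable known for every sampled unit. The sketch below assumes hypothetical records in which customer tenure (`tenure_years`) is available from the sampling frame for all units; the field names, the helper `compare_auxiliary`, and the data are invented for illustration. A large gap hints at a systematic difference between the groups, but deciding whether it exceeds chance would require a formal test, which this sketch does not perform.

```python
import statistics


def compare_auxiliary(units, variable):
    """Compare an auxiliary variable's mean for respondents vs nonrespondents.

    `units` is a list of dicts with a boolean "responded" key (hypothetical field names).
    """
    resp = [u[variable] for u in units if u["responded"]]
    nonresp = [u[variable] for u in units if not u["responded"]]
    return {
        "response_rate": len(resp) / len(units),
        "respondent_mean": statistics.mean(resp),
        "nonrespondent_mean": statistics.mean(nonresp),
        "difference": statistics.mean(resp) - statistics.mean(nonresp),
    }


sample = [                                   # hypothetical sampled units
    {"responded": True, "tenure_years": 6},
    {"responded": True, "tenure_years": 8},
    {"responded": True, "tenure_years": 7},
    {"responded": False, "tenure_years": 2},
    {"responded": False, "tenure_years": 3},
    {"responded": True, "tenure_years": 5},
]
print(compare_auxiliary(sample, "tenure_years"))
```

In this invented data, respondents have much longer tenure than nonrespondents, the kind of systematic difference that would limit how far survey results generalize.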
