School

Georgia Institute of Technology

Course

ISYE 6414

Subject

Statistics

Date

Sep 25, 2023

Uploaded by schooler18 on coursehero.com

Data Example
Slide 3:
In this example, we want to analyze the number of survival days for patients with different types
of cancer who were treated with ascorbate.
Slide 4:
The response variable is the number of survival days, and each patient belongs to a specific
cancer type. There are five different cancer types, and the sample sizes vary across the groups.
●
Response variable: Number of survival days for patients with different cancer types.
●
'j' is the index for a patient, 'i' is the index for the cancer type/group (k = 5 groups).
●
Five groups representing different organ cancers: stomach, bronchus, colon, ovary, breast.
●
Varying sample sizes across groups, e.g., ovarian cancer with only six observations.
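With these indices, the one-way ANOVA model is usually written as below. This is a sketch of the standard textbook formulation rather than a formula copied from the slides: the symbols μ_i (mean response for cancer type i), n_i (sample size of group i), N (total number of patients), and σ² (common error variance) are our notation. In this example the response Y_ij is the (log-transformed) survival time of patient j with cancer type i.

```latex
Y_{ij} = \mu_i + \varepsilon_{ij}, \qquad
\varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2), \qquad
i = 1, \dots, k, \quad j = 1, \dots, n_i, \quad N = \sum_{i=1}^{k} n_i

H_0: \mu_1 = \mu_2 = \cdots = \mu_k
\quad \text{vs.} \quad
H_a: \mu_i \neq \mu_{i'} \text{ for at least one pair } (i, i')
```

Unequal n_i, such as the six ovarian cancer patients, are allowed in this model; they only change how the sums of squares and the pairwise intervals are weighted.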
Slide 5:
As part of the exploratory data analysis, we examine the distribution of the response variable.
The histogram reveals a skewed distribution, indicating that the normality assumption does not
hold for the raw survival days. To address this, a log transformation is applied to make the distribution closer to normal.
●
Data loaded in R using read.table().
●
Extract response variable ('survival') from 'cancer_data'.
●
Histogram using hist() to check distribution.
●
Skewed distribution indicates non-normality.
●
Log transformation applied to reduce skewness.
●
Distribution becomes more symmetric with two modes.
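The same check can be sketched numerically in Python. This is a minimal illustration, assuming a plain list of survival days: the toy numbers below are made up (not the course data), and the moment-based skewness coefficient is our stand-in for eyeballing the hist() output. A positive value indicates a long right tail; a value near zero indicates a roughly symmetric distribution.

```python
import math
import statistics


def sample_skewness(values):
    """Moment-based skewness: mean cubed deviation divided by the cubed
    (population) standard deviation. Positive means a long right tail."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (n * sd ** 3)


def log_transform(survival_days):
    """Natural log of each survival time (R's log() is also base e)."""
    if any(d <= 0 for d in survival_days):
        raise ValueError("log transform needs strictly positive survival days")
    return [math.log(d) for d in survival_days]


# Hypothetical toy data: each value doubles, giving a long right tail.
days = [5, 10, 20, 40, 80, 160, 320, 640]
log_days = log_transform(days)

print(f"skewness of days:     {sample_skewness(days):.3f}")
print(f"skewness of log days: {sample_skewness(log_days):.3f}")
```

Because the toy values grow by a constant factor, their logs are equally spaced, so the log-scale skewness is essentially zero. In R, the corresponding visual check is comparing hist(survival) with hist(log(survival)).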
Slide 6:
To implement ANOVA in R, we need the response variable (log of survival days) and the
categorical predictor variable (cancer type). The cancer type variable is converted into a factor
to indicate its categorical nature. A side-by-side boxplot is generated to compare the distribution
of log survival (center and spread) across the cancer types.
●
Log-transformed data used for ANOVA.
●
Two variables: log of survival and cancer type label.
●
Cancer type labels converted to a factor using as.factor().
●
Side-by-side boxplot shows survival distribution by cancer type.
●
Differences in center and variability observed between groups.
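The grouping step can be sketched in Python as follows. A dictionary keyed by the cancer-type label plays the role of R's factor (levels sorted alphabetically, which is what as.factor() does by default for character labels), and the five-number summary per group is roughly what each box in the side-by-side boxplot draws. The labels and toy survival values are hypothetical, and the quartiles come from statistics.quantiles(method="inclusive"), which can differ slightly from the hinges that R's boxplot() uses.

```python
import math
import statistics
from collections import defaultdict


def summarize_by_group(labels, survival_days):
    """Group log survival by cancer-type label and return, for each group,
    the sample size and a five-number summary (min, Q1, median, Q3, max)."""
    groups = defaultdict(list)
    for label, days in zip(labels, survival_days):
        groups[label].append(math.log(days))

    summary = {}
    for label in sorted(groups):  # alphabetical, like as.factor() levels
        values = sorted(groups[label])
        # Assumes at least two observations per group.
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary[label] = {
            "n": len(values),
            "min": values[0],
            "q1": q1,
            "median": median,
            "q3": q3,
            "max": values[-1],
        }
    return summary


# Hypothetical toy data (not the course data set).
labels = ["Stomach", "Breast", "Stomach", "Breast", "Breast", "Stomach"]
days = [30, 1000, 60, 100, 10000, 45]

for label, s in summarize_by_group(labels, days).items():
    print(label, s["n"], round(s["median"], 3))
```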

Slide 7:
The ANOVA command aov() in R is used to perform the analysis. The output is the
ANOVA table, which reports the treatment and error sums of squares, degrees of freedom,
mean squares, the F-statistic, and the p-value. Additionally, the model.tables()
command provides the overall mean and the mean for each cancer type.
●
ANOVA tests equality of means across cancer types.
●
ANOVA table provides sums of squares, degrees of freedom, mean squares, F-statistic, and p-value.
●
Degrees of freedom: k − 1 = 4 for cancer type, N − k for the residuals (error), where N is the total number of patients.
●
F-value: 4.286, p-value: 0.004.
●
model.tables() output shows the overall and group-specific means of log survival.
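To make the table entries concrete, here is a self-contained Python sketch of what summary(aov(...)) reports for a one-way layout: sums of squares, degrees of freedom (k − 1 and N − k), mean squares, the F-statistic, and its p-value, plus the overall and group means that model.tables(..., type = "means") displays. The F-distribution tail probability uses a hand-written regularized incomplete beta function (a standard continued-fraction evaluation) so only the standard library is needed. The function names and toy data are our own; for real analyses, R's aov() or an established statistics library is the better choice.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300  # guards against division by zero
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Each iteration applies one even-numbered and one odd-numbered term.
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for a, b > 0."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction where it converges quickly; otherwise use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(f_stat, df1, df2):
    """P(F > f_stat) for an F(df1, df2) random variable."""
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f_stat))


def one_way_anova(groups):
    """groups maps each cancer-type label to a list of log survival values."""
    means = {g: sum(ys) / len(ys) for g, ys in groups.items()}
    all_y = [y for ys in groups.values() for y in ys]
    n_total, k = len(all_y), len(groups)
    grand_mean = sum(all_y) / n_total
    ss_treat = sum(len(ys) * (means[g] - grand_mean) ** 2 for g, ys in groups.items())
    ss_error = sum((y - means[g]) ** 2 for g, ys in groups.items() for y in ys)
    df_treat, df_error = k - 1, n_total - k
    ms_treat, ms_error = ss_treat / df_treat, ss_error / df_error
    f_stat = ms_treat / ms_error
    return {
        "df": (df_treat, df_error),
        "ss": (ss_treat, ss_error),
        "ms": (ms_treat, ms_error),
        "F": f_stat,
        "p": f_upper_tail(f_stat, df_treat, df_error),
        "grand_mean": grand_mean,  # overall mean, as in model.tables(type = "means")
        "group_means": means,
    }


# Hypothetical toy data with three groups.
toy = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}
table = one_way_anova(toy)
print(table["df"], table["F"], table["p"])
```

For the toy data, SSTr = 54 and SSE = 6 on 2 and 6 degrees of freedom, so F = 27. With two numerator degrees of freedom the tail probability has the closed form (1 + 2f/df2)^(−df2/2), which also gives 0.001 here, a quick sanity check on the tail function.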
Slide 8:
To compare the means between pairs of cancer types, a pairwise comparison is conducted
using the TukeyHSD command. Confidence intervals for the difference in means are obtained.
The results indicate that the mean log survival for breast cancer patients is significantly
larger than for bronchus and stomach cancer patients.
●
TukeyHSD for pairwise mean comparisons.
●
Statistically significant differences between two pairs of means.
●
Significant pairs (mean log survival days): Breast vs. Bronchus, Breast vs. Stomach.
●
For these two pairs, the confidence intervals contain only negative values, i.e., they exclude zero.
●
Other pairs: differences not statistically significant.
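For unequal group sizes, R's TukeyHSD() uses the Tukey-Kramer form of the interval: the difference in group means plus or minus q · sqrt((MSE/2) · (1/n_i + 1/n_j)), where q is the studentized range quantile for k groups and N − k error degrees of freedom. The Python sketch below assumes the caller supplies that quantile as q_crit (for example, looked up with R's qtukey()), since the studentized range distribution is not in Python's standard library; the function name, the toy groups, and the q_crit value are illustrative only. As in TukeyHSD output, a pair labeled "later-earlier" reports the later level's mean minus the earlier level's, so an interval lying entirely below zero means the earlier level has the larger mean.

```python
import itertools
import math


def tukey_kramer_intervals(groups, mse, q_crit):
    """Pairwise Tukey-Kramer intervals for differences in group means.

    groups : dict mapping label -> list of (log) responses
    mse    : error mean square from the ANOVA table
    q_crit : studentized range quantile q(1 - alpha; k, N - k), supplied by
             the caller (assumption: e.g. obtained from R's qtukey())
    """
    labels = sorted(groups)  # alphabetical level order, as with as.factor()
    means = {g: sum(ys) / len(ys) for g, ys in groups.items()}
    results = {}
    for earlier, later in itertools.combinations(labels, 2):
        diff = means[later] - means[earlier]  # "later-earlier", as TukeyHSD labels it
        half_width = q_crit * math.sqrt(
            mse / 2.0 * (1.0 / len(groups[earlier]) + 1.0 / len(groups[later])))
        lower, upper = diff - half_width, diff + half_width
        results[f"{later}-{earlier}"] = {
            "diff": diff,
            "lwr": lower,
            "upr": upper,
            "significant": lower > 0 or upper < 0,  # interval excludes zero
        }
    return results


# Hypothetical toy groups; q_crit = 6.0 is an arbitrary illustrative value.
toy = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}
for pair, r in tukey_kramer_intervals(toy, mse=1.0, q_crit=6.0).items():
    print(pair, round(r["lwr"], 3), round(r["upr"], 3), r["significant"])
```

The dictionary keys mirror the diff, lwr, and upr columns of TukeyHSD output; the adjusted p-values that R also reports would need the studentized range distribution and are left out of this sketch.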
Slide 9:
Assessing the ANOVA assumptions is an essential part of evaluating model fit. Residual analysis is
performed to evaluate the assumptions of constant variance, independence, and normality. The
normal quantile-quantile (Q-Q) plot and the histogram show that the residuals are approximately
normally distributed, and the residual plots show no clear pattern.
●
Residual analysis assesses assumptions.
●
Assumptions: constant variance, independence, normality.
●
Normal Q-Q plot and histogram of the residuals evaluate normality.
●
Residuals appear approximately normally distributed.
●
No patterns observed in residual plots.
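The residual checks can also be sketched in Python. For one-way ANOVA the fitted value of each observation is its group mean, so the residual is the observation minus that mean. The sketch pairs the sorted residuals with standard normal quantiles at plotting positions (i − 0.5)/n, which is what R's ppoints() (used by qqnorm()) uses when n > 10; points lying close to a straight line suggest approximate normality. The function names and toy data are hypothetical, and the sketch returns the Q-Q points rather than drawing the plot.

```python
import statistics


def anova_residuals(groups):
    """Residual = observation minus its group mean (the one-way ANOVA fit)."""
    residuals = []
    for ys in groups.values():
        group_mean = statistics.fmean(ys)
        residuals.extend(y - group_mean for y in ys)
    return residuals


def normal_qq_points(residuals):
    """(theoretical N(0, 1) quantile, sorted residual) pairs for a Q-Q plot,
    using plotting positions (i - 0.5) / n."""
    n = len(residuals)
    std_normal = statistics.NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(residuals)))


# Hypothetical toy groups of log survival values.
toy = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.5, 6.0], "C": [7.0, 8.0, 9.5]}
resid = anova_residuals(toy)
points = normal_qq_points(resid)
for theoretical_q, observed in points:
    print(f"{theoretical_q:7.3f}  {observed:7.3f}")
```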
Slide 10:
Based on the ANOVA, there is strong evidence (p-value of about 0.004) of differences in mean log
survival days among the five types of cancer. Specifically, survival time is significantly
different for patients with breast cancer compared to those with bronchus or stomach cancer, with
breast cancer patients surviving longer on average on the log scale.

●
Strong evidence for survival differences among cancer types.
●
Statistically significant differences for Breast vs. Bronchus/Stomach cancers.
In summary, the example demonstrates how to implement ANOVA in R, interpret the ANOVA
output, conduct pairwise comparisons, and evaluate the assumptions and model fit. The
analysis provides insights into the differences in survival days among different types of cancer.