# Hw1-2023

RICE UNIVERSITY STAT 425 Bayesian Statistics, Fall 2023: Prof. Daniel R. Kowal

Assignment #1

This assignment is due on Canvas at the beginning of class (2:30pm CT) on Tuesday, September 5, 2023. Late homework is not accepted.

© Daniel R. Kowal 2023

1. COVID tests and conditional probability.

   Consider a COVID test for an individual. Let test+ be the event that the test is positive and covid+ be the event that the individual has COVID. Similarly define test- and covid- for a negative test and not having COVID, respectively. Suppose this particular test has 80% sensitivity, $p(\text{test+} \mid \text{covid+}) = 0.80$, and 97% specificity, $p(\text{test-} \mid \text{covid-}) = 0.97$. Finally, suppose that 5% of the population currently has COVID, so $p(\text{covid+}) = 0.05$. These values for sensitivity and specificity are the minimum recommendations for antigen (rapid) tests according to the World Health Organization.

   (a) What is $p(\text{test+})$? Express it in terms of the sensitivity, specificity, and proportion of the population with COVID.

   (b) What is the probability that an individual is COVID positive, given that they have tested positive?

   (c) What is the probability that an individual is COVID negative, given that they have tested negative?

   (d) How do the probabilities in (b) and (c) change if we vary the following terms? Note: only change one at a time.

      i. If the sensitivity is much lower, at 50%?

      ii. If the specificity is much higher, at 100%?

      iii. If instead we consider a rare disease that occurs in only 0.01% of the population?

2. Importance of the prior parameters.

   For this question, we will expand upon the survey data from class. Let $y_i = 1$ for every student who answered "yes" and $y_i = 0$ otherwise, and let $y = \sum_{i=1}^{n} y_i$. We will use the binomial model

   $$y \mid \theta \sim \text{Binomial}(n, \theta).$$

   For each sample size $n \in \{10, 100, 1000\}$, build a "fake" dataset $y = n\hat{y}$, each using the same proportion $\hat{y}$ that we recorded in class. Note: You may round $y$ to the nearest integer.
   (a) Consider a uniform prior on $\theta$. For each $(n, y)$ computed above:

      i. Compute the posterior mean and variance of $\theta$.

      ii. Compute the posterior probability that $\theta$ exceeds 0.5.

      iii. Plot the posterior distribution of $\theta$.

   (b) Now consider yourself a "Bayesian adversary": using the prior $\theta \sim \text{Beta}(\alpha, \beta)$, pick values of $\alpha$ and $\beta$ that you think are absurd or unhelpful. Repeat (a)i-iii for each $(n, y)$. Note: remember that $\alpha, \beta > 0$.

   (c) What do these results suggest about the importance of the prior as $n$ grows large?

3. Binomial model with rare events.

   Suppose we are interested in the prevalence of a rare genetic trait. After consulting the literature, you found that previous studies show that the trait prevalence ranges from about 0.05 to 0.20, with an average prevalence of 0.10. A small random sample of $n = 20$ individuals was checked for presence or absence of the trait. Let $y_i$ be an indicator of the trait for subject $i$ and let $y = \sum_{i=1}^{n} y_i$. We use the following model:

   $$y \mid \theta \sim \text{Binomial}(n, \theta), \qquad \theta \sim \text{Beta}(\alpha, \beta)$$

   (a) Specify a choice of $(\alpha, \beta)$ and justify your choice.

   (b) Suppose $y = 0$, i.e., we do not observe the trait in any of the $n = 20$ subjects. Plot the posterior distribution together with the prior distribution and the sample proportion (and MLE) $\hat{y} = y/n$.

   (c) Compute a 95% posterior credible interval for $\theta$. Compare this interval with a 95% frequentist confidence interval

   $$\hat{y} \pm 1.96 \sqrt{\hat{y}(1 - \hat{y})/n}$$

   (d) As a "stress test", recompute the frequentist confidence interval for $y = 1, 2, \ldots$ until both endpoints of the interval belong to $[0, 1]$. How large does $y$ need to be? Why is this important?
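
The Beta-binomial calculations that Problems 2 and 3 rely on can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference solution: all function names are hypothetical, the uniform Beta(1, 1) prior in the demo is only a placeholder for whatever prior you choose, and the CDF uses a crude midpoint-rule integration built on the standard library. That integration can be inaccurate when $\alpha$ or $\beta$ is below 1; a library routine such as `scipy.stats.beta` is likely a more robust choice.

```python
import math


def beta_posterior(y, n, alpha=1.0, beta=1.0):
    """Conjugate update: Beta(alpha, beta) prior with y successes in n trials."""
    return alpha + y, beta + n - y


def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


def beta_pdf(x, a, b):
    """Beta(a, b) density, valid for 0 < x < 1 (computed on the log scale)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def beta_cdf(x, a, b, cells=10_000):
    """Approximate P(theta <= x) with a midpoint rule on [0, x]."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    h = x / cells
    total = sum(beta_pdf((i + 0.5) * h, a, b) for i in range(cells))
    return min(1.0, total * h)


def beta_quantile(p, a, b, tol=1e-6):
    """Invert the approximate CDF by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def credible_interval(a, b, level=0.95):
    """Equal-tailed posterior credible interval."""
    tail = (1 - level) / 2
    return beta_quantile(tail, a, b), beta_quantile(1 - tail, a, b)


def wald_interval(y, n, z=1.96):
    """Frequentist interval y_hat +/- z * sqrt(y_hat * (1 - y_hat) / n)."""
    y_hat = y / n
    half = z * math.sqrt(y_hat * (1 - y_hat) / n)
    return y_hat - half, y_hat + half


# Demo: y = 0 and n = 20 as in Problem 3(b); the uniform prior is a placeholder.
a, b = beta_posterior(y=0, n=20, alpha=1.0, beta=1.0)
print("posterior parameters:", (a, b))
print("posterior mean, variance:", beta_mean_var(a, b))
print("P(theta > 0.5 | y):", 1 - beta_cdf(0.5, a, b))
print("95% credible interval:", credible_interval(a, b))
print("95% Wald interval:", wald_interval(0, 20))
```

Swapping in other values of `alpha`, `beta`, `y`, and `n` reuses the same functions for each case in Problem 2. Note that `wald_interval` does not clip its endpoints to $[0, 1]$, so it can be used directly for the check in Problem 3(d).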