Lab 3C: Random Sampling
Directions: Follow along with the slides, completing the questions in blue on your computer, and answering the questions in red in your journal.
Learning by sampling
- In many circumstances, there's simply no feasible way to gather data about everyone in a population.
  – For example, the Department of Water & Power (DWP) wants to determine how much water people in Los Angeles use to take a shower. They've created a survey to pass out to collect this information.
  – Write down two reasons why getting everyone in Los Angeles to fill out the survey would be difficult. Also, write a sentence explaining why the DWP might consider using a sample of households instead.
- In this lab, we'll learn how sampling methods affect how representative a sample is of a population.
Loading a population
- In previous labs, we used the `cdc` data as a sample of young people in the United States.
  – In this lab, we'll consider these survey respondents to be our population.
- Load the `cdc` data into R and fill in the blanks to take a convenience sample of the first 50 people in the data:
  `s1 <- slice(____, 1:____)`
- Why do you think we call this method a convenience sample?
Comparing your convenience sample
- A convenience sample is a sample from a population where we collect data on subjects because they're easy to find.
- Using your convenience sample, create a `bargraph` for the number of people in each `grade`.
  – Do you think the distribution of `grade` for your sample would look similar when compared to the whole `cdc` data?
  – Which groups of people do you think are over- or under-represented in your convenience sample? Why?
- Create a `bargraph` for `grade` using the `cdc` data.
  – Compare the distributions of the `cdc` data and your convenience sample and write down how they differ.
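To see how a convenience sample can differ from its population, here is a minimal base R sketch. It does not use the `cdc` data: `pop` is a hypothetical toy population, and the assumption that its rows are stored in grade order is made purely for illustration. The real data may be ordered differently.

```r
# Hypothetical toy population: 1,000 students, 250 per grade,
# stored in grade order (an assumption made only for this sketch).
grade_levels <- c("9", "10", "11", "12")
pop <- data.frame(
  id    = 1:1000,
  grade = rep(grade_levels, each = 250)
)

# Convenience sample: simply take the first 50 rows.
conv <- head(pop, 50)

# Proportion of each grade in the population and in the sample.
pop_props  <- prop.table(table(factor(pop$grade,  levels = grade_levels)))
conv_props <- prop.table(table(factor(conv$grade, levels = grade_levels)))

print(round(pop_props, 2))
print(round(conv_props, 2))
```

In this toy setup every sampled student comes from the first grade listed, even though each grade makes up a quarter of the population.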
Using randomness
- Fill in the blanks below to create a sample by randomly selecting 50 people in the `cdc` data, without replacement. Call this new sample `s2`:
  `___ <- sample(___, size = ___, replace = ___)`
- Write a sentence that explains why you think the distribution of `grade` for this random sample will look more or less similar to the distribution from the whole `cdc` data.
- Create a `bargraph` for `grade` based on this random sample to check your prediction.
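The idea of sampling without replacement can be sketched in base R on a hypothetical toy population (`pop` and its columns are made up for this example, not taken from the `cdc` data). In plain base R, `sample()` does not pick rows of a data frame, so this sketch draws row numbers first and then keeps those rows. The lab's own `sample()` call is assumed to be provided by the packages loaded in your lab environment.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical toy population of 1,000 students.
pop <- data.frame(
  id    = 1:1000,
  grade = rep(c("9", "10", "11", "12"), each = 250)
)

# Draw 50 distinct row numbers, then keep only those rows.
rows <- sample(nrow(pop), size = 50, replace = FALSE)
rand <- pop[rows, ]

# Without replacement, no student can be chosen twice.
stopifnot(!any(duplicated(rand$id)))

# Proportion of each grade in the random sample.
print(prop.table(table(rand$grade)))
```

Setting `replace = TRUE` instead would allow the same student to be drawn more than once.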
Increasing sample size
- Create `bargraph`s for `grade` based on each of the following sample sizes: 10, 100, 1,000, 10,000.
  – Compare each distribution to that of the population.
- How do the distributions change as the size of the sample increases? Why do you think this occurs?
- Use `tally()` to compute the proportion of each `grade` for your convenience sample and all your random samples.
  – Which set of proportions looks most similar to the proportions of the population?
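One way to put a number on "looks similar" is to measure the largest gap between a sample's grade proportions and the population's. The base R sketch below does this for the four sample sizes above. The population of 20,000 students and its grade shares are made up for illustration, and `max_gap` is a hypothetical helper, not a function from the lab.

```r
set.seed(2)  # arbitrary seed so the sketch is reproducible

grade_levels <- c("9", "10", "11", "12")

# Hypothetical population of 20,000 students with made-up grade shares.
pop_grade <- factor(
  rep(grade_levels, times = c(6000, 5500, 4500, 4000)),
  levels = grade_levels
)
pop_props <- prop.table(table(pop_grade))

# Largest gap between a random sample's grade proportions
# and the population's grade proportions.
max_gap <- function(n) {
  s <- sample(pop_grade, size = n, replace = FALSE)
  max(abs(prop.table(table(s)) - pop_props))
}

sizes <- c(10, 100, 1000, 10000)
gaps  <- sapply(sizes, max_gap)
print(data.frame(size = sizes, max_gap = round(gaps, 3)))
```

Running this several times with different seeds shows how much the gap bounces around at small sample sizes compared with large ones.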
Lessons learned
- The mean, or proportion, from a random sample will not always be closer to the true population value than the one from a convenience sample.
- However, as sample sizes get larger:
  – Estimates from random samples tend to get closer to the population values.
  – With convenience samples, this might not be the case.
- Write down a reason why estimates based on convenience samples might not improve even as sample size increases.