Introduction to Data Science Curriculum
Lab 2I - R’s Normal Distribution Alphabet


    Directions: Follow along with the slides, completing the questions in blue on your computer, and answering the questions in red in your journal.

    Where we're headed

    • In the last lab, you overlaid a normal curve on histograms of data to help decide whether the data's distribution is close to a normal distribution.

      – We also saw that the differences in means produced by random shuffles tend to be approximately normally distributed.

    • In this lab, we'll learn how to use some other R functions to:

      – Simulate random draws from a normal distribution.

      – Calculate probabilities with normal distributions.

      – Find the value that corresponds to a given probability.

    Get set up

    • Start by loading the titanic data. Then shuffle the survival status 500 times, calculating the mean age of survivors and of non-survivors for each shuffle.

      – Assign the result the name shfls.

    • After creating shfls, use mutate to add a new variable named diff: the mean age of those who survived minus the mean age of those who died.

    • Finally, calculate the mean and standard deviation (sd) of the diff variable.

      – Assign these values the names diff_mean and diff_sd, respectively.
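    The shuffling steps above can be sketched in base R. This is a minimal illustration, not the lab's own code: it uses a small hypothetical data frame named toy in place of the titanic data, and a hypothetical helper named shuffle_diff that permutes the survival labels once and returns the mean age of "survivors" minus "non-survivors". Repeating it 500 times with replicate() plays the role of shfls and its diff variable.

```r
# HYPOTHETICAL toy data standing in for the titanic data, so this runs alone.
set.seed(2024)
toy <- data.frame(
  age      = c(22, 38, 26, 35, 54, 2, 27, 14, 4, 58, 20, 39),
  survived = c("no", "yes", "yes", "yes", "no", "no",
               "yes", "yes", "yes", "yes", "no", "no")
)

# One shuffle (hypothetical helper): permute the survival labels at random,
# then return mean age of "yes" minus mean age of "no".
shuffle_diff <- function(age, group) {
  shuffled <- sample(group)
  mean(age[shuffled == "yes"]) - mean(age[shuffled == "no"])
}

# 500 shuffles give the distribution of differences under chance alone.
diff <- replicate(500, shuffle_diff(toy$age, toy$survived))

diff_mean <- mean(diff)
diff_sd   <- sd(diff)

# Actual (unshuffled) difference: survivors minus non-survivors.
actual_diff <- mean(toy$age[toy$survived == "yes"]) -
               mean(toy$age[toy$survived == "no"])

c(diff_mean = diff_mean, diff_sd = diff_sd, actual_diff = actual_diff)
```

    Because shuffling breaks any link between age and survival, the shuffled differences should center near 0.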

    Is it normal?

    • Before we proceed, we need to check that our diff variable is approximately normally distributed.

      – Is the distribution close to normal? Explain how you determined this. Describe the center and spread of the distribution.

      – Compute and write down the actual difference in mean age: the mean age of the actual survivors minus the mean age of the actual non-survivors.
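    Alongside the visual check from the last lab, a rough numeric check can help: in a normal distribution, about 68% of values fall within 1 standard deviation of the mean and about 95% fall within 2. The sketch below assumes a hypothetical helper named within_sd and uses simulated values as a stand-in for your diff variable.

```r
# Hypothetical helper: share of values within k standard deviations of the mean.
within_sd <- function(x, k) {
  mean(abs(x - mean(x)) <= k * sd(x))
}

set.seed(1)
x <- rnorm(500)   # stand-in for diff; replace with your own shuffled differences

c(within_1 = within_sd(x, 1), within_2 = within_sd(x, 2))

# What the normal model predicts for comparison.
pnorm(1) - pnorm(-1)   # about 0.683
pnorm(2) - pnorm(-2)   # about 0.954
```

    Shares far from these benchmarks suggest the normal model may be a poor fit.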

    Using the normal model

    • Since our diff variable appears approximately normally distributed, we can use a normal model to estimate the probability of seeing differences more extreme than the one in our actual data.

    • Draw a sketch of a normal curve. Label the mean age difference from your shuffles and the actual age difference (survivors minus non-survivors) from the actual data. Then shade in the area under the normal curve for differences smaller than the actual difference.

    • Fill in the blanks to use the normal model to calculate the probability of a difference even smaller than our actual difference.

      pnorm(____, mean = diff_mean, sd = ____)
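    To see what pnorm returns, here is a worked example with made-up values for diff_mean, diff_sd, and the actual difference (all hypothetical; your numbers will differ). pnorm gives the area under the normal curve to the left of its first argument, which can also be found from a z-score.

```r
# HYPOTHETICAL values for illustration only.
diff_mean   <- 0     # center of the shuffled differences
diff_sd     <- 1.5   # spread of the shuffled differences
actual_diff <- -3    # observed difference (survivors minus non-survivors)

# Area under the normal curve to the LEFT of actual_diff.
p_smaller <- pnorm(actual_diff, mean = diff_mean, sd = diff_sd)
p_smaller            # about 0.023, since -3 is 2 SDs below the mean

# Same answer from a z-score and the standard normal curve.
z <- (actual_diff - diff_mean) / diff_sd
pnorm(z)
```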
      

    Extreme probabilities

    • The probability you calculated on the previous slide estimates how often we would expect to see a difference smaller than the one we actually observed, by chance alone.

    • Because the total area under the normal curve is 1, we can instead calculate the probability that the difference would be larger than the one observed by running (fill in the blanks):

      1 - pnorm(____, mean = diff_mean, sd = ____)
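    A sketch of the upper-tail calculation with the same kind of hypothetical numbers. R's pnorm also accepts lower.tail = FALSE, which returns the area to the right directly.

```r
# HYPOTHETICAL values; substitute your own.
diff_mean   <- 0
diff_sd     <- 1.5
actual_diff <- 3

# Right-tail area as 1 minus the left-tail area.
p_larger  <- 1 - pnorm(actual_diff, mean = diff_mean, sd = diff_sd)

# Right-tail area requested directly.
p_larger2 <- pnorm(actual_diff, mean = diff_mean, sd = diff_sd, lower.tail = FALSE)

c(p_larger, p_larger2)  # both about 0.023
```

    For very extreme values, lower.tail = FALSE is the safer choice: 1 - pnorm(10) rounds to exactly 0 in R, while pnorm(10, lower.tail = FALSE) still returns a tiny positive probability.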
      

    Simulating normal draws

    • We can simulate random draws from a normal distribution with the rnorm function.

      – Fill in the blanks in the following line of code to simulate the heights of 100 randomly chosen men. Assume the mean height is 67 inches and the standard deviation is 3 inches.

      draws <- rnorm(____, mean = ____, sd = ____)
      

      – Plot your simulated heights with a histogram.

      histogram(draws, fit = ____)
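    One possible completion, written as a base-R alternative to the lab's histogram() call. set.seed() is an addition (not in the lab) that makes the draws reproducible, and hist() with freq = FALSE puts the bars on the density scale so the normal curve drawn with curve() and dnorm() lines up with them.

```r
set.seed(67)                               # reproducible draws
draws <- rnorm(100, mean = 67, sd = 3)     # 100 simulated heights (inches)

c(mean = mean(draws), sd = sd(draws))      # should land near 67 and 3

# Histogram on the density scale with the normal model overlaid.
hist(draws, freq = FALSE, main = "Simulated heights", xlab = "Height (inches)")
curve(dnorm(x, mean = 67, sd = 3), add = TRUE)
```

    Each run without set.seed() gives a different sample, so the sample mean and sd will wobble around 67 and 3.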
      

    P's and Q's

    • We've seen that pnorm calculates a probability from a specified value.

      – That's why it's called "p" norm: the "p" stands for probability.

    • Now we'll do the opposite: find the value (called a quantile) that corresponds to a specified probability.

      – That's why this function is called "q" norm: the "q" stands for quantile.

    • How tall can a man be and still be in the shortest 25% of heights, if heights are normally distributed with a mean of 67 inches and a standard deviation of 3 inches?

      qnorm(____, mean = ____, sd = ____)
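    pnorm and qnorm undo each other. The sketch below illustrates this with percentiles other than the 25th one asked about above; the 90th-percentile line also shows that any normal quantile equals the mean plus a z-score times the standard deviation.

```r
# The median of a normal curve equals its mean: half the area lies below 67.
qnorm(0.5, mean = 67, sd = 3)            # 67

# For the standard normal curve, 97.5% of the area lies below about 1.96.
z_975 <- qnorm(0.975)
z_975

# qnorm undoes pnorm: plugging the cutoff back in returns the probability.
pnorm(z_975)                             # 0.975

# Any normal quantile is mean + z * sd (here, the 90th percentile).
q90 <- qnorm(0.90, mean = 67, sd = 3)
all.equal(q90, 67 + qnorm(0.90) * 3)     # TRUE
```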
      

    On your own

    Conduct one of the statistical investigations below:

    • Using the titanic data:

      – Were women on the Titanic typically younger than men?

      – Use a histogram, 500 random shuffles, and a normal model to answer this question.

    • Using the cdc data:

      – Using 500 random shuffles and a normal model, how much taller would the typical male have to be than the typical female for the difference to fall in the upper 1% of differences produced by chance alone?

      – How can we use this value to justify the claim that the average male in our data is taller than the average female?
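    For the cdc question, the cutoff for the upper 1% of a normal model is its 99th percentile, which qnorm can find. The diff_mean and diff_sd values below are hypothetical placeholders, not results from the cdc data; replace them with the mean and sd of your own 500 shuffled height differences.

```r
# HYPOTHETICAL shuffle results; not computed from the cdc data.
diff_mean <- 0
diff_sd   <- 0.8

# Difference that separates the upper 1% of the normal model from the rest.
cutoff_99 <- qnorm(0.99, mean = diff_mean, sd = diff_sd)
cutoff_99            # about 1.86 with these made-up numbers

# Equivalent: ask for the value with 1% of the area in the upper tail.
qnorm(0.01, mean = diff_mean, sd = diff_sd, lower.tail = FALSE)
```

    If the actual difference in mean height is larger than this cutoff, a difference that large would occur less than 1% of the time by chance alone.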


      Questions? Contact IDS Support at support@idsucla.org