Lesson 13: RStudio Basics
Objective:
Students will learn RStudio/Posit Cloud’s interface, as well as a few basic commands to discover the structure behind a dataset.
Materials:
-
Computer
-
Projector
-
RStudio: https://portal.idsucla.org
-
Video showing how to log into RStudio/Posit Cloud for the first time found here.
Vocabulary:
pane, preview, console, plot, environment
RStudio Commands:
data( ), View( ), names( ), help( ), dim( ), tally( ), load_labs( )
Essential Concepts:
The computer has a syntax, and it can only understand if you speak its language.
Before inviting students to your RStudio/Posit Cloud Teacher Space, ensure that:
a) Students sign in to RStudio/Posit Cloud with their school email address, using the "Log In With Google" option.
b) Each student watches the video showing how to log into RStudio/Posit Cloud for the first time found here.
c) You are familiar with managing your RStudio/Posit Teacher Space. See video here.
d) You have watched this video in preparation for this lesson.
Lesson:
-
Inform students that the Dashboard and PlotApp are data visualization tools that are coded in R, the statistical programming software that academics and professional statisticians use. The Introduction to Data Science course will utilize RStudio, which also runs on R. Students will learn R, the programming language used in RStudio, for data analysis.
-
Demonstrate how to access RStudio/Posit Cloud by projecting the URL: https://portal.idsucla.org on a screen. Then, click on the RStudio (Posit Cloud) icon on the page.
-
Inform students that they will log into RStudio/Posit Cloud using the "Log In with Google" option. Note that this is not the same as their IDS App & IDS Homepage login.
-
Once logged in, show each pane, or rectangular area, of the RStudio/Posit Cloud interface:
-
preview (spreadsheet) - where they will be able to see the data in rows and columns: each row is an observation (with an index) and each column is a variable
-
console - where they will be entering their code
-
plot - where their plots/graphs/visualizations will be generated
-
environment - where they will see values and objects
-
Inform students that they will be looking at a dataset from The Centers for Disease Control and Prevention (CDC), a government agency that collects data about teenagers on a variety of topics.
-
Demonstrate how to load the CDC data file into the workspace and view it by typing the following commands in the console:
>data(cdc)
>View(cdc)
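To rehearse this step outside the course environment, the following is a minimal sketch. It assumes R's built-in iris dataset as a stand-in for cdc, which is only available through the course's RStudio setup. It uses head( ) to print the first rows in the console, since View( ) needs the interactive preview pane.

```r
# Sketch only: iris (built into R) stands in for the cdc dataset.
data(iris)                     # load the dataset into the workspace

# head() prints the first few rows in the console;
# View(iris) would open the same data in the preview pane instead.
first_rows <- head(iris, 3)
print(first_rows)
```

As in the preview pane, each printed row is one observation and each column is one variable.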
-
Examine the environment pane. Ask a student to describe how the data are displayed. Answer: The data are displayed in rows and columns.
-
Demonstrate how to list the variables found in the CDC dataset. Students may take notes and write down commands in their DS journals:
-
>names(cdc)
-
Ask: What do you notice? What is one variable of this dataset? How many variables are there? How does this output compare to the information in the preview pane? Answer: This command lists the names of each variable in the dataset. There are 32 variables; age, sex, grade, height, weight, etc. Answers will vary about how this output compares to the information in the preview pane.
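The same idea can be tried on any data frame. The sketch below again assumes the built-in iris dataset as a stand-in for cdc, so the variable names and their count differ from the answers above.

```r
# Sketch only: iris (built into R) stands in for the cdc dataset.
data(iris)

variable_names <- names(iris)    # character vector with one name per variable
print(variable_names)

# Counting the names gives the number of variables (columns).
print(length(variable_names))
```

Students can compare the printed names with the column headers shown in the preview pane.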
-
Demonstrate how to obtain more detailed information about the dataset by typing the following command in the console:
-
>help(cdc)
-
Ask: What unit of measurement is height reported in? Answer: Height was reported in meters.
-
Demonstrate how to find the number of rows and columns in the dataset.
-
>dim(cdc)
-
Ask: Which number do you think represents the rows? Which one represents the columns? How does this output compare to the information in the preview and environment panes? How many observations are there in the dataset? How many variables does this dataset contain? Answer: The first number represents the rows and the second number represents the columns. There are 17,232 rows, or 17,232 observations; and there are 32 columns, or 32 variables. This information is also visible in the preview pane.
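A short sketch of how the two numbers from dim( ) map to rows and columns, assuming the built-in iris dataset as a stand-in for cdc (so the counts differ from the cdc answer above):

```r
# Sketch only: iris (built into R) stands in for the cdc dataset.
data(iris)

size <- dim(iris)      # two numbers: rows first, then columns
n_obs  <- size[1]      # rows = observations
n_vars <- size[2]      # columns = variables
cat("observations:", n_obs, " variables:", n_vars, "\n")

# nrow() and ncol() report the same two numbers one at a time.
stopifnot(n_obs == nrow(iris), n_vars == ncol(iris))
```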
-
Next, show students how to count the number of observations in each category of a specific variable.
-
>tally(~seat_belt, data = cdc)
-
Ask: What do you notice? Describe the output. Answer: Notice that six categories are displayed. Each category shows the number of observations contained in it. E.g., “Never” has 265 observations, meaning 265 teens reported never wearing their seat belt as a passenger in a motor vehicle. <NA> stands for Not Available and represents teens that did not provide information about their seat belt habits.
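To illustrate how category counts and <NA> arise, here is a minimal sketch using base R's table( ) as a stand-in for tally( ), which comes with the course setup. The responses in seat_belt_demo are hypothetical values made up for this example; they are not taken from the cdc data.

```r
# Sketch only: hypothetical responses, not real cdc data.
seat_belt_demo <- c("Always", "Never", "Sometimes", "Always",
                    NA, "Always", "Never", NA)

# table() counts how many observations fall in each category;
# useNA = "ifany" adds an <NA> count when responses are missing.
counts <- table(seat_belt_demo, useNA = "ifany")
print(counts)
```

Each count is the number of observations in that category, and the <NA> column counts the missing responses.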
-
Change the variable to height.
-
>tally(~height, data = cdc)
-
Ask: What do you notice? Describe the output. Answer: The levels are missing. This happened because the variable 'height' contains numbers, not categories.
-
Let’s take a closer look at the variables seat_belt and height. Maximize the console. Ask teams to discuss the following question:
What is the difference between the data from the variables seat_belt and height? Answer: The data from the 'seat_belt' variable are categorical, which means they consist of groupings. The data from the variable 'height' are numerical, which means they consist of numbers.
-
Summarize: In data science, the variable seat_belt is what we call a categorical variable, and the variable height is what we call a numerical variable.
-
Let’s look at the other variables in this dataset. In pairs, categorize each variable as categorical or numerical:
-
eat_fruit Answer: categorical
-
weight Answer: numerical
-
grade Answer: categorical
-
drive_text Answer: categorical
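One way to check how R stores each variable is sketched below, assuming the built-in iris dataset as a stand-in for cdc. Note that this only reports the storage type of each column; a variable such as grade may still be treated as categorical regardless of how it is stored, so the output is a starting point for discussion rather than a final answer.

```r
# Sketch only: iris (built into R) stands in for the cdc dataset.
data(iris)

# is.numeric() is applied to every column; TRUE means the column holds numbers.
is_numerical <- sapply(iris, is.numeric)

# Label each variable based on how R stores it.
variable_type <- ifelse(is_numerical, "numerical", "categorical")
print(variable_type)
```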
-
Inform students that they will be learning RStudio code to work with data. They will be completing RStudio labs throughout the course.
-
Demonstrate how to load the menu of labs by typing the following code:
>load_labs( )
-
The load_labs( ) command displays a list of available labs and a selection prompt. To select Lab 1A, type the number 1 after the selection prompt.
-
Next, direct students’ attention to the plot pane. Show them the location of Lab 1A’s presentation.
-
Click on the arrows at the bottom right-hand side of the presentation to view each slide. Pause on a slide titled “R’s most important syntax.” There are 3 boxes, each containing a line of code.
-
Explain that every time they see a grey box with a line of code, they are to type the code in the console. The output will appear either on the console itself or on the plot pane.
-
Type in one of the lines of code. In this particular case, the output will be a plot. Show students the location of the plot and demonstrate how to toggle between the plots and presentation tabs.
-
Inform students that they will be completing the first lab, 1A, the next day.
Class Scribes:
One team of students will give a brief talk to discuss what they think the 3 most important topics of the day were.
Homework & Next 3 Days
Students should continue to collect nutritional facts data using the Food Habits Participatory Sensing campaign on their smart devices or via web browser.