IDS Unit 1: Essential Concepts
Lesson 1: Data Trails
Data are a collection of recorded observations. Data are gathered by people and by sensors. Analyzing data can reveal previously unknown patterns in our world. Data play a large, and sometimes invisible, role in our lives.
Lesson 2: Stick Figures
Data consist of records of particular characteristics of people or objects. Data can be organized in many different ways, and some ways make it easier than others to achieve a particular purpose.
Lesson 3: Data Structures
Variables record values that vary. By organizing data into rectangular format, we can easily see the characteristics of an observation by reading across a row, or see the variability in a variable by reading down a column. Computers can easily process data when they are in rectangular format.
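To make "across a row" versus "down a column" concrete, here is a minimal sketch in plain Python (not the RStudio tools used in the lessons). The column names and values are hypothetical, chosen only for illustration, and the helper names `observation` and `variable` are likewise made up for this example.

```python
# A minimal sketch of rectangular (tabular) data using only built-in Python.
# Column names and values are hypothetical, for illustration only.
columns = ["name", "height_cm", "favorite_color"]
rows = [
    ("Ana", 160, "blue"),
    ("Ben", 172, "green"),
    ("Cy", 165, "blue"),
]

def observation(i):
    """Read across row i: all recorded characteristics of one observation."""
    return dict(zip(columns, rows[i]))

def variable(name):
    """Read down one column: every recorded value of a single variable."""
    j = columns.index(name)
    return [row[j] for row in rows]

print(observation(1))         # {'name': 'Ben', 'height_cm': 172, 'favorite_color': 'green'}
print(variable("height_cm"))  # [160, 172, 165]
```

Because every row has the same columns in the same order, a program can pull out either view with one simple rule, which is part of why computers handle rectangular data so easily.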
Lesson 4: The Data Cycle
A statistical investigation consists of cycling through the four stages of the Data Cycle. The term statistical questions encompasses the variety of questions asked during the statistical problem-solving process that support statistical thinking and reasoning. Of these, statistical investigative questions are perhaps the most important: they are challenging to learn, and they determine whether an analysis is productive. Statistical investigative questions address variability and are productive in that they motivate data collection, analysis, and interpretation. The Data Collection phase might consist of collecting data through Participatory Sensing or by some other means, or it might consist of examining previously collected data to judge whether they are of good enough quality to answer the statistical investigative questions. Data Analysis is almost always done on the computer and consists of creating relevant graphics and numerical summaries of the data. Data Interpretation uses the analysis to answer the statistical investigative questions.
Lesson 5: So Many Questions
Statistical investigative questions typically begin as vague, general questions and then develop into precise ones. Developing a good investigative question is an iterative process that takes time and effort to get right. In her 2021 paper, "What Makes a Good Statistical Question", Dr. Pip Arnold identified the following features of a good investigative question:
1. The variable(s) of interest is/are clear.
2. The group or population we are interested in is clear.
3. The question can be answered with data.
4. The question asks about the whole group, not an individual or a portion of the group.
5. The intention is clear (e.g., summary, comparison, association, time series).
6. The question is worth investigating, is interesting, and has a purpose.
Lesson 6: What Do I Eat? [The Data Cycle: Consider Data]
After raising statistical investigative questions, we examine and record data to see if the questions are appropriate.
Lesson 7: Setting the Stage [The Data Cycle: Collect Data]
In Participatory Sensing, we humans behave as if we were robot sensors, collecting data whenever a "trigger" event occurs. Our ability to learn about the patterns in our lives through these data depends on our being reliable data collectors.
Lesson 8: Tangible Plots [The Data Cycle: Analyze Data]
Distributions organize data for us by telling us (a) which values of a variable were observed, and (b) how many times the values were observed (their frequency).
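The two pieces of a distribution, which values occurred and how often, can be computed directly. This is a minimal sketch in plain Python using `collections.Counter`; the snack data are hypothetical, and the helper name `distribution` is made up for illustration.

```python
from collections import Counter

# Hypothetical observed values of one categorical variable (snack type).
snacks = ["chips", "fruit", "chips", "candy", "fruit", "chips"]

def distribution(values):
    """Return (value, frequency) pairs: which values were observed and how many times."""
    counts = Counter(values)
    return sorted(counts.items())  # sorted by value so the output is predictable

for value, freq in distribution(snacks):
    print(value, freq)
# candy 1
# chips 3
# fruit 2
```

The frequencies always add up to the number of observations, which is a quick check that nothing was missed.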
Lesson 9: What Is Typical?
The "center" of a distribution is a deliberately vague term, but it is one way to answer the subjective question "what is a typical value?" The center could be the perceived balancing point of the distribution (the mean) or the value that approximately cuts the area of the distribution in half (the median).
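The two notions of center can give different answers for the same data. This minimal sketch uses Python's `statistics.mean` and `statistics.median` on a small, hypothetical set of sleep times (in hours).

```python
from statistics import mean, median

# Hypothetical sleep times, in hours.
hours = [6, 7, 7, 8, 12]

balance_point = mean(hours)  # 40 / 5 = 8
half_point = median(hours)   # middle value of the sorted data = 7

# The mean is a balancing point: deviations above and below it cancel out.
print(sum(x - balance_point for x in hours))  # 0
print(balance_point, half_point)              # 8 7
```

Here the single large value (12) pulls the balancing point above the value that splits the data in half, which is one reason "typical" remains a judgment call.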
Lesson 10: Making Histograms
Histograms can be created by following an algorithm. The distributions displayed in a histogram can be classified using the technical terms for the shapes of distributions. Learning to describe routine tasks as an algorithm is an important component of computational thinking.
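One possible histogram algorithm is: choose a starting point and a bin width, find which bin each value falls into, count the values in each bin, and draw a bar whose length is that count. The sketch below implements those steps in plain Python as a sideways, text-only histogram. The function names, the bin rule [left, left + width), and the age data are all hypothetical choices for illustration, and it assumes every value is at least `start`.

```python
import math

def histogram_counts(values, start, width):
    """Count values in each bin [start + k*width, start + (k+1)*width).
    Assumes every value is >= start."""
    n_bins = math.floor((max(values) - start) / width) + 1
    counts = [0] * n_bins
    for x in values:
        k = math.floor((x - start) / width)  # index of the bin containing x
        counts[k] += 1
    return counts

def text_histogram(values, start, width):
    """Build one line per bin, with one '*' per observation (a sideways histogram)."""
    lines = []
    for k, c in enumerate(histogram_counts(values, start, width)):
        left = start + k * width
        lines.append(f"[{left}, {left + width}) " + "*" * c)
    return lines

ages = [12, 14, 15, 15, 17, 21, 22, 29]  # hypothetical data
for line in text_histogram(ages, start=10, width=5):
    print(line)
# [10, 15) **
# [15, 20) ***
# [20, 25) **
# [25, 30) *
```

Changing `width` changes how many bins there are, and can change how the shape of the distribution looks, so the bin width is worth experimenting with.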
Lesson 11: What Shape Are You In?
Identifying the shape of a histogram is part of the interpret step of the Data Cycle.
Lesson 12: Exploring Food Habits
Once Participatory Sensing data have been collected, the Dashboard and PlotApp perform the analysis step of the Data Cycle, though humans still need to tell the computer which plots to examine.
Lesson 13: RStudio Basics
A programming language has a syntax, and the computer understands your instructions only if you write them in its language.
Lesson 14: Variables, Variables, Variables
To examine whether two (or more) variables are related, we can plot their distributions on the same graph.
Lesson 15: Americans’ Time on Task
Learning to examine analyses done by others is an important part of statistical thinking.
Lesson 16: Categorical Associations
A two-way table is a summary of the association/relationship between two categorical variables. Joint relative frequencies answer questions of the form "what proportion of the people/objects had this value on the first variable and this value on the second?"
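A two-way table is just a count of every combination of values of the two variables, and a joint relative frequency divides one of those counts by the total. This minimal Python sketch uses hypothetical (grade level, preferred snack) pairs; the function names are made up for illustration.

```python
from collections import Counter

# Hypothetical paired observations: (grade level, preferred snack).
pairs = [
    ("9th", "salty"), ("9th", "sweet"), ("9th", "sweet"),
    ("10th", "salty"), ("10th", "salty"), ("10th", "sweet"),
    ("10th", "salty"), ("9th", "sweet"),
]

def two_way_table(pairs):
    """Count each combination of (first variable, second variable)."""
    return Counter(pairs)

def joint_relative_frequency(pairs, a, b):
    """Proportion of ALL observations with value a on the first variable and b on the second."""
    return two_way_table(pairs)[(a, b)] / len(pairs)

print(joint_relative_frequency(pairs, "9th", "sweet"))   # 3/8 = 0.375
print(joint_relative_frequency(pairs, "10th", "sweet"))  # 1/8 = 0.125
```

Note that the denominator is always the total number of observations; that is what makes these frequencies "joint".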
Lesson 17: Interpreting Two-Way Tables
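Marginal (relative) frequencies tell us about the distribution of a single variable. Conditional relative frequencies tell us about the distribution of one variable within the subset of observations that share a particular value of the other variable (that is, after "subsetting" on the other variable).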
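The difference between marginal and conditional relative frequencies is the denominator: a marginal frequency divides by all observations, while a conditional frequency divides by the size of a subset. This minimal Python sketch uses hypothetical (grade level, preferred snack) pairs, and the function names are made up for illustration.

```python
# Hypothetical paired observations: (grade level, preferred snack).
pairs = [
    ("9th", "salty"), ("9th", "sweet"), ("9th", "sweet"),
    ("10th", "salty"), ("10th", "salty"), ("10th", "sweet"),
    ("10th", "salty"), ("9th", "sweet"),
]

def marginal_relative_frequency(pairs, index, value):
    """Proportion of all observations with `value` on one variable
    (index 0 = grade, index 1 = snack), ignoring the other variable."""
    return sum(1 for p in pairs if p[index] == value) / len(pairs)

def conditional_relative_frequency(pairs, given, outcome):
    """Subset to observations whose first variable equals `given`,
    then return the proportion of that subset whose second variable equals `outcome`."""
    subset = [second for first, second in pairs if first == given]
    return subset.count(outcome) / len(subset)

print(marginal_relative_frequency(pairs, 1, "sweet"))          # 4/8 = 0.5
print(conditional_relative_frequency(pairs, "9th", "sweet"))   # 3/4 = 0.75
print(conditional_relative_frequency(pairs, "10th", "sweet"))  # 1/4 = 0.25
```

In this made-up data, the conditional proportions differ between grades (0.75 versus 0.25) even though the marginal proportion is 0.5; comparing conditional frequencies like this is how a two-way table reveals an association.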