Analysis of MIDFIELD data begins by identifying the groups of students, programs, and metrics with which we intend to work.
Working with MIDFIELD data is iterative: intermediate results often cause us to revisit an earlier assumption or to select a different bloc or different student attributes. Nevertheless, a completed analysis usually follows the same steps in roughly the same sequence.
Student-level data. Information about individual students including, for example, demographics, programs, academic standing, courses, grades, and degrees. Also called Student Unit Records (SURs). In MIDFIELD, student-level data are compiled by an institution, then anonymized and curated by the MIDFIELD data steward.
Program. A US academic field of study. The term can also indicate a specialty within a field or a collection of fields within a department, college, or university. Programs are denoted by codes from the Classification of Instructional Programs (CIP), a taxonomy of academic programs curated by the US Department of Education (NCES 2010).
Metric. A quantitative measure derived from student-level data. Metrics include statistical measures such as counts of program starters or graduates as well as comparative ratios such as graduation rate or stickiness. A metric typically involves comparisons of specific blocs of students and programs.
Bloc. A grouping of student-level data dealt with as a unit, for example, starters, students ever enrolled, graduates, transfer students, traditional and non-traditional students, or migrators.
Grouping variables. Detailed information in the student-level data that further characterizes a bloc of records, typically used to create bloc subsets for comparison, for example, program, race/ethnicity, sex, age, grade level, or grades.
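To make the vocabulary concrete, here is a minimal sketch in base R using a small invented table. The column names (mcid, program, sex, starter) and all values are assumptions for illustration only; they are not taken from the MIDFIELD data. The sketch gathers a bloc (starters), splits it by two grouping variables, and reports a simple count metric.

```r
# Toy student-level records; names and values are illustrative assumptions
students <- data.frame(
  mcid    = c("S01", "S02", "S03", "S04", "S05", "S06"),
  program = c("Civil", "Civil", "Mechanical", "Mechanical", "Mechanical", "Civil"),
  sex     = c("Female", "Male", "Female", "Male", "Male", "Female"),
  starter = c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE)
)

# Bloc: the records treated as a unit, here the program starters
starters <- students[students$starter, ]

# Grouping variables (program, sex) split the bloc into subsets for comparison.
# Metric: the number of starters in each subset.
starter_counts <- aggregate(mcid ~ program + sex, data = starters, FUN = length)
names(starter_counts)[names(starter_counts) == "mcid"] <- "N"
starter_counts
```

Each row of starter_counts is one bloc subset. A comparative ratio would combine two such count tables, one for each bloc in the ratio.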
There are currently two points of access to MIDFIELD data:
MIDFIELD. A database of anonymized student-level records for approximately 1.7M undergraduates at nineteen US institutions from 1987–2018, of which midfielddata provides a sample. This research database is currently accessible to MIDFIELD partner institutions only.
midfielddata. An R data package that supplies anonymized student-level records for 98,000 undergraduates at three US institutions from 1988–2018. A sample of the MIDFIELD database, midfielddata provides practice data for the tools and methods in the midfieldr package.
To load research data. For users with access to the MIDFIELD database, data are imported using any “read” function, e.g.,
    # Not run
    library(data.table)
    student <- fread("local_path_to_student_research_data")
    course <- fread("local_path_to_course_research_data")
    term <- fread("local_path_to_term_research_data")
    degree <- fread("local_path_to_degree_research_data")
To load practice data. Load from the midfielddata package.
    # Load practice data
    library(midfielddata)
    data(student, course, term, degree)
The variables in the practice data are a subset of those in the research data. A researcher transitioning from working with the practice data to the research data should find that their scripts need little (if any) modification.
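One way to check that a script will carry over is to confirm that the columns it uses are present in whichever table it receives. The following is a minimal sketch in base R. The helper check_required_cols() is hypothetical (it is not part of midfieldr or midfielddata), and the two toy tables and their column names are assumptions standing in for the practice and research data.

```r
# Hypothetical helper: stop with a message if any required column is absent
check_required_cols <- function(dframe, required) {
  missing_cols <- setdiff(required, names(dframe))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# Toy stand-ins; the research table carries one extra (invented) column
practice_student <- data.frame(
  mcid = c("S01", "S02"),
  race = c("Asian", "White"),
  sex  = c("Female", "Male")
)
research_student <- cbind(practice_student, extra_research_field = c(1, 2))

# Columns the analysis script relies on
required <- c("mcid", "race", "sex")

check_required_cols(practice_student, required)
check_required_cols(research_student, required)
```

Because the script depends only on columns common to both tables, the same call passes for either one.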
Reminder. midfielddata datasets are for practice, not research.
Identify programs in general terms first, then use the cip data set included with midfieldr to identify the 6-digit CIP codes relevant to a study.
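As a rough illustration of moving from a general program description to specific codes, the sketch below searches program names in a small hand-built table using base R. The table cip_toy and its column names (cip6, cip6name) are assumptions standing in for the cip data set; only a few CIP entries are included.

```r
# Toy stand-in for a CIP lookup table; codes are stored as character strings
# so that any leading zeros are preserved
cip_toy <- data.frame(
  cip6 = c("140801", "141001", "141901", "143501", "540101"),
  cip6name = c(
    "Civil Engineering, General",
    "Electrical and Electronics Engineering",
    "Mechanical Engineering",
    "Industrial Engineering",
    "History, General"
  )
)

# General terms describing the programs of interest
keep <- grepl("civil|mechanical", cip_toy$cip6name, ignore.case = TRUE)

# The 6-digit codes to carry forward into the study
study_programs <- cip_toy[keep, ]
study_programs$cip6
```

The resulting codes can then be used to subset student-level records by program.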
Note. Most of our examples involve engineering programs. However, MIDFIELD research data contain student-level records of all undergraduates in all programs at their institutions over the time spans given.
Before data processing starts, we decide which metrics to compare, among which blocs of students, grouped by which variables. Metrics can include bloc counts or comparative ratios, for example, counts of program starters or graduates, graduation rate, or stickiness.
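The metric determines the blocs to gather, for example, graduation rate compares graduates to starters, and stickiness compares graduates to students ever enrolled.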
The research study design determines the grouping variables, for example, program, race/ethnicity, and sex.
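The planning decisions can be sketched end to end with invented data. The example below, in base R, takes stickiness to be the number of graduates of a program divided by the number of students ever enrolled in it, so the blocs to gather are students ever enrolled and graduates. The grouping variables are program and sex. All table names, column names, values, and the helper count_by() are assumptions for illustration.

```r
# Bloc 1: students ever enrolled (toy data)
ever_enrolled <- data.frame(
  mcid    = sprintf("S%02d", 1:8),
  program = rep(c("Civil", "Mechanical"), each = 4),
  sex     = rep(c("Female", "Female", "Male", "Male"), times = 2)
)

# Bloc 2: the subset of those students who graduated from the program
grad_ids  <- c("S01", "S03", "S04", "S05", "S06", "S07")
graduates <- ever_enrolled[ever_enrolled$mcid %in% grad_ids, ]

# Hypothetical helper: count the records in a bloc by the grouping variables
count_by <- function(dframe, count_name) {
  out <- aggregate(mcid ~ program + sex, data = dframe, FUN = length)
  names(out)[names(out) == "mcid"] <- count_name
  out
}

# Join the two bloc counts; groups with no graduates get a count of zero
stick <- merge(
  count_by(ever_enrolled, "ever"),
  count_by(graduates, "grad"),
  by = c("program", "sex"),
  all.x = TRUE
)
stick$grad[is.na(stick$grad)] <- 0

# Metric: stickiness for each program and sex
stick$stickiness <- stick$grad / stick$ever
stick
```

Changing the metric changes the blocs: a graduation rate would replace the ever-enrolled bloc with a bloc of starters, while the grouping step stays the same.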