The SGP package is built for the R software environment, so to use it you need a computer with R installed. R is available for Windows, macOS, and Linux, and because it is open source it can be compiled on just about any system. Running SGP analyses assumes some familiarity with R, so if you have not used it before, spend some time learning it before diving into SGP analyses.
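As a quick way to check these prerequisites, the sketch below reports the running R version and lists which of the two packages are not yet installed. It assumes the packages are obtained from CRAN under the names SGP and SGPdata; the install line is left commented out so that nothing is downloaded unless you choose to.

```r
# Report the version of R that is currently running
r_version <- R.version.string
message(r_version)

# Packages this sketch assumes you want available (names as published on CRAN)
needed <- c("SGP", "SGPdata")

# requireNamespace() returns FALSE, without raising an error,
# when a package is not installed
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("SGP and SGPdata are both installed.")
}
```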
A good starting point for getting familiar with the SGP package and its data structures is the SGP data analysis vignette. This vignette documents how to use the WIDE format sgpData with the SGP package to perform basic SGP analyses.
The sgpData data set is an anonymized panel data set comprising 5 years of annual, vertically scaled assessment data in the WIDE format. This exemplar data set models the format of the data used by the lower-level studentGrowthPercentiles and studentGrowthProjections functions. Each row contains the data for a single student over the 5 years of testing, and the columns provide a unique student identifier along with the grade level and numeric assessment score for that student at each test occasion.
As with any data set, it is important to understand the meaning of each field in the sgpData data structure. The first field, ID, provides the unique student identifier. The next fields, GRADE_2013 and the corresponding GRADE fields for the later years, provide the grade level at which the student was tested in each of the 5 years of data. The final fields, SS_2013, SS_2014, SS_2015, SS_2016, and so on, provide the numeric score for each year in which the student was assessed.
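To make the WIDE layout concrete, here is a minimal sketch that builds a small data frame with the same kind of columns: one ID column, then one GRADE and one SS column per year. The data frame `toy_wide` and every value in it are hypothetical, invented for illustration and not taken from sgpData, and only two years are shown to keep the example short.

```r
# Hypothetical toy data in the WIDE layout: one row per student,
# one GRADE_<year> and one SS_<year> column per year.
# All values are invented; none come from sgpData.
toy_wide <- data.frame(
  ID         = c(1001L, 1002L, 1003L),
  GRADE_2013 = c(3L, 4L, NA),
  GRADE_2014 = c(4L, NA, 3L),
  SS_2013    = c(410, 455, NA),
  SS_2014    = c(432, NA, 398)
)

# Separate the column groups by their name prefix
grade_cols <- grep("^GRADE_", names(toy_wide), value = TRUE)
score_cols <- grep("^SS_", names(toy_wide), value = TRUE)

# Inspect the structure: column names, types, and first values
str(toy_wide)
```

Assuming the SGPdata package is installed, the same `str()` and `grep()` calls can be pointed at `SGPdata::sgpData` to confirm which years your installed version actually contains.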
Note that sgpData is stored in its entirety, including all of the missing values (NA). When using this data for analysis, it is essential to understand the implications of doing so. In some cases missing data has no impact on the analyses being performed, but in others it can significantly affect the results. In particular, missing data may bias the estimate of student growth. To avoid this bias, missing data must be accounted for when running the analyses and interpreting the SGP output. This can be done with the missingvalues() function: pass the sgpData data through missingvalues(), which returns an array of entries corresponding to the missing values in the sgpData data set. This array can then be used with the studentGrowthPercentiles() and studentGrowthProjections() functions to perform the appropriate analyses. The aim is to reduce bias in the estimates of student growth for each of the assessed grades included in the data. Accounting for missing data in this way also helps avoid potentially biased comparisons of student growth across time periods.
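Before deciding how to handle missing data, it helps to see how much of it there is and where it sits. The sketch below does this with base R only (`is.na()`, `colSums()`, `complete.cases()`, and `which()`); it is a substitute for illustration and does not use the function named above. The data frame `toy_wide` is hypothetical and its values are invented.

```r
# Hypothetical WIDE-format data with some missing scores (values invented)
toy_wide <- data.frame(
  ID      = c(1001L, 1002L, 1003L),
  SS_2013 = c(410, 455, NA),
  SS_2014 = c(432, NA, 398)
)

# Select the score columns by their name prefix
score_cols <- grep("^SS_", names(toy_wide), value = TRUE)

# Number of missing scores in each year
na_per_year <- colSums(is.na(toy_wide[score_cols]))

# TRUE for students who have a score in every year
complete_rows <- complete.cases(toy_wide[score_cols])

# Row and column positions of every missing score
na_positions <- which(is.na(toy_wide[score_cols]), arr.ind = TRUE)

print(na_per_year)
print(toy_wide$ID[!complete_rows])  # IDs of students with at least one missing score
```

Whether incomplete records should be dropped, kept, or modeled is an analytic decision that depends on why the scores are missing; this sketch only locates them.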