Building a dataset

Author

Peter M. Crosta

Published

April 6, 2025

Introduction

To gain a better understanding of the current state of education in Fair Haven, I started by building a data set. This document shares the process I went through to combine state and federal resources to generate data sets that would enable an analysis of student outcomes, educational inputs, and their relationship over time. The process also includes a discussion of how I created a set of suitable comparison school districts for Fair Haven.

In an effort to be as transparent as possible, this document contains descriptions of source and derived data sets as well as code in the R programming language that one can use to reproduce results.

Code - Load libraries
options(width = 120)

library(tidyverse)
library(readxl)
library(googlesheets4)
library(googledrive)
library(haven)
library(cluster)
library(MASS)
library(ggpmisc)
library(ggrepel)
library(ggfortify)
library(glue)
library(DT)
library(htmltools)
library(factoextra)
library(gt)
library(purrr)

set.seed(2024)

# Commonly used State ID numbers (staid) and Federal Local Education Authority (LEA) IDs
# for FH, Little Silver, Rumson, and Shrewsbury
fh_leaid <- "3404950"
fh_staid <- "NJ-251440"

ls_leaid <- "3408790"
ls_staid <- "NJ-252720"

rums_leaid <- "3414370"
rums_staid <- "NJ-254570"

shrews_leaid <- "3414970"
shrews_staid <- "NJ-254770"

district_name_colors <- c(
  "Fair Haven Boro" = "#4169E1",
  "Little Silver Boro" = "#1F2868",
  "Rumson Boro" = "#8640C4",
  "Shrewsbury Boro" = "#004500",
  "Other" = "darkgrey"
)

Common Core of Data

The first data set I leverage is the Common Core of Data, a directory of PK-12 schools curated by the Federal Government. The most recent data is specific to the 2023-24 school year, which is suitable for the purpose of building a directory of public, non-charter NJ districts that that offer grades PK/K-8 like Fair Haven.

Code - Read Common Core of Data
url <- "https://nces.ed.gov/ccd/Data/zip/ccd_lea_029_2324_w_0a_050824.zip"

temp_file <- tempfile(fileext = ".zip") 
download.file(url, destfile = temp_file, mode = "wb") 
out <- unzip(temp_file, exdir = tempdir())

outcsv <- grep(pattern="csv$", x = out, value = T) 
ccd <- read_csv(outcsv)

# Create subset of public NJ K-8
ccdnjk12 <- ccd  %>%  
  dplyr::filter(ST=="NJ", LEA_TYPE==1, CHARTER_LEA=="NOTCHR") %>% 
  dplyr::filter(G_9_OFFERED=="No", G_10_OFFERED=="No", G_11_OFFERED=="No", G_12_OFFERED=="No", G_13_OFFERED=="No") %>% 
  dplyr::filter(GSLO != "05", ! GSHI %in% c("04", "05")) %>%
  dplyr::select(SCHOOL_YEAR, ST, LEA_NAME, ST_LEAID, LEAID, LSTREET1, LSTREET2, LSTREET3, LCITY, LZIP, WEBSITE, G_PK_OFFERED, GSLO, GSHI)

# Serialize
write_csv(ccdnjk12, file="datasets/ccdnjk12.csv")

I identify 267 public, non-charter NJ districts that that offer grades PK/K-8 like Fair Haven. These are saved as a CSV file for easy access.

Click to show/hide table of districts

ACS EDGE

ACS EDGE, or the American Community Survey (ACS) Education Demographic and Geographic Estimates (EDGE), data contains district-level social, economic, housing, and demographic detail based on the ACS 5-year summary files. I use this data to generate peer districts (comparison groups) for Fair Haven. The latest available data was the 2018-22 data, and I use the Population Group: Total Population. The raw data that was downloaded from ACS came as zipped XLSX files. The zip files were uploaded to a Google Drive folder, where they could be accessed and parsed. The collapsed code chunk below creates data frames from the raw ACS EDGE data.

Code - Read ACS EDGE Files from Google Drive
#Auth Google Drive and Sheets
drive_auth(path = "../pmcrosta-datascience-ed9fcc88faf6.json")
gs4_auth(path = "../pmcrosta-datascience-ed9fcc88faf6.json")

edge_zips <- drive_find(pattern='edge', type='zip') %>% dplyr::select(name, id)

## download, unzip, and clean up files. create edge_* data frames.

obnames <- vector()
for (ii in 1:nrow(edge_zips)) {
  
  temp <- tempfile(fileext = ".zip")
  dl <- drive_download(
    as_id(edge_zips$id[ii]), path = temp, overwrite = TRUE)
  out <- unzip(temp, exdir = tempdir())
  
  outtxt <- grep(pattern="txt$", x = out, value = T)
  object_name <- str_remove(edge_zips$name[ii], ".zip")
  assign(object_name, read.table(outtxt, sep="|", header=T) %>% dplyr::select(-ends_with("moe")))
  
  outxls <- grep(pattern="xlsx$", x = out, value = T)
  data_dict <- read_excel(outxls, sheet="DP_TotalPop") %>% dplyr::filter(varname %in% colnames(get(object_name)))
  newnames <- data_dict$varname
  names(newnames) <- data_dict$vlabel
  
  duplicate_cols <- duplicated(as.list(get(object_name)))
  df_unique <- get(object_name)[!duplicate_cols]
  newnames <- newnames[newnames %in% colnames(df_unique)]
  
  assign(object_name, df_unique %>% dplyr::rename(all_of(newnames)))

  obnames <- c(obnames, object_name)
}

glue("Data frames created: ", paste(obnames, collapse=", "))
Data frames created: edge_edattain, edge_enrollment, edge_householdstype, edge_occupation, edge_income, edge_workerclass, edge_employmentstatus, edge_housing, edge_race, edge_sex_age

Next, I isolate the variables that will be useful for generating peer districts. Major categories include:

  • Educational attainment
  • Employment status
  • Educational enrollment
  • Household type and size
  • Housing occupancy
  • Household income
  • Occupational category distribution
  • Demographics (race, sex, age)
Code - Process EDGE data and create district-level data set
edge_edattain_df <- edge_edattain %>%
  dplyr::select(GeoId, Geography, LEAID, starts_with("Percent")) %>%
  rename_with(~ gsub("\\.", "_", make.names(., unique = TRUE)))

edge_employmentstatus_df <- edge_employmentstatus %>%
  rename_with(~ gsub("\\.", "_", make.names(., unique = TRUE))) %>%
  dplyr::select(LEAID, starts_with("Percent")) %>%
  dplyr::select(LEAID, ends_with("__In_labor_force") | contains("Unemployment"))

edge_enrollment_df <- edge_enrollment %>%
  rename_with(~ gsub("\\.", "_", make.names(., unique = TRUE))) %>%
  dplyr::select(LEAID, starts_with("Percent"))

edge_householdstype_df <-  edge_householdstype %>%
  rename_with(~ gsub("\\.", "_", make.names(., unique = TRUE))) %>%
  dplyr::select(LEAID,
                Percent__Estimate__HOUSEHOLDS_BY_TYPE__Total_households__Married_couple_household__With_children_of_the_householder_under_18_years,
                Number__Estimate__HOUSEHOLDS_BY_TYPE__Total_households__Average_household_size,
                Number__Estimate__HOUSEHOLDS_BY_TYPE__Total_households__Average_family_size)

edge_housing_df <- edge_housing %>%
  rename_with(~ gsub("\\.", "_", make.names(., unique = TRUE))) %>%
  dplyr::select(LEAID, Number__Estimate__HOUSING_OCCUPANCY__Total_housing_units)

edge_income_df <- edge_income %>%
  rename_with(~ gsub("\\.", "_", make.names(., unique = TRUE))) %>%
  dplyr::select(LEAID,
                Number__Estimate__INCOME_AND_BENEFITS__IN_2022_INFLATION_ADJUSTED_DOLLARS___Total_households__Median_household_income__dollars_,
                Number__Estimate__INCOME_AND_BENEFITS__IN_2022_INFLATION_ADJUSTED_DOLLARS___Total_households__Mean_household_income__dollars_,
                Number__Estimate__INCOME_AND_BENEFITS__IN_2022_INFLATION_ADJUSTED_DOLLARS___Families__Median_family_income__dollars_,
                Number__Estimate__INCOME_AND_BENEFITS__IN_2022_INFLATION_ADJUSTED_DOLLARS___Families__Mean_family_income__dollars_,
                Percent__Estimate__INCOME_AND_BENEFITS__IN_2022_INFLATION_ADJUSTED_DOLLARS___Families___150_000_to__199_999,
                Percent__Estimate__INCOME_AND_BENEFITS__IN_2022_INFLATION_ADJUSTED_DOLLARS___Families___200_000_or_more,
                Percent__Estimate__INCOME_AND_BENEFITS__IN_2022_INFLATION_ADJUSTED_DOLLARS___Total_households___150_000_to__199_999,
                Percent__Estimate__INCOME_AND_BENEFITS__IN_2022_INFLATION_ADJUSTED_DOLLARS___Total_households___200_000_or_more) %>%
  dplyr::mutate(
    Number__Estimate__INCOME_AND_BENEFITS__IN_2022_INFLATION_ADJUSTED_DOLLARS___Total_households__Median_household_income__dollars_ = str_replace(Number__Estimate__INCOME_AND_BENEFITS__IN_2022_INFLATION_ADJUSTED_DOLLARS___Total_households__Median_household_income__dollars_, fixed("250,000+"), "275000"),
    Number__Estimate__INCOME_AND_BENEFITS__IN_2022_INFLATION_ADJUSTED_DOLLARS___Families__Median_family_income__dollars_ = str_replace(Number__Estimate__INCOME_AND_BENEFITS__IN_2022_INFLATION_ADJUSTED_DOLLARS___Families__Median_family_income__dollars_, fixed("250,000+"), "275000")
  )

edge_occupation_df <- edge_occupation %>%
  rename_with(~ gsub("\\.", "_", make.names(., unique = TRUE))) %>%
  dplyr::select(LEAID, starts_with("Percent"))

edge_race_df <- edge_race %>%
  rename_with(~ gsub("\\.", "_", make.names(., unique = TRUE))) %>%
  dplyr::select(LEAID, starts_with("Percent__Estimate__Race_alone_or_in_combination_with_one_or_more_other_races"))

edge_sex_age_df <- edge_sex_age %>%
  rename_with(~ gsub("\\.", "_", make.names(., unique = TRUE))) %>%
  dplyr::select(GeoId, Geography, LEAID, starts_with("Percent")) %>%
  dplyr::select(LEAID, Percent__Estimate__SEX_AND_AGE__Total_population__Under_18_years)

edge_df <- edge_edattain_df %>%
  dplyr::filter(LEAID != "3499999") %>%
  dplyr::left_join(edge_employmentstatus_df, by="LEAID") %>%
  dplyr::left_join(edge_enrollment_df, by="LEAID") %>%
  dplyr::left_join(edge_householdstype_df, by="LEAID") %>%
  dplyr::left_join(edge_housing_df, by="LEAID") %>%
  dplyr::left_join(edge_occupation_df, by="LEAID") %>%
  dplyr::left_join(edge_race_df, by="LEAID") %>%
  dplyr::left_join(edge_sex_age_df, by="LEAID") %>%
  dplyr::left_join(edge_income_df, by="LEAID") %>%
  dplyr::mutate(
    Percent__Estimate__SCHOOL_ENROLLMENT__Population_3_years_and_over_enrolled_in_school__Nursery_school__preschool = as.numeric(Percent__Estimate__SCHOOL_ENROLLMENT__Population_3_years_and_over_enrolled_in_school__Nursery_school__preschool),
    Percent__Estimate__SCHOOL_ENROLLMENT__Population_3_years_and_over_enrolled_in_school__Kindergarten = as.numeric(Percent__Estimate__SCHOOL_ENROLLMENT__Population_3_years_and_over_enrolled_in_school__Kindergarten),
    Percent__Estimate__SCHOOL_ENROLLMENT__Population_3_years_and_over_enrolled_in_school__Elementary_school__grades_1_8_ = as.numeric(Percent__Estimate__SCHOOL_ENROLLMENT__Population_3_years_and_over_enrolled_in_school__Elementary_school__grades_1_8_),
    Percent__Estimate__SCHOOL_ENROLLMENT__Population_3_years_and_over_enrolled_in_school__High_school__grades_9_12_ = as.numeric(Percent__Estimate__SCHOOL_ENROLLMENT__Population_3_years_and_over_enrolled_in_school__High_school__grades_9_12_),
    Percent__Estimate__SCHOOL_ENROLLMENT__Population_3_years_and_over_enrolled_in_school__College_or_graduate_school = as.numeric(Percent__Estimate__SCHOOL_ENROLLMENT__Population_3_years_and_over_enrolled_in_school__College_or_graduate_school),
    
    Number__Estimate__INCOME_AND_BENEFITS__IN_2022_INFLATION_ADJUSTED_DOLLARS___Total_households__Median_household_income__dollars_ = as.numeric(Number__Estimate__INCOME_AND_BENEFITS__IN_2022_INFLATION_ADJUSTED_DOLLARS___Total_households__Median_household_income__dollars_),
    Number__Estimate__INCOME_AND_BENEFITS__IN_2022_INFLATION_ADJUSTED_DOLLARS___Total_households__Mean_household_income__dollars_ = as.numeric(Number__Estimate__INCOME_AND_BENEFITS__IN_2022_INFLATION_ADJUSTED_DOLLARS___Total_households__Mean_household_income__dollars_),
    Number__Estimate__INCOME_AND_BENEFITS__IN_2022_INFLATION_ADJUSTED_DOLLARS___Families__Median_family_income__dollars_ = as.numeric(Number__Estimate__INCOME_AND_BENEFITS__IN_2022_INFLATION_ADJUSTED_DOLLARS___Families__Median_family_income__dollars_),
    Number__Estimate__INCOME_AND_BENEFITS__IN_2022_INFLATION_ADJUSTED_DOLLARS___Families__Mean_family_income__dollars_ = as.numeric(Number__Estimate__INCOME_AND_BENEFITS__IN_2022_INFLATION_ADJUSTED_DOLLARS___Families__Mean_family_income__dollars_)
  )

colnames(edge_df) <- c(
  "GeoId",
  "Geography",
  "LEAID",
  "Edu_Less_9th",
  "Edu_9th_12th_No_Diploma",
  "Edu_HS_Grad",
  "Edu_Some_College",
  "Edu_Assoc_Degree",
  "Edu_Bach_Degree",
  "Edu_Grad_Prof_Degree",
  "Edu_HS_or_Higher",
  "Edu_Bach_or_Higher",
  "Emp_Labor_Force",
  "Emp_Fem_Labor_Force",
  "Emp_Unemp_Rate",
  "Enroll_Nursery",
  "Enroll_Kindergarten",
  "Enroll_Elem",
  "Enroll_HS",
  "Enroll_College",
  "HH_Married_Children",
  "HH_Avg_Size",
  "Fam_Avg_Size",
  "Housing_Units",
  "Occ_Mgmt_Bus_Sci_Arts",
  "Occ_Service",
  "Occ_Sales_Office",
  "Occ_Nat_Res_Const_Maint",
  "Occ_Prod_Trans_Mat_Mov",
  "Race_White",
  "Race_Black",
  "Race_Amer_Indian_Alaska",
  "Race_Asian",
  "Race_Hawaiian_Pacific",
  "Race_Other",
  "Pop_Under_18",
  "HH_Med_Income",
  "HH_Mean_Income",
  "Fam_Med_Income",
  "Fam_Mean_Income",
  "Fam_Income_150k_199k",
  "Fam_Income_200k_plus",
  "HH_Income_150k_199k",
  "HH_Income_200k_plus"
)

glue("Created `edge_df`. `edge_df` has {nrow(edge_df)} rows and {ncol(edge_df)} columns.")
Created `edge_df`. `edge_df` has 557 rows and 44 columns.

edge_df now has the following columns and definitions:

  • Identifiers:

    • GeoId: Geographic identifier
    • Geography: District name
    • LEAID: Local education authority ID
  • Educational Attainment (percent of population age 25 and older) :

    • Edu_Less_9th: Less than 9th grade
    • Edu_9th_12th_No_Diploma: 9th to 12th grade, no diploma
    • Edu_HS_Grad: High school graduate
    • Edu_Some_College: Some college, no degree
    • Edu_Assoc_Degree: Associate’s degree
    • Edu_Bach_Degree: Bachelor’s degree
    • Edu_Grad_Prof_Degree: Graduate or professional degree
    • Edu_HS_or_Higher: High school graduate or higher
    • Edu_Bach_or_Higher: Bachelor’s degree or higher
  • Employment Status (percent of population 16 years and older):

    • Emp_Labor_Force: In labor force
    • Emp_Fem_Labor_Force: Females in labor force
    • Emp_Unemp_Rate: Unemployment rate
  • School Enrollment (percent of population 3 years and older) :

    • Enroll_Nursery: Nursery school
    • Enroll_Kindergarten: Kindergarten
    • Enroll_Elem: Elementary school
    • Enroll_HS: High school
    • Enroll_College: College or graduate school
  • Households:

    • HH_Married_Children: Percent married couple households with children under 18 years
    • HH_Avg_Size: Average household size
    • Fam_Avg_Size: Average family size
    • Housing_Units: Total housing units
  • Occupation (percent civilians 16 years and older):

    • Occ_Mgmt_Bus_Sci_Arts: Management, business, science, and arts
    • Occ_Service: Service occupations
    • Occ_Sales_Office: Sales and office occupations
    • Occ_Nat_Res_Const_Maint: Natural resources, construction, and maintenance
    • Occ_Prod_Trans_Mat_Mov: Production, transportation, and material moving
  • Race (percent):

    • Race_White: White
    • Race_Black: Black or African American
    • Race_Amer_Indian_Alaska: American Indian and Alaska Native
    • Race_Asian: Asian
    • Race_Hawaiian_Pacific: Native Hawaiian and Other Pacific Islander
    • Race_Other: Some Other Race
  • Population (percent):

    • Pop_Under_18: Population under 18 years
  • Income:

    • HH_Med_Income: Median household income
    • HH_Mean_Income: Mean household income
    • Fam_Med_Income: Median family income
    • Fam_Mean_Income: Mean family income
    • Fam_Income_150k_199k: Percent with family income $150,000 to $199,999
    • Fam_Income_200k_plus: Percent with family income $200,000 or more
    • HH_Income_150k_199k: Percent with household income $150,000 to $199,999
    • HH_Income_200k_plus: Percent with household income $200,000 or more

Finally, I serialize the edge_df data frame so that it can be accessed without having to rerun all this code. Here is a link to a CSV file.

Code - Serialize ACS-EDGE dataset and make accessible
write_csv(edge_df, file="datasets/edge_df.csv")
Click to show/hide ACS-EDGE data

Generating Peer Districts

The main purpose of collecting the ACS-EDGE data is to generate a set of districts that can help us to understand trends in Fair Haven and not in isolation. Due to geographical proximity and social integration, the de facto peer group for Fair Haven has mostly consisted of Rumson Borough, Little Silver Borough, and Shrewsbury Borough. I will certainly be highlighting these districts as we proceed. However, it is also instructive to have a broader set of districts that serve a population that is similar to Fair Haven along numerous demographic characteristics. I use the ACS-EDGE data to generate such a set of districts.

Why not use District Factor Groups?

For years, people have used District Factor Groups (DFGs) for the purpose of comparing students’ performance on statewide assessments across demographically similar school districts. However, the School Performance Reports no longer compare districts using this measure, and no updates have been made to the DFGs since the groupings were finalized in 2004.

Approaches

I leverage a number of techniques to generate peer districts for Fair Haven using these data. This exercise is both an art and science, as it is both grounded in data and subject to a number of researcher choices. To that end, a few sets of comparison districts will be created. Note that the final comparison data sets will be filtered to only include PK-8 districts like Fair Haven; this will happen later though since it requires yet another data source. In addition, the following section is not a “full academic” treatment of these approaches, meaning that the primary end is to be useful, not necessarily to fully account for all methodological nuances. However, I will compare district characteristics across various peer sets.

Euclidean distance

One simple, albeit somewhat naive, approach to identifying peer districts is to find the districts that are “closest” to each other given the district characteristics that were identified above. To do this, I

  1. Standardize all of our variables
  2. Compute a distance matrix (Euclidean distance in multidimensional space)
  3. Rank districts in terms of their “proximity” to Fair Haven

One downside to this strategy is that there are a number of highly correlated variables in the data set, and the distance matrix may overweight importance of these characteristics. For example, there are a number of income-based measures, and thus this approach may over-index on the household incomes of district residents. However, it may also produce a reasonable peer set.

Code - Using Euclidean distance
# Standardize variables
edge_df_scale <- scale(edge_df[,-c(1:3)])
rownames(edge_df_scale) <- edge_df$LEAID

## compute the distance matrix
edge_dist <- dist(edge_df_scale, method = "euclidean")

# Find the indices of the closest districts
closest_indices_euclid <- order(as.matrix(edge_dist)[, which(attr(edge_dist, "Labels")==fh_leaid)])

# Reorder edge_df so the most similar districts are on top
similar_districts_euclid <- edge_df[closest_indices_euclid, ] %>% dplyr::mutate(euclid_score=1:n()) %>% dplyr::select(LEAID, Geography, euclid_score)

# Print table
datatable(similar_districts_euclid,
  options = list(
    pageLength = 5,
    searching = TRUE,
    ordering = TRUE
  ),
  rownames = FALSE,
  caption = tags$caption(
    style = "caption-side: top; text-align: center;",
    "Most similar districts by Euclidean distance")
)

Mahalanobis distance

A second smiple distance-based approach is to use the Mahalanobis distance, as it is capable of taking into coniderations the correlations present in the data set. To do this, I

  1. Standardize all of our variables
  2. Compute a distance matrix (Mahalanobis pairwise distances)
  3. Rank districts in terms of their “proximity” to Fair Haven
Code - Using Mahalanobis distance
# Standardize variables
edge_df_scale <- scale(edge_df[,-c(1:3)])
rownames(edge_df_scale) <- edge_df$LEAID

mahalanobis_dist_matrix <- function(data) {
  cov_matrix <- cov(data)
  n <- nrow(data)
  dist_matrix <- matrix(0, nrow = n, ncol = n)

  for (i in 1:n) {
    for (j in 1:n) {
      dist_matrix[i, j] <- mahalanobis(data[i, ], data[j, ], cov_matrix)
    }
  }
  return(dist_matrix)
}

## compute the distance matrix
dist_mat_mah <- mahalanobis_dist_matrix(na.omit(edge_df_scale))
rownames(dist_mat_mah) <- dimnames(na.omit(edge_df_scale))[[1]]

# Find the indices of the closest districts
closest_indices_mah <- order(as.matrix(dist_mat_mah)[, which(rownames(dist_mat_mah)==fh_leaid)])

# Print the most similar districts
similar_districts_mah <- (edge_df %>% dplyr::filter(LEAID %in% rownames(dist_mat_mah))) [closest_indices_mah, ] %>% dplyr::mutate(mah_score=1:n()) %>% dplyr::select(LEAID, Geography, mah_score)

# Print table
datatable(similar_districts_mah,
  options = list(
    pageLength = 5,
    searching = TRUE,
    ordering = TRUE
  ),
  rownames = FALSE,
  caption = tags$caption(
    style = "caption-side: top; text-align: center;",
    "Most similar districts by Mahalanobis distance")
)

Principal Components

Another strategy for identifying peer districts is a dimensionality reduction technique called Principal Components Analysis. This is actually the same technique used to develop the original District Factor Groups. Though here I am using more data and more recent data. The central idea is to reduce our 40-dimensional (40-column) data set into a smaller set of components (or dimensions) that are maximally different from each other and uncorrelated with each other. These new components would contain the essence of the larger set of columns. As in the DFG calculation, I expect the first principal component to contain the variance associated with socioeconomic status (SES).

The general procedure is to:

  1. Compute the principal components on the data set
  2. Assess which dimensions load in which principal components
  3. Rank order component of interest, and find districts surrounding Fair Haven

If the first one or two components is not sufficient, we are in clustering territory, which will be addressed later.

Code - Using Principal Components
# Perform PCA
pcadat <- na.omit(edge_df[,-c(1:3)])
rownames(pcadat) <- make.unique(na.omit(edge_df)$Geography)
pca_result <- prcomp(pcadat, scale.=T, center = T)

#round(pca_result$sdev^2/sum(pca_result$sdev^2), 2) #first two ~49% variance, 37+12

# Extract principal components
pc_all_scores <- pca_result$x

#Visualize the reduced data
#autoplot(pca_result, data = pcadat, label = TRUE, label.size = 3)

# Get loadings
pca_loadings <- pca_result$rotation

# Transform loadings into a long format data frame
loadings_df <- as.data.frame(pca_loadings)
loadings_df$Variable <- rownames(pca_loadings)
loadings_long <- reshape2::melt(loadings_df, id.vars = "Variable", variable.name = "PC", value.name = "Loading")

In Figure 1, it is clear that Component 1 (PC1) is loading on income and education dimensions, which are qualitatively aligned with SES. However, Component 2 (PC2) is loading on other potentially relevant dimensions, such as family size and the population under 18. Note the differences below in the districts that surround Fair Haven for both PC1 and PC2.

Code
# Create heatmap
ggplot(loadings_long %>% filter(PC %in% c("PC1", "PC2")), aes(x = PC, y = Variable, fill = Loading)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "orange") +
  theme_minimal() +
  labs(title = "PCA Loadings", x="", y="")
Figure 1: Loadings from first two components of the PCA.

Clustering

A final approach to identifying peer districts is cluster analysis. Cluster analysis leverages distance functions to separate districts based on similarity, with the goal of creating groups of high similarity within clusters and high dissimilarity between clusters. I will explore both K-means and hierarchical clustering techniques, looking at a few examples of each.

K-means clusters

K-means clustering requires a distance matrix as input, and I have already created Euclidean and Mahalanobis distance matrices. I can also apply distance calculations to the PCA results in order to generate clusters based on the reduced dimensionality provided by principal components. Both scaled and un-scaled PCA results are clustered. The un-scaled version should over-weight PC1, and the scaled version should treat PC1 and PC2 more equally. The tabs below show the resulting peer groups from a standard K-means routine that assumes 7 clusters (a full treatment of K-means that searches for the optimal number of clusters is omitted in this document). Each tab includes a list of districts that were clustered alongside Fair Haven given the particular distance matrix as input.

Code - K-means clustering
## distance matrix from earlier
# Euclidean: edge_dist (557)
# Mahalanobis: dist_mat_mah (555)
# PCA: pc_all_scores (needs dist, scaled and unscaled) (555)

# Scale the principal component scores
scaled_pc_scores <- scale(pc_all_scores[,c(1:2)])
pc_dist_scaled <- dist(scaled_pc_scores)
pc_dist_unscaled <- dist(pc_all_scores[,c(1:2)])

# Apply k-means
n_clusters <- 6
km_pc_scaled <- kmeans(pc_dist_scaled, centers = n_clusters)
km_pc_unscaled <- kmeans(pc_dist_unscaled, centers = n_clusters)
km_euclid <- kmeans(edge_dist, centers = n_clusters)
km_maha <- kmeans(dist_mat_mah, centers = n_clusters)

# Access cluster assignments
clusters_pc_scaled <- km_pc_scaled$cluster
clusters_pc_unscaled <- km_pc_unscaled$cluster
clusters_euclid <- km_euclid$cluster
clusters_maha <- km_maha$cluster

fh_pc_scaled <- clusters_pc_scaled["Fair Haven Borough School District, NJ"]
fh_pc_unscaled <- clusters_pc_unscaled["Fair Haven Borough School District, NJ"]
fh_euclid <- clusters_euclid[fh_leaid]
fh_maha <- clusters_maha[fh_leaid]

mat_pc_scaled <- as_tibble(names(clusters_pc_scaled[clusters_pc_scaled==fh_pc_scaled])) %>% rename("PC Scaled"=value)
mat_pc_unscaled <- as_tibble(names(clusters_pc_unscaled[clusters_pc_unscaled==fh_pc_unscaled]))  %>% rename("PC Unscaled"=value)
mat_euclid <- as_tibble(names(clusters_euclid[clusters_euclid==fh_euclid]))  %>% rename("Euclidean"=value) %>%
  dplyr::left_join(edge_df %>% dplyr::select(Geography, LEAID) %>% mutate(LEAID=as.character(LEAID)), by = c("Euclidean"="LEAID"))
mat_maha <- as_tibble(names(clusters_maha[clusters_maha==fh_maha]))  %>% rename("Mahalanobis"=value) %>%
  dplyr::left_join(edge_df %>% dplyr::select(Geography, LEAID) %>% mutate(LEAID=as.character(LEAID)), by = c("Mahalanobis"="LEAID"))

Hierarchical

Another clustering strategy is called Hierarchical Clustering, which generates a full tree of districts based on similarity. One advantage of this technique is that one does not have to determine a priori how many clusters there are in the data. The tree can be cut at any level in order to find clusters of appropriate size.

As with K-means, the input to the hierarchical clustering routine is a distance matrix. There is another parameter included in the model to instruct how the clustering algorithm should combine districts with each other and with groups. This is often called the “method”, and here will explore a few methods. Again as with K-means, a full treatment of cluster analysis is not provided here. Rather, this exercise is to generate candidate sets of peers that will be qualitatively assessed in the next section.

The first tab below shows a partial dendrogram highlighting part of the full tree that contains a decent-sized cluster around Fair Haven. The other two tabs show the resulting peer groups from different clustering inputs.

Code - Hierarchical clustering
find_min_cluster <- function(hc, target_observation = "Fair Haven Borough", min_size = 50, verbose=FALSE) {
  # Function to find where to cut tree to generate smallest cluster with FH that has at least 50 districts
  n <- length(hc$labels)
  min_k <- NULL
  min_cluster_size <- Inf 

  for (k in 2:n) {
    clusters <- cutree(hc, k = k)
    target_cluster <- clusters[names(clusters) == target_observation]
    cluster_members <- names(clusters[clusters == target_cluster])
    cluster_size <- length(cluster_members)

    if (cluster_size >= min_size && cluster_size < min_cluster_size) {
      min_k <- k
      min_cluster_size <- cluster_size
    }
  }

  final_clusters <- cutree(hc, k = min_k)
  
  
  if (!is.null(final_clusters)) {
    
    # Check the size of the cluster that "Fair Haven Borough" belongs to.
    target_cluster <- final_clusters[names(final_clusters) == target_observation]
    cluster_members <- names(final_clusters[final_clusters == target_cluster]) # contains peers
    if (verbose) {
      print(paste("Size of cluster containing Fair Haven Borough:", length(cluster_members)))
      print(paste("Number of Clusters:", length(unique(final_clusters))))
    }
  } else {
    if (verbose) print("No cluster found with size >= 50 containing Fair Haven Borough.")
  }

  
  if (!is.null(min_k)) {
    return(list(final_clusters=final_clusters, target_cluster=target_cluster, cluster_members=cluster_members))
  } else {
    return(NULL)
  }
}

# Distance matrices:
#   pc_dist_scaled
#   pc_dist_unscaled
#   edge_dist
#   dist_mat_mah (SKIP_)

# Hierarchical clustering using Ward and WPGMA Linkage

## Euclidean
hc_euclid_ward <- hclust(edge_dist, method = "ward.D2" )
hc_euclid_mcquitty <- hclust(edge_dist, method="mcquitty")

hc_euclid_ward_clusters <- find_min_cluster(hc_euclid_ward, min_size = 50, target_observation = fh_leaid)
hc_euclid_mcquitty_clusters <- find_min_cluster(hc_euclid_mcquitty, min_size = 50, target_observation = fh_leaid)

## PC scaled
hc_pc_dist_scaled_ward <- hclust(pc_dist_scaled, method = "ward.D2" )
hc_pc_dist_scaled_mcquitty <- hclust(pc_dist_scaled, method="mcquitty")

hc_pc_dist_scaled_ward_clusters <- find_min_cluster(hc_pc_dist_scaled_ward, min_size = 50, target_observation = "Fair Haven Borough School District, NJ")
hc_pc_dist_scaled_mcquitty_clusters <- find_min_cluster(hc_pc_dist_scaled_mcquitty, min_size = 50, target_observation = "Fair Haven Borough School District, NJ")

## PC unscaled
hc_pc_dist_unscaled_ward <- hclust(pc_dist_unscaled, method = "ward.D2" )
hc_pc_dist_unscaled_mcquitty <- hclust(pc_dist_unscaled, method="mcquitty")

hc_pc_dist_unscaled_ward_clusters <- find_min_cluster(hc_pc_dist_unscaled_ward, min_size = 50, target_observation = "Fair Haven Borough School District, NJ")
hc_pc_dist_unscaled_mcquitty_clusters <- find_min_cluster(hc_pc_dist_unscaled_mcquitty, min_size = 40, target_observation = "Fair Haven Borough School District, NJ")
Code
# visualize the tree
# fviz_dend(hc_euclid_ward, k = 4, color_labels_by_k = TRUE, cex=.1, type="circular")

viztree <- function(datain, members, distfun, clustfun) {
  datain <- datain %>%
    dplyr::filter(LEAID %in% members)
  
  datain_scale <- scale(datain[,-c(1:3)])
  rownames(datain_scale) <- datain$LEAID

  datain_dist <- dist(datain_scale, method = distfun)
  
  newclust <- hclust(datain_dist, method = "ward.D2" )
  newclust$labels <- gsub(" School District, NJ", "", datain$Geography)
  
  fviz_dend(newclust, cex=.4, horiz=T)
}

viztree(edge_df, hc_euclid_ward_clusters$cluster_members, "euclidean", "ward.D2")
Figure 2: Partial dendrogram from Ward clustering (Euclidean)

Finally, I add the possible peer sets back to our CCD data set and save them as a CSV file for easy access.

Code - Update data with peer sets
# Create K-means cluster indicators in edge_df
edge_df <- edge_df %>% 
  dplyr::mutate(
    km_euclid = LEAID %in% mat_euclid$Euclidean,
    km_maha = LEAID %in% mat_maha$Mahalanobis,
    km_pc_scaled = Geography %in% mat_pc_scaled$`PC Scaled`,
    km_pc_unscaled = Geography %in% mat_pc_unscaled$`PC Unscaled`
  )
  
# Create hierarchical cluster indicators in edge_df
edge_df <- edge_df %>% 
  dplyr::mutate(
    hc_euclid_ward = LEAID %in% hc_euclid_ward_clusters$cluster_members,
    hc_euclid_mcquitty = LEAID %in% hc_euclid_mcquitty_clusters$cluster_members,
    hc_pc_dist_scaled_ward = Geography %in% hc_pc_dist_scaled_ward_clusters$cluster_members,
    hc_pc_dist_scaled_mcquitty = Geography %in% hc_pc_dist_scaled_mcquitty_clusters$cluster_members,
    hc_pc_dist_unscaled_ward = Geography %in% hc_pc_dist_unscaled_ward_clusters$cluster_members,
    hc_pc_dist_unscaled_mcquitty = Geography %in% hc_pc_dist_unscaled_mcquitty_clusters$cluster_members,
    LEAID = as.character(LEAID)
    )

# add peer options to 
ccdnjk12_peers <- ccdnjk12 %>% dplyr::mutate(LEAID = as.character(LEAID)) %>%
  dplyr::left_join(edge_df %>% dplyr::mutate(LEAID = as.character(LEAID)), by = "LEAID") %>%
  dplyr::left_join(similar_districts_euclid %>% dplyr::select(-Geography) %>% dplyr::mutate(LEAID = as.character(LEAID)), by = "LEAID") %>%
  dplyr::left_join(similar_districts_mah %>% dplyr::select(-Geography) %>% dplyr::mutate(LEAID = as.character(LEAID)), by = "LEAID") %>%
  dplyr::left_join(pc_all_scores_df %>%
                     dplyr::select(LEAID, PC1, PC2) %>%
                     dplyr::mutate(LEAID = as.character(LEAID)), by = "LEAID")
 
# Serialize
write_csv(ccdnjk12_peers, file="datasets/ccdnjk12_peers.csv")

Comparing Peer Sets

Thus far, I have created a number of candidate peer groups for Fair Haven. In fact, there are 14 possible sets to choose from (probably too many!):

  • Distance-based (pick districts closest to Fair Haven according to a distance metric)
    • Euclidean euclid_score
    • Mahalanobis mah_score
  • Principal Components (pick districts according to similarity across a reduced dimensional space)
    • PC1 PC1
    • PC2 PC2
  • K-means Clusters (pick districts based on groups derived from the K-means algorithm.
    • Euclidean distance km_euclid
    • Mahalanobis distance km_maha
    • First two principal components (scaled) km_pc_scaled
    • First two principal components (unscaled) km_pc_unscaled
  • Hierarchical Clusters
    • Euclidean - Ward hc_euclid_ward
    • Euclidean - WPGMA hc_euclid_mcquitty
    • PC Scaled - Ward hc_pc_dist_scaled_ward
    • PC Scaled - WPGMA hc_pc_dist_scaled_mcquitty
    • PC Unscaled - Ward hc_pc_dist_unscaled_ward
    • PC Unscaled - WPGMA hc_pc_dist_unscaled_mcquitty

The table below summarizes a selection of district characteristics for the districts that are included in each potential peer set (restricted to PK/K-8th grade). Fair Haven itself is included in the last row for comparison. The Peninsula column is TRUE if FH, Rumson, Little Silver, and Shrewsbury (Peninsula Peers) are all included in that particular peer set.

Code - Compare peer sets
edge_vars <- c("Edu_Bach_or_Higher", "Fam_Avg_Size", "HH_Married_Children", "Pop_Under_18", "HH_Med_Income", "Occ_Mgmt_Bus_Sci_Arts")

peerset_summary <- bind_rows(
  ccdnjk12_peers %>% 
    dplyr::filter(euclid_score <= 100) %>%
    dplyr::summarise(Method="Distance - Euclidean",
                     Districts=n(),
                     Peninsula = all(c(fh_leaid, ls_leaid, rums_leaid, shrews_leaid) %in% LEAID),
                     across(all_of(edge_vars), \(x) round(mean(x), 1))),
  ccdnjk12_peers %>% 
    dplyr::filter(mah_score <= 100) %>%
    dplyr::summarise(Method="Distance - Mahalanobis",
                     Districts=n(),
                     Peninsula = all(c(fh_leaid, ls_leaid, rums_leaid, shrews_leaid) %in% LEAID),
                     across(all_of(edge_vars), \(x) round(mean(x), 1))),
  ccdnjk12_peers %>% 
    dplyr::arrange(PC1) %>%
    dplyr::slice((max(0,fh_pcaid1-30)):(fh_pcaid1+30)) %>%
    dplyr::summarise(Method="Principal Components 1",
                     Districts=n(),
                     Peninsula = all(c(fh_leaid, ls_leaid, rums_leaid, shrews_leaid) %in% LEAID),
                     across(all_of(edge_vars), \(x) round(mean(x), 1))),
  ccdnjk12_peers %>% 
    dplyr::arrange(PC2) %>%
    dplyr::slice((max(0,fh_pcaid2-25)):(fh_pcaid2+25)) %>%
    dplyr::summarise(Method="Principal Components 2",
                     Districts=n(),
                     Peninsula = all(c(fh_leaid, ls_leaid, rums_leaid, shrews_leaid) %in% LEAID),
                     across(all_of(edge_vars), \(x) round(mean(x), 1))),
  ccdnjk12_peers %>% 
    dplyr::filter(km_euclid) %>%
    dplyr::summarise(Method="K-means (Euclidean)",
                     Districts=n(),
                     Peninsula = all(c(fh_leaid, ls_leaid, rums_leaid, shrews_leaid) %in% LEAID),
                     across(all_of(edge_vars), \(x) round(mean(x), 1))),
  ccdnjk12_peers %>% 
    dplyr::filter(km_maha) %>%
    dplyr::summarise(Method="K-means (Mahalanobis)",
                     Districts=n(),
                     Peninsula = all(c(fh_leaid, ls_leaid, rums_leaid, shrews_leaid) %in% LEAID),
                     across(all_of(edge_vars), \(x) round(mean(x), 1))),
  ccdnjk12_peers %>% 
    dplyr::filter(km_pc_scaled) %>%
    dplyr::summarise(Method="K-means (PC Scaled)",
                     Districts=n(),
                     Peninsula = all(c(fh_leaid, ls_leaid, rums_leaid, shrews_leaid) %in% LEAID),
                     across(all_of(edge_vars), \(x) round(mean(x), 1))),
  ccdnjk12_peers %>% 
    dplyr::filter(km_pc_unscaled) %>%
    dplyr::summarise(Method="K-means (PC Unscaled)",
                     Districts=n(),
                     Peninsula = all(c(fh_leaid, ls_leaid, rums_leaid, shrews_leaid) %in% LEAID),
                     across(all_of(edge_vars), \(x) round(mean(x), 1))),
  ccdnjk12_peers %>% 
    dplyr::filter(hc_euclid_ward) %>%
    dplyr::summarise(Method="Hierarchical (Euclidean Ward)",
                     Districts=n(),
                     Peninsula = all(c(fh_leaid, ls_leaid, rums_leaid, shrews_leaid) %in% LEAID),
                     across(all_of(edge_vars), \(x) round(mean(x), 1))),
  ccdnjk12_peers %>% 
    dplyr::filter(hc_euclid_mcquitty) %>%
    dplyr::summarise(Method="Hierarchical (Euclidean WPGMA)",
                     Districts=n(),
                     Peninsula = all(c(fh_leaid, ls_leaid, rums_leaid, shrews_leaid) %in% LEAID),
                     across(all_of(edge_vars), \(x) round(mean(x), 1))),
  ccdnjk12_peers %>% 
    dplyr::filter(hc_pc_dist_scaled_ward) %>%
    dplyr::summarise(Method="Hierarchical (PC Scaled Ward)",
                     Districts=n(),
                     Peninsula = all(c(fh_leaid, ls_leaid, rums_leaid, shrews_leaid) %in% LEAID),
                     across(all_of(edge_vars), \(x) round(mean(x), 1))),
  ccdnjk12_peers %>% 
    dplyr::filter(hc_pc_dist_scaled_mcquitty) %>%
    dplyr::summarise(Method="Hierarchical (PC Scaled WPGMA)",
                     Districts=n(),
                     Peninsula = all(c(fh_leaid, ls_leaid, rums_leaid, shrews_leaid) %in% LEAID),
                     across(all_of(edge_vars), \(x) round(mean(x), 1))),
  ccdnjk12_peers %>% 
    dplyr::filter(hc_pc_dist_unscaled_ward) %>%
    dplyr::summarise(Method="Hierarchical (PC Unscaled Ward)",
                     Districts=n(),
                     Peninsula = all(c(fh_leaid, ls_leaid, rums_leaid, shrews_leaid) %in% LEAID),
                     across(all_of(edge_vars), \(x) round(mean(x), 1))),
  ccdnjk12_peers %>% 
    dplyr::filter(hc_pc_dist_unscaled_mcquitty) %>%
    dplyr::summarise(Method="Hierarchical (PC Unscaled WPGMA)",
                     Districts=n(),
                     Peninsula = all(c(fh_leaid, ls_leaid, rums_leaid, shrews_leaid) %in% LEAID),
                     across(all_of(edge_vars), \(x) round(mean(x), 1))),
  ccdnjk12_peers %>% 
    dplyr::filter(LEAID==fh_leaid) %>%
    dplyr::summarise(Method="Fair Haven",
                     Districts=n(),
                     Peninsula = all(c(fh_leaid, ls_leaid, rums_leaid, shrews_leaid) %in% LEAID),
                     across(all_of(edge_vars), \(x) round(mean(x), 1)))
)
  
peerset_summary %>% 
  dplyr::select(Method, Districts, Peninsula, HH_Med_Income, Edu_Bach_or_Higher, Occ_Mgmt_Bus_Sci_Arts, HH_Married_Children, Fam_Avg_Size) %>%
  gt(rowname_col = "Method") %>%
   tab_header(
    title = "Group Characteristics for Districts in each Peer Set"
  ) %>%
  tab_source_note("Data from ACS-EDGE 2018-22.") %>%
  opt_interactive()
Group Characteristics for Districts in each Peer Set
Data from ACS-EDGE 2018-22.

Actually selecting which peer set to use at this point is a combination of art and science. I would like enough districts in the set to provide sufficient variation, but not too many to make comparisons unwieldy - between 35 and 45 should work. Ideally the “Peninsula” peers are already in the set; otherwise I would manually add them in after. It seems like the Hierarchical (Euclidean Ward) set might offer good coverage. A list of these peers can be found above at Section 4.1.4.2, and upon inspection, they do seem reasonable. The group of peers has about 72% of the population with a Bachelor’s degree or higher (FH’s 79%), median household income of $192k (FH’s $230k), and 64% of the population working in management, business, science, or arts (FH’s 60%). From the table, one could argue that that are smaller peer sets that have districts more similar to FH, but I will proceed with the hierarchical cluster-generated solution.

This concludes the section on peer set generation for Fair Haven.

Financial Data (User Friendly Budgets)

In order to understand factors related to school district financing, a data source for financial data is required. The state of New Jersey’s Department of Education provides User Friendly Budgets, which contain district-level expenses and appropriations. I mostly rely on per-pupil costs, and save them as a CSV file for easy access.

Code - Read user friendly budgets
# FY 2024-25
## use year_4   if want 2023-24 Revised Budgeted Per Pupil Costs
## use year_2   if want 2022-23 Revised Budgeted Per Pupil Costs
## amount_2 10/13/23 Estimated Enrollment Counts
## amount_2 2023-24 Revised Budgeted Revenue Amounts
## amount_2 2023-24 Revised Budgeted Appropriated Amounts
## amount_1 10/13/22  Enrollment Counts
## amount_1 2022-23 Audited Budgeted Revenue Amounts
## amount_1 2022-23 Audited Appropriated Amounts

# Leaving these here in case we need them later.
# fin_enroll2425 <- read_csv("https://www.nj.gov/education/budget/ufb/2425/download/enroll25_minus_1.csv") %>% dplyr::mutate(SCHOOL_YEAR="2024-2025", ST_LEAID=paste0("NJ-", county_id, district_id))
# fin_revenue2425 <- read_csv("https://www.nj.gov/education/budget/ufb/2425/download/rev25_minus_1.csv") %>% dplyr::mutate(SCHOOL_YEAR="2024-2025", ST_LEAID=paste0("NJ-", county_id, district_id))
# fin_approp2425 <- read_csv("https://www.nj.gov/education/budget/ufb/2425/download/approp25_minus_1.csv") %>% dplyr::mutate(SCHOOL_YEAR="2024-2025", ST_LEAID=paste0("NJ-", county_id, district_id))
# fin_recapit2425 <- read_csv("https://www.nj.gov/education/budget/ufb/2425/download/recap25_minus_1.csv") %>% dplyr::mutate(SCHOOL_YEAR="2024-2025", ST_LEAID=paste0("NJ-", county_id, district_id))
# fin_unusual2425 <- read_csv("https://www.nj.gov/education/budget/ufb/2425/download/unusual25_minus_1.csv") %>% dplyr::mutate(SCHOOL_YEAR="2024-2025", ST_LEAID=paste0("NJ-", county_id, district_id))
# fin_taxes2425 <- read_csv("https://www.nj.gov/education/budget/ufb/2425/download/esttax25_minus_1.csv") %>% dplyr::mutate(SCHOOL_YEAR="2024-2025", ST_LEAID=paste0("NJ-", county_id, district_id))

fin_perpupil2425 <- read_csv("https://www.nj.gov/education/budget/ufb/2425/download/pupcst25_minus_1.csv") %>% dplyr::mutate(SCHOOL_YEAR="2024-2025", ST_LEAID=paste0("NJ-", county_id, district_id))

per_pupil_exp <- bind_cols(
  fin_perpupil2425 %>%
    dplyr::filter(line_desc=="Total Budgetary Comparative Per Pupil Cost") %>%
    dplyr::select(district_name, ST_LEAID, per_pupil_cost=year_4),
  
  fin_perpupil2425 %>%
    dplyr::filter(line_desc=="Total Classroom Instruction") %>%
    dplyr::select(per_pupil_class_instruction=year_4),
  
  fin_perpupil2425 %>%
    dplyr::filter(line_desc=="Total Administrative Costs") %>%
    dplyr::select(per_pupil_admin_cost=year_4)
)

write_csv(per_pupil_exp, file="datasets/per_pupil_exp.csv")
Code
colnames(per_pupil_exp)
[1] "district_name"               "ST_LEAID"                    "per_pupil_cost"             
[4] "per_pupil_class_instruction" "per_pupil_admin_cost"       

Report Card Data

Student outcomes data come from the NJ School Performance Reports. To gather longitudinal data, I ingest report card data from 2014-15 to 2023-24. The content and format of the report cards varies greatly from year to year. I makea best effort approach to join appropriately across years.

Code - Read in NJ Report Card data 2023-24
## School year 2023-2024 district data
url <- "https://navilp7rg08njprsharedst.blob.core.windows.net/perf-reports-ct/DataFiles/2023-2024/Database_DistrictStateDetail.xlsx"

# Download the file temporarily
rc_district <- tempfile(fileext = ".xlsx")
download.file(url, destfile = rc_district, mode = "wb")
sy <- "2023-24"
sheet_names_district <- excel_sheets(rc_district)

nsheet <- which(sheet_names_district == "EnrollmentTrendsbyGrade")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

EnrollmentTrendsbyGrade <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode, -DistrictName) %>%
  dplyr::rename(TotalEnrollment=Total)

nsheet <- which(sheet_names_district == "EnrollmentTrendsByStudentGroup")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

EnrollmentTrendsByStudentGroup <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, `Students with Disabilities`, `Economically Disadvantaged Students`)

nsheet <- which(sheet_names_district == "StudentGrowth")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

StudentGrowth <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(SubGroup=="Districtwide") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode, -SubGroup, -StateMedian, -MetTarget) %>%
  pivot_wider(names_from = "Subject", values_from = "DistrictMedian", names_prefix = "mSGP_")

nsheet <- which(sheet_names_district == "StudentGrowthByGrade")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

StudentGrowthByGrade <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(sub_grade=paste(`ELA/Math`, Grade)) %>%
  dplyr::select(-Level, -Grade, -`ELA/Math`) %>%
  pivot_wider(names_from = "sub_grade", values_from = "mSGP", names_prefix = "mSGP_") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode, -DistrictName)

nsheet <- which(sheet_names_district == "ELAParticipationPerformance")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAParticipationPerformance <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Groups`=="Districtwide") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, ELA_ME_District=`District: % of testers met/exceeded expectations`)

nsheet <- which(sheet_names_district == "ELAParticipationPerformance")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAParticipationPerformance_dis <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Groups`=="Students with Disabilities") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, ELA_ME_District_dis=`District: % of testers met/exceeded expectations`)

nsheet <- which(sheet_names_district == "ELAParticipationPerformance")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAParticipationPerformance_notdis <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Groups`=="Students without Disabilities") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, ELA_ME_District_notdis=`District: % of testers met/exceeded expectations`)

nsheet <- which(sheet_names_district == "ELAPerformanceTrends")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAPerformanceTrends <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode)) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode, -DistrictName) %>%
  dplyr::rename("ELA_ME"=`% Met/Exceeded Expectation`) %>%
  pivot_wider(names_from="Grades", values_from="ELA_ME", names_prefix = "ELA_ME_")

nsheet <- which(sheet_names_district == "ELAPerformanceByGrade")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAPerformanceByGrade <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Group`=="Districtwide") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::mutate(n_level5=(as.numeric(`% Level 5`)/100)*as.numeric(`Valid Scores`)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::summarize(ELA_pct_level5=sum(n_level5)/sum(as.numeric(`Valid Scores`))) %>%
  dplyr::ungroup()

nsheet <- which(sheet_names_district == "MathParticipationPerformance")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathParticipationPerformance <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Groups`=="Districtwide") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, MATH_ME_District=`District: % of testers met/exceeded expectations`)

nsheet <- which(sheet_names_district == "MathParticipationPerformance")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathParticipationPerformance_dis <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Groups`=="Students with Disabilities") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, MATH_ME_District_dis=`District: % of testers met/exceeded expectations`)

nsheet <- which(sheet_names_district == "MathParticipationPerformance")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathParticipationPerformance_notdis <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Groups`=="Students without Disabilities") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, MATH_ME_District_notdis=`District: % of testers met/exceeded expectations`)

nsheet <- which(sheet_names_district == "MathPerformanceTrends")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathPerformanceTrends <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode)) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode, -DistrictName) %>%
  dplyr::rename("MATH_ME"=`% Met/Exceeded Expectation`, Grades= `Grades/Subject`) %>%
  pivot_wider(names_from="Grades", values_from="MATH_ME", names_prefix = "MATH_ME_")

nsheet <- which(sheet_names_district == "MathPerformanceByGradeTest")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathPerformanceByGradeTest <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Group`=="Districtwide") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::mutate(`Valid Scores`=if_else(`Valid Scores`=="*", "0", `Valid Scores`)) %>%
  dplyr::mutate(`% Level 5`=if_else(`% Level 5`=="*", "0", `% Level 5`)) %>%
  dplyr::mutate(n_level5=(as.numeric(`% Level 5`)/100)*as.numeric(`Valid Scores`)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::summarize(MATH_pct_level5=sum(n_level5)/sum(as.numeric(`Valid Scores`))) %>%
  dplyr::ungroup()

MathPerformanceByGradeTest_nogr8 <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Group`=="Districtwide", ! `Grade/Subject` %in% c("Grade 08", "ALG01", "ALG02", "GEO01")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::mutate(`Valid Scores`=if_else(`Valid Scores`=="*", "0", `Valid Scores`)) %>%
  dplyr::mutate(`% Level 5`=if_else(`% Level 5`=="*", "0", `% Level 5`)) %>%
  dplyr::mutate(n_level5=(as.numeric(`% Level 5`)/100)*as.numeric(`Valid Scores`)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::summarize(MATH_pct_level5_nogr8=sum(n_level5)/sum(as.numeric(`Valid Scores`))) %>%
  dplyr::ungroup()

nsheet <- which(sheet_names_district == "StudentToStaffRatios")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

StudentToStaffRatios <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, Student_Teacher_District, Student_Admin_District)

nsheet <- which(sheet_names_district == "TeachersExperience")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

TeachersExperience <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, starts_with("Teacher") | starts_with("Percent")) %>%
  dplyr::select(!ends_with("State"))

nsheet <- which(sheet_names_district == "ChronicAbsenteeism")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ChronicAbsenteeism<- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::filter(StudentGroup=="Districtwide")  %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode, -DistrictName, -StudentGroup, -MetTarget, -Target)

nsheet <- which(sheet_names_district == "DaysAbsent")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

DaysAbsent <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(ST_LEAID, SchoolYear, starts_with("Abs"))

rc_comb2324 <- StudentGrowth %>%
  dplyr::left_join(StudentGrowthByGrade, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ELAParticipationPerformance, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ELAParticipationPerformance_dis, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ELAParticipationPerformance_notdis, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ELAPerformanceTrends, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ELAPerformanceByGrade, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathParticipationPerformance, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathParticipationPerformance_dis, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathParticipationPerformance_notdis, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathPerformanceTrends, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathPerformanceByGradeTest, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathPerformanceByGradeTest_nogr8, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(StudentToStaffRatios, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(TeachersExperience, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ChronicAbsenteeism, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(DaysAbsent, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(EnrollmentTrendsByStudentGroup, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(EnrollmentTrendsbyGrade, by=c("ST_LEAID","SchoolYear"))

rm(list=ls(pattern = "_D$"))
write_csv(rc_comb2324, file="datasets/rc_comb2324.csv")
Code - Read in NJ Report Card data 2022-23
## School year 2022-2023 district data
url <- "https://navilp7rg08njprsharedst.blob.core.windows.net/perf-reports-ct/DataFiles/2022-2023/Database_DistrictStateDetail.xlsx"

# Download the file temporarily
rc_district <- tempfile(fileext = ".xlsx")
download.file(url, destfile = rc_district, mode = "wb")
sheet_names_district <- excel_sheets(rc_district)

nsheet <- which(sheet_names_district == "EnrollmentTrendsbyGrade")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

EnrollmentTrendsbyGrade <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear="2022-2023") %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode, -DistrictName) %>%
  dplyr::rename(TotalEnrollment=Total)

nsheet <- which(sheet_names_district == "EnrollmentTrendsByStudentGroup")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

EnrollmentTrendsByStudentGroup <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear="2022-2023") %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, `Students with Disabilities`, `Economically Disadvantaged Students`)

nsheet <- which(sheet_names_district == "StudentGrowth")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

StudentGrowth <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(SubGroup=="Districtwide") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear="2022-2023") %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode, -SubGroup, -StateMedian, -MetTarget) %>%
  pivot_wider(names_from = "Subject", values_from = "DistrictMedian", names_prefix = "mSGP_")

nsheet <- which(sheet_names_district == "StudentGrowthByGrade")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

StudentGrowthByGrade <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(sub_grade=paste(`ELA/Math`, Grade)) %>%
  dplyr::select(-Level, -Grade, -`ELA/Math`) %>%
  pivot_wider(names_from = "sub_grade", values_from = "mSGP", names_prefix = "mSGP_") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear="2022-2023") %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode, -DistrictName)

nsheet <- which(sheet_names_district == "ELAParticipationPerformance")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAParticipationPerformance <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Groups`=="Districtwide") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear="2022-2023") %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, ELA_ME_District=`District: % of testers met/exceeded expectations`)

nsheet <- which(sheet_names_district == "ELAParticipationPerformance")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAParticipationPerformance_dis <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Groups`=="Students with Disabilities") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear="2022-2023") %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, ELA_ME_District_dis=`District: % of testers met/exceeded expectations`)

nsheet <- which(sheet_names_district == "ELAParticipationPerformance")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAParticipationPerformance_notdis <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Groups`=="Students without Disabilities") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear="2022-2023") %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, ELA_ME_District_notdis=`District: % of testers met/exceeded expectations`)

nsheet <- which(sheet_names_district == "ELAPerformanceTrends")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAPerformanceTrends <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode)) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode, -DistrictName) %>%
  dplyr::rename("ELA_ME"=`% Met/Exceeded Expectation`) %>%
  pivot_wider(names_from="Grades", values_from="ELA_ME", names_prefix = "ELA_ME_")

nsheet <- which(sheet_names_district == "ELAPerformanceByGrade")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAPerformanceByGrade <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Group`=="Districtwide") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear="2022-2023") %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::mutate(n_level5=(as.numeric(`% Level 5`)/100)*as.numeric(`Valid Scores`)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::summarize(ELA_pct_level5=sum(n_level5)/sum(as.numeric(`Valid Scores`))) %>%
  dplyr::ungroup()

nsheet <- which(sheet_names_district == "MathParticipationPerformance")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathParticipationPerformance <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Groups`=="Districtwide") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear="2022-2023") %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, MATH_ME_District=`District: % of testers met/exceeded expectations`)

nsheet <- which(sheet_names_district == "MathParticipationPerformance")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathParticipationPerformance_dis <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Groups`=="Students with Disabilities") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear="2022-2023") %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, MATH_ME_District_dis=`District: % of testers met/exceeded expectations`)

nsheet <- which(sheet_names_district == "MathParticipationPerformance")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathParticipationPerformance_notdis <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Groups`=="Students without Disabilities") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear="2022-2023") %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, MATH_ME_District_notdis=`District: % of testers met/exceeded expectations`)

nsheet <- which(sheet_names_district == "MathPerformanceTrends")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathPerformanceTrends <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode)) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode, -DistrictName) %>%
  dplyr::rename("MATH_ME"=`% Met/Exceeded Expectation`, Grades= `Grades/Subject`) %>%
  pivot_wider(names_from="Grades", values_from="MATH_ME", names_prefix = "MATH_ME_")

nsheet <- which(sheet_names_district == "MathPerformanceByGradeTest")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathPerformanceByGradeTest <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Group`=="Districtwide") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear="2022-2023") %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::mutate(`Valid Scores`=if_else(`Valid Scores`=="*", "0", `Valid Scores`)) %>%
  dplyr::mutate(`% Level 5`=if_else(`% Level 5`=="*", "0", `% Level 5`)) %>%
  dplyr::mutate(n_level5=(as.numeric(`% Level 5`)/100)*as.numeric(`Valid Scores`)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::summarize(MATH_pct_level5=sum(n_level5)/sum(as.numeric(`Valid Scores`))) %>%
  dplyr::ungroup()

MathPerformanceByGradeTest_nogr8 <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Group`=="Districtwide", ! `Grade/Subject` %in% c("Grade 08", "ALG01", "ALG02", "GEO01")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear="2022-2023") %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::mutate(`Valid Scores`=if_else(`Valid Scores`=="*", "0", `Valid Scores`)) %>%
  dplyr::mutate(`% Level 5`=if_else(`% Level 5`=="*", "0", `% Level 5`)) %>%
  dplyr::mutate(n_level5=(as.numeric(`% Level 5`)/100)*as.numeric(`Valid Scores`)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::summarize(MATH_pct_level5_nogr8=sum(n_level5)/sum(as.numeric(`Valid Scores`))) %>%
  dplyr::ungroup()

nsheet <- which(sheet_names_district == "StudentToStaffRatios")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

StudentToStaffRatios <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear="2022-2023") %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, Student_Teacher_District, Student_Admin_District)

nsheet <- which(sheet_names_district == "TeachersExperience")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

TeachersExperience <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear="2022-2023") %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, starts_with("Teacher") | starts_with("Percent")) %>%
  dplyr::select(!ends_with("State"))

nsheet <- which(sheet_names_district == "ChronicAbsenteeism")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ChronicAbsenteeism<- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear="2022-2023") %>%
  dplyr::filter(StudentGroup=="Districtwide")  %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode, -DistrictName, -StudentGroup, -MetTarget, -Target)

nsheet <- which(sheet_names_district == "DaysAbsent")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

DaysAbsent <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear="2022-2023") %>%
  dplyr::select(ST_LEAID, SchoolYear, starts_with("Abs"))

rc_comb2223 <- StudentGrowth %>%
  dplyr::left_join(StudentGrowthByGrade, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ELAParticipationPerformance, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ELAParticipationPerformance_dis, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ELAParticipationPerformance_notdis, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ELAPerformanceTrends, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ELAPerformanceByGrade, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathParticipationPerformance, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathParticipationPerformance_dis, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathParticipationPerformance_notdis, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathPerformanceTrends, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathPerformanceByGradeTest, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathPerformanceByGradeTest_nogr8, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(StudentToStaffRatios, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(TeachersExperience, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ChronicAbsenteeism, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(DaysAbsent, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(EnrollmentTrendsByStudentGroup, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(EnrollmentTrendsbyGrade, by=c("ST_LEAID","SchoolYear"))

rm(list=ls(pattern = "_D$"))
write_csv(rc_comb2223, file="datasets/rc_comb2223.csv")
Code - Read in NJ Report Card data 2021-22
## School year 2021-22 district data
url <- "https://navilp7rg08njprsharedst.blob.core.windows.net/perf-reports-ct/DataFiles/2021-2022/Database_DistrictStateDetail.xlsx"

# Download the file temporarily
rc_district <- tempfile(fileext = ".xlsx")
download.file(url, destfile = rc_district, mode = "wb")
sy <- "2021-2022"

sheet_names_district <- excel_sheets(rc_district)
# print(sheet_names_district)

nsheet <- which(sheet_names_district == "EnrollmentTrendsbyGrade")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

EnrollmentTrendsbyGrade <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode, -DistrictName) %>%
  dplyr::rename(TotalEnrollment=Total)

nsheet <- which(sheet_names_district == "EnrollmentTrendsByStudentGroup")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

EnrollmentTrendsByStudentGroup <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, `Students with Disabilities`, `Economically Disadvantaged Students`)

nsheet <- which(sheet_names_district == "ELAParticipationPerformance")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAParticipationPerformance <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Groups`=="Districtwide") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, ELA_ME_District=`District: % of testers met/exceeded expectations`)

nsheet <- which(sheet_names_district == "ELAParticipationPerformance")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAParticipationPerformance_dis <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Groups`=="Students with Disabilities") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, ELA_ME_District_dis=`District: % of testers met/exceeded expectations`)

nsheet <- which(sheet_names_district == "ELAParticipationPerformance")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAParticipationPerformance_notdis <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Groups`=="Students without Disabilities") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, ELA_ME_District_notdis=`District: % of testers met/exceeded expectations`)

nsheet <- which(sheet_names_district == "ELAPerformanceTrends")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAPerformanceTrends <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode)) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::rename("ELA_ME"=`% Met/Exceeded Expectation`) %>%
  pivot_wider(names_from="Grades", values_from="ELA_ME", names_prefix = "ELA_ME_")

nsheet <- which(sheet_names_district == "ELAPerformanceByGrade")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAPerformanceByGrade <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Group`=="Districtwide") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::mutate(n_level5=(as.numeric(`% Level 5`)/100)*as.numeric(`Valid Scores`)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::summarize(ELA_pct_level5=sum(n_level5)/sum(as.numeric(`Valid Scores`)))

nsheet <- which(sheet_names_district == "MathParticipationPerformance")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathParticipationPerformance <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Groups`=="Districtwide") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, MATH_ME_District=`District: % of testers met/exceeded expectations`)

nsheet <- which(sheet_names_district == "MathParticipationPerformance")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathParticipationPerformance_dis <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Groups`=="Students with Disabilities") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, MATH_ME_District_dis=`District: % of testers met/exceeded expectations`)

nsheet <- which(sheet_names_district == "MathParticipationPerformance")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathParticipationPerformance_notdis <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Groups`=="Students without Disabilities") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, MATH_ME_District_notdis=`District: % of testers met/exceeded expectations`)

nsheet <- which(sheet_names_district == "MathPerformanceTrends")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathPerformanceTrends <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode)) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::rename("MATH_ME"=`% Met/Exceeded Expectation`, Grades= `Grades/Subject`) %>%
  pivot_wider(names_from="Grades", values_from="MATH_ME", names_prefix = "MATH_ME_")

nsheet <- which(sheet_names_district == "MathPerformanceByGradeTest")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathPerformanceByGradeTest <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Group`=="Districtwide") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::mutate(`Valid Scores`=if_else(`Valid Scores`=="*", "0", `Valid Scores`)) %>%
  dplyr::mutate(`% Level 5`=if_else(`% Level 5`=="*", "0", `% Level 5`)) %>%
  dplyr::mutate(n_level5=(as.numeric(`% Level 5`)/100)*as.numeric(`Valid Scores`)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::summarize(MATH_pct_level5=sum(n_level5)/sum(as.numeric(`Valid Scores`)))

MathPerformanceByGradeTest_nogr8 <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Group`=="Districtwide", ! `Grade/Subject` %in% c("Grade 08", "ALG01", "ALG02", "GEO01")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::mutate(`Valid Scores`=if_else(`Valid Scores`=="*", "0", `Valid Scores`)) %>%
  dplyr::mutate(`% Level 5`=if_else(`% Level 5`=="*", "0", `% Level 5`)) %>%
  dplyr::mutate(n_level5=(as.numeric(`% Level 5`)/100)*as.numeric(`Valid Scores`)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::summarize(MATH_pct_level5_nogr8=sum(n_level5)/sum(as.numeric(`Valid Scores`)))

nsheet <- which(sheet_names_district == "StudentToStaffRatios")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

StudentToStaffRatios <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, Student_Teacher_District, Student_Admin_District)

nsheet <- which(sheet_names_district == "TeachersExperience")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

TeachersExperience <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, starts_with("Teacher") | starts_with("Percent")) %>%
  dplyr::select(!ends_with("State"))

nsheet <- which(sheet_names_district == "ChronicAbsenteeism")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ChronicAbsenteeism<- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::filter(StudentGroup=="Districtwide")  %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode, -DistrictName, -StudentGroup, -MetTarget, -Target)

nsheet <- which(sheet_names_district == "DaysAbsent")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

DaysAbsent <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(ST_LEAID, SchoolYear, starts_with("Abs"))

rc_comb2122 <- ELAParticipationPerformance %>%
  dplyr::left_join(ELAParticipationPerformance_dis, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ELAParticipationPerformance_notdis, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ELAPerformanceTrends, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ELAPerformanceByGrade, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathParticipationPerformance, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathParticipationPerformance_dis, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathParticipationPerformance_notdis, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathPerformanceTrends, by=c("ST_LEAID","SchoolYear", "DistrictName")) %>%
  dplyr::left_join(MathPerformanceByGradeTest, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathPerformanceByGradeTest_nogr8, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(StudentToStaffRatios, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(TeachersExperience, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ChronicAbsenteeism, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(DaysAbsent, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(EnrollmentTrendsByStudentGroup, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(EnrollmentTrendsbyGrade, by=c("ST_LEAID","SchoolYear"))

rm(list=ls(pattern = "_D$"))
write_csv(rc_comb2122, file="datasets/rc_comb2122.csv")
Code - Read in NJ Report Card data 2020-21
## School year 2020-21 district data - no testing this year
url <- "https://navilp7rg08njprsharedst.blob.core.windows.net/perf-reports-ct/DataFiles/2020-2021/Database_DistrictStateDetail.xlsx"

# Download the file temporarily
rc_district <- tempfile(fileext = ".xlsx")
download.file(url, destfile = rc_district, mode = "wb")
sy <- "2020-2021"

sheet_names_district <- excel_sheets(rc_district)
# print(sheet_names_district)

nsheet <- which(sheet_names_district == "EnrollmentTrendsbyGrade")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

EnrollmentTrendsbyGrade <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::rename(TotalEnrollment=Total)

nsheet <- which(sheet_names_district == "EnrollmentTrendsByStudentGroup")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

EnrollmentTrendsByStudentGroup <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, `Students with Disabilities`, `Economically Disadvantaged Students`)

nsheet <- which(sheet_names_district == "StudentToStaffRatios")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

StudentToStaffRatios <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, Student_Teacher_District, Student_Admin_District)

nsheet <- which(sheet_names_district == "TeachersExperience")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

TeachersExperience <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, starts_with("Teacher") | starts_with("Percent")) %>%
  dplyr::select(!ends_with("State"))

nsheet <- which(sheet_names_district == "ChronicAbsenteeism")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ChronicAbsenteeism<- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::filter(StudentGroup=="Districtwide")  %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode, -DistrictName, -StudentGroup, -MetTarget, -Target)

nsheet <- which(sheet_names_district == "DaysAbsent")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

DaysAbsent <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(ST_LEAID, SchoolYear, starts_with("Abs"))

rc_comb2021 <- StudentToStaffRatios %>%
  dplyr::left_join(TeachersExperience, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ChronicAbsenteeism, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(DaysAbsent, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(EnrollmentTrendsByStudentGroup, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(EnrollmentTrendsbyGrade, by=c("ST_LEAID","SchoolYear"))

rm(list=ls(pattern = "_D$"))
write_csv(rc_comb2021, file="datasets/rc_comb2021.csv")
Code - Read in NJ Report Card data 2019-20
## School year 2019-20 district data - no testing this year
url <- "https://navilp7rg08njprsharedst.blob.core.windows.net/perf-reports-ct/DataFiles/2019-2020/Database_DistrictStateDetail.xlsx"

# Download the file temporarily
rc_district <- tempfile(fileext = ".xlsx")
download.file(url, destfile = rc_district, mode = "wb")
sy <- "2019-2020"

sheet_names_district <- excel_sheets(rc_district)
print(sheet_names_district)

nsheet <- which(sheet_names_district=="EnrollmentTrendsbyGrade")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

EnrollmentTrendsbyGrade <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::rename(TotalEnrollment=Total)

nsheet <- which(sheet_names_district=="EnrollmentTrendsByStudentGroup")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

EnrollmentTrendsByStudentGroup <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, `Students with Disabilities`, `Economically Disadvantaged Students`)

nsheet <- which(sheet_names_district=="StudentToStaffRatios")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

StudentToStaffRatios <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, Student_Teacher_District, Student_Admin_District)

nsheet <- which(sheet_names_district=="TeachersExperience")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

TeachersExperience <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, starts_with("Teacher") | starts_with("Percent")) %>%
  dplyr::select(!ends_with("State"))

rc_comb1920 <- StudentToStaffRatios %>%
  dplyr::left_join(TeachersExperience, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(EnrollmentTrendsByStudentGroup, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(EnrollmentTrendsbyGrade, by=c("ST_LEAID","SchoolYear"))

rm(list=ls(pattern = "_D$"))
write_csv(rc_comb1920, file="datasets/rc_comb1920.csv")
Code - Read in NJ Report Card data 2018-19
## School year 2018-19 district data
url <- "https://navilp7rg08njprsharedst.blob.core.windows.net/perf-reports-ct/DataFiles/2018-2019/Database_DistrictStateDetail.xlsx"

# Download the file temporarily
rc_district <- tempfile(fileext = ".xlsx")
download.file(url, destfile = rc_district, mode = "wb")
sy <- "2018-2019"

sheet_names_district <- excel_sheets(rc_district)
# print(sheet_names_district)

nsheet <- which(sheet_names_district=="EnrollmentTrendsbyGrade")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

EnrollmentTrendsbyGrade <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::rename(TotalEnrollment=Total)

nsheet <- which(sheet_names_district=="EnrollmentTrendsByStudentGroup")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

EnrollmentTrendsByStudentGroup <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, `Students with Disabilities`, `Economically Disadvantaged Students`)

nsheet <- which(sheet_names_district=="ELALiteracyParticipationPerform")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAParticipationPerformance <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(StudentGroup=="Districtwide") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, ELA_ME_District=DistrictPerformance)

nsheet <- which(sheet_names_district=="ELALiteracyParticipationPerform")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAParticipationPerformance_dis <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(StudentGroup=="Students with Disabilities") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, ELA_ME_District_dis=DistrictPerformance)

nsheet <- which(sheet_names_district=="ELALiteracyParticipationPerform")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAParticipationPerformance_notdis <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(StudentGroup=="Students without Disabilities") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, ELA_ME_District_notdis=DistrictPerformance)

nsheet <- which(sheet_names_district=="ELALiteracyPerformanceByGrade")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAPerformanceByGrade <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Group`=="Districtwide") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::mutate(n_level5=(as.numeric(LEVEL_5)/100)*as.numeric(ValidScores)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::summarize(ELA_pct_level5=sum(n_level5)/sum(as.numeric(ValidScores))) %>%
  dplyr::ungroup()

nsheet <- which(sheet_names_district=="MathParticipationPerform")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathParticipationPerformance <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(StudentGroup=="Districtwide") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, MATH_ME_District=DistrictPerformance)

nsheet <- which(sheet_names_district=="MathParticipationPerform")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathParticipationPerformance_dis <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(StudentGroup=="Students with Disabilities") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, MATH_ME_District_dis=DistrictPerformance)

nsheet <- which(sheet_names_district=="MathParticipationPerform")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathParticipationPerformance_notdis <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(StudentGroup=="Students without Disabilities") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, MATH_ME_District_notdis=DistrictPerformance)

nsheet <- which(sheet_names_district=="MathPerformanceByGradeTest")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathPerformanceByGradeTest <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Group`=="Districtwide") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::mutate(`Valid Scores`=if_else(`ValidScores`=="*", "0", `ValidScores`)) %>%
  dplyr::mutate(`% Level 5`=if_else(`LEVEL_5`=="*", "0", `LEVEL_5`)) %>%
  dplyr::mutate(n_level5=(as.numeric(`% Level 5`)/100)*as.numeric(`Valid Scores`)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::summarize(MATH_pct_level5=sum(n_level5)/sum(as.numeric(`Valid Scores`)))

MathPerformanceByGradeTest_nogr8 <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Group`=="Districtwide", ! `Grade` %in% c("Grade 08", "ALG01", "ALG02", "GEO01")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::mutate(`Valid Scores`=if_else(`ValidScores`=="*", "0", `ValidScores`)) %>%
  dplyr::mutate(`% Level 5`=if_else(`LEVEL_5`=="*", "0", `LEVEL_5`)) %>%
  dplyr::mutate(n_level5=(as.numeric(`% Level 5`)/100)*as.numeric(`Valid Scores`)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::summarize(MATH_pct_level5_nogr8=sum(n_level5)/sum(as.numeric(`Valid Scores`)))

nsheet <- which(sheet_names_district=="StudentToStaffRatios")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

StudentToStaffRatios <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, Student_Teacher_District, Student_Admin_District)

nsheet <- which(sheet_names_district=="TeachersExperience")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

TeachersExperience <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, starts_with("Teacher") | starts_with("Percent")) %>%
  dplyr::select(!ends_with("State"))

nsheet <- which(sheet_names_district=="ChronicAbsenteeism")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ChronicAbsenteeism<- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::filter(StudentGroup=="Districtwide")  %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode, -DistrictName, -StudentGroup, -MetTarget, -Target)

nsheet <- which(sheet_names_district=="DaysAbsent")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

DaysAbsent <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(ST_LEAID, SchoolYear, starts_with("Abs"))

rc_comb1819 <- EnrollmentTrendsbyGrade %>%
  dplyr::left_join(EnrollmentTrendsByStudentGroup, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ELAParticipationPerformance, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ELAParticipationPerformance_dis, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ELAParticipationPerformance_notdis, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ELAPerformanceByGrade, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathParticipationPerformance, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathParticipationPerformance_dis, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathParticipationPerformance_notdis, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathPerformanceByGradeTest, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathPerformanceByGradeTest_nogr8, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(StudentToStaffRatios, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(TeachersExperience, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ChronicAbsenteeism, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(DaysAbsent, by=c("ST_LEAID","SchoolYear"))

rm(list=ls(pattern = "_D$"))
write_csv(rc_comb1819, file="datasets/rc_comb1819.csv")
Code - Read in NJ Report Card data 2017-18
## School year 2017-18 district data
url <- "https://navilp7rg08njprsharedst.blob.core.windows.net/perf-reports-ct/DataFiles/2017-2018/Database_DistrictStateDetail.xlsx"

# Download the file temporarily
rc_district <- tempfile(fileext = ".xlsx")
download.file(url, destfile = rc_district, mode = "wb")
sy <- "2017-2018"

sheet_names_district <- excel_sheets(rc_district)
# print(sheet_names_district)

nsheet <- which(sheet_names_district=="EnrollmentTrendsbyGrade")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

EnrollmentTrendsbyGrade <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::rename(TotalEnrollment=Total)

nsheet <- which(sheet_names_district=="EnrollmentTrendsByStudentGroup")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

EnrollmentTrendsByStudentGroup <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, `Students with Disabilities`, `Economically Disadvantaged Students`)

nsheet <- which(sheet_names_district=="ELALiteracyParticipationPerform")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAParticipationPerformance <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(StudentGroup=="Districtwide") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, ELA_ME_District=DistrictPerformance)

nsheet <- which(sheet_names_district=="ELALiteracyParticipationPerform")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))


ELAParticipationPerformance_dis <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(StudentGroup=="Students with Disabilities") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, ELA_ME_District_dis=DistrictPerformance)

nsheet <- which(sheet_names_district=="ELALiteracyParticipationPerform")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAParticipationPerformance_notdis <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(StudentGroup=="Students without Disabilities") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, ELA_ME_District_notdis=DistrictPerformance)

nsheet <- which(sheet_names_district=="ELALiteracyPerformanceByGrade")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAPerformanceByGrade <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Group`=="Districtwide") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::mutate(n_level5=(as.numeric(LEVEL_5)/100)*as.numeric(ValidScores)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::summarize(ELA_pct_level5=sum(n_level5)/sum(as.numeric(ValidScores))) %>%
  dplyr::ungroup()

nsheet <- which(sheet_names_district=="MathParticipationPerform")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathParticipationPerformance <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(StudentGroup=="Districtwide") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, MATH_ME_District=DistrictPerformance)

nsheet <- which(sheet_names_district=="MathParticipationPerform")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathParticipationPerformance_dis <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(StudentGroup=="Students with Disabilities") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, MATH_ME_District_dis=DistrictPerformance)

nsheet <- which(sheet_names_district=="MathParticipationPerform")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathParticipationPerformance_notdis <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(StudentGroup=="Students without Disabilities") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, MATH_ME_District_notdis=DistrictPerformance)

nsheet <- which(sheet_names_district=="MathPerformanceByGradeTest")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathPerformanceByGradeTest <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Group`=="Districtwide") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::mutate(`Valid Scores`=if_else(`ValidScores`=="*", "0", `ValidScores`)) %>%
  dplyr::mutate(`% Level 5`=if_else(`LEVEL_5`=="*", "0", `LEVEL_5`)) %>%
  dplyr::mutate(n_level5=(as.numeric(`% Level 5`)/100)*as.numeric(`Valid Scores`)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::summarize(MATH_pct_level5=sum(n_level5)/sum(as.numeric(`Valid Scores`)))

MathPerformanceByGradeTest_nogr8 <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`Student Group`=="Districtwide", ! `Grade` %in% c("Grade 08", "ALG01", "ALG02", "GEO01")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::mutate(`Valid Scores`=if_else(`ValidScores`=="*", "0", `ValidScores`)) %>%
  dplyr::mutate(`% Level 5`=if_else(`LEVEL_5`=="*", "0", `LEVEL_5`)) %>%
  dplyr::mutate(n_level5=(as.numeric(`% Level 5`)/100)*as.numeric(`Valid Scores`)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::summarize(MATH_pct_level5_nogr8=sum(n_level5)/sum(as.numeric(`Valid Scores`)))

nsheet <- which(sheet_names_district=="StudentToStaffRatios")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

StudentToStaffRatios <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, Student_Teacher_District, Student_Admin_District)

nsheet <- which(sheet_names_district=="TeachersExperienceCertification")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

TeachersExperience <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, starts_with("Teacher") | starts_with("Percent")) %>%
  dplyr::select(!ends_with("State"))

nsheet <- which(sheet_names_district=="ChronicAbsenteeism")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ChronicAbsenteeism<- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::filter(StudentGroup=="Districtwide")  %>%
  dplyr::select(-CountyCode, -CountyName, -DistrictCode, -DistrictName, -StudentGroup, -MetTarget, -Target)

nsheet <- which(sheet_names_district=="DaysAbsent")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

DaysAbsent <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(ST_LEAID, SchoolYear, starts_with("Abs"))

rc_comb1718 <- EnrollmentTrendsbyGrade %>%
  dplyr::left_join(EnrollmentTrendsByStudentGroup, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ELAParticipationPerformance, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ELAParticipationPerformance_dis, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ELAParticipationPerformance_notdis, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ELAPerformanceByGrade, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathParticipationPerformance, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathParticipationPerformance_dis, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathParticipationPerformance_notdis, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathPerformanceByGradeTest, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathPerformanceByGradeTest_nogr8, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(StudentToStaffRatios, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(TeachersExperience, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ChronicAbsenteeism, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(DaysAbsent, by=c("ST_LEAID","SchoolYear"))

rm(list=ls(pattern = "_D$"))
write_csv(rc_comb1718, file="datasets/rc_comb1718.csv")
Code - Read in NJ Report Card data 2016-17
## School year 2016-17 district data
url <- "https://navilp7rg08njprsharedst.blob.core.windows.net/perf-reports-ct/DataFiles/2016-2017/Database_DistrictStateDetail.xlsx"

# Download the file temporarily
rc_district <- tempfile(fileext = ".xlsx")
download.file(url, destfile = rc_district, mode = "wb")
sy <- "2016-2017"

#New Jersey transitioned to the PARCC exams in mathematics and ELA beginning in the 2014-15 school year.

sheet_names_district <- excel_sheets(rc_district)
# print(sheet_names_district)

nsheet <- which(sheet_names_district=="EnrollmentTrendsByGrade")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

EnrollmentTrendsbyGrade <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -DistrictCode) %>%
  pivot_wider(names_from="Grade", values_from = "Count") %>%
  dplyr::rename_with(~str_remove_all(., " "), .cols = everything())

nsheet <- which(sheet_names_district=="EnrollmentTrendsByStudentGroup")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

EnrollmentTrendsByStudentGroup <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -DistrictCode) %>%
  pivot_wider(names_from="StudentGroup", values_from = "Percent") %>%
  dplyr::select(ST_LEAID, SchoolYear, `Students with Disabilities`, `Economically Disadvantaged Students`)

nsheet <- which(sheet_names_district=="ELALiteracyParticipationPerform")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAParticipationPerformance <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(StudentGroup=="Districtwide") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, ELA_ME_District=DistrictPerformance)

nsheet <- which(sheet_names_district=="ELALiteracyParticipationPerform")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAParticipationPerformance_dis <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(StudentGroup=="Students with Disabilities") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, ELA_ME_District_dis=DistrictPerformance)

nsheet <- which(sheet_names_district=="ELALiteracyParticipationPerform")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAParticipationPerformance_notdis <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(StudentGroup=="Students without Disabilities") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, ELA_ME_District_notdis=DistrictPerformance)

nsheet <- which(sheet_names_district=="ELALiteracyPerformanceByGrade")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAPerformanceByGrade <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`StudentGroup`=="Districtwide") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -DistrictCode) %>%
  dplyr::mutate(n_level5=(as.numeric(LEVEL_5)/100)*as.numeric(`VALID SCORES`)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::summarize(ELA_pct_level5=sum(n_level5)/sum(as.numeric(`VALID SCORES`))) %>%
  dplyr::ungroup()

nsheet <- which(sheet_names_district=="MathParticipationPerform")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathParticipationPerformance <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(StudentGroup=="Districtwide") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, MATH_ME_District=DistrictPerformance)

nsheet <- which(sheet_names_district=="MathParticipationPerform")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathParticipationPerformance_dis <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(StudentGroup=="Students with Disabilities") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, MATH_ME_District_dis=DistrictPerformance)

nsheet <- which(sheet_names_district=="MathParticipationPerform")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathParticipationPerformance_notdis <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(StudentGroup=="Students without Disabilities") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, MATH_ME_District_notdis=DistrictPerformance)

nsheet <- which(sheet_names_district=="MathPerformanceByGradeTest")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathPerformanceByGradeTest <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`StudentGroup`=="Districtwide") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -DistrictCode) %>%
  dplyr::mutate(`Valid Scores`=if_else(`VALID SCORES`=="*", "0", `VALID SCORES`)) %>%
  dplyr::mutate(`% Level 5`=if_else(`LEVEL_5`=="*", "0", `LEVEL_5`)) %>%
  dplyr::mutate(n_level5=(as.numeric(`% Level 5`)/100)*as.numeric(`Valid Scores`)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::summarize(MATH_pct_level5=sum(n_level5)/sum(as.numeric(`Valid Scores`)))

MathPerformanceByGradeTest_nogr8 <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(`StudentGroup`=="Districtwide", ! `Grade_Subject` %in% c("Grade 08", "ALG01", "ALG02", "GEO01")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -DistrictCode) %>%
  dplyr::mutate(`Valid Scores`=if_else(`VALID SCORES`=="*", "0", `VALID SCORES`)) %>%
  dplyr::mutate(`% Level 5`=if_else(`LEVEL_5`=="*", "0", `LEVEL_5`)) %>%
  dplyr::mutate(n_level5=(as.numeric(`% Level 5`)/100)*as.numeric(`Valid Scores`)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::summarize(MATH_pct_level5_nogr8=sum(n_level5)/sum(as.numeric(`Valid Scores`)))

nsheet <- which(sheet_names_district=="StudentToStaffRatios")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

StudentToStaffRatios <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, Student_Teacher_District=TeacherRatio, Student_Admin_District=AdministratorsRatio)

nsheet <- which(sheet_names_district=="TeachersExperienceCertification")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

TeachersExperience <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::select(-CountyCode, -DistrictCode) %>%
  dplyr::select(ST_LEAID, SchoolYear, starts_with("Teacher") | starts_with("Percent")) %>%
  dplyr::select(!ends_with("State"))

nsheet <- which(sheet_names_district=="ChronicAbsenteeism")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ChronicAbsenteeism<- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  dplyr::filter(StudentGroup=="Districtwide")  %>%
  dplyr::select(-CountyCode, -DistrictCode, -StudentGroup, -MetTarget, -Target)

nsheet <- which(sheet_names_district=="DaysAbsent")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

DaysAbsent <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", CountyCode, DistrictCode), SchoolYear=sy) %>%
  pivot_wider(names_from="Subgroup", values_from="PCT") %>%
  dplyr::rename(Abs_0=`0 Absences`, `Abs_1-5`=`1-5 Absences`,   `Abs_6-10`=`6-10 Absences`, `Abs_11-15`=`11-15 Absences`, `Abs_over15`=`15+ Absences`) %>%
  dplyr::select(-CountyCode, -DistrictCode)

rc_comb1617 <- EnrollmentTrendsbyGrade %>%
  dplyr::left_join(EnrollmentTrendsByStudentGroup, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ELAParticipationPerformance, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ELAParticipationPerformance_dis, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ELAParticipationPerformance_notdis, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ELAPerformanceByGrade, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathParticipationPerformance, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathParticipationPerformance_dis, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathParticipationPerformance_notdis, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathPerformanceByGradeTest, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathPerformanceByGradeTest_nogr8, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(StudentToStaffRatios, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(TeachersExperience, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ChronicAbsenteeism, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(DaysAbsent, by=c("ST_LEAID","SchoolYear"))

rm(list=ls(pattern = "_D$"))
write_csv(rc_comb1617, file="datasets/rc_comb1617.csv")
Code - Read in NJ Report Card data 2015-16
## School year 2015-16 district data
url <- "https://navilp7rg08njprsharedst.blob.core.windows.net/perf-reports-ct/DataFiles/2015-2016/Database_SchoolDetail.xlsx"

# Download the file temporarily
rc_district <- tempfile(fileext = ".xlsx")
download.file(url, destfile = rc_district, mode = "wb")
sy <- "2015-2016"

## looks like a parcc year

sheet_names_district <- excel_sheets(rc_district)
# print(sheet_names_district)

nsheet <- which(sheet_names_district=="EnrollmentByGrade")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

EnrollmentTrendsbyGrade <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", COUNTY_CODE, DISTRICT_CODE), SchoolYear=sy) %>%
  dplyr::select(-COUNTY_CODE, -DISTRICT_CODE, -SCHOOL_CODE) %>%
  dplyr::mutate(across(PK:RowTotal, ~replace_na(as.numeric(.), 0)))  %>%
  dplyr::rename(GradePK=PK, GradeKG=KG) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::summarize(across(where(is.numeric), \(x) sum(x, na.rm = TRUE))) %>%
  dplyr::ungroup() %>%
  dplyr::rename(TotalEnrollment=RowTotal)

nsheet <- which(sheet_names_district=="EnrollmentByGrade")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

EnrollmentTotalsBySchool <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", COUNTY_CODE, DISTRICT_CODE, SCHOOL_CODE), SchoolYear=sy) %>%
  dplyr::select(-COUNTY_CODE, -DISTRICT_CODE, -SCHOOL_CODE) %>%
  dplyr::mutate(across(PK:RowTotal, ~replace_na(as.numeric(.), 0)))  %>%
  dplyr::rename(GradePK=PK, GradeKG=KG) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::summarize(across(where(is.numeric), \(x) sum(x, na.rm = TRUE))) %>%
  dplyr::ungroup() %>%
  dplyr::rename(TotalEnrollment=RowTotal)

nsheet <- which(sheet_names_district=="EnrollmentBySpecPop")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

EnrollmentTrendsByStudentGroup <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", COUNTY_CODE, DISTRICT_CODE, SCHOOL_CODE), SchoolYear=sy) %>%
  dplyr::select(-COUNTY_CODE, -DISTRICT_CODE, -SCHOOL_CODE, -LEP) %>%
  dplyr::rename(`Students with Disabilities`=Disability, `Economically Disadvantaged Students`=`Economically Disadvantaged`) %>%
  dplyr::left_join(EnrollmentTotalsBySchool, by=c("ST_LEAID", "SchoolYear")) %>%
  dplyr::mutate(`Students with Disabilities`=TotalEnrollment*`Students with Disabilities`/100) %>%
  dplyr::mutate(`Economically Disadvantaged Students`=TotalEnrollment*`Economically Disadvantaged Students`/100) %>%
  #dplyr::filter(startsWith(ST_LEAID, fh_staid)) %>%
  dplyr::mutate(ST_LEAID= str_sub(ST_LEAID, start=1, end=9)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::mutate(`Students with Disabilities`=round(100*sum(`Students with Disabilities`)/sum(TotalEnrollment))) %>%
  dplyr::mutate(`Economically Disadvantaged Students`=round(100*sum(`Economically Disadvantaged Students`)/sum(TotalEnrollment))) %>%
  dplyr::summarize(across(where(is.numeric), \(x) max(x, na.rm = TRUE))) %>%
  dplyr::ungroup() %>%
  dplyr::select(ST_LEAID, SchoolYear, `Students with Disabilities`, `Economically Disadvantaged Students`)

EnrollmentDisBySchool <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", COUNTY_CODE, DISTRICT_CODE, SCHOOL_CODE), SchoolYear=sy) %>%
  dplyr::select(-COUNTY_CODE, -DISTRICT_CODE, -SCHOOL_CODE, -LEP) %>%
  dplyr::rename(`Students with Disabilities`=Disability, `Economically Disadvantaged Students`=`Economically Disadvantaged`) %>%
  dplyr::left_join(EnrollmentTotalsBySchool, by=c("ST_LEAID", "SchoolYear")) %>%
  dplyr::mutate(`Students with Disabilities`=TotalEnrollment*`Students with Disabilities`/100) %>%
  dplyr::select(ST_LEAID, SchoolYear, `Students with Disabilities`)

nsheet <- which(sheet_names_district=="PARCC_ELA_MetExcMathEng")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAParticipationPerformance <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(Type=="Schoolwide", !VALID_SCORES %in% c("S", "N")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", COUNTY_CODE, DISTRICT_CODE, SCHOOL_CODE), SchoolYear=sy) %>%
  dplyr::select(-COUNTY_CODE, -DISTRICT_CODE, -SCHOOL_CODE) %>%
  dplyr::select(ST_LEAID, SchoolYear, ELA_ME_District=`SchoolPerf/MetStandards`) %>%
  dplyr::left_join(EnrollmentTotalsBySchool, by=c("ST_LEAID", "SchoolYear")) %>%
  dplyr::mutate(ELA_ME_District=TotalEnrollment*as.numeric(ELA_ME_District)/100) %>%
  dplyr::mutate(ST_LEAID= str_sub(ST_LEAID, start=1, end=9)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::mutate(ELA_ME_District=round(100*sum(ELA_ME_District)/sum(TotalEnrollment))) %>%
  dplyr::summarize(across(where(is.numeric), \(x) max(x, na.rm = TRUE))) %>%
  dplyr::ungroup() %>%
  dplyr::select(ST_LEAID, SchoolYear, ELA_ME_District)

nsheet <- which(sheet_names_district=="PARCC_ELA_MetExcMathEng")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAParticipationPerformance_dis <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(Type=="Students with Disability", !VALID_SCORES %in% c("S", "N")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", COUNTY_CODE, DISTRICT_CODE, SCHOOL_CODE), SchoolYear=sy) %>%
  dplyr::select(-COUNTY_CODE, -DISTRICT_CODE, -SCHOOL_CODE) %>%
  dplyr::select(ST_LEAID, SchoolYear, ELA_ME_District_dis=`SchoolPerf/MetStandards`) %>%
  dplyr::left_join(EnrollmentDisBySchool, by=c("ST_LEAID", "SchoolYear")) %>%
  dplyr::mutate(ELA_ME_District_dis=`Students with Disabilities`*as.numeric(ELA_ME_District_dis)/100) %>%
  dplyr::mutate(ST_LEAID= str_sub(ST_LEAID, start=1, end=9)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::mutate(ELA_ME_District_dis=round(100*sum(ELA_ME_District_dis)/sum(`Students with Disabilities`))) %>%
  dplyr::summarize(across(where(is.numeric), \(x) max(x, na.rm = TRUE))) %>%
  dplyr::ungroup() %>%
  dplyr::select(ST_LEAID, SchoolYear, ELA_ME_District_dis)

nsheet <- which(sheet_names_district=="PARCC_Math_MetExcMathEng")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathParticipationPerformance <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(Type=="Schoolwide", !VALID_SCORES %in% c("S", "N")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", COUNTY_CODE, DISTRICT_CODE, SCHOOL_CODE), SchoolYear=sy) %>%
  dplyr::select(-COUNTY_CODE, -DISTRICT_CODE, -SCHOOL_CODE) %>%
  dplyr::select(ST_LEAID, SchoolYear, MATH_ME_District=`SchoolPerf/MetStandards`) %>%
  dplyr::left_join(EnrollmentTotalsBySchool, by=c("ST_LEAID", "SchoolYear")) %>%
  dplyr::mutate(MATH_ME_District=TotalEnrollment*as.numeric(MATH_ME_District)/100) %>%
  dplyr::mutate(ST_LEAID= str_sub(ST_LEAID, start=1, end=9)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::mutate(MATH_ME_District=round(100*sum(MATH_ME_District)/sum(TotalEnrollment))) %>%
  dplyr::summarize(across(where(is.numeric), \(x) max(x, na.rm = TRUE))) %>%
  dplyr::ungroup() %>%
  dplyr::select(ST_LEAID, SchoolYear, MATH_ME_District)

nsheet <- which(sheet_names_district=="PARCC_Math_MetExcMathEng")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathParticipationPerformance_dis <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(Type=="Students with Disability", !VALID_SCORES %in% c("S", "N")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", COUNTY_CODE, DISTRICT_CODE, SCHOOL_CODE), SchoolYear=sy) %>%
  dplyr::select(-COUNTY_CODE, -DISTRICT_CODE, -SCHOOL_CODE) %>%
  dplyr::select(ST_LEAID, SchoolYear, MATH_ME_District_dis=`SchoolPerf/MetStandards`) %>%
  dplyr::left_join(EnrollmentDisBySchool, by=c("ST_LEAID", "SchoolYear")) %>%
  dplyr::mutate(MATH_ME_District_dis=`Students with Disabilities`*as.numeric(MATH_ME_District_dis)/100) %>%
  dplyr::mutate(ST_LEAID= str_sub(ST_LEAID, start=1, end=9)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::mutate(MATH_ME_District_dis=round(100*sum(MATH_ME_District_dis)/sum(`Students with Disabilities`))) %>%
  dplyr::summarize(across(where(is.numeric), \(x) max(x, na.rm = TRUE))) %>%
  dplyr::ungroup() %>%
  dplyr::select(ST_LEAID, SchoolYear, MATH_ME_District_dis)

nsheet <- which(sheet_names_district=="ChronicAbsenteeism")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ChronicAbsenteeism <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", COUNTY_CODE, DISTRICT_CODE, SCHOOL_CODE), SchoolYear=sy) %>%
  dplyr::select(-COUNTY_CODE, -DISTRICT_CODE, -SCHOOL_CODE) %>%
  dplyr::select(ST_LEAID, SchoolYear, Chronic_Abs_Pct=SchoolChronicAbsenteeism) %>%
  dplyr::left_join(EnrollmentTotalsBySchool, by=c("ST_LEAID", "SchoolYear")) %>%
  dplyr::mutate(Chronic_Abs_Pct=TotalEnrollment*as.numeric(Chronic_Abs_Pct)/100) %>%
  dplyr::mutate(ST_LEAID= str_sub(ST_LEAID, start=1, end=9)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::mutate(Chronic_Abs_Pct=round(100*sum(Chronic_Abs_Pct)/sum(TotalEnrollment))) %>%
  dplyr::summarize(across(where(is.numeric), \(x) max(x, na.rm = TRUE))) %>%
  dplyr::ungroup()  %>%
  dplyr::select(ST_LEAID, SchoolYear, Chronic_Abs_Pct)

nsheet <- which(sheet_names_district=="Absenteeism")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

DaysAbsent <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", COUNTY_CODE, DISTRICT_CODE, SCHOOL_CODE), SchoolYear=sy) %>%
  dplyr::select(-COUNTY_CODE, -DISTRICT_CODE, -SCHOOL_CODE) %>%
  dplyr::rename(Abs_0=`0Absences`,  `Abs_1-5`=`1-5Absences`,    `Abs_6-10`=`6-10Absences`,  `Abs_11-15`=`11-15Absences`, `Abs_over15`=`15+Absences`) %>%
  dplyr::left_join(EnrollmentTotalsBySchool, by=c("ST_LEAID", "SchoolYear")) %>%
  dplyr::mutate(Abs_0=TotalEnrollment*as.numeric(Abs_0)/100) %>%
  dplyr::mutate(`Abs_1-5`=TotalEnrollment*as.numeric(`Abs_1-5`)/100) %>%
  dplyr::mutate(`Abs_6-10`=TotalEnrollment*as.numeric(`Abs_6-10`)/100) %>%
  dplyr::mutate(`Abs_11-15`=TotalEnrollment*as.numeric(`Abs_11-15`)/100) %>%
  dplyr::mutate(`Abs_over15`=TotalEnrollment*as.numeric(`Abs_over15`)/100) %>%
  dplyr::mutate(ST_LEAID= str_sub(ST_LEAID, start=1, end=9)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::mutate(Abs_0=round(100*sum(Abs_0)/sum(TotalEnrollment))) %>%
  dplyr::mutate(`Abs_6-10`=round(100*sum(`Abs_6-10`)/sum(TotalEnrollment))) %>%
  dplyr::mutate(`Abs_1-5`=round(100*sum(`Abs_1-5`)/sum(TotalEnrollment))) %>%
  dplyr::mutate(`Abs_11-15`=round(100*sum(`Abs_11-15`)/sum(TotalEnrollment))) %>%
  dplyr::mutate(`Abs_over15`=round(100*sum(`Abs_over15`)/sum(TotalEnrollment))) %>%
  dplyr::summarize(across(where(is.numeric), \(x) max(x, na.rm = TRUE))) %>%
  dplyr::ungroup()  %>%
  dplyr::select(ST_LEAID, SchoolYear, starts_with("Abs"))

rc_comb1516 <- EnrollmentTrendsbyGrade %>%
  dplyr::left_join(EnrollmentTrendsByStudentGroup, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ELAParticipationPerformance, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ELAParticipationPerformance_dis, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathParticipationPerformance, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathParticipationPerformance_dis, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ChronicAbsenteeism, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(DaysAbsent, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::mutate(across(where(is.numeric), \(x) if_else(is.infinite(x), NA, x)))

rm(list=ls(pattern = "_D$"))
write_csv(rc_comb1516, file="datasets/rc_comb1516.csv")
Code - Read in NJ Report Card data 2014-15
## School year 2014-15 district data
url <- "https://nj.gov/education/schoolperformance/archive/201415/2015PRDATABASE.xlsx"

# Download the file temporarily
rc_district <- tempfile(fileext = ".xlsx")
download.file(url, destfile = rc_district, mode = "wb")
sy <- "2014-2015"

#New Jersey transitioned to the PARCC exams in mathematics and ELA beginning in the 2014-15 school year.
## So this should be first year where everyone used PARCC

sheet_names_district <- excel_sheets(rc_district)
# print(sheet_names_district)

nsheet <- which(sheet_names_district=="ENROLLMENT")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

EnrollmentTrendsbyGrade <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", `CO CODE`, `DIST CODE`), SchoolYear=sy) %>%
  dplyr::select(-`CO CODE`, -`DIST CODE`, -`SCH CODE`) %>%
  dplyr::mutate(across(PK:ROWTOTAL, ~replace_na(as.numeric(.), 0)))  %>%
  dplyr::rename(GradePK=PK, GradeKG=KG, Grade01=GRADE01,
                Grade02=GRADE02,
                Grade03=GRADE03,
                Grade04=GRADE04,
                Grade05=GRADE05,
                Grade06=GRADE06,
                Grade07=GRADE07,
                Grade08=GRADE08,
                Grade09=GRADE09,
                Grade10=GRADE10,
                Grade11=GRADE11,
                Grade12=GRADE12) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::summarize(across(where(is.numeric), \(x) sum(x, na.rm = TRUE))) %>%
  dplyr::ungroup() %>%
  dplyr::rename(TotalEnrollment=ROWTOTAL)

nsheet <- which(sheet_names_district=="ENROLLMENT")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

EnrollmentTotalsBySchool <-   get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", `CO CODE`, `DIST CODE`, `SCH CODE`), SchoolYear=sy) %>%
  dplyr::select(-`CO CODE`, -`DIST CODE`, -`SCH CODE`) %>%
  dplyr::mutate(across(PK:ROWTOTAL, ~replace_na(as.numeric(.), 0)))  %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::summarize(across(where(is.numeric), \(x) sum(x, na.rm = TRUE))) %>%
  dplyr::ungroup() %>%
  dplyr::rename(TotalEnrollment=ROWTOTAL) %>%
  dplyr::select(ST_LEAID, SchoolYear, TotalEnrollment)

nsheet <- which(sheet_names_district=="PARCC ELA")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ELAParticipationPerformance <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(TYPE=="Schoolwide", !`VALID SCORES` %in% c("S", "N")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", COUNTY_CODE, DISTRICT_CODE, SCHOOL_CODE), SchoolYear=sy) %>%
  dplyr::select(-COUNTY_CODE, -DISTRICT_CODE, -SCHOOL_CODE) %>%
  dplyr::select(ST_LEAID, SchoolYear, ELA_ME_District=`MET_EXCEED`) %>%
  dplyr::left_join(EnrollmentTotalsBySchool, by=c("ST_LEAID", "SchoolYear")) %>%
  dplyr::mutate(ELA_ME_District=TotalEnrollment*as.numeric(ELA_ME_District)/100) %>%
  dplyr::mutate(ST_LEAID= str_sub(ST_LEAID, start=1, end=9)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::mutate(ELA_ME_District=round(100*sum(ELA_ME_District)/sum(TotalEnrollment))) %>%
  dplyr::summarize(across(where(is.numeric), \(x) max(x, na.rm = TRUE))) %>%
  dplyr::ungroup() %>%
  dplyr::select(ST_LEAID, SchoolYear, ELA_ME_District)

nsheet <- which(sheet_names_district=="PARCC MATH")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

MathParticipationPerformance <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::filter(TYPE=="Schoolwide", !`VALID SCORES` %in% c("S", "N")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", COUNTY_CODE, DISTRICT_CODE, SCHOOL_CODE), SchoolYear=sy) %>%
  dplyr::select(-COUNTY_CODE, -DISTRICT_CODE, -SCHOOL_CODE) %>%
  dplyr::select(ST_LEAID, SchoolYear, MATH_ME_District=`MET_EXCEED`) %>%
  dplyr::left_join(EnrollmentTotalsBySchool, by=c("ST_LEAID", "SchoolYear")) %>%
  dplyr::mutate(MATH_ME_District=TotalEnrollment*as.numeric(MATH_ME_District)/100) %>%
  dplyr::mutate(ST_LEAID= str_sub(ST_LEAID, start=1, end=9)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::mutate(MATH_ME_District=round(100*sum(MATH_ME_District)/sum(TotalEnrollment))) %>%
  dplyr::summarize(across(where(is.numeric), \(x) max(x, na.rm = TRUE))) %>%
  dplyr::ungroup() %>%
  dplyr::select(ST_LEAID, SchoolYear, MATH_ME_District)

nsheet <- which(sheet_names_district=="CHRONIC ABSENTEEISM")
assign(paste0(sheet_names_district[nsheet], "_D"), read_excel(rc_district, sheet=sheet_names_district[nsheet]))

ChronicAbsenteeism <- get(paste0(sheet_names_district[nsheet], "_D")) %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", COUNTY_CODE, DISTRICT_CODE, SCHOOL_CODE), SchoolYear=sy) %>%
  dplyr::select(-COUNTY_CODE, -DISTRICT_CODE, -SCHOOL_CODE) %>%
  dplyr::select(ST_LEAID, SchoolYear, Chronic_Abs_Pct=PERCENT_CHRONIC_ABSENT) %>%
  dplyr::left_join(EnrollmentTotalsBySchool, by=c("ST_LEAID", "SchoolYear")) %>%
  dplyr::mutate(Chronic_Abs_Pct=TotalEnrollment*as.numeric(Chronic_Abs_Pct)/100) %>%
  dplyr::mutate(ST_LEAID= str_sub(ST_LEAID, start=1, end=9)) %>%
  dplyr::group_by(ST_LEAID, SchoolYear) %>%
  dplyr::mutate(Chronic_Abs_Pct=round(100*sum(Chronic_Abs_Pct)/sum(TotalEnrollment))) %>%
  dplyr::summarize(across(where(is.numeric), \(x) max(x, na.rm = TRUE))) %>%
  dplyr::ungroup()  %>%
  dplyr::select(ST_LEAID, SchoolYear, Chronic_Abs_Pct)

rc_comb1415 <- EnrollmentTrendsbyGrade %>%
  dplyr::left_join(ELAParticipationPerformance, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(MathParticipationPerformance, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::left_join(ChronicAbsenteeism, by=c("ST_LEAID","SchoolYear")) %>%
  dplyr::mutate(across(where(is.numeric), \(x) if_else(is.infinite(x), NA, x)))

rm(list=ls(pattern = "_D$"))
write_csv(rc_comb1415, file="datasets/rc_comb1415.csv")

Essential report card data are joined into one report card data set, saved as a CSV for easy access.

Code - Read in NJ Report Card data merge
rc_comb2324 <- rc_comb2324 %>% dplyr::mutate(across(everything(), as.character))
rc_comb2223 <- rc_comb2223 %>% dplyr::mutate(across(everything(), as.character))
rc_comb2122 <- rc_comb2122 %>% dplyr::mutate(across(everything(), as.character))
rc_comb2021 <- rc_comb2021 %>% dplyr::mutate(across(everything(), as.character))
rc_comb1920 <- rc_comb1920 %>% dplyr::mutate(across(everything(), as.character))
rc_comb1819 <- rc_comb1819 %>% dplyr::mutate(across(everything(), as.character))
rc_comb1718 <- rc_comb1718 %>% dplyr::mutate(across(everything(), as.character))
rc_comb1617 <- rc_comb1617 %>% dplyr::mutate(across(everything(), as.character))
rc_comb1516 <- rc_comb1516 %>% dplyr::mutate(across(everything(), as.character))
rc_comb1415 <- rc_comb1415 %>% dplyr::mutate(across(everything(), as.character))
  
rc_all_comb <- bind_rows(rc_comb2324, rc_comb2223, rc_comb2122, rc_comb2021,  rc_comb1920, rc_comb1819, rc_comb1718, rc_comb1617, rc_comb1516, rc_comb1415)

write_csv(rc_all_comb, file="datasets/rc_all_comb.csv")
Code
colnames(rc_all_comb)
 [1] "DistrictName"                                          "ST_LEAID"                                             
 [3] "SchoolYear"                                            "mSGP_ELA"                                             
 [5] "mSGP_Math"                                             "mSGP_ELA Grade 4"                                     
 [7] "mSGP_ELA Grade 5"                                      "mSGP_ELA Grade 6"                                     
 [9] "mSGP_ELA Grade 7"                                      "mSGP_ELA Grade 8"                                     
[11] "mSGP_Math Grade 4"                                     "mSGP_Math Grade 5"                                    
[13] "mSGP_Math Grade 6"                                     "mSGP_Math Grade 7"                                    
[15] "ELA_ME_District"                                       "ELA_ME_District_dis"                                  
[17] "ELA_ME_District_notdis"                                "ELA_ME_Grade 03"                                      
[19] "ELA_ME_Grade 04"                                       "ELA_ME_Grade 05"                                      
[21] "ELA_ME_Grade 06"                                       "ELA_ME_Grade 07"                                      
[23] "ELA_ME_Grade 08"                                       "ELA_ME_Grade 09"                                      
[25] "ELA_pct_level5"                                        "MATH_ME_District"                                     
[27] "MATH_ME_District_dis"                                  "MATH_ME_District_notdis"                              
[29] "MATH_ME_Algebra I"                                     "MATH_ME_Grade 03"                                     
[31] "MATH_ME_Grade 04"                                      "MATH_ME_Grade 05"                                     
[33] "MATH_ME_Grade 06"                                      "MATH_ME_Grade 07"                                     
[35] "MATH_ME_Grade 08"                                      "MATH_ME_Algebra II"                                   
[37] "MATH_ME_Geometry"                                      "MATH_pct_level5"                                      
[39] "MATH_pct_level5_nogr8"                                 "Student_Teacher_District"                             
[41] "Student_Admin_District"                                "TeacherCount_School"                                  
[43] "TeacherAvgYearsExp_School"                             "TeacherAvgYearsExpInDistrict_School"                  
[45] "PercentageTeacherInDistrict4YearsOrMore_School"        "PercentageOfOutOfFieldTeachers"                       
[47] "PercentageTeachersWithProvisionalCredentials_School"   "Chronic_Abs_Count"                                    
[49] "Chronic_Abs_Pct"                                       "Students with Disabilities"                           
[51] "Economically Disadvantaged Students"                   "GradePK"                                              
[53] "GradeKG"                                               "Grade01"                                              
[55] "Grade02"                                               "Grade03"                                              
[57] "Grade04"                                               "Grade05"                                              
[59] "Grade06"                                               "Grade07"                                              
[61] "Grade08"                                               "Grade09"                                              
[63] "Grade10"                                               "Grade11"                                              
[65] "Grade12"                                               "TotalEnrollment"                                      
[67] "mSGP_Math Grade 8"                                     "TeacherCount_District"                                
[69] "TeacherAvgYearsExp_District"                           "TeacherAvgYearsExpInDistrict_District"                
[71] "PercentageTeacherInDistrict4YearsOrMore_District"      "PercentageOfOutOfFieldTeachers_District"              
[73] "PercentageTeachersWithProvisionalCredentials_District" "Abs_0"                                                
[75] "Abs_1-5"                                               "Abs_6-10"                                             
[77] "Abs_11-15"                                             "Abs_over15"                                           
[79] "TeacherInDistrict4YearsOrMore_District"                "Ungraded"                                             
[81] "TeacherCount"                                          "TeacherAvgYearsExpInSchool"                           
[83] "TeacherAvgYearsExpInDistrict"                          "TeacherInDistrict4YearsOrMore"                        
[85] "Percent"                                               "UNGRADED"                                             

Enrollments

Official enrollment trend data is retrieved from the NJ Department of Education and stored in a CSV.

Code - Enrollments
# Function to download, unzip, and get sheet names
download_unzip_sheets <- function(url) {
  temp_file <- tempfile(fileext = ".zip")
  download.file(url, destfile = temp_file, mode = "wb")
  out <- unzip(temp_file, exdir = tempdir())
  sheet_names <- excel_sheets(out)
  return(list(sheet_names = sheet_names, unzipped_path = out))
}

# 2023-24

url <- "https://www.nj.gov/education/doedata/enr/enr24/enrollment_2324.zip"
nsheet <- 3
edat <- download_unzip_sheets(url)
raw <- read_excel(edat$unzipped_path, sheet=edat$sheet_names[nsheet], skip=2)

District_D23 <- raw %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", `County Code`, `District Code`), SchoolYear="2023-2024") %>%
  dplyr::select(-`County Code`, -`County Name`, -`District Code`)


# 2022-23
url <- "https://www.nj.gov/education/doedata/enr/enr23/enrollment_2223.zip"
nsheet <- 3
edat <- download_unzip_sheets(url)
raw <- read_excel(edat$unzipped_path, sheet=edat$sheet_names[nsheet], skip=2)

District_D22 <- raw %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", `County Code`, `District Code`), SchoolYear="2022-2023") %>%
  dplyr::select(-`County Code`, -`County Name`, -`District Code`)

# 2021-22
url <- "https://www.nj.gov/education/doedata/enr/enr22/enrollment_2122.zip"
nsheet <- 3
edat <- download_unzip_sheets(url)
raw <- read_excel(edat$unzipped_path, sheet=edat$sheet_names[nsheet], skip=2)

District_D21 <- raw %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", `County Code`, `District Code`), SchoolYear="2021-2022") %>%
  dplyr::select(-`County Code`, -`County Name`, -`District Code`)

# 2020-21
url <- "https://www.nj.gov/education/doedata/enr/enr21/enrollment_2021.zip"
nsheet <- 3
edat <- download_unzip_sheets(url)
raw <- read_excel(edat$unzipped_path, sheet=edat$sheet_names[nsheet], skip=2)

District_D20 <- raw %>% 
  dplyr::mutate(ST_LEAID=paste0("NJ-", `County Code`, `District Code`), SchoolYear="2020-2021") %>%
  dplyr::select(-`County Code`, -`County Name`, -`District Code`)

# 2019-20
url <- "https://www.nj.gov/education/doedata/enr/enr20/enrollment_1920.zip"
nsheet <- 3
edat <- download_unzip_sheets(url)
raw <- read_excel(edat$unzipped_path, sheet=edat$sheet_names[nsheet], skip=2)

District_D19 <- raw %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", `County Code`, `District Code`), SchoolYear="2019-2020") %>%
  dplyr::select(-`County Code`, -`County Name`, -`District Code`)

## Things get different here

# 2018-19
url <- "https://www.nj.gov/education/doedata/enr/enr19/enrollment_1819.zip"
nsheet <- 1
edat <- download_unzip_sheets(url)
raw <- read_excel(edat$unzipped_path, sheet=edat$sheet_names[nsheet], skip=2)

District_D18 <- raw %>%
  dplyr::filter(School_Name=="District Total", Grade_Level=="Total", PRGCODE=="55") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", County_ID, Dist_ID), SchoolYear="2018-2019") %>%
  dplyr::select(-County_ID, -County_Name, -Dist_ID, -School_ID, -School_Name)

# 2017-18
url <- "https://www.nj.gov/education/doedata/enr/enr18/enrollment_1718.zip"
nsheet <- 1
edat <- download_unzip_sheets(url)
raw <- read_excel(edat$unzipped_path, sheet=edat$sheet_names[nsheet], skip=1)

District_D17 <- raw %>%
  dplyr::filter(School_Name=="District Total", Grade_Level=="Total", PRGCODE=="55") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", County_ID, Dist_ID), SchoolYear="2017-2018") %>%
  dplyr::select(-County_ID, -County_Name, -Dist_ID, -School_ID, -School_Name)

# 2016-17
url <- "https://www.nj.gov/education/doedata/enr/enr17/enrollment_1617.zip"
nsheet <- 1
edat <- download_unzip_sheets(url)
raw <- read_excel(edat$unzipped_path, sheet=edat$sheet_names[nsheet], skip=0)

District_D16 <- raw %>%
  dplyr::filter(SCHOOL_NAME=="District Total", GRADE_LEVEL=="TOTAL", PRGCODE=="55") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", COUNTY_ID, DIST_ID), SchoolYear="2016-2017") %>%
  dplyr::select(-COUNTY_ID, -COUNTY_NAME, -DIST_ID, -SCHOOL_ID, -SCHOOL_NAME)

# 2015-16
url <- "https://www.nj.gov/education/doedata/enr/enr16/enrollment_1516.zip"
nsheet <- 1
edat <- download_unzip_sheets(url)
raw <- read_excel(edat$unzipped_path, sheet=edat$sheet_names[nsheet], skip=0)

District_D15 <- raw %>%
  dplyr::filter(SCHOOL_NAME=="District Total", GRADE_LEVEL=="TOTAL", PRGCODE=="55") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", COUNTY_ID, DIST_ID), SchoolYear="2015-2016") %>%
  dplyr::select(-COUNTY_ID, -COUNTY_NAME, -DIST_ID, -SCHOOL_ID, -SCHOOL_NAME)

# 2014-15
url <- "https://www.nj.gov/education/doedata/enr/enr15/enrollment_1415.zip"
nsheet <- 1
edat <- download_unzip_sheets(url)
raw <- read_excel(edat$unzipped_path, sheet=edat$sheet_names[nsheet], skip=0)

District_D14 <- raw %>%
  dplyr::filter(SCHOOL_NAME=="District Total", GRADE_LEVEL=="TOTAL", PRGCODE=="55") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", COUNTY_ID, DIST_ID), SchoolYear="2014-2015") %>%
  dplyr::select(-COUNTY_ID, -COUNTY_NAME, -DIST_ID, -SCHOOL_ID, -SCHOOL_NAME)

# 2013-14
url <- "https://www.nj.gov/education/doedata/enr/enr14/enrollment_1314.zip"
nsheet <- 1
edat <- download_unzip_sheets(url)
raw <- read_excel(edat$unzipped_path, sheet=edat$sheet_names[nsheet], skip=0)

District_D13 <- raw %>% 
  dplyr::filter(SCHOOL_NAME=="District Total", GRADE_LEVEL=="TOTAL", PRGCODE=="55") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", COUNTY_ID, DIST_ID), SchoolYear="2013-2014") %>%
  dplyr::select(-COUNTY_ID, -COUNTY_NAME, -DIST_ID, -SCHOOL_ID, -SCHOOL_NAME)

# 2012-13
url <- "https://www.nj.gov/education/doedata/enr/enr13/enrollment_1213.zip"
nsheet <- 1
edat <- download_unzip_sheets(url)
raw <- read_excel(edat$unzipped_path, sheet=edat$sheet_names[nsheet], skip=0)

District_D12 <- raw %>%
  dplyr::filter(`SCHOOL NAME`=="District Total", GRADE_LEVEL=="TOTAL", PRGCODE=="55") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", `COUNTY CODE`, `DISTRICT CODE`), SchoolYear="2012-2013") %>%
  dplyr::select(-`COUNTY CODE`, -`DISTRICT CODE`, -`COUNTY NAME`, -`SCHOOL CODE`, -`SCHOOL NAME`)

# 2011-12
url <- "https://www.nj.gov/education/doedata/enr/enr12/enrollment_1112.zip"
nsheet <- 1
edat <- download_unzip_sheets(url)
raw <- read_excel(edat$unzipped_path, sheet=edat$sheet_names[nsheet], skip=0)

District_D11 <- raw %>%
  dplyr::filter(`School Name`=="District Total", PRGCODE=="55") %>%
  dplyr::mutate(ST_LEAID=paste0("NJ-", `COUNTY_ID`, `DIST_ID`), SchoolYear="2011-2012") %>%
  dplyr::select(-`COUNTY_ID`, -`DIST_ID`, -`County Name`, -`SCHOOL_ID`, -`School Name`)


enr_19_23 <- bind_rows(
  District_D23 %>% dplyr::select(ST_LEAID, SchoolYear, DistrictName=`District Name`, TotalEnrollment=`Total Enrollment`),
  District_D22 %>% dplyr::select(ST_LEAID, SchoolYear, DistrictName=`District Name`, TotalEnrollment=`Total Enrollment`),
  District_D21 %>% dplyr::select(ST_LEAID, SchoolYear, DistrictName=`District Name`, TotalEnrollment=`Total Enrollment`),
  District_D20 %>% dplyr::select(ST_LEAID, SchoolYear, DistrictName=`District Name`, TotalEnrollment=`Total Enrollment`),
  District_D19 %>% dplyr::select(ST_LEAID, SchoolYear, DistrictName=`District Name`, TotalEnrollment=`Total Enrollment`)
)

enr_11_22 <- bind_rows(
  District_D18 %>% dplyr::select(ST_LEAID, SchoolYear, DistrictName=`District_Name`, TotalEnrollment=`Row_Total`),
  District_D17 %>% dplyr::select(ST_LEAID, SchoolYear, DistrictName=`District_Name`, TotalEnrollment=`Row_Total`),
  District_D16 %>% dplyr::select(ST_LEAID, SchoolYear, DistrictName=`LEA_NAME`, TotalEnrollment=`ROW_TOTAL`),
  District_D15 %>% dplyr::select(ST_LEAID, SchoolYear, DistrictName=`LEA_NAME`, TotalEnrollment=`ROW_TOTAL`),
  District_D14 %>% dplyr::select(ST_LEAID, SchoolYear, DistrictName=`LEA_NAME`, TotalEnrollment=`ROW_TOTAL`),
  District_D13 %>% dplyr::select(ST_LEAID, SchoolYear, DistrictName=`LEA_NAME`, TotalEnrollment=`ROW_TOTAL`),
  District_D12 %>% dplyr::select(ST_LEAID, SchoolYear, DistrictName=`DISTRICT NAME`, TotalEnrollment=`ROW_TOTAL`),
  District_D11 %>% dplyr::select(ST_LEAID, SchoolYear, DistrictName=`District Name`, TotalEnrollment=`ROW_TOTAL`)
)

enr_all <- bind_rows(enr_11_22, enr_19_23)

rm(list=ls(pattern="^District"))

# Serialize
write_csv(enr_all, file="datasets/enr_all.csv")
Code
colnames(enr_all)
[1] "ST_LEAID"        "SchoolYear"      "DistrictName"    "TotalEnrollment"

Bringing the data together

Though there are times when it will be useful for individual data sets to be used alone, it is also useful to have one dataset that contains the latest data available, which is the 2023-24 school year as of this writing.

Code - Bringing data together
comb2324 <- ccdnjk12_peers %>%
  dplyr::left_join(per_pupil_exp, by="ST_LEAID") %>%
  dplyr::left_join(rc_comb2324, by="ST_LEAID") %>%
  dplyr::left_join(enr_all %>% dplyr::filter(SchoolYear=="2023-24") %>% dplyr::select(ST_LEAID, OfficialTotalEnrollment=TotalEnrollment) , by="ST_LEAID")

comb2324 <- comb2324 %>%
  dplyr::mutate(across(starts_with("per_pupil"), as.numeric)) %>%
  dplyr::mutate(across(starts_with("mSGP"), as.numeric)) %>%
  dplyr::mutate(across(starts_with("ELA"), as.numeric)) %>%
  dplyr::mutate(across(starts_with("MATH"), as.numeric)) %>%
  dplyr::mutate(across(starts_with("Chronic"), as.numeric)) %>%
  dplyr::mutate(across(starts_with("Abs_"), as.numeric)) %>%
  dplyr::mutate(across(starts_with("Teacher_"), as.numeric)) %>%
  dplyr::mutate(across(starts_with("Percentage"), as.numeric)) %>%
  dplyr::mutate(student_teacher_ratio = sub(":.*", "", Student_Teacher_District),
                student_admin_ratio = sub(":.*", "", Student_Admin_District))

comb2324 <- comb2324 %>%
  dplyr::mutate(dist_colors = case_when(
    district_name %in% names(district_name_colors) ~ district_name,
    TRUE ~ "Other"
  ))

write_csv(comb2324, file="datasets/comb2324.csv")
Code
colnames(comb2324)
  [1] "SCHOOL_YEAR"                                         "ST"                                                 
  [3] "LEA_NAME"                                            "ST_LEAID"                                           
  [5] "LEAID"                                               "LSTREET1"                                           
  [7] "LSTREET2"                                            "LSTREET3"                                           
  [9] "LCITY"                                               "LZIP"                                               
 [11] "WEBSITE"                                             "G_PK_OFFERED"                                       
 [13] "GSLO"                                                "GSHI"                                               
 [15] "GeoId"                                               "Geography"                                          
 [17] "Edu_Less_9th"                                        "Edu_9th_12th_No_Diploma"                            
 [19] "Edu_HS_Grad"                                         "Edu_Some_College"                                   
 [21] "Edu_Assoc_Degree"                                    "Edu_Bach_Degree"                                    
 [23] "Edu_Grad_Prof_Degree"                                "Edu_HS_or_Higher"                                   
 [25] "Edu_Bach_or_Higher"                                  "Emp_Labor_Force"                                    
 [27] "Emp_Fem_Labor_Force"                                 "Emp_Unemp_Rate"                                     
 [29] "Enroll_Nursery"                                      "Enroll_Kindergarten"                                
 [31] "Enroll_Elem"                                         "Enroll_HS"                                          
 [33] "Enroll_College"                                      "HH_Married_Children"                                
 [35] "HH_Avg_Size"                                         "Fam_Avg_Size"                                       
 [37] "Housing_Units"                                       "Occ_Mgmt_Bus_Sci_Arts"                              
 [39] "Occ_Service"                                         "Occ_Sales_Office"                                   
 [41] "Occ_Nat_Res_Const_Maint"                             "Occ_Prod_Trans_Mat_Mov"                             
 [43] "Race_White"                                          "Race_Black"                                         
 [45] "Race_Amer_Indian_Alaska"                             "Race_Asian"                                         
 [47] "Race_Hawaiian_Pacific"                               "Race_Other"                                         
 [49] "Pop_Under_18"                                        "HH_Med_Income"                                      
 [51] "HH_Mean_Income"                                      "Fam_Med_Income"                                     
 [53] "Fam_Mean_Income"                                     "Fam_Income_150k_199k"                               
 [55] "Fam_Income_200k_plus"                                "HH_Income_150k_199k"                                
 [57] "HH_Income_200k_plus"                                 "km_euclid"                                          
 [59] "km_maha"                                             "km_pc_scaled"                                       
 [61] "km_pc_unscaled"                                      "hc_euclid_ward"                                     
 [63] "hc_euclid_mcquitty"                                  "hc_pc_dist_scaled_ward"                             
 [65] "hc_pc_dist_scaled_mcquitty"                          "hc_pc_dist_unscaled_ward"                           
 [67] "hc_pc_dist_unscaled_mcquitty"                        "euclid_score"                                       
 [69] "mah_score"                                           "PC1"                                                
 [71] "PC2"                                                 "district_name"                                      
 [73] "per_pupil_cost"                                      "per_pupil_class_instruction"                        
 [75] "per_pupil_admin_cost"                                "DistrictName"                                       
 [77] "SchoolYear"                                          "mSGP_ELA"                                           
 [79] "mSGP_Math"                                           "mSGP_ELA Grade 4"                                   
 [81] "mSGP_ELA Grade 5"                                    "mSGP_ELA Grade 6"                                   
 [83] "mSGP_ELA Grade 7"                                    "mSGP_ELA Grade 8"                                   
 [85] "mSGP_Math Grade 4"                                   "mSGP_Math Grade 5"                                  
 [87] "mSGP_Math Grade 6"                                   "mSGP_Math Grade 7"                                  
 [89] "ELA_ME_District"                                     "ELA_ME_District_dis"                                
 [91] "ELA_ME_District_notdis"                              "ELA_ME_Grade 03"                                    
 [93] "ELA_ME_Grade 04"                                     "ELA_ME_Grade 05"                                    
 [95] "ELA_ME_Grade 06"                                     "ELA_ME_Grade 07"                                    
 [97] "ELA_ME_Grade 08"                                     "ELA_ME_Grade 09"                                    
 [99] "ELA_pct_level5"                                      "MATH_ME_District"                                   
[101] "MATH_ME_District_dis"                                "MATH_ME_District_notdis"                            
[103] "MATH_ME_Algebra I"                                   "MATH_ME_Grade 03"                                   
[105] "MATH_ME_Grade 04"                                    "MATH_ME_Grade 05"                                   
[107] "MATH_ME_Grade 06"                                    "MATH_ME_Grade 07"                                   
[109] "MATH_ME_Grade 08"                                    "MATH_ME_Algebra II"                                 
[111] "MATH_ME_Geometry"                                    "MATH_pct_level5"                                    
[113] "MATH_pct_level5_nogr8"                               "Student_Teacher_District"                           
[115] "Student_Admin_District"                              "TeacherCount_School"                                
[117] "TeacherAvgYearsExp_School"                           "TeacherAvgYearsExpInDistrict_School"                
[119] "PercentageTeacherInDistrict4YearsOrMore_School"      "PercentageOfOutOfFieldTeachers"                     
[121] "PercentageTeachersWithProvisionalCredentials_School" "Chronic_Abs_Count"                                  
[123] "Chronic_Abs_Pct"                                     "Students with Disabilities"                         
[125] "Economically Disadvantaged Students"                 "GradePK"                                            
[127] "GradeKG"                                             "Grade01"                                            
[129] "Grade02"                                             "Grade03"                                            
[131] "Grade04"                                             "Grade05"                                            
[133] "Grade06"                                             "Grade07"                                            
[135] "Grade08"                                             "Grade09"                                            
[137] "Grade10"                                             "Grade11"                                            
[139] "Grade12"                                             "TotalEnrollment"                                    
[141] "OfficialTotalEnrollment"                             "student_teacher_ratio"                              
[143] "student_admin_ratio"                                 "dist_colors"                                        

Table of created data sets

The following table is a handy reference to some of the main data sets created in this document, which have been made available as CSV files.

Name Description
ccdnjk12_peers.csv Common Core of Data (2023-24), ACS-EDGE, and FH peer group sets. Only contains PK/K-8 districts
comb2324.csv Common Core of Data, ACS-EDGE, Budget data, Report Card data, all for PK/K-8 districts 2023-24
edge_df.csv ACS-EDGE for all NJ districts
enr_all.csv NJ District Enrollment counts from 2013-2014 to 2023-2024 school years, all NJ districts
per_pupil_exp.csv Per pupil expenditures for 2023-24 from 2022-23 User Friendly Budgets, all NJ districts
rc_all_comb.csv Report Card data for all NJ districts from 2014-15 to 2023-24