Health Code Violations in New York City Public Schools

Anna Ward
Sep 29, 2020

New York City is going through many changes right now, and I would like to focus on two systems in particular: health and education. Through research, data analysis, and visualization, I found the two to be closely connected. A healthy environment is key to fostering a good education, so I wanted to explore how the New York schools that rank best on health compare with the schools that rank best academically.

I started by finding a dataset on NYC Open Data, derived from the Department of Education’s data on New York public schools. Given the current pandemic, I thought it would be interesting to look at how schools were monitored for health before COVID-19, so the data I use covers the 2018–2019 school year. Perhaps regulations were a bit more lenient before our country realized how important the safety and well-being of our community is, and how fast negative health effects can spread. I will explore which areas of New York have the best health scores and what types of violations are causing problems in the school system.

You can find the dataset here to inquire further: https://data.cityofnewyork.us/Health/DOHMH-Childcare-Center-Inspections/dsg6-ifza

The dataset provided a great deal of valuable information about schools and their health code violations. To make it less cumbersome and easier to read, I removed some irrelevant details. For instance, I found the building number, street, and zip code of each school irrelevant, and decided to focus on the borough alone.
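As a rough illustration of this cleanup step, here is a minimal base R sketch. The column names (`Borough`, `Building`, `Street`, `ZipCode`, `ViolationRatePercent`) and the two rows are placeholders I made up for the example; the actual export may name its columns differently and has far more rows.

```r
# Hypothetical stand-in for the downloaded CSV (in practice: read.csv("...")).
# Rows and column names are illustrative assumptions, not real inspection data.
inspections <- data.frame(
  Borough = c("BRONX", "QUEENS"),
  Building = c("100", "200"),
  Street = c("EXAMPLE AVE", "SAMPLE ST"),
  ZipCode = c("10451", "11101"),
  ViolationRatePercent = c(50, 25),
  stringsAsFactors = FALSE
)

# Columns judged irrelevant for a borough-level comparison.
drop_cols <- c("Building", "Street", "ZipCode")

# Keep every column whose name is not listed in drop_cols.
inspections_small <- inspections[, !(names(inspections) %in% drop_cols), drop = FALSE]

print(names(inspections_small))
```

Dropping columns by name rather than by position keeps the step working even if the column order in the export changes.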

After cleaning up the dataset, I wanted to focus on certain variables and what they mean. First, I wanted to see which borough had the highest average violation rate percentage, so I wrote code that breaks the dataset down by borough and averages the violation rate for each one. I started with Staten Island: I filtered the dataset down to just that borough and then found its average violation rate. I then repeated the same steps for the other four boroughs.
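The filter-then-average step described above might look something like the following in base R. This is a sketch under assumptions: the column names and the four toy rows are invented for illustration, and the project's actual code may have used a different approach (for example, dplyr).

```r
# Toy data with assumed column names; values are placeholders, not real results.
inspections <- data.frame(
  Borough = c("STATEN ISLAND", "STATEN ISLAND", "BRONX", "BRONX"),
  ViolationRatePercent = c(20, 30, 50, NA),
  stringsAsFactors = FALSE
)

# One borough at a time: subset the rows, then average the rate,
# ignoring missing values.
staten <- inspections[inspections$Borough == "STATEN ISLAND", ]
staten_avg <- mean(staten$ViolationRatePercent, na.rm = TRUE)

# All boroughs at once: aggregate() groups by Borough and applies mean().
# With the formula interface, rows with a missing rate are dropped by default.
by_borough <- aggregate(ViolationRatePercent ~ Borough, data = inspections, FUN = mean)

print(staten_avg)
print(by_borough)
```

The `aggregate()` call produces the same per-borough averages as repeating the subset five times, in a single step.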

To summarize, here is the average violation rate I found for each borough:

Staten Island: 24.03%
Bronx: 58.02%
Queens: 44.52%
Brooklyn: 33.90%
Manhattan: 30.66%

As you can see, Staten Island had the lowest violation rate, whereas the Bronx had the highest. This was interesting, since Staten Island is more suburban than the other boroughs; perhaps schools in more urban settings are less equipped to deal with health hazards. This aligns with past research I have done, but it was interesting to see the results come to life through data.

Next, I will visualize these results with a bar chart that shows each borough and its average violation rate. To do this, I first need to create a new data table that includes just the borough and its average violation rate.
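A minimal sketch of that table and chart, using the five averages reported above and base R's `barplot()`. The column names `Borough` and `AvgViolationRate` are my own labels, and the original project may well have used a different plotting library.

```r
# Summary table built from the averages reported in this post.
borough_rates <- data.frame(
  Borough = c("Staten Island", "Bronx", "Queens", "Brooklyn", "Manhattan"),
  AvgViolationRate = c(24.03, 58.02, 44.52, 33.90, 30.66),
  stringsAsFactors = FALSE
)

# Sort from lowest to highest rate so the bars are easy to compare.
borough_rates <- borough_rates[order(borough_rates$AvgViolationRate), ]

# One bar per borough; the y-axis spans 0 to 100 because the values are percentages.
barplot(
  height = borough_rates$AvgViolationRate,
  names.arg = borough_rates$Borough,
  ylim = c(0, 100),
  ylab = "Average violation rate (%)",
  main = "Average violation rate by borough"
)
```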

Having looked into health code violations in the New York public school system, I thought it would be a good idea to look back at my first project, which covered language program implementation in the public school system. Is there a connection between better performing schools and their health standards?

As mentioned earlier, the Bronx had the highest health code violation rate. Because of this, I looked into the average school quality of the Bronx using a data set on the New York School Quality Index, provided by the USA Today Network.

You can find the data set here: http://rochester.nydatabases.com/database/new-york-school-quality-index-%E2%80%93-usa-today-network

I filtered it by the Bronx to get the data that I needed.

To be perfectly clear, the New York School Quality Index takes non-academic factors into account when determining school quality. These include attendance, number of suspensions, class size, and diversity.

After studying this data set, I calculated the average school quality rating in the Bronx. This was my result:

The Bronx had an overall school quality rating of 72.49%.

Since Staten Island had the lowest rate of health violations, I wanted to see what its school quality rating was. I used the same method as I did for the Bronx and filtered my data set by Richmond, since Staten Island is Richmond County.

For Staten Island, I got an average overall school quality rating of 81.04%.
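The same filter-then-average method, applied to both boroughs, could be wrapped in a small helper like the one below. Everything about the data layout is an assumption: the column names `County` and `QualityIndex`, the helper name `avg_quality`, and the toy rows are all hypothetical, so the numbers printed here are not the results reported in this post.

```r
# Toy stand-in for the School Quality Index export.
# Column names and values are illustrative assumptions only.
quality <- data.frame(
  County = c("Bronx", "Bronx", "Richmond", "Richmond", "Kings"),
  QualityIndex = c(70, 74, 80, 82, 75),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the rows for one county, then average the
# quality index, ignoring missing values.
avg_quality <- function(data, county) {
  rows <- data[data$County == county, ]
  mean(rows$QualityIndex, na.rm = TRUE)
}

bronx_avg <- avg_quality(quality, "Bronx")
richmond_avg <- avg_quality(quality, "Richmond")  # Richmond County = Staten Island

print(c(Bronx = bronx_avg, Richmond = richmond_avg))
```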

These results were consistent with my prediction that a higher rated school system would also have better health protocols, at least for these two boroughs.

Once again, I wanted to visualize this data, so I created a new data set with just the Bronx and Staten Island and turned it into another bar plot.
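One way to sketch such a two-borough comparison in base R is a grouped bar plot that puts the reported violation rates next to the reported quality ratings. Showing both measures side by side is my own choice for illustration; the original plot may have shown only the quality ratings.

```r
# Values reported in this post; the matrix is filled column by column,
# so each column holds one borough's two measures.
comparison <- matrix(
  c(58.02, 72.49,   # Bronx: violation rate, school quality
    24.03, 81.04),  # Staten Island: violation rate, school quality
  nrow = 2,
  dimnames = list(
    c("Violation rate (%)", "School quality (%)"),
    c("Bronx", "Staten Island")
  )
)

# beside = TRUE draws the two measures as adjacent bars for each borough.
barplot(
  comparison,
  beside = TRUE,
  ylim = c(0, 100),
  ylab = "Percent",
  legend.text = rownames(comparison),
  main = "Bronx vs. Staten Island"
)
```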

To conclude, the two datasets were very helpful for comparing health quality in schools with overall school quality and satisfaction ratings. In this comparison, the borough with the fewest health violations, Staten Island, also had the higher school quality rating.

To see my R project with the code, you can follow this link:

https://rstudio.cloud/project/1749071
