knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
library(tidyverse)
library(readxl)
library(plotly)
library(sf)
library(tmap)
library(tmaptools)
library(shinyjs)
library(ggridges)
library(patchwork)
theme_set(theme_minimal() + theme(legend.position = "right"))
options(
ggplot2.continuous.colour = "viridis",
ggplot2.continuous.fill = "viridis"
)
scale_colour_discrete = scale_colour_viridis_d
scale_fill_discrete = scale_fill_viridis_d
Project Name: Operation Lorax - Data Speaks for the Trees!
Project Members: Stephanie Calluori (sdc2157), Zander De Jesus (agd2159), Sharon Kulali (sjk2254), Christine Kuryla (clk2162), Grace Santos (gvs2113)
In this exploratory study, we aim to examine the geographic distribution and attributes of street trees across NYC and potential associations with sociodemographic variables and health outcomes.
There is a growing body of evidence demonstrating the beneficial effects of green space exposure on human health (Yang et al., 2021). To better inform public health interventions, researchers are diving deeper into the specific aspects of nature, such as urban trees, and their effects on health. Most studies have observed positive effects of urban trees on physical activity and a collection of health outcomes, including respiratory, cardiovascular, and mental health ([Wolf et al.,2020] (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7345658/)). However, further exploration of health outcomes is needed. In addition, few studies have examined how urban tree exposure may differ across socio-demographic groups ([Wolf et al., 2020] (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7345658/)). Here, we focus on New York City using street tree data from 2015. In this exploratory study, we aim to examine the geographic distribution and attributes of street trees across NYC and potential associations with sociodemographic variables and health outcomes.
This research will aid in informing public health interventions to promote equitable access to urban street trees and improve health. While our analyses are retrospective, our work will lay a foundation that can be used as a comparison point when the next street tree census is released in 2025, allowing researchers to see how trends may have changed between 2015 and 2025.
Our project focuses on four overarching themes that center around the 2015 Street Tree Census: geographic distribution of street trees, tree attributes, access to trees across various socio-demographic groups, and tree exposure and health outcomes.
The 2015 NYC Street Tree Census includes variables referring to various geographic resolutions (e.g., zip code, nta, borough, community district) according to the 2010 Census. Originally, we suspected boroughs would be too large of a resolution to make comparisons and planned on using zip codes. However, due to the multitude of zip codes in NYC and variation in their sizes (e.g., single streets, larger areas), we thought zip codes might be too fine of a resolution. We decided to use Neighborhood Tabulation Areas (NTAs) to approximate neighborhoods, affording us a reasonable level of resolution to make comparisons as well as an action level of resolution to inform future green space based interventions. More information about NTAs can be found here: Neighborhood Boundaries Description and NTA Map of NYC. As our analyses progressed, we realized that NTAs sometimes resulted in too many geographies to compare. In addition, some qualities or outcomes of interest were only available for certain geographic resolutions. Thus, the spatial resolution of interest depended on the analysis being conducted. To supplement the 2015 data, we leaned on our shared interest in Environmental Justice efforts and the growing evidence of the health benefits of green space, to also include socio-demographic datasets to combine with the tree dataset. Since there is a current gap in the literature examining how urban tree exposures may differ across socio-demographic groups, we aim to determine if there are any racial/ethnic groups that have increased or limited opportunity to access street trees in NYC.
Finally, since we all are interested in environmental health, we aimed to examine potential effects of exposure to trees on people’s health. Based on data availability at the NTA and community district level as well as potential biological plausibility, we chose to examine asthma and heat stress related outcomes.
How are trees geographically distributed throughout New York City?
How does access to trees vary by socio-demographic categories by borough and neighborhoods?
Is there a relationship between tree distribution and health outcomes related to asthma and heat stress?
How are different tree attributes, quality, and health related to each other, and does this differ by borough?
trees_2015 =
read_csv("large_tree_data/2015_tree_raw.csv", na = c("", "NA", "Unknown")) |>
janitor::clean_names() |>
mutate(spc_common = str_to_title(spc_common)) |>
mutate(health = fct_relevel(health, c("Good", "Fair", "Poor")))
tree_id
, block_id
,
tree_status
(alive, standing dead, stump),
spc_common
(species name), borough
,
nta
(nta code), nta_name
,
latitude
, longitude
, and many more!poverty_raw <-
read_csv("large_tree_data/NYC EH Data Portal - Neighborhood poverty.csv") |>
janitor::clean_names()
poverty_clean <- poverty_raw |>
rename(nta_name = geography, poverty_percent = percent) |>
filter(geo_type == "NTA2010" & time == "2013-17") |>
select(nta_name, poverty_percent)
education_raw <-
read_csv("large_tree_data/NYC EH Data Portal - Graduated high school.csv") |>
janitor::clean_names()
education_clean <- education_raw |>
rename(nta_name = geography, graduated_hs_percent = percent) |>
filter(geo_type == "NTA2010" & time == "2013-17") |>
select(nta_name, graduated_hs_percent)
asthma_ed_adults_raw =
read_csv("small_data/NYC EH Data Portal_Asthma emergency department visits adults.csv") |>
janitor::clean_names()
asthma_ed_adults_clean = asthma_ed_adults_raw |>
rename(nta_name = geography) |>
filter(geo_type == "NTA2010" & time == "2017-2019") |>
select(nta_name, average_annual_age_adjusted_rate_per_10_000, average_annual_rate_per_10_000)
asthma_hosp_adult_raw <-
read_csv("small_data/NYC EH Data Portal - Asthma hospitalizations adults.csv") |>
janitor::clean_names()
asthma_hosp_adult_clean <- asthma_hosp_adult_raw |>
rename(nta_name = geography) |>
filter(geo_type == "NTA2010" & time == "2012-2014") |>
select(nta_name, average_annual_age_adjusted_rate_per_10_000, average_annual_rate_per_10_000)
pop_race_2010 = readxl::read_excel("small_data/pop_race2010_nta.xlsx", skip = 6, col_names = FALSE, na = "NA") |>
rename(
"borough" = "...1" ,
"census_FIPS" = "...2",
"nta" = "...3",
"nta_name" = "...4",
"total_population" = "...5",
"white_nonhis" = "...6",
"black_nonhis" = "...7",
"am_ind_alaska_nonhis" = "...8",
"asian_nonhis" = "...9",
"hawaii_pac_isl_nonhis" = "...10",
"other_nonhis" = "...11",
"two_races" = "...12",
"hispanic_any_race" = "...13"
) |>
drop_na()
acres_raw <-
read_excel("small_data/t_pl_p5_nta.xlsx",
range = "A9:J203",
col_names = c("borough", "county_code", "nta", "nta_name",
"total_pop_2000", "total_pop_2010", "pop_change_num",
"pop_change_per", "total_acres", "persons_per_acre")
) |>
janitor::clean_names()
acres_sub <- acres_raw |>
select(nta_name, total_acres, nta)
trees_per_nta <- trees_2015 |>
select(nta_name, nta, borough, status) |>
filter(status == "Alive") |>
count(nta_name, borough) |>
rename(num_trees = n)
trees_and_acres <- left_join(trees_per_nta, acres_sub, by = "nta_name")
num_missing_nta <- sum(is.na(trees_and_acres$total_acres))
trees_per_acre_df <- trees_and_acres |>
filter(!is.na(total_acres)) |>
mutate(trees_per_acre = num_trees/total_acres) |>
arrange(desc(trees_per_acre))
nyc_heatstress_2015 = read_csv("small_data/nyc_Heatstress Hospitalizations_2011-2015.csv", na = c(" ", "NA", "***")) |>
janitor::clean_names() |>
filter(geo_type_desc == "Community District") |>
mutate(
average_annual_age_adjusted_rate_per_100_000 =
str_replace(average_annual_age_adjusted_rate_per_100_000, "[*]$", "")) |>
select(geo_id, avg_heat_hospit = average_annual_age_adjusted_rate_per_100_000, heat_hospitalizations = total) |>
mutate(avg_heat_hospit = as.numeric(avg_heat_hospit))
nyc_finepm2015 = read_csv("small_data/nyc_pm2.5_annual2015.csv") |>
janitor::clean_names() |>
filter(time == "Annual Average 2015" & geo_type_desc == "Community District") |>
select(geo_id, geography, avg_pm2.5 = mean_mcg_m3)
pop_race_2010s =
pop_race_2010 |>
select(- borough, -nta_name)
all_data = left_join(trees_2015, pop_race_2010s, by = "nta")
number_tree =
all_data |>
group_by(borough) |>
summarize(n_trees = n_distinct(tree_id)) |>
mutate(borough = fct_reorder(borough, n_trees))
ggplot(data = number_tree, aes(x = borough, y = n_trees, fill = borough)) + geom_bar(stat = 'identity') + geom_text(aes(label = n_trees), size = 3, vjust = -1) +
labs(
title = "Number of Trees in each Borough",
x = "Borough",
y = "Total Number of Trees"
)
When the 2015 Tree Census data for all boroughs is compared, Queens has the largest total number of trees and Manhattan has the least.
top_trees_nyc <- trees_2015 %>%
drop_na(spc_common) %>%
group_by(spc_common) %>%
count(spc_common) %>%
arrange(desc(n)) %>%
head(n = 10) %>%
pull(spc_common)
trees_2015 %>%
drop_na(spc_common, borough) %>%
group_by(spc_common, borough) %>%
count(spc_common) %>%
filter(spc_common %in% top_trees_nyc) %>%
pivot_wider(names_from = borough, values_from = n) %>%
mutate(`All Boroughs Average` = mean(Bronx:`Staten Island`)) %>%
select(spc_common, `All Boroughs Average`, everything()) %>%
pivot_longer(cols = `All Boroughs Average`:`Staten Island`) %>%
ggplot(aes(x = reorder(spc_common, value), y = value, fill = spc_common)) +
geom_bar(stat = "identity") +
facet_wrap(~ name) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(x = "Tree Species",
y = "Number of Trees",
fill = "Tree Species",
title = "Top Ten Trees Overall and in Each NYC Borough")
This side-by-side bar chart reveals that the top 10 species vary greatly by borough. Staten Island shows the closest resemblance to the all borough average and Queens has the highest number of each top 10 species, except for the London Planetree.
trees_2015 %>%
drop_na() %>%
group_by(borough, spc_common) %>%
count() %>%
arrange(borough, desc(n)) %>%
group_by(borough) %>%
slice_max(n = 10, order_by = n) %>%
select(-n) %>%
group_by(borough) %>%
mutate(row_num = row_number()) %>%
pivot_wider(names_from = borough, values_from = spc_common) %>%
unnest(cols = c(Bronx, Brooklyn, Manhattan, Queens, `Staten Island`)) %>%
select(row_num, everything()) %>%
knitr::kable(caption = "Top 10 Tree Species in Each Borough")
row_num | Bronx | Brooklyn | Manhattan | Queens | Staten Island |
---|---|---|---|---|---|
1 | Honeylocust | London Planetree | Honeylocust | London Planetree | Callery Pear |
2 | London Planetree | Honeylocust | Callery Pear | Pin Oak | London Planetree |
3 | Pin Oak | Pin Oak | Ginkgo | Honeylocust | Red Maple |
4 | Callery Pear | Japanese Zelkova | Pin Oak | Norway Maple | Pin Oak |
5 | Japanese Zelkova | Callery Pear | Sophora | Callery Pear | Cherry |
6 | Cherry | Littleleaf Linden | London Planetree | Cherry | Sweetgum |
7 | Littleleaf Linden | Norway Maple | Japanese Zelkova | Littleleaf Linden | Honeylocust |
8 | Norway Maple | Sophora | Littleleaf Linden | Japanese Zelkova | Norway Maple |
9 | Ginkgo | Cherry | American Elm | Green Ash | Silver Maple |
10 | Sophora | Ginkgo | American Linden | Silver Maple | Maple |
This table allows for comparison of which species are most common throughout New York City. Interestingly, there is no species that is included within the top 3 species across all boroughs.
trees_dbh = trees_2015 |>
drop_na(tree_dbh) |>
group_by(borough, nta_name) |>
summarize(
n_trees = n(),
avg_dbh = mean(tree_dbh)) |>
arrange(desc(avg_dbh))
trees_dbh |>
ungroup() |>
filter(n_trees > 100) |>
filter(min_rank(desc(avg_dbh)) < 11) |>
knitr::kable(digits = 2, caption= "Top 10 Neighborhoods with the Highest Average DBH (inches) Across Local Street Trees")
borough | nta_name | n_trees | avg_dbh |
---|---|---|---|
Queens | Kew Gardens | 2024 | 15.56 |
Queens | Woodhaven | 4254 | 15.41 |
Brooklyn | Flatlands | 5589 | 15.40 |
Staten Island | New Dorp-Midland Beach | 5452 | 15.24 |
Brooklyn | East Flatbush-Farragut | 3008 | 15.01 |
Brooklyn | Georgetown-Marine Park-Bergen Beach-Mill Basin | 7442 | 15.01 |
Queens | South Ozone Park | 7321 | 14.97 |
Queens | Jamaica Estates-Holliswood | 4254 | 14.73 |
Queens | Oakland Gardens | 6059 | 14.73 |
Queens | Auburndale | 5332 | 14.59 |
This table reveals that Queens has 6 neighborhoods within the top 10 for average DBH.
borough_dbh = trees_2015 |>
drop_na(tree_dbh) |>
group_by(borough) |>
mutate(borough_dbh = mean(tree_dbh))
borough_dbh |>
filter(tree_dbh < 101) |>
plot_ly(y = ~tree_dbh, color = ~borough, type = "box", colors = "viridis") |>
layout(title = "Distribution of DBH in Each Borough",
xaxis = list(title = 'Borough'),
yaxis = list(title = 'Measured DBH (inches)'),
showlegend = FALSE)
The boxplots shown here all have a 25% quartile around 5 inches, but vary in 75% quartile. Manhattan had both the smallest difference between 25% and 75% quartile bounds and the smallest range of confidence intervals. The largest outliers belong to the Bronx.
borough_dbh |>
filter(tree_dbh < 50) |>
ggplot(aes(x = tree_dbh, fill = borough)) +
geom_density() + facet_grid(borough ~ .) +
scale_x_continuous(breaks = scales::pretty_breaks(12)) +
labs(
title = "Density Plot of All Measured Tree Diameters Below 50 inches",
x = "Measured DBH (Inches)",
y = "Density"
)
This density plot shows the spread of adult tree age in each borough, while eliminating some outliers shown in the previous box plot. The wider the right tail extends, the greater the number of older trees that exist in that borough, so this plot shows that Queens has the greatest number of older trees out of all boroughs.
dem_race =
pop_race_2010 |>
group_by(borough) |>
summarize(white = sum(white_nonhis),
black = sum(black_nonhis),
asian = sum(asian_nonhis),
am_ind_alaska = sum(am_ind_alaska_nonhis),
hawaii_pac_island = sum(hawaii_pac_isl_nonhis),
other = sum(other_nonhis),
hispanic_all_races = sum(hispanic_any_race),
two_races = sum(two_races)
) |>
pivot_longer(
white:two_races,
names_to = "race",
values_to = "n"
) |>
mutate(borough = fct_reorder(borough, n))
ggplot(data = dem_race, aes(x = borough, y = n, fill = race)) + geom_bar(stat = 'identity', position = 'dodge') +
labs(
title = "Population of Borough by Race",
x = "Borough",
y = "Total Population"
)
From these side-by-side bar charts, it can be concluded that the majority racial/ethnic group is White for all boroughs except for the Bronx, where Hispanic (all races) is the majority.
borough_data =
all_data |>
group_by(borough) |>
summarize(n_trees = n_distinct(tree_id),
n_white = (sum(white_nonhis)/sum(total_population)*100),
n_black = (sum(black_nonhis)/sum(total_population)*100),
n_asian = (sum(asian_nonhis)/sum(total_population)*100),
n_pacific_islander = (sum(hawaii_pac_isl_nonhis)/sum(total_population)*100),
n_amer_ind = (sum(am_ind_alaska_nonhis)/sum(total_population)*100),
n_other = (sum(other_nonhis)/sum(total_population)*100),
n_hispanic = (sum(hispanic_any_race)/sum(total_population)*100),
n_two_races = (sum(two_races)/sum(total_population)*100)
) |>
mutate(n_trees = as.numeric(n_trees)) |>
arrange(desc(n_trees))
borough_data |>
knitr::kable(digits = 3, caption = "Trees and Population Demographics by Borough")
borough | n_trees | n_white | n_black | n_asian | n_pacific_islander | n_amer_ind | n_other | n_hispanic | n_two_races |
---|---|---|---|---|---|---|---|---|---|
Queens | 250551 | 29.840 | 17.494 | 22.617 | 0.051 | 0.303 | 1.601 | 25.424 | 2.670 |
Brooklyn | 177293 | 34.912 | 33.732 | 9.614 | 0.025 | 0.187 | 0.439 | 19.488 | 1.603 |
Staten Island | 105318 | 71.685 | 6.017 | 6.866 | 0.030 | 0.122 | 0.190 | 13.858 | 1.233 |
Bronx | 85203 | 12.830 | 30.513 | 3.339 | 0.029 | 0.256 | 0.643 | 51.208 | 1.184 |
Manhattan | 65423 | 50.743 | 13.488 | 9.241 | 0.032 | 0.132 | 0.327 | 24.153 | 1.885 |
The table above includes summary statistics on the total number of trees, along with the percentage of the population belonging to each racial/ethnic group in each borough. Despite Queens having the most number of trees, it has the least skewed population demographics out of all boroughs. Staten Island has the highest percentage of white residents, with more than 20% more than the next highest percentage in Manhattan.
tree_status_df <- trees_2015 %>%
select(curb_loc, guards, sidewalk, health, status,
borough,
root_stone, root_grate, root_other,
trunk_wire, trnk_light, trnk_other,
brch_light, brch_shoe, brch_other) %>%
mutate(health = fct_relevel(health, c("Good", "Fair", "Poor")),
borough = as.factor(borough),
All = as.factor("All"),
curb_loc = case_match(curb_loc,
"OffsetFromCurb" ~ "Offset From Curb",
"OnCurb" ~ "On Curb"),
curb_loc = as.factor(curb_loc),
guards = as.factor(guards),
sidewalk = case_match(sidewalk,
"NoDamage" ~ "No Damage",
"Damage" ~ "Damage"),
sidewalk = as.factor(sidewalk),
status = as.factor(status)
) %>%
rename(`Curb Location` = curb_loc,
`All of NYC` = All,
Borough = borough,
Guards = guards,
Sidewalk = sidewalk,
`Alive, Dead, or Stump` = status,
Health = health,
`Roots with Stones` = root_stone,
`Roots in Grate` = root_grate,
`Root Problem (Other)` = root_other,
`Trunk has Wires` = trunk_wire,
`Trunk has Lights` = trnk_light,
`Trunk has Other Problem` = trnk_other,
`Branches have Lights` = brch_light,
`Branches with Shoes` = brch_shoe,
`Branches Problem (Other)` = brch_other) %>%
select(Borough, `All of NYC`, everything())
exposure_options <- colnames(tree_status_df)[3:length(tree_status_df)]
outcome_options <- colnames(tree_status_df)[3:length(tree_status_df)]
# Define function
effective_tree_protection <- function(df, exposure_str, outcome_str){
# Convert character strings to symbols
exposure_sym <- rlang::sym(exposure_str)
outcome_sym <- rlang::sym(outcome_str)
# Make contingency table for the chi-square test
contingency_table <- df %>%
filter(!is.na(!!exposure_sym), !is.na(!!outcome_sym)) %>%
group_by(!!outcome_sym, !!exposure_sym) %>%
count(!!outcome_sym, !!exposure_sym) %>%
pivot_wider(names_from = !!exposure_sym, values_from = n)
# Perform the chi-square test
chisq_test_result <- chisq.test(contingency_table[-1])
chi_square <- chisq_test_result$statistic
p_value <- chisq_test_result$p.value
# Visualize the distribution differences
plot_title <- str_c("Distribution of ", exposure_str,
" by ", outcome_str)
df %>%
filter(!is.na(!!exposure_sym), !is.na(!!outcome_sym)) %>%
group_by(!!exposure_sym, !!outcome_sym) %>%
summarize(n = n(), .groups = 'drop') %>%
group_by(!!exposure_sym) %>%
mutate(Percent = n / sum(n) * 100) %>%
ggplot(aes(x = !!exposure_sym, y = Percent, fill = !!outcome_sym)) +
geom_bar(stat = 'identity', position = 'dodge') +
geom_text(aes(label = sprintf("%.1f%%", Percent)),
position = position_dodge(width = 0.9),
size = 3.5,
vjust = -0.3) +
labs(title = plot_title,
subtitle = sprintf("p value = %.1e (Chi-square = %.2f)", p_value, chi_square))
}
effective_tree_protection(df = tree_status_df,
exposure_str = "All of NYC",
outcome_str = "Health")
effective_tree_protection(df = tree_status_df,
exposure_str = "All of NYC",
outcome_str = "Sidewalk")
effective_tree_protection(df = tree_status_df,
exposure_str = "All of NYC",
outcome_str = "Roots with Stones")
This figure shows the distribution of different attributes of trees and their status across all of NYC. For example, we see that 81.1% of trees are in “good” health, 14.8% of trees are in “fair” health, and 4.1% of trees are in “poor” health. Notably, 28.7% of sidewalks are damaged (71.2% not damaged). The majority of trees (87.8%) do not have guards and 20.5% of tree roots have stones.
effective_tree_protection(df = tree_status_df,
exposure_str = "Borough",
outcome_str = "Health")
Compared to the other four boroughs, Manhattan had the lowest percentage of trees in “good” health, and the highest percentage of trees in both “fair” and “poor” health. Overall in NYC, 81.1% of trees were in “good” health, 14.8% of trees were in “fair” health, and 4.1% were in “poor” health.
nyc_nta = st_read("small_data/geo_export_e924b274-8b6e-427d-87c0-92bfde8ce30a.shp")
## Reading layer `geo_export_e924b274-8b6e-427d-87c0-92bfde8ce30a' from data source `C:\Users\Kulal\OneDrive - Columbia University Irving Medical Center\Fall 2023\Data Science\P8105\p8105_final\small_data\geo_export_e924b274-8b6e-427d-87c0-92bfde8ce30a.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 195 features and 7 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -74.25559 ymin: 40.49612 xmax: -73.70001 ymax: 40.91553
## Geodetic CRS: WGS84(DD)
nta_dbh_summarized = trees_2015 |>
drop_na(tree_dbh) |>
group_by(nta) |>
summarize(
trees_per_nta = n(),
avg_dbh = mean(tree_dbh))
nta_dbh_spatial = merge(nyc_nta, nta_dbh_summarized,
by.x = "ntacode",
by.y = "nta")
tm_shape(nta_dbh_spatial) +
tm_polygons(col = "trees_per_nta",
style = "quantile",
n = 5,
palette = "Greens",
border.col = "black",
title = "Number of Trees per NTA") +
tm_layout(main.title = "Total Tree Count Across NYC Neighborhoods", main.title.size = 1,
frame = FALSE) +
tm_legend(legend.position = c("left","center"),
legend.text.size = 0.52)
tm_shape(nta_dbh_spatial) +
tm_polygons(col = "avg_dbh",
style = "quantile",
n = 5,
palette = "YlGn",
border.col = "black",
title = "Mean Tree DBH (Inches)") +
tm_layout(main.title = "Average Tree DBH Across NYC Neighborhoods",
main.title.size = 1,
frame = FALSE) +
tm_legend(legend.position = c("left","center"),
legend.text.size = 0.52)
trees_nta_map = trees_2015|>
select(nta_name, nta, borough, status) |>
filter(status == "Alive") |>
count(nta, borough) |>
rename(num_trees = n)
trees_and_acres_map = left_join(trees_nta_map, acres_sub, by = "nta")
num_missing_nta = sum(is.na(trees_and_acres$total_acres))
trees_per_acre_mapdf = trees_and_acres_map |>
filter(!is.na(total_acres)) |>
mutate(trees_per_acre = num_trees/total_acres) |>
arrange(desc(trees_per_acre))
tree_acre_spatial = merge(nyc_nta, trees_per_acre_mapdf,
by.x = "ntacode",
by.y = "nta")
tm_shape(tree_acre_spatial) +
tm_polygons(col = "trees_per_acre",
style = "quantile",
n = 5,
palette = "Greens",
border.col = "black",
title = "Avg. Trees Per Acre by NTA") +
tm_layout(main.title = "Tree Count Per Acre Across NYC Neighborhoods", main.title.size = 1,
frame = FALSE) +
tm_legend(legend.position = c("left","center"),
legend.text.size = 0.52)
zip_nyc = st_read("small_data/MODZCTA_2010.shp")
## Reading layer `MODZCTA_2010' from data source
## `C:\Users\Kulal\OneDrive - Columbia University Irving Medical Center\Fall 2023\Data Science\P8105\p8105_final\small_data\MODZCTA_2010.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 178 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 913176 ymin: 120122 xmax: 1067382 ymax: 272844
## Projected CRS: Lambert_Conformal_Conic
trees_df =
trees_2015 |>
drop_na() |>
group_by(postcode, borough, health) |>
summarize(
n_trees = n())
trees_zip =
merge(zip_nyc, trees_df,
by.x = "MODZCTA",
by.y = "postcode") |>
janitor::clean_names()
trees_zip |>
tm_shape() +
tm_polygons(col = "health",
style = "equal",
n = 3,
palette = "viridis",
border.col = "black",
title = "Tree Health Status")+
tm_layout(main.title = "Tree Heath by Zip Codes in NYC",
main.title.size = 1,
frame = FALSE) +
tm_legend(legend.position = c("left","center"),
legend.text.size = 0.52)
trees_2015 |>
drop_na() |>
filter(
nta_name %in% c("Washington Heights South", "Washington Heights North")) |>
plot_ly(color = ~health, colors = "viridis") |>
add_trace(
type = "scattermapbox",
mode = "markers",
lon = ~longitude,
lat = ~latitude,
marker = list(size = 5),
text = ~spc_common,
below = "markers"
) |>
layout(
title = "Tree Health in Washington Heights",
mapbox = list(
style = "carto-positron",
center = list(lon = -73.94, lat = 40.84),
zoom = 12
),
showlegend = T
)
nyc_nta = st_read("small_data/geo_export_e924b274-8b6e-427d-87c0-92bfde8ce30a.shp")
## Reading layer `geo_export_e924b274-8b6e-427d-87c0-92bfde8ce30a' from data source `C:\Users\Kulal\OneDrive - Columbia University Irving Medical Center\Fall 2023\Data Science\P8105\p8105_final\small_data\geo_export_e924b274-8b6e-427d-87c0-92bfde8ce30a.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 195 features and 7 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -74.25559 ymin: 40.49612 xmax: -73.70001 ymax: 40.91553
## Geodetic CRS: WGS84(DD)
nta_asthma_spatial = merge(nyc_nta, asthma_hosp_adult_clean,
by.x = "ntaname",
by.y = "nta_name")
tm_shape(nta_asthma_spatial) +
tm_polygons(col = "average_annual_rate_per_10_000",
style = "quantile",
n = 5,
palette = "Purples",
border.col = "black",
title = "Asthma Hospitalization Rate per 10k") +
tm_layout(main.title = "Avg. Annual Hospitalization Rate Across NYC Neighborhoods", main.title.size = 1,
frame = FALSE) +
tm_legend(legend.position = c("left","center"),
legend.text.size = 0.52)
nyc_comdist = st_read("small_data/geo_export_bd3c92bc-94de-48f3-9de3-e939473f307b.shp")
## Reading layer `geo_export_bd3c92bc-94de-48f3-9de3-e939473f307b' from data source `C:\Users\Kulal\OneDrive - Columbia University Irving Medical Center\Fall 2023\Data Science\P8105\p8105_final\small_data\geo_export_bd3c92bc-94de-48f3-9de3-e939473f307b.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 71 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -74.25559 ymin: 40.49613 xmax: -73.70001 ymax: 40.91553
## Geodetic CRS: WGS84(DD)
nyc_finepm2015 = read_csv("small_data/nyc_pm2.5_annual2015.csv") |>
janitor::clean_names() |>
filter(time == "Annual Average 2015" & geo_type_desc == "Community District") |>
select(geo_id, geography, avg_pm2.5 = mean_mcg_m3)
trees_cd_summarized = trees_2015 |>
drop_na(tree_dbh) |>
mutate(health = as.factor(health)) |>
group_by(community_board) |>
summarize(
"Number of Trees" = n(),
avg_dbh = mean(tree_dbh),
health_status = mode(health))
nyc_comdistrict_spatial = merge(nyc_comdist, trees_cd_summarized, nyc_finepm2015,
by.x = "boro_cd",
by.y = "community_board")
nyc_comdistrict_treesPM = merge(nyc_comdistrict_spatial, nyc_finepm2015,
by.x = "boro_cd",
by.y = "geo_id")
tm_shape(nyc_comdistrict_treesPM) +
tm_polygons(col = "avg_pm2.5",
style = "quantile",
n = 5,
palette = "YlOrBr",
border.col = "black",
title = "Average PM2.5 Concentration (µg/m3)") +
tm_shape(nyc_comdistrict_treesPM) +
tm_dots(col = "black",
size = "Number of Trees",
style = "quantile",
border.col = "black") +
tm_layout(main.title = "Average 2015 PM2.5 Concentrations
across NYC Community Districts", main.title.size = 1,
frame = FALSE) +
tm_legend(legend.position = c("left","top"),
legend.text.size = 0.52)
nyc_comdistrict_heatstress = merge(nyc_comdistrict_treesPM, nyc_heatstress_2015,
by.x = "boro_cd",
by.y = "geo_id")
tm_shape(nyc_comdistrict_heatstress) +
tm_polygons(col = "avg_heat_hospit",
style = "jenks",
n = 5,
palette = "OrRd",
border.col = "black",
title = "Avg. Adjusted Hospitalization Rate per 100k") +
tm_shape(nyc_comdistrict_treesPM) +
tm_dots(col = "black",
size = "Number of Trees",
style = "quantile",
border.col = "black") +
tm_layout(main.title = "Average Heatstress Hospitalization Rate
across NYC Community Districts", main.title.size = 1,
frame = FALSE) +
tm_legend(legend.position = c("left","top"),
legend.text.size = 0.52)
Correlation coefficients were calculated to gauge if a linear association exists between our x and y variables. To test the significance of a correlation coefficient, x and y variables must have a bivariate normal distribution. Since most of our x and y variables exhibited a skewed distribution (histograms not shown), we were unable to test the significance of the correlation coefficient at this time.
Since NTAs may differ in size, trees per acre was used to standardize street tree counts in each NTA by the acreage of the NTA. 6 NTAs from the 2015 Tree Census were missing acreage information, so these NTAs were removed for these analyses.
Only alive trees were included in the poverty, education, asthma emergency department visits, and asthma hospitalizations analyses.
trees_and_poverty <- left_join(trees_per_acre_df, poverty_clean, by = "nta_name")
trees_and_poverty |>
plot_ly(data = _, x = ~poverty_percent, y = ~trees_per_acre,
color = ~nta_name,
colors = "viridis",
type = "scatter",
mode = "markers",
text = ~paste("neighborhood: ", nta_name, "<br>borough: ", borough,
"<br>%below poverty line: ", poverty_percent,
"<br>trees per acre: ", trees_per_acre)) |>
layout(title = "Percent below the poverty line and trees per acre in NYC",
xaxis = list(title = 'Percentage of people whose income <br> is below the poverty line'),
yaxis = list(title = 'Number of trees per acre'),
legend = list(title=list(text='Neighborhood')))
trees_and_poverty |>
filter(borough == "Staten Island") |>
plot_ly(data = _, x = ~poverty_percent, y = ~trees_per_acre,
color = ~nta_name,
colors = "viridis",
type = "scatter",
mode = "markers",
text = ~paste("neighborhood: ", nta_name, "<br>borough: ", borough,
"<br>%below poverty line: ", poverty_percent,
"<br>trees per acre: ", trees_per_acre)) |>
layout(title = "Percent below the poverty line and trees per acre in Staten Island",
xaxis = list(title = 'Percentage of people whose income <br> is below the poverty line'),
yaxis = list(title = 'Number of trees per acre'),
legend = list(title=list(text='Neighborhood')))
staten_island_trees_and_poverty <- trees_and_poverty |>
filter(borough == "Staten Island")
trees_and_poverty |>
filter(borough == "Queens") |>
plot_ly(data = _, x = ~poverty_percent, y = ~trees_per_acre,
color = ~nta_name,
colors = "viridis",
type = "scatter",
mode = "markers",
text = ~paste("neighborhood: ", nta_name, "<br>borough: ", borough,
"<br>%below poverty line: ", poverty_percent,
"<br>trees per acre: ", trees_per_acre)) |>
layout(title = "Percent below the poverty line and trees per acre in Queens",
xaxis = list(title = 'Percentage of people whose income <br> is below the poverty line'),
yaxis = list(title = 'Number of trees per acre'),
legend = list(title=list(text='Neighborhood')))
queens_trees_and_poverty <- trees_and_poverty |>
filter(borough == "Queens")
There is an inconsistent association between the percentage of people living below the poverty line and the number of trees per acre. This indicates that additional factors may be at play and the relationship may be sensitive to locale.
trees_and_education <- left_join(trees_per_acre_df, education_clean, by = "nta_name")
trees_and_education |>
plot_ly(data = _, x = ~graduated_hs_percent, y = ~trees_per_acre,
color = ~nta_name,
colors = "viridis",
type = "scatter",
mode = "markers",
text = ~paste("neighborhood: ", nta_name, "<br>borough: ", borough, "<br>% graduated HS: ",
graduated_hs_percent, "<br>trees per acre: ", trees_per_acre)) |>
layout(title = "Percent graduated high school and trees per acre in NYC",
xaxis = list(title = 'Percent graduated <br> high school (includes equivalency)'),
yaxis = list(title = 'Number of trees per acre'),
legend = list(title=list(text='Neighborhood')))
trees_and_education |>
filter(borough == "Staten Island") |>
plot_ly(data = _, x = ~graduated_hs_percent, y = ~trees_per_acre,
color = ~nta_name,
colors = "viridis",
type = "scatter",
mode = "markers",
text = ~paste("neighborhood: ", nta_name, "<br>borough: ", borough, "<br>% graduated HS: ",
graduated_hs_percent, "<br>trees per acre: ", trees_per_acre)) |>
layout(title = "Percent graduated high school and trees per acre in Staten Island",
xaxis = list(title = 'Percent graduated <br> high school (includes equivalency)'),
yaxis = list(title = 'Number of trees per acre'),
legend = list(title=list(text='Neighborhood')))
staten_island_trees_and_education <- trees_and_education |>
filter(borough == "Staten Island")
There is an inconsistent association between the percentage of people who graduated high school and the number of trees per acre. This indicates that additional factors may be at play and the relationship may be sensitive to locale.
trees_and_asthma_ed_adult <- left_join(trees_per_acre_df, asthma_ed_adults_clean, by = "nta_name")
trees_and_asthma_ed_adult |>
plot_ly(data = _, x = ~trees_per_acre, y = ~average_annual_rate_per_10_000,
color = ~nta_name,
colors = "viridis",
type = "scatter",
mode = "markers",
text = ~paste("neighborhood: ", nta_name, "<br>borough: ", borough,
"<br>tree per acre: ", trees_per_acre, "<br>rate of adult asthma ED visits: ",
average_annual_rate_per_10_000)) |>
layout(title = "Trees per acre and rate of adult asthma ED visits in NYC",
xaxis = list(title = 'Number of trees per acre'),
yaxis = list(title = 'Average annual rate of <br> adult asthma emergency dept visits (per 10,000)'),
legend = list(title=list(text='Neighborhood')))
trees_and_asthma_ed_adult |>
filter(borough == "Staten Island") |>
plot_ly(data = _, x = ~trees_per_acre, y = ~average_annual_rate_per_10_000,
color = ~nta_name,
colors = "viridis",
type = "scatter",
mode = "markers",
text = ~paste("neighborhood: ", nta_name, "<br>borough: ", borough,
"<br>tree per acre: ", trees_per_acre, "<br>rate of adult asthma ED visits: ",
average_annual_rate_per_10_000)) |>
layout(title = "Trees per acre and rate of adult asthma ED visits in Staten Island",
xaxis = list(title = 'Number of trees per acre'),
yaxis = list(title = 'Average annual rate of <br> adult asthma emergency dept visits (per 10,000)'),
legend = list(title=list(text='Neighborhood')))
trees_and_asthma_ed_adult |>
filter(borough == "Queens") |>
plot_ly(data = _, x = ~trees_per_acre, y = ~average_annual_rate_per_10_000,
color = ~nta_name,
colors = "viridis",
type = "scatter",
mode = "markers",
text = ~paste("neighborhood: ", nta_name, "<br>borough: ", borough,
"<br>tree per acre: ", trees_per_acre, "<br>rate of adult asthma ED visits: ",
average_annual_rate_per_10_000)) |>
layout(title = "Trees per acre and rate of adult asthma ED visits in Queens",
xaxis = list(title = 'Number of trees per acre'),
yaxis = list(title = 'Average annual rate of <br> adult asthma emergency dept visits (per 10,000)'),
legend = list(title=list(text='Neighborhood')))
There is an inconsistent association between number of trees per acre and the annual rate of adult asthma emergency department visits. This indicates that additional factors may be at play and the relationship may be sensitive to locale.
trees_and_asthma_hosp_adult <- left_join(trees_per_acre_df, asthma_hosp_adult_clean, by = "nta_name")
trees_and_asthma_hosp_adult |>
plot_ly(data = _, x = ~trees_per_acre, y = ~average_annual_rate_per_10_000,
color = ~nta_name,
colors = "viridis",
type = "scatter",
mode = "markers",
text = ~paste("neighborhood: ", nta_name, "<br>borough: ", borough,
"<br>tree per acre: ", trees_per_acre,
"<br>rate of adult asthma hospitalizations: ", average_annual_rate_per_10_000)) |>
layout(title = "Trees per acre and rate of adult asthma hospitalizations in NYC",
xaxis = list(title = 'Number of trees per acre'),
yaxis = list(title = 'Average annual rate of <br> adult asthma hospitalizations (per 10,000)'),
legend = list(title=list(text='Neighborhood')))
trees_and_asthma_hosp_adult |>
filter(borough == "Staten Island") |>
plot_ly(data = _, x = ~trees_per_acre, y = ~average_annual_rate_per_10_000,
color = ~nta_name,
colors = "viridis",
type = "scatter",
mode = "markers",
text = ~paste("neighborhood: ", nta_name, "<br>borough: ", borough,
"<br>tree per acre: ", trees_per_acre,
"<br>rate of adult asthma hospitalizations: ", average_annual_rate_per_10_000)) |>
layout(title = "Trees per acre and rate of adult asthma hospitalizations in Staten Island",
xaxis = list(title = 'Number of trees per acre'),
yaxis = list(title = 'Average annual rate of <br> adult asthma hospitalizations (per 10,000)'),
legend = list(title=list(text='Neighborhood')))
trees_and_asthma_hosp_adult |>
filter(borough == "Queens") |>
plot_ly(data = _, x = ~trees_per_acre, y = ~average_annual_rate_per_10_000,
color = ~nta_name,
colors = "viridis",
type = "scatter",
mode = "markers",
text = ~paste("neighborhood: ", nta_name, "<br>borough: ", borough,
"<br>tree per acre: ", trees_per_acre,
"<br>rate of adult asthma hospitalizations: ", average_annual_rate_per_10_000)) |>
layout(title = "Trees per acre and rate of adult asthma hospitalizations in Queens",
xaxis = list(title = 'Number of trees per acre'),
yaxis = list(title = 'Average annual rate of <br> adult asthma hospitalizations (per 10,000)'),
legend = list(title=list(text='Neighborhood')))
There is an inconsistent association between the number of trees per acre and the annual rate of adult asthma hospitalizations. This indicates that additional factors may be at play and the relationship may be sensitive to locale.
effective_tree_protection(df = tree_status_df,
exposure_str = "Curb Location",
outcome_str = "Branches have Lights")
effective_tree_protection(df = tree_status_df,
exposure_str = "Sidewalk",
outcome_str = "Root Problem (Other)")
Several associations were explored between variables such as health, curb location, guards, sidewalk quality, and root, trunk, and branch quality (for example, if there are lights on the branches). There is an association between the location of the trees on the curb and whether branches have lights (trees on the curb are more likely to have lights on their branches). Additionally, there is an assoiation between sidewalk quality and the existence of root problems (damage on the sidewalk is associated with more root problems).
In conclusion, our exploratory analyses provide a picture of how tree distribution and attributes vary across neighborhoods and boroughs in NYC. Additionally, our preliminary results identifying correlations between tree access, socio-demographics variables, and health outcomes in specific NYC boroughs merit further investigation to better understand these relationships. This research will aid in informing public health interventions to promote equitable access to urban street trees and improve health. Further discussion of our results for specific analyses are below.
When looking at overall tree counts, Queens has the most trees and Manhattan contains the least. This contrast is likely due to the urban density of downtown Manhattan, and the less dense, more residential space of Easternmost Queens.
Within all boroughs, the majority of trees have a DBH ranging between 0 and 20 inches. Average tree DBH is largest outside of Manhattan, which suggests that Manhattan has the smallest and youngest trees out of all boroughs. This contributes to the conclusion that urbanization may be a significant disrupter to the overall success of living trees in New York City. Neighborhoods in the Bronx have the second lowest DBH averages as well as second lowest tree counts. This finding may reveal a gap in our data resulting from the limitations caused by using a data set which only includes street trees. There are many parks in the Bronx that have trees, most notably the Bronx Botanical Gardens, so our findings may be inaccurate in classifying the number of trees and DBH for the Bronx.
When looking at DBH averages, Queens has 6 of the top 10 neighborhoods with the highest average tree DBH, all of which are between 14 - 15 inches. Combined with findings from the Average DBH by NTA Map, our research suggests Queens and Brooklyn have neighborhoods with the largest proportion of older growth trees. From the ecological literature we referenced, many tree species would take over 30 - 40 years to grow a DBH of 15 inches (variability depending on the tree species’ growth factor). Larger more established trees have broader tree canopies, and provide a greater magnitude of ecosystem services like stormwater retention, carbon storage in woody biomass, air pollution filtration, and habitat for small forest organisms (Morgenroth et al. 2020). The spatial DBH relationships we are finding suggest that a greater number of street trees in Queens and Brooklyn are tended to and allowed to grow over decades, while Manhattan and Bronx have a greater number of low DBH trees that may be routinely replaced due to poor health or frequent aesthetic tree replantings. Our findings suggest that neighborhoods that have higher concentrations of young, smaller trees with low DBH around 6 - 7 inches, should prioritize resources toward nurturing existing trees and improving overall health so that they can grow for decades and reach this more mature stage of tree growth.
Despite having the lowest number of trees, Manhattan consistently has more trees per 10,000 people in all neighborhoods except for the Upper West Side. This finding may be due to there being more people per square foot in the west side, thus reducing the ratio of trees to people. Tree Health remains varied throughout all five boroughs, which is consistent with our initial predictions. There appears to be no hotspots of good nor poor tree health in any parts of NYC. Future investigations should examine other external factors that may contribute to tree health and survival.
Relationships between tree health, surroundings, and branch, trunk, or root status were explored. Overall, it is notable that for almost every comparison, the chi-square p value was less than 0.05. This is likely because the large sample size enables very small effects to be detected. The chi-square test essentially measures whether the distributions of a variable are significantly different than would be expected by chance. For the comparisons that were larger than two-by-two, it is important to remember that the test was noting whether any part of the distribution was different than would be expected by a null.
Several reasonable associations were found, including that there are more lights on the branches of trees that are on the curb than off, and there is a greater percentage of trees with root problems on sidewalks that are damaged. Roots offset from the curb are more likely to have stones and be interfered with by grates, but had fewer “other” root problems. Branches, however, were more likely to have lights on them when the trees were on the curb.
Overall in NYC, 28.7% of the sidewalks were rated as “damaged.” Brooklyn had the worst sidewalk conditions, followed by the Bronx, Queens, Manhattan, and Staten Island had the best. Damaged sidewalks were significantly associated with stones in the roots, other root problems, trunks with wires, and branches with lights on them.
Manhattan had more trees offset from the curb than the other boroughs, and more instances of tree guards, although many of them were rated as “unhelpful.” Manhattan seemed to have the most problems with roots in grates (3.8%, while the other boroughs all had less than 0.4%). However, Manhattan had more “other” problems with branches (10.0% vs 1.2-4.4% in the other boroughs).
There were more branches with lights than we expected, other than in Manhattan’s 1.3%, the other boroughs ranged from 7.5% to 12.2%! There were a total of 411 trees with shoes in their branches, but this is a very low percentage of the overall number of trees (0.1% overall). There were the most trees with shoes in Brooklyn (149 trees), followed by the Bronx, Queens, Manhattan, and Staten Island (11 trees).
When examining our findings by borough, the top 3 boroughs with the highest number of trees have a majority white population. At the neighborhood level in Manhattan, neighborhoods with the greatest number of trees also have majority white populations. These data suggest that there may be racial/ethnic disparities in access to greenspace and street trees in Manhattan neighborhoods, which aligns with our initial hypothesis. While no definitive conclusions can be made at this time, future research should dive deeper into potential sociodemographic disparities in street tree access, especially among different racial/ethnic groups.
Across all neighborhoods in NYC, there is a very weak, negative, linear association between percentage of people living below the poverty line and number of trees per acre (r = - 0.0220). However, associations differed by borough, suggesting borough-specific effects. Notably, neighborhoods across Staten Island (r = -0.4646) and Queens (r = -0.4384) exhibit a moderate, negative, linear association between percentage of people living below the poverty line and number of trees per acre; as the percentage of people living below the poverty line increases, the number of trees per acre decreases.
Similarly, across all neighborhoods in NYC, there is a weak, positive, linear association between the percentage of people who graduated high school and the number of trees per acre (r = 0.0940). However, associations differed by borough, suggesting borough-specific effects. Notably, neighborhoods across Staten Island (r = 0.5839) exhibit a moderate, positive, linear association between percentage of people who graduated high school and the number of trees per acre; as the percentage of people who graduated high school increases, the number of trees per acre increases.
Our preliminary results suggest that neighborhoods in Staten Island and Queens with high poverty levels and neighborhoods in Staten Island with lower education levels should be prioritized for increasing tree density to improve equitable access to street trees. Future analyses should consider including additional variables to better understand relationships between poverty or education and tree density and how the relationships vary by locale. In addition, future investigations should consider examining the suitability of non-linear models for examining these relationships.
Looking at the spatial variation of asthma hospitalization rate, neighborhoods of the Bronx and central Brooklyn appear to have the most concentrated spatial clusters of high asthma hospitalization. In particular, these neighborhoods of the Bronx have both lower tree densities and smaller tree DBH averages. In contrast, Manhattan neighborhoods tend to have low asthma hospitalizations, low tree counts, and lower tree DBH, which deviates from the expected trend of low tree distribution correlating with high asthma hospitalization. When comparing the Asthma Hospitalization Map with the Tree Counts Per Acre NTA Map, many NTAs that have high hospitalization also have lower tree counts per acre around or below 3 - 4 trees each acre. For example, Northern Staten Island has a pocket of high asthma rate that is mirrored by low tree density. Eastern Queens neighborhoods with high tree density have low asthma hospitalization rates.
Across all neighborhoods in NYC, there is a very weak, negative, linear association between number of trees per acre and the average annual rate of adult asthma emergency department visits per 10,000 people (r = - 0.0324). Similarly, across all neighborhoods in NYC, there is a weak, negative, linear association between number of trees per acre and the average annual rate of adult asthma hospitalizations per 10,000 people (r = -0.1008). However, associations differed by borough, suggesting borough specific effects. Notably, neighborhoods across Staten Island and Queens exhibited moderate, negative, linear associations between number of trees per acre and asthma emergency department visits as well as asthma hospitalizations; as the number of trees per acre increases, the rate of adult asthma emergency department visits or asthma hospitalizations decreases.
Our preliminary results suggest that respiratory health in neighborhoods within Staten Island and Queens with high adult asthma emergency department visits and hospitalizations may benefit from increased tree density. Future analyses should consider including additional variables (e.g., economic status, health insurance) to better understand relationships between asthma outcomes and tree density and how the relationships vary by locale. In addition, future investigations should consider examining the suitability of non-linear models for examining these relationships.
When visualizing ambient average PM2.5 concentrations, which is a known criteria pollutant that increases asthma development and asthma exacerbation, Manhattan, the Bronx, and Downtown Brooklyn have the highest average concentrations of fine particulate matter. PM2.5 is lowest in the neighborhoods farthest from the downtown heart of the city, such as Southeastern Brooklyn and Eastern Queens, which also contain the highest tree density and DBH metrics discussed previously. This result provided support for our earlier hypothesized research predictions, that increased tree counts would be able to retain and mitigate excess pm2.5 concentrations. However, while southern Manhattan again has high pollution, southern Manhattan has some of the lowest Asthma hospitalizations. It is likely greater economic mobility and social privilege across demographic groups residing in Downtown Manhattan contribute due lower asthma hospitalization risk. The final map on the health page is an initial foray to understand potential relationships between tree distribution and recorded heat stress events. Similar to recorded Asthma Hospitalization, the Bronx and central Brooklyn contained the highest concentration of hospitalization events. The relationship between heat stress events and trees is a bit more variable than the spatial relationships of asthma hospitalization and trees. The high count of trees in Queens community districts does align with lowered heat stress hospitalizations. But again the low tree counts and low heat stress hospitalization in lower Manhattan may be due to other factors aside from street trees that help mitigate the urban heat island effect, and improved medical / healthcare access and socioeconomic standing in that area.
The 2015 tree dataset includes only street trees, and does not include any trees from private gardens, Central Park, or any of the other greenspaces managed by the NYC Park and Recreation Department. Because of this data siloing, the full biodiversity and tree distribution across all of New York City is not accounted for, and our conclusions surrounding tree relationships with health and sociodemographic indicators are limited to the impact of urban street trees alone.
The size of the 2015 NYC Street Tree dataset, which had over 680,0000 individual rows of surveyed trees for 45 variables, was a significant hindrance in the efficiency of our coding process. We originally wanted to assess longitudinal changes in the distribution of trees throughout New York City by comparing the 2015 dataset to the 1995 and 2005 NYC Street Tree datasets. However, after we discovered the large size of these datasets, and that the older datasets may have significant data cleaning and wrangling challenges and inconsistencies with the 2015 dataset (e.g., using census tract levels from census other than the 2010 U.S. Census), we decided to not pursue this analysis.
We addressed the issue of the large file size of the 2015 NYC Street
Tree dataset, which was over 200 mb, and by utilizing
git.ignore
. Supplementary datasets of large file size were
also placed in the git.ignore
. We included the path to our
large data folder, so that knitting and building the website would be
faster when working with these datasets. Additionally, using the
summarize()
function to bring our observation averages to
the size of grouped spatial resolutions (zip code, nta, community
district) made dataframe sizes much more manageable for performing joins
to shapefiles and rendering the data as maps or charts.
After deciding to use the 2015 NYC Street Tree dataset, we wanted to make sure that our supplementary data sets were collected around the same time frame. We had some difficulty in finding data sets that included our desired variables during 2015 or a few years around it and thus, had to choose the best available options. This difficulty was compounded by the challenge of finding data for variables of interest (e.g., sociodemographic, health outcomes) at the spatial resolution desired. For instance, many health outcomes cataloged in the New York City Environmental Health Data Portal were not at the NTA level. Many outcomes we were hoping to explore only had measurements available at much higher spatial resolutions (e.g., citywide or borough). Some health variables only had broad summary results that were too generalized for us to use in meaningful comparative analysis with the street tree dataset.
Lastly, there was limited opportunity to perform formal statistical analyses. Correlation coefficients were computed for some analyses, however, due to the skewed distributions observed in the relevant x and y variables, we were not able to test the significance of the correlation coefficient at this time. In the future, more advanced statistical expertise and complex modeling will help to better address questions of association between street trees and sociodemographics and health outcomes. Nevertheless, our exploratory and descriptive analyses, combined with our visualizations and preliminary findings, provide an exciting foundation to launch future studies.