Toronto Building Evaluation 2022

Data Exploration and Visualization

Importing relevant libraries

library(tidyverse)
library(stats)
library(naniar)
library(visdat)
library(VIM)
library(DT)
library(lubridate)
library(forcats)
library(leaflet)

Importing the Toronto dataset and checking its overall structure with the str() function. The dataset is publicly available for download at https://open.toronto.ca/dataset/apartment-building-evaluation/

data <- read.csv("apartments_toronto.csv")
str(data)

## 'data.frame':    11651 obs. of  40 variables:
##  $ X_id                       : int  2406105 2406106 2406107 2406108 2406109 2406110 2406111 2406112 2406113 2406114 ...
##  $ RSN                        : int  5186997 5118732 5156142 5156008 5156127 5132414 5207679 5175953 4154229 5156112 ...
##  $ YEAR_REGISTERED            : num  NA 2022 NA 2022 2022 ...
##  $ YEAR_EVALUATED             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ YEAR_BUILT                 : num  2019 2021 NA 1885 1972 ...
##  $ PROPERTY_TYPE              : chr  "PRIVATE" "PRIVATE" "PRIVATE" "PRIVATE" ...
##  $ WARD                       : int  12 13 13 9 13 11 12 10 7 12 ...
##  $ WARDNAME                   : chr  "Toronto-St. Paul's" "Toronto Centre" "Toronto Centre" "Davenport" ...
##  $ SITE_ADDRESS               : chr  "200 MADISON AVE" "25 NICHOLAS AVE" "109 PEMBROKE ST" "267 BROCK AVE" ...
##  $ CONFIRMED_STOREYS          : int  6 29 4 3 30 43 20 36 4 4 ...
##  $ CONFIRMED_UNITS            : int  82 346 20 11 240 595 177 286 144 25 ...
##  $ EVALUATION_COMPLETED_ON    : chr  "2023-01-31" "2023-01-27" "2023-01-27" "2023-01-24" ...
##  $ SCORE                      : int  100 100 67 73 93 99 100 100 85 75 ...
##  $ RESULTS_OF_SCORE           : chr  "Evaluation needs to be conducted in 3 years" "Evaluation needs to be conducted in 3 years" "Evaluation needs to be conducted in 2 years" "Evaluation needs to be conducted in 2 years" ...
##  $ NO_OF_AREAS_EVALUATED      : int  19 20 14 14 20 19 19 20 19 15 ...
##  $ ENTRANCE_LOBBY             : num  5 5 3 3 5 5 5 5 4 3 ...
##  $ ENTRANCE_DOORS_WINDOWS     : num  5 5 4 3 5 5 5 5 4 3 ...
##  $ SECURITY                   : num  5 5 3 4 5 5 5 5 5 5 ...
##  $ STAIRWELLS                 : num  5 5 3 4 5 4 5 5 4 5 ...
##  $ LAUNDRY_ROOMS              : num  5 5 2 NA 5 NA NA 5 4 3 ...
##  $ INTERNAL_GUARDS_HANDRAILS  : num  5 5 3 4 5 5 5 5 5 4 ...
##  $ GARBAGE_CHUTE_ROOMS        : num  5 5 NA NA 4 5 5 5 4 NA ...
##  $ GARBAGE_BIN_STORAGE_AREA   : num  5 5 4 5 4 5 5 5 4 4 ...
##  $ ELEVATORS                  : num  5 5 NA 5 4 5 5 5 4 NA ...
##  $ STORAGE_AREAS_LOCKERS      : num  5 5 NA NA 5 5 5 5 4 NA ...
##  $ INTERIOR_WALL_CEILING_FLOOR: num  5 5 3 3 5 5 5 5 5 3 ...
##  $ INTERIOR_LIGHTING_LEVELS   : num  5 5 3 4 4 5 5 5 5 3 ...
##  $ GRAFFITI                   : num  5 5 5 3 5 5 5 5 5 5 ...
##  $ EXTERIOR_CLADDING          : num  5 5 3 2 5 5 5 5 4 3 ...
##  $ EXTERIOR_GROUNDS           : num  5 5 3 4 5 5 5 5 4 4 ...
##  $ EXTERIOR_WALKWAYS          : num  5 5 4 3 5 5 5 5 4 4 ...
##  $ BALCONY_GUARDS             : num  5 5 NA NA 4 5 5 5 4 NA ...
##  $ WATER_PEN_EXT_BLDG_ELEMENTS: num  5 5 4 4 4 5 5 5 5 4 ...
##  $ PARKING_AREA               : num  5 5 NA NA 4 5 5 5 3 3 ...
##  $ OTHER_FACILITIES           : num  NA 5 NA NA 5 5 5 5 NA NA ...
##  $ GRID                       : chr  "S1237" "S1330" "S1328" "S0936" ...
##  $ LATITUDE                   : num  43.7 NA 43.7 43.6 43.7 ...
##  $ LONGITUDE                  : num  -79.4 NA -79.4 -79.4 -79.4 ...
##  $ X                          : num  312300 316226 315042 310019 314347 ...
##  $ Y                          : num  4837164 4835309 4835360 4834170 4835607 ...

The str() function returns the structure of the Toronto apartments dataset, including each variable's type and its first few values, and gives a concise overview of the dataset's properties. The Toronto apartments dataset contains 11651 observations (rows) and 40 variables (columns).

Filtering the dataset with the filter() function so that it contains only the York South-Weston ward

# Changing the ward attribute into categorical
data$WARDNAME <- as.factor(data$WARDNAME)

# Filtering the data and ensuring that the data only contain York South-Weston ward
york_south_weston <- data %>%
  filter(WARDNAME=="York South-Weston")

The filtered data frame york_south_weston contains only York South-Weston ward data and has 836 observations (rows) and 40 attributes (columns).

Before diving deeper into the dataset, we first need to tidy the data, especially the NA values in the york_south_weston dataset.

To check for NA values in each attribute, the apply() and is.na() functions can be used:

# Tabulating the NAs Value, including the percentage from the total observation
york_south_weston_na <- apply(X = is.na(york_south_weston), MARGIN = 2, FUN = sum)
york_south_weston_na <- data.frame(count_na=york_south_weston_na, percentage = (york_south_weston_na/dim(york_south_weston)[1]) %>% round(4))
york_south_weston_na

##                             count_na percentage
## X_id                               0     0.0000
## RSN                                0     0.0000
## YEAR_REGISTERED                   39     0.0467
## YEAR_EVALUATED                   123     0.1471
## YEAR_BUILT                         4     0.0048
## PROPERTY_TYPE                      0     0.0000
## WARD                               0     0.0000
## WARDNAME                           0     0.0000
## SITE_ADDRESS                       0     0.0000
## CONFIRMED_STOREYS                  0     0.0000
## CONFIRMED_UNITS                    0     0.0000
## EVALUATION_COMPLETED_ON            0     0.0000
## SCORE                              0     0.0000
## RESULTS_OF_SCORE                   0     0.0000
## NO_OF_AREAS_EVALUATED              0     0.0000
## ENTRANCE_LOBBY                     0     0.0000
## ENTRANCE_DOORS_WINDOWS             0     0.0000
## SECURITY                           0     0.0000
## STAIRWELLS                         0     0.0000
## LAUNDRY_ROOMS                     21     0.0251
## INTERNAL_GUARDS_HANDRAILS          0     0.0000
## GARBAGE_CHUTE_ROOMS              567     0.6782
## GARBAGE_BIN_STORAGE_AREA           0     0.0000
## ELEVATORS                        455     0.5443
## STORAGE_AREAS_LOCKERS            500     0.5981
## INTERIOR_WALL_CEILING_FLOOR        0     0.0000
## INTERIOR_LIGHTING_LEVELS           0     0.0000
## GRAFFITI                           6     0.0072
## EXTERIOR_CLADDING                  0     0.0000
## EXTERIOR_GROUNDS                   2     0.0024
## EXTERIOR_WALKWAYS                  0     0.0000
## BALCONY_GUARDS                   375     0.4486
## WATER_PEN_EXT_BLDG_ELEMENTS        4     0.0048
## PARKING_AREA                      10     0.0120
## OTHER_FACILITIES                 702     0.8397
## GRID                               0     0.0000
## LATITUDE                           9     0.0108
## LONGITUDE                          9     0.0108
## X                                  4     0.0048
## Y                                  4     0.0048

# Summing the NAs value in the York South-Weston dataset
paste("The sum of the NA values in york_south_weston is:", sum(york_south_weston_na$count_na))

## [1] "The sum of the NA values in york_south_weston is: 2834"

The table above lists the number of NA values in each attribute. It was produced by combining is.na() with sum() and passing them to apply() with MARGIN = 2, which applies the function to every column. is.na() returns a logical matrix indicating whether each cell of the dataset is NA, and summing each column counts its TRUE values.

The table also shows the share of NA values in each attribute, obtained by dividing each column's NA count by the total number of observations (836). The values are stored as proportions, so 0.0467 corresponds to 4.67%.

The total number of NA values in the York South-Weston dataset is 2834. This counts missing cells across all columns, not rows that contain at least one missing value.
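As a more compact alternative to apply(..., MARGIN = 2, FUN = sum), base R's colSums() and colMeans() work directly on the logical matrix returned by is.na(). The sketch below uses a small hypothetical data frame (toy) instead of the real york_south_weston data; on the real data, colSums(is.na(york_south_weston)) should give the same counts as the apply() call.

```r
# Hypothetical toy data frame standing in for york_south_weston
toy <- data.frame(
  SCORE     = c(100, 67, 73, 93),
  ELEVATORS = c(5, NA, NA, 4),
  GRAFFITI  = c(5, 5, NA, 5)
)

# NA count per column: is.na() gives a logical matrix, colSums() counts the TRUEs
count_na <- colSums(is.na(toy))

# Share of NAs per column: the mean of a logical column is the proportion of TRUEs
percentage <- round(colMeans(is.na(toy)), 4)

na_table <- data.frame(count_na, percentage)
na_table
```

colMeans() also removes the need to divide by dim(...)[1] by hand.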

Variables with missing values are: YEAR_REGISTERED, YEAR_EVALUATED, YEAR_BUILT, LAUNDRY_ROOMS, GARBAGE_CHUTE_ROOMS, ELEVATORS, STORAGE_AREAS_LOCKERS, GRAFFITI, EXTERIOR_GROUNDS, BALCONY_GUARDS, WATER_PEN_EXT_BLDG_ELEMENTS, PARKING_AREA, OTHER_FACILITIES, LATITUDE, LONGITUDE, X, and Y.

Three of the columns with missing values, YEAR_REGISTERED, YEAR_EVALUATED, and YEAR_BUILT, may be incomplete because some buildings are old enough that there is no proper record of when they were built or registered with the City. Another possible explanation is human error, such as forgetting to record the registration, evaluation, or construction year.

The NA values can also be visualized using the naniar and visdat libraries:

vis_dat(york_south_weston)

This visualization shows each observation and its data type for every attribute. However, because the dataset has many attributes (40), the attribute labels are not clearly displayed and overlap with each other. The gg_miss_var() plot below is an alternative way to visualize the missing values in the dataset.

gg_miss_var(york_south_weston) + ggtitle("Missing Values Plot")

From the missing-value results above, four columns have more than 50% missing values: GARBAGE_CHUTE_ROOMS (67.8%), ELEVATORS (54.4%), STORAGE_AREAS_LOCKERS (59.8%), and OTHER_FACILITIES (84.0%). To keep the analysis manageable, these four columns are removed from the dataset with the code below.

york_south_weston <- york_south_weston %>%
  select(-GARBAGE_CHUTE_ROOMS, -ELEVATORS, -STORAGE_AREAS_LOCKERS, -OTHER_FACILITIES)

#Checking whether the columns have been removed
colnames(york_south_weston)

##  [1] "X_id"                        "RSN"                        
##  [3] "YEAR_REGISTERED"             "YEAR_EVALUATED"             
##  [5] "YEAR_BUILT"                  "PROPERTY_TYPE"              
##  [7] "WARD"                        "WARDNAME"                   
##  [9] "SITE_ADDRESS"                "CONFIRMED_STOREYS"          
## [11] "CONFIRMED_UNITS"             "EVALUATION_COMPLETED_ON"    
## [13] "SCORE"                       "RESULTS_OF_SCORE"           
## [15] "NO_OF_AREAS_EVALUATED"       "ENTRANCE_LOBBY"             
## [17] "ENTRANCE_DOORS_WINDOWS"      "SECURITY"                   
## [19] "STAIRWELLS"                  "LAUNDRY_ROOMS"              
## [21] "INTERNAL_GUARDS_HANDRAILS"   "GARBAGE_BIN_STORAGE_AREA"   
## [23] "INTERIOR_WALL_CEILING_FLOOR" "INTERIOR_LIGHTING_LEVELS"   
## [25] "GRAFFITI"                    "EXTERIOR_CLADDING"          
## [27] "EXTERIOR_GROUNDS"            "EXTERIOR_WALKWAYS"          
## [29] "BALCONY_GUARDS"              "WATER_PEN_EXT_BLDG_ELEMENTS"
## [31] "PARKING_AREA"                "GRID"                       
## [33] "LATITUDE"                    "LONGITUDE"                  
## [35] "X"                           "Y"

The four columns with missing value percentage >50% have been removed from the dataset.
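Rather than listing the columns by name, the same "more than 50% missing" rule can be applied programmatically, which keeps working if the data are refreshed and a different set of columns crosses the threshold. A minimal base R sketch on a hypothetical data frame, assuming the 0.5 cut-off used above:

```r
# Hypothetical toy data: ELEVATORS and OTHER_FACILITIES are 75% missing
toy <- data.frame(
  SCORE            = c(100, 67, 73, 93),
  ELEVATORS        = c(5, NA, NA, NA),
  OTHER_FACILITIES = c(NA, NA, NA, 5),
  GRAFFITI         = c(5, 5, NA, 5)
)

# Proportion of missing values in each column
na_share <- colMeans(is.na(toy))

# Keep only the columns with at most 50% missing values
toy_clean <- toy[, na_share <= 0.5, drop = FALSE]
names(toy_clean)
```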

According to the dataset description, the EVALUATION_COMPLETED_ON column contains dates, but R read it as character (chr) data. It needs to be converted before the analysis can continue.

The code below converts the EVALUATION_COMPLETED_ON column to the Date type:

york_south_weston$EVALUATION_COMPLETED_ON <- as.Date(york_south_weston$EVALUATION_COMPLETED_ON, format = "%Y-%m-%d")
str(york_south_weston$EVALUATION_COMPLETED_ON)

##  Date[1:836], format: "2022-12-30" "2022-12-30" "2022-12-30" "2022-12-22" "2022-12-22" ...

The EVALUATION_COMPLETED_ON variable has been converted to the Date format.
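One caveat with as.Date() is that strings which do not match the supplied format become NA silently instead of raising an error, so counting NAs after the conversion is a cheap sanity check. The date strings below are hypothetical; on the real data the check would be sum(is.na(york_south_weston$EVALUATION_COMPLETED_ON)).

```r
# Hypothetical date strings; the last one does not match "%Y-%m-%d"
raw_dates <- c("2022-12-30", "2022-12-22", "30/12/2022")

parsed <- as.Date(raw_dates, format = "%Y-%m-%d")

# Unparseable strings become NA rather than causing an error
sum(is.na(parsed))
```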

Now that the dataset has been cleaned and tidied, let's do some quick data exploration. Since my birth month is January, I am interested in how many inspections were done in the York South-Weston ward in January. The code below counts the number of apartment inspections in York South-Weston completed in January:

york_south_weston <- york_south_weston %>%
  mutate(Month = as.integer(format(EVALUATION_COMPLETED_ON, "%m")),
         jan_inspection = ifelse(Month == 1, 1, 0))

paste("Number of Inspection on January:", sum(york_south_weston$jan_inspection))

## [1] "Number of Inspection on January: 60"

There were 60 inspections completed in January for apartment buildings in the York South-Weston ward.
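The helper column is not strictly needed for a single count: comparing the month number with 1 gives a logical vector, and sum() counts its TRUE values. table() then gives the count for every month in one call. A sketch with hypothetical dates standing in for EVALUATION_COMPLETED_ON:

```r
# Hypothetical evaluation dates
eval_dates <- as.Date(c("2022-01-14", "2022-01-28", "2022-06-03", "2022-11-18"))

# Month number (1-12) of each date
month_num <- as.integer(format(eval_dates, "%m"))

# January count: month_num == 1 is TRUE/FALSE, and sum() counts the TRUEs
jan_count <- sum(month_num == 1)

# Evaluations per month
table(month_num)
```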

Checking Summary Statistics

Let's check some summary statistics for the attributes that were inspected:

# Median number of confirmed storey
paste("Median number of confirmed storey in York South-Weston:", median(york_south_weston$CONFIRMED_STOREYS))

## [1] "Median number of confirmed storey in York South-Weston: 4"

# Mean number of confirmed storey
paste("Mean number of confirmed storey in York South-Weston:", mean(york_south_weston$CONFIRMED_STOREYS))

## [1] "Mean number of confirmed storey in York South-Weston: 6.99521531100478"

# Checking for the distribution of the number of confirmed storey
hist(york_south_weston$CONFIRMED_STOREYS, breaks=10, main="Histogram of Confirmed Storeys", xlab="Value Interval")

The mean (about 6.99) and median (4) of CONFIRMED_STOREYS in the York South-Weston dataset differ noticeably. The histogram above suggests why: the distribution is heavily right-skewed, with most buildings in the low-storey range and a long right tail of taller buildings. In other words, the typical (median) building has 4 storeys, while the relatively few tall buildings pull the average up to about 7 storeys. This long tail may indicate the presence of outliers.
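The effect of a long right tail on the mean can be seen on a small hypothetical set of storey counts: a couple of tall buildings barely move the median but pull the mean well above it.

```r
# Hypothetical storey counts: mostly low-rise buildings plus two towers
storeys <- c(3, 3, 4, 4, 4, 5, 6, 20, 25)

median(storeys)  # middle value of the sorted data
mean(storeys)    # pulled upward by the two tall buildings
```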

Checking for the percentage of apartments in need of inspection

I am also interested in the percentage of apartments whose next evaluation needs to be conducted in 3 years. The code below calculates it:

# Converting the RESULTS_OF_SCORE variable to categorical (factor)
york_south_weston$RESULTS_OF_SCORE <- as.factor(york_south_weston$RESULTS_OF_SCORE)

# Checking the unique values of the RESULTS_OF_SCORE variable
unique(york_south_weston$RESULTS_OF_SCORE)

## [1] Evaluation needs to be conducted in 2 years
## [2] Evaluation needs to be conducted in 3 years
## [3] Evaluation needs to be conducted in 1 year 
## [4] Building Audit                             
## 4 Levels: Building Audit ... Evaluation needs to be conducted in 3 years

# Summing the number of buildings with result Evaluation needs to be conducted in 3 years
york_south_weston <- york_south_weston %>%
  mutate(inspection_three_yrs = ifelse(RESULTS_OF_SCORE=="Evaluation needs to be conducted in 3 years", 1, 0))

paste("Number of Building with Result of Evaluation Needs to be Conducted in 3 years:", sum(york_south_weston$inspection_three_yrs))

## [1] "Number of Building with Result of Evaluation Needs to be Conducted in 3 years: 79"

# Calculating the percentage of that particular buildings
paste("Percentage of Number of Building with Result of Evaluation Needs to be Conducted in 3 years (%):", ((sum(york_south_weston$inspection_three_yrs))/dim(york_south_weston)[1])*100)

## [1] "Percentage of Number of Building with Result of Evaluation Needs to be Conducted in 3 years (%): 9.44976076555024"

The number of buildings with the result “Evaluation needs to be conducted in 3 years” is 79, which is about 9.45% of the 836 records in the York South-Weston ward.
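Because the mean of a logical vector is the share of TRUE values, the percentage can also be computed without the 0/1 helper column, and prop.table() gives the percentages for all categories at once. The RESULTS_OF_SCORE values below are hypothetical; on the real data the equivalent calls would use york_south_weston$RESULTS_OF_SCORE.

```r
# Hypothetical RESULTS_OF_SCORE values
results <- factor(c(
  "Evaluation needs to be conducted in 2 years",
  "Evaluation needs to be conducted in 3 years",
  "Evaluation needs to be conducted in 2 years",
  "Evaluation needs to be conducted in 1 year"
))

# Share of records in the 3-year category, as a percentage
share_3yrs <- mean(results == "Evaluation needs to be conducted in 3 years") * 100

# Percentage of records in every category
round(prop.table(table(results)) * 100, 2)
```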

Checking for the oldest building in York South-Weston ward and its overall score

# Checking for the oldest building in York South-Weston Ward
oldest <- york_south_weston[which.min(york_south_weston$YEAR_BUILT),]
oldest

##       X_id     RSN YEAR_REGISTERED YEAR_EVALUATED YEAR_BUILT PROPERTY_TYPE WARD
## 56 2406894 4155112            2017             NA       1915       PRIVATE    5
##             WARDNAME     SITE_ADDRESS CONFIRMED_STOREYS CONFIRMED_UNITS
## 56 York South-Weston 83 CLEARVIEW HTS                 3              40
##    EVALUATION_COMPLETED_ON SCORE                            RESULTS_OF_SCORE
## 56              2022-11-18    72 Evaluation needs to be conducted in 2 years
##    NO_OF_AREAS_EVALUATED ENTRANCE_LOBBY ENTRANCE_DOORS_WINDOWS SECURITY
## 56                    15              3                      4        5
##    STAIRWELLS LAUNDRY_ROOMS INTERNAL_GUARDS_HANDRAILS GARBAGE_BIN_STORAGE_AREA
## 56          4             3                         4                        3
##    INTERIOR_WALL_CEILING_FLOOR INTERIOR_LIGHTING_LEVELS GRAFFITI
## 56                           4                        3        3
##    EXTERIOR_CLADDING EXTERIOR_GROUNDS EXTERIOR_WALKWAYS BALCONY_GUARDS
## 56                 3                3                 4             NA
##    WATER_PEN_EXT_BLDG_ELEMENTS PARKING_AREA  GRID LATITUDE LONGITUDE        X
## 56                           5            3 W0532 43.69262 -79.48296 306158.2
##          Y Month jan_inspection inspection_three_yrs
## 56 4838964    11              0                    0

# Extracting the Overall Evaluation Score for this building
paste("Overall evaluation score for the oldest building in York South-Weston:", oldest$SCORE)

## [1] "Overall evaluation score for the oldest building in York South-Weston: 72"

The oldest building in York South-Weston with a recorded YEAR_BUILT is the building with RSN 4155112 at 83 CLEARVIEW HTS, built in 1915. Despite its age, the building performs reasonably well, with an overall score of 72.
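One detail worth keeping in mind is that which.min() silently skips NA values. Since four York South-Weston records have no YEAR_BUILT, the result is the oldest building among those with a known construction year. The sketch below uses hypothetical records to show this behaviour.

```r
# Hypothetical building records; one YEAR_BUILT is missing
buildings <- data.frame(
  RSN        = c(101, 102, 103, 104),
  YEAR_BUILT = c(1954, NA, 1915, 1969),
  SCORE      = c(80, 90, 72, 85)
)

# which.min() ignores the NA and returns the index of the first minimum
oldest_idx <- which.min(buildings$YEAR_BUILT)
buildings[oldest_idx, ]

# Buildings whose age cannot be compared
sum(is.na(buildings$YEAR_BUILT))
```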

Using the `lubridate` library to create a new column called season

# Creating new column with the calendar quarter of each evaluation date
# (1 = Jan-Mar, 2 = Apr-Jun, 3 = Jul-Sep, 4 = Oct-Dec)
york_south_weston <- york_south_weston %>%
  mutate(season1 = quarter(EVALUATION_COMPLETED_ON, type = "quarter", fiscal_start = 1))

# Renaming the quarters as approximate seasons
york_south_weston <- york_south_weston %>%
  mutate(season = case_when(season1 == 1 ~ "Winter",
                            season1 == 2 ~ "Spring",
                            season1 == 3 ~ "Summer",
                            season1 == 4 ~ "Fall"))

# Removing intermediary column
york_south_weston <- york_south_weston %>%
  select(-season1)
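For comparison, the same quarter-to-season mapping can be sketched in base R by computing the quarter from the month number and indexing a vector of labels, which avoids nested ifelse() calls. The dates below are hypothetical. Calendar quarters only approximate the seasons: for example, 30 December falls in the fourth quarter and is therefore labeled "Fall", even though winter has already begun by then.

```r
# Hypothetical evaluation dates
eval_dates <- as.Date(c("2022-02-10", "2022-05-20", "2022-08-15", "2022-12-30"))

# Calendar quarter from the month: Jan-Mar -> 1, Apr-Jun -> 2, Jul-Sep -> 3, Oct-Dec -> 4
month_num <- as.integer(format(eval_dates, "%m"))
qtr <- (month_num - 1) %/% 3 + 1

# Label each quarter with an approximate season name
season_labels <- c("Winter", "Spring", "Summer", "Fall")
season <- season_labels[qtr]
season
```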

Plotting a bar plot to show the number of evaluations done in each season

ggplot(york_south_weston, aes(reorder(season, table(season)[season]))) +
  geom_bar(fill=c("#009E73","#F0E442","#0072B2","#D55E00"))+
  theme_classic() + 
  coord_flip() +
  ggtitle("Total Cumulative Number of Inspections per Season") + 
  xlab("Season") + 
  ylab("Cumulative Inspections")

The bar plot above shows that the number of inspections was highest in the Fall season (October to December under the quarter-based mapping). One possible explanation is that fall is when students often return to the city for the start of the school year, which could raise the demand for rental housing. With higher demand, landlords may be more inclined to carry out repairs and upgrades on their rental units to attract new tenants, which may in turn lead to more inspection requests during the fall.

Plotting property types vs graffiti ratings on a ggplot barplot

# Changing property type variable from character to categorical
york_south_weston$PROPERTY_TYPE <- as.factor(york_south_weston$PROPERTY_TYPE)

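# Plotting the mean graffiti rating per property type
# (na.rm = TRUE keeps the missing GRAFFITI values from affecting the bar order)
ggplot(york_south_weston, aes(x = reorder(PROPERTY_TYPE, GRAFFITI, FUN = mean, na.rm = TRUE), y = GRAFFITI)) +
  geom_bar(stat = "summary", fun = "mean", fill=c("#009E73","#F0E442","#0072B2"))+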
  theme_classic() + 
  coord_flip() + 
  ggtitle("Graffiti Rating per Property Types") + 
  xlab("Property Type") + 
  ylab("Mean of Graffiti Rating")

Because a higher GRAFFITI rating indicates a better condition (less graffiti), the bar plot suggests that Private and Social Housing properties have less graffiti than TCHC properties, with a mean GRAFFITI rating of about 4.6 for both. One possible explanation for TCHC's lower rating is that TCHC properties may be more densely populated than other property types, which could make them more attractive targets for graffiti vandals. TCHC properties may also often be located in areas with heavy foot traffic, making them more visible and therefore more likely to be targeted.
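The bar heights are group means, so the numbers behind the plot (such as the 4.6 for Private and Social Housing) can be checked directly with tapply(). Passing na.rm = TRUE matters here because GRAFFITI has missing values; without it, any group containing an NA would get a mean of NA. The ratings below are hypothetical; on the real data the call would be tapply(york_south_weston$GRAFFITI, york_south_weston$PROPERTY_TYPE, mean, na.rm = TRUE).

```r
# Hypothetical ratings; one TCHC GRAFFITI value is missing
toy <- data.frame(
  PROPERTY_TYPE = c("PRIVATE", "PRIVATE", "TCHC", "TCHC", "SOCIAL HOUSING"),
  GRAFFITI      = c(5, 4, 3, NA, 5)
)

# Mean rating per property type; na.rm = TRUE is passed on to mean()
mean_graffiti <- tapply(toy$GRAFFITI, toy$PROPERTY_TYPE, mean, na.rm = TRUE)
sort(mean_graffiti)
```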

Plotting a histogram to show the distribution of SCORE variable

ggplot(york_south_weston, aes(x=SCORE)) + 
  geom_histogram(aes(y = after_stat(density)), color = "black", fill = "white", bins=20) +
  geom_density(alpha = 0.2, fill = "orange", color="orange2") +  
  theme_classic() +
  ggtitle("Building Overall Score Histogram") + 
  xlab("Score") + 
  ylab("Density")

The histogram above suggests that the SCORE variable is approximately normally distributed: roughly symmetric and bell-shaped, with no obviously heavy tail on either side.

Plotting a histogram to show the distribution of GRAFFITI variable

ggplot(york_south_weston, aes(x=GRAFFITI)) + 
  geom_histogram(aes(y = after_stat(density)), color = "black", fill = "tomato", bins=5) +
  theme_classic() +
  ggtitle("Building Graffiti Rating Histogram") + 
  xlab("Graffiti Rating") + 
  ylab("Density")

The histogram above shows the distribution of the GRAFFITI variable. The distribution is skewed to the left, with most values between 4 and 5.
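The visual impressions of "roughly symmetric" (SCORE) and "skewed to the left" (GRAFFITI) can be backed by a number. The sketch below defines a small helper, given the hypothetical name skewness(), using the simple moment estimator (average cubed deviation divided by the cubed standard deviation); other skewness definitions differ slightly for small samples. Negative values indicate a longer left tail and values near zero indicate symmetry. The ratings are hypothetical.

```r
# Hypothetical helper: moment-based sample skewness, with NAs dropped first
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^1.5
}

# Hypothetical ratings concentrated at 4-5 with a few low values
graffiti <- c(5, 5, 5, 5, 4, 4, 3, 1)
skewness(graffiti)  # negative: the longer tail is on the left
```

On the real data, skewness(york_south_weston$GRAFFITI) and skewness(york_south_weston$SCORE) would give the corresponding values.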

Creating a faceted histogram of SCORE by RESULTS_OF_SCORE

ggplot(york_south_weston, aes(x = SCORE, fill = RESULTS_OF_SCORE)) +
  geom_histogram(binwidth = 5, color = "black") +
  facet_wrap(~RESULTS_OF_SCORE, ncol = 2) +
  labs(title = "Faceted Histograms of Score by Results of Score", x = "Score", y = "Frequency") +
  theme(legend.position = "none")

The faceted histogram splits the SCORE distribution into four panels, one for each category of RESULTS_OF_SCORE. The panels show that the higher a building's score, the longer the interval before its next evaluation. Combining the facets reproduces the roughly normal SCORE distribution shown earlier: Building Audit occupies the left tail, “Evaluation needs to be conducted in 3 years” occupies the right tail, and “Evaluation needs to be conducted in 1 year” and “Evaluation needs to be conducted in 2 years” sit in the middle, together making up the majority of the data points.
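If each category is assigned by score thresholds, as the facets suggest, the SCORE ranges of the categories should not overlap. One way to check is to compute the minimum and maximum score per category; on the real data this would be tapply(york_south_weston$SCORE, york_south_weston$RESULTS_OF_SCORE, range). The sketch below uses hypothetical scores and shortened category labels.

```r
# Hypothetical scores and their (shortened) result categories
scores  <- c(45, 60, 62, 70, 78, 84, 88, 95)
results <- c("Building Audit", "1 year", "1 year", "2 years",
             "2 years", "2 years", "3 years", "3 years")

# Minimum and maximum SCORE within each category
score_ranges <- t(sapply(split(scores, results), range))
colnames(score_ranges) <- c("min", "max")
score_ranges
```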

Filtering the York South-Weston dataset to only include properties from the five most common addresses

Creating a separate data frame to find the 5 most common addresses in the York South-Weston ward.

# Changing SITE_ADDRESS variable from character to categorical
york_south_weston$SITE_ADDRESS <- as.factor(york_south_weston$SITE_ADDRESS)

# use table() function to get frequency counts and sort it
street_frequency <- table(york_south_weston$SITE_ADDRESS) %>% sort(decreasing=TRUE)

# Selecting the 5 most common addresses
top_5_street <- data.frame(head(street_frequency, n = 5))
top_5_street

##                  Var1 Freq
## 1      1306 WESTON RD    6
## 2     101 HUMBER BLVD    5
## 3    137 WOODWARD AVE    5
## 4        1570 JANE ST    5
## 5 1619 LAWRENCE AVE W    5

The 5 most common addresses in the York South-Weston dataset are 101 HUMBER BLVD, 1570 JANE ST, 1619 LAWRENCE AVE W, 1306 WESTON RD, and 137 WOODWARD AVE.
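Four of these addresses share a count of 5. If further addresses also have 5 records (not checked here), head() keeps only the first ones in the sorted table, so part of the "top 5" would be decided by tie-breaking. A sketch with hypothetical addresses shows the difference between head() and keeping every address that reaches the fifth-highest count:

```r
# Hypothetical address records with a tie at the cut-off
addresses <- c(rep("A ST", 6), rep("B AVE", 5), rep("C RD", 5),
               rep("D DR", 5), rep("E BLVD", 5), rep("F CRES", 5), "G LANE")

freq <- sort(table(addresses), decreasing = TRUE)

# head() keeps exactly 5 entries, dropping one of the tied addresses
top5_head <- names(head(freq, 5))

# Keeping every address whose count reaches the 5th-highest count retains all ties
cutoff <- as.integer(freq[5])
top5_ties <- names(freq)[freq >= cutoff]
```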

Creating a new data frame by filtering the york_south_weston dataset to include only properties at the 5 most common addresses in York South-Weston.

# Filtering york_south_weston to only include properties at the 5 most common addresses
york_south_weston_5streets <- york_south_weston %>%
  filter(SITE_ADDRESS %in% top_5_street[,1])

# Checking whether the filtered dataset only contains SITE_ADDRESS value from top_5_street
unique(york_south_weston_5streets$SITE_ADDRESS)

## [1] 101 HUMBER BLVD     1570 JANE ST        1619 LAWRENCE AVE W
## [4] 1306 WESTON RD      137 WOODWARD AVE   
## 233 Levels: 1 GREENBROOK DR 10 MAPLE LEAF DR ... 98 TRETHEWEY DR

# The new dataset containing only properties at the 5 most common addresses in York South-Weston is now ready to be used
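The "233 Levels" line in the unique() output above shows that filtering a factor keeps all of its original levels, even those no longer present in the rows. droplevels() removes unused levels; applied to a data frame, for example droplevels(york_south_weston_5streets), it does so for every factor column. A minimal sketch with a hypothetical three-level factor:

```r
# Hypothetical factor with a level that the filtered rows no longer use
site <- factor(c("1306 WESTON RD", "137 WOODWARD AVE", "1 GREENBROOK DR"))
subset_site <- site[site != "1 GREENBROOK DR"]

# The unused level is still carried along after subsetting
levels_before <- nlevels(subset_site)

# droplevels() removes levels with no observations
subset_site <- droplevels(subset_site)
levels_after <- nlevels(subset_site)
```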

# Creating scatterplot for YEAR_BUILT vs SCORE
ggplot(york_south_weston_5streets, aes(x=YEAR_BUILT, y=SCORE, color=SITE_ADDRESS))+
  geom_point(size=3.5)+
  scale_color_brewer(palette="Dark2")+
  theme_light()+
  ggtitle("Building Year Built vs Overall Score") + 
  xlab("Year Built") + 
  ylab("Score")

The scatterplot above shows the relationship between YEAR_BUILT and SCORE. There is a tendency for newer buildings to receive higher overall scores, although the pattern is not pronounced and the correlation appears weak. Another interesting takeaway is that all the records at a given address share the same construction year: for example, every record at 137 WOODWARD AVE has a YEAR_BUILT of 1954, and every record at 101 HUMBER BLVD has 1969. Since each SITE_ADDRESS is a single street address rather than a whole street, the records at one address most likely describe the same building or buildings in one complex, so a shared construction year is expected. It may also reflect how the area developed, with new buildings going up along newly built roads during those decades, although this data cannot confirm it.
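The "weak correlation" can be quantified with cor(). Because YEAR_BUILT has missing values, use = "complete.obs" restricts the calculation to rows where both values are present; on the real data the call would be cor(york_south_weston_5streets$YEAR_BUILT, york_south_weston_5streets$SCORE, use = "complete.obs"), and cor.test() would add a p-value. The pairs below are hypothetical.

```r
# Hypothetical year-built and score pairs, including one missing year
year_built <- c(1954, 1954, 1969, 1969, 1975, NA)
score      <- c(70, 74, 78, 73, 80, 76)

# Pearson correlation over the rows where both values are present
r <- cor(year_built, score, use = "complete.obs")
r
```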
