This page presents an analysis of crime data from 1996 to 2021 for the municipality of Groningen in the Netherlands. The data was acquired from an open dataset uploaded to Kaggle. All the Python code for the data analysis can be found on my GitHub. Before going into the details of the dataset, the analysis and the insights, I first give a summary of how crime in Groningen is currently perceived.
Numbeo is an organisation that gives insights into safety for areas across the world. For the municipality of Groningen it gives the following summary. The figures and additional information can be found through the LINK HERE.
The dataset
Figure 3 gives an overview of the crime data for the municipality of Groningen from 1996 to 2021, broken down by crime type. The crime types and subtypes are available in both Dutch and English; the English translations were made with Google Translate. This is not the original layout of the dataset: it has been transposed, the text columns were removed and Year was converted from text values to integers.
See the Jupyter notebook for a more detailed summary of the dataset. Here it is enough to state that the dataset contains no missing values for any feature in any of the 26 years, and that no abnormalities were found in the data.
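A completeness check of this kind can be sketched in a few lines of pandas. This is only an illustration on a made-up three-year table; the column names and values are assumptions and are not taken from the actual dataset or notebook.

```python
import pandas as pd

# Hypothetical miniature of the prepared table: one row per year,
# one column per crime type. The numbers are invented for illustration.
df = pd.DataFrame(
    {
        "Year": [1996, 1997, 1998],
        "Robbery": [120, 110, 95],
        "Pickpocketing": [300, 310, 280],
    }
)

# Count missing values per column; all zeros means every column is complete.
missing_per_column = df.isna().sum()

# Confirm that every year in the covered range appears exactly once.
expected_years = set(range(df["Year"].min(), df["Year"].max() + 1))
years_complete = set(df["Year"]) == expected_years and df["Year"].is_unique

print(missing_per_column)
print("all years present:", years_complete)
```

On the real data the same two checks would be run on the full 26-year table.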
The dataset preparation steps were as follows:
- Transpose the data.
- Remove irrelevant text columns.
- Create a Year column from the former year headers, which the transpose step turned into the index, and convert it from text values to integers.
- Normalize each crime feature, so we can compare the relative change of each across years.
- Add a new column holding the mean of the normalized crime features. This new column is used as the indicator of the level (and change) of crime over time.
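The steps above could look roughly like the following in pandas. This is a minimal sketch and not the notebook's actual code: the raw layout, the column names `crime_nl` and `crime_en`, and the numbers are all made up, and z-score normalization is an assumption, since the text does not say which normalization was used. For clean numeric types the text column is dropped just before the transpose rather than after it.

```python
import pandas as pd

# Hypothetical raw layout: one row per crime type, one text column per year,
# plus text columns with the Dutch and English crime names.
raw = pd.DataFrame(
    {
        "crime_nl": ["Overval", "Zakkenrollerij"],
        "crime_en": ["Robbery", "Pickpocketing"],
        "1996": [120, 300],
        "1997": [110, 310],
        "1998": [95, 280],
    }
)

# Keep the English name as label, drop the other text column,
# then transpose so that years become rows and crime types become columns.
wide = raw.set_index("crime_en").drop(columns=["crime_nl"]).T
wide.columns.name = None

# The year headers are now the (text) index: convert them to integers
# and move them into a regular Year column.
wide.index = wide.index.astype(int)
wide = wide.rename_axis("Year").reset_index()

# Normalize each crime feature (assumption: z-score per column).
crime_cols = [c for c in wide.columns if c != "Year"]
features = wide[crime_cols].astype(float)
zscores = (features - features.mean()) / features.std()

# Combine Year with the normalized features and add their row-wise mean.
prepared = pd.concat([wide[["Year"]], zscores], axis=1)
prepared["Mean"] = zscores.mean(axis=1)

print(prepared)
```

With z-scores, every crime feature ends up on the same scale regardless of how many incidents it counts, which is what makes averaging them into a single Mean column reasonable.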
The analysis
First, we want to understand whether there is any correlation between the different crime features. This is shown in the correlation plot of Figure 5.
The most obvious and most interesting insight from the correlation plot is that the Year feature has a strong, significant negative correlation with the majority of the crime features. This indicates that for those features crime goes down over time. Not all crime activities decrease over time, but the majority do.
We also see that some features, such as Robbery, have a strong positive correlation with theft of cars, theft from cars, theft from companies and pickpocketing. That is not very surprising, since all of them are related to stealing. You can spend some time looking at all the correlations, but for this analysis we focus on how the crime rate changes over time.
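The numbers behind such a correlation plot can be computed with the pandas `DataFrame.corr` method, which uses the Pearson correlation by default. The sketch below runs on invented values for three crime features; the figures in this post come from the real dataset, not from this example.

```python
import pandas as pd

# Made-up yearly counts for illustration only.
df = pd.DataFrame(
    {
        "Year": [1996, 1997, 1998, 1999, 2000],
        "Robbery": [120, 110, 95, 90, 70],
        "Theft of cars": [400, 380, 350, 300, 260],
        "Vandalism": [50, 65, 55, 70, 60],
    }
)

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()

# Correlation of each crime feature with Year, most negative first.
# A value close to -1 means the feature falls steadily over the years.
year_corr = corr["Year"].drop("Year").sort_values()

print(year_corr)
```

Sorting the Year column of the correlation matrix is a quick way to see which crime types decline over time and which do not, without having to read the whole plot.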
The discussion
If we plot only the Mean column over time we get Figure 7. Whenever the line is above 0 it means that crime has increased compared to the previous year. It is comforting to see that after 2015 there is a large and significant decrease in the normalized crime rate. There is too little data to say anything about the effect of the COVID-19 lockdowns.
Figure 8 shows details about the LSTM model and Figure 9 shows the same plot as Figure 7 but adds the predictions of the LSTM model.
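An LSTM learns from sequences, so a yearly series such as the Mean column is usually cut into short windows of consecutive years, each paired with the value of the following year. The sketch below shows that step only. The helper `make_windows`, the window length of 3 and the sample values are assumptions for illustration; the notebook may prepare its input differently.

```python
import numpy as np


def make_windows(series, window):
    """Split a 1-D series into (inputs, targets) for next-step prediction."""
    values = np.asarray(series, dtype=float)
    if window < 1 or window >= len(values):
        raise ValueError("window must be between 1 and len(series) - 1")
    # Each input row holds `window` consecutive values ...
    inputs = np.stack(
        [values[i : i + window] for i in range(len(values) - window)]
    )
    # ... and its target is the value that directly follows that window.
    targets = values[window:]
    # Recurrent layers usually expect the shape (samples, timesteps, features).
    return inputs[..., np.newaxis], targets


# Made-up normalized yearly values standing in for the Mean column.
mean_series = [0.9, 0.7, 0.8, 0.4, 0.1, -0.3]
X, y = make_windows(mean_series, window=3)

print(X.shape, y.shape)
```

With only 26 yearly observations the number of training windows is small, which is worth keeping in mind when judging the model's predictions.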
We can conclude that the model is not too bad at predicting the crime statistics. It is more conservative than the actual data, which shows a steeper decrease in crime after 2015. This could indicate that there are variables that reduced the crime rate but were not recorded in the dataset used. We could also add more nodes to the model to improve the fit. Increasing the number of training epochs would have little effect, as shown in Figure 10: after 100 epochs we hardly see any improvement, as the model loss has stagnated.
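Whether a loss curve has stagnated can also be checked numerically instead of by eye. The function below is a hypothetical helper, not part of the notebook: it returns the last epoch at which the loss still improved by more than `min_delta`, once `patience` epochs have passed without such an improvement. The loss values are invented.

```python
def plateau_epoch(losses, patience=10, min_delta=1e-3):
    """Return the 1-based epoch of the last meaningful improvement.

    An improvement counts only if the loss drops more than `min_delta`
    below the best loss so far. If `patience` epochs pass without one,
    the epoch of the last improvement is returned; if the loss keeps
    improving until the end, the result is None.
    """
    best = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(losses, start=1):
        if best - loss > min_delta:
            best = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return best_epoch
    return None


# Invented loss history: fast improvement first, then stagnation.
history = [1.0, 0.5, 0.3, 0.25, 0.249, 0.2489, 0.2488, 0.2488]
stalled_at = plateau_epoch(history, patience=3, min_delta=0.01)

print("last meaningful improvement at epoch:", stalled_at)
```

The same idea, with a patience and a minimum improvement threshold, is what early-stopping callbacks in deep learning libraries are built around.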
So finally we can conclude that not only is Groningen quite a safe region to begin with, it is also becoming safer over time. Quite a comforting conclusion! What more could you ask for :)
Further research
There are many ways in which follow-up research could be conducted. Firstly, if we had monthly crime data we could investigate whether there is a seasonality effect. Crime likely increases during the warmer months, as more people are outside.
A second option for further research is looking into why crime started to decrease after 2015. We can see the difference, but why is it happening? That would be interesting to find out.
Lastly, we could improve the prediction by adding other variables, such as the number of students in the area over time or any other type of population change. We could also look at changes in the cost of living over time and investigate whether they affect the crime rate.
As a final note, I want to add that this has been a personal exploratory data analysis, which should not be mistaken for genuine scientific research. All conclusions should therefore be taken with a grain of salt.