This page contains an analysis of crime data from 1996 to 2021 for the municipality of Groningen in the Netherlands. The data was acquired from an open dataset uploaded to Kaggle. All the Python code for the data analysis can be found on my GitHub. Before explaining the details of the dataset, the analysis and the insights, I first show a summary of how crime in Groningen is currently perceived.

Numbeo is an organisation that gives insights into safety for areas across the world. For the Municipality of Groningen they give the following summary. The figures and additional information can be found through the LINK HERE.

Figure 1: Numbeo figures on crime rates in Groningen, 2022. They show that, generally speaking, Groningen is quite a safe place by international standards. Still, we wonder: is it getting safer or more dangerous? Let’s figure it out!
Figure 2: Groningen Crime Index compared to a few other cities across the world.

The dataset

An overview of crime data for the municipality of Groningen from 1996 to 2021 is shown in Figure 3. It is broken down by crime type. The crime types and subtypes are available in both Dutch and English; the English translations were made with Google Translate. This is not the original layout of the dataset: it has been transposed, text columns were then removed, and Year was changed from text values to integers.

Figure 3: First five entries of the crime dataset. It contains 14 features for 26 years of data (1996 to 2021).

See the Jupyter notebook for a more detailed summary of the dataset. Here we can state that the dataset contains no missing values for any feature in any of the 26 years, and that no abnormalities were found in the data.

The dataset preparation steps were as follows:

  • transpose the data
  • remove irrelevant text columns
  • create a Year column from the former year headers, which the transpose step turned into the index, and convert its values from text to integers
  • normalize each crime feature, so we can compare the relative change for each across years
  • add a new column which is the mean of the normalized crime features. This new column will be used as the indicator of the level (and change) of crime over time.

Figure 4: First five entries of the normalized crime dataset, with the newly created ‘mean’ column at the end. This will be the indicator for the state and change of crime in Groningen.
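
The preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebook: the numbers and the "Dutch name" column are made up, the function names are hypothetical, and the normalization is assumed to be a z-score per feature because the exact method is not stated here.

```python
import pandas as pd

# Made-up numbers in a hypothetical raw layout: one row per crime type,
# one text column, and one column per year with the year stored as text.
raw = pd.DataFrame(
    {
        "Dutch name": ["Diefstal", "Beroving", "Vernieling"],
        "1996": [120, 30, 80],
        "1997": [110, 28, 90],
        "1998": [100, 35, 70],
        "1999": [90, 20, 60],
    },
    index=["Theft", "Robbery", "Vandalism"],
)


def prepare(raw, text_columns):
    """Transpose, drop text entries, and add an integer Year column."""
    df = raw.T                          # year headers become the index
    df = df.drop(index=text_columns)    # former text columns are rows now
    df = df.apply(pd.to_numeric)        # the transpose leaves object dtype
    df.index = df.index.astype(int)     # "1996" -> 1996
    return df.rename_axis("Year").reset_index()


def add_normalized_mean(df):
    """Z-score each crime feature (an assumption) and average them per year."""
    features = df.drop(columns="Year")
    z = (features - features.mean()) / features.std()
    out = pd.concat([df[["Year"]], z], axis=1)
    out["Mean"] = z.mean(axis=1)        # one crime indicator per year
    return out


prepared = prepare(raw, text_columns=["Dutch name"])
normalized = add_normalized_mean(prepared)
print(normalized.round(2))
```

With a z-score, every feature ends up on the same scale, so a crime type with large absolute counts does not dominate the mean. A different normalization (min-max, for example) would change the values in the Mean column but not the overall idea.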

The analysis

First, we want to understand whether the different crime features are correlated with each other. This is shown in the correlation plot in Figure 5.

Figure 5: Correlations between the different types of crime in Groningen, for pairs where the p-value is smaller than 0.05.
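
One way to build a correlation matrix restricted to significant pairs is sketched below. It assumes Pearson correlation via `scipy.stats.pearsonr`, which may differ from what the notebook uses. The function name and the demo data (a steadily declining "Theft" series and an unrelated "Noise" series) are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr


def significant_correlations(df, alpha=0.05):
    """Pearson r for every column pair; NaN where the p-value is >= alpha."""
    cols = df.columns
    result = pd.DataFrame(np.nan, index=cols, columns=cols)
    for a in cols:
        for b in cols:
            if a == b:
                result.loc[a, b] = 1.0      # a feature always matches itself
                continue
            r, p = pearsonr(df[a], df[b])
            if p < alpha:
                result.loc[a, b] = r        # keep only significant pairs
    return result


# Synthetic demo data, not the Groningen dataset.
years = np.arange(1996, 2022)
wiggle = np.where(np.arange(len(years)) % 2 == 0, 1.0, -1.0)
demo = pd.DataFrame(
    {
        "Year": years,
        "Theft": 500 - 10 * (years - 1996) + 5 * wiggle,  # declines over time
        "Noise": wiggle,                                   # no trend at all
    }
)

corr = significant_correlations(demo)
print(corr.round(2))
```

In this synthetic example the Year and Theft pair survives the filter with a strong negative coefficient, while the pairs involving Noise are blanked out. Note that testing many pairs at the 0.05 level will let a few spurious correlations through by chance.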

The most obvious and most interesting insight from the correlation plot is that the Year feature has a strong, significant negative correlation with the majority of the crime features. This indicates that, for those features, crime goes down over time. So not all crime activities decrease over time, but the majority do.

We also see that some features, such as Robbery, have a strong positive correlation with theft of cars, theft from cars, theft from companies and pickpocketing. That is not very surprising, since all of them are related to stealing. You can spend some time looking at all the correlations, but for this analysis we focus on how the crime rate changes over time.

Figure 6: Plotting all 14 crime features over time. This figure does not make it clear whether crime has been decreasing over time. On visual inspection we might be able to see a small decrease in crime, but we need a deeper analysis to draw this conclusion. We do this by adding a new column to the dataset which is the mean of the normalized crime features. This new column will be used as the indicator for the level of crime over time.

The discussion

If we plot only the Mean column over time, we get Figure 7. Whenever the line is above 0 it means that crime has increased compared to the previous year. It is comforting to see that after 2015 there is a large and significant decrease in the normalized crime rate. There is too little data to say anything about the effect of the COVID-19 lockdowns.

Figure 7: plotting the normalized mean crime rates shows that after 2015 crime has been steadily decreasing.

Figure 8 shows details about the LSTM model and Figure 9 shows the same plot as Figure 7 but adds the predictions of the LSTM model.

Figure 8: LSTM settings
Figure 9: normalized crime rate in Groningen over time, plotted with the LSTM model predictions in green.
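
An LSTM is usually trained on sliding windows of past values rather than on the raw series. The sketch below shows one common way to build such windows with NumPy. The function name, the window length of 3 and the demo series are assumptions for illustration; the actual model settings are the ones in Figure 8.

```python
import numpy as np


def make_windows(series, window):
    """Turn a 1-D series into (samples, timesteps, 1) inputs and next-step targets."""
    values = np.asarray(series, dtype=float)
    if len(values) <= window:
        raise ValueError("series must be longer than the window")
    # Each sample holds `window` consecutive years; the target is the year after.
    X = np.stack([values[i:i + window] for i in range(len(values) - window)])
    y = values[window:]
    return X[..., np.newaxis], y    # trailing axis = one feature per timestep


# Hypothetical stand-in for the 26 yearly values of the Mean column.
mean_series = np.linspace(0.5, -0.5, 26)
X, y = make_windows(mean_series, window=3)
print(X.shape, y.shape)
```

The three-dimensional shape (samples, timesteps, features) is the input layout that Keras LSTM layers expect. With only 26 yearly values, a longer window leaves fewer training samples, which is one reason a model on this dataset may stay conservative.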

We can conclude that the model is not too bad at predicting the crime statistics. It is more conservative than the actual data, which shows a steeper decrease in crime after 2015. This could indicate that there are variables that reduced the crime rate but were not recorded in the dataset used. We could also add more nodes to the model to improve the fit. Increasing the number of training epochs would not have an effect, as shown in Figure 10: after 100 epochs we hardly see any improvement, as the model loss has stagnated.

Figure 10: LSTM model loss after 100 epochs.

So finally we can conclude that not only is Groningen quite a safe region to begin with, it is also becoming safer over time. Quite a comforting conclusion! What more could you ask for : )

Further research

There are many ways in which follow-up research could be conducted. Firstly, if we had monthly crime data we could investigate whether there is a seasonality effect. Crime may well increase during the warmer months, as more people are outside.

A second option for further research is looking into why crime started to decrease after 2015. We can see the difference, but why is it happening? That would be interesting to find out.

Lastly, we could improve the prediction by adding other variables, such as the number of students in the area over time or any other type of population change. We could also look at changes in the cost of living over time and investigate whether they affect the crime rate.

As a final note, I want to add that this has been a personal exploratory data analysis, which should not be mistaken for genuine scientific research. All conclusions should therefore be taken with a grain of salt.

This is how you should interpret the results. They are fun, interesting and likely show a real decrease of crime over time. Still, do not confuse it with proper scientific research.
