Data Visualisation. Overlaying Data on a Map.

For some types of data visualisation, it may be useful to overlay the data points on a map as the background, particularly if the data has some sort of location significance. One example would be to visualise house resale prices on a map overlay so that the viewer can get a sense of where the higher priced houses are clustered, if at all.

In this example, we will use the following real estate valuation dataset, a rather small one (414 entries) covering houses in a section of Taipei city.

Reference
https://archive.ics.uci.edu/ml/datasets/Real+estate+valuation+data+set (Accessed on 17 Aug 2020)
Citation Request:
Yeh, I. C., & Hsu, T. K. (2018). Building real estate valuation models with comparative approach through case-based reasoning. Applied Soft Computing, 65, 260-271.

We will use only three features from the dataset: the ‘latitude’, ‘longitude’, and ‘house price of unit area’ attributes. The ‘latitude’ and ‘longitude’ values will be plotted as a scatter plot (just as they would appear on a map), and a colour map will be used to colour each point according to the numerical value of its ‘house price of unit area’. This plot will then be overlaid on a screenshot of the corresponding area from Google Maps to give the final plot.

Let us consider the block of code below. As in the previous post, the code shown does not start from Line 1 and its line numbering is not continuous, because some comments and notes in the code have been omitted for simplicity.

Lines 8 to 10 import the modules to be used.
Line 13 reads the dataset file ‘data.csv’ into a dataframe variable, and Line 14 lets us check the first five rows of data.
The column headers in the dataset are listed in Lines 18 to 25 for easy reference. As mentioned earlier, some columns are not used in this data visualisation and are dropped from the dataframe, as shown in Lines 29 to 33.
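As a rough sketch of this loading and column-dropping step: the column headers below are assumed from the UCI description of the dataset, and the short names ‘latitude’, ‘longitude’ and ‘price’ as well as both function names are hypothetical, chosen only for this illustration.

```python
import pandas as pd

# Assumed headers (from the UCI dataset description) mapped to short,
# hypothetical names. Adjust the keys if your data.csv uses other headers.
COLUMNS = {
    "X5 latitude": "latitude",
    "X6 longitude": "longitude",
    "Y house price of unit area": "price",
}


def keep_location_and_price(df):
    """Return only the latitude, longitude and price columns, renamed."""
    return df[list(COLUMNS)].rename(columns=COLUMNS)


def load_housing(path="data.csv"):
    """Read the CSV file and drop the columns not needed for the plot."""
    return keep_location_and_price(pd.read_csv(path))


# Typical use:
#   df = load_housing("data.csv")
#   print(df.head())   # check the first five rows
```

Selecting the three wanted columns has the same effect as dropping the unwanted ones, and it avoids having to list every column that is discarded.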

Lines 37 to 40 use the Pandas scatter plot function to plot ‘latitude’ and ‘longitude’ on the y- and x-axis respectively. A colour map is applied so that the colour of each point corresponds to its price. In addition, the size of each plotted dot also depends on the price. The colour scheme used here is “tab20c”; the alpha parameter determines the opacity of the dots.
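The scatter plot step might look roughly like the sketch below. It assumes a dataframe whose columns are named ‘latitude’, ‘longitude’ and ‘price’ (hypothetical short names); the alpha value of 0.6, the legend label and the function name are also assumptions rather than values taken from the original code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for on-screen plots
import pandas as pd


def plot_price_scatter(df, ax=None):
    """Plot longitude (x) against latitude (y), coloured and sized by price."""
    return df.plot.scatter(
        x="longitude",
        y="latitude",
        c="price",            # colour of each dot follows the price column
        s=df["price"],        # dot size also grows with the price
        colormap="tab20c",    # colour scheme named in the post
        alpha=0.6,            # assumed opacity of the dots
        label="house price of unit area",
        ax=ax,
    )
```

Note that “tab20c” is a qualitative palette; a sequential scheme such as “viridis” may make the ordering of prices easier to read.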

Lines 43 to 46 then use a separate .png file of the map, on which the scatter plot will be overlaid. Lines 45 and 46 label the axes of the plot. We will look at Line 44 in more detail below; again, the alpha parameter determines the opacity of the map layer.
Line 48 displays the legend. Here the plot is also saved as a .png file, instead of just being displayed in the IPython console. Finally, Line 51 is the usual plt.show() to display the plot.
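A minimal sketch of the map layer, assuming Matplotlib's imshow is used with its extent argument to pin the image to longitude and latitude values. The function name, the alpha value of 0.5 and the output file name ‘overlay.png’ are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for on-screen plots
import matplotlib.pyplot as plt

# Extent quoted in the post: [lon_min, lon_max, lat_min, lat_max]
TAIPEI_EXTENT = [121.46, 121.58, 24.92, 25.02]


def add_map_background(ax, image, extent=TAIPEI_EXTENT, alpha=0.5):
    """Draw the map image so that its edges sit exactly at `extent`."""
    im = ax.imshow(image, extent=extent, alpha=alpha)
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    return im


# Typical use, after the scatter plot has been drawn on `ax`:
#   add_map_background(ax, plt.imread("taipei.png"))
#   ax.legend()
#   ax.figure.savefig("overlay.png")   # hypothetical output file name
#   plt.show()
```

Because the image is positioned by its extent, the scatter points and the map share the same coordinate system without any manual rescaling.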

What can we do to get the screenshot of the desired area from Google Maps?

The scatter plot has to be overlaid on a map that is slightly larger than the area covered by the plotted points. The required area can be determined by looking at the minimum and maximum of both the ‘latitude’ and ‘longitude’ data, and then widening each range slightly. For example, in Line 44, the extent of interest is [121.46, 121.58, 24.92, 25.02]: the first two numbers are the range for the longitude and the last two numbers are the range for the latitude.
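The extent can also be computed rather than read off by eye. The helper below is a hypothetical sketch: the function name and the default padding of 0.01 degrees are assumptions, and the coordinates in the usage comment are illustrative values, not figures taken from the dataset.

```python
def padded_extent(longitudes, latitudes, pad=0.01):
    """Return [lon_min, lon_max, lat_min, lat_max], widened by `pad` degrees."""
    lon_min, lon_max = min(longitudes), max(longitudes)
    lat_min, lat_max = min(latitudes), max(latitudes)
    return [lon_min - pad, lon_max + pad, lat_min - pad, lat_max + pad]


# Illustrative use with made-up coordinates:
#   padded_extent([121.47, 121.57], [24.93, 25.01])
# gives values close to [121.46, 121.58, 24.92, 25.02].
```

The function accepts plain lists as well as dataframe columns, since it only relies on the built-in min and max.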

With these four location coordinates, we can pin them on Google Maps, take a screenshot, crop the image to the four points as accurately as possible, and save the file as taipei.png in the same folder as the Python code.

After running the Python code, the following plot will be shown: a scatter plot overlaid on a map. This can inform the viewer about the location of the real estate (whether it is near a mass transit station, a main road, etc.), as well as the price. As shown below, the data consists mainly of houses in the Xindian District, with a few higher-priced properties near the green mass transit line.


Of course, the accuracy of the overlay depends on how accurately the crop of the map is made.

Note also that if dynamic resizing of the map (i.e. zooming in and out) is required, this simple method using standard Python libraries would not be suitable.

Is there any way to make the code above more concise and elegant? Feel free to comment.
