How could POSMO's Transport Mode Detection be improved further using publicly available data?
Author
Lukas Bieri (bieriluk) & Valentin Hett (hettval1)
Published
July 1, 2023
Abstract
Computational Movement Analysis is a widely studied field that aims to analyse and validate trajectory data, to identify correlations, patterns and outliers in different forms of movement.
GPS data forms an elementary baseline for the analysis of movement patterns in various applications. This project focuses on Transport Mode Detection (TMD) using GPS tracking data from the POSMO app. We implement algorithms in R for data processing, analysis and visualisation, using publicly available context data such as public transport networks. The results demonstrate the effectiveness of multi-criteria analysis (MCA) for TMD, even with limited optimisation of the underlying algorithms. They suggest that POSMO's TMD could be improved with our approach, provided that further algorithms and data are included. Measured against ground truth, the accuracy of our approach was already comparable to that of POSMO, and for bus detection it was even better.
The project, while having some limitations in the implementation, presents ideas for further improvements in TMD from POSMO data, including the addition of height modelling, accelerometer data and supervised learning algorithms.
Introduction
## 1. Information
### 1.1 Project Info
# Module: Patterns and Trends HS22
# Course: Semester Project
# Lecturer: Prof. Dr. Patrick Laube
# Assistant Lecturers: Nils Ratnaweera & Dominic Lüönd
# Authors: Valentin Hett (hettval1) & Lukas Bieri (bieriluk)
# Date: 30.06.2023
# Info: Most visualizations have been commented out due to buffer overflow, with the exception of the figures in this report.
Computational Movement Analysis is a widely researched field that uses algorithms and visual techniques to analyse and validate trajectory data in order to detect relationships, patterns and outliers in movement. However, current visualization systems predominantly target multilevel applications and macro-level results. GPS tracking is becoming increasingly important in movement analysis. One major limitation of GPS is that it only records positions and cannot provide context or semantics (Van der Spek et al., 2009). Even apps with support functions, in which people are asked to fill in a movement protocol, often yield incomplete data due to laziness or forgetting (Sadeghian et al., 2022).
The tracking app used in this project, POSMO, records, analyses and visualizes trajectories from mobility data. POSMO already implements algorithms to determine the transport mode (TM) of recorded trajectories. As with any GPS recording, the data contain noise (unintended data points) and variability, which can lead to incorrect conclusions. Accurate Transport Mode Detection (TMD) is necessary for many movement analysis tasks, e.g. to improve public transport planning or to find the most fuel-saving routes by car. The literature describes many different approaches to TMD. Sadeghian et al. (2022) showed that accurate TMD is possible using a combination of unsupervised and supervised learning algorithms with a spatial multi-criteria analysis (MCA) including context maps.
For this project, we set out to answer the following research questions:
Can Transport Mode Detection (TMD) for the POSMO tracking data be improved using the stepwise procedure described in Sadeghian et al. (2022) and with public transport data?
Where do we see potential for improving POSMO's TMD, based on our improvement trials and the literature?
Considering the constrained resources available for this semester project, including time and computational power, the primary goal was changed. Instead of revolutionising TMD, the project goal is to explore different approaches from literature by implementing them and brainstorm potential ways to improve TMD from POSMO data. Additionally, it is assumed that not all these approaches can be implemented fully in the form of algorithms within the framework of this project.
### 1.2 Software used
# R version 4.2.1 (2022-06-23 ucrt) -- "Funny-Looking Kid"
# Copyright (C) 2022 The R Foundation for Statistical Computing
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# RStudio 2023.06.0+421 "Mountain Hydrangea" Release (583b465ecc45e60ee9de085148cd2f9741cc5214, 2023-06-05) for windows
# Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) RStudio/2023.06.0+421 Chrome/110.0.5481.208 Electron/23.3.0 Safari/537.36
Material and Methods
The POSMO app saves tracking data in its online datamap tool, where it can be downloaded. The app already assigns a transportation mode incl. train, car, bus, walking and even airplane (Genossenschaft Posmo Schweiz, 2023).
For this research project, only the transport modes walking (incl. running), biking, train (incl. gondolas & cable cars) and bus (incl. trams) are considered and compared. Ships and aerial vehicles were not included. The system boundary was set to the canton of Zurich.
For the different TMD improvement approaches we mostly followed the method set out by Sadeghian et al. (2022) with a focus on MCA.
The preprocessing, analysis and visualization are done in R (4.2.1/2022-06-23) using RStudio (2023.06.0+421) and the packages “ggplot2”, “dplyr”, “tidyr”, “readr”, “zoo”, “data.table”, “sf”, “terra”, “tmap”, “stats”, “randomForest”, “lubridate”, “trajr”, “gstat”, “geosphere”, “nngeo”, “vegan”, “hms”, “tibble”, “useful”, “DescTools”, “utils” and “janitor”. The packages were last updated on 23.06.2023.
# Install (if necessary) and load all necessary packages with this function
ipak <- function(pkg) {
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg))
    install.packages(new.pkg, repos = "http://cran.us.r-project.org", dependencies = TRUE)
  sapply(pkg, require, character.only = TRUE)
}
packages <- c("ggplot2", "dplyr", "tidyr", "readr", "zoo", "data.table", "sf", "terra", "tmap", "stats", "randomForest", "lubridate", "trajr", "gstat", "geosphere", "nngeo", "vegan", "hms", "tibble", "useful", "DescTools", "utils", "janitor")
ipak(packages)
# Set the tmap mode to "view"
tmap_mode(mode = "view")
All necessary data sets are loaded into R, reprojected (where necessary) and filtered. An exploratory data analysis is done for the POSMO data to determine the appropriate settings for data cleaning and outlier removal.
## 3. Preprocessing
### 3.1 Import, check and transform data
#### 3.1.1 Boundaries
# Import boundaries data set
st_layers("datasets/swissTLMRegio_BOUNDARIES_LV95.gdb")
kanton_zh <- st_read("datasets/swissTLMRegio_BOUNDARIES_LV95.gdb", layer = "TLMRegio_KANTONSGEBIET")
kanton_zh <- kanton_zh |>
  filter(NAME == "Zürich")
# Check if the coordinate system is correctly assigned
st_crs(kanton_zh)
# Visualize to verify
# tm_shape(kanton_zh) +
#   tm_polygons() +
#   tm_basemap("Esri.WorldImagery")

#### 3.1.2 Posmo data
# Import raw unverified set
posmo <- read_delim("datasets/posmo_2023-01-01T00-00-00_2023-06-16T23-59-59_unvalidated_def.csv", delim = ",")
# Import the manually verified data set (in the POSMO datamap online tool)
posmo_valid <- read_delim("datasets/posmo_2023-01-01T00-00-00_2023-06-16T23-59-59_validated_def.csv", delim = ",")
# Check if the import got the time zone for the POSIXct column correct
str(posmo)
str(posmo_valid)
Sys.time()
# Store the data frame as a spatial data frame, transform the coordinate system
# from WGS84 (i.e. EPSG 4326) to CH1903+ LV95 (EPSG 2056) and filter it to the
# canton of Zurich (intersect)
posmo <- st_as_sf(posmo, coords = c("lon_x", "lat_y"), crs = 4326) |>
  st_transform(2056) |>
  st_filter(kanton_zh, .predicate = st_intersects)
# Same for the validated data
posmo_valid <- st_as_sf(posmo_valid, coords = c("lon_x", "lat_y"), crs = 4326) |>
  st_transform(2056) |>
  st_filter(kanton_zh, .predicate = st_intersects)
# Check the results in a table and by visualization
head(posmo)
head(posmo_valid)
# tm_shape(posmo) +
#   tm_dots(col = "red") +
#   tm_basemap("Esri.WorldImagery")
# tm_shape(posmo_valid) +
#   tm_dots(col = "red") +
#   tm_basemap("Esri.WorldImagery")
# Extract the coordinates into separate columns to use them for Euclidean distance calculation
posmo_coordinates <- st_coordinates(posmo)
posmo <- cbind(posmo, posmo_coordinates)
# Same for the validated data
posmo_valid_coordinates <- st_coordinates(posmo_valid)
posmo_valid <- cbind(posmo_valid, posmo_valid_coordinates)

#### 3.1.3 Railway routes data
# Check the layers of the gdb
st_layers("datasets/swissTLMRegio_Produkt_LV95.gdb")
# Import the railway layer
train_routes <- st_read("datasets/swissTLMRegio_Produkt_LV95.gdb", layer = "TLMRegio_Railway")
# Check if the coordinate system is correctly assigned
st_crs(train_routes)
# Filter for "Normalspurbahn", "Schmalspurbahn", "Standseilbahn", "Seilbahn",
# "Gondelbahn", "Sessellift" and "Autoverlad", exclude "Güterbahn", "Museumsbahn",
# "Bahn ausser Betrieb", "Bahn im Bau" and limit it to the canton of Zurich (intersect)
train_routes <- train_routes |>
  filter(OBJVAL != 3 & UNDERCONST == 0) |>
  st_filter(kanton_zh, .predicate = st_intersects)
# Add train stops
train_stops <- st_read("datasets/swissTLMRegio_Produkt_LV95.gdb", layer = "TLMRegio_Terminal")
train_stops <- train_stops |>
  filter(OBJVAL == 1) |>
  st_filter(kanton_zh, .predicate = st_intersects)
# Visualize to verify
# tm_shape(train_routes) +
#   tm_lines(col = "red") +
#   tm_shape(train_stops) +
#   tm_dots(col = "red") +
#   tm_basemap("Esri.WorldImagery")

#### 3.1.4 Bus & tram data
# Check the layers of the gpkg
st_layers("datasets/Linien_des_offentlichen_Verkehrs_-OGD.gpkg")
# Import the layer with all the public transport lines (filtered at download to exclude railway "S-Bahn")
bus_routes <- st_read("datasets/Linien_des_offentlichen_Verkehrs_-OGD.gpkg", layer = "ZVV_LINIEN_L")
# Check if the coordinate system is correctly assigned
st_crs(bus_routes)
# Filter to the canton of Zurich (intersect). This does not exclude segments that
# start in the canton and leave it, but that doesn't seem to be an issue for public
# transport, as the canton border is an arbitrarily set system boundary
bus_routes <- bus_routes |>
  st_filter(kanton_zh, .predicate = st_intersects)
# Visualize to verify
# tm_shape(bus_routes) +
#   tm_lines() +
#   tm_basemap("Esri.WorldImagery")

#### 3.1.5 Road network data
# Check the layers of the gdb
st_layers("datasets/swissTLMRegio_Produkt_LV95.gdb")
# Import the layer with all major roads and filter it for the canton of Zurich
roads <- st_read("datasets/swissTLMRegio_Produkt_LV95.gdb", layer = "TLMRegio_Road")
roads <- roads |>
  st_filter(kanton_zh, .predicate = st_intersects)
# Visualize to verify
# tm_shape(roads) +
#   tm_lines(col = "red") +
#   tm_basemap("Esri.WorldImagery")

### 3.2 Getting an overview & EDA
#### 3.2.1 For how long were the individuals tracked? Are there gaps? Were all individuals tracked concurrently or sequentially?
# Check the posmo data by inspecting it in detail
head(posmo)
tail(posmo)
head(posmo_valid)
tail(posmo_valid)
class(posmo$datetime)
class(posmo_valid$datetime)
tz(posmo$datetime)
tz(posmo_valid$datetime)

#### 3.2.2 How many individuals were tracked
# Make sure all data is from one individual
posmo$user_id |> unique()

#### 3.2.3 List of all transport modes
# Create a list of all transport modes in the POSMO data incl. numerical codes for the TMs
unique(posmo$transport_mode)
unique(posmo_valid$transport_mode)
numbers <- c(0, 1, 2, 3, 4, 5, 6, 8)
names <- c("Unknown", "Walk", "Car", "Bus", "Train", "Bike", "Tram", "Other")
transport_mode <- c(NA, "Walk", "Car", "Bus", "Train", "Bike", "Tram", "Other1")
all_transport_modes <- data.frame(numbers, names, transport_mode)
all_transport_modes
# Join the transport modes with the raw data to have the numerical codes for TM in the data frames
posmo <- posmo |>
  left_join(all_transport_modes, by = "transport_mode") |>
  rename(tm_unval = numbers)
posmo_valid <- posmo_valid |>
  left_join(all_transport_modes, by = "transport_mode") |>
  rename(tm_val = numbers)
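The extracted X and Y columns make it possible to derive simple movement parameters as part of the exploratory analysis. The following is a minimal sketch in base R, not the project's actual code: the toy data frame `fixes`, the column names `timelag`, `steplength` and `speed_ms`, and the threshold `speed_limit` are assumptions chosen for illustration only.

```r
# Hypothetical toy trajectory in LV95 coordinates (metres); not real POSMO data
fixes <- data.frame(
  datetime = as.POSIXct("2023-04-12 08:00:00", tz = "UTC") + c(0, 10, 20, 30, 40),
  X = c(2683000, 2683010, 2683020, 2683520, 2683040),
  Y = c(1248000, 1248000, 1248000, 1248000, 1248000)
)

# Time lag (seconds) and Euclidean step length (metres) to the previous fix
fixes$timelag <- c(NA, as.numeric(diff(fixes$datetime), units = "secs"))
fixes$steplength <- c(NA, sqrt(diff(fixes$X)^2 + diff(fixes$Y)^2))

# Speed between consecutive fixes in m/s
fixes$speed_ms <- fixes$steplength / fixes$timelag

# Assumed threshold: steps faster than 40 m/s are flagged as candidate outliers
speed_limit <- 40
fixes$outlier <- !is.na(fixes$speed_ms) & fixes$speed_ms > speed_limit

fixes[, c("timelag", "steplength", "speed_ms", "outlier")]
```

In this toy example a single displaced fix produces two implausibly fast steps (towards the outlier and back), which is the kind of pattern an exploratory look at speed distributions can reveal before a cleaning threshold is chosen.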
The public transport data is sourced from open data platforms and governmental GIS databases. The project uses the railway and boundary data from swissTLMRegio (swisstopo, 2022a, 2022b) and bus data from the Zurich Transport Network (ZVV) (Verkehrsbetriebe Zürich VBZ, 2022). The project uses POSMO tracking data from one student, recorded between 12 April 2023 and 16 June 2023. This data set was manually validated from memory against the ground truth for TM using the POSMO datamap online tool. For validation, the segmentation of the trajectories was left as produced by POSMO, because re-segmenting would have involved a high workload.
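To illustrate how a detected transport mode can be compared with a manually validated one, the following base R sketch builds a confusion table and derives an overall accuracy. The labels in `tm_detected` and `tm_validated` are invented for illustration and are not taken from the POSMO data; the project's own comparison may be computed differently.

```r
# Hypothetical mode labels per fix; not real POSMO output
modes <- c("Walk", "Bike", "Bus", "Train")
tm_detected <- factor(
  c("Walk", "Walk", "Bus", "Train", "Bike", "Bus", "Walk", "Train"),
  levels = modes
)
tm_validated <- factor(
  c("Walk", "Bike", "Bus", "Train", "Bike", "Train", "Walk", "Train"),
  levels = modes
)

# Confusion table: rows = detected mode, columns = validated (ground truth) mode
confusion <- table(detected = tm_detected, validated = tm_validated)

# Share of fixes where detection and ground truth agree
overall_accuracy <- sum(diag(confusion)) / sum(confusion)

# Per mode: share of validated fixes that were detected as that mode
recall <- diag(confusion) / colSums(confusion)

confusion
overall_accuracy
recall
```

Fixing the factor levels up front keeps the table square even if one mode never occurs in either vector, so the diagonal always lines up with the same mode.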
# Visualize the context data used in one figure
figure_1 <- tm_shape(kanton_zh) +
  tm_borders(col = "red", lwd = 3) +
  tm_shape(train_routes) +
  tm_lines(col = "green") +
  tm_shape(roads) +
  tm_lines(col = "black") +
  tm_shape(bus_routes) +
  tm_lines(col = "blue") +
  tm_add_legend(
    type = "fill",
    labels = c("Canton Zurich", "Railway lines", "Major Roads", "Bus/Tram routes"),
    col = c("red", "green", "black", "blue"),
    border.lwd = 0.5,
    title = "Data used for MCA"
  )
figure_1
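One way such network layers can enter a multi-criteria analysis is through the distance of each GPS fix to the nearest route. The sketch below shows the underlying geometry in base R with planar coordinates (as in LV95). The functions `dist_to_segment()` and `dist_to_route()`, the toy `route` and the 25 m buffer are assumptions for illustration; with the sf objects loaded in this project, a function such as `sf::st_distance()` could play the same role.

```r
# Shortest planar distance from point p to the line segment a-b
dist_to_segment <- function(p, a, b) {
  ab <- b - a
  len2 <- sum(ab^2)
  if (len2 == 0) return(sqrt(sum((p - a)^2)))  # degenerate segment
  # Relative position of the projection of p on a-b, clamped to the segment
  t <- max(0, min(1, sum((p - a) * ab) / len2))
  sqrt(sum((p - (a + t * ab))^2))
}

# Shortest distance from point p to a route given as a matrix of vertices
dist_to_route <- function(p, route) {
  n <- nrow(route)
  min(vapply(
    seq_len(n - 1),
    function(i) dist_to_segment(p, route[i, ], route[i + 1, ]),
    numeric(1)
  ))
}

# Hypothetical route with three vertices (metres)
route <- rbind(c(0, 0), c(100, 0), c(100, 100))

d_near <- dist_to_route(c(50, 20), route)
d_far <- dist_to_route(c(130, 140), route)

# Assumed criterion: a fix counts as "on the route" within a 25 m buffer
buffer_m <- 25
c(near = d_near <= buffer_m, far = d_far <= buffer_m)
```

A binary buffer criterion like this is only one option; the distance itself could also be used as a continuous score and combined with other criteria such as speed.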