Diploma and Master's Theses (own and supervised):
"Imputation and Prediction of Multivariate Travel Time Data";
Supervisor(s): P. Filzmoser, C. Rudloff;
This master's thesis was written in cooperation with the Austrian Institute of
Technology as part of the HealthLog project. The aim of the project was to
build a reliable dispatching system for Samariterbund Wien that proposes an
assignment to the dispatcher, focusing on short response times and patients'
convenience. The thesis deals only with the static dispatching problem,
modelling the required route travel times from observed link travel times. A
central part is devoted to the replacement of missing values in the link travel
times.
The reference data set of taxi travel times was collected on Vienna's ring
road Gürtel, dividing the route from Westbahnhof to AKH into 31 links. Data
are available from July 1st, 2008, until June 30th, 2010. After grouping the
data into the four categories 'holiday weekday', 'holiday weekend', 'school
day weekday' and 'school day weekend', different imputation methods are
applied, namely principal component analysis using singular value decomposition
and the NIPALS algorithm, as well as a nearest neighbour approach.
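As an illustration of how NIPALS can handle missing values, the following is a minimal sketch, not the implementation used in the thesis. It assumes an uncentred data matrix and extracts a single component; the function name nipals_rank1 and the toy numbers are hypothetical. Missing cells are simply skipped in each regression step and afterwards filled with the rank-one reconstruction.

```python
import numpy as np

def nipals_rank1(X, n_iter=500, tol=1e-12):
    """First NIPALS component of a matrix with NaNs (missing cells are skipped)."""
    mask = (~np.isnan(X)).astype(float)   # 1.0 where observed, 0.0 where missing
    X0 = np.where(np.isnan(X), 0.0, X)    # missing cells contribute nothing to sums
    t = X0[:, 0].copy()                   # initial scores: first column
    for _ in range(n_iter):
        # Loadings: regress each column on the scores, observed rows only.
        p = (X0.T @ t) / (mask.T @ t**2)
        p = p / np.linalg.norm(p)
        # Scores: regress each row on the loadings, observed columns only.
        t_new = (X0 @ p) / (mask @ p**2)
        converged = np.linalg.norm(t_new - t) < tol
        t = t_new
        if converged:
            break
    return t, p

# Hypothetical link travel times (rows = time intervals, columns = links),
# built as an exact rank-one matrix so the missing cell has a known true value.
X = np.outer([1.0, 2.0, 3.0, 4.0], [30.0, 40.0, 50.0])
X[2, 1] = np.nan                          # true value would be 120.0
t, p = nipals_rank1(X)
filled = np.where(np.isnan(X), np.outer(t, p), X)
print(filled[2, 1])
```

Real travel time data are noisy and not of rank one, so in practice the data would typically be centred and more than one component extracted; the sketch only shows the mechanism.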
In an evaluation of the three methods, the nearest neighbour approach performs
best, especially for varying missing value rates. It produces accurate
estimates for up to 30% missing values.
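A nearest neighbour imputation of this kind can be sketched as follows. This is only an illustration under assumptions: the distance measure (root mean squared difference over the links observed in both rows), the choice k=2, the function name knn_impute and the toy data are all hypothetical and not taken from the thesis.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest rows that observed that column."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        # Distance to every other row, using only columns observed in both rows.
        candidates = []
        for m, other in enumerate(rows):
            if m == i:
                continue
            shared = [j for j in range(len(row))
                      if row[j] is not None and other[j] is not None]
            if not shared:
                continue
            dist = math.sqrt(sum((row[j] - other[j]) ** 2 for j in shared) / len(shared))
            candidates.append((dist, m))
        candidates.sort()
        for j in missing:
            # Take the k closest rows that actually have a value in column j.
            donors = [rows[m][j] for _, m in candidates if rows[m][j] is not None][:k]
            if donors:
                filled[i][j] = sum(donors) / len(donors)
    return filled

# Hypothetical link travel times in seconds (rows = time intervals, columns = links).
data = [
    [30.0, 42.0, 55.0],
    [31.0, None, 56.0],
    [29.0, 41.0, 54.0],
    [60.0, 90.0, 110.0],
]
result = knn_impute(data, k=2)
print(result[1])
```

Here the gap in the second row is filled from the first and third rows, which are closest on the observed links; the congested fourth row is ignored.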
To develop methods for the prediction of total travel times, another data set
is collected from January 1st, 2009, until December 31st, 2009, consisting
of trips that start on link 1 or 2 and end on link 30 or 31. Multiple linear
regression is applied to these data, and a stepwise regression method using
Akaike's information criterion is applied to select the most appropriate
predictor variables. To fulfil the assumptions of the regression model, the
response variable is log-transformed, as are those predictor variables that
denote similar characteristics, to preserve a good interpretation of the model.
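The selection step can be illustrated with a small sketch. It assumes a forward-only variant of stepwise selection and the Gaussian form of the criterion, AIC = n log(RSS/n) + 2p up to an additive constant; the thesis may have used a different variant. The predictor names and the synthetic data are hypothetical.

```python
import numpy as np

def aic(y, X):
    """AIC of an ordinary least squares fit, up to an additive constant."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + 2 * p

def forward_stepwise(y, predictors):
    """Greedily add the predictor that lowers AIC most; stop when none lowers it."""
    selected = []
    design = np.ones((len(y), 1))          # start from the intercept-only model
    best = aic(y, design)
    while True:
        trials = [(aic(y, np.column_stack([design, col])), name)
                  for name, col in predictors.items() if name not in selected]
        if not trials:
            break
        score, name = min(trials)
        if score >= best:                  # no candidate improves the criterion
            break
        best = score
        selected.append(name)
        design = np.column_stack([design, predictors[name]])
    return selected, best

# Synthetic example: the response is already the log of the total travel time.
rng = np.random.default_rng(0)
n = 200
link_time = rng.uniform(20.0, 60.0, n)
peak_hour = rng.integers(0, 2, n).astype(float)
log_total = (3.0 + 0.8 * np.log(link_time) + 0.1 * peak_hour
             + rng.normal(scale=0.05, size=n))
predictors = {
    "log_link_time": np.log(link_time),    # log-transformed like the response
    "peak_hour": peak_hour,
    "unrelated": rng.normal(size=n),       # pure noise, should rarely help
}
selected, score = forward_stepwise(log_total, predictors)
print(selected)
```

Log-transforming both the response and the travel-time predictor means the fitted coefficient can be read as an elasticity, which is one way to keep the model interpretable.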
The resulting model is then compared with a smaller model that omits
average link travel times and deviations from the average speed, and with
a deviation model that, instead of the total travel time, estimates the
difference between the total travel time and the computed average travel time
for the corresponding category (school/holiday, weekday/weekend) and hour.
The original model and the small model perform almost equally well, whereas
the deviation model gives rather poor results. At an average total travel time
of about 4 to 5 minutes, nearly 50% of the data can be estimated precisely to
within half a minute.
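The accuracy figure quoted above is a share of trips whose prediction error stays below a tolerance. A minimal sketch of that measure, with hypothetical travel times in seconds and a hypothetical function name share_within, could look like this:

```python
def share_within(actual, predicted, tolerance=30.0):
    """Fraction of trips whose absolute prediction error is at most `tolerance` seconds."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)

# Hypothetical total travel times in seconds (observed vs. predicted).
actual = [270.0, 300.0, 255.0, 330.0]
predicted = [280.0, 250.0, 260.0, 290.0]
share = share_within(actual, predicted)
print(share)
```

In this toy example the absolute errors are 10, 50, 5 and 40 seconds, so two of the four trips fall within half a minute.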
Furthermore, observations up to the previous period are included in the
model, which naturally improves its quality. To examine whether a few
important links already give enough information about the route, average
link speeds are used as predictor variables instead of the estimated travel
times of the whole trip obtained by grouping the data into the four categories
'holiday weekday', 'holiday weekend', 'school day weekday' and 'school day
weekend'; this impairs the results.
Generated from the publication database of the AIT Austrian Institute of Technology.