Development, calibration, and validation of a large-scale traffic simulation model: Belgium road network

The development of large-scale traffic simulation models has always been challenging for transportation researchers. One of the essential and most resource-intensive steps in developing a traffic simulation model is travel demand modeling. Therefore, proposing travel demand models that require less data than classical travel demand models is highly important, especially for large-scale networks. This paper first presents a travel demand model, named the probabilistic travel demand model, and then reports the development, calibration, and validation of a traffic simulation model of Belgium. The probabilistic travel demand model takes cities' populations, distances between the cities, yearly vehicle-kilometers traveled, and yearly truck trips as inputs. The extracted origin-destination matrices are imported into the SUMO traffic simulator. Mesoscopic traffic simulation and dynamic user equilibrium traffic assignment are used to build the base case model. This base case model is calibrated using traffic count data. The model is then validated by comparing real travel times between the cities (extracted from the Google Maps API) with simulated travel times. The validation results indicate that the model represents reality with a high level of accuracy. The model will help road authorities, planners, and decision-makers test different scenarios, such as the impact of abnormal conditions or of connected and autonomous vehicles on the Belgian road network.


Introduction
Transportation researchers use traffic simulation models to test different scenarios at different network scales. The inputs of traffic simulation models are usually supply and demand data. Supply data include all the information related to the transportation network and services, such as the geometric and functional specification of the road network, traffic control, public transport services, and other data such as vehicle fleets. Demand data describe the mobility needs of people and goods and can be collected through questionnaires, mobile phone data, etc. Travel demand data are typically extracted from travel demand models. A travel demand model is a set of mathematical relationships that describes when, why, and how people and goods move within a particular geographic area. These models estimate travel behavior and demand for a specific (future) time frame, based on several assumptions about population, land use, households, etc. Travel demand models incorporate economic, technical, lifestyle, individual, and psychological aspects, as well as the factor of time, to represent a specific travel demand problem as accurately as possible (e.g., the number of passengers traveling between Brussels and Antwerp by car during the P.M. peak hour).
Several travel demand models have been developed, such as the classical four-step model, the tour-based model, and the activity-scheduling model [1]. The most well-known travel demand model is the classical four-step model:
1. Trip generation: determines the number of passengers traveling from a specific city or region.
2. Trip distribution: estimates the number of trips between particular cities or regions. The output of this step is an Origin-Destination (O-D) matrix, which gives the number of trips between each origin and destination.
3. Modal split: determines which transportation mode passengers will use when traveling from their origin to their destination.
4. Traffic assignment: determines the routes passengers choose to reach their destination.
For a classical transportation model, the output of the first three steps, an O-D matrix for each transport mode, is fed into a traffic assignment model (either simulation-based or analytical) to calculate link loads. The final outputs are used to describe, explain, correlate, and forecast transport demand. The four-step travel demand model has been used by many researchers and has performed well. However, its most critical disadvantage is that it relies heavily on household surveys and census data that are very costly and time-consuming to collect. Thus, the limited availability of household data makes a four-step model very challenging to implement, especially for large-scale road networks (e.g., Belgium). In this study, an alternative to four-step modeling is proposed to develop a traffic simulation model for the Belgian road network. First, a travel demand model, named the probabilistic travel demand model, is proposed. This travel demand model only considers the population, distances, passenger-kilometers traveled, and the number of yearly truck trips as input data to provide hourly O-D matrices at the country level (Belgium).
The reason for developing such a travel demand model is that it needs less data than classical four-step modeling. Then, the extracted O-D matrices are fed into the traffic simulation model. This traffic simulation model is calibrated and validated with traffic count and travel time data.
The following section (section 2) gives information about the case study (available supply and demand data). Section 3 explains the methodology, including the development of the probabilistic travel demand model and building the base case model. Then, the process of calibration and validation of the model is provided in section 4. Finally, sections 5 and 6 describe the results and the paper's conclusion.

Case Study
Belgium is a European country with a land area of 30,688 km² and a population of 11.5 million [2]. Belgium is divided into three regions: Flanders in the north, Wallonia in the south, and the Brussels-Capital Region. The transport network in Belgium, including road, rail, sea, and air, is well-developed and well-connected to other parts of Europe. This transport network includes 13.2 thousand kilometers of main/national roads, five international airports, 3,602 kilometers of usable rail network, and five seaports [3], [4]. Belgium plays a crucial role in road travel in Europe and ranks seventh in terms of passenger-kilometers among European Union countries. Also, Belgium's motorway network is the third densest in Europe, after those of the Netherlands and Luxembourg [5]. More than eight international E-roads cross Belgium, connecting the east of Europe to the west and the south to the north. In Belgium, three mobility surveys have been carried out by SPF Mobilité et Transports to examine mobility and road safety patterns in detail, using both household and individual data. These surveys, named MOBEL (1999), BELDAM (2012), and MONITOR (2018), provide a comprehensive understanding of the subject [6]. As far as the authors are aware, the origin-destination matrices from these studies are not accessible to the public.

Supply Data: Belgium Road Network
An OpenStreetMap (OSM) file was extracted to build the network. OSM is a free, editable map of the whole world created by its users. The road network file is imported directly into the traffic simulation software (SUMO). The OSM file contains all categories of roads (named Motorway, Trunk, and Primary roads in OSM). Roads in Belgium are categorized into three types: 1-Highways ("Autoroute") (e.g., A2 (E314)); 2-Provincial and regional roads ("Routes provinciales et régionales") (e.g., A501); 3-Municipal roads ("Routes communales"). Since this study provides a travel demand model for intercity trips, only highways and provincial and regional roads (classified as Motorway, Trunk, and Primary roads in OSM) are modeled, and inner-city roads are not modeled. The network is checked and fixed manually for errors in the SUMO environment (using the SUMO network warning and error tool). Cities are considered as centroids that can generate and attract trips. In total, 60 cities are modeled. The criteria for selecting cities are described in the next section. The OSM file already included road network features such as speed and capacity. However, they were double-checked against Google Maps data to ensure accuracy. Figure 1 shows the network imported into SUMO. Also, to ensure that the network's geometry and the highways' lengths are imported into SUMO correctly, the shortest distances between the cities in the simulation are compared with reality. The real shortest distances are taken from the Google API. Figure 2 compares the shortest distances in reality and in the simulated network for 3600 O-D pairs. This figure shows that the lengths of the highways and the geometry of the network are reproduced with high accuracy.

Demand Data
The data considered as inputs to the probabilistic travel demand model for passenger car trips include the population of each city, the distance between cities, and the passenger-kilometers traveled by the Belgian population each year. Population data for all Belgian municipalities are extracted from STATBEL [7]. STATBEL is the Belgian statistical office that collects, produces, and disseminates reliable and relevant figures on the Belgian economy, society, and territory. Belgium has three regions and 11 provinces. The provinces are subdivided into 43 administrative arrondissements and 581 municipalities, 83 of which have a population of more than 30,000. A list of Belgium's arrondissements with their municipalities was compiled. Then, 60 municipalities (cities) were chosen as centroids in the travel demand model based on the following criteria:
A. All municipalities with a population over 30,000 were first selected as centroids (to capture the effects of the most populated cities). Then, if the distance between two municipalities within the same arrondissement is less than 15 km, the two municipalities are combined and considered one centroid.
B. If an arrondissement consists of fewer than four municipalities, the most populous municipality is selected (even if its population is less than 30,000).
C. If an arrondissement consists of more than five municipalities, at least three are included.
The geographic distance between cities is calculated from the longitude and latitude of the cities. The passenger-kilometer data for passenger cars are extracted from the Federal Planning Bureau (FPB) website [8]. This independent public agency draws up studies and projections on economic, social, and environmental policy issues. According to Table 1, the total passenger-kilometers traveled per year on highways and provincial and regional roads in all three regions of Belgium is equal to $85.762 \times 10^{9}$. Also, for modeling freight transport trips (trucks), the number of yearly trips by Belgian trucks (by country of loading and unloading) is extracted from STATBEL [9]. The number of truck trips loaded in Belgium and unloaded in Belgium is given in Table 2.
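The text does not state which formula is used to derive distances from coordinates. A minimal sketch, assuming the great-circle (haversine) distance and a mean Earth radius of 6371 km, is given below; the function name and the approximate city coordinates are illustrative assumptions, not values taken from the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate city-center coordinates (latitude, longitude), for illustration only
brussels = (50.85, 4.35)
antwerp = (51.22, 4.40)
distance = haversine_km(*brussels, *antwerp)
print(f"Brussels-Antwerp: {distance:.1f} km")
```

Applying such a function to every pair of centroids would yield a symmetric distance matrix for the 60 cities.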

Methodology
The overall process of traffic simulation modeling for the Belgian road network is illustrated in Figure 3. Each step is described as follows.

Probabilistic Travel Demand Model
The probabilistic travel demand model first determines the total number of daily trips based on the passenger-kilometer data. Then, it determines the origin and destination of each trip by applying a random selection on weighted distribution functions of the population and the distances between the cities [10], [11]. The core assumption behind the model is that the larger a city's population, the greater its likelihood of being chosen as the origin city of a trip. Likewise, a larger population and a shorter distance from the origin city increase a city's likelihood of being selected as the destination. The model's principles are similar to those of the gravity model [12]. The steps of the probabilistic travel demand model are described in the following subsections.

Determination of the total number of passenger cars daily trips
As shown in Table 1, the total passenger-kilometers traveled per year on highways and provincial and regional roads is equal to $85.762 \times 10^{9}$. Assuming that each passenger travels 85 kilometers per trip and that weekend traffic is half as high as weekday traffic, this gives approximately 3.2 million trips per day on the Belgian road network. The total number of daily passenger car trips is thus obtained by dividing the yearly passenger-kilometers by the assumed trip length and by the equivalent number of days per year, with weekend days counted at half weight.
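The arithmetic behind the figure of roughly 3.2 million daily trips can be checked with a short calculation. The sketch below uses the passenger-kilometer total and the 85 km trip length stated in the text; the split of the year into 52 weeks of five weekdays and two weekend days is an assumption made here, since the exact day count is not given.

```python
YEARLY_PASSENGER_KM = 85.762e9  # total from Table 1
TRIP_LENGTH_KM = 85.0           # per-trip distance assumed in the text
WEEKEND_WEIGHT = 0.5            # weekend traffic counted as half of weekday traffic

# Assumption of this sketch: 52 weeks, each with 5 weekdays and 2 weekend days
WEEKDAYS = 52 * 5
WEEKEND_DAYS = 52 * 2

# Number of "weekday-equivalent" days in a year
equivalent_days = WEEKDAYS + WEEKEND_WEIGHT * WEEKEND_DAYS

yearly_trips = YEARLY_PASSENGER_KM / TRIP_LENGTH_KM
daily_trips = yearly_trips / equivalent_days

print(f"Equivalent days per year: {equivalent_days:.0f}")
print(f"Daily passenger car trips: {daily_trips / 1e6:.1f} million")
```

Under these assumptions the result is consistent with the approximately 3.2 million daily trips reported above.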

Determination of the total number of trucks daily trips
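The total number of daily truck trips is derived from the yearly number of truck trips given in Table 2. Summing the daily passenger car trips and the daily truck trips gives the total number of daily trips.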

Determination of trips Origins
After determining the total number of daily trips in Belgium (both passenger cars and trucks; $N$ = 3.3 million), the origin $O_t$ of each trip $t$ should be specified. It is assumed that the probability of selecting a city as the origin city is proportional to its population. A random selection is applied to the weighted probability distribution of the cities' populations. The weighted probability distribution function is given in equation 4. The logic behind this selection is that a city with a larger population is more likely to be chosen as the origin city.
$$F_P(i) = P(O_t = i) = \frac{p_i}{\sum_{j} p_j} \qquad (4)$$
Where $F_P$ is the weighted population probability distribution function; $P(O_t = i)$ is the probability of city $i$ being chosen as the origin city for trip $t$; and $p_i$ is the population of city $i$.
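The population-weighted random selection of origins can be sketched as follows. The city names and populations are hypothetical placeholders, and the use of Python's `random.choices` is one possible way to draw from a weighted distribution, not necessarily the authors' implementation.

```python
import random

# Hypothetical populations, for illustration only
populations = {"A": 500_000, "B": 250_000, "C": 250_000}


def origin_probabilities(populations):
    """Probability of each city being the origin, proportional to its population."""
    total = sum(populations.values())
    return {city: pop / total for city, pop in populations.items()}


def sample_origin(populations, rng):
    """Draw one origin city with population-proportional probability."""
    cities = list(populations)
    weights = [populations[city] for city in cities]
    return rng.choices(cities, weights=weights, k=1)[0]


rng = random.Random(42)  # fixed seed so the sketch is reproducible
probs = origin_probabilities(populations)
origins = [sample_origin(populations, rng) for _ in range(10_000)]
share_a = origins.count("A") / len(origins)
print(probs, round(share_a, 2))
```

With enough trips, the share of origins drawn from each city approaches its population share.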

Determination of trips Destinations
In this step, the destination $D_t$ of each trip $t$ is determined. The selection of the destination city is based on the assumption that the larger the population of a city and the closer it is to the city of origin, the higher the likelihood of it being chosen as the destination city. To put this assumption into mathematical form, a weighted distance distribution function $F_D$ is defined for each origin city $i$, where $P(D_t = j \mid O_t = i)$ is the probability of selecting city $j$ as the destination if city $i$ is chosen as the origin city, and $d_{ij}$ is the distance between cities $i$ and $j$. Then, the population distribution and the distance distribution are mixed to generate a new distribution, the mixed population-distance probability distribution function $F_M$. It combines the weighted population probability distribution function $F_P$ and the weighted distance probability distribution function $F_D$ through a calibration parameter $\alpha$, which indicates the relative importance of the population and of the distance between the cities in the probability of being selected as the destination city. The destination city of trip $t$ is determined by applying a random selection on the mixed population-distance distribution function.
It is worth noting that in this study, passenger car and truck trips are assumed to behave identically in terms of origin and destination. However, the nature of truck trips may vary with the locations of terminals, distribution centers, and companies within different sectors, and further research is necessary to investigate these differences.
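The exact functional forms of the distance and mixed distributions are not reproduced here, so the following is only a sketch under two explicit assumptions: the distance weight is taken as inversely proportional to the distance from the origin, and the mixed distribution is a linear combination in which the calibration parameter (called `alpha` below) weights the population term and `1 - alpha` weights the distance term. City names, populations, and distances are hypothetical.

```python
import random

# Hypothetical inputs, for illustration only
populations = {"A": 500_000, "B": 250_000, "C": 250_000}
distances = {"A": {"B": 20.0, "C": 80.0}}  # km from origin "A"


def destination_probabilities(origin, populations, distances, alpha):
    """Mixed population-distance probabilities for all cities except the origin.

    Assumed forms (not confirmed by the source):
      population term ~ p_j, distance term ~ 1 / d_ij,
      mixed = alpha * population term + (1 - alpha) * distance term.
    """
    candidates = [city for city in populations if city != origin]
    pop_total = sum(populations[c] for c in candidates)
    inverse = {c: 1.0 / distances[origin][c] for c in candidates}
    inverse_total = sum(inverse.values())
    return {
        c: alpha * populations[c] / pop_total
        + (1 - alpha) * inverse[c] / inverse_total
        for c in candidates
    }


probs = destination_probabilities("A", populations, distances, alpha=0.25)
rng = random.Random(1)
destination = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
print(probs, destination)
```

In this toy case the two candidate cities have equal populations, so the nearer one receives the higher probability; changing `alpha` shifts the balance between population and proximity.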

Determination of departure times
Each trip is assigned to an hour of the day according to the typical hourly distribution of travel for its vehicle type (passenger car or truck), as presented by NCHRP (2004) (Figure 4).

Figure 4: Typical hourly Distribution of Traffic Demand [13]
At the end of step 5 of the probabilistic travel demand model, the O-D matrix for each hour of the day is available. The pseudocode of the model is given in Table 3.

Table 3: Pseudocode of the probabilistic travel demand model
i. Form the weighted population probability distribution function $F_P$.
ii. Apply a random selection on $F_P$ to determine the city of origin for trip $t$.
iii. Form the weighted distance function $F_D$ for the origin city.
iv. Form the mixed population-distance function $F_M$ for the origin city.
v. Apply a random selection on $F_M$ to determine the city of destination for trip $t$.
vi. Apply a random selection on the typical hourly traffic demand distribution (based on the type of vehicle) to determine the departure time of trip $t$.
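The last step, assigning departure hours and aggregating trips into hourly O-D matrices, can be sketched as below. The hourly weights are hypothetical placeholders rather than the NCHRP curve of Figure 4, and the list of trips stands in for the origin-destination pairs produced by the earlier steps; the function name is also an assumption of this sketch.

```python
import random
from collections import Counter

# Hypothetical relative hourly weights (24 values); not the NCHRP distribution
hourly_weights = [1, 1, 1, 1, 2, 4, 7, 9, 8, 6, 5, 5,
                  5, 5, 6, 7, 8, 8, 6, 4, 3, 2, 2, 1]


def build_hourly_od(trips, hourly_weights, rng):
    """Draw a departure hour for each trip and count trips per hour and O-D pair."""
    hours = list(range(24))
    od_by_hour = [Counter() for _ in hours]
    for origin, destination in trips:
        hour = rng.choices(hours, weights=hourly_weights, k=1)[0]
        od_by_hour[hour][(origin, destination)] += 1
    return od_by_hour


# Placeholder trips: 600 from A to B and 400 from A to C
trips = [("A", "B")] * 600 + [("A", "C")] * 400
od_by_hour = build_hourly_od(trips, hourly_weights, random.Random(0))

total_trips = sum(sum(counter.values()) for counter in od_by_hour)
print(total_trips, od_by_hour[7])
```

Each of the 24 counters corresponds to one hourly O-D matrix; separate weight vectors would be used for passenger cars and trucks.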

Base Case Model
After obtaining the hourly O-D matrices, a base case model was simulated. Dynamic User Equilibrium (DUE) traffic assignment was used to assign the travel demand to the network. The traffic assignment tool in SUMO, duaIterate.py, is used to perform DUE. Please refer to [14], [15] for more information on dynamic traffic assignment in SUMO. Dijkstra's algorithm is used to find the shortest path. The Logit model is used as the route choice model. The simulation is performed at the mesoscopic level, distinguishing between passenger cars and trucks.
It should be noted that in this base case model, all parameters of the traffic flow model and of the traffic assignment model are set to the default values of SUMO. They are then modified during the calibration process explained in the next section. The simulation period was 24 hours; however, this paper reports on calibration and validation for the morning peak.

Calibration and Validation of the Model
Model calibration is the process of varying model parameters so that the performance of the model matches real-world observations. This is the most critical and complex step of traffic simulation. Previous studies have suggested that two types of parameters should be calibrated in traffic simulation [16]-[18]:
1. Calibration of traffic flow model parameters (capacity calibration): This type of calibration consists of local and global parameter modifications and tries to reproduce the traffic capacities observed in the field by modifying traffic flow model parameters (reaction time, headway, etc.).
2. Calibration of dynamic traffic assignment model parameters (global or local parameters): This calibration is intended to make the path selection of vehicles in the simulation close to reality. Usually, it is done by comparing the real and simulated traffic counts on specific links. To bring the simulated traffic counts closer to reality, either the input O-D matrix is modified, or the traffic assignment model is modified (by changing the assignment method, route choice model parameters, number of iterations, etc.).

Calibration of traffic flow model parameters
The traffic flow modeling is performed at the mesoscopic scale in SUMO. The mesoscopic model of SUMO is based on the work of Eissfeldt [19]. This is a queue-based model that computes the time at which a vehicle leaves a queue based on the traffic state in the current and subsequent queues, the minimum travel time, and the state of the intersection (e.g., red, green, yellow). Some examples of mesoscopic parameters are minimum headway, queue length, junction control, and edge length. The model's parameters were calibrated for large-scale networks in the work of Presinger [20]. This study uses the same queuing model parameters as Presinger. Please refer to (DLR, 2021; Presinger, 2021) for more information about the queuing model parameters.

Calibration of dynamic traffic assignment model parameters
The dynamic traffic assignment model was calibrated by comparing real count data with the model-assigned count data for 50 detectors on the network. The segments are selected so that they cover the entire Belgian network. The traffic count data were extracted from the website of the Flanders government [21]. This calibration consists of two parts. First, the O-D matrix is adjusted by testing different values of $\alpha$. As mentioned in previous sections, $\alpha$ is a calibration parameter in the travel demand model. It determines the relative importance of a city's population and its distance from the city of origin in determining each trip's destination.
After testing several values of $\alpha$, it was concluded that $\alpha = 0.25$ leads to the O-D matrix whose simulated traffic counts are closest to the real traffic counts. The second part is to modify the parameters of DUE. The models and parameters considered in the calibration process are warm-up time, routing algorithm, route choice model (e.g., deterministic or stochastic), route choice model parameters, number of available alternatives, swapping algorithm, and number of iterations. By altering these parameters, it was found that the Dijkstra routing algorithm with the Logit route choice model and ten iterations gives the best results. Figure 5 shows the hourly simulation results versus the observed flows. In addition to the scatter plot, the GEH criterion is calculated to compare the simulated and real traffic volumes. GEH determines the tolerance of relative and absolute errors on the network's traffic counts. The GEH formula is as follows:
$$GEH = \sqrt{\frac{2(M - C)^2}{M + C}}$$
where $M$ is the simulation's hourly traffic volume, and $C$ is the real-world hourly traffic count. The average GEH is equal to 4.9. In this study, 78% of observations have a GEH value of less than 7.5, which is within the acceptable threshold [16].
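The GEH statistic is conventionally computed as the square root of 2(M - C)² / (M + C), where M is the simulated hourly volume and C is the observed hourly count. A minimal sketch of that calculation, together with the share of detectors below the 7.5 threshold used above, is given here; the count pairs are hypothetical and do not come from the study's detectors.

```python
import math


def geh(simulated, observed):
    """GEH statistic for one pair of hourly volumes (vehicles per hour)."""
    if simulated + observed == 0:
        return 0.0  # both volumes are zero, so there is no deviation
    return math.sqrt(2 * (simulated - observed) ** 2 / (simulated + observed))


# Hypothetical (simulated, observed) hourly counts, for illustration only
counts = [(1000, 1100), (450, 400), (2000, 2600)]

values = [geh(m, c) for m, c in counts]
mean_geh = sum(values) / len(values)
share_below_threshold = sum(v < 7.5 for v in values) / len(values)

print([round(v, 2) for v in values])
print(round(mean_geh, 2), round(share_below_threshold, 2))
```

Because the statistic scales the squared error by the mean of the two volumes, the same absolute error weighs less on a heavily loaded link than on a lightly loaded one.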

Validation
Various methods exist to validate traffic simulation models, including comparisons of traffic counts and of travel times between reality and simulation. A recent method for validating large-scale traffic simulation models, which has been used in previous studies [22], is the comparison of real and simulated travel times at the origin-destination pair level. This method extracts the simulated travel time between each origin and destination (cities) from DUE (for a specific time interval). The real travel times are exported from Google Maps' Distance Matrix API. The Distance Matrix API provides travel distance and time for a matrix of origins and destinations [23]. In this study, this method is implemented for validation.
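The text does not name the error measure used to compare real and simulated travel times. As one possible way to summarize such a comparison, the sketch below computes the mean absolute percentage error (MAPE) and the root mean square error (RMSE) over the O-D pairs present in both data sets. The choice of metrics, the function name, and the travel times (in minutes) are all assumptions of this sketch.

```python
import math


def travel_time_errors(real, simulated):
    """Return (MAPE in percent, RMSE) over O-D pairs found in both dictionaries."""
    pairs = [(real[key], simulated[key]) for key in real if key in simulated]
    mape = sum(abs(s - r) / r for r, s in pairs) / len(pairs) * 100
    rmse = math.sqrt(sum((s - r) ** 2 for r, s in pairs) / len(pairs))
    return mape, rmse


# Hypothetical travel times in minutes, keyed by (origin, destination)
real_times = {("A", "B"): 30.0, ("A", "C"): 60.0, ("B", "C"): 45.0}
simulated_times = {("A", "B"): 33.0, ("A", "C"): 57.0, ("B", "C"): 45.0}

mape, rmse = travel_time_errors(real_times, simulated_times)
print(f"MAPE: {mape:.1f} %, RMSE: {rmse:.2f} min")
```

MAPE expresses the deviation relative to each trip's duration, while RMSE keeps the unit of the travel times and penalizes large deviations more strongly.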

Calibrated and Validated Model Results
The results of the calibrated and validated traffic simulation model of Belgium for the A.M. peak hour are given in Table 4. The table shows that the total travel time was 338,896 hours and the average speed of all vehicles was 61 km/h. Additionally, Figure 7 compares the simulated and actual speeds on the roads in Belgium. As the figure shows, the model correctly identifies the locations of traffic congestion.

Conclusion
This study reports the development, calibration, and validation of a traffic simulation model of Belgium. First, a probabilistic travel demand model is developed using population, distances between the cities, yearly vehicle-kilometers traveled by passenger cars, and annual truck trips.
The probabilistic travel demand model calculates the total number of trips based on the yearly vehicle-kilometers traveled and the truck trips. The origin and destination of each trip are determined by applying a random selection on the population distribution and on the mixed distribution of population and distance between the cities, respectively. The departure time of each trip is based on the typical hourly distribution of travel demand. Then the outputs of the probabilistic travel demand model (hourly O-D matrices) are imported into the SUMO traffic simulation software. After that, a base case model is simulated using the mesoscopic mode of the SUMO traffic simulator and DUE traffic assignment to assign the travel demand to the network. This base case model is calibrated with real traffic count data. The calibration process includes the calibration of the traffic flow model parameters (queuing model) and of the dynamic traffic assignment parameters. Finally, the model is validated using real travel times between cities in congested conditions. The real travel times are extracted from the Google Maps Distance Matrix API. The validation results indicate that the traffic simulation model performs accurately.
The proposed traffic simulation model of Belgium can help researchers, decision-makers, and policy-makers test different transportation planning scenarios at the country level. For future studies, applying the proposed probabilistic travel demand model to other case studies is recommended to check the model's performance. Also, in this study, only trucks with Belgian license plates were taken into consideration when modeling freight demand. To achieve more precise outcomes, the transit traffic of cargo vehicles from other countries should also be taken into account. This, however, requires access to cargo data from those countries.

Data availability statement
The data that support the findings of this study are not publicly available. Access to the data may be granted upon request to the corresponding author (behzad.bamdad@uclouvain.be).

Author contributions
Behzad Bamdad Mehrabani contributed to the conceptualization of the study, developed the methodology, conducted the investigation, curated the data, wrote the original draft, and created the visualizations. Luca Sgambi contributed to the conceptualization of the study, validated the results, reviewed and edited the writing, provided supervision, managed the project administration, and acquired the funding. Sven Maerivoet assisted with the methodology, validated the results, and reviewed and edited the writing. Maaike Snelder contributed to the conceptualization of the study, developed the methodology, validated the results, analyzed and interpreted the data, and reviewed and edited the writing. All authors reviewed the results and approved the final version of the manuscript.