Abstract:
Traffic congestion is a significant problem in Dhaka, the capital city of Bangladesh. It is caused by several factors, including the high volume of non-motorized traffic, the lack of public transportation, and poor road infrastructure. One of the challenges in mitigating congestion is accurately estimating the volume and proportion of different vehicle types during different periods. Manual vehicle counting remains the standard technique for determining traffic volume on the roads of Dhaka; however, it is labor-intensive and time-consuming. Automated counting of vehicles from videographic surveys has been shown to be less error-prone than manual counting. However, the state-of-the-art algorithms for automated traffic volume detection from videographic data are not suitable for Dhaka because of the city's unique traffic conditions. First, a large number of non-motorized vehicles, such as bicycles and rickshaws, ply the roads of Dhaka. Second, the existing algorithms for automated vehicle detection have not been validated for non-lane-based traffic operations. This paper proposes a novel algorithm for automated traffic volume detection from videographic data for Dhaka city. The proposed algorithm is based on YOLO (You Only Look Once), a state-of-the-art deep learning object detection algorithm. The existing YOLO algorithm was trained on traffic images from various key junctions in Dhaka, and the hyperparameters were tuned to enable the network to learn the nuances of the city's unique and complex traffic situation. Various techniques, such as data augmentation by rotation, flipping, and random cropping, were used to prevent the network from overfitting. A cosine learning rate scheduler, rather than a step scheduler, was used to achieve better validation accuracy. The modified YOLO algorithm was benchmarked on a dataset of images extracted from a videographic survey of Dhaka's traffic.
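The difference between the two learning rate schedules mentioned above can be illustrated with a minimal sketch. The function names, the maximum and minimum learning rates, the drop interval, and the decay factor are all illustrative assumptions and are not the values used in this work.

```python
import math


def cosine_lr(step, total_steps, lr_max=0.01, lr_min=0.0001):
    """Cosine annealing: decay smoothly from lr_max to lr_min over total_steps.

    All default values are hypothetical, chosen only for illustration.
    """
    progress = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


def step_lr(step, lr_max=0.01, drop_every=30, gamma=0.1):
    """Step decay: multiply the rate by gamma once every drop_every steps."""
    return lr_max * gamma ** (step // drop_every)


if __name__ == "__main__":
    # Print both schedules side by side at a few points of a 100-step run.
    for s in (0, 25, 50, 75, 100):
        print(s, round(cosine_lr(s, 100), 6), round(step_lr(s), 6))
```

The cosine schedule changes the rate gradually at every step, whereas the step schedule holds it constant and then drops it abruptly; which of the two gives better validation accuracy depends on the dataset and model.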
To determine the total vehicle volume, two methods were employed. In the first method, the vehicles detected in each frame were counted and summed over all frames, and the sum was divided by the number of frames to obtain the average vehicle count per frame. This number was multiplied by the average time that each vehicle spends in the video to get the average vehicle volume. The second method used a counting line together with the intersection over union (IoU) technique applied across adjacent frames of the video. In this approach, the YOLO v7 algorithm drew a bounding box around each vehicle, and the vehicle count was incremented by one when the bounding box crossed the line. The line thus kept track of how many vehicles crossed the junction. The vehicle count was then divided by the duration of observation. The outputs of both methods were then passed through a shallow artificial neural network, and the final output of this network was compared with the traditional method of counting. Google Colaboratory, which provides access to powerful computing resources, was used to train and deploy the model. The algorithm could also be used to develop new transportation applications, such as real-time traffic monitoring and navigation. The continuous classified vehicle counts could also be used to calculate the saturation flow rate at intersections and determine the operational capacity of highways. Overall, the developed algorithm could help assess the performance and efficiency of the transportation system, identify
areas for improvement, and improve the safety of heterogeneous traffic operations. To improve the robustness of the algorithm, it would be prudent to apply it to datasets collected from other developing countries, such as Sri Lanka. However, the authors do not currently have access to such datasets, and hence this has been left as a future research endeavor.
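The second counting method described above (a counting line combined with IoU matching across adjacent frames) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names, the (x1, y1, x2, y2) box format, the greedy matching strategy, the IoU threshold of 0.3, the horizontal line, and the use of the box centre to decide a crossing are all hypothetical choices. The boxes are hard-coded here, whereas in practice they would come from the detector.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_line_crossings(frames, line_y, iou_threshold=0.3):
    """Count boxes whose centre moves across the horizontal line y = line_y.

    frames: list of per-frame detections, each a list of (x1, y1, x2, y2) boxes.
    Each vehicle is counted at most once while it stays matched between frames.
    """
    count = 0
    prev = []  # (box, already_counted) for every box in the previous frame
    for boxes in frames:
        current = []
        used = set()  # previous-frame boxes already matched in this frame
        for box in boxes:
            # Greedy association: best-overlapping unmatched previous box.
            best_i, best_iou = None, iou_threshold
            for i, (pbox, _) in enumerate(prev):
                if i in used:
                    continue
                score = iou(box, pbox)
                if score >= best_iou:
                    best_i, best_iou = i, score
            counted = False
            if best_i is not None:
                used.add(best_i)
                pbox, counted = prev[best_i]
                before = (pbox[1] + pbox[3]) / 2.0  # previous centre y
                after = (box[1] + box[3]) / 2.0     # current centre y
                # The centre changed sides of the line between the two frames.
                if not counted and (before < line_y) != (after < line_y):
                    count += 1
                    counted = True
            current.append((box, counted))
        prev = current
    return count


if __name__ == "__main__":
    # One vehicle moving down across y = 100 and one parked vehicle.
    frames = [
        [(10, 80, 30, 100), (100, 10, 120, 30)],
        [(10, 88, 30, 108), (100, 10, 120, 30)],
        [(10, 96, 30, 116), (100, 10, 120, 30)],
    ]
    crossings = count_line_crossings(frames, line_y=100)
    duration_s = 3 / 30.0  # assumed 30 frames per second
    print(crossings, crossings / duration_s)
```

Dividing the crossing count by the observation duration, as in the last lines, gives the volume per unit time. A production tracker would also need to handle missed detections and occlusions, which this sketch ignores.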