A High-performance Distributed Machine Learning Framework for Graph-based Streaming Data with Smart City Applications

RGC Theme-based Research Scheme

PC: Kai Chen (HKUST)

Co-PIs: Qiang Yang (HKUST), Dit-Yan Yeung (HKUST), Hai Yang (HKUST), Jiannong Cao (PolyU), Chi-Keung Tang (HKUST), Lei Chen (HKUST), Dahua Lin (CUHK)


Many smart city applications can be modeled as graphs, where nodes represent entities or locations such as buildings, road intersections, and districts, and links represent the roads connecting them. In the big data era, each node or link may carry continuous data streams. Advanced machine learning techniques are instrumental in mining the inherent patterns in such dynamic data streams and making predictions from them. Computational efficiency and prediction accuracy are fundamental requirements for a machine learning framework to effectively support smart city applications such as transportation optimization, urban planning, and crowd sensing. However, none of the existing machine learning frameworks achieves both computational efficiency and prediction accuracy on a large graph of streaming data. The underlying grand challenges include data scarcity, algorithmic limitations, and inefficient use of computing power.
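To make the graph model above concrete, the following is a minimal sketch of a city graph whose nodes and links each hold a bounded window of recent stream readings. The class name `CityGraph`, its methods, and the sliding-window design are illustrative assumptions, not part of the project's actual framework.

```python
from collections import defaultdict, deque


class CityGraph:
    """Toy city graph: nodes and links each keep a sliding window of readings.

    Hypothetical illustration only; names and design are assumptions.
    """

    def __init__(self, window=3):
        self.window = window
        # node -> set of neighbouring nodes (undirected links)
        self.adj = defaultdict(set)
        # node or link key -> most recent `window` readings
        self.streams = defaultdict(lambda: deque(maxlen=window))

    def add_link(self, u, v):
        """Connect two nodes, e.g. two intersections joined by a road."""
        self.adj[u].add(v)
        self.adj[v].add(u)

    def observe(self, key, value):
        """Append one reading for a node (e.g. "A") or a link (e.g. ("A", "B"))."""
        self.streams[key].append(value)

    def recent_mean(self, key):
        """Mean of the readings still inside the window, or None if empty."""
        buf = self.streams[key]
        return sum(buf) / len(buf) if buf else None


g = CityGraph(window=2)
g.add_link("A", "B")
for speed in (10, 20, 30):          # only the last two readings are kept
    g.observe(("A", "B"), speed)
mean_speed = g.recent_mean(("A", "B"))
```

Here the link `("A", "B")` stands for a road segment and the readings for, say, observed speeds; a real system would need timestamps, persistence, and distribution across machines.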

To address these challenges, this project will develop a new machine learning framework for large-scale graph-based streaming data with smart city applications. Specifically, the project will make four main contributions:

  1. A new deep learning methodology for graph-based streaming data: existing deep learning algorithms such as CNNs (convolutional neural networks) are mostly designed for regular structures such as image grids, and are therefore ineffective on irregular graphs with highly dynamic data streams. We will design new deep learning algorithms that learn effectively from irregular city graphs with streaming data.
  2. A new transfer learning framework for inter-city knowledge sharing: since Hong Kong lacks the necessary data in large volumes, we will develop inter-city transfer learning algorithms that transfer knowledge learned from cities with rich data sources to our Hong Kong model. As no two cities are the same, transferable and non-transferable knowledge must be identified and separated. To this end, we will propose new domain adaptation and adversarial neural network techniques.
  3. A high-performance distributed AI computing architecture to support the above deep learning and transfer learning over large-scale graph-based streaming data. In particular, efficient RDMA (remote direct memory access) techniques will be adopted to achieve high-throughput, low-latency communication among computing nodes in large AI clusters, improving overall cluster computing efficiency.
  4. In collaboration with the Hong Kong Transport Department and the Hong Kong Observatory, we will apply the proposed machine learning framework to optimize the transportation system of Hong Kong. In particular, we will first implement an AI-driven taxi dispatching system for Hong Kong based on the taxi scheduling data we have collected. Then, building on this first milestone, we will expand our machine learning platform to optimize the entire transportation network in Hong Kong, including buses, the MTR, ferries, and so on.
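
As a rough illustration of why irregular graphs need different operators than image grids (contribution 1), the sketch below performs one step of mean neighbourhood aggregation, a simplified building block used in many graph neural networks. It is a generic textbook-style example under assumed names (`propagate`, `adj`, `features`), not the algorithm this project proposes, and it omits learned weights and any handling of streaming updates.

```python
def propagate(adj, features):
    """One mean-aggregation step over an arbitrary graph.

    Each node's new feature vector is the average of its own vector and
    those of its neighbours. Unlike a CNN filter, this works for any
    number of neighbours per node.
    """
    out = {}
    for node, x in features.items():
        group = [x] + [features[n] for n in adj.get(node, ())]
        out[node] = [
            sum(vec[i] for vec in group) / len(group) for i in range(len(x))
        ]
    return out


# Hypothetical three-node road graph: A is linked to B and C.
adj = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
features = {"A": [3.0], "B": [6.0], "C": [0.0]}  # e.g. one traffic reading per node
smoothed = propagate(adj, features)
```

Node A has two neighbours while B and C have one each, yet the same function applies to all of them; a fixed-size convolution kernel on a grid has no direct equivalent of this.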

More updates to come.

Hong Kong Smart City Blueprint