Track 6: Interplay between AI and Networking Systems

Track Chair: Waixi Liu, PhD, Associate Professor, Guangzhou University, China, Email: lwx@gzhu.edu.cn

In recent years, we have witnessed: (1) the development of fully programmable control in software-defined networking (SDN), of programmable data planes, and of languages for programming them; and (2) the emergence of new platforms, tools, and algorithms for Artificial Intelligence (AI). These technological advancements and scientific innovations create exciting new opportunities. On the one hand, the innovations of AI have the potential to simplify network management (both monitoring and control). On the other hand, advances in networking technology have the potential to improve the performance of AI systems.

AI for Networking. Networking researchers, equipment vendors, and Internet service providers are using AI to tackle challenging problems in network design, management, and optimization that have traditionally been addressed with mathematical optimization or human-crafted heuristics. Building an "autonomous" or "self-driving" network, in which management and control decisions are made automatically, autonomously, and in real time, is a long-held vision of the networking community, and AI is an essential ingredient in realizing it. However, many challenges remain in this area, including the lack of open datasets, open-source toolkits, and benchmark suites, as well as the interpretability and robustness of ML models. More powerful methods are therefore needed to address these challenges.
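To make the "ML for traffic classification" theme concrete, the following is a minimal, self-contained sketch that trains an off-the-shelf classifier on synthetic per-flow features; the feature set, traffic classes, and data are hypothetical placeholders for illustration, not a dataset or method referenced by this track.

```python
# A minimal sketch of ML-based traffic classification on synthetic data.
# The per-flow features and traffic classes below are hypothetical
# placeholders, not a dataset referenced by this track.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical per-flow features: mean packet size, flow duration,
# packets per second, mean inter-arrival time.
X = rng.random((1000, 4))
# Hypothetical labels: 0 = web, 1 = video, 2 = bulk transfer.
y = rng.integers(0, 3, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# With random labels the held-out accuracy hovers around chance (~0.33);
# on real labeled flow traces it would reflect how separable the classes are.
print("held-out accuracy:", clf.score(X_test, y_test))
```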

Networking for AI. The advancement of AI has led to remarkable breakthroughs in a variety of application domains, such as computer vision, natural language processing, and robotics. However, to obtain better performance, AI models, especially deep learning models, are getting larger and deeper (e.g., Megatron-Turing NLG has over 530 billion parameters), and training data sets keep growing (e.g., the BDD100K autonomous-driving data set contains 120 million images). Training such models may take days to months on a single GPU or TPU, so a common practice is to use distributed machine learning (ML) to accelerate training across multiple processors. With the fast-growing computing power of AI processors such as GPUs, TPUs, and FPGAs, data communication among processors has become the well-known performance bottleneck of distributed ML. The design of communication-efficient distributed ML systems is therefore attracting growing attention from both academia and industry, and many challenges remain in the design, deployment, and management of networks for distributed machine learning.
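To illustrate why communication becomes the bottleneck, here is a rough back-of-the-envelope sketch of per-step gradient traffic under synchronous data-parallel training with ring all-reduce; the parameter count, precision, worker count, and link speed are illustrative assumptions (models of this scale in practice also rely on model and pipeline parallelism).

```python
# A rough back-of-the-envelope sketch (illustrative assumptions, not from the
# track text): per-step gradient traffic for synchronous data-parallel
# training with ring all-reduce.
def ring_allreduce_bytes(num_params, bytes_per_param=4, workers=8):
    """Bytes each worker sends per step: ring all-reduce moves roughly
    2 * (N - 1) / N of the gradient volume per worker."""
    grad_bytes = num_params * bytes_per_param
    return 2 * (workers - 1) / workers * grad_bytes

params = 530e9       # assumed parameter count (e.g., a 530B-parameter model)
link_gbps = 100      # assumed per-worker network bandwidth in Gb/s

per_step = ring_allreduce_bytes(params)
seconds = per_step * 8 / (link_gbps * 1e9)
print(f"~{per_step / 1e9:,.0f} GB exchanged per worker per step, "
      f"~{seconds:,.0f} s at {link_gbps} Gb/s just for gradient exchange")
```

Even under this crude estimate, gradient exchange alone dominates the step time at 100 Gb/s per worker, which is the motivation behind gradient compression, in-network aggregation, and communication scheduling.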

This track aims to bring together state-of-the-art research results on AI for networking and networking for AI. Both theoretical and system-oriented studies are welcome. Topics of interest include, but are not limited to:

• Open datasets of networking systems for AI research

• Techniques to collect and analyze network data in a privacy-preserving manner

• Benchmark suites for AI research in networking systems

• ML for traffic prediction and classification

• ML for routing, congestion control, and network management

• ML for network security, including anomaly detection, intrusion detection, etc.

• Reinforcement learning for networking systems

• Federated learning for networking systems

• Interpretability and robustness of AI for networking systems

• AI for autonomous and self-driving networks; new use cases for self-driving networks in data centers, WANs, wireless networks, CDNs, home networks, etc.

• Learning models to capture the relationship between network events and control actions

• AI for flexible and scalable network measurement

• Closed-loop systems that use measurement to drive network control (e.g., congestion control, traffic engineering, QoE/QoS, etc.) with minimal human intervention

• New network architectures, topologies, protocols, switch architectures, communication schemes, and scheduling of computing and communication tasks for distributed deep learning, deep reinforcement learning, or federated learning

• New coding techniques for distributed machine learning

• New networking techniques (e.g., programmable switches, SmartNICs, RDMA, NVLink, etc.) to optimize distributed machine learning

• Measurement and analysis of network traffic for distributed machine learning

• Scheduling of multiple distributed machine learning jobs

• Network traffic modeling and performance analysis for distributed machine learning

• Network science for distributed machine learning