DCWiz: Transforming Data Center Operations and Management with AI

Speaker: Dr. Yonggang Wen (文勇刚)

Abstract: Data centers, dedicated spaces that house computer systems and associated components, play an important role in propelling digital transformation globally, especially during a global pandemic (e.g., COVID-19). The resulting extensive use of services such as cloud, big data, IoT and artificial intelligence is prompting data center (DC) operators to consolidate high-performance, mission-critical IT infrastructure in hyper-scale data centers. In practice, data centers are managed and optimized towards two operational objectives: 1) business resilience, which measures the commercial impact of service disruption, and 2) operations sustainability, which measures the cost of providing services. AI solutions, made popular by Google DeepMind, have been touted as having great potential to optimize these operational metrics. However, adopting AI techniques for data center operations and management (O&M) faces substantial challenges due to data scarcity and a risk-averse mindset. In this talk, we present DCWiz, a transformative industrial AI solution that integrates an industry-grade digital twin with emerging AI techniques to digitalize, optimize and automate data center O&M for operational excellence. Specifically, DCWiz leverages advanced AI techniques (e.g., deep reinforcement learning and generative adversarial networks) to train a digital twin of any physical data center that is industry-grade in terms of data accuracy, and deploys the digital twin as a pivotal module to provide descriptive, predictive and prescriptive AI for DC O&M. We delve into two use cases, namely hot-spot management and power usage effectiveness (PUE) optimization, to show that DCWiz could potentially save 30% of cooling cost in tropical environments. We further share several commercial trials, including with Alibaba and the Singapore National Supercomputing Center (NSCC), each of which represents a different application of the DCWiz solution.
The trial results demonstrate the effectiveness of our DCWiz solution in improving manageability, reducing cost and mitigating operational risks for mission-critical data centers.

Speaker Bio: Dr. Yonggang Wen is a Professor and President’s Chair in Computer Science and Engineering at Nanyang Technological University (NTU), Singapore. He also serves as the Associate Vice President (Capability Building) at NTU Singapore. Previously, he served at NTU Singapore as the Associate Dean (Research) at the College of Engineering (2018-2023), the acting Director of the Nanyang Technopreneurship Center (NTC) (2017-2019) and the Assistant Chair (Innovation) at the School of Computer Science and Engineering (2016-2018). He received his PhD degree in Electrical Engineering and Computer Science (minor in Western Literature) from the Massachusetts Institute of Technology (MIT), Cambridge, USA, in 2008. Dr. Wen has published over 300 papers in top journals and prestigious conferences. His systems research has gained global recognition. His work on Multi-Screen Cloud Social TV has been featured by global media (more than 1600 news articles from over 29 countries) and received the ASEAN ICT Award 2013 (Gold Medal). His work on Cognitive Digital Twin for Data Centre has won the 2015 Data Centre Dynamics Awards – APAC (the ‘Oscar’ award of the data centre industry), the 2016 ASEAN ICT Awards (Gold Medal), the 2020 IEEE TCCPS Industrial Technical Excellence Award, the 2021 W.Media APAC Cloud and Datacenter Technology Leader Award, and the 2022 Singapore Computer Society Digital Achiever Tech Leader Award. He was the winner of the 2019 Nanyang Research Award and the sole winner of the 2016 Nanyang Awards for Innovation and Entrepreneurship, both of which are the highest recognitions at NTU. He is a co-recipient of multiple Best Paper Awards from top journals, including 2019 IEEE TCSVT and 2015 IEEE Multimedia, and at international conferences, including 2016 IEEE Globecom, 2016 IEEE Infocom MuSIC Workshop, 2015 EAI Chinacom, 2014 IEEE WCSP, 2013 IEEE Globecom and 2012 IEEE EUC.
He is the Editor-in-Chief of IEEE Transactions on Multimedia (TMM), serves or has served on the editorial boards of multiple IEEE and ACM transactions, and was elected Chair of the IEEE ComSoc Multimedia Communication Technical Committee (2014-2016). His research interests include cloud computing, green data centers, big data analytics, multimedia networking and mobile computing. He is a Fellow of the IEEE and of the Singapore Academy of Engineering, and an ACM Distinguished Member.