Guest Lecture by Jorge Cardoso (Huawei): AIOps — Anomalous Span Detection in Distributed Traces Using Deep Learning
We are delighted to announce that Dr. Jorge Cardoso will give a guest lecture in our machine-learning module on Wednesday, 2 October, at 12:00 in the Joly Theatre (Hamilton Building). Guests are very welcome to join.
Jorge Cardoso is Chief Architect for Planet-scale AIOps at Huawei’s Ireland and Munich Research Centers. He previously worked for several major companies, including SAP Research (Germany), on the Internet of Services, and the Boeing Company in Seattle (USA), on Enterprise Application Integration. He has also lectured at the Karlsruhe Institute of Technology (Germany), the University of Georgia (USA), and the University of Coimbra and the University of Madeira (Portugal). His current research involves the development of the next generation of AIOps platforms, Cloud Operations and Analytics tools driven by AI, Cloud Reliability and Resilience, and High-Performance Business Process Management systems. He has a Ph.D. in Computer Science from the University of Georgia (USA).
Title: AIOps — Anomalous Span Detection in Distributed Traces Using Deep Learning
Abstract: The field of AIOps (Artificial Intelligence for IT Operations) uses algorithms and machine learning to dramatically improve the monitoring, operation, and maintenance of distributed systems. Its main premise is that operations can be automated using monitoring data, reducing the workload of operators (e.g., SREs or production engineers). Our current research explores how AIOps, together with related fields such as deep learning, machine learning, distributed traces, graph analysis, time-series analysis, sequence analysis, and log analysis, can be applied to effectively detect, localize, and remediate failures in large-scale cloud infrastructures (>50 regions and AZs). In particular, this lecture will describe how one type of monitoring data structure, the distributed trace, can be analyzed using deep learning to identify anomalies in its spans. This capability enables operators to quickly identify which components of a distributed system are faulty.