April 9, 2019

Juliana Freire (New York University)
Session Chair: Divesh Srivastava, AT&T Labs-Research

Jingren Zhou (Alibaba Group)
Session Chair: Wolfgang Lehner, TU Dresden

April 10, 2019

Michael Carey (UC Irvine and Couchbase, Inc.)
Session Chair: Xuemin Lin, University of New South Wales

Jianjun Chen (Huawei US Silicon Valley R&D Center)
Session Chair: Wenfei Fan, University of Edinburgh

April 11, 2019

Xin Luna Dong (Amazon)
Session Chair: Divesh Srivastava, AT&T Labs-Research

Georg Gottlob (University of Oxford)
Session Chair: Wenfei Fan, University of Edinburgh

Keynote (1): Towards Usability, Transparency, and Trust in Data-Driven Exploration

Abstract: Data-driven exploration has revolutionized science, industry and government alike. The abundance of data, coupled with cheap and widely available computing and storage resources, has created a perfect storm that enabled this revolution. Now, the main bottleneck lies with people. Extracting actionable insight from data requires complex computational processes that are not only hard to assemble but can also behave (and break) in unforeseen ways. Thus, when results are derived, an important question is whether you can trust them. Given that decisions are increasingly driven by data, erroneous conclusions can have serious consequences. In this talk, I will present recent research on techniques and systems that aim to empower domain experts to explore their own data while supporting transparency, enabling the experts to reason about and build trust in the results they derive.
Juliana Freire is a Professor of Computer Science and Data Science at New York University. She is the elected chair of the ACM Special Interest Group on Management of Data (SIGMOD) and a council member of the Computing Research Association’s Computing Community Consortium (CCC). Her research interests are in large-scale data analysis, curation and integration, visualization, provenance management, and web information discovery. She has made fundamental contributions to data management methods and tools that address problems introduced by emerging applications, including urban analytics and computational reproducibility. Freire has published over 180 technical papers and several open-source systems, and is an inventor of 12 U.S. patents. She has co-authored 5 award-winning papers, including one that received the ACM SIGMOD Most Reproducible Paper Award. She is an ACM Fellow and a recipient of an NSF CAREER award, two IBM Faculty awards, and a Google Faculty Research award. Her research has been funded by the National Science Foundation, DARPA, the Department of Energy, the National Institutes of Health, the Sloan Foundation, the Gordon and Betty Moore Foundation, the W. M. Keck Foundation, Google, Amazon, AT&T Research, Microsoft Research, Yahoo! and IBM. She received M.Sc. and Ph.D. degrees in computer science from the State University of New York at Stony Brook.

Keynote (2): Managing, Analyzing, and Learning Heterogeneous Graph Data: Challenges and Opportunities

Abstract: An increasing number of big data applications naturally represent data as graphs in order to capture complex relationships and dynamic interactions among entities. These enterprise graphs typically contain billions of vertices and edges with rapid updates at massive scale, which imposes new challenges on graph data management, graph analysis, and graph learning. In this talk, I will present some interesting techniques and experiences of dealing with such challenges in the context of an e-commerce platform. I hope that an understanding of real-world challenges in big graph systems can inspire novel research ideas and opportunities.
Jingren Zhou is Vice President at Alibaba Group. He is responsible for driving data intelligence infrastructure and several key data-driven businesses at Alibaba. Specifically, he leads work to develop a cloud-scale distributed computing platform, data analytics products, and various business solutions. He also leads work to develop advanced techniques for personalized search, product recommendation, and advertising on Alibaba’s e-commerce platforms, including Taobao and Tmall. His research interests include cloud computing, databases, and large-scale machine learning systems. He received his PhD in Computer Science from Columbia University. He is a Fellow of the IEEE.

Keynote (3): AsterixDB Mid-Flight: A Case Study in Building Systems in Academia

Abstract: Building large software systems is always a challenging venture, but especially so in academia. This talk will describe the experiences that the speaker and his partners in software crime at UC Irvine and UC Riverside had that culminated in the Big Data Management System now available as Apache AsterixDB. The talk will cover a mix of the history and technical content of the nearly ten-year-old project, starting with its inception during the MapReduce craze, and will describe the phases that the effort has gone through and some of the lessons learned along the way. The talk will also cover the speaker’s own reflections and opinions on the challenges of systems-building, as well as writing about it, in our current culture in academia. Included will be the case for doing this sort of thing – including the dangers of doing “systems” research in the absence of an actual system, and why the gain outweighs the pain of building (and also sharing) database software in academia. As of late 2018, Apache AsterixDB is also having commercial impact; it is the storage and parallel query engine underlying the new offering from Couchbase called Couchbase Analytics. The last part of the talk will explain how we are attempting to balance the uses of AsterixDB as (i) a generally available open source Apache software platform, (ii) an ongoing end-to-end research testbed for university students’ work, and (iii) the technology powering a commercial NoSQL product offering.

Michael Carey received his B.S. and M.S. degrees from Carnegie-Mellon University and his Ph.D. from the University of California, Berkeley. He is currently a Bren Professor of Information and Computer Sciences and Distinguished Professor of Computer Science at UC Irvine, where he leads the AsterixDB project, as well as a Consulting Architect at Couchbase, Inc. Before joining UCI in 2008, he worked at BEA Systems for seven years and led the development of their AquaLogic Data Services Platform product for virtual data integration. He also spent a dozen years at the University of Wisconsin-Madison, five years at the IBM Almaden Research Center working on object-relational databases, and a year and a half at e-commerce platform startup Propel Software during the infamous 2000-2001 Internet bubble. He is an ACM Fellow, an IEEE Fellow, a member of the National Academy of Engineering, and a recipient of the ACM SIGMOD E.F. Codd Innovations Award. His current interests center around data-intensive computing and scalable data management (a.k.a. Big Data).

Keynote (4): Data Management at Huawei: Recent Accomplishments and Future Challenges

Abstract: Huawei is a leading global provider of information and communications technology (ICT) infrastructure and smart devices. With integrated solutions across four key domains – telecom networks, IT, smart devices, and cloud services – Huawei is committed to bringing digital transformation to every person, home and organization for a fully connected, intelligent world. Founded in 1987, Huawei currently has more than 180,000 employees and operates in more than 170 countries and regions, with revenue of over USD 100 billion in 2018.
Data management has been playing a key role at Huawei in all four of the key domains above. We have developed innovative products and solutions to support rapid business growth driven by customer requirements. While some data management problems are common, each domain also has its own special requirements and challenges. In this talk, we will go through some recent advancements in Huawei data management, such as our highly elastic in-memory database for telecom networks (GMDB) and our petabyte-scale enterprise analytics platform (FusionInsight MPPDB). In addition, we will talk about new challenges in the data management area that we are actively working on, such as autonomous data management and a device-edge-cloud collaborative data platform.

Dr. Jianjun Chen, technical VP of Huawei US Silicon Valley R&D Center, leads advanced database research and development in the Huawei database group. He joined Huawei in January 2017 and is responsible for defining the vision, system architecture and key technologies of multiple Huawei database products. He is a seasoned database expert with over 16 years of R&D and technical leadership experience. His expertise includes cloud data management, query optimization, mobile cloud computing and NoSQL databases.
Dr. Chen received his Ph.D. in 2001 from the Computer Sciences Department of the University of Wisconsin-Madison, under Professor David DeWitt’s supervision. After graduation, he worked on the Microsoft SQL Server Query Optimizer team and later became one of the founding engineers of Microsoft SQL Azure. After almost 9 years at Microsoft, he joined the Yahoo! Lab Cloud Sciences group and worked on the PNUTS system. Next, he worked on various Google Cloud Platform (GCP) and Ads Backend database teams for more than 4 years, including Cloud SQL, Cloud Datastore and F1. During his tenure at GCP, he was an active proponent of mobile cloud computing and led a team across GCP and Android to build an MBaaS (mobile backend as a service) product from scratch.
Dr. Chen is a recipient of the SIGMOD 10-Year Test-of-Time Award in 2010 for visionary work on scalable continuous query processing, done as part of his Ph.D. dissertation research.

Keynote (5): Building a Broad Knowledge Graph for Products

Abstract: Knowledge graphs have been used to support a wide range of applications and to enhance search results for multiple major search engines, such as Google and Bing. At Amazon we are building a Product Graph, an authoritative knowledge graph for all products in the world. The thousands of product verticals we need to model, the vast number of data sources we need to extract knowledge from, the huge volume of new products we need to handle every day, and the various applications in Search, Discovery, Personalization, and Voice that we wish to support all present big challenges in constructing such a graph. In this talk we describe our efforts in building a broad product graph, a graph that starts shallow with core entities and relationships and allows verticals and relationships to be added easily in a pay-as-you-go fashion. We describe our efforts on knowledge extraction, linkage, and cleaning to significantly improve the coverage and quality of product knowledge. We also present our progress towards our moon-shot goals, including harvesting knowledge from the web, hands-off-the-wheel knowledge integration and cleaning, human-in-the-loop knowledge learning, and graph mining and graph-enhanced search.
Xin Luna Dong is a Principal Scientist at Amazon, leading the efforts to construct the Amazon Product Knowledge Graph. She was one of the major contributors to the Google Knowledge Vault project, and led the Knowledge-based Trust project, which the Washington Post called the “Google Truth Machine”. She co-authored the book “Big Data Integration”, was named an ACM Distinguished Member, and received the VLDB Early Career Research Contribution Award for “advancing the state of the art of knowledge fusion” and the Best Demo Award at SIGMOD 2005. She serves on the VLDB Endowment and the PVLDB advisory committee, and is a PC co-chair for VLDB 2021, ICDE Industry 2019, VLDB Tutorial 2019, SIGMOD 2018 and WAIM 2015.

Keynote (6): Knowledge Graphs and Enterprise AI: The Promise of an Enabling Technology

Abstract: Many modern companies wish to maintain knowledge in the form of a corporate knowledge graph and to use and manage this knowledge via a knowledge graph management system (KGMS). We formulate various requirements for a fully-fledged KGMS. In particular, such a system must be capable of performing complex reasoning tasks but, at the same time, achieve efficient and scalable reasoning over Big Data with an acceptable computational complexity. Moreover, a KGMS needs interfaces to corporate databases, the web, and machine learning and analytics packages. We present KRR formalisms and a system achieving these goals, and give examples of applications where machine learning and logical reasoning complement each other.
Georg Gottlob is Professor of Informatics at Oxford University and at TU Wien. His interests include query optimization, web data processing, AI, knowledge representation, reasoning over big data, and knowledge graphs. Gottlob has received the Wittgenstein Award (Austria) and the Ada Lovelace Medal (UK). He is an ACM Fellow, an ECCAI Fellow, a Fellow of the Royal Society, and a member of the Austrian and German Academies of Sciences and of the Academia Europaea. He chaired the Program Committees of IJCAI 2003 and ACM PODS 2000. He was the main founder of Lixto, a web data extraction software company, which was acquired by McKinsey in 2013. Gottlob was awarded an ERC Advanced Investigator’s Grant for the project “DIADEM: Domain-centric Intelligent Automated Data Extraction Methodology”. Based on the results of this project, he co-founded Wrapidity Ltd, a company specialising in fully automated web data extraction, which was recently acquired by the Meltwater Media Intelligence Corporation. More recently, Gottlob co-founded the knowledge graph start-up DeepReason.ai as a spin-out of Oxford University.