Jianjun Chen (Huawei US Silicon Valley R&D Center)

AsterixDB Mid-Flight: A Case Study in Building Systems in Academia
Abstract: Building large software systems is always a challenging venture, but especially so in academia. This talk will describe the experiences that the speaker and his partners in software crime at UC Irvine and UC Riverside had that culminated in the Big Data Management System now available as Apache AsterixDB. The talk will cover a mix of the history and technical content of the nearly ten-year-old project, starting with its inception during the MapReduce craze, and will describe the phases that the effort has gone through and some of the lessons learned along the way. The talk will also cover the speaker’s own reflections and opinions on the challenges of systems-building, as well as writing about it, in our current culture in academia. Included will be the case for doing this sort of thing – including the dangers of doing “systems” research in the absence of an actual system, and why the gain outweighs the pain of building (and also sharing) database software in academia. As of late 2018, Apache AsterixDB is also having commercial impact; it is the storage and parallel query engine underlying the new offering from Couchbase called Couchbase Analytics. The last part of the talk will explain how we are attempting to balance the uses of AsterixDB as (i) a generally available open source Apache software platform, (ii) an ongoing end-to-end research testbed for university students’ work, and (iii) the technology powering a commercial NoSQL product offering.

Michael Carey received his B.S. and M.S. degrees from Carnegie-Mellon University and his Ph.D. from the University of California, Berkeley. He is currently a Bren Professor of Information and Computer Sciences and Distinguished Professor of Computer Science at UC Irvine, where he leads the AsterixDB project, as well as a Consulting Architect at Couchbase, Inc. Before joining UCI in 2008, he worked at BEA Systems for seven years and led the development of their AquaLogic Data Services Platform product for virtual data integration. He also spent a dozen years at the University of Wisconsin-Madison, five years at the IBM Almaden Research Center working on object-relational databases, and a year and a half at e-commerce platform startup Propel Software during the infamous 2000-2001 Internet bubble. He is an ACM Fellow, an IEEE Fellow, a member of the National Academy of Engineering, and a recipient of the ACM SIGMOD E.F. Codd Innovations Award. His current interests center around data-intensive computing and scalable data management (a.k.a. Big Data).



Building a broad knowledge graph for products
Abstract: Knowledge graphs have been used to support a wide range of applications and enhance search results for multiple major search engines, such as Google and Bing. At Amazon we are building a Product Graph, an authoritative knowledge graph for all products in the world. The thousands of product verticals we need to model, the vast number of data sources we need to extract knowledge from, the huge volume of new products we need to handle every day, and the various applications in Search, Discovery, Personalization, and Voice that we wish to support all present big challenges in constructing such a graph. In this talk we describe our efforts in building a broad product graph, a graph that starts shallow with core entities and relationships and allows verticals and relationships to be added easily in a pay-as-you-go fashion. We describe our efforts on knowledge extraction, linkage, and cleaning to significantly improve the coverage and quality of product knowledge. We also present our progress towards our moon-shot goals, including harvesting knowledge from the web, hands-off-the-wheel knowledge integration and cleaning, human-in-the-loop knowledge learning, and graph mining and graph-enhanced search.

Xin Luna Dong is a Principal Scientist at Amazon, leading the effort to construct the Amazon Product Knowledge Graph. She was one of the major contributors to the Google Knowledge Vault project, and has led the Knowledge-based Trust project, which The Washington Post called the “Google Truth Machine”. She co-authored the book “Big Data Integration”, was named an ACM Distinguished Member, and received the VLDB Early Career Research Contribution Award for “advancing the state of the art of knowledge fusion” and the Best Demo award at SIGMOD 2005. She serves on the VLDB Endowment and the PVLDB advisory committee, and is the PC co-chair for VLDB 2021, ICDE Industry 2019, VLDB Tutorial 2019, SIGMOD 2018 and WAIM 2015.


Data Management at Huawei: Recent Accomplishments and Future Challenges
Abstract: Huawei is a leading global provider of information and communications technology (ICT) infrastructure and smart devices. With integrated solutions across four key domains – telecom networks, IT, smart devices, and cloud services – Huawei is committed to bringing digital transformation to every person, home and organization for a fully connected, intelligent world. Founded in 1987, Huawei currently has more than 180,000 employees, and operates in more than 170 countries and regions with revenue over 100 billion USD in 2018.
Data management plays a key role at Huawei in all four of the key domains above. We have developed innovative products and solutions to support rapid business growth driven by customer requirements. While some data management problems are common, each domain also has its own special requirements and challenges. In this talk, we will go through some recent advancements in Huawei data management, such as our highly elastic in-memory database for telecom networks (GMDB) and our petabyte-scale enterprise analytics platform (FusionInsight MPPDB). In addition, we will talk about new challenges in the data management area that we are actively working on, such as autonomous data management and a device-edge-cloud collaborative data platform.

Dr. Jianjun Chen, technical VP of the Huawei US Silicon Valley R&D Center, leads advanced database research and development in Huawei's database group. He joined Huawei in January 2017 and is responsible for defining the vision, system architecture and key technologies of multiple Huawei database products. He is a seasoned database expert with over 16 years of R&D and technical leadership experience. His expertise includes cloud data management, query optimization, mobile cloud computing and NoSQL databases.
Dr. Chen received his Ph.D. in 2001 from the Computer Sciences Department of the University of Wisconsin-Madison, under Professor David DeWitt’s supervision. After graduation, he worked on the Microsoft SQL Server Query Optimizer team and later became one of the founding engineers of Microsoft SQL Azure. After almost 9 years at Microsoft, he joined the Yahoo! Lab Cloud Sciences group and worked on the PNUTS system. Next, he worked on various Google Cloud Platform (GCP) and Ads Backend database teams for more than 4 years, including Cloud SQL, Cloud Datastore and F1. During his tenure at GCP, he was an active proponent of mobile cloud computing and led a team across GCP and Android to build an MBaaS (mobile backend as a service) product from scratch.
Dr. Chen is a recipient of the SIGMOD 10-Year Test of Time Award in 2010 for his visionary work on scalable continuous query processing, carried out as part of his Ph.D. dissertation research.