Big Sequence Management: Scaling Up and Out (3h)
Karima Echihabi (Mohammed VI Polytechnic University, Morocco)
Kostas Zoumpatianos (LIPADE, Université de Paris, France)
Themis Palpanas (LIPADE, Université de Paris & French University Institute (IUF), France)
Data series are a prevalent data type that has attracted considerable interest in recent years. In particular, interest in analyzing large volumes of data series has grown explosively across many domains, in both business (e.g., mobile applications) and science (e.g., biology). In this tutorial, we focus on applications that produce massive collections of data series, and we provide the necessary background on data series storage, retrieval, and analytics. We look at systems historically used to handle and mine data in the form of data series, as well as at the state-of-the-art data series management systems that were recently proposed. Moreover, we discuss the need for fast similarity search to support data mining applications, and describe efficient similarity search techniques, indexes, and query processing algorithms. Finally, we look at the gap in modern data series management systems with regard to support for efficient complex analytics, and we argue in favor of integrating summarizations and indexes into these systems. We conclude with the challenges and open research problems in this domain.
Karima Echihabi is an Assistant Professor at Mohammed VI Polytechnic University (UM6P) in Morocco. She is interested in scalable data analytics and data series management and has performed an extensive analysis of data series indexes. She holds a PhD from Mohammed V University (Morocco) and the University of Paris (France) and a Master's degree in Computer Science from the University of Toronto. She has worked as a software engineer in the Windows team at Microsoft, Redmond (USA), and the Query Optimizer team at the IBM Toronto Lab (Canada).
Kostas Zoumpatianos is a Software Engineer at Snowflake Computing. He has been a Marie Curie Fellow at the University of Paris and a postdoctoral researcher at Harvard University. He received his PhD from the University of Trento, working on topics related to indexing and managing large collections of data series. He also holds an M.Sc. in Information Management and a Dipl.Eng. in Information and Communication Systems Engineering from the University of the Aegean in Greece.
Themis Palpanas is a Senior Member of the French University Institute (IUF), a distinction that recognizes excellence across all academic disciplines, and a professor of computer science at the University of Paris (France), where he is director of the Data Intelligence Institute of Paris (diiP) and director of the data management group, diNo. He received his BS degree from the National Technical University of Athens, Greece, and his MSc and PhD degrees from the University of Toronto, Canada. His interests include problems related to data science (big data analytics and machine learning applications). He is the author of 9 US patents and 2 French patents. He is the recipient of 3 Best Paper awards and the IBM Shared University Research (SUR) Award. He is currently serving on the VLDB Endowment Board of Trustees and as an Editor-in-Chief for the BDR Journal. He has served as General Chair for VLDB 2013 and on the program committees of all major conferences in the areas of data management and analysis.
Deep Learning Approaches for Text-to-SQL Systems (1.5h)
George Katsogiannis-Meimarakis (Athena Research Center, Greece)
Georgia Koutrika (Athena Research Center, Greece)
To bridge the gap between users and data, numerous text-to-SQL systems have been developed that allow users to pose natural language questions over relational databases. Recently, novel text-to-SQL systems have adopted deep learning methods with very promising results. At the same time, several challenges remain open, making this area an active and flourishing field of research and development. To make real progress in building text-to-SQL systems, we need to demystify what has been done, understand how and when each approach can be used, and, finally, identify the research challenges ahead of us. The purpose of this tutorial is to provide a systematic study of recent advances in deep learning techniques for text-to-SQL translation, and to highlight open problems and new research opportunities for researchers and practitioners in the fields of database systems, natural language processing, and deep learning.
George Katsogiannis-Meimarakis is a research assistant at Athena Research Center in Athens, Greece, where he works on the INODE (Intelligent Open Data Exploration) project, focusing on the text-to-SQL problem. He is a graduate of the Department of Informatics and Telecommunications of the National and Kapodistrian University of Athens, where he completed his thesis, titled “Translating Natural Language to SQL using Deep Learning”. He is currently attending an MSc programme in Data Science and Information Technologies, with a specialisation in Artificial Intelligence and Big Data.
Dr. Georgia Koutrika is Research Director at Athena Research Center, Greece. She has more than 15 years of experience in multiple roles at HP Labs, IBM Almaden, and Stanford. Her work focuses on data exploration, recommendations, and data analytics, and has been incorporated in commercial products, described in 13 granted patents and 16 patent applications in the US and worldwide, and published in more than 90 papers in top-tier conferences and journals. Her academic activities include: Editor-in-Chief for the VLDB Journal; PC co-chair for VLDB 2023; associate editor for TKDE, SIGMOD 2021, and VLDB 2022; sponsorship chair for ICDE 2021; and general chair for ACM SIGMOD 2016.
Tutorial on the History of Knowledge Graph’s Main Ideas (1.5h)
Claudio Gutierrez (DCC, Universidad de Chile and IMFD, Chile)
Juan F. Sequeda (data.world, USA)
Knowledge Graphs can be considered as fulfilling an early vision in Computer Science of creating intelligent systems that integrate knowledge and data at large scale. Stemming from scientific advancements in research areas such as the Semantic Web, Databases, Knowledge Representation, NLP, and Machine Learning, among others, Knowledge Graphs have rapidly gained popularity in academia and industry in recent years. The integration of such disparate disciplines and techniques gives Knowledge Graphs their richness, but also presents practitioners and theoreticians with the challenge of knowing how current advances developed from early techniques, in order to, on the one hand, take full advantage of them and, on the other, avoid reinventing the wheel. This tutorial will provide a historical context on the roots of Knowledge Graphs, grounded in the advancements of Logic, Data, and the combination thereof.
Claudio Gutierrez is a full professor at the Computer Science Department, Universidad de Chile, and a Senior Researcher at the Millennium Institute for Foundational Research on Data (IMFD). His research lies at the intersection of Databases and the Semantic Web, focusing on data models and query languages for the RDF layer, particularly RDF and SPARQL.
Juan Sequeda is the Principal Scientist at data.world. He joined through the acquisition of Capsenta, a company he founded as a spin-off from his research. He holds a PhD in Computer Science from The University of Texas at Austin. His research interests are at the intersection of Logic and Data for (ontology-based) data integration, semantic/graph data management, and Knowledge Graphs.
Tutorial on the Internals of Permissioned Blockchains and on How to Build Applications with Hyperledger Fabric (3h)
Zsolt István (IT University of Copenhagen, Denmark)
Permissioned blockchains are becoming increasingly mainstream and are being considered for solving problems similar to those that databases have traditionally solved, with the main difference that permissioned blockchains distribute trust and can work even with several participants who do not fully trust each other. As a result, there are numerous research proposals at the intersection of databases and blockchains. Unfortunately, there are still many misconceptions about this technology, which lead to confusion in the community. The main goal of the tutorial is to provide background on the technology and to contrast it with public, permissionless blockchains. We will familiarize participants with the internals of permissioned blockchains and explain how they can be used in non-cryptocurrency scenarios. Through a hands-on part, participants will “learn by doing” how some of the most promising use cases of permissioned blockchains translate to actual smart contracts. We will focus on a supply-chain management-like application and, as our target platform, we will use Hyperledger Fabric 1.4 LTS, an open-source, modular, widely used enterprise blockchain platform.
The tutorial will be given by Zsolt István. He is an Associate Professor at the IT University of Copenhagen. Before that, he was an Assistant Research Professor at the IMDEA Software Institute in Madrid, Spain, with years of experience in databases, distributed systems, and FPGA programming. He holds a PhD and an MSc in Computer Science from ETH Zurich, Switzerland, and a BSc in Computer Science from UT Cluj-Napoca, Romania. His personal website is at: https://zistvan.github.io.