Keynotes and ICDT Invited Talk
Title: The Quest for Knowledge
Aalborg University, Denmark
Throughout the entire history of mankind, humans have always strived to acquire new knowledge. In a way, striving for knowledge is still what unites researchers across all modern research disciplines. Yet, every single one of them has a slightly different understanding. Read More
About the speaker:
Katja Hose is a professor in Computer Science at Aalborg University. Prior to joining Aalborg University, she was a postdoc at the Max Planck Institute for Informatics in Saarbrücken, Germany, and earned her PhD in Computer Science from Ilmenau University of Technology, Germany.
Her research is rooted in databases and Semantic Web technologies and spans theory, algorithms, and applications of Data Science and Web Science incl. knowledge management, querying, analytics, publishing, and extracting. She has co-authored more than 100 peer-reviewed scientific publications and regularly serves as a reviewer for databases and Semantic Web conferences and journals. She has served in many different roles for a broad range of international conferences incl. VLDB, SIGMOD, ICDE, TheWebConf/WWW, and ISWC. More details are available at http://www.cs.aau.dk/~khose.
Title: Explainability Queries for ML Models and its Connections with Data Management Problems
Pontifical Catholic University of Chile, Chile
In this talk I will present two recent examples of my research on explainability problems over machine learning (ML) models. In rough terms, these explainability problems deal with specific queries one poses over a ML model in order to obtain meaningful justifications for their results. Read More
The first example I will present refers to computing explanations with scores based on Shapley values, in particular with the recently proposed, and already influential, SHAP-score. This score provides a measure of how different features in the input contribute to the output of the ML model. We provide a detailed analysis of the complexity of this problem for different classes of Boolean circuits. In particular, we show that the problem of computing SHAP-scores is tractable as long as the circuit is deterministic and decomposable, but becomes computationally hard if any of these restrictions is lifted. The tractability part of this result provides a generalization of a recent result stating that, for Boolean hierarchical conjunctive queries, the Shapley-value of the contribution of a tuple in the database to the final result can be computed in polynomial time.
The second example I will present refers to the comparison of different ML models in terms of important families of (local and post-hoc) explainability queries. For the models, I will consider multi-layer perceptrons and binary decision diagrams. The main object of study will be the computational complexity of the aforementioned queries over such models. The obtained results will show an interesting theoretical counterpart to wisdom’s claims on interpretability. This work also suggests the need for developing query languages that support the process of retrieving explanations from ML models, and also for obtaining general tractability results for such languages over specific classes of models.
About the speaker:
Full Professor at Pontificia Universidad Católica de Chile, where he also acts as Director of the Institute for Mathematical and Computational Engineering. He is the author of more than 80 technical papers, has chaired ICDT 2019, will be chairing ACM PODS 2022, and is currently a member of the editorial committee of Logical Methods in Computer Science. From 2011 to 2014 he was the editor of the database theory column of SIGMOD Record. His areas of interest are database theory, logic in computer science, and the emerging relationship between these areas and machine learning.
Title: Data Profiling – A look back and a look forward
Hasso Plattner Institute (HPI), Germany
Data profiling is the act of extracting many different types of metadata from a given dataset. This research area has recently thrived, due to (i) its simple problem statements, such as “discover all key candidates”, (ii) the high computational complexity of the problems, which are often exponential in the number of columns, (iii) the manifold opportunities for optimizations, such as apriori-inspired pruning or data sampling, and (iv) the various application areas for data profiling results, such as query optimization and data cleaning. Read More
About the speaker:
Prof. Felix Naumann studied mathematics, economy, and computer sciences at the University of Technology in Berlin. After receiving his diploma (MA) in 1997 he completed his PhD thesis in the area of data quality at Humboldt University of Berlin in 2000.
In 2001 and 2002 he worked at the IBM Almaden Research Center on data integration topics. From 2003 – 2006 he was assistant professor for information integration, again at the Humboldt-University of Berlin. Since 2006 he holds the chair for information systems at the Hasso Plattner Institute (HPI) at the University of Potsdam in Germany. He has been visiting researcher at QCRI, AT&T Research, IBM Research, and SAP. His research interests include data profiling, data cleansing, and data integration with over 200 scientific publications. Next to numerous PC memberships for international conferences, he has organized several conferences in various roles including VLDB 2021 as PC co-chair, and he is trustee of the VLDB Endowment. More details are at https://hpi.de/naumann/people/felix-naumann.html.
Title: Comparing Apples and Oranges: Fairness and Diversity in Ranking
New York University, USA
Algorithmic rankers take a collection of candidates as input and produce a ranking (permutation) of the candidates as output. The simplest kind of ranker is score-based; it computes a score of each candidate independently and returns the candidates in score order. Another common kind of ranker is learning-to-rank, where supervised learning is used to predict the ranking of unseen candidates. For both kinds of rankers, we may output the entire permutation or only the highest scoring k candidates, the top-k. Set selection is a special case of ranking that ignores the relative order among the top-k.Read More
In the past few years, there has been much work on incorporating fairness and diversity requirements into algorithmic rankers, with contributions coming from the data management, algorithms, information retrieval, and recommender systems communities. In my talk I will offer a broad perspective that connects formalizations and algorithmic approaches across subfields, grounding them in a common narrative around the value frameworks that motivate specific fairness- and diversity-enhancing interventions. I will discuss some recent and ongoing work, and will outline future research directions where the data management community is well-positioned to make lasting impact, especially if we attack these problems with our rich theory-meets-systems toolkit.
About the speaker:
Julia Stoyanovich is an Assistant Professor in the Department of Computer Science and Engineering at the Tandon School of Engineering, and the Center for Data Science. She is a recipient of an NSF CAREER award and of an NSF/CRA CI Fellowship. Julia’s research focuses on responsible data management and analysis practices: on operationalizing fairness, diversity, transparency, and data protection in all stages of the data acquisition and processing lifecycle. She established the Data, Responsibly consortium, and serves on the New York City Automated Decision Systems Task Force (by appointment by Mayor de Blasio). In addition to data ethics, Julia works on management and analysis of preference data, and on querying large evolving graphs. She holds M.S. and Ph.D. degrees in Computer Science from Columbia University, and a B.S. in Computer Science and in Mathematics and Statistics from the University of Massachusetts at Amherst.
ICDT Invited Talk
Title: What Makes a Variant of Query Determinacy (Un) Decidable?
University of Wrocław, Poland
Suppose there is a database we have no direct access to, but there are views of this database available to us, defined by some queries Q_1, Q_2 ,. . . Q_k. And we are given another query Q. Will we be able to compute Q only using the available views?
The above question, calling it “the question of determinacy”, sounds almost philosophical. One can easily imagine a bearded man in himation chained to the wall of a cave watching the views projected on the wall and pondering whether, from what he is able to see, the reality can be faithfully reconstructed.Read More
For us it is a database theory question though. And a really well motivated one, with motivations ranging from query evaluation plans optimization (where we prefer a positive answer) to privacy issues (where the preferred answer is negative).
Query determinacy is a broad topic, with literally hundreds of papers published since the late 1980s. This talk is not going to be a “survey” (which would be impossible, within a one hour time frame, and with this speaker), but rather a personal perspective of a person somehow involved in recent developments in the area.
First I will explain how, in the last 30+ years, the question of determinacy was formalized. There are many parameters here: obviously one needs to choose the query language of the queries Q_i and the query language of Q. But – surprisingly – there is also some choice as to what the word “to compute” actually means in this context.
Then I will concentrate on the variants of the decision problem of determinacy (for each choice of parameters there is one such problem–Q_1 , Q_2 , . . . Q_k and Q constitute the instance, and the question is whether Q_1 , Q_2 , . . . Q_k determine Q) and I will talk about how I understand the mechanisms rendering different variants of determinacy decidable or undecidable. This will be on a slightly informal level. No new theorems will be presented, but I think I will be able to show simplified proofs of some of the earlier results.
About the speaker:
Jerzy Marcinkowski received his PhD degree, in 1993 , from the Mathematical Department of the University of Wrocław and spent his entire academic career (with the exception of a three semesters long post-doc at the Laboratoire d’I Informatique Fondamentale de Lille in France) working for the University of Wrocław, where he is currently serves as the head of the Department of Computer Science.
His modus operandi is to try to attack well-abstracted, long standing open problems in logic in computer science, including database theory. Database theory people may know him for his work on DATALOG single rule program boundedness (1996), database repairs and consistent query answering, including prioritized repair (with Jan Chomicki and Sławek Staworko; 2000s) and, more recently, for his work on the Chase algorithm, where he proved (with his student Tomasz Gogacz) that all-instances chase termination is undecidable (2014). His main topic of interest in the last five years has been query determinacy.